Named Entity Recognition

  • Consolidated list of fields extracted with NER, plus topics from the Tagging System
  • We also track a specific list of entities whenever they are mentioned:
    • Companies on the ENR Top 500 construction list, or Others -- Contracting Firm
    • Top 2000 infrastructure investors, or Others -- Investor
    • Government Agencies
    • Status, Stages
    • Various geospatial signals
    • Sector, sub-sector
    • Other fields, each to be listed and defined explicitly
    • Technical specifications (for example, area, power in MW, or data center capacity in TB/ZB)
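
A minimal sketch of how the watch-list entities and technical specifications above could be picked up with plain rules, before or alongside a statistical NER model. The company names, the `WATCH_LISTS` structure, the function names, and the unit list are all illustrative assumptions; the real lists would come from the ENR and investor rankings.

```python
import re

# Hypothetical watch lists keyed by the entity type they map to.
# The names are placeholders, not entries from the actual rankings.
WATCH_LISTS = {
    "Contracting Firm": ["Acme Builders"],
    "Investor": ["Example Capital Partners"],
}

# Assumed set of units; extend it as the field definitions are finalized.
SPEC_PATTERN = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>MW|GW|TB|PB|ZB)\b")


def find_watchlist_mentions(text, watch_lists=WATCH_LISTS):
    """Return (matched text, entity type, start, end) for every watch-list hit."""
    mentions = []
    for label, names in watch_lists.items():
        for name in names:
            # Whole-word, case-insensitive match on the literal name.
            pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                mentions.append((match.group(0), label, match.start(), match.end()))
    # Sort by position so results follow reading order.
    return sorted(mentions, key=lambda item: item[2])


def find_specs(text):
    """Return (numeric value, unit) pairs such as (250.0, 'MW')."""
    return [
        (float(match.group("value")), match.group("unit"))
        for match in SPEC_PATTERN.finditer(text)
    ]


sample = (
    "Acme Builders won a 250 MW solar project backed by "
    "Example Capital Partners; the data center adds 40.5 PB."
)
mentions = find_watchlist_mentions(sample)
specs = find_specs(sample)
```

Exact string matching like this misses aliases and abbreviations, so it is best read as a baseline that the NER models can be compared against.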

Models considered:

  1. spaCy: Using the en_core_web_trf pipeline. spaCy is designed to be efficient and configurable, and it provides end-to-end workflows built on pre-trained models such as transformers.
  2. Flair: An NLP library built on PyTorch. Flair lets you apply its state-of-the-art NLP models to your text, covering named entity recognition (NER), part-of-speech (PoS) tagging, sense disambiguation, and classification, with special support for biomedical data and for a rapidly growing number of languages.
  3. Google’s Natural Language API: A hosted API backed by pre-trained models, with features including sentiment analysis, entity analysis, entity sentiment analysis, content classification, and syntax analysis.
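
Since the three candidates use different label names, one way to compare them is to map their output onto a shared scheme and score it against a small hand-labeled sample. The sketch below is an assumption about how that comparison could look, not a description of an existing harness: `LABEL_MAP`, `score`, and `spacy_extractor` are hypothetical names, and the label mapping is deliberately partial. Only the spaCy wrapper touches a real library, via `spacy.load` and `doc.ents`, and it imports spaCy lazily so the scoring code runs without it.

```python
# Assumed mapping from tool-specific labels to a shared scheme.
# Labels not listed here (dates, quantities, ...) are ignored when scoring.
LABEL_MAP = {
    "ORG": "ORG", "ORGANIZATION": "ORG",
    "GPE": "LOC", "LOC": "LOC", "LOCATION": "LOC",
    "PER": "PER", "PERSON": "PER",
}


def normalize(entities):
    """Lower-case the entity text and translate labels to the shared scheme."""
    return {
        (text.strip().lower(), LABEL_MAP[label])
        for text, label in entities
        if label in LABEL_MAP
    }


def score(predicted, gold):
    """Exact-match precision, recall, and F1 over (text, label) pairs."""
    pred, ref = normalize(predicted), normalize(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def spacy_extractor(model_name="en_core_web_trf"):
    """Build an extractor around a spaCy pipeline (requires spaCy and the model)."""
    import spacy  # lazy import: the scoring code above has no hard dependency

    nlp = spacy.load(model_name)

    def extract(text):
        return [(ent.text, ent.label_) for ent in nlp(text).ents]

    return extract


# Hand-written example data standing in for one labeled document.
gold = [("Acme Builders", "ORG"), ("Texas", "LOC")]
predicted = [
    ("Acme Builders", "ORGANIZATION"),
    ("Texas", "GPE"),
    ("Monday", "DATE"),
    ("Bob", "PERSON"),
]
result = score(predicted, gold)
```

Wrappers for Flair and the Google API would follow the same shape as `spacy_extractor`, each returning `(text, label)` pairs so that `score` can treat all three tools uniformly.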