Named Entity Recognition
- Consolidated list of fields extracted using NER, plus topics from the tagging system
- We also care about a specific list of entities whenever they are mentioned:
- Top 500 ENR Construction company list or Others -- Contracting Firm
- Top 2000 Infrastructure Investors or Others -- Investor
- Government Agencies
- Status, Stages
- Various geospatial signals
- Sector, sub-sector
- Other fields (to be listed and defined individually)
- Technical specifications (for example, site area, power capacity in MW, or data center capacity in TB/ZB)
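A minimal sketch of how the list-based entities and the technical specifications above could be picked up without a trained model, using only dictionary lookup and a regular expression. The names in GAZETTEER are hypothetical placeholders (the real contractor, investor, and agency lists would be loaded from their own sources), and the unit list in SPEC_PATTERN is an assumption that covers only the units mentioned above.

```python
import re

# Hypothetical placeholder entries; replace with the real curated lists.
GAZETTEER = {
    "Contracting Firm": ["Acme Construction"],
    "Investor": ["Example Capital Partners"],
    "Government Agency": ["Example Transport Authority"],
}

# Assumed unit set: power (MW, GW) and storage capacity (TB, PB, ZB).
SPEC_PATTERN = re.compile(
    r"(?P<value>\d+(?:,\d{3})*(?:\.\d+)?)\s*(?P<unit>MW|GW|TB|PB|ZB)\b",
    re.IGNORECASE,
)


def find_listed_entities(text, gazetteer=GAZETTEER):
    """Return every mention of a listed name, tagged with its field."""
    hits = []
    for field, names in gazetteer.items():
        for name in names:
            # Lookarounds stop a name from matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append(
                    {"field": field, "name": name,
                     "start": m.start(), "end": m.end()}
                )
    return sorted(hits, key=lambda h: h["start"])


def extract_specs(text):
    """Return (value, unit) pairs such as (250.0, 'MW')."""
    specs = []
    for m in SPEC_PATTERN.finditer(text):
        value = float(m.group("value").replace(",", ""))
        specs.append((value, m.group("unit").upper()))
    return specs
```

Exact-match lookup like this misses aliases and abbreviations, so it is best treated as a complement to the statistical NER models rather than a replacement for them.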
Models considered:
- spaCy: Using the en_core_web_trf pipeline. spaCy is designed to be efficient and configurable, and it provides end-to-end workflows built on pre-trained models such as transformers.
- Flair: An NLP library built on PyTorch. It provides pre-trained models for named entity recognition (NER), part-of-speech (PoS) tagging, sense disambiguation, and classification, with special support for biomedical data and a rapidly growing number of languages.
- Google’s Natural Language API: A hosted API that uses pre-trained models and offers sentiment analysis, entity analysis, entity sentiment analysis, content classification, and syntax analysis.
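For the spaCy option, a small sketch of how pipeline output might be mapped onto the fields listed earlier. It relies only on the standard spaCy entity attributes (`text`, `label_`, `start_char`, `end_char`). The LABEL_TO_FIELD mapping and the `Entity` record are assumptions made for illustration, not part of spaCy, and the field names are placeholders.

```python
from dataclasses import dataclass

# Assumed mapping from the OntoNotes-style labels emitted by the English
# spaCy pipelines to the field names used in this note.
LABEL_TO_FIELD = {
    "ORG": "organization",
    "GPE": "geospatial",
    "LOC": "geospatial",
    "FAC": "geospatial",
    "QUANTITY": "technical_specification",
}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str   # label as emitted by the model
    field: str   # field name used in this note
    start: int   # character offset where the mention starts
    end: int     # character offset where the mention ends


def extract_entities(nlp, text):
    """Run an NER pipeline and keep only entities that map to a field."""
    doc = nlp(text)
    return [
        Entity(ent.text, ent.label_, LABEL_TO_FIELD[ent.label_],
               ent.start_char, ent.end_char)
        for ent in doc.ents
        if ent.label_ in LABEL_TO_FIELD
    ]

# Usage, assuming spaCy and the en_core_web_trf model are installed:
#   import spacy
#   nlp = spacy.load("en_core_web_trf")
#   extract_entities(nlp, "A new data center is planned in Texas.")
```

Because `extract_entities` accepts any callable that returns an object with `.ents`, the same function could be pointed at a thin adapter around Flair or the Google API when comparing the three candidates on the same text.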