Projects and Tenders
Infrastructure is the largest global economic sector, yet advanced data and AI methods have barely been applied to improve its efficiency and social impact. Infrastructure is an industry of industries, but whether the asset is a bridge, a data center, a solar or power plant, or a road, the pain point is the same: how to (a) source deals and (b) evaluate them more completely and quickly.
The primary stakeholders are Engineering, Procurement, and Construction firms (EPCs) and governments. Third parties include suppliers, infrastructure investors, insurers, and consultants. Opportunities in the infrastructure space are generally fragmented across the internet, and scoping them relies largely on tacit knowledge passed through human networks.
The underlying data is unstructured and comes from individual government tender sites, public-private partnership (PPP) project listings, private projects, news sites, and industry-specific construction project sites (for example, airports or hospitals). We built a flexible live data stream that accounts for global standards for sector, sub-sector, and project stage, along with over 30 specific parameters. A hybrid approach uses language models to generate tags for important industry fields.
Problem: Massive data and knowledge gaps lead to heavy reliance on consultants and an opaque marketplace for finding relevant partners. Data on opportunities and their associated risks is incomplete and lacks standards, even though public procurement is the world's largest marketplace at $13T annually. We interviewed hundreds of people from over 55 organizations.
EPCs: Up to 15% of an EPC's operating expense is spent on identifying and benchmarking opportunities. Even so, EPCs see only 30-40% of the available opportunity set.
Project Owners (Government): Initiators have very little information to help them get up to speed on the projects they are about to undertake.
The primary sources for the opportunity set include:
- Project sources, such as the World Bank and the Asian Development Bank, or industry sites such as InfraPPPWorld, Inframation, and Airport-Technology
- Official government public procurement tender websites
- News data
- Country PPP sites, for example, Canada P3 Spectrum, India Investment Grid, and the India PPP site
The opportunity set data offers two key use-cases:
- Opportunity Searching: Search, filter, and discover new projects, and aggregated trending tenders
- Opportunity Benchmarking: Find similar projects, whether recently released, detected early, or closed (successfully or in distress).
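
The two use cases above can be illustrated with a minimal sketch over in-memory records. The field names (`aug_id`, `project_or_tender`, `identified_sector`, `country_code`, `budget`, `keywords`) come from the data dictionary in this document; the sample values, the function names `search` and `similar`, and the keyword-overlap (Jaccard) score used for benchmarking are assumptions made for illustration, not a description of how the actual system ranks results.

```python
# Hypothetical sample records: field names follow the data dictionary,
# but every value here is invented for illustration.
RECORDS = [
    {"aug_id": "demo_1", "project_or_tender": "T", "name": "Rural road upgrade",
     "identified_sector": ["Transport"], "country_code": "IND",
     "budget": 12_000_000, "keywords": ["road", "highway", "paving"]},
    {"aug_id": "demo_2", "project_or_tender": "P", "name": "Solar park",
     "identified_sector": ["Energy"], "country_code": "IND",
     "budget": 90_000_000, "keywords": ["solar", "grid"]},
    {"aug_id": "demo_3", "project_or_tender": "T", "name": "Highway bridge repair",
     "identified_sector": ["Transport"], "country_code": "CAN",
     "budget": 5_000_000, "keywords": ["bridge", "highway", "road"]},
]


def search(records, *, asset_type=None, sector=None, country_code=None, min_budget=None):
    """Opportunity searching: keep records that pass every filter that is set."""
    results = []
    for rec in records:
        if asset_type is not None and rec.get("project_or_tender") != asset_type:
            continue
        if sector is not None and sector not in rec.get("identified_sector", []):
            continue
        if country_code is not None and rec.get("country_code") != country_code:
            continue
        if min_budget is not None and (rec.get("budget") or 0) < min_budget:
            continue
        results.append(rec)
    return results


def similar(records, target, top_n=3):
    """Opportunity benchmarking: rank other records by keyword overlap (Jaccard)."""
    target_kw = set(target.get("keywords", []))
    scored = []
    for rec in records:
        if rec["aug_id"] == target["aug_id"]:
            continue  # do not compare a record with itself
        kw = set(rec.get("keywords", []))
        union = target_kw | kw
        score = len(target_kw & kw) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:top_n]]
```

For example, `search(RECORDS, asset_type="T", sector="Transport")` returns the two transport tenders, and `similar(RECORDS, RECORDS[0])` returns only the bridge repair tender, because it is the only other record sharing keywords with the road upgrade.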
Field | Description | Details | Notes |
---|---|---|---|
Identifier | |||
aug_id | ID generated for unique identification | This resembles {source}_{original_id} | |
original_id | ID originally provided by the data source | If this ID is not present it is to be generated from the asset name or title | |
project_or_tender | Asset type Identifier | P = Project, T = Tender | |
Basic Specs | |||
name | Name or Title of the asset | ||
description | Any descriptive information present about the asset | Can come from a basic description, abstract, or development objective. The source may use different synonyms for description | Similar-sounding fields may be kept as secondary fields; description, if present, is the primary one |
source | Source Abbr. | ||
Status and Stages | |||
status | Status of the asset provided by data source | Must be present; generated if the source does not provide it | |
identified_status | Status identified after mapping original statuses from the source data | ||
Budget/Estimated Cost/Asset valuation | |||
budget | The cost/investment or estimated cost/investment for a project or tender, in USD | ||
Links and URLs | |||
url | Link to the microsite of the asset | ||
document_urls | Link(s) to the asset's e-document | Can have multiple links so a list can be maintained if applicable for the respective source | |
Sector/Subsector or Industry Type | |||
sector | Sector name originally present on source | list | Can be named industry, category, sector, product_category |
subsector | Subsector name originally present on source | list | Can be named sub_category, subsector, product_sub_category |
identified_sector | Sectors identified by rule-based system | Mismapping is highly possible | |
identified_subsector | Subsectors identified by rule-based system | Mismapping is highly possible | |
identified_sector_subsector_tuple | Sector and subsector pairs identified by rule-based system | Mismapping is highly possible | |
keywords | Important keywords identified from overall textual content present about the asset | Includes both tags for sector and subsector and some other technical keywords | |
entities | Important keywords identified from overall textual content present about the asset | More details please | Yet to be classified |
Location Information | |||
country_name | The short country name | Should not be the official name; for example, Republic of India should be converted to India | |
country_code | ISO 3166-1 alpha-3 country codes | ||
region_name | According to standards followed by the World Bank | ||
region_code | 3-character region code | Region codes as used by the World Bank | |
state | State Name | Generated after geocoding. It may or may not get identified | If the location value lists multiple places, only the first one may be recognized in geocoding. If the location is descriptive and covers a larger area or region (for example, Chambal River Valley), it may not be identified, or it may be mislabeled with an approximately matching name. In such cases we use the coordinates of the most precise level available: country, state, county, or whatever else can be resolved |
county | County/District Name | Generated after geocoding. It may or may not get identified | |
locality | Nearest locality | Generated after geocoding. It may or may not get identified | |
nieghbourhood | Neighbourhood name | Generated after geocoding. It may or may not get identified | |
location | The exact or approx. location | May or may not be given by the source | |
map_coordinates | Geographical coordinates in decimal degrees | [ [ latitude, longitude ], ... ], or [] if none; standardized as GeoJSON in Elasticsearch | If only map coordinates are given, they can be reverse geocoded up to city level |
Critical Dates | |||
timestamps | All timestamps are in the format YYYY-MM-DD HH:MM:SS | 'epublished_date': '2021-08-23 16:00:00', 'epublished_date_': '2021-08-23 16:00:00', 'tender_opening_date': '2021-09-15 11:00:00', 'bid_opening_date': '2021-09-15 11:00:00', 'bid_submission_start_date': '2021-09-07 11:00:00', 'bid_submission_closing_date': '2021-09-14 11:00:00', 'bid_submission_end_date': '2021-09-14 11:00:00', 'document_download_start_date': '2021-08-23 16:00:00', 'document_download_end_date': '2021-09-14 11:00:00' | Should be precise to at least YYYY-MM-DD. In edge cases it may contain only the year |
timestamp_range | Min and max timestamp for a document | 'min': '2021-08-23 16:00:00', 'max': '2021-09-15 11:00:00' | Should be precise to at least YYYY-MM-DD. In edge cases it may contain only the year |
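
Two of the derived fields in the table, `aug_id` and `timestamp_range`, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper names (`slugify`, `make_aug_id`, `timestamp_range`), the source abbreviation `wb`, and the slug-based fallback for a missing `original_id` are hypothetical. The table only says the ID is "generated from the asset name or title", not how.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS, as stated in the table

# Subset of the example timestamps shown in the table above.
EXAMPLE_TIMESTAMPS = {
    "epublished_date": "2021-08-23 16:00:00",
    "tender_opening_date": "2021-09-15 11:00:00",
    "bid_submission_start_date": "2021-09-07 11:00:00",
    "bid_submission_end_date": "2021-09-14 11:00:00",
}


def slugify(text):
    """Lowercase the text and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def make_aug_id(source, original_id=None, name=None):
    """Build {source}_{original_id}; fall back to a slug of the name (assumed rule)."""
    if not original_id:
        if not name:
            raise ValueError("either original_id or name is required")
        original_id = slugify(name)
    return f"{source}_{original_id}"


def timestamp_range(timestamps):
    """Return the min and max of a dict of full-precision timestamp strings."""
    parsed = [datetime.strptime(value, TS_FORMAT) for value in timestamps.values() if value]
    if not parsed:
        return {}
    return {
        "min": min(parsed).strftime(TS_FORMAT),
        "max": max(parsed).strftime(TS_FORMAT),
    }
```

With the example timestamps, `timestamp_range(EXAMPLE_TIMESTAMPS)` gives `{'min': '2021-08-23 16:00:00', 'max': '2021-09-15 11:00:00'}`, which matches the `timestamp_range` row. Values that carry only a date or only a year, which the table allows as edge cases, would need separate parsing that this sketch does not attempt.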