World Bank Projects
Introduction
The World Bank, part of the World Bank Group, is an international organization affiliated with the United Nations (UN) and designed to finance projects that enhance the economic development of member states. Headquartered in Washington, D.C., the bank is the largest source of financial assistance to developing countries. It also provides technical assistance and policy advice and supervises, on behalf of international creditors, the implementation of free-market reforms. Together with the International Monetary Fund (IMF) and the World Trade Organization, it plays a central role in overseeing economic policy, reforming public institutions in developing countries, and defining the global macroeconomic agenda. The World Bank provides low-interest loans, zero- to low-interest credits, and grants to developing countries. These fund a wide range of investments in education, health, public administration, infrastructure, financial and private sector development, agriculture, and environmental and natural resource management. Some projects are cofinanced with governments, other multilateral institutions, commercial banks, export credit agencies, and private sector investors.
Source: World Bank Projects
Tags: Multilateral, Government Announcement, Public Procurement, Infrastructure, Construction
Modules
Scraping:
The scraper uses the World Bank Projects page to get a list of all projects listed there. It also scrapes the geospatial data. The scraper goes through all the pages for each project, collects all the data, and stores it in a single CSV. This CSV is stored in the bucket.
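The page-by-page collection described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual scraper: `fetch_page` is a hypothetical callable standing in for whatever request logic retrieves one page of project records, and the upload of the CSV to the bucket is left out.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]


def iter_projects(fetch_page: Callable[[int], List[Record]]) -> Iterator[Record]:
    """Yield records page by page until an empty page is returned.

    `fetch_page` is a hypothetical stand-in for the real page request.
    """
    page = 0
    while True:
        rows = fetch_page(page)
        if not rows:
            break
        yield from rows
        page += 1


def write_csv(rows: Iterable[Record], fieldnames: List[str]) -> str:
    """Serialize all records into a single CSV document held in memory."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # missing keys are written as empty cells
    return buffer.getvalue()


# Usage with an in-memory fake instead of a network call.
fake_pages = [
    [{"original_id": "P1", "status": "Active"}],
    [{"original_id": "P2", "status": "Closed"}],
    [],
]
csv_text = write_csv(iter_projects(lambda i: fake_pages[i]), ["original_id", "status"])
```

Passing the fetcher in as a parameter keeps the pagination loop testable without network access.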
Cleaning:
Currency conversion, null-value handling, duplicate removal, timestamp formatting, and sector and subsector cleaning are performed in this step.
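A few of these cleaning steps can be sketched as below. The sketch makes several assumptions that the text above does not confirm: the set of null markers, the input date format `%m/%d/%Y`, and the use of `original_id` as the duplicate key. Currency conversion is omitted because it depends on exchange rates that are not described here.

```python
from datetime import datetime
from typing import Dict, List, Optional

# Assumed spellings of "no value" in the scraped CSV.
NULL_MARKERS = {"", "null", "none", "n/a"}


def normalize_null(value: Optional[str]) -> Optional[str]:
    """Map empty or placeholder strings to None and strip whitespace."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text.lower() in NULL_MARKERS else text


def format_timestamp(value: Optional[str], input_format: str = "%m/%d/%Y") -> Optional[str]:
    """Convert a date string to ISO 8601 (YYYY-MM-DD); the input format is assumed."""
    if value is None:
        return None
    return datetime.strptime(value, input_format).date().isoformat()


def clean_records(records: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Normalize nulls, drop duplicate IDs (first one wins), format the approval date."""
    seen = set()
    cleaned = []
    for record in records:
        row = {key: normalize_null(val) for key, val in record.items()}
        project_id = row.get("original_id")
        if project_id is None or project_id in seen:
            continue
        seen.add(project_id)
        row["boardapprovaldate"] = format_timestamp(row.get("boardapprovaldate"))
        cleaned.append(row)
    return cleaned


sample = [
    {"original_id": "P1", "boardapprovaldate": "06/30/2020", "borrower": "N/A"},
    {"original_id": "P1", "boardapprovaldate": "06/30/2020", "borrower": "N/A"},
    {"original_id": "P2", "boardapprovaldate": "", "borrower": "Ministry of Finance"},
]
cleaned_sample = clean_records(sample)
```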
Geocoding:
Relevant geocoding information, such as map coordinates and country and region codes, is extracted from the scraped geospatial data.
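As an illustration of the coordinate extraction, the sketch below pulls point locations out of a GeoJSON `FeatureCollection`. It assumes the scraped geospatial data is standard GeoJSON, where a `Point` stores its position as `[longitude, latitude]`. The output shape (a list of `lat`/`lon` dictionaries) is an assumption about how `map_coordinates` might be represented.

```python
import json
from typing import Dict, List


def extract_map_coordinates(geojson_text: str) -> List[Dict[str, float]]:
    """Return lat/lon pairs for every Point feature in a FeatureCollection."""
    data = json.loads(geojson_text)
    coordinates = []
    for feature in data.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # other geometry types are skipped in this sketch
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is lon, lat
        coordinates.append({"lat": lat, "lon": lon})
    return coordinates


example_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {"type": "Point", "coordinates": [36.8, -1.3]}, "properties": {}},
        {"type": "Feature", "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}, "properties": {}},
    ],
})
points = extract_map_coordinates(example_geojson)
```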
Standardization:
Additional information such as sample frequency, units, source, and description is included in the metadata. Predefined sectors and subsectors are added in this step.
MetaData:
Metadata contains the timestamp (approval date) range, sectors, subsectors, and country codes. It is also stored in the S3 bucket.
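A metadata summary of this kind could be derived from the cleaned rows roughly as follows. This is a sketch under assumptions: the attribute names `boardapprovaldate`, `identified_sector`, `identified_subsector`, and `country_code` are taken from the attribute table in this document, the dates are assumed to already be ISO 8601 strings, and the output key names are hypothetical.

```python
from typing import Dict, List, Optional


def distinct(rows: List[Dict[str, Optional[str]]], field: str) -> List[str]:
    """Sorted unique non-empty values of one field."""
    return sorted({row[field] for row in rows if row.get(field)})


def build_metadata(rows: List[Dict[str, Optional[str]]]) -> Dict[str, object]:
    """Summarize approval date range, sectors, subsectors, and country codes."""
    # ISO 8601 date strings sort chronologically when sorted as text.
    dates = sorted(row["boardapprovaldate"] for row in rows if row.get("boardapprovaldate"))
    return {
        "timestamp_range": {"start": dates[0], "end": dates[-1]} if dates else None,
        "sectors": distinct(rows, "identified_sector"),
        "subsectors": distinct(rows, "identified_subsector"),
        "country_codes": distinct(rows, "country_code"),
    }


rows = [
    {"boardapprovaldate": "2021-03-15", "identified_sector": "Energy",
     "identified_subsector": "Solar", "country_code": "KEN"},
    {"boardapprovaldate": "2019-11-02", "identified_sector": "Transport",
     "identified_subsector": "Roads", "country_code": "IND"},
    {"boardapprovaldate": None, "identified_sector": "Energy",
     "identified_subsector": None, "country_code": "KEN"},
]
metadata = build_metadata(rows)
```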
Ingest:
Metadata and project data are ingested into an Elasticsearch cluster. The index is created based on the fields sector, subsector, and map coordinates.
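One way the index definition and ingest payload might look is sketched below using only the standard library, so no particular client version is assumed. The index name, the choice of `identified_sector` and `identified_subsector` as the indexed sector fields, and the use of `aug_id` as the document ID are all assumptions. The `keyword` and `geo_point` field types and the newline-delimited bulk format are standard Elasticsearch features.

```python
import json
from typing import Dict, List

INDEX_NAME = "worldbank-projects"  # hypothetical index name

# Explicit mapping for the fields the index is based on (field names assumed).
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "identified_sector": {"type": "keyword"},
            "identified_subsector": {"type": "keyword"},
            "map_coordinates": {"type": "geo_point"},
        }
    }
}


def to_bulk_body(docs: List[Dict[str, object]]) -> str:
    """Build a newline-delimited body for the Elasticsearch bulk API.

    Each document becomes two lines: an action line, then the document itself.
    """
    if not docs:
        return ""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": INDEX_NAME, "_id": doc["aug_id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


bulk_body = to_bulk_body([
    {"aug_id": "a1", "identified_sector": "Energy",
     "map_coordinates": {"lat": -1.3, "lon": 36.8}},
])
```

The resulting string could then be sent to the cluster's `_bulk` endpoint with whichever HTTP or Elasticsearch client the pipeline already uses.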
Metadata
Metadata Attributes
Attributes | Descriptions |
---|---|
access_level | Access Level |
api_url | API Endpoint for JSON data |
approvalfy | Approval fiscal year |
aug_id | ID generated for unique identification of the asset |
board_approval_month | Board Approval Month |
board_approval_year | Board Approval Year |
boardapprovaldate | Board Approval Date |
borrower | Borrower |
borrowername | Borrower Name |
budget | Budget |
closingdate | Closing Date |
cmt_usd_amt | Commitment Usd Amount |
completion_riskdo | Completion Riskdo |
country_code | Country Code in ISO alpha-3 (three-letter) format |
country_name | Country Name |
countrycode | Country Code in ISO alpha-2 (two-letter) format |
countryid | Country ID |
countryiddesc | Country ID description |
source | Data Source Abbreviation |
disbursement | Disbursement Amount |
esrc_env_risk_rate_name | Esrc Env Risk Rate Name |
esrc_ovrl_risk_rate | Esrc Ovrl Risk Rate |
evaluation_riskdo | Evaluation Riskdo |
fincr_usd_amt | Financier Usd Amount |
fincrname | Financier Name |
fiscalyear | Fiscal year |
geojson | Geojson |
goal | Goal |
grant_usd_amt | Grant Usd Amount |
ibrd_cmt_usd_amt | IBRD Commitment Usd Amt |
icrdate | ICR date |
ida_cmt_usd_amt | IDA Commitment Usd Amt |
identified_sector | Identified Sector |
identified_sector_subsector_tuple | Identified Sector Subsector Tuple |
identified_subsector | Identified Subsector |
impagency | Impagency |
implementingname | Implementing Agency Name |
keywords | Keywords |
laststatusdate | Last Status Date |
lendinginstr | Lending Instrument Abbreviation |
lendinginstrumenttypename | Lending instrument type name |
lendprojectcost | Lending Project Cost |
locations | Locations |
map_coordinates | Map Coordinates |
mjsector | Major Sector |
original_id | Original ID used by source |
overall_comments | Overall Comments |
overall_currentrating | Overall Current Rating |
overall_prevrating | Overall Prev Rating |
overall_templatename | Overall Template Name |
p2a_updated_date | P2A Updated Date |
parentprojid | Parent Project ID |
performance_comments | Performance Comments |
performance_currentrating | Performance Current Rating |
performance_prevrating | Performance Prev Rating |
performance_templatename | Performance Template Name |
prodlinetext | Product Line Text |
productlinetypename | Product Line Type Name |
proj_last_upd_date | Project Last Update Date |
project_abstract | Project Abstract |
project_development_objective | Project Development Objective |
project_or_tender | Project Or Tender |
projectcost | Project Cost |
projectfinancialtype | Project Financial Type |
region_code | Region Code |
region_name | Region Name |
regionabbr | Region abbreviation from source |
regionlongname | Region Longname |
regionname | Region Name |
sector | Sector |
sector1 | Sector1 |
sector1_name | Sector 1 Name |
sector1_percent | Sector 1 Percent |
sector2 | Sector 2 |
sector2_name | Sector 2 Name |
sector2_percent | Sector 2 Percent |
sector3 | Sector 3 |
sector3_name | Sector 3 Name |
sector3_percent | Sector 3 Percent |
sector4 | Sector 4 |
sector4_name | Sector 4 Name |
sector4_percent | Sector 4 Percent |
sector5 | Sector 5 |
sector5_name | Sector 5 Name |
sector5_percent | Sector 5 Percent |
WBPROJ_data_source | Source from which WB Projects has gathered the data |
status | Status |
statusdate | Status Date |
teamleadname | Team Lead Name |
teammemfullname | Team Members Full Name |
theme1 | Theme 1 |
theme1_name | Theme 1 Name |
theme1_percent | Theme 1 Percent |
theme2 | Theme 2 |
theme2_name | Theme 2 Name |
theme2_percent | Theme 2 Percent |
theme3 | Theme 3 |
theme3_name | Theme 3 Name |
theme3_percent | Theme 3 Percent |
theme4 | Theme 4 |
theme4_name | Theme 4 Name |
theme4_percent | Theme 4 Percent |
theme5 | Theme 5 |
theme5_name | Theme 5 Name |
theme5_percent | Theme 5 Percent |
theme_list | Theme List |
totalcommamt | Total Commitment Amount |
url | URL of the official project page |
Data Flow
The data pipeline described above runs on Argo and is executed daily, except on Sundays.
DAGs:
- WorldBank: 1 DAG file in total
Scope for Improvement
The following can be improved in the next version of the data product:
- In the future, the scraper could be improved to scrape only the data that we do not already have.
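A possible shape for this incremental approach is sketched below. It assumes the listing page exposes each project's ID together with a last-update value (for example `proj_last_upd_date`), and that the same pair is available for the records already stored. Both inputs are hypothetical dictionaries keyed by project ID.

```python
from typing import Dict, List


def select_projects_to_scrape(listed: Dict[str, str], stored: Dict[str, str]) -> List[str]:
    """Return IDs that are new or whose last-update value differs from the stored one."""
    return [
        project_id
        for project_id, last_updated in listed.items()
        if stored.get(project_id) != last_updated
    ]


listed_projects = {"P1": "2024-01-01", "P2": "2024-02-01", "P3": "2024-03-01"}
stored_projects = {"P1": "2024-01-01", "P2": "2024-01-15"}
to_scrape = select_projects_to_scrape(listed_projects, stored_projects)
```

Only the projects returned here would need a full scrape. Unchanged projects could be carried over from the previous CSV.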
Useful Links
- https://projects.worldbank.org/en/projects-operations/projects-home