EIA
Introduction
The Energy Information Administration (EIA) is the statistical agency of the Department of Energy. It provides policy-independent data, forecasts, and analyses to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment.
Data Source Details
Data can be fetched through the API, bulk download (manifest.txt holds the accessURL), spreadsheet add-ons, and other channels. Our Data Product uses the bulk download. The bulk download comes with a manifest.txt file that lists every category_id of a data source, its associated download link, and the tag associated with it.
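A minimal sketch of reading manifest.txt to find the download link for each category_id. It assumes the manifest has already been downloaded and is a JSON document whose "dataset" object is keyed by category_id, with each entry carrying an accessURL; that layout and the function name category_links are assumptions, not confirmed by this document.

```python
import json
from typing import Dict, Optional, Set


def category_links(manifest_text: str, wanted: Optional[Set[str]] = None) -> Dict[str, str]:
    """Return {category_id: accessURL} for the categories listed in manifest.txt.

    Assumed layout (not confirmed by this document):
    {"dataset": {"PET": {"accessURL": "https://api.eia.gov/bulk/PET.zip", ...}, ...}}
    """
    manifest = json.loads(manifest_text)
    links = {}
    for category_id, entry in manifest.get("dataset", {}).items():
        # Keep only the predefined categories when a filter is given.
        if wanted is None or category_id in wanted:
            links[category_id] = entry["accessURL"]
    return links
```

Passing wanted={"PET"} mirrors the scraper's check that category_id matches a predefined category.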
Source: EIA
Tags: Time-series, Risk, Daily
Modules
Scraping:
The Scraper module takes category_id and frequency as input. If category_id matches a predefined category, the module downloads the bulk data for that category_id. The frequency input filters the bulk data down to the requested frequency (one of "Daily", "Weekly", "Monthly", "Annually", or "Quarterly"). Once the data is filtered by frequency, it is converted into a pandas DataFrame and passed to the cleaning module.
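The frequency filtering can be sketched as follows. This assumes each bulk ZIP holds a single text file with one JSON object per line, that series records carry series_id, name, units, a one-letter frequency code in "f", and a "data" list of [date, value] pairs, and that lines without "data" (category records) are skipped. FREQUENCY_CODES, series_to_frame, and load_bulk_zip are hypothetical names.

```python
import io
import json
import zipfile

import pandas as pd

# Assumed mapping from the module's frequency names to the one-letter codes
# stored in each series' "f" field.
FREQUENCY_CODES = {"Daily": "D", "Weekly": "W", "Monthly": "M",
                   "Quarterly": "Q", "Annually": "A"}

COLUMNS = ["original_id", "name", "units", "timestamp", "value"]


def series_to_frame(lines, frequency):
    """Flatten the series matching `frequency` into one row per data point."""
    code = FREQUENCY_CODES[frequency]  # KeyError for unsupported frequencies
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Category records have no "data"; other frequencies are filtered out.
        if "data" not in record or record.get("f") != code:
            continue
        for date_str, value in record["data"]:
            rows.append({
                "original_id": record["series_id"],
                "name": record.get("name"),
                "units": record.get("units"),
                "timestamp": date_str,
                "value": value,
            })
    return pd.DataFrame(rows, columns=COLUMNS)


def load_bulk_zip(zip_bytes, frequency):
    """Read the first file inside a bulk ZIP and filter it by frequency."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8")
            return series_to_frame(text, frequency)
```

Reading the archive from bytes keeps the sketch independent of how the ZIP is downloaded; streaming line by line avoids loading every series into memory before filtering.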
Cleaning:
The cleaning module removes duplicate or unwanted columns, renames columns, and changes the DateTime format of the data. It outputs a clean CSV file.
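A pandas sketch of the cleaning step. It treats columns as duplicates when they share a name, takes the unwanted columns and renames from the caller, and assumes dates arrive as YYYYMMDD strings (other frequencies would need a different date_format). The output format mirrors the metadata example "2022-04-04T00:00:00+0000". The function clean and its parameters are hypothetical.

```python
import pandas as pd


def clean(frame, rename=None, drop=None, date_column="timestamp",
          date_format="%Y%m%d", out_path=None):
    """Drop duplicate/unwanted columns, rename, and normalize the date format."""
    # Columns sharing a name are treated as duplicates; keep the first.
    cleaned = frame.loc[:, ~frame.columns.duplicated()]
    if drop:
        cleaned = cleaned.drop(columns=[c for c in drop if c in cleaned.columns])
    if rename:
        cleaned = cleaned.rename(columns=rename)
    parsed = pd.to_datetime(cleaned[date_column], format=date_format, utc=True)
    # Same shape as the metadata example "2022-04-04T00:00:00+0000".
    cleaned = cleaned.assign(**{date_column: parsed.dt.strftime("%Y-%m-%dT%H:%M:%S+0000")})
    if out_path is not None:
        cleaned.to_csv(out_path, index=False)
    return cleaned
```

Parsing with utc=True before formatting means the literal "+0000" suffix is always accurate.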
Geocoder:
Coordinates for the location are added to the metadata, along with the sub_division, sub_division_level, and country fields. The geopy library is used to look up the coordinates.
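A sketch of the geocoder step. The lookup is passed in as a callable so the function runs without network access; in practice it could be geopy's Nominatim(user_agent=...).geocode, which returns a Location with latitude and longitude, or None when nothing matches. The function add_location and its parameters are hypothetical, and coordinates are stored as [latitude, longitude] to match the map_coordinates example (note that GeoJSON itself orders Point coordinates as [longitude, latitude]).

```python
def add_location(metadata, place, geocode, country,
                 sub_division_name=None, sub_division_level=None):
    """Return a copy of metadata enriched with coordinates and location fields.

    geocode: any callable returning an object with .latitude/.longitude,
    or None when the place cannot be found (as geopy geocoders do).
    """
    location = geocode(place)
    if location is None:
        raise ValueError(f"could not geocode {place!r}")
    enriched = dict(metadata)  # leave the caller's dict untouched
    # [latitude, longitude], matching the map_coordinates example in this document
    enriched["map_coordinates"] = {
        "type": "Point",
        "coordinates": [location.latitude, location.longitude],
    }
    enriched["country"] = country
    if sub_division_name is not None:
        enriched["sub_division_name"] = sub_division_name
    if sub_division_level is not None:
        enriched["sub_division_level"] = sub_division_level
    return enriched


# Example wiring with geopy (requires network access and a descriptive user_agent):
# from geopy.geocoders import Nominatim
# geolocator = Nominatim(user_agent="eia-data-product-example")
# meta = add_location(meta, "New York, USA", geolocator.geocode, country="USA",
#                     sub_division_name="New York", sub_division_level="State")
```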
Standardization:
The standardization module simply reads the output of the geocoder module. No further transformation is needed because the scraper, cleaning, and geocoder modules already produce their output in the standard format.
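Because the earlier modules already emit the standard format, standardization can be little more than a pass-through with a sanity check. This sketch assumes the required fields are the time-series value and timestamp plus a subset of the metadata attributes listed under Data Format; which fields are actually mandatory is not stated here, so REQUIRED_METADATA and REQUIRED_TIMESERIES are hypothetical choices.

```python
REQUIRED_METADATA = ("original_id", "name", "sample_frequency", "source",
                     "units", "aug_id", "url", "country")
# ts_ref_id is assumed to be assigned later, when points are linked to metadata.
REQUIRED_TIMESERIES = ("timestamp", "value")


def missing_fields(record, required):
    """List the required keys that record lacks."""
    return [field for field in required if field not in record]


def standardize(metadata, points):
    """Pass geocoder output through unchanged after checking required fields."""
    missing = missing_fields(metadata, REQUIRED_METADATA)
    if missing:
        raise ValueError(f"metadata missing {missing}")
    for i, point in enumerate(points):
        missing = missing_fields(point, REQUIRED_TIMESERIES)
        if missing:
            raise ValueError(f"time-series point {i} missing {missing}")
    return metadata, points
```

Failing loudly here keeps malformed records out of the ingest step.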
Ingest:
Metadata and time-series data are ingested into MongoDB. The latest timestamp id (the MongoDB id of the latest data point) is appended to the metadata so that the latest data point can be found without searching the whole time series.
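A sketch of the ingest step using pymongo-style collections. insert_one, insert_many, and update_one are real pymongo Collection methods, but the linking scheme (using the metadata document's _id as ts_ref_id) and the field name latest_ts_id are assumptions for illustration. Passing the collections in keeps the function testable without a running MongoDB.

```python
def ingest(metadata_coll, timeseries_coll, metadata, points):
    """Insert metadata and its points; record the _id of the latest point.

    Returns (metadata _id, latest point _id or None).
    """
    meta_id = metadata_coll.insert_one(dict(metadata)).inserted_id
    if not points:
        return meta_id, None
    # Assumption: ts_ref_id is the metadata document's _id.
    docs = [dict(point, ts_ref_id=meta_id) for point in points]
    inserted_ids = timeseries_coll.insert_many(docs).inserted_ids
    # ISO 8601 timestamps sharing one UTC offset sort correctly as strings.
    latest = max(range(len(docs)), key=lambda i: docs[i]["timestamp"])
    latest_id = inserted_ids[latest]
    # Hypothetical field name; it lets readers fetch the newest point directly.
    metadata_coll.update_one({"_id": meta_id}, {"$set": {"latest_ts_id": latest_id}})
    return meta_id, latest_id


# Example wiring (requires a running MongoDB; names are placeholders):
# from pymongo import MongoClient
# db = MongoClient("mongodb://localhost:27017")["eia"]
# ingest(db["metadata"], db["timeseries"], meta, points)
```

With the latest point's id stored on the metadata, fetching the newest value is a single lookup by _id instead of a sort over the time series.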
Data Format
Timeseries Attributes
Attributes | Descriptions |
---|---|
ts_ref_id | Id used to connect timeseries data to the metadata. |
value | time-series value stored for the specified category_id (e.g., Petroleum, Natural Gas) |
timestamp | standard timestamp used for the time series |
Metadata Attributes
Attributes | Descriptions |
---|---|
original_id | series_id (a unique identifier) assigned by the EIA to each product within a category, e.g., original_id: "PET.EER_EPD2DXL0_PF4_Y35NY_DPG.D" |
name | name of the data product. One category can contain several data products, each uniquely identified by original_id |
description | a brief description of the data, e.g., description: "New York Harbor Ultra-Low Sulfur No 2 Diesel Spot Price" |
sample_frequency | how often the data is updated; one of "Daily", "Weekly", "Monthly", "Annually", or "Quarterly", e.g., sample_frequency: "Daily" |
source | name of the site the data was extracted from, e.g., source: "EIA, U.S. Energy Information Administration" |
domain | domain of the data, e.g., domain: "Energy" |
sub_domain | subdomain associated with the parent domain, e.g., sub_domain: "Petroleum" |
tag | tag associated with the data, e.g., tag: "PET" |
timestamp | timestamp at which the data point/data value was recorded, e.g., timestamp: "2022-04-04T00:00:00+0000" |
value | data point/data value at the timestamp, e.g., value: 3.846 |
units | units in which the recorded value is measured, e.g., units: "Dollars per Gallon" |
aug_id | a unique identifier for the time series data, defined by Taiyo, e.g., aug_id: "EIA_PET.WPLSTUS1.W" |
time_of_sampling | time at which the data was collected by running the script, e.g., time_of_sampling: "05:51:05 AM" |
date_of_sampling | date on which the data was collected by running the script, e.g., date_of_sampling: "25/04/2022" |
url | link from which the data set was extracted, e.g., url: "https://api.eia.gov/bulk/PET.zip" |
last_timestamp | timestamp at which the data was last updated, e.g., last_timestamp: "2022-04-20T17:33:11+0000" |
map_coordinates | latitude and longitude of the location, e.g., map_coordinates: "{'type': 'Point', 'coordinates': [43.1561681, -75.8449946]}" |
country | country name associated with the map_coordinates, e.g., country: "USA" |
country_code | code associated with the country name, e.g., country_code: "USA" |
region | region associated with the country, e.g., region: "North America" |
region_code | code for the region, e.g., region_code: "USA" |
income_level | income level associated with the region, e.g., income_level: "High-Income Economies" |
sub_division_name | sub-division name associated with the country, e.g., sub_division_name: "New York" |
sub_division_code | code associated with sub_division_name, e.g., sub_division_code: "USA-NY" |
sub_division_level | level of the sub-division, e.g., sub_division_level: "State" |
Useful Links
Total count of data points last extracted: 21/04/2022