
EIA

Introduction

The Energy Information Administration (EIA) is the statistical agency of the Department of Energy. It provides policy-independent data, forecasts, and analyses to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment.

Data Source Details

Data can be fetched through the API, bulk download (manifest.txt lists each dataset's accessURL), spreadsheet add-ons, and other channels. Our data product uses the bulk download. The bulk download comes with a manifest.txt file that lists every category_id together with its download link and an associated tag.
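A minimal sketch of looking up a download link in manifest.txt. It assumes the file is JSON with a top-level "dataset" object whose entries carry an "identifier" and an "accessURL"; that structure is an assumption to verify against a real manifest, and get_access_url is a hypothetical name.

```python
import json
from typing import Optional


def get_access_url(manifest_text: str, category_id: str) -> Optional[str]:
    """Return the bulk-download URL for category_id, or None if it is not listed.

    Assumes manifest.txt is JSON shaped like
    {"dataset": {"PET": {"identifier": "PET", "accessURL": "..."}, ...}}.
    """
    manifest = json.loads(manifest_text)
    wanted = category_id.upper()
    for key, entry in manifest.get("dataset", {}).items():
        # Match on the dataset key or its "identifier" field, case-insensitively.
        if key.upper() == wanted or str(entry.get("identifier", "")).upper() == wanted:
            return entry.get("accessURL")
    return None


if __name__ == "__main__":
    # Hypothetical one-entry manifest, for illustration only.
    sample = json.dumps(
        {"dataset": {"PET": {"identifier": "PET", "accessURL": "https://api.eia.gov/bulk/PET.zip"}}}
    )
    print(get_access_url(sample, "PET"))  # https://api.eia.gov/bulk/PET.zip
```

The returned URL (for example https://api.eia.gov/bulk/PET.zip) is the archive the scraper would then download.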

Source: EIA

Tags: Time-series, Risk, Daily

Modules

Scraping:

The scraper module takes category_id and frequency as input. If category_id matches one of the predefined categories, the module downloads the bulk data for that category_id. The frequency parameter (one of “Daily”, “Weekly”, “Monthly”, “Annually”, “Quarterly”) filters the bulk data down to the series with the required frequency. Once the data is filtered by frequency, it is converted into a pandas DataFrame and passed to the cleaning module.
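The frequency filter could look roughly like the sketch below. It assumes each line of the unzipped bulk file is a JSON object, that series lines carry series_id, name, units, a one-letter frequency code f, and a data list of [date, value] pairs, and that category lines have no series_id. These field names and the FREQUENCY_CODES mapping are assumptions, and filter_by_frequency is a hypothetical name.

```python
import json
from typing import Iterable

import pandas as pd

# Assumed mapping from the module's frequency names to the one-letter
# codes stored in the "f" field of each series line.
FREQUENCY_CODES = {
    "Daily": "D",
    "Weekly": "W",
    "Monthly": "M",
    "Quarterly": "Q",
    "Annually": "A",
}
COLUMNS = ["series_id", "name", "units", "f", "date", "value"]


def filter_by_frequency(lines: Iterable[str], frequency: str) -> pd.DataFrame:
    """Keep only series with the requested frequency, one row per data point."""
    code = FREQUENCY_CODES[frequency]
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Category lines have no "series_id"; skip them and other frequencies.
        if "series_id" not in record or record.get("f") != code:
            continue
        for date, value in record.get("data", []):
            rows.append({
                "series_id": record["series_id"],
                "name": record.get("name"),
                "units": record.get("units"),
                "f": record["f"],
                "date": date,
                "value": value,
            })
    return pd.DataFrame(rows, columns=COLUMNS)
```

In the pipeline, the lines would come from the text file inside the downloaded zip archive.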

Cleaning:

The cleaning module removes duplicate or unwanted columns, renames columns, and converts the dates to the standard timestamp format. It outputs a clean CSV file.
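A sketch of the cleaning step with pandas. The date shapes handled by parse_eia_date ("YYYYMMDD", "YYYYMM", "YYYYQn", "YYYY"), the dropped column f, and the rename map are illustrative assumptions; the output timestamp format matches the timestamp example in the metadata table.

```python
from datetime import datetime, timezone

import pandas as pd


def parse_eia_date(raw) -> str:
    """Convert an EIA period string to a "2022-04-04T00:00:00+0000"-style timestamp.

    Assumed input shapes: "YYYYMMDD" (daily/weekly), "YYYYMM" (monthly),
    "YYYYQn" (quarterly) and "YYYY" (annual).
    """
    raw = str(raw)
    if "Q" in raw:
        year, quarter = raw.split("Q")
        # First month of the quarter: Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10.
        dt = datetime(int(year), (int(quarter) - 1) * 3 + 1, 1)
    elif len(raw) == 8:
        dt = datetime.strptime(raw, "%Y%m%d")
    elif len(raw) == 6:
        dt = datetime.strptime(raw, "%Y%m")
    elif len(raw) == 4:
        dt = datetime.strptime(raw, "%Y")
    else:
        raise ValueError(f"unrecognised EIA period: {raw!r}")
    return dt.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S%z")


def clean(df: pd.DataFrame, csv_path: str) -> pd.DataFrame:
    """Drop duplicate/unwanted columns, rename columns, reformat dates, write a CSV."""
    out = df.loc[:, ~df.columns.duplicated()]  # keep the first of any repeated column
    out = out.drop(columns=["f"], errors="ignore")  # hypothetical "unwanted" column
    out = out.rename(columns={"series_id": "original_id", "date": "timestamp"})
    out["timestamp"] = out["timestamp"].map(parse_eia_date)
    out.to_csv(csv_path, index=False)
    return out
```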

Geocoder:

Coordinates for the location are added to the metadata, and sub_division_name, sub_division_level, and country are appended as well. The geopy library is used to look up the coordinates.
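The geocoder step can be sketched as a function that receives the geocoding callable as a parameter, so geopy's Nominatim(...).geocode can be passed in production and a stub in tests. add_location_metadata, its parameter list, and the query string format are hypothetical; the [latitude, longitude] order follows the map_coordinates example in the metadata table.

```python
from typing import Any, Callable, Optional


def add_location_metadata(
    metadata: dict,
    geocode: Callable[[str], Optional[Any]],
    sub_division_name: str,
    sub_division_code: str,
    sub_division_level: str,
    country: str,
) -> dict:
    """Return a copy of metadata with the location fields filled in.

    `geocode` returns an object with .latitude and .longitude (geopy's
    Location does) or None when nothing is found.
    """
    location = geocode(f"{sub_division_name}, {country}")
    enriched = dict(metadata)
    enriched.update({
        "sub_division_name": sub_division_name,
        "sub_division_code": sub_division_code,
        "sub_division_level": sub_division_level,
        "country": country,
    })
    if location is not None:
        # Same [latitude, longitude] order as the map_coordinates example.
        enriched["map_coordinates"] = {
            "type": "Point",
            "coordinates": [location.latitude, location.longitude],
        }
    return enriched


# Usage with geopy (needs network access and a descriptive user_agent):
#   from geopy.geocoders import Nominatim
#   geolocator = Nominatim(user_agent="eia-data-product")
#   meta = add_location_metadata({}, geolocator.geocode,
#                                "New York", "USA-NY", "State", "USA")
```

GeoJSON (RFC 7946) and MongoDB 2dsphere indexes expect [longitude, latitude], so if map_coordinates is ever indexed geospatially, the order in this sketch would need to be swapped.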

Standardization:

The standardization module only reads the data produced by the geocoder module, because the scraper, cleaning, and geocoder modules already generate their output in the standard format.
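Because the earlier modules already emit the standard format, standardization can amount to a completeness check. A minimal sketch, assuming the required fields are the metadata attributes listed under Data Format, with timestamp and value excluded on the assumption that they live on the time-series documents; REQUIRED_METADATA_FIELDS, missing_fields, and standardize are hypothetical names.

```python
# Field names taken from the Metadata Attributes table (timestamp and value
# omitted on the assumption that they belong to the time-series documents).
REQUIRED_METADATA_FIELDS = frozenset({
    "original_id", "name", "description", "sample_frequency", "source",
    "domain", "sub_domain", "tag", "units", "aug_id", "time_of_sampling",
    "date_of_sampling", "url", "last_timestamp", "map_coordinates", "country",
    "country_code", "region", "region_code", "income_level",
    "sub_division_name", "sub_division_code", "sub_division_level",
})


def missing_fields(record: dict, required=REQUIRED_METADATA_FIELDS) -> set:
    """Return the required field names absent from record."""
    return set(required) - set(record)


def standardize(record: dict) -> dict:
    """Pass the geocoder output through unchanged if it is complete."""
    missing = missing_fields(record)
    if missing:
        raise ValueError(f"record is missing standard fields: {sorted(missing)}")
    return record
```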

Ingest:

Metadata and time-series data are ingested into MongoDB, and the ID of the latest time-series document (its MongoDB _id) is appended to the metadata so that the latest data point can be retrieved without searching the whole time series.
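A sketch of the ingest step, written against pymongo's Collection methods insert_one, insert_many, and update_one. It assumes ts_ref_id stores the metadata document's _id and that all timestamps use the same "+0000" format, so string order equals time order; the latest_ts_id field name and the database/collection names in the usage comment are hypothetical.

```python
from typing import Any, Dict, List


def ingest(meta_coll, ts_coll, metadata: Dict[str, Any], points: List[Dict[str, Any]]) -> Any:
    """Insert metadata and its points, then store the _id of the newest point.

    meta_coll and ts_coll are expected to behave like pymongo Collections.
    """
    meta_id = meta_coll.insert_one(dict(metadata)).inserted_id
    if not points:
        return meta_id
    # Link every point back to its metadata document.
    docs = [{**p, "ts_ref_id": meta_id} for p in points]
    inserted_ids = ts_coll.insert_many(docs).inserted_ids
    # inserted_ids follows the order of docs, so pick the newest by index.
    newest = max(range(len(docs)), key=lambda i: docs[i]["timestamp"])
    meta_coll.update_one(
        {"_id": meta_id},
        {"$set": {"latest_ts_id": inserted_ids[newest]}},  # hypothetical field name
    )
    return meta_id


# Usage with pymongo:
#   from pymongo import MongoClient
#   db = MongoClient()["eia"]
#   ingest(db["metadata"], db["timeseries"], meta, points)
```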

Data Format

Timeseries Attributes

Attribute: Description
ts_ref_id: ID used to connect the time-series data to its metadata.
value: Time-series value stored for the specified category_id (e.g. Petroleum, Natural gas).
timestamp: Standard timestamp of the data point.
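A time-series document holds exactly the three attributes above. The sketch below builds such documents from (timestamp, value) pairs; skipping None values is an assumption about how gaps are handled, and TimeseriesDoc and to_timeseries_docs are hypothetical names.

```python
from typing import Any, Iterable, List, Optional, Tuple, TypedDict


class TimeseriesDoc(TypedDict):
    ts_ref_id: Any   # links the point to its metadata document
    value: float     # value for the category_id at this timestamp
    timestamp: str   # e.g. "2022-04-04T00:00:00+0000"


def to_timeseries_docs(
    ts_ref_id: Any, rows: Iterable[Tuple[str, Optional[float]]]
) -> List[TimeseriesDoc]:
    """Build one document per (timestamp, value) pair, skipping missing values."""
    return [
        TimeseriesDoc(ts_ref_id=ts_ref_id, value=float(value), timestamp=timestamp)
        for timestamp, value in rows
        if value is not None
    ]
```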

Metadata Attributes

Attribute: Description
original_id: series_id (a unique identifier) assigned by the EIA to each product within a category, e.g. original_id: "PET.EER_EPD2DXL0_PF4_Y35NY_DPG.D"
name: Name of the data product. One category can contain several data products, each uniquely identified by original_id.
description: A brief description of the data, e.g. description: "New York Harbor Ultra-Low Sulfur No 2 Diesel Spot Price"
sample_frequency: How often the data is updated; one of "Daily", "Weekly", "Monthly", "Annually", "Quarterly", e.g. sample_frequency: "Daily"
source: Name of the site the data was extracted from, e.g. source: "EIA, U.S. Energy Information Administration"
domain: Domain of the data, e.g. domain: "Energy"
sub_domain: Subdomain within the parent domain, e.g. sub_domain: "Petroleum"
tag: Tag associated with the data, e.g. tag: "PET"
timestamp: Timestamp at which the data point was recorded, e.g. timestamp: "2022-04-04T00:00:00+0000"
value: Data point (data value) at the timestamp, e.g. value: 3.846
units: Units in which the recorded value is measured, e.g. units: "Dollars per Gallon"
aug_id: A unique identifier for the time series, defined by Taiyo, e.g. aug_id: "EIA_PET.WPLSTUS1.W"
time_of_sampling: Time at which the data was collected by running the script, e.g. time_of_sampling: "05:51:05 AM"
date_of_sampling: Date on which the data was collected by running the script, e.g. date_of_sampling: "25/04/2022"
url: Link from which the dataset was extracted, e.g. url: "https://api.eia.gov/bulk/PET.zip"
last_timestamp: Last-updated timestamp of the data, e.g. last_updated: "2022-04-20T17:33:11+0000"
map_coordinates: Latitude and longitude of the location, e.g. map_coordinates: "{'type': 'Point', 'coordinates': [43.1561681, -75.8449946]}"
country: Country name associated with map_coordinates, e.g. country: "USA"
country_code: Code associated with the country name, e.g. country_code: "USA"
region: Region associated with the country, e.g. region: "North America"
region_code: Code for the region, e.g. region_code: "USA"
income_level: Income level associated with the region, e.g. income_level: "High-Income Economies"
sub_division_name: Name of the sub-division within the country, e.g. sub_division_name: "New York"
sub_division_code: Code associated with sub_division_name, e.g. sub_division_code: "USA-NY"
sub_division_level: Level of the sub-division, e.g. sub_division_level: "State"
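Three of the metadata fields are derived when the script runs. A sketch, assuming aug_id is "EIA_" followed by original_id (inferred from the example "EIA_PET.WPLSTUS1.W", not a documented rule) and using the formats shown in the time_of_sampling and date_of_sampling examples; sampling_fields is a hypothetical name.

```python
from datetime import datetime


def sampling_fields(original_id: str, sampled_at: datetime) -> dict:
    """Derive aug_id and the sampling time/date strings for one series."""
    return {
        # Assumed rule: "EIA_" prefix plus the EIA series_id.
        "aug_id": f"EIA_{original_id}",
        # 12-hour clock with AM/PM (%p yields "AM"/"PM" in the default C locale).
        "time_of_sampling": sampled_at.strftime("%I:%M:%S %p"),
        # Day/month/year, as in "25/04/2022".
        "date_of_sampling": sampled_at.strftime("%d/%m/%Y"),
    }
```

For example, sampling_fields("PET.WPLSTUS1.W", datetime(2022, 4, 25, 5, 51, 5)) yields the aug_id, time_of_sampling, and date_of_sampling values shown in the examples above.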

Total count of data points last extracted: 21/04/2022