EIA

Introduction

The U.S. Energy Information Administration (EIA) is the statistical agency of the U.S. Department of Energy. It provides policy-independent data, forecasts, and analyses to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment.

Data Source Details

Data can be fetched through the API, bulk download (manifest.txt lists the accessURL), spreadsheet add-ons, and other channels. This data product uses the bulk download. The bulk download comes with a manifest.txt file that lists every category_id of the data source together with its tag and its associated download link.
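The manifest lookup described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the manifest layout (a "dataset" mapping keyed by category_id) and the function name find_access_url are assumptions made for the example.

```python
import json

# Hypothetical manifest excerpt. The layout (a "dataset" mapping keyed by
# category_id, each entry carrying an "accessURL") is an assumption.
MANIFEST_TEXT = """
{
  "dataset": {
    "PET": {
      "category_id": "PET",
      "accessURL": "https://api.eia.gov/bulk/PET.zip"
    }
  }
}
"""


def find_access_url(manifest_text, category_id):
    """Return the bulk download link for category_id, or None if it is not listed."""
    manifest = json.loads(manifest_text)
    entry = manifest.get("dataset", {}).get(category_id)
    return entry.get("accessURL") if entry else None


print(find_access_url(MANIFEST_TEXT, "PET"))
```

Returning None for an unknown category_id lets the caller decide whether to skip the category or stop the run.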

Source: EIA

Tags: Time-series, Risk, Daily

Modules

Scraping:

The scraper module takes category_id and frequency as input. If category_id matches one of the predefined categories, the module downloads the bulk data for that category_id. frequency then filters the bulk data down to the required frequency, one of ["Daily", "Weekly", "Monthly", "Annually", "Quarterly"]. Once the data is filtered at the frequency level, it is converted into a pandas data frame and passed to the cleaning module.
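The frequency filter could look roughly like the sketch below. It assumes the extracted bulk file holds one JSON object per line and that each object carries a single-letter frequency code in a field named "f"; the mapping FREQUENCY_CODES, the function name, and the sample lines are illustrative assumptions.

```python
import json

# Assumed mapping from the module's frequency names to single-letter codes.
FREQUENCY_CODES = {
    "Daily": "D",
    "Weekly": "W",
    "Monthly": "M",
    "Quarterly": "Q",
    "Annually": "A",
}


def filter_by_frequency(lines, frequency):
    """Keep only the series whose frequency code matches the requested frequency."""
    code = FREQUENCY_CODES[frequency]
    selected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the bulk file
        record = json.loads(line)
        if record.get("f") == code:
            selected.append(record)
    return selected


# Illustrative sample lines built from identifiers that appear in this document.
SAMPLE_LINES = [
    '{"series_id": "PET.EER_EPD2DXL0_PF4_Y35NY_DPG.D", "f": "D", "data": [["20220404", 3.846]]}',
    '{"series_id": "PET.WPLSTUS1.W", "f": "W", "data": []}',
]

daily = filter_by_frequency(SAMPLE_LINES, "Daily")
print([record["series_id"] for record in daily])
```

The filtered records can then be loaded into a pandas data frame, for example with pandas.DataFrame(daily), before being handed to the cleaning step.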

Cleaning:

The cleaning module removes duplicate or unwanted columns, renames columns, and converts the date-time format of the data. It outputs a clean CSV file.
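The cleaning steps can be illustrated with a small dependency-free sketch. It works on plain lists rather than a pandas data frame, and the column names, the UNWANTED and RENAME rules, and the YYYYMMDD input date format are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical cleaning rules; the real column names are not given in this document.
UNWANTED = {"copyright"}
RENAME = {"series_id": "original_id", "date": "timestamp"}


def clean_table(header, rows):
    """Drop duplicate and unwanted columns, rename columns, and reformat dates."""
    keep, seen = [], set()
    for index, name in enumerate(header):
        if name in UNWANTED or name in seen:
            continue  # skip unwanted columns and repeated column names
        seen.add(name)
        keep.append(index)
    new_header = [RENAME.get(header[i], header[i]) for i in keep]
    date_pos = new_header.index("timestamp")
    new_rows = []
    for row in rows:
        new_row = [row[i] for i in keep]
        # Assumed input format YYYYMMDD, converted to the timestamp layout
        # shown in the Data Format section.
        parsed = datetime.strptime(new_row[date_pos], "%Y%m%d").replace(tzinfo=timezone.utc)
        new_row[date_pos] = parsed.strftime("%Y-%m-%dT%H:%M:%S%z")
        new_rows.append(new_row)
    return new_header, new_rows


def to_csv_text(header, rows):
    """Serialize the cleaned table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header, rows = clean_table(
    ["series_id", "date", "value", "value", "copyright"],
    [["PET.EER_EPD2DXL0_PF4_Y35NY_DPG.D", "20220404", 3.846, 3.846, "None"]],
)
print(to_csv_text(header, rows))
```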

Geocoder:

Coordinates for the location are added to the metadata, and sub_division, sub_division_level, and country are appended. The geopy library is used to look up the coordinates.
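A minimal sketch of the coordinate step is shown below. The function add_coordinates and the stand-in fake_geocode are hypothetical; the geocoder is passed in as a callable so the example runs without network access, and the commented lines show how geopy's Nominatim geocoder could be plugged in instead.

```python
from types import SimpleNamespace


def add_coordinates(metadata, place, geocode):
    """Return a copy of metadata with map_coordinates filled in for place.

    geocode is any callable that takes a place name and returns an object
    with latitude and longitude attributes, or None when nothing is found.
    """
    location = geocode(place)
    enriched = dict(metadata)
    if location is None:
        enriched["map_coordinates"] = None
        return enriched
    # The order mirrors the example in this document ([latitude, longitude]);
    # the GeoJSON specification itself uses [longitude, latitude].
    enriched["map_coordinates"] = {
        "type": "Point",
        "coordinates": [location.latitude, location.longitude],
    }
    return enriched


# With geopy installed, a real geocoder could be used like this
# (the user_agent string is a made-up example):
#   from geopy.geocoders import Nominatim
#   geocode = Nominatim(user_agent="eia-pipeline-example").geocode


def fake_geocode(place):
    """Stand-in geocoder that returns the coordinates from this document's example."""
    return SimpleNamespace(latitude=43.1561681, longitude=-75.8449946)


meta = add_coordinates({"sub_division_name": "New York"}, "New York", fake_geocode)
print(meta["map_coordinates"])
```

Passing the geocoder in as an argument also makes it easy to add caching or rate limiting around the lookup without touching the metadata logic.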

Standardization:

The standardization module simply reads the data produced by the geocoder module. No further conversion is needed, because the scraper, cleaning, and geocoder modules already generate their output in the standard output format.

Ingest:

Metadata and time-series data are ingested into MongoDB. The latest timestamp id (the MongoDB id of the document holding the latest timestamp) is appended to the metadata, which reduces the search needed to find the latest data point.
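The ingest step might be sketched as follows. This is an illustration under assumptions: the function name, the metadata field name latest_ts_id, and the use of the metadata document's MongoDB id as ts_ref_id are not confirmed by this document. The collection arguments are expected to behave like pymongo collections (insert_one, insert_many, update_one).

```python
def ingest_series(metadata, points, metadata_collection, timeseries_collection):
    """Insert one metadata document and its data points, then record the latest point's id.

    Returns the id of the time-series document with the latest timestamp,
    or None when there are no points to ingest.
    """
    # Assumption: the metadata document's MongoDB id serves as ts_ref_id.
    ts_ref_id = metadata_collection.insert_one(metadata).inserted_id
    if not points:
        return None

    docs = [
        {"ts_ref_id": ts_ref_id, "timestamp": p["timestamp"], "value": p["value"]}
        for p in points
    ]
    result = timeseries_collection.insert_many(docs)

    # Timestamps in one fixed ISO-like layout and offset sort correctly as strings.
    latest_index = max(range(len(docs)), key=lambda i: docs[i]["timestamp"])
    latest_id = result.inserted_ids[latest_index]

    # "latest_ts_id" is a hypothetical field name for the stored pointer.
    metadata_collection.update_one(
        {"_id": ts_ref_id}, {"$set": {"latest_ts_id": latest_id}}
    )
    return latest_id


# Possible usage with pymongo (connection string and names are examples only):
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017")["eia"]
#   ingest_series(metadata, points, db["metadata"], db["timeseries"])
```

Storing the pointer on the metadata document means the latest data point can be fetched with a single lookup by id instead of a sort over the whole series.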

Data Format

Timeseries Attributes

Attributes	Descriptions
ts_ref_id	Id used to connect the time-series data to the metadata.
value	Time-series value stored for the specified category_id (e.g. Petroleum, Natural gas).
timestamp	Standard timestamp used for the time series.

Metadata Attributes

Attributes	Descriptions
original_id	series_id (a unique identifier) assigned by the EIA to each product within a category, e.g. original_id: "PET.EER_EPD2DXL0_PF4_Y35NY_DPG.D"
name	Name of the data product. One category can contain several data products, each uniquely identified by its original_id.
description	A brief description of the data, e.g. description: "New York Harbor Ultra-Low Sulfur No 2 Diesel Spot Price"
sample_frequency	How often the data is updated. One of ["Daily", "Weekly", "Monthly", "Annually", "Quarterly"], e.g. sample_frequency: "Daily"
source	Name of the site the data was extracted from, e.g. source: "EIA, U.S. Energy Information Administration"
domain	Domain of the data, e.g. domain: "Energy"
sub_domain	Subdomain associated with the parent domain, e.g. sub_domain: "Petroleum"
tag	Tag associated with the data, e.g. tag: "PET"
timestamp	Timestamp at which the data point was recorded, e.g. timestamp: "2022-04-04T00:00:00+0000"
value	Data value at the timestamp, e.g. value: 3.846
units	Units in which the recorded value is measured, e.g. units: "Dollars per Gallon"
aug_id	A unique identifier for the time-series data, defined by Taiyo, e.g. aug_id: "EIA_PET.WPLSTUS1.W"
time_of_sampling	Time at which the script ran and collected the data, e.g. time_of_sampling: "05:51:05 AM"
date_of_sampling	Date on which the script ran and collected the data, e.g. date_of_sampling: "25/04/2022"
url	Link the data set was extracted from, e.g. url: "https://api.eia.gov/bulk/PET.zip"
last_timestamp	Timestamp of the last update to the data, e.g. last_timestamp: "2022-04-20T17:33:11+0000"
map_coordinates	Latitude and longitude of the location, e.g. map_coordinates: "{'type': 'Point', 'coordinates': [43.1561681, -75.8449946]}"
country	Country name associated with the map_coordinates, e.g. country: "USA"
country_code	Code associated with the country name, e.g. country_code: "USA"
region	Region associated with the country name, e.g. region: "North America"
region_code	Code for the region, e.g. region_code: "USA"
income_level	Income level associated with the region, e.g. income_level: "High-Income Economies"
sub_division_name	Name of the subdivision within the country, e.g. sub_division_name: "New York"
sub_division_code	Code associated with sub_division_name, e.g. sub_division_code: "USA-NY"
sub_division_level	Level of the subdivision, e.g. sub_division_level: "State"

Useful Links

https://www.eia.gov/

Total count of data points last extracted: 21/04/2022