USGS: Earthquake
Introduction
The USGS monitors and reports on earthquakes, assesses earthquake impacts and hazards, and conducts targeted research on the causes and effects of earthquakes. The USGS undertakes these activities as part of the larger National Earthquake Hazards Reduction Program (NEHRP), a four-agency partnership established by Congress.
Source: USGS Earthquake
Tags: Climate and Environment, Disasters, Earthquake, Risk, Event records, Daily
Modules
Scraping:
We use the USGS API to fetch the data. The “updatedafter” parameter restricts each request to events updated after a given date, so that repeated runs do not fetch duplicate data.
The API endpoint accepts the following parameters:
1) format - Specify the output format.
2) starttime - Limit to events on or after the specified start time.
3) endtime - Limit to events on or before the specified end time.
4) updatedafter - Limit to events updated after the specified time.
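As a minimal sketch of how such a request could be assembled, the snippet below builds a query URL from the parameters listed above. The helper name `build_query_url` and the choice of `geojson` as the default format are assumptions made for illustration; they are not taken from the pipeline's code.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Query endpoint of the USGS FDSN event web service.
BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def build_query_url(updated_after: datetime, fmt: str = "geojson") -> str:
    """Return a query URL for events updated after the given time (hypothetical helper)."""
    # Timestamps are sent in ISO 8601 form, expressed in UTC.
    stamp = updated_after.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    params = {"format": fmt, "updatedafter": stamp}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url(datetime(2022, 4, 28, tzinfo=timezone.utc))
print(url)
```

The resulting URL can then be fetched with any HTTP client, for example `urllib.request.urlopen`.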
After receiving the response, we store the events as timeseries values for each country in a CSV file.
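The per-country storage step might look roughly like the sketch below. It assumes the response is GeoJSON with `id`, `properties.place`, `properties.time` (milliseconds since the Unix epoch) and `properties.mag`, and it assumes the country can be read from the text after the last comma of `place`. The function names and the sample event are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone

FIELDS = ["original_id", "timestamp", "magnitude"]


def group_by_country(features):
    """Group GeoJSON features into per-country lists of flat rows."""
    rows = defaultdict(list)
    for feature in features:
        props = feature["properties"]
        place = props.get("place") or ""
        # Assumption: the text after the last comma names the country or region.
        # A place without a comma is used whole.
        country = place.rsplit(",", 1)[-1].strip() or "Unknown"
        # Event time is given in milliseconds since the Unix epoch.
        ts = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
        rows[country].append(
            {"original_id": feature["id"], "timestamp": ts.isoformat(), "magnitude": props.get("mag")}
        )
    return rows


def to_csv(rows):
    """Serialize a list of rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up event, for illustration only.
sample = [
    {"id": "example1", "properties": {"place": "10 km NE of Sampletown, Japan", "time": 1651104000000, "mag": 4.5}},
]
rows_by_country = group_by_country(sample)
csv_text = to_csv(rows_by_country["Japan"])
```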
Cleaning:
Duplicate and additional columns are removed from the data. Location names are rectified and country names are formatted correctly.
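A rough illustration of this kind of cleanup follows. It assumes place strings of the form "10 km NE of City, Country" and splits them into a distance and a location, and it drops columns that are not on an allow list. The pattern, the function names, and the sample values are assumptions, not the pipeline's actual rules.

```python
import re

# Assumed pattern: "<distance> km <compass direction> of <place>".
PLACE_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)\s*km\s+([NSEW]{1,3})\s+of\s+(.+)$")


def split_place(place):
    """Return (distance_km, location) parsed from a place string."""
    text = place.strip()
    match = PLACE_PATTERN.match(text)
    if match is None:
        # No leading distance: keep the whole text as the location.
        return None, text
    return float(match.group(1)), match.group(3).strip()


def drop_extra_columns(row, keep):
    """Keep only the columns named in `keep`, in that order."""
    return {name: row[name] for name in keep if name in row}


distance_km, location = split_place("10 km NE of Sampletown, Japan")
cleaned = drop_extra_columns({"a": 1, "b": 2, "c": 3}, ["c", "a"])
```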
Geocoder:
Coordinates are added to the metadata for each country, and the region and region code are appended. The Geocoder library is used to obtain coordinates. We also keep a separate JSON file of country coordinates, which avoids calls to the third-party library and makes geocoding faster.
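The local-file-first lookup could be sketched as below: coordinates are read from the data loaded from the JSON file, and the third-party geocoder is called only on a miss. The `remote_lookup` callable stands in for the geocoder library call, and the country names and coordinates are placeholders.

```python
import json


def make_geocoder(local_coords, remote_lookup):
    """Build a lookup that prefers local coordinates over the remote geocoder."""

    def geocode(country):
        if country in local_coords:
            return tuple(local_coords[country])
        # Fall back to the third-party geocoder only when the local file has no entry.
        coords = remote_lookup(country)
        if coords is not None:
            # Remember the answer so the next lookup stays local.
            local_coords[country] = list(coords)
        return coords

    return geocode


# Placeholder content standing in for the JSON file of country coordinates.
local = json.loads('{"Exampleland": [1.0, 2.0]}')
remote_calls = []


def fake_remote(country):
    """Stub for the third-party geocoder; records each call."""
    remote_calls.append(country)
    return (3.0, 4.0)


geocode = make_geocoder(local, fake_remote)
first = geocode("Exampleland")   # served from the local data
second = geocode("Otherland")    # falls back to the stub
third = geocode("Otherland")     # now served from the local data
```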
Standardization:
Additional information such as sample frequency, units, source and description is included in the metadata. The standardization step also contains a function that fetches the ISO country code and appends it. Predefined domain and subdomain are added in this step. We create a single metadata file that includes all the countries and their respective keywords.
MetaData:
The timeseries reference id (ts_ref_id) is added to the timeseries data, and the final timeseries is stored in the S3 bucket. The metadata format is finalized, and the metadata is also stored in the S3 bucket.
Ingest:
Metadata and timeseries data are ingested into MongoDB, and the latest timestamp id (the MongoDB id of the latest timestamp) is appended to the metadata so that the latest data point can be found without a search.
Metadata
GeoJson Data:
o_id: MongoDB unique document id
otype: “Features” (geojson standard attribute)
Attributes | Descriptions |
---|---|
url | url for the event for more details |
source | source of the dataset |
region_name | region for a country according to World Bank Standards. |
region_code | region code for a region according to World Bank Standards. |
country | country name of the data |
country_code | ISO 3-letter country code |
description | url for more detail of the event |
location | location of the place |
distance_from_city_km | distance of origin from the city |
original_id | original_id of the event from the Source |
magnitude | magnitude of the earthquake |
mag_type | type of magnitude |
depth | depth of origin |
types | Network that originally authored the reported magnitude for this event |
units | unit of the earthquake measure |
rms | The root-mean-square (RMS) travel time residual, in seconds |
dmin | Horizontal distance from the epicenter to the nearest station |
felt | The total number of felt reports submitted to the DYFI system |
other_ids | other id associated to this event |
usgs_source | source |
sample_frequency | frequency of the data |
timestamp | date and time of the event occurrence |
updated | date and time of the event update |
gap | The largest azimuthal gap between azimuthally adjacent stations |
significance | |
net | The ID of a data contributor. Identifies the network considered to be the preferred source of information for this event |
nst | Number of seismic stations which reported P- and S-arrival times |
timezone | Timezone for the time and date |
domain | Predefined domain by Taiyo. |
subdomain | Predefined subdomain by Taiyo. |
time_of_sampling | time of data collection |
date_of_sampling | date of data collection |
community_determined_intensity | |
modified_mercalli_intensity | |
alert | alert sent for the event |
status | Status is either automatic or reviewed |
tsunami | whether the event resulted in a tsunami (0 for no, 1 for yes) |
community_determined_intensity_description | description of community_determined_intensity |
modified_mercalli_intensity_description | description of modified_mercalli_intensity |
dmin_description | description of dmin |
felt_description | description of felt |
rms_description | description of rms |
nst_description | description of nst |
net_description | description of net |
significance_description | description of significance |
type_description | description of type |
status_description | description of status |
gap_description | description of gap |
geometry: | |
type | “Point” |
coordinates | Latitude and Longitude of the event |
Data Flow
The data pipeline described above runs on Argo and is executed periodically.
DAGs:
- USGS-Earthquake: Total number of DAG files is 1
Taiyō Data Format
Entity | USGS-Earthquake |
---|---|
Frequency | Daily/ Event Based |
Updated On | 29-04-2022 UTC 01:27:19 PM |
Coverage | Around the world |
Uncertainties | For some keywords, the older data might not be available. |
Scope for Improvement
Every time the Argo Workflow runs, we overwrite the existing data in the S3 bucket. In the future we might want to improve it to scrape only the data that we don’t already have.
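One possible shape for that improvement is sketched below: derive the next “updatedafter” value from the newest `updated` timestamp already stored, then merge newly fetched events into the stored set by id instead of overwriting everything. It assumes GeoJSON features whose `properties.updated` is in milliseconds since the Unix epoch. The function names and sample events are hypothetical.

```python
from datetime import datetime, timezone


def next_updated_after(stored_features, default):
    """Return the newest stored update time, or `default` if nothing is stored."""
    updated = [f["properties"]["updated"] for f in stored_features]
    if not updated:
        return default
    return datetime.fromtimestamp(max(updated) / 1000, tz=timezone.utc)


def merge_features(stored, fetched):
    """Merge fetched features into stored ones; a fetched feature replaces one with the same id."""
    by_id = {f["id"]: f for f in stored}
    for feature in fetched:
        by_id[feature["id"]] = feature
    return list(by_id.values())


# Made-up events, for illustration only.
stored = [{"id": "a", "properties": {"updated": 1651104000000}}]
fetched = [
    {"id": "a", "properties": {"updated": 1651104060000}},
    {"id": "b", "properties": {"updated": 1651104030000}},
]
watermark = next_updated_after(stored, default=datetime(2022, 1, 1, tzinfo=timezone.utc))
merged = merge_features(stored, fetched)
```

With this approach only the merged result needs to be written back, and each run requests only events updated since the stored watermark.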
Useful Links
- https://earthquake.usgs.gov/fdsnws/event/1/
- https://www.usgs.gov/programs/earthquake-hazards