USGS - Water
Introduction
The United States Geological Survey (USGS) has collected water-resources data at approximately 1.5 million sites in all 50 States, the District of Columbia, Puerto Rico, the Virgin Islands, Guam, American Samoa, and the Commonwealth of the Northern Mariana Islands. The USGS investigates the occurrence, quantity, quality, distribution, and movement of surface and underground waters and disseminates the data to the public, State and local governments, public and private utilities, and other Federal agencies involved with managing our water resources. Water-quality data are available for both surface water and groundwater. Examples of water-quality data collected are temperature, specific conductance, pH, nutrients, pesticides, and volatile organic compounds.
Source: USGS Water Data
Tags: Climate and Environment, Water, Time-series, Risk, Daily
Modules
Scraping:
Below is the API endpoint, with the parameters that must be passed to retrieve the data.
https://waterservices.usgs.gov/nwis/dv/?format=json&stateCd={state}&startDT={start_date}&endDT={end_date}&siteStatus=all
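As a minimal sketch of how the placeholders in the endpoint above could be filled in, the snippet below builds the request URL with the standard library. The function name `build_dv_url` and the example state and dates are assumptions made for illustration; the actual scraper may construct the URL differently.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://waterservices.usgs.gov/nwis/dv/"


def build_dv_url(state: str, start_date: date, end_date: date) -> str:
    """Build the daily-values request URL for one state and date range.

    Hypothetical helper: it only mirrors the parameters shown in the
    endpoint template (format, stateCd, startDT, endDT, siteStatus).
    """
    params = {
        "format": "json",
        "stateCd": state,
        "startDT": start_date.isoformat(),  # YYYY-MM-DD
        "endDT": end_date.isoformat(),
        "siteStatus": "all",
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Example values chosen for illustration only.
url = build_dv_url("ca", date(2022, 1, 1), date(2022, 1, 7))
print(url)
```

The resulting string can be passed to any HTTP client. Requesting one state and a bounded date range per call keeps each response to a manageable size.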
Geocoder:
Coordinates for the country are added to the metadata, along with the region and region code. The Geocoder library is used to obtain coordinates. A separate JSON file of country coordinates is also maintained so that the third-party library does not need to be called, which makes the geocoding process faster and more efficient.
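The local-file-first lookup described above might look roughly like the sketch below. The JSON layout (`lat`/`lon` keys), the coordinate values, and the `get_coordinates` function are all assumptions for illustration; the fallback argument stands in for whatever third-party geocoding call the pipeline uses.

```python
import json
from typing import Callable, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)

# Stand-in for the contents of the local JSON file; values are illustrative.
LOCAL_JSON = '{"United States": {"lat": 39.8, "lon": -98.6}}'


def get_coordinates(
    country: str,
    local_coords: dict,
    fallback: Optional[Callable[[str], Optional[Coordinates]]] = None,
) -> Optional[Coordinates]:
    """Return coordinates from the local table; use the fallback only on a miss."""
    entry = local_coords.get(country)
    if entry is not None:
        return (entry["lat"], entry["lon"])
    if fallback is not None:
        # Hypothetical hook for the third-party geocoder.
        return fallback(country)
    return None


local_coords = json.loads(LOCAL_JSON)
print(get_coordinates("United States", local_coords))
```

Checking the local table first means the external service is contacted only for names that are missing from the file.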
Standardization:
Additional information such as sample frequency, units, source, and description is included in the metadata. The standardization step also contains a function that fetches the ISO country code and appends it. The predefined domain and subdomain are added in this step.
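A rough sketch of this step is shown below. The `standardize` function and the one-entry `ISO3_CODES` table are hypothetical, and the domain and subdomain are passed in by the caller because their predefined values are not listed here.

```python
# Illustrative subset of a country-name to ISO 3-letter code table.
ISO3_CODES = {"United States": "USA"}


def standardize(metadata: dict, domain: str, sub_domain: str) -> dict:
    """Return a copy of the metadata with standard fields appended."""
    out = dict(metadata)  # leave the caller's dict untouched
    out["country_code"] = ISO3_CODES.get(out.get("country"))
    out["domain"] = domain
    out["sub_domain"] = sub_domain
    # This data product is sampled daily.
    out.setdefault("sample_frequency", "Daily")
    return out


record = standardize(
    {"country": "United States", "name": "example series"},
    domain="<predefined domain>",
    sub_domain="<predefined subdomain>",
)
print(record["country_code"], record["sample_frequency"])
```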
Cleaning:
Duplicate and additional columns are removed from the data. Location names are rectified and country names are formatted correctly.
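One way the column clean-up could work is sketched below in plain Python, assuming the data is held as a header list plus row lists. The `drop_columns` helper and the sample columns are hypothetical; a DataFrame-based pipeline would do the same thing with its own API.

```python
from typing import List, Set, Tuple


def drop_columns(
    header: List[str], rows: List[list], keep: Set[str]
) -> Tuple[List[str], List[list]]:
    """Keep the first occurrence of each wanted column; drop duplicates and extras."""
    seen = set()
    indices = []
    for i, name in enumerate(header):
        if name in keep and name not in seen:
            seen.add(name)
            indices.append(i)
    new_header = [header[i] for i in indices]
    new_rows = [[row[i] for i in indices] for row in rows]
    return new_header, new_rows


# Illustrative input: "value" appears twice and "notes" is not wanted.
header = ["site_no", "value", "value", "notes"]
rows = [["01", "1.5", "1.5", "x"]]
clean_header, clean_rows = drop_columns(header, rows, keep={"site_no", "value"})
print(clean_header, clean_rows)
```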
Metadata
Metadata Attributes
Attributes | Descriptions |
---|---|
timestamp | standard timestamp used for the timeseries |
map_coordinates | Latitude and Longitude of the station location (GeoJSON format) |
country | The country where the water data was collected. |
country_code | ISO 3-letter country code |
domain | Predefined domain by Taiyo. |
name | name of the data |
region | region for a country according to World Bank Standards. |
region_code | region code for a region according to World Bank Standards. |
sample_frequency | frequency in which data gets updated on the source |
sub_domain | Predefined subdomain by Taiyo. |
time_of_sampling | time of data collection |
date_of_sampling | date of data collection |
timezone | Timezone for the time and date |
units | Type of value stored in timeseries |
measurement_type | type of measure (min, max, median) |
url | URL for each of the datasets. |
agency_code | |
site_no | unique site number |
site_name | name of the site |
variable_description | description of the timeseries value |
description | description of the dataset |
variable_name | name of the variable for the timeseries |
hydrogic_unit_code | |
county_code | county FIPS code for the United States |
site_type_code | type of site code |
sub_division_code | ISO 3166-2 code of the subdivision |
value | value of timeseries |
qualifiers | USGS qualification codes: (e) Value has been edited or estimated by USGS personnel and is write protected; (&) Value was computed from affected unit values; (E) Value was computed from estimated unit values; (A) Approved for publication -- Processing and review completed; (P) Provisional data subject to revision; (<) The value is known to be less than reported value and is write protected; (>) The value is known to be greater than reported value and is write protected; (1) Value is write protected without any remark code to be printed; (2) Remark is write protected without any remark code to be printed; (blank) No remark |
income_level | Economic income group to which the country belongs. |
sub_division_name | Name of the subdivision. |
sub_division_level | Level of the subdivision (e.g. State, Union Territory, Province, Economic Region etc.) |
county | County/District Name |
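The qualifier codes listed in the table can be turned into readable text with a small lookup. The sketch below is only an illustration: the `describe_qualifiers` function is hypothetical, the descriptions are copied from the `qualifiers` row, and the input is assumed to be a list of code strings.

```python
from typing import List

# Descriptions copied from the qualifiers row of the metadata table.
QUALIFIERS = {
    "e": "Value has been edited or estimated by USGS personnel and is write protected",
    "&": "Value was computed from affected unit values",
    "E": "Value was computed from estimated unit values",
    "A": "Approved for publication -- Processing and review completed",
    "P": "Provisional data subject to revision",
    "<": "The value is known to be less than reported value and is write protected",
    ">": "The value is known to be greater than reported value and is write protected",
    "1": "Value is write protected without any remark code to be printed",
    "2": "Remark is write protected without any remark code to be printed",
    "": "No remark",
}


def describe_qualifiers(codes: List[str]) -> List[str]:
    """Map each qualifier code to its description; codes are case sensitive."""
    return [QUALIFIERS.get(code, f"Unknown qualifier: {code}") for code in codes]


print(describe_qualifiers(["P", "e"]))
```

Because `e` and `E` carry different meanings, the lookup must not normalize case.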
Data Flow
The data pipeline described above runs on Argo and is executed periodically.
DAGs:
- USGS-Water: the total number of DAG files is 1
Taiyo Data Format
Entity | USGS-Water |
---|---|
Frequency | Daily |
Updated On | 01-06-2022 UTC 12:14:16 PM |
- | - |
Coverage | All states in the USA |
Uncertainties | - |
Scope for Improvement
The following can be improved in the next version of the data product:
- In the future, the scraper could be changed to fetch only the data that has not already been collected.
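The incremental scraping idea in the last item could be sketched as follows, assuming the pipeline can look up the most recent date it has already stored. The `next_window` function and its arguments are hypothetical and not part of the current pipeline.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def next_window(
    last_stored: Optional[date], today: date, default_start: date
) -> Optional[Tuple[date, date]]:
    """Return the (start, end) date range still to be fetched, or None if up to date."""
    if last_stored is None:
        start = default_start  # nothing stored yet: do a full backfill
    else:
        start = last_stored + timedelta(days=1)
    if start > today:
        return None
    return (start, today)


# Illustrative dates only.
print(next_window(date(2022, 5, 30), date(2022, 6, 1), date(2022, 1, 1)))
```

The returned pair maps directly onto the `startDT` and `endDT` parameters of the endpoint, so each run would request only the missing days.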
Useful Links
- https://waterdata.usgs.gov/nwis/rt
- https://help.waterdata.usgs.gov/codes-and-parameters/daily-value-qualification-code-dv_rmk_cd
- https://help.waterdata.usgs.gov/faq/about-the-usgs-water-data-for-the-nation-site