AISHub
Introduction
AIS (Automatic Identification System) tracking has been the most significant development in navigation safety for mariners since the introduction of radar. It was originally developed as a collision-avoidance tool that lets commercial vessels ‘see’ each other more clearly in any conditions and gives the helmsman better information about the surrounding environment. AIS does this by continuously transmitting each vessel’s position, identity, speed and course, along with other relevant information, to all other AIS-equipped vessels within range. Combined with a shore station, the system also gives port authorities and maritime safety bodies the ability to manage maritime traffic and reduce the hazards of marine navigation.
The AISHub website lists stations together with the vessels associated with each station. This data product takes its information from the individual station pages: each station has a yearly data chart, which supplies the required timeseries, and the metadata is obtained from the same station page.
Source: AISHub
Tags: Supply Chain, Logistics, Shipping traffic, Time-series, Risk, Daily
Modules
Scraping:
The Aishub scraper collects the station data along with the URL of each station. This data is used as metadata for the timeseries data that is collected later, and the collected URLs are used in the prepare step to fetch the timeseries data. Data is scraped using BeautifulSoup and the requests library.
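A minimal sketch of the scraping idea, assuming the station list is an HTML table in which each station row contains a link to its station page. The markup in SAMPLE_HTML, the function name parse_stations and the record keys are hypothetical and only illustrate the approach; the real page structure may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.aishub.net/"  # taken from the Useful Links section


def parse_stations(html: str) -> list[dict]:
    """Return one record (name, url) per table row that links to a station."""
    soup = BeautifulSoup(html, "html.parser")
    stations = []
    for row in soup.find_all("tr"):
        link = row.find("a", href=True)
        if link is None:  # header rows and rows without a link are skipped
            continue
        stations.append(
            {
                "name": link.get_text(strip=True),
                # Resolve relative links against the site root.
                "url": urljoin(BASE_URL, link["href"]),
            }
        )
    return stations


# Hypothetical markup, for illustration only (not copied from the site).
SAMPLE_HTML = """
<table>
  <tr><th>Station</th></tr>
  <tr><td><a href="/stations/1">Example Station A</a></td></tr>
  <tr><td><a href="/stations/2">Example Station B</a></td></tr>
</table>
"""

stations = parse_stations(SAMPLE_HTML)
```

In a live run the HTML would come from a call such as requests.get("https://www.aishub.net/stations").text instead of the sample string.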
Cleaning:
Duplicate stations and additional columns ["Null", "contributor", "ships", "distinct"] are removed from the data. Location names are rectified and country names are formatted correctly.
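The cleaning step could look roughly like the sketch below. The dropped column names come from the description above; using original_id as the de-duplication key, the function name clean_stations and the sample rows are assumptions made for illustration.

```python
DROP_COLUMNS = {"Null", "contributor", "ships", "distinct"}  # named in the text


def clean_stations(rows: list[dict]) -> list[dict]:
    """Drop the unwanted columns and keep the first row seen per station."""
    seen = set()
    cleaned = []
    for row in rows:
        slim = {key: value for key, value in row.items() if key not in DROP_COLUMNS}
        station_id = slim.get("original_id")  # assumed de-duplication key
        if station_id in seen:
            continue
        seen.add(station_id)
        cleaned.append(slim)
    return cleaned


# Hypothetical rows, for illustration only.
rows = [
    {"original_id": "1", "name": "Example A", "Null": None, "contributor": "x", "ships": 5, "distinct": 3},
    {"original_id": "1", "name": "Example A", "Null": None, "contributor": "x", "ships": 5, "distinct": 3},
    {"original_id": "2", "name": "Example B", "Null": None, "contributor": "y", "ships": 9, "distinct": 4},
]

cleaned = clean_stations(rows)
```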
Prepare:
Yearly data is fetched in the prepare step and converted into a timeseries that uses the required timestamp format ("%Y-%m-%dT%H:%M:%S%z"). Indicators are added to the metadata.
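The timestamp format above can be produced and parsed with the standard datetime module. The sketch below assumes that timestamps without timezone information are in UTC; the function name format_timestamp is hypothetical.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"  # format quoted in the Prepare step


def format_timestamp(moment: datetime) -> str:
    """Render a datetime in the pipeline's timestamp format."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are treated as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.strftime(TIMESTAMP_FORMAT)


stamp = format_timestamp(datetime(2022, 4, 29, 11, 50, 0))
# stamp == "2022-04-29T11:50:00+0000"

# The same format string parses the text back into an aware datetime.
parsed = datetime.strptime(stamp, TIMESTAMP_FORMAT)
```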
Geocoder:
Coordinates for the station location are added to the metadata. Region and region code are also appended. The geocoder library is used to obtain the coordinates.
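Since the metadata stores coordinates in GeoJSON format, a small helper like the one below could convert a latitude/longitude pair into a GeoJSON Point. GeoJSON puts longitude first, whereas geocoding results are usually reported latitude first, so the order is swapped. The function name to_geojson_point and the sample values are hypothetical, and whether the pipeline stores a Point object in exactly this shape is an assumption.

```python
def to_geojson_point(latitude: float, longitude: float) -> dict:
    """Build a GeoJSON Point; GeoJSON orders coordinates as [longitude, latitude]."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    return {"type": "Point", "coordinates": [longitude, latitude]}


# Hypothetical coordinates, for illustration only.
point = to_geojson_point(latitude=42.5, longitude=27.46)
```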
Standardization:
Additional information such as sample frequency, units, source and description is included in the metadata. This step also contains the function that fetches the ISO country code and appends it to the metadata. The predefined domain and subdomain are added in this step.
MetaData:
The timeseries reference id (ts_ref_id) is added to the timeseries data, and the final timeseries is stored in the S3 bucket. The metadata format is finalized and the metadata is also stored in the S3 bucket.
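The role of ts_ref_id can be sketched as follows: every timeseries record carries the same id as its metadata document, so the two can be joined later. How the pipeline generates the id is not described here, so the sketch takes it as an input; the function name attach_ts_ref_id and the sample values are hypothetical.

```python
def attach_ts_ref_id(ts_ref_id: str, points: list[tuple[str, float]]) -> list[dict]:
    """Turn (timestamp, value) pairs into records keyed by ts_ref_id."""
    return [
        {"ts_ref_id": ts_ref_id, "timestamp": timestamp, "value": value}
        for timestamp, value in points
    ]


# Hypothetical id and data points, for illustration only.
records = attach_ts_ref_id(
    "example_ts_ref_id",
    [("2022-04-28T00:00:00+0000", 12.0), ("2022-04-29T00:00:00+0000", 15.0)],
)
```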
Ingest:
Metadata and timeseries data are ingested into MongoDB, and the latest timestamp id (the MongoDB id of the latest timestamp) is appended to the metadata to shorten the search for the latest data point.
Location Risk:
The LocationRisk module contains three files. Model.py validates the data in MongoDB, and risk_model.py calculates the risk score for the data. Risk for Aishub is calculated using the z-score, which is then classified into risk categories. Pipeline.py fetches data from MongoDB and applies model.py and risk_model.py. The resulting risk data is ingested into the location risk database.
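A minimal sketch of z-score based risk classification is given below. The thresholds (1 and 2), the category names, the use of the absolute z-score and the population standard deviation are all assumptions for illustration; the actual rules in risk_model.py are not documented here.

```python
from statistics import mean, pstdev


def z_scores(values: list[float]) -> list[float]:
    """Standardize values as (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no spread, so every point scores 0.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


def classify(z: float) -> str:
    """Map a z-score to a risk category (assumed thresholds)."""
    magnitude = abs(z)
    if magnitude < 1:
        return "low"
    if magnitude < 2:
        return "medium"
    return "high"


# Hypothetical daily ship counts, for illustration only.
counts = [10, 10, 10, 10, 30]
scores = z_scores(counts)            # [-0.5, -0.5, -0.5, -0.5, 2.0]
risks = [classify(z) for z in scores]
```

With these sample counts the mean is 14 and the standard deviation is 8, so the last value stands two standard deviations above the mean and falls into the highest category.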
Metadata
Timeseries Data
Attributes | Descriptions |
---|---|
ts_ref_id | Id used to connect timeseries data to the metadata |
value | Timeseries information stored for Aishub (all ships and distinct ships) |
timestamp | standard timestamp used for the timeseries |
MetaData Attributes
Attributes | Descriptions |
---|---|
ts_ref_id | Id used to connect metadata to the timeseries |
coordinates | Latitude and Longitude of the station location (geojson format) |
country | country in which the Aishub station is located |
country_code | ISO 3-letter country code |
date_of_sampling | date on which data is collected |
description | detail on Aishub data which has been collected |
domain | Predefined domain by Taiyo |
Indicator | Two types of indicators are stored for Aishub |
all_ships | all the ships that are at the station |
unique_ships | all the unique type of ships at the station |
location | location of the station |
name | Name of the Station |
original_id | station id assigned by the Aishub |
region | region for a country according to World Bank Standards |
region_code | region code for a region according to World Bank Standards |
sample_frequency | frequency in which data gets updated on the source |
source | website from which the data is taken |
state | state in which station is located |
status | latest status of the station |
sub_domain | Predefined subdomain by Taiyo |
time_of_sampling | time of data collection |
timezone | Timezone for the time and date |
units | Type of value stored in timeseries |
uptime | online % of the station |
url | URL of each station on Aishub |
latest_timestamp_id | mongoDB id for latest timestamp in the timeseries |
Data Flow
The data pipeline above runs on Argo and is executed periodically.
Taiyō Data Format
Entity | Aishub Stations |
---|---|
Frequency | Daily |
Updated On | 29-04-2022 UTC 11:50:00 AM |
Coverage | 700+ stations spread over 50+ countries |
Uncertainties | The calculated risk is not absolute; it depends on the data collected and on the standard formula used by Taiyo. It can be subjective depending on the business use case. |
Scope of Improvement
The following can be improved in the next version of the data product:
- Bulk data ingestion lacks the logic to include previous data in the current dataset, so the MetaData and Ingest steps need to be updated for the Aishub data product.
- Data for individual ships can be collected and stored in the database, which would allow better visualization of the Aishub data.
Useful Links
- https://www.aishub.net/stations
- https://www.aishub.net/