AISHub

Introduction

The Automatic Identification System (AIS) is widely regarded as the most significant development in navigation safety since the introduction of radar. It was originally developed as a collision-avoidance tool that lets commercial vessels "see" each other more clearly in any conditions and gives the helmsman better information about the surrounding environment. AIS does this by continuously transmitting each vessel's position, identity, speed and course, along with other relevant information, to all other AIS-equipped vessels within range. Combined with a shore station, the system also lets port authorities and maritime safety bodies manage maritime traffic and reduce the hazards of marine navigation.

The AISHub website publishes data on stations as well as the vessels associated with each station. This data product is built from the station pages: each station page has a yearly data chart, which provides the required timeseries, and the metadata is taken from the same page.

Source: AISHub

Tags: Supply Chain, Logistics, Shipping traffic, Time-series, Risk, Daily

Modules

Scraping:

The AISHub scraper collects the station data along with the URL of each station page. This data is used as metadata for the timeseries that is collected later, and the collected URLs are used in the prepare step to fetch the timeseries data. Data is scraped using BeautifulSoup and the requests library.
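As a rough illustration of this step, the sketch below pulls station rows and their URLs out of an HTML table. The markup, the three-column layout, and the names `StationTableParser` and `parse_stations` are assumptions made for the example, not AISHub's actual page structure. It also uses Python's built-in `html.parser` in place of BeautifulSoup, purely so that it runs without extra dependencies or network access.

```python
from html.parser import HTMLParser

# Made-up markup standing in for a station listing; the real page differs.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/stations/101">101</a></td><td>Example Port A</td><td>Greece</td></tr>
  <tr><td><a href="/stations/202">202</a></td><td>Example Port B</td><td>Netherlands</td></tr>
</table>
"""


class StationTableParser(HTMLParser):
    """Collect one record per table row: three cell texts plus the row's first link."""

    def __init__(self):
        super().__init__()
        self.stations = []
        self._cells = None      # cell texts of the row currently being read
        self._url = None        # first href seen in the current row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._url = [], None
        elif tag == "td" and self._cells is not None:
            self._in_cell = True
            self._cells.append("")
        elif tag == "a" and self._in_cell and self._url is None:
            self._url = dict(attrs).get("href")

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            # Only keep rows that match the assumed id / name / country layout.
            if len(self._cells) == 3:
                original_id, name, country = self._cells
                self.stations.append({
                    "original_id": original_id,
                    "name": name,
                    "country": country,
                    "url": self._url,
                })
            self._cells = None


def parse_stations(html):
    """Return the list of station records found in the given HTML string."""
    parser = StationTableParser()
    parser.feed(html)
    parser.close()
    return parser.stations
```

Calling `parse_stations(SAMPLE_HTML)` yields one dictionary per station, with the `url` field ready to be passed on to the prepare step.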

Cleaning:

Duplicate stations and the extra columns ["Null", "contributor", "ships", "distinct"] are removed from the data. Location names are corrected and country names are brought into a consistent format.
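The cleaning rules can be sketched roughly as follows. The column names to drop come from the description above; everything else is an assumption for illustration: the function name `clean_stations`, the use of `original_id` as the de-duplication key, and the whitespace and title-case normalization of names.

```python
DROP_COLUMNS = {"Null", "contributor", "ships", "distinct"}


def clean_stations(rows):
    """Drop unwanted columns, tidy names, and keep the first row per station."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {key: value for key, value in row.items() if key not in DROP_COLUMNS}
        # Assumed normalization: collapse repeated whitespace and title-case the names.
        for field in ("location", "country"):
            if field in record:
                record[field] = " ".join(str(record[field]).split()).title()
        # Assumed de-duplication key: the station id assigned by the source.
        station_id = record.get("original_id")
        if station_id in seen:
            continue
        seen.add(station_id)
        cleaned.append(record)
    return cleaned
```

A production pipeline would more likely do this with a dataframe library; plain dictionaries are used here only to keep the sketch dependency-free.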

Prepare:

The yearly data is fetched in the prepare step and converted into a timeseries using the required timestamp format ("%Y-%m-%dT%H:%M:%S%z"). Indicators are added to the metadata.
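To make the timestamp format concrete, here is a minimal sketch of turning daily chart points into timeseries records. The format string is the one stated above; the input shape (pairs of a `YYYY-MM-DD` date string and a value), the choice of UTC, and the name `to_timeseries` are assumptions for the example.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def to_timeseries(points):
    """Convert (date_string, value) pairs into timestamp/value records."""
    series = []
    for day, value in points:
        # Assumption: source dates carry no time zone, so they are treated as UTC midnight.
        moment = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        series.append({"timestamp": moment.strftime(TIMESTAMP_FORMAT), "value": value})
    return series
```

With a UTC-aware datetime, the `%z` directive renders the offset as `+0000`, so a point dated 2022-04-29 becomes `2022-04-29T00:00:00+0000`.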

Geocoder:

Coordinates for the station location are added to the metadata, and the region and region code are appended. The geocoder library is used to obtain the coordinates.

Standardization:

Additional information such as sample frequency, units, source and description is added to the metadata. This step also contains the function that fetches the ISO country code and appends it. The predefined domain and subdomain are added here as well.
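A minimal sketch of this enrichment is shown below. The lookup table is a tiny hand-written subset of ISO 3166-1 alpha-3 codes used only for illustration, and the function name `standardize` and its parameters are hypothetical; the actual step may rely on a dedicated library for the country codes. The domain and subdomain values are passed in by the caller because they are not stated here.

```python
# Illustrative subset of ISO 3166-1 alpha-3 codes; a real lookup would cover all countries.
ISO3_BY_COUNTRY = {
    "Germany": "DEU",
    "Greece": "GRC",
    "Netherlands": "NLD",
    "United States": "USA",
}


def standardize(metadata, *, domain, sub_domain):
    """Return a copy of the metadata with the standard descriptive fields added."""
    enriched = dict(metadata)
    enriched.update({
        # None signals a country that the lookup table does not know.
        "country_code": ISO3_BY_COUNTRY.get(metadata.get("country")),
        "sample_frequency": "Daily",
        "source": "AISHub",
        "domain": domain,
        "sub_domain": sub_domain,
    })
    return enriched
```

Returning a copy rather than mutating the input keeps the raw scraped record available for debugging.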

MetaData:

The timeseries reference id (ts_ref_id) is added to the timeseries data, and the final timeseries is stored in the S3 bucket. The metadata format is finalized and the metadata is also stored in the S3 bucket.

Ingest:

Metadata and timeseries data are ingested into MongoDB. The latest timestamp id (the MongoDB id of the most recent data point) is appended to the metadata so that the latest data point can be found without searching the whole timeseries.
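The idea behind the latest timestamp id can be sketched without a database. In the example below a plain list of dictionaries stands in for the timeseries collection, and the `_id` values, the function name `attach_latest_timestamp_id`, and the document shape are assumptions; the real step works against MongoDB.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def attach_latest_timestamp_id(metadata, timeseries_docs):
    """Return a copy of the metadata carrying the id of its newest data point."""
    matching = [doc for doc in timeseries_docs if doc["ts_ref_id"] == metadata["ts_ref_id"]]
    if not matching:
        return dict(metadata, latest_timestamp_id=None)
    # Parse the timestamps so the comparison is chronological, not alphabetical.
    latest = max(matching, key=lambda doc: datetime.strptime(doc["timestamp"], TIMESTAMP_FORMAT))
    return dict(metadata, latest_timestamp_id=latest["_id"])
```

Storing this id on the metadata document means a reader can fetch the most recent value with a single lookup by id instead of sorting the timeseries on every request.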

Location Risk:

Three files are available in LocationRisk: model.py validates the data in MongoDB, risk_model.py calculates the risk score, and pipeline.py fetches the data from MongoDB and applies model.py and risk_model.py. Risk for AISHub is calculated using the z-score, which is then classified into risk categories. The resulting risk data is ingested into the location risk database.
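The z-score approach can be illustrated with a short sketch. The z-score itself is the standard formula z = (x - mean) / standard deviation; the category names, the thresholds of 1 and 2, the use of the absolute value, and the choice of the population standard deviation are all assumptions for the example, since the actual cut-offs used in risk_model.py are not stated here.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value relative to the whole series."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no spread, so no point deviates from the mean.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


def classify(z, low=1.0, high=2.0):
    """Map a z-score to a risk category using assumed thresholds."""
    magnitude = abs(z)
    if magnitude < low:
        return "low"
    if magnitude < high:
        return "medium"
    return "high"
```

For example, `[classify(z) for z in z_scores(ship_counts)]` labels each day in a hypothetical `ship_counts` series by how unusual its traffic is compared with the rest of the series.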

Metadata

Timeseries Data

| Attributes | Descriptions |
| --- | --- |
| ts_ref_id | Id used to connect timeseries data to the metadata |
| value | Timeseries information stored for AISHub (all ships and distinct ships) |
| timestamp | Standard timestamp used for the timeseries |

MetaData Attributes

| Attributes | Descriptions |
| --- | --- |
| ts_ref_id | Id used to connect metadata to the timeseries |
| coordinates | Latitude and longitude of the station location (GeoJSON format) |
| country | Country in which the AISHub station is located |
| country_code | ISO 3-letter country code |
| date_of_sampling | Date on which the data was collected |
| description | Details of the AISHub data that has been collected |
| domain | Domain predefined by Taiyo |
| Indicator | Two types of indicators are stored for AISHub: |
| all_ships | All the ships at the station |
| unique_ships | The distinct (unique) ships at the station |
| location | Location of the station |
| name | Name of the station |
| original_id | Station id assigned by AISHub |
| region | Region of the country according to World Bank standards |
| region_code | Region code of the region according to World Bank standards |
| sample_frequency | Frequency at which the data is updated at the source |
| source | Website from which the data is taken |
| state | State in which the station is located |
| status | Latest status of the station |
| sub_domain | Subdomain predefined by Taiyo |
| time_of_sampling | Time of data collection |
| timezone | Timezone of the sampling time and date |
| units | Type of value stored in the timeseries |
| uptime | Percentage of time the station is online |
| url | URL of the station's page on AISHub |
| latest_timestamp_id | MongoDB id of the latest timestamp in the timeseries |

Data Flow

The data pipeline described above runs on Argo and is executed periodically.

Taiyō Data Format

| Field | Value |
| --- | --- |
| Entity | AISHub Stations |
| Frequency | Daily |
| Updated On | 29-04-2022 UTC 11:50:00 AM |
| Coverage | 700+ stations spread over 50+ countries |
| Uncertainties | The calculated risk is not absolute; it depends on the data collected and on the standard formula used by Taiyo. It can be subjective depending on the business use case. |

Scope of Improvement

The following can be improved in the next version of the data product:

  • Bulk data ingestion lacks the logic for including previously collected data in the current dataset, so the MetaData and Ingest steps need to be updated for the AISHub data product.
  • Data for individual ships could be collected and stored in the database, which would allow better visualization of the AISHub data.