Datamesh

Welcome to the Taiyō Data Mesh. This repository contains all the resources and source code for building the data products that make up Taiyō's data mesh infrastructure.

Traditional data infrastructure built for different product use cases is usually bulky and takes a lot of effort to build and maintain. These older architectures are highly centralized and can impose considerable rigidity when trying to meet the demands of data stakeholders.
Data mesh is a newer approach to data infrastructure in which data serving and governance are decentralized around domain-specific data products. The approach scales well and lets data be generated and moved across the organization more smoothly.

Below are some references to help you better understand data mesh and data products, along with resources for building and submitting potential data products to our infrastructure.

The above diagram describes a Data Product, which consists of separate layers for collecting, processing, and serving data.
During the ingestion phase, a data product takes in polyglot input sources (data arriving in different formats and from different systems) and first lands them in an S3 bucket, from which they can be served directly or processed further if required.
Once the data has landed, it can be enriched with AWS Glue, AWS Lambda, and other ETL flows, and the enriched data is then written back (repopulated) for serving.
Through its serving stack, the data product exposes polyglot output that different apps can consume.
Amazon CloudWatch acts as the universal logging sink for all data products. IAM together with API Gateway controls which apps or users may access each data product's API.
A catalog describes the data that can be fetched through the API and exposes the knowledge graph that links this data to other data products.
All infrastructure for each part of a data product is managed as code with Terraform Cloud and AWS CloudFormation, which makes it easier to manage.
Data products can be chained together to generate new data products, with the output of one serving as an input to another. This keeps data serving distributed, in a pattern similar to a microservices architecture.
The stack relies on serverless offerings, so it is generally cost-efficient and scales with demand.
It can also be adapted to real-time processing, using Amazon Kinesis and AWS Glue to collect data.
Service discovery is implemented with ECS and API Gateway, so it is easy to connect to an existing data product as a polyglot input source and build new data products on top of it.
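To make the ingestion, enrichment, and serving layers more concrete, here is a minimal in-memory sketch in Python. It is not how this repository implements a data product. A plain dictionary stands in for the S3 bucket, an ordinary function stands in for a Glue job or Lambda, and the class and method names (`DataProduct`, `ingest`, `enrich`, `serve`) are hypothetical. JSON and CSV are used as two example formats for polyglot input and output.

```python
import csv
import io
import json
from typing import Callable


class DataProduct:
    """Hypothetical in-memory model of one data product.

    A dict stands in for the S3 bucket; nothing here calls AWS.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.bucket: dict[str, list[dict]] = {}  # stand-in for S3: key -> records

    def ingest(self, key: str, raw: str, fmt: str) -> None:
        # Polyglot input: accept JSON or CSV and normalise to a list of dicts.
        if fmt == "json":
            records = json.loads(raw)
        elif fmt == "csv":
            records = list(csv.DictReader(io.StringIO(raw)))
        else:
            raise ValueError(f"unsupported input format: {fmt}")
        self.bucket[key] = records

    def enrich(self, key: str, transform: Callable[[dict], dict]) -> str:
        # Stand-in for a Glue job or Lambda: transform every record and
        # write the result back ("repopulate") under a new key.
        out_key = f"enriched/{key}"
        self.bucket[out_key] = [transform(r) for r in self.bucket[key]]
        return out_key

    def serve(self, key: str, fmt: str) -> str:
        # Polyglot output: render the same records as JSON or CSV.
        records = self.bucket[key]
        if fmt == "json":
            return json.dumps(records)
        if fmt == "csv":
            out = io.StringIO()
            writer = csv.DictWriter(out, fieldnames=list(records[0]))
            writer.writeheader()
            writer.writerows(records)
            return out.getvalue()
        raise ValueError(f"unsupported output format: {fmt}")


if __name__ == "__main__":
    # Made-up sample readings, for illustration only.
    product = DataProduct("rainfall")
    product.ingest("raw/rain.csv", "site,mm\na,12\nb,80\n", fmt="csv")
    key = product.enrich("raw/rain.csv", lambda r: {**r, "heavy": int(r["mm"]) > 50})
    print(product.serve(key, fmt="json"))
```

Keeping every layer behind the same record shape (a list of dicts) is what lets input and output formats vary independently.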
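Access control through IAM and API Gateway usually comes down to granting a caller the `execute-api:Invoke` permission on specific API routes, assuming the API's methods use IAM (`AWS_IAM`) authorization. The sketch below builds such an identity policy as a Python dictionary. The function name `invoke_policy` is hypothetical, and the region, account ID, API ID, stage, and path in the usage lines are placeholders rather than values from this repository.

```python
import json


def invoke_policy(region: str, account_id: str, api_id: str,
                  stage: str, method: str, path: str) -> dict:
    """Return an IAM identity policy that allows invoking one API Gateway route.

    The action name and ARN layout follow IAM's conventions for API Gateway;
    every argument value is supplied by the caller.
    """
    resource = (
        f"arn:aws:execute-api:{region}:{account_id}:{api_id}"
        f"/{stage}/{method}/{path.lstrip('/')}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "execute-api:Invoke",
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values: 123456789012 is the example account ID used in AWS docs.
    policy = invoke_policy("eu-west-1", "123456789012", "a1b2c3d4e5",
                           "prod", "GET", "/products/rainfall")
    print(json.dumps(policy, indent=2))
```

Attaching a policy like this to an app's IAM role restricts that app to the routes of the data products it is meant to consume.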
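The catalog pairs two things: a description of what each data product's API returns, and a knowledge graph linking products to one another. Below is a minimal Python sketch of that idea, under the assumption that each catalog entry records its schema and the upstream products it is built from. `CatalogEntry`, `Catalog`, and the example product names are hypothetical and are not taken from this repository.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    endpoint: str                     # API path serving this product (hypothetical)
    schema: dict[str, str]            # column name -> type
    sources: list[str] = field(default_factory=list)  # upstream data products


class Catalog:
    def __init__(self) -> None:
        self.entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self.entries[entry.name] = entry

    def describe(self, name: str) -> dict[str, str]:
        # What data will this product's API return?
        return self.entries[name].schema

    def upstream(self, name: str) -> list[str]:
        # Breadth-first walk of the knowledge graph: every product this one
        # depends on, directly or transitively, in discovery order.
        seen: list[str] = []
        queue = deque(self.entries[name].sources)
        while queue:
            src = queue.popleft()
            if src in seen:
                continue
            seen.append(src)
            if src in self.entries:
                queue.extend(self.entries[src].sources)
        return seen

    def downstream(self, name: str) -> list[str]:
        # Direct consumers: products that list `name` as a source.
        return sorted(e.name for e in self.entries.values() if name in e.sources)


if __name__ == "__main__":
    cat = Catalog()
    cat.register(CatalogEntry("weather", "/weather", {"site": "str", "mm": "float"}))
    cat.register(CatalogEntry("sites", "/sites", {"site": "str", "region": "str"}))
    cat.register(CatalogEntry("flood-risk", "/flood-risk",
                              {"region": "str", "risk": "float"},
                              sources=["weather", "sites"]))
    print(cat.describe("flood-risk"))
    print(cat.upstream("flood-risk"))
    print(cat.downstream("weather"))
```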
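Chaining and service discovery fit together naturally: a new product looks up its inputs by name, combines them, and then publishes itself so that later products can build on it in turn. The sketch below models that flow in plain Python. `Registry` is a hypothetical stand-in for ECS service discovery and API Gateway routes, and `chain` is an illustrative helper. Neither calls AWS, and the sample data is made up.

```python
from typing import Callable

Records = list[dict]


class Registry:
    """Hypothetical registry mapping a data product name to a callable that
    returns that product's current output (stand-in for service discovery)."""

    def __init__(self) -> None:
        self._products: dict[str, Callable[[], Records]] = {}

    def publish(self, name: str, fetch: Callable[[], Records]) -> None:
        self._products[name] = fetch

    def resolve(self, name: str) -> Callable[[], Records]:
        if name not in self._products:
            raise LookupError(f"no data product registered as {name!r}")
        return self._products[name]


def chain(registry: Registry, name: str, inputs: list[str],
          combine: Callable[..., Records]) -> None:
    """Build a new data product from existing ones and publish it, so it can
    itself be used as an input to further products."""
    fetchers = [registry.resolve(i) for i in inputs]  # resolved once, at build time

    def fetch() -> Records:
        # Pull fresh output from each upstream product, then combine.
        return combine(*(f() for f in fetchers))

    registry.publish(name, fetch)


if __name__ == "__main__":
    registry = Registry()
    registry.publish("rainfall", lambda: [{"site": "a", "mm": 12}, {"site": "b", "mm": 80}])
    registry.publish("sites", lambda: [{"site": "a", "region": "north"},
                                       {"site": "b", "region": "south"}])

    def join_on_site(rain: Records, sites: Records) -> Records:
        region = {s["site"]: s["region"] for s in sites}
        return [{**r, "region": region[r["site"]]} for r in rain]

    chain(registry, "regional-rainfall", ["rainfall", "sites"], join_on_site)
    chain(registry, "flood-watch", ["regional-rainfall"],
          lambda rows: [r for r in rows if r["mm"] > 50])
    print(registry.resolve("flood-watch")())
    # [{'site': 'b', 'mm': 80, 'region': 'south'}]
```

As with microservices, each product depends only on the names of its inputs. Any upstream product can be rebuilt or moved without changing its consumers, provided it is published under the same name.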
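On the real-time path, records read from a stream such as Kinesis are often aggregated over fixed time windows before they are stored or served. The sketch below shows a tumbling-window average in plain Python. It does not call Kinesis. The record layout `(epoch_seconds, key, value)` and the function name `tumbling_windows` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_windows(records: Iterable[tuple[float, str, float]],
                     width_s: float) -> dict[tuple[float, str], float]:
    """Average values per (window start, key) over fixed, non-overlapping windows.

    Each record is assumed to be (epoch_seconds, key, value), e.g. decoded
    from a stream record's payload.
    """
    sums: dict[tuple[float, str], float] = defaultdict(float)
    counts: dict[tuple[float, str], int] = defaultdict(int)
    for ts, key, value in records:
        start = ts - (ts % width_s)  # start of the window this record falls into
        sums[(start, key)] += value
        counts[(start, key)] += 1
    return {k: sums[k] / counts[k] for k in sums}


if __name__ == "__main__":
    readings = [(0.0, "a", 1.0), (30.0, "a", 3.0), (61.0, "a", 5.0), (10.0, "b", 2.0)]
    print(tumbling_windows(readings, width_s=60.0))
```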

References:
