Standards Library — Space, Time, and Tags (STT)

Scalable AI in a global business context requires standardized, centralized data, and that is not available today. Curating, standardizing, and aggregating data is hard: trillions of new data points arrive every day, and the cost of ignoring them is high because the world is changing even faster. Most companies are not set up to ingest and synthesize disparate data sources, so our work is founded on helping people and organizations manage local and global uncertainty more proactively.

Our standards connect existing open-source standards to support a more systematic and scalable data, analytical, and decision workflow. They are collections of dimensions targeted at three access points in the system design:

  • Data engineers who interact with and contribute data to the data system
  • System-wide validation and curation
    • Machine-driven. Part of the additional curation is led by automated workflows that use machine learning methods to augment tags and map them to our global and local standards. The system flags data that is delayed or has missing values (see the sketch after this list).
    • Expert curation. To ensure reliability, manual curation steps validate individual downsampled datasets against the original source.
  • Scientists and analysts who use our data for analysis within our environment, TaiyōIQ, or who work with the data independently
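
As an illustration of the machine-driven checks above, the sketch below flags a single ingested time series as delayed or as having too many missing values. The thresholds, column names, and the `flag_series` helper are hypothetical; the production workflows also augment tags with machine learning, which is not shown here.

```python
import pandas as pd

# Illustrative thresholds; real curation rules would be configured per source.
MAX_STALENESS_DAYS = 7
MAX_MISSING_RATIO = 0.05

def flag_series(df: pd.DataFrame, value_col: str = "value",
                time_col: str = "timestamp") -> dict:
    """Return simple data-quality flags for one ingested time series."""
    timestamps = pd.to_datetime(df[time_col], utc=True)
    staleness_days = (pd.Timestamp.now(tz="UTC") - timestamps.max()).days
    missing_ratio = float(df[value_col].isna().mean())
    return {
        "delayed": staleness_days > MAX_STALENESS_DAYS,   # data arriving late
        "missing": missing_ratio > MAX_MISSING_RATIO,     # too many null values
        "staleness_days": staleness_days,
        "missing_ratio": round(missing_ratio, 4),
    }
```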

Knowledge Vault (KV), a Google project, is worth mentioning here. It contained three major components. First, a system that extracts triples from a huge number of Web sources; each extractor assigns a confidence score to an extracted triple, representing uncertainty about the identity of the relation and its corresponding arguments. Second, a system that learns the graph-based prior probability of each possible triple, based on triples stored in an existing KB. Third, a knowledge-fusion system that computes the probability of a triple being true, based on agreement between the different extractors and the priors (Dong et al. 2014).
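The knowledge-fusion step can be illustrated with a much simpler combination rule than the one used in KV. The sketch below is not the Dong et al. (2014) model; it is a minimal naive-Bayes-style pooling in log-odds space, assuming calibrated extractor confidences and conditional independence between extractors, to show how a prior and several confidence scores can be fused into a single probability of truth.

```python
import math

def fuse_triple_probability(prior: float, extractor_confidences: list[float]) -> float:
    """Fuse a graph-based prior with per-extractor confidence scores.

    Each confidence is treated as a calibrated posterior given that
    extractor's evidence alone; evidence sources are assumed conditionally
    independent given the truth of the triple.
    """
    def logit(p: float) -> float:
        p = min(max(p, 1e-6), 1 - 1e-6)   # clamp to avoid infinite log-odds
        return math.log(p / (1 - p))

    fused_logit = logit(prior) + sum(logit(c) - logit(prior)
                                     for c in extractor_confidences)
    return 1.0 / (1.0 + math.exp(-fused_logit))

# Example: a moderately likely triple confirmed by two independent extractors.
print(fuse_triple_probability(prior=0.3, extractor_confidences=[0.8, 0.7]))  # ~0.96
```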

Both KV and YAGO, like many other knowledge bases, are built on natural language processing (NLP) entity extraction. In our system architecture, by contrast, the instances of concern are continuous random variables of various time-series types with a complex hierarchical structure.

There is essentially a three-step process for Knowledge Base (KB) construction. Raw data from various information sources is ingested into our data system. The environment standardizes this ingestion into source tables: raw data extracted at scale from sources such as GDELT, IMF, and USGS. The source tables from the individual sources are then taken as input to create secondary tables, which integrate the individual sources using a global schema, ontology, and data dictionary. The secondary tables provide a single end-point for accessing the multiple information sources through a single import based on analysts' data preferences. The figure below provides an abstraction of the three-dimensional global data standards.
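
The source-to-secondary integration can be sketched as a schema-mapping step. The column names and the `SOURCE_MAPPINGS` table below are hypothetical stand-ins for the global schema, ontology, and data dictionary described above; the point is only that each source table is renamed into one shared (space, time, tag, value) layout and then concatenated into a single end-point.

```python
import pandas as pd

# Hypothetical global schema shared by all secondary tables:
# one continuous value per (space, time, tag) coordinate.
GLOBAL_COLUMNS = ["source", "space", "time", "tag", "value"]

# Illustrative per-source column mappings; the real mappings live in the
# ontology and data dictionary maintained by the curation workflows.
SOURCE_MAPPINGS = {
    "IMF":  {"country_code": "space", "period": "time",
             "indicator": "tag", "obs_value": "value"},
    "USGS": {"region": "space", "event_time": "time",
             "measure": "tag", "reading": "value"},
}

def to_secondary(source: str, raw: pd.DataFrame) -> pd.DataFrame:
    """Map one raw source table onto the global (space, time, tag, value) schema."""
    mapped = raw.rename(columns=SOURCE_MAPPINGS[source])
    mapped["source"] = source
    mapped["time"] = pd.to_datetime(mapped["time"], utc=True)
    return mapped[GLOBAL_COLUMNS]

def build_secondary_table(raw_tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Concatenate all mapped source tables into the single secondary end-point."""
    return pd.concat([to_secondary(src, df) for src, df in raw_tables.items()],
                     ignore_index=True)
```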