Overview

    Why do you need CS (Context Service)?

    CS is used to solve the problem of data and information sharing across multiple systems in a data application development process.

    For example, suppose system B needs a piece of data generated by system A. There are two usual approaches:

    1. System B calls a data access interface developed by system A;

    2. System A writes the data into shared storage, and system B reads it from there.

    With CS, systems A and B interact only with CS: each writes the data and information it wants to share into CS and reads what it needs from CS, so neither system has to develop or adapt an interface for the other. This greatly reduces the call complexity and coupling of information sharing between systems, and makes the boundaries of each system clearer.
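    The sharing pattern described above can be sketched as follows. This is a minimal in-memory illustration, not the actual CS API: the `ContextService` class, its `put`/`get` methods, and the context ID and key names are all assumptions made for the example.

```python
class ContextService:
    """Hypothetical in-memory stand-in for CS: a shared key-value store."""

    def __init__(self):
        # context ID -> {key -> value}
        self._store = {}

    def put(self, context_id, key, value):
        self._store.setdefault(context_id, {})[key] = value

    def get(self, context_id, key, default=None):
        return self._store.get(context_id, {}).get(key, default)


cs = ContextService()

# System A publishes a result without knowing who will consume it.
cs.put("flow-1", "a.output_table", "db.tmp_result")

# System B reads it without calling any interface owned by system A.
table = cs.get("flow-1", "a.output_table")
print(table)
```

    Neither system references the other; both depend only on the shared store, which is the decoupling the paragraph above describes.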

    Metadata context

    The metadata context defines the metadata specification.

    Metadata context relies on data middleware, and its main functions are as follows:

    1. Connect to the data middleware and obtain all user metadata (including Hive table metadata, online database table metadata, and other NoSQL metadata such as HBase and Kafka).

    2. Any node that needs to access metadata, whether existing metadata or metadata in the application template, must go through the metadata context. The metadata context records all metadata used by the application template.

    3. Any new metadata generated by a node must be registered with the metadata context.

    The metadata context is the basis of both interactive workflows and application templates. Consider: when a Widget is defined, how does it know the dimensions of each indicator defined by DataWrangler? How does Qualitis verify the graph report generated by Widget?
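    The register-then-look-up flow behind those questions might look like the sketch below. `MetadataContext`, `TableMetadata`, and the table and column names are hypothetical; only the node names DataWrangler and Widget come from the text above.

```python
from dataclasses import dataclass


@dataclass
class TableMetadata:
    name: str
    columns: dict  # column name -> type name


class MetadataContext:
    """Hypothetical registry that every node must go through."""

    def __init__(self):
        self._tables = {}   # table name -> (producer node, metadata)
        self._used = set()  # table names accessed through the context

    def register(self, node, meta):
        # New metadata generated by a node is registered here.
        self._tables[meta.name] = (node, meta)

    def lookup(self, name):
        # Raises KeyError if the metadata was never registered.
        _node, meta = self._tables[name]
        self._used.add(name)
        return meta

    def used(self):
        # Everything a template would need to carry with it.
        return sorted(self._used)


ctx = MetadataContext()

# DataWrangler registers the table it produces.
ctx.register("DataWrangler",
             TableMetadata("sales_by_region",
                           {"region": "string", "amount": "double"}))

# Widget discovers the available dimensions through the context.
meta = ctx.lookup("sales_by_region")
print(list(meta.columns))
```

    Because every access passes through `lookup`, the context can record which metadata a template actually uses.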

    Data context

    The data context depends on the data middleware and the Linkis computing middleware. Its main functions are as follows:

    1. Connect to the data middleware and obtain all user data information.

    2. Connect to the computing middleware and obtain the data storage information of all nodes.

    3. Any node that needs to write temporary results must go through the data context, which allocates the storage uniformly.

    4. Any node that needs to access data must go through the data context.

    5. The data context distinguishes between dependent data and generated data. When the application template is extracted, all dependent data is abstracted and packaged for the application template.
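    A rough sketch of functions 3 to 5 above, assuming a hypothetical `DataContext` class; the path layout and the source identifiers are invented for illustration.

```python
class DataContext:
    """Hypothetical data context: allocates result paths and tracks reads."""

    def __init__(self, base_dir):
        self._base_dir = base_dir
        self._generated = {}     # node name -> allocated result path
        self._dependent = set()  # external data the workflow depends on

    def allocate(self, node):
        # Temporary results are placed by the context, not by the node.
        path = f"{self._base_dir}/{node}/result"
        self._generated[node] = path
        return path

    def read(self, source):
        # Data produced inside the workflow resolves to its allocated path;
        # anything else is recorded as dependent data.
        if source in self._generated:
            return self._generated[source]
        self._dependent.add(source)
        return source

    def dependencies(self):
        # What must be abstracted when extracting an application template.
        return sorted(self._dependent)


dc = DataContext("/tmp/cs")
out_path = dc.allocate("node_a")

resolved = dc.read("node_a")        # generated data
external = dc.read("hive://db.orders")  # dependent data
print(dc.dependencies())
```

    Routing both writes and reads through one object is what lets the context tell generated data apart from dependent data.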

    Resource context

    The resource context defines the resource specification.

    The resource context mainly interacts with the Linkis computing middleware. It covers the following resources:

    1. User resource files (such as Jar, Zip files, properties files, etc.)

    2. User UDF

    3. User algorithm package

    4. User script

    Environmental context

    The environmental context defines the environmental specification, which covers:

    1. Operating System

    2. Software, such as Hadoop, Spark, etc.

    Object context

    The runtime context is all the context information retained when the application template (workflow) is defined and executed.

    It assists in defining the workflow/application template, and is used to prompt for and complete all necessary information when the workflow/application template is executed.

    The runtime context is mainly used by Linkis.

    Overview - Figure 2

    1. Client

    The entry point for external access to CS. The Client module also provides HA capability; see Client Architecture Design.

    2. Service

    Provides a RESTful interface that encapsulates and processes CS requests submitted by the client;

    3. ContextSearch

    The context query module provides rich query capabilities that let the client find key-value pairs in the context;

    4. Listener

    The CS listener module provides synchronous and asynchronous event consumption and, similar to ZooKeeper, notifies the Client in real time whenever a key-value pair is updated;
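    The notify-on-update behavior can be illustrated with a small synchronous sketch. `ListenableContext`, `watch`, and the key and status values are hypothetical; an asynchronous variant would hand events to a queue or worker thread instead of calling the callbacks inline.

```python
class ListenableContext:
    """Hypothetical key-value context that notifies watchers on change."""

    def __init__(self):
        self._values = {}
        self._listeners = {}  # key -> list of callbacks

    def watch(self, key, callback):
        self._listeners.setdefault(key, []).append(callback)

    def put(self, key, value):
        old = self._values.get(key)
        self._values[key] = value
        # Notify only when the stored value actually changes.
        if old != value:
            for callback in self._listeners.get(key, []):
                callback(key, old, value)


events = []
ctx = ListenableContext()
ctx.watch("flow-1.status", lambda k, old, new: events.append((k, old, new)))

ctx.put("flow-1.status", "RUNNING")
ctx.put("flow-1.status", "RUNNING")  # unchanged value, no event
ctx.put("flow-1.status", "SUCCEED")
print(events)
```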

    5. ContextCache

    The context in-memory cache module provides fast context retrieval, and monitors and cleans up JVM memory usage;
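    As an illustration of the caching idea only, the sketch below bounds the cache by entry count with least-recently-used eviction. This is a simplifying assumption: the module described above works on JVM memory usage, and the `ContextCache` class and `loader` callback here are hypothetical.

```python
from collections import OrderedDict


class ContextCache:
    """Hypothetical LRU cache in front of a slower context store."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._data = OrderedDict()

    def get(self, key, loader):
        if key in self._data:
            # Cache hit: mark the entry as most recently used.
            self._data.move_to_end(key)
            return self._data[key]
        # Cache miss: load from the backing store and remember the value.
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self._max:
            # Evict the least recently used entry.
            self._data.popitem(last=False)
        return value


loads = []


def load(key):
    # Stand-in for a read from persistent storage.
    loads.append(key)
    return key.upper()


cache = ContextCache(max_entries=2)
cache.get("a", load)
cache.get("b", load)
cache.get("a", load)  # hit, nothing loaded
cache.get("c", load)  # evicts "b"
cache.get("b", load)  # miss again, reloaded
print(loads)
```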

    6. HighAvailable

    Provides the high-availability capability of CS; see HighAvailable Architecture Design.

    7. Persistence