SatelliteGraph Details

    SatelliteGraphs enforce and rely on special properties of the underlying collections and hence can only work with collections that are either created implicitly through the SatelliteGraph interface, or manually with the correct properties:

    • There needs to be a prototype collection with replicationFactor set to "satellite"
    • All other collections need to have distributeShardsLike set to the name of the prototype collection
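
    The two requirements can be written down as a small check over collection property objects. This is only an illustrative sketch in plain JavaScript: the function name isSuitableForSatelliteGraph and the shape of its arguments are assumptions and not part of the ArangoDB API. Only the property names replicationFactor and distributeShardsLike come from the list above.

```javascript
// Hypothetical helper (not an ArangoDB API): checks whether a set of
// collections satisfies the two SatelliteGraph requirements listed above.
// `collections` maps collection names to their property objects.
function isSuitableForSatelliteGraph(collections, prototypeName) {
  const proto = collections[prototypeName];
  // The prototype collection must exist and be a satellite collection.
  if (!proto || proto.replicationFactor !== "satellite") {
    return false;
  }
  // Every other collection must follow the prototype's sharding.
  return Object.keys(collections)
    .filter((name) => name !== prototypeName)
    .every((name) => collections[name].distributeShardsLike === prototypeName);
}

// Example input: "person" is the prototype, "isFriend" follows it.
const example = {
  person: { replicationFactor: "satellite" },
  isFriend: { distributeShardsLike: "person" },
};
console.log(isSuitableForSatelliteGraph(example, "person"));
```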

    Collections can be part of multiple SatelliteGraphs. This means that in contrast to SmartGraphs, SatelliteGraphs can be overlapping. If you have a larger SatelliteGraph and want to create an additional SatelliteGraph which only covers a part of it, then you can do so.
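
    A possible way to set up two overlapping SatelliteGraphs is sketched here. It assumes that _create accepts a list of edge definitions as its second argument, analogous to General Graphs. The graph names "fullGraph" and "socialGraph" and the collections "worksAt" and "company" are made up for illustration.

```javascript
// Sketch: two SatelliteGraphs that share collections. Pass in the module
// obtained in arangosh via require("@arangodb/satellite-graph").
// Assumption: _create(name, edgeDefinitions) works as for General Graphs.
function createOverlappingGraphs(satelliteGraphModule) {
  const friends = satelliteGraphModule._relation("isFriend", ["person"], ["person"]);
  const worksAt = satelliteGraphModule._relation("worksAt", ["person"], ["company"]);
  // The larger graph covers both edge definitions.
  const full = satelliteGraphModule._create("fullGraph", [friends, worksAt]);
  // The smaller graph reuses only "person" and "isFriend".
  const social = satelliteGraphModule._create("socialGraph", [friends]);
  return { full, social };
}
```

    In arangosh, this would be called as createOverlappingGraphs(require("@arangodb/satellite-graph")).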

    To create a SatelliteGraph in arangosh, use the satellite-graph module:

    arangosh> var satelliteGraphModule = require("@arangodb/satellite-graph");
    arangosh> var graph = satelliteGraphModule._create("satelliteGraph");
    {[SatelliteGraph]
    }

    In contrast to General Graphs and SmartGraphs, you do not need to take care of the sharding and replication properties. The properties distributeShardsLike, replicationFactor and numberOfShards will be set automatically.

    Adding vertex collections is analogous to General Graphs:

    arangosh> var satelliteGraphModule = require("@arangodb/satellite-graph");
    arangosh> var graph = satelliteGraphModule._create("satelliteGraph");
    arangosh> graph._addVertexCollection("aVertexCollection");


    If the collection "aVertexCollection" doesn’t exist yet, then the SatelliteGraph module will create it automatically with the correct properties. If it exists already, then its properties must be suitable for a SatelliteGraph (see the requirements listed at the top of this section). Otherwise it will not be added.

    Adding edge collections works the same as with General Graphs, but again, the collections are created by the SatelliteGraph module with the right properties if they don’t exist already.

    arangosh> var satelliteGraphModule = require("@arangodb/satellite-graph");
    arangosh> var graph = satelliteGraphModule._create("satelliteGraph");
    arangosh> var relation = satelliteGraphModule._relation("isFriend", ["person"], ["person"]);
    arangosh> graph._extendEdgeDefinitions(relation);


    Existing edge collections can be added, but they require the distributeShardsLike property to reference the prototype collection.

    Every SatelliteGraph needs exactly one document collection with replicationFactor set to "satellite". Such a collection automatically has exactly one shard. This collection is selected as the prototype.

    All other collections of the SatelliteGraph need to inherit its properties by referencing its name in the distributeShardsLike property.

    If collections are created implicitly through the SatelliteGraph module, then this is handled for you automatically. If you want to create the collections manually before adding them to the SatelliteGraph, then you need to take care of these properties.
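
    Manual creation could look roughly like the following sketch. It wraps the arangosh calls db._create and db._createEdgeCollection in a function that takes the database handle as a parameter. The function name and the collection name "company" are made up for illustration, and the sketch assumes that setting distributeShardsLike alone is enough for the non-prototype collections.

```javascript
// Sketch: create the collections manually with suitable properties.
// `db` is the database handle that arangosh provides as a global.
function createSatelliteCollections(db) {
  // Prototype: a satellite collection.
  db._create("person", { replicationFactor: "satellite" });
  // All other collections reference the prototype by name.
  db._create("company", { distributeShardsLike: "person" });
  db._createEdgeCollection("isFriend", { distributeShardsLike: "person" });
}
```

    In arangosh, call createSatelliteCollections(db) first and then add the collections to the graph, for example with graph._addVertexCollection("person").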

    Creating an empty SatelliteGraph: No prototype collection is present.

    arangosh> var satelliteGraphModule = require("@arangodb/satellite-graph");
    arangosh> var graph = satelliteGraphModule._create("satelliteGraph");
    {[SatelliteGraph]
    }
    Creating an empty SatelliteGraph, then adding a document (vertex) collection. This leads to the creation of a prototype collection "myPrototypeColl" (assuming that no collection with this name existed before):

    arangosh> var satelliteGraphModule = require("@arangodb/satellite-graph");
    arangosh> var graph = satelliteGraphModule._create("satelliteGraph");
    arangosh> graph._addVertexCollection("myPrototypeColl");


    Creating an empty SatelliteGraph, then adding an edge definition. This will select the collection "person" as prototype collection, as it is the only document (vertex) collection. If you supply more than one document collection, then one of the collections will be chosen arbitrarily as prototype collection.

    arangosh> var satelliteGraphModule = require("@arangodb/satellite-graph");
    arangosh> var graph = satelliteGraphModule._create("satelliteGraph");
    arangosh> var relation = satelliteGraphModule._relation("isFriend", ["person"], ["person"]);
    arangosh> graph._extendEdgeDefinitions(relation);


    The prototype collection is also selected automatically during graph creation if at least one document (vertex) collection is supplied directly. If more than one is available, one of them is chosen arbitrarily, regardless of whether it is set inside an edge definition or as a vertex/orphan collection.

    A SatelliteGraph must be created before it can be queried. The operations that can then be optimized are (k-)shortest path(s) computations and traversals. Both can also be combined with local joins or other SatelliteGraph operations.

    Here is an example showing the difference between the execution of a General Graph and a SatelliteGraph traversal query:

    1. First, we set up our graphs and collections:

      {[Graph]
        "edges" : [ArangoCollection 10073, "edges" (type edge, status loaded)],
        "vertices" : [ArangoCollection 10072, "vertices" (type document, status loaded)]
      }
      {[SatelliteGraph]
        "satEdges" : [ArangoCollection 10079, "satEdges" (type edge, status loaded)],
        "satVertices" : [ArangoCollection 10077, "satVertices" (type document, status loaded)]
      }
      [ArangoCollection 10081, "collection" (type document, status loaded)]

    2. Let us analyze a query involving a traversal:

      Query String (99 chars, cacheable: true):
       FOR doc in collection FOR v,e,p IN OUTBOUND "vertices/start" GRAPH "normalGraph" RETURN [doc,v,e,p]

      Execution plan:
       Id   NodeType                  Site   Est.   Comment
        1   SingletonNode             DBS       1   * ROOT
        2   EnumerateCollectionNode   DBS       0   - FOR doc IN collection /* full collection scan, 8 shard(s) */
        8   RemoteNode                COOR      0   - REMOTE
        9   GatherNode                COOR      0   - GATHER /* unsorted */
        3   TraversalNode             COOR      1   - FOR v /* vertex */, e /* edge */, p /* paths: vertices, edges */ IN 1..1 /* min..maxPathDepth */ OUTBOUND 'vertices/start' /* startnode */ GRAPH 'normalGraph'
        4   CalculationNode           COOR      1   - LET #6 = [ doc, v, e, p ] /* simple expression */ /* collections used: doc : collection */
        5   ReturnNode                COOR      1   - RETURN #6

      Indexes used:
       By   Name   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
        3   edge   edge   edges        false    false       100.00 %   [ `_from` ]   base OUTBOUND

      Traversals on graphs:
       Id   Depth   Vertex collections   Edge collections   Options                                   Filter / Prune Conditions
        3   1..1    vertices             edges              uniqueVertices: none, uniqueEdges: path

      Optimization rules applied:
       Id   RuleName
        1   scatter-in-cluster
        2   remove-unnecessary-remote-scatter

      Optimization rules with highest execution times:
       RuleName                            Duration [s]
       scatter-in-cluster                       0.00001
       remove-unnecessary-remote-scatter        0.00001
       restrict-to-single-shard                 0.00001
       optimize-traversals                      0.00000
       use-indexes                              0.00000

      59 rule(s) executed, 1 plan(s) created

      You can see that the TraversalNode is executed on a Coordinator, and only the EnumerateCollectionNode is executed on DB-Servers. This happens for each of the 8 shards of the collection named "collection".

    3. Let us now have a look at the same query using a SatelliteGraph:

      Query String (102 chars, cacheable: true):
       [doc,v,e,p]

      Execution plan:
       Id   NodeType                  Site   Est.   Comment
        1   SingletonNode             DBS       1   * ROOT
        2   EnumerateCollectionNode   DBS       0   - FOR doc IN collection /* full collection scan, 8 shard(s) */
       10   TraversalNode             DBS       1   - FOR v /* vertex */, e /* edge */, p /* paths: vertices, edges */ IN 1..1 /* min..maxPathDepth */ OUTBOUND 'vertices/start' /* startnode */ satEdges /* local graph node, used as satellite */
        4   CalculationNode           DBS       1   - LET #6 = [ doc, v, e, p ] /* simple expression */ /* collections used: doc : collection */
       13   RemoteNode                COOR      1   - REMOTE
       14   GatherNode                COOR      1   - GATHER /* parallel, unsorted */
        5   ReturnNode                COOR      1   - RETURN #6

      Indexes used:
       By   Name   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
       10   edge   edge   satEdges     false    false       100.00 %   [ `_from` ]   base OUTBOUND

      Traversals on graphs:
       Id   Depth   Vertex collections   Edge collections   Options                                   Filter / Prune Conditions
       10   1..1    satVertices          satEdges           uniqueVertices: none, uniqueEdges: path

      Optimization rules applied:
       Id   RuleName
        1   scatter-in-cluster
        2   scatter-satellite-graphs
        3   remove-satellite-joins
        4   distribute-filtercalc-to-cluster
        5   remove-unnecessary-remote-scatter
        6   parallelize-gather

      Optimization rules with highest execution times:
       RuleName                   Duration [s]
       scatter-satellite-graphs        0.00002
       scatter-in-cluster              0.00001
       remove-satellite-joins          0.00001
       parallelize-gather              0.00001
       use-indexes                     0.00001

      Note that now the TraversalNode is executed on each DB-Server, leading to a great reduction in required network communication, and hence potential gains in query performance.
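
      The difference between the two plans can also be checked mechanically. The following sketch scans the text of an explain output for a node type and reports the site it runs on. The helper siteOfNode is hypothetical and not part of ArangoDB. It assumes plain-text plan rows of the form "Id NodeType Site Est. Comment" as in the listings above, and the sample rows are abbreviated from those listings.

```javascript
// Hypothetical helper: given the text of an explain output, return the
// site ("DBS" or "COOR") of the first plan row with the given node type.
function siteOfNode(explainText, nodeType) {
  for (const line of explainText.split("\n")) {
    // Plan rows look like: "<id> <NodeType> <Site> <Est.> <Comment>"
    const match = line.trim().match(/^(\d+)\s+(\w+)\s+(DBS|COOR)\b/);
    if (match && match[2] === nodeType) {
      return match[3];
    }
  }
  return null; // no row with that node type found
}

// Abbreviated rows from the General Graph plan.
const generalPlan = [
  "2 EnumerateCollectionNode DBS 0 - FOR doc IN collection",
  "3 TraversalNode COOR 1 - FOR v, e, p IN 1..1 OUTBOUND 'vertices/start' GRAPH 'normalGraph'",
].join("\n");

// Abbreviated rows from the SatelliteGraph plan.
const satellitePlan = [
  "2 EnumerateCollectionNode DBS 0 - FOR doc IN collection",
  "10 TraversalNode DBS 1 - FOR v, e, p IN 1..1 OUTBOUND 'vertices/start' satEdges",
].join("\n");

console.log(siteOfNode(generalPlan, "TraversalNode"));   // COOR
console.log(siteOfNode(satellitePlan, "TraversalNode")); // DBS
```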

      If you want to transform an existing General Graph or SmartGraph into a SatelliteGraph, then you need to dump and restore your previous graph. This is necessary for the initial data replication and because some collection properties are immutable.