Different data models and scalability

    The key/value store data model is the easiest to scale. In ArangoDB, this is implemented in the sense that a document collection always has a primary key attribute and in the absence of further secondary indexes the document collection behaves like a simple key/value store.

    For the document store case even in the presence of secondary indexes essentially the same arguments apply, since an index for a sharded collection is simply the same as a local index for each shard. Therefore, single document operations still scale linearly with the size of the cluster, unless a special sharding configuration makes lookups or write operations more expensive.

    Nevertheless, for certain complicated joins, there are limits as to what can be achieved.

    However, if the vertices and edges along the occurring paths are distributed across the cluster, then a lot of communication is necessary between nodes, and performance suffers. To achieve good performance at scale, it is therefore necessary to get the distribution of the graph data across the shards in the cluster right. Most of the time, the application developers and users of ArangoDB know best, how their graphs are structured. Therefore, ArangoDB allows users to specify, according to which attributes the graph data is sharded. A useful first step is usually to make sure that the edges originating at a vertex reside on the same cluster node as the vertex.