Ranking View Query Results

ArangoSearch supports the two most popular ranking schemes:

Okapi BM25
TF-IDF

Under the hood, both models rely on two main components:

Term frequency (TF): in the simplest case defined as the number of times a term occurs in a document
Inverse document frequency (IDF): a measure of how relevant a term is, i.e. whether the word is common or rare across all documents

See Ranking in ArangoSearch in the ArangoSearch Tutorial to learn more about the ranking model.

To sort View results from most relevant to least relevant, use a SORT operation with a call to a scoring function as expression and set the order to descending (DESC). Scoring functions expect the document emitted by a FOR … IN loop that iterates over a View as first argument.
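As a minimal sketch of this pattern, the following query sorts matches by their BM25 score. It assumes a View named imdb whose description attribute is indexed with the text_en Analyzer; the search term is only an illustration.

```aql
FOR doc IN imdb
  // Illustrative search condition; replace it with your own
  SEARCH ANALYZER(doc.description == "galaxy", "text_en")
  // The scoring function receives the document emitted by the FOR loop
  SORT BM25(doc) DESC
  RETURN doc
```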

You can also return the ranking score as part of the result.

FOR doc IN viewName
  SEARCH …
  RETURN MERGE(doc, { bm25: BM25(doc), tfidf: TFIDF(doc) })

Scoring functions cannot be used outside of SEARCH operations, as the scores can only be computed in the context of a View, especially because of the inverse document frequency (IDF).

View definition:

{
  "links": {
    "imdb_vertices": {
      "fields": {
        "description": {
          "analyzers": [
            "text_en"
          ]
        }
      }
    }
  }
}

AQL Queries:

Search for movies with certain keywords in their description and rank the results using the BM25() function:
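A sketch of such a query, assuming the View is named imdb and using the same keywords as the TFIDF() variant (the LIMIT is an optional addition to keep the result short):

```aql
FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental galaxy", "text_en"), "text_en")
  // Rank by the Okapi BM25 score with its default coefficients
  SORT BM25(doc) DESC
  LIMIT 10
  RETURN {
    title: doc.title,
    description: doc.description,
    score: BM25(doc)
  }
```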

Do the same but with the TFIDF() function:

FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental galaxy", "text_en"), "text_en")
  SORT TFIDF(doc) DESC
  RETURN {
    title: doc.title,
    description: doc.description,
    score: TFIDF(doc)
  }

Query Time Relevance Tuning

You can fine-tune the scores computed by the Okapi BM25 and TF-IDF relevance models at query time via the BOOST() AQL function and also calculate a custom score. In addition, the BM25() function lets you adjust the coefficients at query time.

The BOOST() function is similar to the ANALYZER() function in that it accepts any valid SEARCH expression as first argument. You can set the boost factor for that sub-expression via the second parameter. Documents that match boosted parts of the search expression will get higher scores.

View definition:

{
  "links": {
    "imdb_vertices": {
      "fields": {
        "description": {
          "analyzers": [
            "text_en"
          ]
        }
      }
    }
  }
}

AQL Queries:

Prefer galaxy over the other keywords:
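One way to express this, sketched under the assumption that the View is named imdb, is to wrap the galaxy sub-expression in BOOST() and leave the other keywords unboosted. The boost factor of 5 is an arbitrary illustrative value.

```aql
FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental", "text_en")
      // Matches on "galaxy" contribute more to the score than the other keywords
      OR BOOST(doc.description IN TOKENS("galaxy", "text_en"), 5), "text_en")
  SORT BM25(doc) DESC
  LIMIT 10
  RETURN {
    title: doc.title,
    description: doc.description,
    score: BM25(doc)
  }
```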

If you are an information retrieval expert and want to fine-tune the weighting schemes at query time, you can do so. The BM25() function accepts the coefficients k and b as optional second and third parameters. Setting b to 0, for instance, turns it into BM15:

FOR doc IN imdb
  SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental", "text_en")
      OR BOOST(doc.description IN TOKENS("galaxy", "text_en"), 5), "text_en")
  LET score = BM25(doc, 1.2, 0)
  SORT score DESC
  LIMIT 10
  RETURN {
    title: doc.title,
    description: doc.description,
    score
  }

You can also calculate a custom score, taking into account additional fields of the document.

Match movies with the (normalized) phrase "star war" in the title and calculate a custom score based on BM25 and the movie runtime to favor longer movies:

FOR doc IN imdb
  SEARCH PHRASE(doc.title, "Star Wars", "text_en")
  LET score = BM25(doc) * LOG(doc.runtime + 1)
  SORT score DESC
  RETURN {
    title: doc.title,
    runtime: doc.runtime,
    bm25: BM25(doc),
    score
  }