Ranking View Query Results

    ArangoSearch supports the two most popular ranking schemes:

    • Okapi BM25
    • TF-IDF

    Under the hood, both models rely on two main components:

    • Term frequency (TF): in the simplest case defined as the number of times a term occurs in a document
    • Inverse document frequency (IDF): a measure of how relevant a term is, i.e. whether the word is common or rare across all documents

    See Ranking in ArangoSearch in the ArangoSearch Tutorial to learn more about the ranking model.

    To sort View results from most relevant to least relevant, use a SORT operation with a call to a scoring function as the expression and set the order to descending. Scoring functions expect the document emitted by a FOR loop that iterates over a View as their first argument.
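
    A minimal sketch of this pattern, assuming the imdb View used in the examples further down this page; the single keyword and the LIMIT of 5 are arbitrary choices for illustration:

    ```aql
    // Iterate over the View, not over a collection
    FOR doc IN imdb
      // Match documents whose description contains the analyzed keyword
      SEARCH ANALYZER(doc.description IN TOKENS("galaxy", "text_en"), "text_en")
      // Pass the loop variable to the scoring function, highest score first
      SORT BM25(doc) DESC
      LIMIT 5
      RETURN doc.title
    ```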

    You can also return the ranking score as part of the result.

    FOR doc IN viewName
      SEARCH ...
      RETURN MERGE(doc, { bm25: BM25(doc), tfidf: TFIDF(doc) })

    Scoring functions cannot be used outside of SEARCH operations, as the scores can only be computed in the context of a View, especially because of the inverse document frequency (IDF).

    View definition:

    {
      "links": {
        "imdb_vertices": {
          "fields": {
            "description": {
              "analyzers": [
                "text_en"
              ]
            }
          }
        }
      }
    }

    AQL Queries:

    Search for movies with certain keywords in their description and rank the results using the BM25() function:
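
    A minimal sketch of the BM25() version of this query, assuming it mirrors the TFIDF() query below and differs only in the scoring function:

    ```aql
    FOR doc IN imdb
      // Tokenize the keywords with the same Analyzer the field is indexed with
      SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental galaxy", "text_en"), "text_en")
      // Rank by Okapi BM25 with its default coefficients
      SORT BM25(doc) DESC
      RETURN {
        title: doc.title,
        description: doc.description,
        score: BM25(doc)
      }
    ```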

    Do the same but with the TFIDF() function:

    FOR doc IN imdb
      SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental galaxy", "text_en"), "text_en")
      SORT TFIDF(doc) DESC
      RETURN {
        title: doc.title,
        description: doc.description,
        score: TFIDF(doc)
      }

    Query Time Relevance Tuning

    You can fine-tune the scores computed by the Okapi BM25 and TF-IDF relevance models at query time via the BOOST() AQL function and also calculate a custom score. In addition, the BM25() function lets you adjust the coefficients at query time.

    The BOOST() function is similar to the ANALYZER() function in that it accepts any valid SEARCH expression as first argument. You can set the boost factor for that sub-expression via the second parameter. Documents that match boosted parts of the search expression will get higher scores.

    View definition:

    {
      "links": {
        "imdb_vertices": {
          "fields": {
            "description": {
              "analyzers": [
                "text_en"
              ]
            }
          }
        }
      }
    }

    AQL Queries:

    Prefer galaxy over the other keywords:
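
    A minimal sketch of such a query, assuming the imdb View defined above. The boost factor of 5 and the LIMIT of 10 are taken from the coefficient-tuning query further down; ranking uses BM25() with its default coefficients:

    ```aql
    FOR doc IN imdb
      // Matches on "galaxy" are boosted by a factor of 5 relative to the other keywords
      SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental", "text_en")
          OR BOOST(doc.description IN TOKENS("galaxy", "text_en"), 5), "text_en")
      LET score = BM25(doc)
      SORT score DESC
      LIMIT 10
      RETURN {
        title: doc.title,
        description: doc.description,
        score
      }
    ```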

    If you are an information retrieval expert and want to fine-tune the weighting schemes at query time, you can do so. The BM25() function accepts free coefficients as parameters, for instance to turn it into BM15 by passing 0 as the third argument:

    FOR doc IN imdb
      SEARCH ANALYZER(doc.description IN TOKENS("amazing action world alien sci-fi science documental", "text_en")
          OR BOOST(doc.description IN TOKENS("galaxy", "text_en"), 5), "text_en")
      LET score = BM25(doc, 1.2, 0)
      SORT score DESC
      LIMIT 10
      RETURN {
        title: doc.title,
        description: doc.description,
        score
      }
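
    For reference, the textbook Okapi BM25 formula shows why a third argument of 0 yields BM15. This is the standard formulation and not necessarily the exact expression ArangoSearch evaluates. The symbols f(q_i, D) (frequency of term q_i in document D), |D| (document length), and avgdl (average document length) are notation introduced here, and mapping k and b to the second and third BM25() arguments is an assumption based on the call above:

    ```latex
    \mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
      \frac{f(q_i, D)\,(k + 1)}
           {f(q_i, D) + k \left(1 - b + b \, \frac{|D|}{\mathrm{avgdl}}\right)}

    % With b = 0 the length normalization term drops out (BM15):
    \mathrm{score}_{b=0}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
      \frac{f(q_i, D)\,(k + 1)}{f(q_i, D) + k}
    ```

    In this form, b = 0 means that long and short descriptions are scored on term frequency alone, without scaling by document length.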

    You can also calculate a custom score, taking into account additional fields of the document.

    Match movies with the (normalized) phrase star war in the title and calculate a custom score based on BM25 and the movie runtime to favor longer movies:

    FOR doc IN imdb
      SEARCH PHRASE(doc.title, "Star Wars", "text_en")
      LET score = BM25(doc) * LOG(doc.runtime + 1)
      SORT score DESC
      RETURN {
        title: doc.title,
        runtime: doc.runtime,
        bm25: BM25(doc),
        score
      }