Phrase and Proximity Search with ArangoSearch

    With phrase search, you can query for tokens in a certain order. This allows you to match partial or full sentences. You can also specify how many arbitrary tokens may occur between defined tokens for word proximity searches.

    Dataset: IMDB movie dataset

    View definition:

    AQL Queries:

    Search for movies that have the (normalized and stemmed) tokens biggest and blockbust in their description, in this order:

        FOR doc IN imdb
          SEARCH ANALYZER(PHRASE(doc.description, "BIGGEST Blockbuster"), "text_en")
          RETURN {
            title: doc.title,
            description: doc.description
          }
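
    To see why the mixed-case phrase still matches, it can help to look at what the Analyzer does with it. The following is a minimal sketch, assuming the built-in text_en Analyzer used throughout this page; it only inspects the tokens and does not search the View.

    ```aql
    // Inspect how text_en normalizes and stems the search phrase.
    // The result is an array of lower-cased, stemmed tokens,
    // for example "blockbust" for "Blockbuster".
    RETURN TOKENS("BIGGEST Blockbuster", "text_en")
    ```

    PHRASE() applies the same processing to the phrase string inside ANALYZER(), which is why the search is insensitive to case and word endings.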

    The search phrase can be passed in via a bind parameter, but it can also be constructed dynamically, for instance with a subquery:

        // Pick two of the four words at random and use them as the phrase
        LET p = (
          FOR word IN ["tale", "of", "a", "woman"]
            SORT RAND()
            LIMIT 2
            RETURN word
        )
        FOR doc IN imdb
          SEARCH ANALYZER(PHRASE(doc.description, p), "text_en")
          RETURN {
            title: doc.title
          }

    Because the two words are picked at random, you will get different results if you re-run this query multiple times.
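
    For comparison, the bind-parameter variant mentioned earlier could look like the sketch below. The parameter name phrase is a hypothetical choice for illustration; its value has to be supplied together with the query by the client or driver.

    ```aql
    // @phrase is a bind parameter (hypothetical name), supplied with the query,
    // for example as { "phrase": "BIGGEST Blockbuster" }.
    FOR doc IN imdb
      SEARCH ANALYZER(PHRASE(doc.description, @phrase), "text_en")
      RETURN {
        title: doc.title,
        description: doc.description
      }
    ```

    This keeps the query text constant while the phrase varies between executions.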

    The PHRASE() function lets you specify tokens and the number of wildcard tokens in alternating order. For instance, you can use this to search for two words with exactly one arbitrary word between them.

    AQL Queries:

    Match movies that contain the phrase epic <something> film in their description, where <something> can be exactly one arbitrary token:
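
    A query along the following lines would express this. It is a sketch that assumes the same imdb View and text_en Analyzer as the earlier examples; the number between the two strings is the count of arbitrary tokens to skip.

    ```aql
    // "epic", then exactly one arbitrary token, then "film"
    FOR doc IN imdb
      SEARCH ANALYZER(PHRASE(doc.description, "epic", 1, "film"), "text_en")
      RETURN {
        title: doc.title,
        description: doc.description
      }
    ```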

    The search phrase can also be dynamic. The following query looks up a particular movie with the title Family Business, tokenizes the title, and then performs a proximity search for movies with the phrase family <something> business or family <something> <something> business in their description:

        LET title = DOCUMENT("imdb_vertices/39967").title // Family Business
        FOR doc IN imdb
          SEARCH ANALYZER(
            PHRASE(doc.description, INTERLEAVE(TOKENS(title, "text_en"), [1])) OR
            PHRASE(doc.description, INTERLEAVE(TOKENS(title, "text_en"), [2])), "text_en")
          RETURN {
            title: doc.title,
            description: doc.description
          }
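
    To check what is actually handed to PHRASE() in the query above, the INTERLEAVE() expression can be returned on its own. This is only an inspection sketch that reuses the same document key and Analyzer.

    ```aql
    // Show the phrase definition built from the title:
    // the title tokens alternating with the number of tokens to skip.
    LET title = DOCUMENT("imdb_vertices/39967").title
    RETURN INTERLEAVE(TOKENS(title, "text_en"), [1])
    ```

    The tokens in the result are stemmed by text_en, so they may be spelled differently from the words in the title.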

    Phrase search is not limited to finding full and exact tokens in a particular order. It also lets you search for prefixes, strings with wildcards, and more, in the specified order. See the description of object tokens for the PHRASE() function for a full list of options.

    AQL Queries:

    Match movies where the title has a token that starts with Härr (normalized to harr), followed by six arbitrary tokens and then a token that contains eni:
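
    Such a query could be written as in the sketch below. It assumes that the title attribute is indexed with the text_en Analyzer in the imdb View, and it uses the STARTS_WITH and WILDCARD object tokens of PHRASE().

    ```aql
    FOR doc IN imdb
      SEARCH ANALYZER(
        PHRASE(doc.title,
          // prefix token, normalized by hand via TOKENS() ("Härr" becomes "harr")
          { STARTS_WITH: TOKENS("Härr", "text_en")[0] },
          // six arbitrary tokens in between
          6,
          // token containing "eni"; % matches any run of characters
          { WILDCARD: "%eni%" }
        ), "text_en")
      RETURN doc.title
    ```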

    The search terms used in object tokens need to be pre-processed manually as shown above with STARTS_WITH.