Term-level queries
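OpenSearch supports two types of queries when you search for data: term-level queries and full-text queries. The following table describes the differences between them: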
OpenSearch uses a probabilistic ranking framework called Okapi BM25 to calculate relevance scores. To learn more about Okapi BM25, see Wikipedia.
Assume that you have the complete works of Shakespeare indexed in an OpenSearch cluster. We use a term-level query to search for the phrase “To be, or not to be” in the text_entry field:
Sample response
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
We don’t get back any matches (hits). This is because the term “To be, or not to be” is searched literally in the inverted index, where only the analyzed values of the text fields are stored. Term-level queries aren’t suited for searching on analyzed text fields because they often yield unexpected results. When working with text data, use term-level queries only for fields mapped as keyword.
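The behavior described above can be modeled in a few lines of Python. This is a minimal sketch, not how OpenSearch is implemented: the analyze function is a hypothetical stand-in for an analyzer (it only lowercases the text and splits it on non-alphanumeric characters), and the inverted index is a plain dictionary.

```python
import re
from collections import defaultdict

def analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]

def build_inverted_index(docs):
    """Map each analyzed token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def term_lookup(index, term):
    """Term-level lookup: the input is not analyzed, so it must equal a stored token."""
    return index.get(term, set())

docs = {"34229": "To be, or not to be: that is the question:"}
index = build_inverted_index(docs)

# The whole phrase was never stored as a single token, so nothing matches.
phrase_hits = term_lookup(index, "To be, or not to be")
# A single analyzed token such as "be" is present in the index.
token_hits = term_lookup(index, "be")
# The capitalized form "To" is absent because tokens were lowercased at index time.
capitalized_hits = term_lookup(index, "To")
```

In this model, the literal phrase finds nothing because the index only ever contains the individual lowercased tokens.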
Using a full-text query:
GET shakespeare/_search
{
  "query": {
    "match": {
      "text_entry": "To be, or not to be"
    }
  }
}
The search query “To be, or not to be” is analyzed and tokenized into an array of tokens just like the text_entry field of the documents. The full-text query performs an intersection of tokens between our search query and the text_entry fields for all the documents, and then sorts the results by relevance scores:
Sample response
{
  "took" : 19,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 17.419369,
    "hits" : [
      {
        "_index" : "shakespeare",
        "_id" : "34229",
        "_score" : 17.419369,
        "_source" : {
          "type" : "line",
          "line_id" : 34230,
          "play_name" : "Hamlet",
          "speech_number" : 19,
          "line_number" : "3.1.64",
          "speaker" : "HAMLET",
          "text_entry" : "To be, or not to be: that is the question:"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "109930",
        "_score" : 14.883024,
        "_source" : {
          "type" : "line",
          "line_id" : 109931,
          "play_name" : "A Winters Tale",
          "speech_number" : 23,
          "line_number" : "4.4.153",
          "speaker" : "PERDITA"
        }
      },
      {
        "_index" : "shakespeare",
        "_id" : "103117",
        "_score" : 14.782743,
        "_source" : {
          "type" : "line",
          "line_id" : 103118,
          "play_name" : "Twelfth Night",
          "speech_number" : 53,
          "line_number" : "1.3.95",
          "speaker" : "SIR ANDREW",
          "text_entry" : "will not be seen; or if she be, its four to one"
        }
      }
    ]
  }
}
...
For a list of all full-text queries, see the full-text query documentation.
If you want to query for an exact term like “HAMLET” in the speaker field and don’t need the results to be sorted by relevance scores, a term-level query is more efficient:
GET shakespeare/_search
{
  "query": {
    "term": {
      "speaker": "HAMLET"
    }
  }
}
Term-level queries are exact matches. If you search for “Hamlet”, you don’t get back any matches, because speaker is a keyword field: its values are stored in OpenSearch literally and not in an analyzed form, and the search query is also matched literally. To get a match on this field, you need to enter exactly the same characters, including case (“HAMLET”).
Term
Use the term query to search for an exact term in a field.
GET shakespeare/_search
{
  "query": {
    "term": {
      "line_id": {
        "value": "61809"
      }
    }
  }
}
Terms
Use the terms query to search for multiple terms in the same field.
GET shakespeare/_search
{
  "query": {
    "terms": {
      "line_id": [
        "61809",
        "61810"
      ]
    }
  }
}
You get back documents that match any of the terms.
IDs
Use the ids query to search for one or more document ID values.
GET shakespeare/_search
{
  "query": {
    "ids": {
      "values": [
        34229,
        91296
      ]
    }
  }
}
Range
Use the range query to search for a range of values in a field.
To search for documents where the line_id value is >= 10 and <= 20:
GET shakespeare/_search
{
  "query": {
    "range": {
      "line_id": {
        "gte": 10,
        "lte": 20
      }
    }
  }
}
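As a rough illustration of how the bounds combine, the following Python sketch evaluates a range clause against a single numeric value. The matches_range helper is hypothetical and exists only for this example. The gte and lte operators are inclusive bounds; gt and lt are their exclusive counterparts.

```python
import operator

# Comparison used for each range operator.
OPERATORS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}

def matches_range(value, bounds):
    """Return True if value satisfies every bound in the range clause."""
    return all(OPERATORS[name](value, limit) for name, limit in bounds.items())

bounds = {"gte": 10, "lte": 20}
line_ids = [9, 10, 15, 20, 21]
# Keep only the values that fall inside the inclusive range 10..20.
matching = [line_id for line_id in line_ids if matches_range(line_id, bounds)]
```

Both endpoints are kept here because gte and lte include the boundary values themselves.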
Assume that you have a products index and you want to find all the products that were added in the year 2019. You can do this with a range query on the date field.
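As a sketch, the request body for such a search can be built as a Python dictionary, with bounds at the first and last days of the year. The created field name and the yyyy/MM/dd date strings are assumptions borrowed from the other examples in this section, and year_range_query is a hypothetical helper, not part of any client library.

```python
import json

def year_range_query(field, year):
    """Build a range query body bounded by January 1 and December 31 of the given year."""
    return {
        "query": {
            "range": {
                field: {
                    "gte": f"{year}/01/01",
                    "lte": f"{year}/12/31",
                }
            }
        }
    }

body = year_range_query("created", 2019)
# Print the JSON that could be sent as the body of GET products/_search.
print(json.dumps(body, indent=2))
```

Whether these date strings are accepted depends on the date format configured in the field mapping.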
Specify relative dates by using basic math expressions.
To subtract 1 year and 1 day from the specified date:
GET products/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "2019/01/01||-1y-1d"
      }
    }
  }
}
The first date that we specify is the anchor date, or the starting point for the date math. Add two trailing pipe symbols (||) after it. You could then add one day (+1d) or subtract two weeks (-2w). This math expression is relative to the anchor date that you specify.
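To make the arithmetic concrete, the following Python sketch resolves an anchored expression such as 2019/01/01||-1y-1d by applying each operation from left to right. It is a simplified model, not the OpenSearch date math parser: resolve_date_math and shift_months are hypothetical helpers, only the y, M, w, and d units are handled, and clamping the day to the length of the target month is an assumption of this sketch.

```python
import re
from datetime import date, timedelta

def shift_months(day, months):
    """Move a date by a number of months, clamping the day to the target month's length."""
    month_index = day.year * 12 + (day.month - 1) + months
    year, month = divmod(month_index, 12)
    month += 1
    # First day of the following month, used to find the last day of the target month.
    first_of_next = date(year + (month == 12), month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return date(year, month, min(day.day, last_day))

def resolve_date_math(expression):
    """Resolve 'yyyy/MM/dd||<ops>' where each op is +N or -N followed by y, M, w, or d."""
    anchor_text, _, math = expression.partition("||")
    year, month, day = (int(part) for part in anchor_text.split("/"))
    result = date(year, month, day)
    # Apply the operations left to right, starting from the anchor date.
    for sign, amount, unit in re.findall(r"([+-])(\d+)([yMwd])", math):
        n = int(amount) if sign == "+" else -int(amount)
        if unit == "y":
            result = shift_months(result, 12 * n)
        elif unit == "M":
            result = shift_months(result, n)
        elif unit == "w":
            result += timedelta(weeks=n)
        else:
            result += timedelta(days=n)
    return result

resolved = resolve_date_math("2019/01/01||-1y-1d")
```

Under these assumptions, subtracting one year from 2019/01/01 gives 2018/01/01, and subtracting one more day gives 2017/12/31.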
To find products added in the last year and rounded off by month:
GET products/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "now-1y/M"
      }
    }
  }
}
The keyword now refers to the current date and time.
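The effect of rounding can be illustrated with a small Python sketch. It assumes a fixed, hypothetical current time and shows how now-1y/M resolves when used as a gte bound, where rounding goes down to the start of the month. The one_year_ago_floor_month helper is made up for this example and handles only this one expression.

```python
from datetime import datetime

def one_year_ago_floor_month(now):
    """Resolve now-1y/M for a gte bound: subtract one year, then round down to the month."""
    # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28 in this sketch.
    try:
        shifted = now.replace(year=now.year - 1)
    except ValueError:
        shifted = now.replace(year=now.year - 1, day=28)
    # Rounding down to the month keeps the year and month and resets everything else.
    return shifted.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

# Hypothetical "current" time, fixed so that the result is reproducible.
now = datetime(2020, 6, 17, 14, 30, 0)
lower_bound = one_year_ago_floor_month(now)
```

With this sample time, the lower bound becomes the first instant of June 2019, so documents from the whole of that month are included.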
Prefix
Use the prefix query to search for terms that begin with a specific prefix.
GET shakespeare/_search
{
  "query": {
    "prefix": {
      "speaker": "KING"
    }
  }
}
Exists
Use the exists query to search for documents that contain a specific field.
GET shakespeare/_search
{
  "query": {
    "exists": {
      "field": "speaker"
    }
  }
}
Wildcards
Use wildcard queries to search for terms that match a wildcard pattern.
To search for terms that start with H and end with Y, use the pattern H*Y, where * matches zero or more characters. If we change * to ?, we get no matches, because ? refers to a single character.
Wildcard queries tend to be slow because they need to iterate over a lot of terms. Avoid placing wildcard characters at the beginning of a query because it could be a very expensive operation in terms of both resources and time.
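The difference between * and ? can be checked with a short Python sketch that translates a wildcard pattern into a regular expression. This models the matching rule only, not how OpenSearch executes the query. The request body, the speaker field in it, and the sample terms are assumptions made for illustration.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern: * matches any run of characters, ? exactly one."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")
        elif char == "?":
            parts.append(".")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))

def wildcard_match(pattern, term):
    """The pattern must cover the whole term, not just part of it."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

# Hypothetical request body; the speaker field is an assumption for illustration.
query = {"query": {"wildcard": {"speaker": {"value": "H*Y"}}}}

# Hypothetical sample terms.
terms = ["HENRY", "HAMLET"]
star_matches = [term for term in terms if wildcard_match("H*Y", term)]
question_matches = [term for term in terms if wildcard_match("H?Y", term)]
```

With these sample terms, H*Y matches a term of any length that starts with H and ends with Y, while H?Y would need a term that is exactly three characters long.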
Regex
Use the regexp query to search for terms that match a regular expression.
GET shakespeare/_search
{
  "query": {
    "regexp": {
      "play_name": "[a-zA-Z]amlet"
    }
  }
}
A few important notes:
- Regular expressions are applied to the terms in the field (i.e. tokens), not the entire field.
- Regular expressions use the Lucene syntax, which differs from more standardized implementations. Test thoroughly to ensure that you receive the results you expect. To learn more, see the Lucene documentation.
- regexp queries can be expensive operations and require the search.allow_expensive_queries setting to be set to true. Before making frequent queries, test their impact on cluster performance and examine alternative queries for achieving similar results.
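To illustrate the first note, the sketch below uses Python's re module as a rough stand-in for Lucene regular expressions. This is an approximation only: the two syntaxes differ, and the sample terms are hypothetical. What it shows is that the pattern has to match a whole term, which re.fullmatch mimics.

```python
import re

def regexp_match(pattern, term):
    """Approximate a regexp query on one term: the pattern must match the entire term."""
    return re.fullmatch(pattern, term) is not None

# Hypothetical terms, as they might be stored for a field.
terms = ["Hamlet", "hamlet", "Hamlets", "amlet"]
matching = [term for term in terms if regexp_match("[a-zA-Z]amlet", term)]
```

Here "Hamlets" is rejected because of the trailing character, and "amlet" is rejected because the character class requires one leading letter.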