Regex Query for TEXT_MATCH is not working for input words with a space #14983

rahulkrishna-dev · 2025-02-04T06:38:14Z

Query being executed:

SELECT 
  keyword
FROM 
  keywords_rank_v1
WHERE 
  TEXT_MATCH(keyword, '/.*amul milk.*/')
ORDER BY rank ASC
LIMIT 100

The query above is expected to return every keyword containing "amul milk", but it returns nothing. My hypothesis is that the regex is being treated as a term query because of the space character. I couldn't find a way to escape the space. Please help!


itschrispeck · 2025-02-06T00:48:34Z

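Lucene's StandardAnalyzer (used by default) splits the input on spaces and discards them when tokenizing, so "amul milk" is indexed as the two separate tokens "amul" and "milk". A regex is only ever matched against a single token, so a pattern that contains a space cannot match anything.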

You can set luceneAnalyzerClass to use a different Analyzer that retains spaces, or use span queries if substring search is the only regex pattern you need (see: https://docs.pinot.apache.org/basics/indexing/text-search-support#phrase-search-with-wildcard-term-matching)
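
To make the tokenization point concrete, here is a small Python sketch that imitates the behavior described above. It is not Lucene or Pinot code: the `tokenize`, `regex_matches_any_token`, and `phrase_matches` helpers are hypothetical names, and the tokenizer is only a rough stand-in for StandardAnalyzer (it lowercases and splits on non-alphanumeric characters). It shows why a regex containing a space finds nothing, while a phrase-style match over adjacent tokens does.

```python
import re


def tokenize(text):
    # Rough approximation of a whitespace-dropping analyzer (assumption):
    # lowercase, then split on anything that is not a letter or digit.
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def regex_matches_any_token(pattern, text):
    # The regex must match one whole token; it never sees the
    # original string, so it never sees a space.
    rx = re.compile(pattern)
    return any(rx.fullmatch(tok) for tok in tokenize(text))


def phrase_matches(phrase, text):
    # Phrase-style matching: the phrase tokens must appear as
    # consecutive tokens of the document.
    want = tokenize(phrase)
    have = tokenize(text)
    if not want:
        return False
    n = len(want)
    return any(have[i:i + n] == want for i in range(len(have) - n + 1))


doc = "Amul Milk 500ml"
print(tokenize(doc))                                   # ['amul', 'milk', '500ml']
print(regex_matches_any_token(r".*amul milk.*", doc))  # False
print(regex_matches_any_token(r".*amu.*", doc))        # True
print(phrase_matches("amul milk", doc))                # True
```

In this model the pattern with a space fails for the reason given above: no single token ever contains a space. Matching across token boundaries needs either an analyzer that keeps the whole value as one token or a query type that works on token sequences.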
