Regex Query for TEXT_MATCH is not working for input words with a space #14983

Open
rahulkrishna-dev opened this issue Feb 4, 2025 · 1 comment

Comments

@rahulkrishna-dev

Query being executed:

SELECT
  keyword
FROM
  keywords_rank_v1
WHERE
  TEXT_MATCH(keyword, '/.*amul milk.*/')
ORDER BY rank ASC
LIMIT 100

The query above should return every keyword containing "amul milk", but it returns nothing. My hypothesis is that the regex is being treated as a term query because of the space character. I couldn't find a way to escape the space. Please help!

@itschrispeck
Collaborator

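Lucene's StandardAnalyzer (used by default) tokenizes the input by splitting on spaces and discarding them, so "amul milk" is indexed as two separate tokens. A regex is matched against a single token at a time, so a pattern that contains a space can never match.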
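
To make the mechanism concrete, here is a minimal Python sketch of the behavior described above. It is not Lucene code: `tokenize` is a hypothetical, simplified stand-in for an analyzer (lowercase, then split on whitespace), and `regex_matches_any_token` is a hypothetical helper that applies the pattern to each token as a whole, which is the assumption being illustrated.

```python
import re


def tokenize(text):
    # Hypothetical, simplified stand-in for an analyzer:
    # lowercase the input and split on whitespace, dropping the spaces.
    return text.lower().split()


def regex_matches_any_token(pattern, text):
    # The pattern is tested against each token separately and must
    # match the whole token, never the original untokenized string.
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(token) for token in tokenize(text))


keyword = "Amul Milk 500ml"

# No single token contains a space, so this pattern cannot match.
spaced = regex_matches_any_token(r".*amul milk.*", keyword)

# A pattern confined to one token does match.
single = regex_matches_any_token(r".*amul.*", keyword)

print(tokenize(keyword), spaced, single)
```

Under these assumptions the spaced pattern fails and the single-token pattern succeeds, which mirrors the empty result from the query in the issue.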

You can set luceneAnalyzerClass to use a different Analyzer that retains spaces, or use span queries if substring search is the only regex pattern you need (see: https://docs.pinot.apache.org/basics/indexing/text-search-support#phrase-search-with-wildcard-term-matching)
