Fix single double quote breaking solr query #10405

@tomrod10 tomrod10 commented Jan 30, 2025

Closes #10390

Searching with an unmatched double quote (") makes the Solr query invalid.

Line 484 of ~/openlibrary/plugins/worksearch/schemes/works.py appears to escape a " that has already been escaped, turning the string
'Compilation Group for the \\"History of Modern China' into 'Compilation Group for the \\\\"History of Modern China'.

I edited the code on line 484 to escape " only if \\" is not already in the ed_q string. If ed_q is falsy, the kwarg v is set to *:*.
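
The double escaping described above can be reproduced in isolation. This is a minimal sketch, not the code in works.py: the helper names `escape_quotes` and `escape_quotes_once` are hypothetical, and the guard only mirrors the approach as worded in this description. After the second pass the quote is preceded by two backslashes, which Solr would most likely read as an escaped backslash followed by a bare, unmatched quote.

```python
def escape_quotes(q: str) -> str:
    """Naively backslash-escape every double quote (hypothetical helper)."""
    return q.replace('"', '\\"')


def escape_quotes_once(q: str) -> str:
    """Skip escaping when the string already contains an escaped quote.

    Assumption: this mirrors the guard described in the PR text, not the
    actual line 484 of works.py.
    """
    return q if '\\"' in q else escape_quotes(q)


raw = 'Compilation Group for the "History of Modern China'
once = escape_quotes(raw)           # one backslash before the quote
twice = escape_quotes(once)         # two backslashes before the quote
guarded = escape_quotes_once(once)  # already escaped, so left unchanged
```

Here `once` and `twice` correspond to the two strings quoted above, and `guarded` stays equal to `once`.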

Technical

Testing

Added tests:

  • ~/openlibrary/plugins/worksearch/schemes/tests/test_works.py

Manual testing:

  • Searched words and phrases with a single or an odd count of "
  • Checked that the query results in a 200 OK response

Screenshot

Stakeholders

@cdrini

@tomrod10 tomrod10 changed the title 10390/fix/sanitize single double quote in solr query [WIP] 10390/fix/sanitize single double quote in solr query Jan 30, 2025
@tomrod10

The following tests are failing:

  • openlibrary/plugins/worksearch/schemes/tests/test_works.py::test_process_user_query[Spaces after fields]
  • openlibrary/plugins/worksearch/schemes/tests/test_works.py::test_process_user_query[Quotes]
  • openlibrary/plugins/worksearch/schemes/tests/test_works.py::test_process_user_query[LCC: quotes preserved]

@tomrod10

tomrod10 commented Jan 30, 2025

Probably will move my test to tests/test_works.py and delete tests/test_SearchScheme.py

@tomrod10 tomrod10 force-pushed the 10390/fix/sanitize-single-double-quote-in-solr-query branch 7 times, most recently from b4c2ca5 to 8ea478d Compare February 7, 2025 02:36
@tomrod10 tomrod10 force-pushed the 10390/fix/sanitize-single-double-quote-in-solr-query branch from da168bc to cee2429 Compare February 14, 2025 04:05
@tomrod10 tomrod10 force-pushed the 10390/fix/sanitize-single-double-quote-in-solr-query branch from be4c806 to 477bae6 Compare February 14, 2025 04:12
@tomrod10 tomrod10 marked this pull request as ready for review February 14, 2025 04:12
@tomrod10 tomrod10 changed the title [WIP] 10390/fix/sanitize single double quote in solr query #10390/fix/single double quote breaking solr query Feb 14, 2025

@cdrini cdrini left a comment

I spent some time researching/debugging this one, and I think I found a better solution.

Escaping the " to embed it in the larger edismax query is potentially a bit of a security risk: if we don't escape it correctly, a user could effectively run any complicated query on our Solr!

We can avoid the escaping entirely by taking advantage of a Solr feature that lets us put edismax parameters in separate URL params. E.g.

?q={!edismax v=$someParam}
&someParam=users raw "query
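
On the Python side, the same idea might look roughly like the sketch below. It is only an illustration: `build_solr_params` is a hypothetical helper, the parameter name `someParam` is taken from the example above, and the `*:*` fallback for an empty query is an assumption based on the PR description. The user's text never gets spliced into the `{!edismax ...}` string, so only ordinary URL encoding is needed.

```python
from urllib.parse import parse_qs, urlencode


def build_solr_params(user_query: str) -> dict[str, str]:
    """Keep the raw user query in its own URL parameter (hypothetical helper)."""
    return {
        # The edismax local params reference the user query via $someParam.
        "q": "{!edismax v=$someParam}",
        # Assumption: fall back to match-all when the query is empty.
        "someParam": user_query or "*:*",
    }


params = build_solr_params('users raw "query')
query_string = urlencode(params)  # the lone quote is percent-encoded as %22
round_trip = parse_qs(query_string)
```

Decoding `query_string` with `parse_qs` gives back the original text, unmatched quote included, without any Solr-level escaping having been applied.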

Testing this out on testing now...

@cdrini cdrini changed the title #10390/fix/single double quote breaking solr query FIx single double quote breaking solr query Feb 21, 2025
@cdrini cdrini changed the title FIx single double quote breaking solr query Fix single double quote breaking solr query Feb 21, 2025

@cdrini cdrini left a comment

Hit one issue with editions.q having a different scope for the template query parameters (?!?!), but it appears to be working now!

Great work on finding the root cause of this @tomrod10 !

@cdrini cdrini merged commit 427f1f4 into internetarchive:master Feb 21, 2025
4 checks passed
Successfully merging this pull request may close these issues.

Solr searches fail when query contains an unclosed " character