BookWorm: allow imports via BookWorm again #10313

Merged

Conversation

scottbarnes
Collaborator

@scottbarnes scottbarnes commented Jan 10, 2025

Closes #10230, #10316
Related: internetarchive/olsystem#253

This commit reenables /isbn on BookWorm and by extension openlibrary.org.

Note: this changes the signature of vendors.AmazonAPI() to add a proxy_url parameter for passing in an HTTP Proxy URL.

BookWorm passes it by default, but anyone using AmazonAPI() in the REPL on ol-home0 will need to pass proxy_url explicitly when instantiating it.
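
As a rough illustration of the signature change, here is a minimal sketch of a client that takes an optional proxy URL. `AmazonAPISketch`, its `key`/`secret`/`tag` arguments, and the `proxies()` helper are hypothetical stand-ins for illustration only, not the actual `vendors.AmazonAPI()` code; only the `proxy_url` parameter name comes from this PR. The returned mapping follows the `proxies` dict shape that `requests` accepts.

```python
from typing import Dict, Optional


class AmazonAPISketch:
    """Hypothetical stand-in for illustration; NOT the real vendors.AmazonAPI."""

    def __init__(
        self,
        key: str,
        secret: str,
        tag: str,
        proxy_url: Optional[str] = None,  # parameter name taken from the PR text
    ) -> None:
        self.key = key
        self.secret = secret
        self.tag = tag
        self.proxy_url = proxy_url

    def proxies(self) -> Dict[str, str]:
        """Return a requests-style proxies mapping, or {} when no proxy is set."""
        if not self.proxy_url:
            return {}
        return {"http": self.proxy_url, "https": self.proxy_url}


# REPL-style usage: the proxy URL has to be supplied by hand here.
# "http://proxy.example:3128" is a placeholder, not a real proxy address.
api = AmazonAPISketch("key", "secret", "tag", proxy_url="http://proxy.example:3128")
print(api.proxies())
```

Keeping `proxy_url` optional with a `None` default means callers that do not need a proxy are unaffected, which is the assumption this sketch makes about the design.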

I checked 1000 staged items from Amazon and every one had the cover hosted at m.media-amazon.com. I had fewer Google Books items to check, but they seem to all have covers coming from books.google.com.

Testing

The BookWorm /isbn part of this was tested by visiting https://openlibrary.org/isbn/6610742898 with the relevant part of the code patch deployed to BookWorm.

That imported https://openlibrary.org/books/OL57448379M/Environmental_Enrichment_for_Captive_Animals without a cover, as the Amazon record has no 'cover' field.

The HTTP Proxy having been updated, the following was imported (with cover) from /isbn/<isbn>: https://testing.openlibrary.org/books/OL57494524M/Postcolonial_Hauntings.

Possible follow-up: numerous ISBNs will have failed to import over the past few months when tried via /isbn. They will not import now because they have been marked as failed. It may be desirable to fix this in the database; a staff member will need to do it.

Screenshot

Stakeholders

@mekarpeles

@github-actions github-actions bot added the Priority: 2 Important, as time permits. [managed] label Jan 10, 2025
@scottbarnes scottbarnes marked this pull request as draft January 10, 2025 21:20
@mekarpeles
Member

Product Notes:

Later, another thing we could do is see if a contributor can:

  • architect it so if someone enters a url, the image is fetched first client-side using js and then those bytes get uploaded to the server if we think it's an image. 🤷
  • and search for covers when the manager cover modal opens -- e.g. look on amazon, bwb.

@scottbarnes scottbarnes force-pushed the add-bookworm-proxy-support-redux branch 2 times, most recently from e1cfcf0 to 2fdb6f1 Compare January 12, 2025 01:20
@scottbarnes scottbarnes force-pushed the add-bookworm-proxy-support-redux branch 2 times, most recently from 5012940 to c07c819 Compare January 12, 2025 01:34
@scottbarnes scottbarnes marked this pull request as ready for review January 13, 2025 20:39
@mekarpeles mekarpeles marked this pull request as draft January 13, 2025 20:59
@scottbarnes scottbarnes force-pushed the add-bookworm-proxy-support-redux branch from 36eae72 to ad75ed4 Compare January 13, 2025 22:57
Scott Barnes added 11 commits January 14, 2025 19:40
This commit reenables `/isbn` on both BookWorm and by extension
openlibrary.org.

Note: this changes the signature of `vendors.AmazonAPI()` to add a
`proxy_url` parameter for passing in an HTTP Proxy URL.
Move setup_requests to a place less likely to cause circular imports.
`add_book/__init__()` calls `add_cover()`, which ultimately uses
`requests` to fetch covers from, e.g., Amazon.

This commit adds support for using the proxy during the `add_book`
process.
create_edition_from_amazon_metadata does not appear to be imported for
its side effects.
This simply moves some existing code to another section and is meant to
make code review easier, so moving code and substantive changes aren't
both in the same commit.
Because of the HTTP Proxy requirements, only known cover hosts are used
when importing via `load()` (e.g. from `/isbn` or via `/api/import`).

`'cover'` values not found in `ALLOWED_COVER_HOSTS` are simply ignored
during import to prevent timeouts when trying to connect to hosts not
added to the proxy.
Allow cover fetching from books.google.com
This will need an update to `olsystem` to `NO_PROXY` `.openlibrary.org`.
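
The cover-host restriction described in the commit messages above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `ALLOWED_COVER_HOSTS` is the name used in the commit message, but its contents here are guessed from the two hosts named in the PR description, and `filter_cover_url` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed contents: the two cover hosts mentioned in the PR description.
ALLOWED_COVER_HOSTS = frozenset({"m.media-amazon.com", "books.google.com"})


def filter_cover_url(cover_url: Optional[str]) -> Optional[str]:
    """Return cover_url if its host is allowed; otherwise None (cover ignored)."""
    if not cover_url:
        return None
    # .hostname is lowercased by urlparse and is None when the URL has no host.
    host = urlparse(cover_url).hostname
    return cover_url if host in ALLOWED_COVER_HOSTS else None


# A record whose cover lives on an unknown host keeps importing, minus the cover.
record = {"title": "Example", "cover": "https://covers.example.org/1.jpg"}
record["cover"] = filter_cover_url(record.get("cover"))
print(record)
```

Dropping the value instead of raising keeps the import itself from failing, which matches the stated goal of avoiding timeouts on hosts that have not been added to the proxy.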
@scottbarnes scottbarnes force-pushed the add-bookworm-proxy-support-redux branch from ad75ed4 to 3c84e06 Compare January 15, 2025 03:45
Member

@mekarpeles mekarpeles left a comment

lgtm pending adding to coverstore

@scottbarnes scottbarnes marked this pull request as ready for review January 21, 2025 05:25
@scottbarnes scottbarnes merged commit 5b4a7fe into internetarchive:master Jan 21, 2025
4 checks passed
@scottbarnes scottbarnes deleted the add-bookworm-proxy-support-redux branch January 21, 2025 15:28
Development

Successfully merging this pull request may close these issues.

Permit webservices.amazon.com via 44.215.138.164 to avoid blockage
2 participants