
Missing OCR text for many images #66

Open
danvk opened this issue May 1, 2015 · 3 comments

@danvk
Owner

danvk commented May 1, 2015

A few examples:

  • 702198b — text on brown backing paper
  • 706410b — brown backing with text
  • 709457b — grayscale with text but no OCR
  • 729236b — grayscale with text but no OCR
  • 711642b — Missing text from color image
  • 711564b — Missing text from color image
  • 716490b — Missing text from color image
  • 731966b — Missing text from gray image (why?)
  • 703429b — Missing text from color image
@danvk danvk added the OCR label May 1, 2015
@danvk
Owner Author

danvk commented May 1, 2015

Based on my survey, ~20% of images have text on the back that was not OCR'd.

@danvk
Owner Author

danvk commented May 1, 2015

In that list, 8 of the 9 were missing from the NYPL's S3 bucket. The ninth, 731966b, was actually an image of the front.

@riordan There were 30,413 back-of-the-card images in the S3 bucket, but ~43,000 photos in the CSV file that Matt originally sent me. Is there any chance we could recover more of them?

@riordan
Collaborator

riordan commented May 1, 2015

I'll try.
