Resolve missing docid #116

Open
JasonLo opened this issue Mar 28, 2024 · 1 comment

@JasonLo
Collaborator

JasonLo commented Mar 28, 2024

During our routine weekly data ingestion, we ran into an unusual issue with a subset of document identifiers (docids). We found 3659 docids that can be retrieved successfully through the xDD API endpoint, but looking up the same docids directly in Elasticsearch returns a 404 (document not found).
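For anyone who wants to reproduce the comparison, here is a minimal sketch of the Elasticsearch side of the check. It is not the code from the notebook. It assumes the documents are stored under the docid as their `_id`, and `es_url` and `index` are hypothetical placeholders for whatever the real instance uses. It relies on Elasticsearch's document-exists request (`HEAD /<index>/_doc/<id>`), which answers 200 when the document is present and 404 when it is not.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def es_has_doc(es_url: str, index: str, docid: str) -> bool:
    """Return True if Elasticsearch holds a document with this _id.

    es_url and index are placeholders (assumptions), not the real xDD settings.
    """
    request = urllib.request.Request(f"{es_url}/{index}/_doc/{docid}", method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:  # document not found
            return False
        raise  # any other status is a real error, not a "missing" result


def missing_from_es(docids: Iterable[str], in_es: Callable[[str], bool]) -> list[str]:
    """Keep only the docids for which the lookup function reports 'not found'."""
    return [docid for docid in docids if not in_es(docid)]
```

The lookup is passed in as a function so the filtering can be tried without a live cluster, for example `missing_from_es(docids, lambda d: es_has_doc(es_url, index, d))` with the real values filled in.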

@iross, can you take a look? On COSMOS1:
/hdd/clo36/repo/ask-xDD/notebooks/housekeeping/docids_404.ipynb

@iross
Collaborator

iross commented Apr 4, 2024

My gut was a bit wrong here... On Monday I'd said I suspected this was caused by desyncs from duplication clean-ups, but it's really just that the ES7 instance has fallen behind the older instance.

The docids encode when they were added to xDD, so looking at /hdd/clo36/repo/ask-xDD/tmp/docids_404.txt made it clear that they're all recent (except that first one... which remains a small mystery). Looking here, it's clear that nothing new has been added to the newer ES instance since mid-February. At that time, I'd been working on transitioning everything over so that ES7 and the Kubernetes-backed MongoDB were the source of truth, but I paused that transition to stay stable through the ASKEM hackathon and never picked it back up :( .

Next week I'm hoping to finish up that cutover and make that instance the default everywhere because maintaining two separate instances is a recipe for endless issues like this.

(EDIT: Whoops, just noticed that it was an unsorted list. What I said holds true for most docids in that list. ~85 appear to be old enough that I would have expected them to exist in ES7, so some digging is still required.)
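A rough sketch of how the list can be sorted and split by age. It assumes the docids follow the MongoDB ObjectId layout (24 hex characters, where the first 8 are a Unix timestamp in seconds); the thread only says that docids encode when they were added, so treat that layout as an assumption. The function names and the cutoff date are illustrative too.

```python
from datetime import datetime, timezone


def docid_timestamp(docid: str) -> datetime:
    """Decode the creation time, ASSUMING an ObjectId-style docid."""
    if len(docid) != 24:
        raise ValueError(f"expected 24 hex characters, got {len(docid)}: {docid!r}")
    seconds = int(docid[:8], 16)  # leading 4 bytes = seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def split_by_cutoff(docids, cutoff):
    """Sort docids by decoded time and split them into (older, newer) lists."""
    older, newer = [], []
    for docid in sorted(docids, key=docid_timestamp):
        if docid_timestamp(docid) < cutoff:
            older.append(docid)
        else:
            newer.append(docid)
    return older, newer


# Hypothetical cutoff standing in for "mid-February"; the exact date is not known.
CUTOFF = datetime(2024, 2, 15, tzinfo=timezone.utc)
```

Running `split_by_cutoff` over the lines of docids_404.txt with a cutoff like `CUTOFF` would put the docids that predate the stall in `older`, which are the ones that still need digging.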
