Error while loading the docx (KeyError: "There is no item named 'word/#_top' in the archive") #1351
Comments
Sounds like a corrupted docx file. Maybe open it with Word or LibreOffice and save it under a new name so it rewrites the file.
Hello, thanks for your response. I tried saving it under a new name but still got the same error. I revalidated the document and it is not corrupted.
@akash97715 if you can send the file I'll take a look at it. Otherwise I just don't have enough to go on. I've never seen this error before and I've been at it for over a decade, so this is something of an edge case. Do you know the provenance of the document? Was it generated by some package rather than being authored using Word or LibreOffice?
@scanny I am facing the same issue. I tried renaming the file and saving it again but I still get the error. I am able to open the file correctly in MS Word.
I also encountered the same problem. Unfortunately I couldn't find the cause, but as a workaround I read the file with the pypandoc library and saved it again, after which it loaded normally, although this loses file metadata that I need.
The fix below was inspired by other issues in this repo (here and here). You just have to add the code snippet below before loading the document. The error seems to be due to styled headers or headers with bookmarks. @scanny, I would be interested to know if that makes sense to you. In my use case, the header name was still present in the final list after the change.

```python
from docx.opc.oxml import parse_xml
from docx.opc.pkgreader import _SerializedRelationship, _SerializedRelationships


def load_from_xml_v2(baseURI, rels_item_xml):
    """
    Return |_SerializedRelationships| instance loaded with the
    relationships contained in *rels_item_xml*. Returns an empty
    collection if *rels_item_xml* is |None|.
    """
    srels = _SerializedRelationships()
    if rels_item_xml is not None:
        rels_elm = parse_xml(rels_item_xml)
        for rel_elm in rels_elm.Relationship_lst:
            # Debug output: shows each relationship target as it is read.
            print(rel_elm.target_ref)
            # Skip targets that do not correspond to a part in the archive.
            if (
                rel_elm.target_ref in ("../NULL", "NULL")
                or rel_elm.target_ref.startswith("#_")  # Styled headers
            ):
                continue
            srels._srels.append(_SerializedRelationship(baseURI, rel_elm))
    return srels


# Monkeypatch python-docx so the package reader uses the tolerant loader.
_SerializedRelationships.load_from_xml = load_from_xml_v2
```
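To check whether a given file is affected before patching anything, a minimal diagnostic sketch along the following lines may help. It assumes the error is triggered by a relationship whose `Target` is an in-document anchor such as `#_top` and which is not marked `TargetMode="External"`, so that the target ends up being looked up as a part in the archive. The function name `find_fragment_relationships` is hypothetical and not part of python-docx; only the standard library is used.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship (.rels) parts.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def find_fragment_relationships(docx_file):
    """Return (rels_part, rId, target) tuples for non-external
    relationships whose target starts with '#'.

    *docx_file* may be a path or a binary file-like object.
    """
    suspects = []
    with zipfile.ZipFile(docx_file) as archive:
        for name in archive.namelist():
            # Relationship parts live in files ending with ".rels".
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(archive.read(name))
            for rel in root.iter(RELS_NS + "Relationship"):
                target = rel.get("Target", "")
                external = rel.get("TargetMode") == "External"
                if target.startswith("#") and not external:
                    suspects.append((name, rel.get("Id"), target))
    return suspects


# Example call (the path is a placeholder):
# for part, rid, target in find_fragment_relationships("problem.docx"):
#     print(f"{part}: {rid} -> {target}")
```

If this returns entries such as a `#_top` target in `word/_rels/document.xml.rels`, the file likely matches the case that the monkeypatch above skips.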
This is to work around an issue with loading relationships from XML. See python-openxml/python-docx#1351
Hello team, we are using the code below to load the document:
We get the error below:
Please let me know if I am doing anything wrong. It would also be helpful if you could suggest how to resolve this issue.