Skip to content

Polish&refactor code for lxml and etree gramplets #707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 26 commits into
base: maintenance/gramps60
Choose a base branch
from

Conversation

romjeromealt
Copy link
Contributor

see also #700

@romjeromealt romjeromealt marked this pull request as ready for review April 11, 2025 18:50
@giotodibondone
Copy link

giotodibondone commented Apr 11, 2025

etree works 👍
2025-04-12 09_16_06-Example Family Tree - Dashboard - Gramps

lxml has an error ( parsing error ) probably related to that missing library on the windows aio ( xmllint ) ?
2025-04-12 09_17_13-Example Family Tree - Dashboard - Gramps

2025-04-12 09_15_28-Example Family Tree - Dashboard - Gramps

2025-04-12 09_17_42-Example Family Tree - Dashboard - Gramps

Apart from that maybe an English phrase can be improved for the etree gramplet that shows "XML: Last 5 editions since 09 March 2018; were on/at:" by the word "editions" I believe you mean additions which means to add something/change etc as editions is related to books , but I understand what you mean "Edit" + adding :) and the display output aligned a bit better?

Also update your attribution to yourself to 2025 as needed eg "# Copyright (C) 2012-2025 Jerome Rapinat"

@romjeromealt
Copy link
Contributor Author

romjeromealt commented Apr 12, 2025

@giotodibondone
Thanks!
I guess that the screenshot for etree gramplet is the old one? There was a tweak improvement since your previous post with this screenshot. Pseudo-count has 6 characters on last version. So, the separator "|" between data from XML and your Family Tree should be on the middle of the lines (or close to the same location after x number of characters).

Capture d’écran de 2025-04-12 10-27-40
... you can also run it without Family Tree loaded...

"XML: Last 5 editions since 09 March 2018; were on/at:"

Yes, there is also maybe a possible improvement as "on/at" is not very elegant.
Did you try to change the default number (5) to greater or smaller values? There is currently only one option on the config dialog for this gramplet. But, as python 3.8 now provides a XPATH 1.0 support for ElementTree built-in module, it should be possible to play with an XPATH selector-like or pretty print feature. Custom XPATH elements and attributes are listed on separated files into this addons-sources repository. A quick look at github repositories give an overview of advanced work (e.g., https://elementpath.readthedocs.io/en/latest/introduction.html )

I supposed that there was a fallback (skip if xmllint tools failed), but there is no real failure as your system was able to print something. I guess it is a Windows OS message or a sandbox stuff on python or AIO environment. There is an alternative by using lxml validation for DTD and rng. This might generate the same type of issues as for XSD. Also, the idea was to test multiple ways (xmllint is an alternative to validations via lxml only), but related to libxml2 and libxslt (the C stuff).

About additions, this might be also confusing because modifications are included too. So, the idea behind the sentence is rather : "Edit" + (adding or modify) :)
Edition was the simplest word for me, cause of actions via Editors. Once timestamp on records is recent (close to Today, 2025-04-12 : 1744380000), we can easily compute and play with numbers. We could also display the period (one day ago, one week ago, 6 months ago, etc.), provide a timeline or ASCII art graph, etc. ;-)

@PQYPLZXHGF
Copy link

Haven't tried the addon but thank you for it 👍

Edition was the simplest word for me, cause of actions via Editors

Edition is not really a word used in English for that purpose it just looks and reads wrong to an English speaker, I suggest as mentioned on stackoverflow to use the words like editing/edit(s) or revising/revision(s) so maybe just Edits in this case?

The etree gramplet screenshot also shows "XML: Number of records and relations: " what do you mean by "relations" as it also does not sound quite correct in English?

@romjeromealt
Copy link
Contributor Author

romjeromealt commented Apr 13, 2025

Edition is not really a word used in English for that purpose it just looks and reads wrong to an English speaker, I suggest as mentioned on stackoverflow to use the words like editing/edit(s) or revising/revision(s) so maybe just Edits in this case?

oh, ok, my bad as Edition sounds right in french (in fact Édition will be correct in french). Thanks.
The sentence has been fixed, replaced by something like:
Last {n} additions and modifications since {date}:

The etree gramplet screenshot also shows "XML: Number of records and relations: " what do you mean by "relations" as it also does not sound quite correct in English?

Ah, yes, relations in this context means "relations between records".
The gramplet will list primary objects, then will only provide the number of matching tags. e.g., events, persons, citations, notes, etc. The timestamp is also limited to primary object.

To go further, need to look at Database Difference Report or Import Merge Tool. I suppose that it should be possible to play with the diff.py (see
https://github.com/gramps-project/gramps/blob/maintenance/gramps60/gramps/gen/merge/diff.py ), but I do not want to go too far with this gramplet.

So, the additional records and relations is the total of relations between records without the top primary objects. As some records can be shared, we have more "relations" than records.
e.g., Person A <-> event 1 <-> Person B will generate at least 4 relations for 3 top primary objects (records).

Into the gramplet, the counters will list all matching "active sequences". A pseudo snapshot or skeleton of the Gramps XML file according to data detected.

It is like making a puzzle, you can group pieces by type (the sky, a tree, a house). All these groupes are related but you need to know the location (position or coordinates). So, the idea behind the word relation is to have a common word for illustrating the {locations, the type of records and the record itself}. Just a workaround for generating tuple (or json) on a flat raw Gramps XML file.

An other sample, with a photo after a marriage. You can see most (all?) people, you know the date and the place of the event, or the repository/source/citation/note. During this marriage event, there is multiple families, with many persons having different dates of birth and event places during their lifes. So, to make a large global timeline with multiple places, sources and date from a starting point like this marriage event, will be more than a challenge : this could be painfull. Now, with a basic model (format a marriage with only two persons), just list any minor variations (add parents, uncles and aunts), some families with or without children. The model will grow but you can still easily manage it. A basic set of lists, like counters on Primary objects. Now, the relations will be the connexions between all these sets of data. Could be secondary indices into a DB environment, but they are all very flexibles with many ways and "micro-reference" with sometimes simple switch or shared records "toghether on this marriage event" (eg., Persons into a Family, associations or Event with roles).

So, as records are all more and less related, I used the word "Relations". If need, you can use and replace by the word "inter-links", instead.

How to list them? Well, this should be "easy" via a simple print() statement, but does it make sense to generate a large list of more than 42 600 {tags, records, relations, paths/links} with example.gramps?

from xml.etree import ElementTree
tree = ElementTree.parse(Path(filename))
root = tree.getroot()
for one in root:
    for two in one.iter():
        print(two.tag, two.items())

Maybe if you can find the same type of global counter (total of tags and paths/links) on the active Family Tree, any minor difference between the content of a Gramps XML and the active database will be quickly displayed. As you can see with example.gramps, there is less than 10 000 records set as top Primary Objects, but there is more than 10 000 records missing from the total... I guess it is either related to shared objects or children records will be ignored once the parent tag has already matching. Anyway, the total counting will be the same whatever method (improved or not) for the files passed through this gramplet. A basic comparison stuff as POC or draft idea.

@GaryGriffin
Copy link
Member

Tried to run PR on gramps60 on Mac.

  1. Mac comes bundled with Python which does not include libxml, so cannot run lxml. Got the ErrorDialog as expected.

  2. There is no RUN button, so cannot execute it.
    In Dashboard:

Screenshot 2025-04-14 at 1 56 19 PM

Detached:
Screenshot 2025-04-14 at 1 57 08 PM

  1. Got warning msgs:
gramps60/plugins/lxml/etreeGramplet.py:115: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?
  if os.name is 'nt':
gramps60/plugins/lxml/etreeGramplet.py:133: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?
  if os.name is 'nt':
gramps60/plugins/lxml/etreeGramplet.py:221: SyntaxWarning: "is not" with 'str' literal. Did you mean "!="?
  if self.__file_name is not "":
gramps60/plugins/lxml/lxmlGramplet.py:148: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?%

@romjeromealt
Copy link
Contributor Author

romjeromealt commented Apr 15, 2025

@GaryGriffin
Thanks!
Yes, I was not certain if "is not" was working there, as the use of "is not" or "!=" has only minor difference for very specific cases.

>>> () is ()
True
>>> 1 is 1
True
>>> (1,) == (1,)
True
>>> (1,) is (1,)
False
>>> a = (1,)
>>> b = a
>>> a is b
True

https://stackoverflow.com/questions/2209755/python-operation-vs-is-not

As it was working on Windows OS ('nt'), I still wonder why your version will complain. Anyway, you get the icons issue too. So, need to also add the check for 'mac' os. How do you get the SyntaxWarning and tips? Is it via the debug flag or your python version? I suppose that custom Gtk icons stock will be displayed under Gnome environment. So, it might also failed under KDE, xfce desktops (linux, BSD and so)...

A little bit strange as python 3.6 (and maybe python 3.11/3.12 for Windows AIO) can make these 'is/is not' tests. I will revert the refactoring and polish on these sections. Thanks!

`darwin` plateform and `nt` OS cannot use custom set of icons
This might also failed under KDE or xfce desktop manager under linux?
@GaryGriffin
Copy link
Member

GaryGriffin commented Apr 15, 2025

Uploaded the latest PR. When running etree on Mac, I no longer see warning messages (they appeared in the 1st instance of running previously). And I now see the Run button.

But I get an error:

Filename issue or white space character on filename path
No such file or directory: '/Users/gary/Library/Application Support/gramps/gramps60/plugins/xml/etree.xml

Note that the default Mac path for plugins and db is: ~/Library/Application Support/gramps.

Edit: Console error message:
gunzip: can't stat: Support/gramps/gramps60/plugins/lxml/etree.xml (Support/gramps/gramps60/plugins/lxml/etree.xml.gz): No such file or directory

@jralls
Copy link
Member

jralls commented Apr 16, 2025

There is in fact a whitespace character in the path. The fact that the code is warning about it suggests a defect: 21st century code should be able to handle whitespace and unicode codepoints in pathnames.

spacing on path remains on lxml only
@romjeromealt
Copy link
Contributor Author

romjeromealt commented Apr 16, 2025

Oh, happy to learn that Mas OS also has a preferred folder name (hierarchy path) with a whitespace character! 'Program Files', 'Google Drive', 'Application Support', etc.
Seriously, I am a little bit puzzled with this issue, as it is specific to lxml gramplet.

etree gramplet will handle whitespace on path or filename, without error.
And etree gramplet and lxml gramplet are sharing the same code for the UI stuff.

I think I was able to identify the cause //see #700 (comment) and next comments// :
shutil was not able to properly copy the file, so this will lead to a "lxml parsing and validating error". It seems that I need to check if the file exists once more on the sequence and call for a new copy somewhere. The fix should be easy, but there is many try/exception sections or too compact piece of code, making the identification of the right place for the fix a little bit more complicated, for now. Maybe need a hour to find a good fix.

@jralls
Copy link
Member

jralls commented Apr 16, 2025

IIRC shutil invokes the shell. Make sure that the path passed to it includes quotes in the string just like when writing a shell script.

part 2. will be added soon (I hope!)
Need to change the method as written on the fly is maybe not a good idea.
from gramps.gui.dialog import ErrorDialog
@romjeromealt romjeromealt marked this pull request as draft April 16, 2025 15:50
@romjeromealt
Copy link
Contributor Author

romjeromealt commented Apr 16, 2025

Well, I made some pieces of the code "upside down", but still got this strange issue.
It seems to occur during a second pass (not on the first part of the code). So, parsing cannot be complete but the content is validating! By design this was not expected, as validating should limit the parsing.

Maybe something in memory or buffer. A failure somewhere, which will generate an empty "test.xml" (secondary code or pass). Need more time for looking at refresh, clear, purge & co on read_xml() section.

$ gramps -d "lxml"
2025-04-16 23:19:37.017: INFO: lxmlGramplet.py: line 238: Space on filename or path
sh: 1: cannot open Download/a: No such file
2025-04-16 23:19:37.025: DEBUG: lxmlGramplet.py: line 318: .gramps/gramps52/plugins/lxml/test.xml
2025-04-16 23:19:38.556: INFO: lxmlGramplet.py: line 659: Correspond au schéma XSD.
.gramps/gramps52/plugins/lxml/test.xml:1: parser error : Document is empty

^
2025-04-16 23:19:38.567: DEBUG: lxmlGramplet.py: line 329: check validity for .gramps/gramps52/plugins/lxml/test.xml
2025-04-16 23:19:38.568: DEBUG: lxmlGramplet.py: line 342: xmllint (relaxng) : .gramps/gramps52/plugins/lxml/test.xml
.gramps/gramps52/plugins/lxml/test.xml:1: parser error : Document is empty

^
2025-04-16 23:19:39.708: DEBUG: lxmlGramplet.py: line 384: .gramps/gramps52/plugins/lxml/test.xml:1:1:FATAL:PARSER:ERR_DOCUMENT_EMPTY: Document is empty
.gramps/gramps52/plugins/lxml/test.xml:1:1:FATAL:PARSER:ERR_DOCUMENT_EMPTY: Document is empty
2025-04-16 23:19:39.708: DEBUG: lxmlGramplet.py: line 386: PARSER
2025-04-16 23:19:39.708: DEBUG: lxmlGramplet.py: line 387: ERR_DOCUMENT_EMPTY
2025-04-16 23:19:39.709: DEBUG: lxmlGramplet.py: line 388: .gramps/gramps52/plugins/lxml/test.xml

step by step, comments, limit cpu usage and Element in buffer (on the fly)
and handle, change, type, etc. attribute on primary objects
@romjeromealt
Copy link
Contributor Author

romjeromealt commented Apr 17, 2025

Make sure that the path passed to it includes quotes in the string just like when writing a shell script.

Previously, I forced use of Pathlib or quote module, everywhere.
I suppose I got it now...
LOG.debug(os.system(f'gunzip < {entry} > {filename}'))

This cannot explain why it is hidden via ElementTree built-in module (etree gramplet) and why it raises errors for lxml lib (lxml gramplet), but this could explain Gary's return for etree.xml as there is no logging on etree gramplet, only some print statements. So, this issue was hidden and as features are very basic on this gramplet, we do not need to look at XML copy during the run.

Edit: Console error message:
gunzip: can't stat: Support/gramps/gramps60/plugins/lxml/etree.xml (Support/gramps/gramps60/plugins/lxml/etree.xml.gz): No such file or directory

I guess that I will have fun to play with quoting, simple and multiple quote...
Cannot use format string with os.system()?

@romjeromealt
Copy link
Contributor Author

Oh, it means that I should at least backport the fix to gramps 5.2 branch and remove the warning about spacing and whitespace...

-os.system(f'gunzip < {entry} > {filename}')
+os.system(f'gunzip < "{entry}" > {filename}')

Maybe I will create a new local branch and a new PR as 24 commits sounds too much for a refactoring, polish and fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants