Polish&refactor code for lxml and etree gramplets #707

romjeromealt · 2025-04-11T18:49:50Z

giotodibondone · 2025-04-11T23:33:12Z

etree works 👍

lxml has an error ( parsing error ) probably related to that missing library on the windows aio ( xmllint ) ?

Apart from that maybe an English phrase can be improved for the etree gramplet that shows "XML: Last 5 editions since 09 March 2018; were on/at:" by the word "editions" I believe you mean additions which means to add something/change etc as editions is related to books , but I understand what you mean "Edit" + adding :) and the display output aligned a bit better?

romjeromealt · 2025-04-12T08:03:59Z

@giotodibondone
Thanks!
I guess that the screenshot for etree gramplet is the old one? There was a tweak improvement since your previous post with this screenshot. Pseudo-count has 6 characters on last version. So, the separator "|" between data from XML and your Family Tree should be on the middle of the lines (or close to the same location after x number of characters).

... you can also run it without Family Tree loaded...

"XML: Last 5 editions since 09 March 2018; were on/at:"

Yes, there is also maybe a possible improvement as "on/at" is not very elegant.
Did you try to change the default number (5) to greater or smaller values? There is currently only one option on the config dialog for this gramplet. But, as python 3.8 now provides a XPATH 1.0 support for ElementTree built-in module, it should be possible to play with an XPATH selector-like or pretty print feature. Custom XPATH elements and attributes are listed on separated files into this addons-sources repository. A quick look at github repositories give an overview of advanced work (e.g., https://elementpath.readthedocs.io/en/latest/introduction.html )

I supposed that there was a fallback (skip if xmllint tools failed), but there is no real failure as your system was able to print something. I guess it is a Windows OS message or a sandbox stuff on python or AIO environment. There is an alternative by using lxml validation for DTD and rng. This might generate the same type of issues as for XSD. Also, the idea was to test multiple ways (xmllint is an alternative to validations via lxml only), but related to libxml2 and libxslt (the C stuff).

About additions, this might be also confusing because modifications are included too. So, the idea behind the sentence is rather : "Edit" + (adding or modify) :)
Edition was the simplest word for me, cause of actions via Editors. Once timestamp on records is recent (close to Today, 2025-04-12 : 1744380000), we can easily compute and play with numbers. We could also display the period (one day ago, one week ago, 6 months ago, etc.), provide a timeline or ASCII art graph, etc. ;-)

PQYPLZXHGF · 2025-04-13T00:24:30Z

Haven't tried the addon but thank you for it 👍

Edition was the simplest word for me, cause of actions via Editors

Edition is not really a word used in English for that purpose it just looks and reads wrong to an English speaker, I suggest as mentioned on stackoverflow to use the words like editing/edit(s) or revising/revision(s) so maybe just Edits in this case?

The etree gramplet screenshot also shows "XML: Number of records and relations: " what do you mean by "relations" as it also does not sound quite correct in English?

romjeromealt · 2025-04-13T09:47:53Z

Edition is not really a word used in English for that purpose it just looks and reads wrong to an English speaker, I suggest as mentioned on stackoverflow to use the words like editing/edit(s) or revising/revision(s) so maybe just Edits in this case?

oh, ok, my bad as Edition sounds right in french (in fact Édition will be correct in french). Thanks.
The sentence has been fixed, replaced by something like:
Last {n} additions and modifications since {date}:

The etree gramplet screenshot also shows "XML: Number of records and relations: " what do you mean by "relations" as it also does not sound quite correct in English?

Ah, yes, relations in this context means "relations between records".
The gramplet will list primary objects, then will only provide the number of matching tags. e.g., events, persons, citations, notes, etc. The timestamp is also limited to primary object.

To go further, need to look at Database Difference Report or Import Merge Tool. I suppose that it should be possible to play with the diff.py (see
https://github.com/gramps-project/gramps/blob/maintenance/gramps60/gramps/gen/merge/diff.py ), but I do not want to go too far with this gramplet.

So, the additional records and relations is the total of relations between records without the top primary objects. As some records can be shared, we have more "relations" than records.
e.g., Person A <-> event 1 <-> Person B will generate at least 4 relations for 3 top primary objects (records).

Into the gramplet, the counters will list all matching "active sequences". A pseudo snapshot or skeleton of the Gramps XML file according to data detected.

It is like making a puzzle, you can group pieces by type (the sky, a tree, a house). All these groupes are related but you need to know the location (position or coordinates). So, the idea behind the word relation is to have a common word for illustrating the {locations, the type of records and the record itself}. Just a workaround for generating tuple (or json) on a flat raw Gramps XML file.

An other sample, with a photo after a marriage. You can see most (all?) people, you know the date and the place of the event, or the repository/source/citation/note. During this marriage event, there is multiple families, with many persons having different dates of birth and event places during their lifes. So, to make a large global timeline with multiple places, sources and date from a starting point like this marriage event, will be more than a challenge : this could be painfull. Now, with a basic model (format a marriage with only two persons), just list any minor variations (add parents, uncles and aunts), some families with or without children. The model will grow but you can still easily manage it. A basic set of lists, like counters on Primary objects. Now, the relations will be the connexions between all these sets of data. Could be secondary indices into a DB environment, but they are all very flexibles with many ways and "micro-reference" with sometimes simple switch or shared records "toghether on this marriage event" (eg., Persons into a Family, associations or Event with roles).

So, as records are all more and less related, I used the word "Relations". If need, you can use and replace by the word "inter-links", instead.

How to list them? Well, this should be "easy" via a simple print() statement, but does it make sense to generate a large list of more than 42 600 {tags, records, relations, paths/links} with example.gramps?

from xml.etree import ElementTree
tree = ElementTree.parse(Path(filename))
root = tree.getroot()
for one in root:
    for two in one.iter():
        print(two.tag, two.items())

Maybe if you can find the same type of global counter (total of tags and paths/links) on the active Family Tree, any minor difference between the content of a Gramps XML and the active database will be quickly displayed. As you can see with example.gramps, there is less than 10 000 records set as top Primary Objects, but there is more than 10 000 records missing from the total... I guess it is either related to shared objects or children records will be ignored once the parent tag has already matching. Anyway, the total counting will be the same whatever method (improved or not) for the files passed through this gramplet. A basic comparison stuff as POC or draft idea.

GaryGriffin · 2025-04-14T20:59:38Z

Tried to run PR on gramps60 on Mac.

Mac comes bundled with Python which does not include libxml, so cannot run lxml. Got the ErrorDialog as expected.
There is no RUN button, so cannot execute it.
In Dashboard:

Detached:

Got warning msgs:

gramps60/plugins/lxml/etreeGramplet.py:115: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?
  if os.name is 'nt':
gramps60/plugins/lxml/etreeGramplet.py:133: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?
  if os.name is 'nt':
gramps60/plugins/lxml/etreeGramplet.py:221: SyntaxWarning: "is not" with 'str' literal. Did you mean "!="?
  if self.__file_name is not "":
gramps60/plugins/lxml/lxmlGramplet.py:148: SyntaxWarning: "is" with 'str' literal. Did you mean "=="?%

romjeromealt · 2025-04-15T12:28:29Z

@GaryGriffin
Thanks!
Yes, I was not certain if "is not" was working there, as the use of "is not" or "!=" has only minor difference for very specific cases.

>>> () is ()
True
>>> 1 is 1
True
>>> (1,) == (1,)
True
>>> (1,) is (1,)
False
>>> a = (1,)
>>> b = a
>>> a is b
True

https://stackoverflow.com/questions/2209755/python-operation-vs-is-not

As it was working on Windows OS ('nt'), I still wonder why your version will complain. Anyway, you get the icons issue too. So, need to also add the check for 'mac' os. How do you get the SyntaxWarning and tips? Is it via the debug flag or your python version? I suppose that custom Gtk icons stock will be displayed under Gnome environment. So, it might also failed under KDE, xfce desktops (linux, BSD and so)...

A little bit strange as python 3.6 (and maybe python 3.11/3.12 for Windows AIO) can make these 'is/is not' tests. I will revert the refactoring and polish on these sections. Thanks!

`darwin` plateform and `nt` OS cannot use custom set of icons This might also failed under KDE or xfce desktop manager under linux?

GaryGriffin · 2025-04-15T15:30:05Z

Uploaded the latest PR. When running etree on Mac, I no longer see warning messages (they appeared in the 1st instance of running previously). And I now see the Run button.

But I get an error:

Filename issue or white space character on filename path
No such file or directory: '/Users/gary/Library/Application Support/gramps/gramps60/plugins/xml/etree.xml

Note that the default Mac path for plugins and db is: ~/Library/Application Support/gramps.

Edit: Console error message:
gunzip: can't stat: Support/gramps/gramps60/plugins/lxml/etree.xml (Support/gramps/gramps60/plugins/lxml/etree.xml.gz): No such file or directory

jralls · 2025-04-16T00:33:00Z

There is in fact a whitespace character in the path. The fact that the code is warning about it suggests a defect: 21st century code should be able to handle whitespace and unicode codepoints in pathnames.

spacing on path remains on lxml only

romjeromealt · 2025-04-16T13:23:36Z

Oh, happy to learn that Mas OS also has a preferred folder name (hierarchy path) with a whitespace character! 'Program Files', 'Google Drive', 'Application Support', etc.
Seriously, I am a little bit puzzled with this issue, as it is specific to lxml gramplet.

etree gramplet will handle whitespace on path or filename, without error.
And etree gramplet and lxml gramplet are sharing the same code for the UI stuff.

I think I was able to identify the cause //see #700 (comment) and next comments// :
shutil was not able to properly copy the file, so this will lead to a "lxml parsing and validating error". It seems that I need to check if the file exists once more on the sequence and call for a new copy somewhere. The fix should be easy, but there is many try/exception sections or too compact piece of code, making the identification of the right place for the fix a little bit more complicated, for now. Maybe need a hour to find a good fix.

jralls · 2025-04-16T14:16:25Z

IIRC shutil invokes the shell. Make sure that the path passed to it includes quotes in the string just like when writing a shell script.

part 2. will be added soon (I hope!)

Need to change the method as written on the fly is maybe not a good idea.

from gramps.gui.dialog import ErrorDialog

need to rewrite after a better review

still need to polish, but errors should be somewhere else...

romjeromealt · 2025-04-16T21:29:01Z

Well, I made some pieces of the code "upside down", but still got this strange issue.
It seems to occur during a second pass (not on the first part of the code). So, parsing cannot be complete but the content is validating! By design this was not expected, as validating should limit the parsing.

Maybe something in memory or buffer. A failure somewhere, which will generate an empty "test.xml" (secondary code or pass). Need more time for looking at refresh, clear, purge & co on read_xml() section.

$ gramps -d "lxml"
2025-04-16 23:19:37.017: INFO: lxmlGramplet.py: line 238: Space on filename or path
sh: 1: cannot open Download/a: No such file
2025-04-16 23:19:37.025: DEBUG: lxmlGramplet.py: line 318: .gramps/gramps52/plugins/lxml/test.xml
2025-04-16 23:19:38.556: INFO: lxmlGramplet.py: line 659: Correspond au schéma XSD.
.gramps/gramps52/plugins/lxml/test.xml:1: parser error : Document is empty

^
2025-04-16 23:19:38.567: DEBUG: lxmlGramplet.py: line 329: check validity for .gramps/gramps52/plugins/lxml/test.xml
2025-04-16 23:19:38.568: DEBUG: lxmlGramplet.py: line 342: xmllint (relaxng) : .gramps/gramps52/plugins/lxml/test.xml
.gramps/gramps52/plugins/lxml/test.xml:1: parser error : Document is empty

^
2025-04-16 23:19:39.708: DEBUG: lxmlGramplet.py: line 384: .gramps/gramps52/plugins/lxml/test.xml:1:1:FATAL:PARSER:ERR_DOCUMENT_EMPTY: Document is empty
.gramps/gramps52/plugins/lxml/test.xml:1:1:FATAL:PARSER:ERR_DOCUMENT_EMPTY: Document is empty
2025-04-16 23:19:39.708: DEBUG: lxmlGramplet.py: line 386: PARSER
2025-04-16 23:19:39.708: DEBUG: lxmlGramplet.py: line 387: ERR_DOCUMENT_EMPTY
2025-04-16 23:19:39.709: DEBUG: lxmlGramplet.py: line 388: .gramps/gramps52/plugins/lxml/test.xml

step by step, comments, limit cpu usage and Element in buffer (on the fly)

and handle, change, type, etc. attribute on primary objects

romjeromealt · 2025-04-17T09:15:39Z

Make sure that the path passed to it includes quotes in the string just like when writing a shell script.

Previously, I forced use of Pathlib or quote module, everywhere.
I suppose I got it now...
LOG.debug(os.system(f'gunzip < {entry} > {filename}'))

This cannot explain why it is hidden via ElementTree built-in module (etree gramplet) and why it raises errors for lxml lib (lxml gramplet), but this could explain Gary's return for etree.xml as there is no logging on etree gramplet, only some print statements. So, this issue was hidden and as features are very basic on this gramplet, we do not need to look at XML copy during the run.

Edit: Console error message:
gunzip: can't stat: Support/gramps/gramps60/plugins/lxml/etree.xml (Support/gramps/gramps60/plugins/lxml/etree.xml.gz): No such file or directory

I guess that I will have fun to play with quoting, simple and multiple quote...
Cannot use format string with os.system()?

Thank you all!

romjeromealt · 2025-04-17T14:04:35Z

Oh, it means that I should at least backport the fix to gramps 5.2 branch and remove the warning about spacing and whitespace...

-os.system(f'gunzip < {entry} > {filename}')
+os.system(f'gunzip < "{entry}" > {filename}')

Maybe I will create a new local branch and a new PR as 24 commits sounds too much for a refactoring, polish and fix.

not a very elegant hack

romjeromealt added 2 commits April 11, 2025 20:45

Polish&refactor code for lxml and etree gramplets

b656aba

see also gramps-project#700

Fix branch version

9aa7792

romjeromealt marked this pull request as ready for review April 11, 2025 18:50

fix wording on a sentence and date on copyright

4bda5d1

tweak improvement as $HOME was set twice

b1410dd

romjeromealt added 2 commits April 15, 2025 14:38

Fix "is" and "is not" as cannot test all config

6b88dd0

`darwin` plateform and `nt` OS cannot use custom set of icons This might also failed under KDE or xfce desktop manager under linux?

debug and print statement for desktop set of icons

dcb0922

More robust debug statement and exceptions

3240a47

spacing on path remains on lxml only

romjeromealt added 3 commits April 16, 2025 16:20

Fix write_back_xml (part 1.)

0943e8c

part 2. will be added soon (I hope!)

Fix write_back_xml (part 2.)

39bd024

Need to change the method as written on the fly is maybe not a good idea.

Fix GtkDialog mapped without a transient parent

2d54711

from gramps.gui.dialog import ErrorDialog

romjeromealt marked this pull request as draft April 16, 2025 15:50

romjeromealt added 7 commits April 16, 2025 17:55

tweak improv. for debugging the whitespace

a33478e

Force 'test.xml' file creation before copy

1ae1f1d

review the close_file method

87b64d7

work in progress, still need to complete writeback

04f08c5

disable write_back_xml

f91b18d

need to rewrite after a better review

Re-enable write_xml_back

7a261af

still need to polish, but errors should be somewhere else...

remaining print statement

5c27510

romjeromealt added 5 commits April 17, 2025 09:41

start to rewrite write_back_xml

82bb0b1

also write back XML content

10e02f9

more pythonic merge of content

7fbc0a0

write_back_xml() should be more clear now

7b0ba02

step by step, comments, limit cpu usage and Element in buffer (on the fly)

DTD excepts a gender on person

2ba4e88

and handle, change, type, etc. attribute on primary objects

romjeromealt added 2 commits April 17, 2025 11:22

Fix quoting on gunzip command

98aa842

Thank you all!

was the wrong line...

0346c47

romjeromealt added 2 commits April 18, 2025 15:48

Fix parent dialogs

29bda73

Fix a mistake during merging on write_back_xml()

7d43123

not a very elegant hack

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Polish&refactor code for lxml and etree gramplets #707

Polish&refactor code for lxml and etree gramplets #707

romjeromealt commented Apr 11, 2025

giotodibondone commented Apr 11, 2025 •

edited

Loading

romjeromealt commented Apr 12, 2025 •

edited

Loading

PQYPLZXHGF commented Apr 13, 2025

romjeromealt commented Apr 13, 2025 •

edited

Loading

GaryGriffin commented Apr 14, 2025

romjeromealt commented Apr 15, 2025 •

edited

Loading

GaryGriffin commented Apr 15, 2025 •

edited

Loading

jralls commented Apr 16, 2025

romjeromealt commented Apr 16, 2025 •

edited

Loading

jralls commented Apr 16, 2025

romjeromealt commented Apr 16, 2025 •

edited

Loading

romjeromealt commented Apr 17, 2025 •

edited

Loading

romjeromealt commented Apr 17, 2025

Polish&refactor code for lxml and etree gramplets #707

Are you sure you want to change the base?

Polish&refactor code for lxml and etree gramplets #707

Conversation

romjeromealt commented Apr 11, 2025

giotodibondone commented Apr 11, 2025 • edited Loading

romjeromealt commented Apr 12, 2025 • edited Loading

PQYPLZXHGF commented Apr 13, 2025

romjeromealt commented Apr 13, 2025 • edited Loading

GaryGriffin commented Apr 14, 2025

romjeromealt commented Apr 15, 2025 • edited Loading

GaryGriffin commented Apr 15, 2025 • edited Loading

jralls commented Apr 16, 2025

romjeromealt commented Apr 16, 2025 • edited Loading

jralls commented Apr 16, 2025

romjeromealt commented Apr 16, 2025 • edited Loading

romjeromealt commented Apr 17, 2025 • edited Loading

romjeromealt commented Apr 17, 2025

giotodibondone commented Apr 11, 2025 •

edited

Loading

romjeromealt commented Apr 12, 2025 •

edited

Loading

romjeromealt commented Apr 13, 2025 •

edited

Loading

romjeromealt commented Apr 15, 2025 •

edited

Loading

GaryGriffin commented Apr 15, 2025 •

edited

Loading

romjeromealt commented Apr 16, 2025 •

edited

Loading

romjeromealt commented Apr 16, 2025 •

edited

Loading

romjeromealt commented Apr 17, 2025 •

edited

Loading