
Commit cd987b1

committed
clean up
1 parent a2b8b59 commit cd987b1

10 files changed

+462
-44
lines changed

.vscode/launch.json

+24
@@ -0,0 +1,24 @@
1+
{
2+
// Use IntelliSense to learn about possible attributes.
3+
// Hover to view descriptions of existing attributes.
4+
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
5+
"version": "0.2.0",
6+
"configurations": [
7+
{
8+
"name": "Python Debugger: Current File with Arguments",
9+
"type": "debugpy",
10+
"request": "launch",
11+
"program": "${file}",
12+
"console": "integratedTerminal",
13+
"args": "${command:pickArgs}"
14+
},
15+
{
16+
"name": "Python Debugger: Current File with Arguments",
17+
"type": "debugpy",
18+
"request": "launch",
19+
"program": "'${workspaceFolder}'\\${file}",
20+
"console": "integratedTerminal",
21+
"args": "${command:pickArgs}"
22+
}
23+
]
24+
}

README.md

+208-2
@@ -1,2 +1,208 @@
1-
# xml_viewer_editor
2-
xml parser with editor capability
1+
# XML file processing with Python lxml Module
2+
3+
This article explains the general functionality of lxml and demonstrates how it parses and reads XML content efficiently, with the aim of making the programmer's life easier. lxml is considered one of the most feature-rich and easy-to-use libraries for processing XML and HTML in Python. We walk through some of the core features lxml provides. The lxml package has a quite different way of representing documents as trees.
4+
In the DOM, trees are built out of nodes represented as Node instances.
5+
In lxml, by contrast, elements are Element instances that behave like lists of their child elements.
6+
7+
![img](/images/lxml_design.png)
8+
9+
Example XML sample file, [sample.xml](sample.xml).
10+
11+
12+
13+
At its most basic, an XML file needs to be [parsed][1] and processed. The parse function converts an XML file into an ElementTree.
14+
15+
**Import lxml's etree module and pass the XML file name or path as the source**
16+
17+
```python
18+
from lxml import etree
19+
tree = etree.parse('sample.xml', parser=etree.XMLParser())
20+
```
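
As a hedged aside, the same call also accepts a file-like object, and malformed input raises `etree.XMLSyntaxError`. The sketch below uses a made-up, trimmed-down stand-in for sample.xml; the names `xml_bytes` and `parsed_ok` are illustrative and not part of the original article.

```python
from io import BytesIO

from lxml import etree

# Illustrative stand-in for sample.xml (not the real file).
xml_bytes = b"""<PurchaseOrders>
  <PurchaseOrder PurchaseOrderNumber="99504" OrderDate="2001-10-20"/>
</PurchaseOrders>"""

# etree.parse accepts a path or a file-like object and returns an ElementTree.
tree = etree.parse(BytesIO(xml_bytes), parser=etree.XMLParser())
root = tree.getroot()
print(root.tag)

# A mismatched closing tag is not well-formed, so parsing raises XMLSyntaxError.
try:
    etree.fromstring(b"<PurchaseOrders><PurchaseOrder></PurchaseOrders>")
    parsed_ok = True
except etree.XMLSyntaxError as exc:
    parsed_ok = False
    print(f"parse failed: {exc}")
```

Catching the exception explicitly is one reasonable way to report a broken input file instead of letting the script stop with a traceback.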
21+
22+
23+
**Top Level Element**
24+
25+
```python
26+
print(tree.getroot())
27+
```
28+
Output:
29+
> <Element PurchaseOrders at 0x1fb8c81ed40>
30+
31+
**Element nodes for its element children.**
32+
33+
```python
34+
print(list(tree.getroot()))  # getchildren() is deprecated; list() gives the same children
35+
```
36+
Output:
37+
> [<Element PurchaseOrder at 0x1dae421eec0>, <Element PurchaseOrder at 0x1dae421f200>, <Element PurchaseOrder at 0x1dae421f2c0>]
38+
39+
**Attribute nodes for its attributes.**
40+
41+
```python
42+
for e in tree.getroot():
43+
    print(e.attrib)
44+
```
45+
Output:
46+
> {'PurchaseOrderNumber': '99504', 'OrderDate': '2001-10-20'}
47+
> {'PurchaseOrderNumber': '99505', 'OrderDate': '2001-10-22'}
48+
> {'PurchaseOrderNumber': '99503', 'OrderDate': '2001-10-22'}
49+
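
A minimal sketch of reading individual attributes rather than the whole `attrib` mapping. The inline XML is a shortened stand-in for sample.xml, and the `Status` attribute is hypothetical, used only to show the default value.

```python
from lxml import etree

# Shortened stand-in for sample.xml (not the real file).
root = etree.fromstring(
    b"<PurchaseOrders>"
    b'<PurchaseOrder PurchaseOrderNumber="99504" OrderDate="2001-10-20"/>'
    b'<PurchaseOrder PurchaseOrderNumber="99505" OrderDate="2001-10-22"/>'
    b"</PurchaseOrders>"
)

# Iterating an element yields its child elements directly.
numbers = [order.get("PurchaseOrderNumber") for order in root]
print(numbers)

# .get() returns the supplied default when the attribute is missing.
# "Status" is a hypothetical attribute that the sample data does not define.
status = root[0].get("Status", "unknown")
print(status)
```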
50+
**Text nodes for textual content.**
51+
52+
```python
53+
for e in tree.getroot()[0]:
54+
    print(e.text)
55+
```
56+
Output:
57+
> Please leave packages in shed by driveway.
58+
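
Container elements such as Address carry no meaningful text of their own, so it can help to skip empty or whitespace-only values. The following is a sketch under that assumption, using a cut-down order element rather than the real sample.xml.

```python
from lxml import etree

# Cut-down stand-in for one PurchaseOrder (not the real file).
order = etree.fromstring(
    b"<PurchaseOrder>\n"
    b"  <Address>\n    <Name>Amy Adams</Name>\n  </Address>\n"
    b"  <DeliveryNotes>Please leave packages in shed by driveway.</DeliveryNotes>\n"
    b"</PurchaseOrder>"
)

# Print only children whose own text is more than whitespace.
for child in order:
    text = (child.text or "").strip()
    if text:
        print(f"{child.tag}: {text}")

# findtext looks up a descendant by path and returns its text in one call.
notes = order.findtext("DeliveryNotes")
name = order.findtext("Address/Name")
```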
59+
## XML schema structure
60+
61+
Each Element has an assortment of child nodes of various types:
62+
63+
![img](/images/xml_element_explained.png)
64+
65+
For the supported XML [schema][2] format, refer to the linked reference.
66+
67+
## Serialize XML element objects as strings
68+
69+
etree.tostring serializes an element and its subtree to an encoded byte string; decode the result to obtain a regular string.
70+
71+
```python
72+
print(etree.tostring(tree.getroot()[0]).decode("utf-8"))
73+
```
74+
Output:
75+
```
76+
<PurchaseOrder PurchaseOrderNumber="99504" OrderDate="2001-10-20">
77+
<Address Type="Shipping">
78+
<Name>Amy Adams</Name>
79+
<Street>123 Maple Street</Street>
80+
<City>Mill Valley</City>
81+
<State>CA</State>
82+
<Zip>10999</Zip>
83+
<Country>USA</Country>
84+
</Address>
85+
<Address Type="Billing">
86+
<Name>Chong Wei</Name>
87+
<Street>8 Oak Avenue</Street>
88+
<City>Old Town</City>
89+
<State>PA</State>
90+
<Zip>95819</Zip>
91+
<Country>USA</Country>
92+
</Address>
93+
<DeliveryNotes>Please leave packages in shed by driveway.</DeliveryNotes>
94+
<Items>
95+
<Item PartNumber="872-AC">
96+
<ProductName>Lawnmower</ProductName>
97+
<Quantity>1</Quantity>
98+
<USPrice>148.95</USPrice>
99+
<Comment>Confirm this is electric</Comment>
100+
</Item>
101+
<Item PartNumber="926-AD">
102+
<ProductName>Dell Monitor</ProductName>
103+
<Quantity>2</Quantity>
104+
<USPrice>39.98</USPrice>
105+
<ShipDate>1999-05-21</ShipDate>
106+
</Item>
107+
</Items>
108+
</PurchaseOrder>
109+
```
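
As a hedged variation, `etree.tostring` can return a `str` directly when `encoding="unicode"` is passed, which avoids the separate decode step. The single Item element below is a stand-in for the sample data.

```python
from lxml import etree

# Stand-in element (not the real file).
item = etree.fromstring(
    b'<Item PartNumber="872-AC"><ProductName>Lawnmower</ProductName></Item>'
)

as_bytes = etree.tostring(item)                     # bytes by default
as_text = etree.tostring(item, encoding="unicode")  # str, no decode needed

# pretty_print=True adds line breaks and indentation for readability.
pretty = etree.tostring(item, encoding="unicode", pretty_print=True)
print(pretty)
```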
110+
111+
## XML Content search
112+
113+
lxml provides several functions for locating elements by [element path][3]. For this demonstration, findall is a good fit: it returns every matching child element, and enumerating the result gives the index of the element that contains the keyword.
114+
115+
Set search element path
116+
117+
```python
118+
roottree = tree.getroot()
119+
subelement = roottree[0].tag # PurchaseOrder
120+
findalltree = tree.findall(subelement)
121+
122+
print(findalltree)
123+
```
124+
125+
> [<Element PurchaseOrder at 0x16675a1f040>, <Element PurchaseOrder at 0x16675a1f380>, <Element PurchaseOrder at 0x16675a1f440>]
126+
127+
Set up the search keyword, the enumeration loop, and the match condition.
128+
In this use case, the objective is to identify which sub-element contains the information of interest. For demonstration purposes, "PartNumber" is used as a unique keyword to find the index of the matching sub-element and the sub-element itself.
129+
130+
```python
131+
keyword = 'PartNumber="456-NF"'
132+
for h, i in enumerate(findalltree):
133+
134+
    if keyword in etree.tostring(i).decode("utf-8"):
135+
136+
        print(f'index: {h} \n{etree.tostring(i, pretty_print=True).decode("utf-8")}')
137+
```
138+
```
139+
index: 1
140+
<PurchaseOrder PurchaseOrderNumber="99505" OrderDate="2001-10-22">
141+
<Address Type="Shipping">
142+
<Name>anna kendrick</Name>
143+
<Street>456 Main Street</Street>
144+
<City>Buffalo</City>
145+
<State>NY</State>
146+
<Zip>98112</Zip>
147+
<Country>USA</Country>
148+
</Address>
149+
<Address Type="Billing">
150+
<Name>anna kendrick</Name>
151+
<Street>456 Main Street</Street>
152+
<City>Buffalo</City>
153+
<State>NY</State>
154+
<Zip>98112</Zip>
155+
<Country>USA</Country>
156+
</Address>
157+
<DeliveryNotes>Please notify me before shipping.</DeliveryNotes>
158+
<Items>
159+
<Item PartNumber="456-NF">
160+
<ProductName>Power Supply</ProductName>
161+
<Quantity>1</Quantity>
162+
<USPrice>45.99</USPrice>
163+
</Item>
164+
</Items>
165+
</PurchaseOrder>
166+
```
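
Substring matching on serialized XML works for this demonstration, but it depends on attribute order and quoting. A possible alternative, sketched here and not part of the original approach, is an XPath query that tests the attribute directly. The inline XML is a reduced stand-in for sample.xml, and `part_number` is an illustrative name.

```python
from lxml import etree

# Reduced stand-in for sample.xml (not the real file).
root = etree.fromstring(
    b"<PurchaseOrders>"
    b'<PurchaseOrder PurchaseOrderNumber="99504">'
    b'<Items><Item PartNumber="872-AC"/></Items></PurchaseOrder>'
    b'<PurchaseOrder PurchaseOrderNumber="99505">'
    b'<Items><Item PartNumber="456-NF"/></Items></PurchaseOrder>'
    b"</PurchaseOrders>"
)

part_number = "456-NF"

# $part is an XPath variable; lxml binds it from the keyword argument.
matches = root.xpath(
    "PurchaseOrder[Items/Item/@PartNumber = $part]", part=part_number
)

# root.index() gives the position of a child under its parent.
for order in matches:
    print(root.index(order), order.get("PurchaseOrderNumber"))
```

Binding the value through an XPath variable keeps the query string fixed and avoids quoting problems in the search term.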
167+
168+
## Element removal
169+
170+
The search result lets us work on the sub-element of interest as we wish. For example, we can use the index to remove an unwanted element.
171+
172+
```python
173+
print(list(roottree))  # children before removal
174+
175+
roottree.remove(findalltree[1])
176+
177+
print(list(roottree))  # children after removal
178+
```
179+
180+
## The Result:
181+
182+
Available sub-elements:
183+
> [<Element PurchaseOrder at 0x23fee01f140>, <Element PurchaseOrder at 0x23fee01f000>, <Element PurchaseOrder at 0x23fee01f3c0>]
184+
185+
Remaining sub-elements after removing the sub-element at index 1:
186+
> [<Element PurchaseOrder at 0x23fee01f140>, <Element PurchaseOrder at 0x23fee01f3c0>]
187+
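
The removal only changes the tree in memory. A minimal sketch of persisting the result follows, assuming the goal is to save the edited document. It writes to an in-memory buffer so the example is self-contained; a file path such as `"output.xml"` (a hypothetical name) could be passed to `tree.write` in the same way.

```python
from io import BytesIO

from lxml import etree

# Reduced stand-in for sample.xml (not the real file).
tree = etree.parse(BytesIO(
    b"<PurchaseOrders>"
    b'<PurchaseOrder PurchaseOrderNumber="99504"/>'
    b'<PurchaseOrder PurchaseOrderNumber="99505"/>'
    b'<PurchaseOrder PurchaseOrderNumber="99503"/>'
    b"</PurchaseOrders>"
))
root = tree.getroot()

# remove() takes the child element itself, so look it up by index first.
root.remove(root[1])
remaining = [order.get("PurchaseOrderNumber") for order in root]
print(remaining)

# Serialize the modified tree with an XML declaration and indentation.
buffer = BytesIO()
tree.write(buffer, encoding="UTF-8", xml_declaration=True, pretty_print=True)
saved = buffer.getvalue()
```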
188+
189+
190+
[1]: https://lxml.de/apidoc/lxml.etree.html#lxml.etree.parse
191+
[2]: https://www.w3schools.com/XML/schema_schema.asp
192+
[3]: https://lxml.de/tutorial.html#elementpath
193+
194+
---
195+
196+
## Create and activate a virtual environment
197+
198+
```
199+
python -m venv .venv
200+
201+
Windows:
202+
.\.venv\Scripts\activate
203+
204+
Linux & Unix:
205+
source .venv/bin/activate
206+
207+
pip install -r requirements.txt
208+
```

images/lxml_design.png

164 KB

images/xml_element_explained.png

83.7 KB

main.py

-10
This file was deleted.

myfirmware.xml

+23
@@ -0,0 +1,23 @@
1+
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
2+
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
3+
<channel>
4+
<title>Fortinet Firmware Updates</title>
5+
<link>http://support.fortinet.com/</link>
6+
<description><![CDATA[Updates for the latest firmware and patches for Fortinet products.]]></description>
7+
<lastBuildDate>Wed, 3 Apr 2024 19:06:28 GMT</lastBuildDate>
8+
<item xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
9+
<title>FortiSOAR 7.5.0</title>
10+
<link>https://support.fortinet.com/EndUser/FirmwareImages.aspx</link>
11+
<description>&lt;p&gt;FortiSOAR 7.5.0 B4015 and release notes are available for download from the Support site : &lt;a href="https://support.fortinet.com/EndUser/FirmwareImages.aspx"&gt;https://support.fortinet.com&lt;/a&gt;&lt;/p&gt;</description>
12+
<pubDate>Wed, 3 Apr 2024 19:06:28 GMT</pubDate>
13+
<guid isPermaLink="false">Wed, 3 Apr 2024 19:06:28 GMT</guid>
14+
</item>
15+
<item xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
16+
<title>FortiADC 7.4.3</title>
17+
<link>https://support.fortinet.com/EndUser/FirmwareImages.aspx</link>
18+
<description>&lt;p&gt;FortiADC 7.4.3 B0336 and release notes are available for download from the Support site : &lt;a href="https://support.fortinet.com/EndUser/FirmwareImages.aspx"&gt;https://support.fortinet.com&lt;/a&gt;&lt;/p&gt;</description>
19+
<pubDate>Tue, 2 Apr 2024 21:18:30 GMT</pubDate>
20+
<guid isPermaLink="false">Tue, 2 Apr 2024 21:18:30 GMT</guid>
21+
</item>
22+
</channel>
23+
</rss>

requirements.txt

28 Bytes
Binary file not shown.

sample.xml

+92-15
@@ -1,16 +1,93 @@
1-
<?xml version="1.0" encoding="utf-8"?>
1+
<?xml version='1.0' encoding='UTF-8'?>
22

3-
<foods>
4-
<food category="Guest" name="Ali">
5-
<name>Nasi Lemak</name>
6-
<description>consists of fragrant rice cooked in coconut milk and pandan leaf.</description>
7-
<location>Kuala Lumpur/Malaysia</location>
8-
<isbest>true</isbest>
9-
</food>
10-
<food category="" name="aahmadbasri">
11-
<name>Mee Laksa</name>
12-
<description>consists of various types of noodles, most commonly thick rice noodles, with toppings such as chicken, prawn or fish.</description>
13-
<location>Kuala Lumpur/Malaysia</location>
14-
<isbest>true</isbest>
15-
</food>
16-
</foods>
3+
<PurchaseOrders>
4+
<PurchaseOrder PurchaseOrderNumber="99504" OrderDate="2001-10-20">
5+
<Address Type="Shipping">
6+
<Name>Amy Adams</Name>
7+
<Street>123 Maple Street</Street>
8+
<City>Mill Valley</City>
9+
<State>CA</State>
10+
<Zip>10999</Zip>
11+
<Country>USA</Country>
12+
</Address>
13+
<Address Type="Billing">
14+
<Name>Chong Wei</Name>
15+
<Street>8 Oak Avenue</Street>
16+
<City>Old Town</City>
17+
<State>PA</State>
18+
<Zip>95819</Zip>
19+
<Country>USA</Country>
20+
</Address>
21+
<DeliveryNotes>Please leave packages in shed by driveway.</DeliveryNotes>
22+
<Items>
23+
<Item PartNumber="872-AC">
24+
<ProductName>Lawnmower</ProductName>
25+
<Quantity>1</Quantity>
26+
<USPrice>148.95</USPrice>
27+
<Comment>Confirm this is electric</Comment>
28+
</Item>
29+
<Item PartNumber="926-AD">
30+
<ProductName>Dell Monitor</ProductName>
31+
<Quantity>2</Quantity>
32+
<USPrice>39.98</USPrice>
33+
<ShipDate>1999-05-21</ShipDate>
34+
</Item>
35+
</Items>
36+
</PurchaseOrder>
37+
<PurchaseOrder PurchaseOrderNumber="99505" OrderDate="2001-10-22">
38+
<Address Type="Shipping">
39+
<Name>anna kendrick</Name>
40+
<Street>456 Main Street</Street>
41+
<City>Buffalo</City>
42+
<State>NY</State>
43+
<Zip>98112</Zip>
44+
<Country>USA</Country>
45+
</Address>
46+
<Address Type="Billing">
47+
<Name>anna kendrick</Name>
48+
<Street>456 Main Street</Street>
49+
<City>Buffalo</City>
50+
<State>NY</State>
51+
<Zip>98112</Zip>
52+
<Country>USA</Country>
53+
</Address>
54+
<DeliveryNotes>Please notify me before shipping.</DeliveryNotes>
55+
<Items>
56+
<Item PartNumber="456-NF">
57+
<ProductName>Power Supply</ProductName>
58+
<Quantity>1</Quantity>
59+
<USPrice>45.99</USPrice>
60+
</Item>
61+
</Items>
62+
</PurchaseOrder>
63+
<PurchaseOrder PurchaseOrderNumber="99503" OrderDate="2001-10-22">
64+
<Address Type="Shipping">
65+
<Name>jessica alba</Name>
66+
<Street>4055 Madison Ave</Street>
67+
<City>Seattle</City>
68+
<State>WA</State>
69+
<Zip>98112</Zip>
70+
<Country>USA</Country>
71+
</Address>
72+
<Address Type="Billing">
73+
<Name>jessica alba</Name>
74+
<Street>4055 Madison Ave</Street>
75+
<City>Buffalo</City>
76+
<State>NY</State>
77+
<Zip>98112</Zip>
78+
<Country>USA</Country>
79+
</Address>
80+
<Items>
81+
<Item PartNumber="898-AR">
82+
<ProductName>Computer Keyboard</ProductName>
83+
<Quantity>1</Quantity>
84+
<USPrice>29.99</USPrice>
85+
</Item>
86+
<Item PartNumber="898-AK">
87+
<ProductName>Wireless Mouse</ProductName>
88+
<Quantity>1</Quantity>
89+
<USPrice>14.99</USPrice>
90+
</Item>
91+
</Items>
92+
</PurchaseOrder>
93+
</PurchaseOrders>
