Commit fa19407

docs: Add README.md and appendix such as sample CSV files
1 parent 3ed7d27 commit fa19407

4 files changed

+242
-1
lines changed

README.md

+232-1
@@ -1 +1,232 @@
1-
# csv-diff-python3
1+
2+
# csv-diff-python3
3+
4+
[![Python Version](https://img.shields.io/badge/Python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9-blue)](README.md/#herb-requirements)
5+
[![testing](https://github.com/blue-monk/csv-diff-python3/actions/workflows/testing.yml/badge.svg)](https://github.com/blue-monk/csv-diff-python3/actions/workflows/testing.yml)
6+
[![coverage](https://github.com/blue-monk/csv-diff-python3/blob/gh-pages/coverage.svg)](https://blue-monk.github.io/csv-diff-python3/)
7+
[![License](https://img.shields.io/github/license/blue-monk/csv-diff-python3)](LICENSE)
8+
9+
10+
## :herb: Overview
11+
12+
A simple command-line tool to see the difference between two CSV files.
13+
14+
The tool can report differences in the following styles, and you choose which one to use.
15+
16+
1. Report the number of differences and line numbers
17+
2. Report diff marks along with the contents of each CSV line
18+
* You can choose the following report styles
19+
* Horizontal (Side-by-side) display style
20+
* Vertical display style
21+
* You can choose to report only the lines with differences or all lines
22+
23+
24+
---
25+
:palm_tree: DEMO
26+
27+
![DEMO](appendix/csv-diff-animation.gif)
28+
29+
---
30+
31+
32+
## :herb: Table of Contents
33+
34+
* [**Why csv-diff?**](#herb-why-csv-diff)
35+
* [**Features**](#herb-features)
36+
* [**Requirements**](#herb-requirements)
37+
* [Runtime](#runtime)
38+
* [CSV files](#csv-files)
39+
* [**Installation**](#herb-installation)
40+
* [With pip](#with-pip)
41+
* [Manual Installation](#manual-installation)
42+
* [**Run**](#herb-run)
43+
* [If installed with pip](#if-installed-with-pip)
44+
* [If installed manually](#if-installed-manually)
45+
* [**How to use**](#herb-how-to-use)
46+
* [Get Help](#get-help)
47+
* [One example](#one-example)
48+
* [**Notices**](#herb-notices)
49+
* [**Known Issues**](#herb-known-issues)
50+
* [**Contributing**](#herb-contributing)
51+
* [Reporting Bugs / Feature Requests](#reporting-bugs--feature-requests)
52+
* [**License**](#herb-license)
53+
54+
55+
## :herb: Why csv-diff?
56+
57+
The `diff` command that compares files is unaware of key columns (like primary keys in a database).
58+
Therefore, it may give undesired results in detecting differences in CSV files that have key columns.
59+
60+
For example, consider comparing the contents of tables in two databases that are inaccessible to each other.
61+
One way is to export each database's data as a CSV file and compare the two files.
62+
In this case, the `diff` command pays no attention to the key columns, so it may compare lines that have different keys.
63+
As a result, it cannot accurately report differences on a per-key basis.
64+
65+
This tool, on the other hand, recognizes key columns and detects differences.
66+
Specify the key columns as an argument when you run it, and rows are matched by key before they are compared.
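
The key-aware matching described above can be sketched as a merge over two row lists that are already sorted by key. This is a minimal illustration, not the tool's actual code: the function name `keyed_diff`, the `=` mark for identical rows, and the plain string comparison of keys are all assumptions made for this example.

```python
def keyed_diff(lhs_rows, rhs_rows, key_indices):
    """Merge-compare two row lists that are already sorted by their key columns.

    Returns (mark, lhs_row, rhs_row) tuples. The marks '!', '<' and '>' follow
    the README; '=' for identical rows is a hypothetical addition.
    """
    def key_of(row):
        return tuple(row[i] for i in key_indices)

    results = []
    i = j = 0
    while i < len(lhs_rows) and j < len(rhs_rows):
        lhs_key, rhs_key = key_of(lhs_rows[i]), key_of(rhs_rows[j])
        if lhs_key < rhs_key:            # key exists only on the left
            results.append(("<", lhs_rows[i], None))
            i += 1
        elif lhs_key > rhs_key:          # key exists only on the right
            results.append((">", None, rhs_rows[j]))
            j += 1
        else:                            # same key: compare the whole row
            mark = "=" if lhs_rows[i] == rhs_rows[j] else "!"
            results.append((mark, lhs_rows[i], rhs_rows[j]))
            i += 1
            j += 1
    # Whatever is left over exists on one side only.
    results.extend(("<", row, None) for row in lhs_rows[i:])
    results.extend((">", None, row) for row in rhs_rows[j:])
    return results


lhs = [["k2", "a"], ["k3", "b"], ["k4", "c"]]
rhs = [["k1", "x"], ["k2", "z"], ["k4", "c"]]
for mark, left, right in keyed_diff(lhs, rhs, key_indices=[0]):
    print(mark, left, right)
```

Because rows are paired by key rather than by position, a row inserted on one side does not shift every later comparison, which is the failure mode of a plain line-by-line `diff`.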
67+
68+
69+
## :herb: Features
70+
71+
* The CSV delimiter, line feed character, presence or absence of a header, etc. are detected automatically (and can also be specified explicitly)
72+
* Rows are matched on the key columns before being compared
73+
* You can specify columns to exclude from the comparison
74+
* Differences can be displayed side-by-side (more suitable when the number of columns is small)
75+
* Differences can be displayed in vertical order (more suitable when the number of columns is large)
76+
* Differences are indicated by the following marks, which we call DIFF-MARK
77+
* `!`: There is a difference
78+
* `<`: Exists only on the left side
79+
* `>`: Exists only on the right side
80+
* It is also possible to display only the number of differences and the line numbers that differ
81+
* A comma-delimited file can be compared with a tab-delimited file
82+
* Low memory consumption
83+
* Uses only Python standard modules and is provided as a single file, so it is easy to install even in an isolated environment
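
As a rough illustration of automatic delimiter detection, the sketch below uses `csv.Sniffer` from the standard library. Whether csvdiff itself uses `csv.Sniffer` is not stated in this README, so treat that as an assumption; the helper names `detect_delimiter` and `parse` are hypothetical.

```python
import csv


def detect_delimiter(sample_text, candidates=",\t;"):
    """Guess the delimiter of a CSV sample, restricted to a few candidates."""
    dialect = csv.Sniffer().sniff(sample_text, delimiters=candidates)
    return dialect.delimiter


def parse(text):
    """Parse CSV text with whatever delimiter was detected for it."""
    return list(csv.reader(text.splitlines(), delimiter=detect_delimiter(text)))


comma_text = "id,name\n1,apple\n2,banana\n"
tab_text = "id\tname\n1\tapple\n2\tbanana\n"

# Once each file is parsed with its own delimiter, the rows are plain lists
# and can be compared regardless of how the files were delimited.
print(parse(comma_text) == parse(tab_text))
```

Detecting the delimiter per file is what makes it possible to compare a comma-delimited file with a tab-delimited one.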
84+
85+
86+
## :herb: Requirements
87+
88+
### Runtime
89+
* Python 3.6 or later
90+
91+
### CSV files
92+
* Both files must be sorted by the key columns
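
If you are unsure whether a file meets this requirement, a check along the following lines may help. It is only a sketch: `is_sorted_by_keys` is a hypothetical helper, and it assumes keys are compared as plain strings, which may not match how csvdiff orders keys.

```python
def is_sorted_by_keys(rows, key_indices):
    """Return True if rows are in non-decreasing order of their key columns."""
    keys = [tuple(row[i] for i in key_indices) for row in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))


rows = [
    ["key1-2", "value1-2", "key2-2"],
    ["key1-3", "value1-3", "key2-3"],
    ["key1-4", "value1-4", "key2-4"],
]
print(is_sorted_by_keys(rows, key_indices=[0, 2]))
print(is_sorted_by_keys(rows[::-1], key_indices=[0, 2]))
```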
93+
94+
95+
## :herb: Installation
96+
97+
### With pip
98+
99+
```sh
100+
pip install git+https://github.com/blue-monk/csv-diff-python3
101+
```
102+
It may be safer to install it in a virtual environment created with venv.
103+
104+
### Manual installation
105+
106+
Place `csvdiff.py` in any directory on a machine where Python 3 is installed.
107+
It will be easier to use if you place it in a directory that is on your `PATH`.
108+
109+
## :herb: Run
110+
111+
### If installed with pip
112+
113+
```sh
114+
$ csvdiff3 -h
115+
```
116+
117+
### If installed manually
118+
119+
```sh
120+
$ python csvdiff.py -h
121+
```
122+
or
123+
```sh
124+
$ chmod +x csvdiff.py
125+
$ ./csvdiff.py -h
126+
```
127+
128+
## :herb: How to use
129+
130+
See the [Wiki](https://github.com/blue-monk/csv-diff-python3/wiki) for more details.
131+
* [Wiki/Command](https://github.com/blue-monk/csv-diff-python3/wiki/Command)
132+
* [Wiki/How to use](https://github.com/blue-monk/csv-diff-python3/wiki/How-to-use)
133+
134+
### Get help
135+
```sh
136+
$ ./csvdiff.py -h
137+
```
138+
139+
### One example
140+
141+
Here is one example that uses the sample data in `appendix/csv_samples/`.
142+
See the [Wiki/How to use](https://github.com/blue-monk/csv-diff-python3/wiki/How-to-use) for more details.
143+
144+
#### Sample data
145+
146+
Suppose the key columns are the columns at indices 0 and 2 (column indices are 0-based).
147+
148+
* sample_lhs.csv
149+
```csv
150+
head1, head2, head3, head4, head5
151+
key1-2, value1-2, key2-2, value2-2, 20201224T035908
152+
key1-3, value1-3, key2-3, value2-3, 20201224T180527
153+
key1-4, value1-4, key2-4, value2-4, 20201225T104851
154+
key1-5, value1-5, key2-5, value2-5, 20201225T142142
155+
```
156+
157+
* sample_rhs.csv
158+
```csv
159+
head1, head2, head3, head4, head5
160+
key1-1, value1-1, key2-1, value2-1, 20210108T142358
161+
key1-2, value1-3, key2-2, value2-z, 20210108T174216
162+
key1-4, value1-4, key2-4, value2-4, 20210109T090245
163+
key1-5, value1-v, key2-5, value2-5, 20210109T111231
164+
```
165+
166+
#### Example of use
167+
168+
To view the contents of only the lines that differ, use the `-d` (`--show-difference-only`) option.
169+
If you also want to see the number of differences, add the `-c` (`--show-count`) option.
170+
171+
```sh
172+
$ ../../src/csvdiff3/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -dc
173+
174+
============ Report ============
175+
176+
* Differences
177+
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
178+
sample_lhs.csv sample_rhs.csv Column indices with difference
179+
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
180+
> 2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
181+
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908'] ! 3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216'] @ [1, 3, 4]
182+
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527'] <
183+
4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851'] ! 4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245'] @ [4]
184+
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142'] ! 5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231'] @ [1, 4]
185+
186+
* Count & Row number
187+
same lines : 0
188+
left side only (<): 1 :-- Row Numbers -->: [3]
189+
right side only (>): 1 :-- Row Numbers -->: [2]
190+
with differences (!): 3 :-- Row Number Pairs -->: [(2, 3), (4, 4), (5, 5)]
191+
```
192+
* Differences are indicated by the following DIFF-MARKs
193+
* `!` : There is a difference
194+
* `<` : Exists only on the left side
195+
* `>` : Exists only on the right side
196+
197+
* The number displayed before each CSV row is its line number in the actual file
198+
    * Line numbers are 1-based
199+
200+
* For rows with differences, the column indices with differences will be displayed after `@`
201+
    * Column indices are 0-based
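
The list shown after `@` can be reproduced with a small sketch like the one below, using the `key1-2` rows from the sample data. The helper `diff_column_indices` and its `ignored` parameter are hypothetical names for illustration, not part of csvdiff.

```python
def diff_column_indices(lhs_row, rhs_row, ignored=()):
    """Return the 0-based indices of columns whose values differ.

    Columns listed in `ignored` are skipped, mirroring the idea of
    excluding columns from the comparison.
    """
    return [
        index
        for index, (left, right) in enumerate(zip(lhs_row, rhs_row))
        if index not in ignored and left != right
    ]


lhs_row = ["key1-2", "value1-2", "key2-2", "value2-2", "20201224T035908"]
rhs_row = ["key1-2", "value1-3", "key2-2", "value2-z", "20210108T174216"]

print(diff_column_indices(lhs_row, rhs_row))                # matches "@ [1, 3, 4]" in the report
print(diff_column_indices(lhs_row, rhs_row, ignored=(4,)))  # timestamp column excluded
```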
202+
203+
204+
## :herb: Notices
205+
206+
* *For large amounts of data*
207+
208+
  A horizontal report takes longer than a vertical report
209+
  because all lines are scanned in advance to collect the information needed to format the report.
210+
For large amounts of data, consider vertical reports.
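
To see why a side-by-side layout needs an extra pass, consider the sketch below: the width of the left-hand column is only known after every row has been inspected. This is an illustrative assumption about the kind of information gathered, not csvdiff's actual formatting code; `left_width` and `render_side_by_side` are hypothetical names.

```python
def left_width(rows):
    """First pass: find the widest rendered left-hand row."""
    return max((len(str(row)) for row in rows), default=0)


def render_side_by_side(pairs):
    """Render (left, mark, right) tuples with an aligned left column."""
    pairs = list(pairs)  # everything must be held (or re-read) for the pre-scan
    width = left_width(left for left, _, _ in pairs if left is not None)
    lines = []
    for left, mark, right in pairs:  # second pass: the actual output
        left_text = str(left) if left is not None else ""
        right_text = str(right) if right is not None else ""
        lines.append(f"{left_text:<{width}} {mark} {right_text}")
    return lines


for line in render_side_by_side([
    (["a", "b"], "!", ["a", "c"]),
    (None, ">", ["z", "z"]),
]):
    print(line)
```

A vertical layout has no such alignment constraint, so each row can be printed as soon as it is compared.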
211+
212+
## :herb: Known Issues
213+
214+
* *Workaround for files with only one line*
215+
216+
If the CSV file contains only one line, it will be recognized as a header.
217+
  To have it treated as a CSV without a header, specify the option `-H n`.
218+
219+
220+
## :herb: Contributing
221+
222+
### Reporting Bugs / Feature Requests
223+
224+
You are welcome to use the GitHub issue tracker to report bugs or suggest features.
225+
226+
227+
## :herb: License
228+
229+
csv-diff-python3 is released under the MIT license. Please read the [LICENSE](LICENSE) file for more information.
230+
231+
232+

appendix/csv-diff-animation.gif

4.89 MB

appendix/csv_samples/sample_lhs.csv

+5
@@ -0,0 +1,5 @@
1+
head1, head2, head3, head4, head5
2+
key1-2, value1-2, key2-2, value2-2, 20201224T035908
3+
key1-3, value1-3, key2-3, value2-3, 20201224T180527
4+
key1-4, value1-4, key2-4, value2-4, 20201225T104851
5+
key1-5, value1-5, key2-5, value2-5, 20201225T142142

appendix/csv_samples/sample_rhs.csv

+5
@@ -0,0 +1,5 @@
1+
head1, head2, head3, head4, head5
2+
key1-1, value1-1, key2-1, value2-1, 20210108T142358
3+
key1-2, value1-3, key2-2, value2-z, 20210108T174216
4+
key1-4, value1-4, key2-4, value2-4, 20210109T090245
5+
key1-5, value1-v, key2-5, value2-5, 20210109T111231
