# csv-diff-python3

[](README.md/#herb-requirements)
[](https://github.com/blue-monk/csv-diff-python3/actions/workflows/testing.yml)
[](https://blue-monk.github.io/csv-diff-python3/)
[](LICENSE)

## :herb: Overview

A simple command-line tool to see the differences between two CSV files.

This tool can report differences in the following styles:

1. Report the number of differences and the line numbers where they occur
2. Report diff marks along with the contents of each CSV line
    * You can choose either of the following display styles
        * Horizontal (side-by-side) display style
        * Vertical display style
    * You can choose to report only the lines with differences or all lines

---
:palm_tree: DEMO

---


## :herb: Table of Contents

* [**Why csv-diff?**](#herb-why-csv-diff)
* [**Features**](#herb-features)
* [**Requirements**](#herb-requirements)
    * [Runtime](#runtime)
    * [CSV files](#csv-files)
* [**Installation**](#herb-installation)
    * [With pip](#with-pip)
    * [Manual Installation](#manual-installation)
* [**Run**](#herb-run)
    * [If installed with pip](#if-installed-with-pip)
    * [If installed manually](#if-installed-manually)
* [**How to use**](#herb-how-to-use)
    * [Get Help](#get-help)
    * [One example](#one-example)
* [**Notices**](#herb-notices)
* [**Known Issues**](#herb-known-issues)
* [**Contributing**](#herb-contributing)
    * [Reporting Bugs / Feature Requests](#reporting-bugs--feature-requests)
* [**License**](#herb-license)

## :herb: Why csv-diff?

The `diff` command compares files line by line and is unaware of key columns (such as the primary key of a database table).
Therefore, it may give undesired results when detecting differences in CSV files that have key columns.

For example, consider comparing the contents of tables in two databases that cannot access each other.
One way is to export each database's data as a CSV file and compare the files.
In this case, the `diff` command does not pay attention to the key columns, so lines with different keys may be compared with each other.
This makes it impossible to judge the differences accurately with the keys in mind.

This tool, on the other hand, recognizes key columns and detects differences.
Specify the key columns as a command-line argument, and rows are matched on those keys before they are compared.

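Here is a minimal Python sketch of the difference, using made-up rows whose key is column 0. The standard library's `difflib.SequenceMatcher` stands in for a line-oriented `diff`. `key_diff` is a hypothetical helper, not code from csvdiff.py. The line-oriented comparison lumps rows with different keys (`k1` and `k0`) into one replaced block. Matching on the key instead reports `k1` as left-only, `k0` as right-only and `k2` as changed.

```python
from difflib import SequenceMatcher

# Made-up data: column 0 is the key.
LEFT = ["k1,a", "k2,b"]
RIGHT = ["k0,z", "k2,X"]


def line_diff(lhs, rhs):
    """Line-oriented comparison: difflib opcodes over whole lines."""
    return SequenceMatcher(None, lhs, rhs).get_opcodes()


def key_diff(lhs, rhs, key_cols):
    """Hypothetical key-aware comparison: maps each key to a DIFF-MARK.

    Uses a naive split on ',' (fine for this made-up data; real CSV needs
    the csv module) and holds both inputs in dictionaries.
    """
    def index(lines):
        rows = [line.split(",") for line in lines]
        return {tuple(row[c] for c in key_cols): row for row in rows}

    left, right = index(lhs), index(rhs)
    marks = {}
    for key in sorted(left.keys() | right.keys()):
        if key not in right:
            marks[key] = "<"  # exists only on the left side
        elif key not in left:
            marks[key] = ">"  # exists only on the right side
        elif left[key] != right[key]:
            marks[key] = "!"  # same key, different contents
    return marks


print(line_diff(LEFT, RIGHT))      # [('replace', 0, 2, 0, 2)]
print(key_diff(LEFT, RIGHT, [0]))  # {('k0',): '>', ('k1',): '<', ('k2',): '!'}
```

The dictionary-based `key_diff` keeps both inputs in memory and is only meant to show the matching idea.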
## :herb: Features

* The CSV delimiter, line-ending characters, presence or absence of a header, etc. are determined automatically (they can also be specified explicitly)
* Rows are matched by the key columns before they are compared
* You can specify columns to exclude from comparison
* Differences can be displayed side-by-side (better suited to files with few columns)
* Differences can be displayed vertically (better suited to files with many columns)
* Differences are indicated by the following marks, which we call DIFF-MARKs
    * `!`: There is a difference
    * `<`: Exists only on the left side
    * `>`: Exists only on the right side
* It is also possible to display only the number of differences and the line numbers where they occur
* A comma-delimited file can be compared with a tab-delimited file
* Low memory consumption
* Uses only Python standard modules and ships as a single file, so it is easy to install even in an isolated environment

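This README does not describe how `csvdiff.py` determines the delimiter and header automatically. As a hedged illustration only, the standard library's `csv.Sniffer` performs this kind of detection. The `detect_format` helper below is hypothetical and is not taken from the tool.

```python
import csv


def detect_format(sample):
    """Hypothetical helper: guess (delimiter, has_header) for a CSV text sample."""
    sniffer = csv.Sniffer()
    # Only consider comma and tab, the two delimiters mentioned in this README.
    dialect = sniffer.sniff(sample, delimiters=",\t")
    # has_header() is a heuristic: it compares the first row with the rows below it.
    return dialect.delimiter, sniffer.has_header(sample)


print(detect_format("name,age\nalice,30\nbob,25\n"))
print(detect_format("name\tage\nalice\t30\nbob\t25\n"))
```

Because `has_header()` only guesses, a tool built on this kind of heuristic still needs an option for stating the header explicitly.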
## :herb: Requirements

### Runtime
* Python 3.6 or later

### CSV files
* Both files must be sorted by the key columns

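If your exports are not already in key order, something along these lines can sort them first. This is a sketch under several assumptions:

* It loads the whole file into memory.
* It treats the first line as a header.
* It sorts keys as plain strings, so `"10"` sorts before `"9"`.

Whether csvdiff.py expects plain string order is an assumption to check against the Wiki. `sort_csv_by_keys` is a hypothetical helper, not part of the tool.

```python
import csv
import io


def sort_csv_by_keys(text, key_cols, has_header=True):
    """Hypothetical helper: return CSV text with data rows sorted by key_cols.

    Assumes keys compare as plain strings and the whole file fits in memory.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = (rows[:1], rows[1:]) if has_header else ([], rows)
    # Sort on the key columns in the order given, e.g. [0, 2] sorts by column 0, then column 2.
    body.sort(key=lambda row: [row[c] for c in key_cols])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(header + body)
    return out.getvalue()


unsorted = "id,name,region\nb,x,2\na,y,9\na,z,1\n"
print(sort_csv_by_keys(unsorted, [0, 2]), end="")
```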
## :herb: Installation

### With pip

```sh
pip install git+https://github.com/blue-monk/csv-diff-python3
```
It may be safer to install it in a virtual environment created with venv.

### Manual installation

Place `csvdiff.py` in any directory on a machine where Python 3 is installed.
It is easier to use if you place it in a directory on your PATH.

## :herb: Run

### If installed with pip

```sh
$ csvdiff3 -h
```

### If installed manually

```sh
$ python csvdiff.py -h
```
or
```sh
$ chmod +x csvdiff.py
$ ./csvdiff.py -h
```

## :herb: How to use

See the [Wiki](https://github.com/blue-monk/csv-diff-python3/wiki) for more details.
* [Wiki/Command](https://github.com/blue-monk/csv-diff-python3/wiki/Command)
* [Wiki/How to use](https://github.com/blue-monk/csv-diff-python3/wiki/How-to-use)

### Get help
```sh
$ ./csvdiff.py -h
```

### One example

Here is an example using the following sample data in `appendix/csv_samples/`.
See the [Wiki/How to use](https://github.com/blue-monk/csv-diff-python3/wiki/How-to-use) for more details.

#### Sample data

Suppose the key columns are column 0 and column 2 (column indices are 0-based).

* sample_lhs.csv
  ```csv
  head1, head2, head3, head4, head5
  key1-2, value1-2, key2-2, value2-2, 20201224T035908
  key1-3, value1-3, key2-3, value2-3, 20201224T180527
  key1-4, value1-4, key2-4, value2-4, 20201225T104851
  key1-5, value1-5, key2-5, value2-5, 20201225T142142
  ```

* sample_rhs.csv
  ```csv
  head1, head2, head3, head4, head5
  key1-1, value1-1, key2-1, value2-1, 20210108T142358
  key1-2, value1-3, key2-2, value2-z, 20210108T174216
  key1-4, value1-4, key2-4, value2-4, 20210109T090245
  key1-5, value1-v, key2-5, value2-5, 20210109T111231
  ```

#### Example of use

To view the contents of only the lines that differ, use the `-d` (`--show-difference-only`) option.
If you also want to see the number of differences, add the `-c` (`--show-count`) option.
The command below is run from `appendix/csv_samples/`.

```sh
$ ../../src/csvdiff3/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -dc

============ Report ============

* Differences
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv sample_rhs.csv Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 > 2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908'] ! 3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216'] @ [1, 3, 4]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527'] <
4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851'] ! 4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245'] @ [4]
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142'] ! 5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231'] @ [1, 4]

* Count & Row number
same lines : 0
left side only (<): 1 :-- Row Numbers -->: [3]
right side only (>): 1 :-- Row Numbers -->: [2]
with differences (!): 3 :-- Row Number Pairs -->: [(2, 3), (4, 4), (5, 5)]
```
* Differences are indicated by the following DIFF-MARKs
    * `!` : There is a difference
    * `<` : Exists only on the left side
    * `>` : Exists only on the right side

* The number displayed before each CSV line is its line number in the actual file
    * Line numbers are 1-based

* For rows with differences, the indices of the columns that differ are displayed after `@`
    * Column indices are 0-based

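The report above can be approximated with a single pass over the two key-sorted files. This sketch is not csvdiff.py's code. `numbered_rows`, `merge_diff` and its `ignore_cols` parameter are hypothetical names. The sketch also assumes two things:

* Keys are compared as plain strings.
* Both rows of a matched pair have the same number of columns.

Because it walks both files once and only holds the current row from each side, it relies on the input being sorted by the key columns.

```python
import csv
import io

# Copies of the sample files shown above (a space follows each comma).
SAMPLE_LHS = """head1, head2, head3, head4, head5
key1-2, value1-2, key2-2, value2-2, 20201224T035908
key1-3, value1-3, key2-3, value2-3, 20201224T180527
key1-4, value1-4, key2-4, value2-4, 20201225T104851
key1-5, value1-5, key2-5, value2-5, 20201225T142142
"""

SAMPLE_RHS = """head1, head2, head3, head4, head5
key1-1, value1-1, key2-1, value2-1, 20210108T142358
key1-2, value1-3, key2-2, value2-z, 20210108T174216
key1-4, value1-4, key2-4, value2-4, 20210109T090245
key1-5, value1-v, key2-5, value2-5, 20210109T111231
"""


def numbered_rows(text):
    """Hypothetical helper: yield (line_number, row) for each data row.

    Line numbers are 1-based and the header is line 1; this assumes no field
    contains an embedded newline.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header line
    yield from enumerate(reader, start=2)


def merge_diff(lhs, rhs, key_cols, ignore_cols=()):
    """Hypothetical sketch, not csvdiff.py's implementation.

    Walks two key-sorted streams of (line_number, row) once and yields
    (mark, lhs_line, rhs_line, diff_cols) with mark in '<', '>', '!', '='.
    """
    def key(item):
        return [item[1][c] for c in key_cols]  # keys compared as plain strings

    lhs, rhs = iter(lhs), iter(rhs)
    left, right = next(lhs, None), next(rhs, None)
    while left is not None or right is not None:
        if right is None or (left is not None and key(left) < key(right)):
            yield "<", left[0], None, []  # key exists only on the left
            left = next(lhs, None)
        elif left is None or key(right) < key(left):
            yield ">", None, right[0], []  # key exists only on the right
            right = next(rhs, None)
        else:
            # Same key on both sides: collect 0-based indices of differing columns.
            cols = [i for i, (a, b) in enumerate(zip(left[1], right[1]))
                    if a != b and i not in ignore_cols]
            yield ("!" if cols else "="), left[0], right[0], cols
            left, right = next(lhs, None), next(rhs, None)


for mark, lhs_line, rhs_line, cols in merge_diff(
        numbered_rows(SAMPLE_LHS), numbered_rows(SAMPLE_RHS), key_cols=[0, 2]):
    print(mark, lhs_line, rhs_line, cols)
```

With the sample data, the loop prints `> None 2 []`, `! 2 3 [1, 3, 4]`, `< 3 None []`, `! 4 4 [4]` and `! 5 5 [1, 4]`. These line up with the DIFF-MARKs, line numbers and `@` column indices in the report above. Passing `ignore_cols={4}` excludes the timestamp column, so the pair (4, 4) becomes `=`.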
## :herb: Notices

* *For large amounts of data*

  For a horizontal report, all lines are scanned in advance to collect the information needed to format the report, so it takes longer than a vertical report.
  For large amounts of data, consider using a vertical report.

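As a rough illustration of why a side-by-side layout costs an extra pass, here is a hypothetical formatter, not csvdiff.py's code. The left column can only be padded once the widest left-hand cell is known. So every line has to be read before the first one is printed. A vertical layout has no shared column to align, so it can print each difference as soon as it is found.

```python
def horizontal_report(lines):
    """Hypothetical side-by-side formatter.

    lines: list of (left_text, mark, right_text); left_text is '' when the row
    exists only on the right side.
    """
    # Pass 1: scan every line to find the widest left-hand cell.
    width = max((len(left) for left, _, _ in lines), default=0)
    # Pass 2: pad the left cell so the DIFF-MARKs line up in one column.
    return [f"{left.ljust(width)} {mark} {right}".rstrip()
            for left, mark, right in lines]


for line in horizontal_report([
    ("2 ['key1-2', 'value1-2']", "!", "3 ['key1-2', 'value1-3']"),
    ("", ">", "2 ['key1-1', 'value1-1']"),
    ("3 ['key1-3', 'value1-3']", "<", ""),
]):
    print(line)
```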
## :herb: Known Issues

* *Workaround for files with only one line*

  If a CSV file contains only one line, that line is recognized as a header.
  Specify the `-H n` option so that the file is treated as CSV without a header.

## :herb: Contributing

### Reporting Bugs / Feature Requests

You are welcome to use the GitHub issue tracker to report bugs or suggest features.

## :herb: License

csv-diff-python3 is released under the MIT license. Please read the [LICENSE](LICENSE) file for more information.