I haven't got time to look through this right now, but I suspect it comes down to either symlinks or proper .ignore/.gitignore file support. I am not sure whether cloc handles this correctly.
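Both suspects can be checked directly on a checkout. The sketch below is a minimal POSIX shell helper, assuming it runs from the root of a git work tree; the function name list_suspects is made up for illustration. It lists symlinks and files that exist on disk but are ignored by git. It only applies git's own ignore rules (.gitignore, .git/info/exclude and the global excludes file), not a separate .ignore file, and it does not tell you which tool counts what.

```shell
#!/bin/sh
# Hypothetical helper: list files that line counters may treat differently.
# Assumes it is run from the root of a git work tree.
list_suspects() {
  echo "== symlinks (outside .git) =="
  # Prune the .git directory, then print every symbolic link found.
  find . -path ./.git -prune -o -type l -print

  echo "== files on disk that git ignores =="
  # Untracked files matched by .gitignore, .git/info/exclude or the global
  # excludes file. A separate .ignore file is NOT taken into account here.
  git ls-files --others --ignored --exclude-standard
}
```

Running list_suspects in the repository root prints candidates that one tool might count and the other might skip; a long list of ignored YAML or JSON files, for example, would be worth comparing against the per-language rows.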
Looking at the last example (tensorflow/swift) though, the difference comes down to scc picking up the license file, which cloc does not support. Note that the two tools support different sets of languages, so there will be differences there too. Also, because cloc uses regular expressions to count code, it can be wrong from time to time. scc, by contrast, uses a small state machine, as it is designed to be more accurate.
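To illustrate the state-machine idea, here is a toy sketch. It is not scc's or cloc's actual code. It classifies lines for a C-like language with // and /* */ comments and "double-quoted" strings, walking each line character by character in awk inside a shell function (count_lines is a made-up name). Because it tracks whether it is inside a string, a // or /* that appears inside quotes is not mistaken for a comment, which is the kind of case a naive line-by-line pattern match can get wrong. Swift's nested block comments and multi-line strings are deliberately not handled.

```shell
#!/bin/sh
# Toy illustration only, not the real scc or cloc implementation.
# Reads source on stdin and prints blank/comment/code line counts using a
# per-character state machine (states: normal, in string, in block comment).
# Not handled: nested /* */ comments and multi-line strings.
count_lines() {
  awk '
  {
    if ($0 ~ /^[ \t]*$/) { blank++; next }   # whitespace-only line
    has_code = 0
    in_str = 0                               # strings end at the newline
    n = length($0)
    i = 1
    while (i <= n) {
      c = substr($0, i, 1)
      pair = substr($0, i, 2)
      if (in_block) {                        # inside /* ... */
        if (pair == "*/") { in_block = 0; i += 2 } else i++
      } else if (in_str) {                   # inside "..."
        # a backslash skips the escaped character; a quote closes the string
        if (c == "\\") i += 2; else { if (c == "\"") in_str = 0; i++ }
      } else if (pair == "//") {
        break                                # rest of the line is a comment
      } else if (pair == "/*") {
        in_block = 1; i += 2
      } else {
        if (c == "\"") in_str = 1
        if (c != " " && c != "\t") has_code = 1
        i++
      }
    }
    if (has_code) code++; else comment++
  }
  END { printf "blank=%d comment=%d code=%d\n", blank, comment, code }
  '
}
```

Feeding it a file, e.g. count_lines < SomeFile.swift (a hypothetical path), prints a single summary line; a line such as let s = "/* not a comment */" is counted as code rather than opening a block comment.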
Describe the bug
cloc is finding far more files and lines of code than scc when running both on the command line. The discrepancy was so large, with cloc finding nearly twice as many Swift files and lines of code, that I even tried --exclude-dir=.git in case it was something silly like this. What caught my eye was the first repo: cloc reported 2.7M lines of code vs 820k from scc.
To Reproduce
I noticed this on a work repo which I'll show first and then reproduce on open source public repos further below:
$ cloc --exclude-dir=.git .
   16615 text files.
   10866 unique files.
    9006 files ignored.

github.com/AlDanial/cloc v 2.02  T=6.48 s (1676.7 files/s, 471886.4 lines/s)
-------------------------------------------------------------------------------
Language                 files          blank        comment           code
-------------------------------------------------------------------------------
YAML                        84             11             55         924650
JSON                      1377             10              0         800474
Swift                     4906          66126         109653         398884
XML                       1376           1274            384         197462
C/C++ Header              1780          38122          85669         145296
Objective-C                688          19001          16382          98319
Python                     107           4018           5766          56489
Markdown                   114           7956              2          23243
Objective-C++               65           3264           2005          18098
C                           34           2243           1646          10394
C++                         38           1237           1012           8122
Bourne Shell                49           1041            478           5751
SVG                        236              1              0           2522
Ruby                         5             88             39            405
CSS                          3             91             25            393
Text                         2             19              0             42
JavaScript                   2              2             22              3
-------------------------------------------------------------------------------
SUM:                     10866         144504         223138        2690547
-------------------------------------------------------------------------------
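The per-language rows above show where cloc's 2,690,547 code lines come from: YAML (924,650) and JSON (800,474) alone add up to 1,725,124, so those two rows are a natural first place to compare against scc. A small awk sketch makes each language's share explicit. The function name code_share is hypothetical, and the rows are pasted in by hand from the table rather than read from cloc.

```shell
#!/bin/sh
# Hypothetical helper: print each language's share of the "code" column.
# Input rows: <language> <files> <blank> <comment> <code>, as in cloc's table.
code_share() {
  awk '
  {
    lang = $1
    for (i = 2; i <= NF - 4; i++) lang = lang " " $i  # re-join names such as "C/C++ Header"
    n++
    name[n] = lang
    code[n] = $NF                                     # last column is "code"
    total += $NF
  }
  END {
    for (r = 1; r <= n; r++)
      printf "%-15s %9d %5.1f%%\n", name[r], code[r], 100 * code[r] / total
    printf "%-15s %9d\n", "SUM", total
  }
  '
}

# Rows copied by hand from the work repo cloc output above.
code_share <<'EOF'
YAML 84 11 55 924650
JSON 1377 10 0 800474
Swift 4906 66126 109653 398884
XML 1376 1274 384 197462
C/C++ Header 1780 38122 85669 145296
Objective-C 688 19001 16382 98319
Python 107 4018 5766 56489
Markdown 114 7956 2 23243
Objective-C++ 65 3264 2005 18098
C 34 2243 1646 10394
C++ 38 1237 1012 8122
Bourne Shell 49 1041 478 5751
SVG 236 1 0 2522
Ruby 5 88 39 405
CSS 3 91 25 393
Text 2 19 0 42
JavaScript 2 2 22 3
EOF
```

The SUM line reproduces cloc's own total, which is a quick sanity check that the rows were pasted correctly.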
I don't have any public Swift repos, but I can reproduce it on my other public GitHub repos. The discrepancy there is much smaller than for the work repo above, a few thousand lines (cloc 77k vs scc 81k):
git clone https://github.com/HariSekhon/DevOps-Bash-tools bash-tools
cd bash-tools
$ cloc --exclude-dir=.git .
    1712 text files.
    1613 unique files.
     101 files ignored.

github.com/AlDanial/cloc v 2.02  T=0.26 s (6117.1 files/s, 524832.2 lines/s)
--------------------------------------------------------------------------------
Language                 files          blank        comment           code
--------------------------------------------------------------------------------
Bourne Shell              1426          22989          33653          59446
JSON                         8              0              0           7281
Text                        42            254              0           4021
YAML                        81            442           1746           2961
Markdown                    12            377             33           1920
XML                          8              0              0            827
Bourne Again Shell          19            263            767            503
make                         2             92             51            323
INI                          1             13              0             72
Groovy                       7             22            156             20
Expect                       1              2              1             17
Properties                   3              9             21             15
Python                       1              8             23             12
Ruby                         1              1             23              7
SQL                          1              2             15              4
--------------------------------------------------------------------------------
SUM:                      1613          24474          36489          77429
--------------------------------------------------------------------------------
The code line count is also off by a couple of thousand lines in my DevOps-Python-tools repo (cloc 27k vs scc 29k):
git clone https://github.com/HariSekhon/DevOps-Python-tools pytools
cd pytools
$ cloc .
     402 text files.
     347 unique files.
      56 files ignored.

github.com/AlDanial/cloc v 2.02  T=0.12 s (2982.2 files/s, 352022.7 lines/s)
--------------------------------------------------------------------------------
Language                 files          blank        comment           code
--------------------------------------------------------------------------------
Python                     124           2743           5264          12484
JSON                        26              0              0           5471
Bourne Shell                83           1322           1819           4467
YAML                        70            371           1419           2802
Markdown                     7            183             23            614
XML                          4              0              3            566
Text                        15             79              0            367
make                         2             40             73            165
Bourne Again Shell           2             41            129             78
INI                          2             14              3             77
Pig Latin                    2             36             88             46
Expect                       1              2             14             31
Jinja Template               1              1              0             17
TOML                         1              8              0             17
Properties                   2              9             20             14
CSV                          4              0              0             10
Ruby                         1              1             23              6
--------------------------------------------------------------------------------
SUM:                       347           4850           8878          27232
--------------------------------------------------------------------------------
I've also tried it on some of my other smaller, simpler public GitHub repos, such as Jenkins (Groovy) and GitHub-Actions (YAML), and the results are very close in those cases.
I then tried a random public Swift repo in case the difference was more pronounced there, and there is a bit of a discrepancy (cloc 52k vs scc 58k):
git clone https://github.com/tensorflow/swift
cd swift
$ cloc .
      88 text files.
      82 unique files.
      20 files ignored.

github.com/AlDanial/cloc v 2.02  T=0.09 s (878.8 files/s, 657474.7 lines/s)
-------------------------------------------------------------------------------
Language                 files          blank        comment           code
-------------------------------------------------------------------------------
Jupyter Notebook            15              0           6111          43429
Markdown                    35           1721              0           5235
Swift                       27            373            625           3610
YAML                         2              5              1             90
Bourne Shell                 2             14             26             64
Dockerfile                   1              5              5             33
-------------------------------------------------------------------------------
SUM:                        82           2118           6768          52461
-------------------------------------------------------------------------------
Expected behavior
I expected the results from both tools to be closer than they were, especially for the work repo.
I appreciate there may be small differences in how the tools calculate things, and it doesn't have to be perfectly accurate since it's more to give a ballpark idea, but I'm trying to understand why the difference can be thousands of lines, or about 1.8M lines in the first example.
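One way to narrow down a gap of thousands of lines is to compare per-file counts from both tools. cloc and scc both have a --by-file option; turning their output into simple "path code" lines is not shown here and depends on the output format you pick. Assuming two such files already exist (the file names and the function name diff_counts are hypothetical), this sketch uses sort, join and awk to list the files with the largest differences, treating a file missing from one side as 0.

```shell
#!/bin/sh
# Hypothetical helper: compare two "path code" lists (one line per file,
# paths without spaces) and print "difference path countA countB",
# largest difference first. A file missing from one list counts as 0.
diff_counts() {
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
  sort -k1,1 "$1" > "$tmp_a"   # join needs both inputs sorted on the path
  sort -k1,1 "$2" > "$tmp_b"
  join -a 1 -a 2 -e 0 -o 0,1.2,2.2 "$tmp_a" "$tmp_b" |
    awk '{ d = $2 - $3; if (d < 0) d = -d; print d, $1, $2, $3 }' |
    sort -k1,1nr
  rm -f "$tmp_a" "$tmp_b"
}
```

For example, diff_counts cloc-by-file.txt scc-by-file.txt | head -n 20 would show the 20 files with the biggest gaps, which should make it easier to see whether the difference is concentrated in a few large generated files, symlinked paths or ignored directories.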
Desktop:
OS: macOS
Version 14.1