Cloc vs Scc giving different results - number of code files, lines of code etc. #580

Open
HariSekhon opened this issue Feb 6, 2025 · 1 comment


HariSekhon commented Feb 6, 2025

Describe the bug

cloc is finding far more files and lines of code than scc when running both on the command line.

The discrepancy was large enough (cloc finds nearly twice as many Swift files and roughly twice as many lines of Swift code) that I even tried --exclude-dir=.git in case cloc was simply counting the .git directory.

For the first repo, cloc reports 2.7M lines of code vs 820k from scc, which is what caught my eye.

To Reproduce

I first noticed this on a work repo, which I show first; I then reproduce it on public open-source repos further below:

$ cloc --exclude-dir=.git .
   16615 text files.
   10866 unique files.
    9006 files ignored.

github.com/AlDanial/cloc v 2.02  T=6.48 s (1676.7 files/s, 471886.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
YAML                            84             11             55         924650
JSON                          1377             10              0         800474
Swift                         4906          66126         109653         398884
XML                           1376           1274            384         197462
C/C++ Header                  1780          38122          85669         145296
Objective-C                    688          19001          16382          98319
Python                         107           4018           5766          56489
Markdown                       114           7956              2          23243
Objective-C++                   65           3264           2005          18098
C                               34           2243           1646          10394
C++                             38           1237           1012           8122
Bourne Shell                    49           1041            478           5751
SVG                            236              1              0           2522
Ruby                             5             88             39            405
CSS                              3             91             25            393
Text                             2             19              0             42
JavaScript                       2              2             22              3
-------------------------------------------------------------------------------
SUM:                         10866         144504         223138        2690547
-------------------------------------------------------------------------------
$ scc .
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Swift                     2628    243515    23026     27060   193429      16431
JSON                      1220     38046        9         0    38037          0
SVG                          7        72        0         0       72          0
YAML                         6    589641       11        55   589575          0
Shell                        3        52       11        14       27          0
C Header                     2       122       31        54       37          0
Markdown                     2        43       17         0       26          0
Gemfile                      1         3        1         0        2          0
Objective C                  1       678      147        17      514         63
───────────────────────────────────────────────────────────────────────────────
Total                     3870    872172    23253     27200   821719      16494
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $31,049,515
Estimated Schedule Effort (organic) 50.75 months
Estimated People Required (organic) 54.36
───────────────────────────────────────────────────────────────────────────────
Processed 65746657 bytes, 65.747 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
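
To see where the gap in the work repo sits, the per-language file counts can be put side by side. This is a minimal Python sketch using only numbers copied from the two tables above. The pairing of language names between the tools (for example `C/C++ Header` with `C Header`) is an assumption made for illustration, and `file_gaps` is a hypothetical helper name.

```python
# File counts per language, copied from the cloc and scc tables above.
CLOC_FILES = {"YAML": 84, "JSON": 1377, "Swift": 4906, "C/C++ Header": 1780,
              "Objective-C": 688, "SVG": 236, "Markdown": 114, "Bourne Shell": 49}
SCC_FILES = {"YAML": 6, "JSON": 1220, "Swift": 2628, "C Header": 2,
             "Objective C": 1, "SVG": 7, "Markdown": 2, "Shell": 3}

# Assumed mapping from cloc language names to scc language names.
NAME_MAP = {"C/C++ Header": "C Header", "Objective-C": "Objective C",
            "Bourne Shell": "Shell"}


def file_gaps(cloc, scc, name_map):
    """Return (language, cloc_files, scc_files, gap) rows, largest gap first."""
    rows = []
    for lang, n_cloc in cloc.items():
        n_scc = scc.get(name_map.get(lang, lang), 0)
        rows.append((lang, n_cloc, n_scc, n_cloc - n_scc))
    return sorted(rows, key=lambda row: row[3], reverse=True)


gaps = file_gaps(CLOC_FILES, SCC_FILES, NAME_MAP)
for lang, n_cloc, n_scc, gap in gaps:
    print(f"{lang:<14} cloc={n_cloc:>5} scc={n_scc:>5} gap={gap:>5}")
```

Whole categories almost vanish in the scc run (1780 C/C++ header files vs 2, 688 Objective-C files vs 1), which would be consistent with entire directories being skipped rather than individual lines being classified differently.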

I don't have any public Swift repos, so I reproduced it on my other public GitHub repos instead. The discrepancy is much smaller there, a few thousand lines (77k from cloc vs 81k from scc), compared to the work repo above:

git clone https://github.com/HariSekhon/DevOps-Bash-tools bash-tools
cd bash-tools
$ cloc --exclude-dir=.git .
    1712 text files.
    1613 unique files.                                          
     101 files ignored.

github.com/AlDanial/cloc v 2.02  T=0.26 s (6117.1 files/s, 524832.2 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Bourne Shell                   1426          22989          33653          59446
JSON                              8              0              0           7281
Text                             42            254              0           4021
YAML                             81            442           1746           2961
Markdown                         12            377             33           1920
XML                               8              0              0            827
Bourne Again Shell               19            263            767            503
make                              2             92             51            323
INI                               1             13              0             72
Groovy                            7             22            156             20
Expect                            1              2              1             17
Properties                        3              9             21             15
Python                            1              8             23             12
Ruby                              1              1             23              7
SQL                               1              2             15              4
--------------------------------------------------------------------------------
SUM:                           1613          24474          36489          77429
--------------------------------------------------------------------------------
$ scc . 
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Shell                     1393    113681    19154     32796    61731       9161
YAML                        80      5233      454      1779     3000          0
Plain Text                  42      4275      254         0     4021          0
BASH                        16      1299      218       625      456         98
Markdown                    12      2330      377         0     1953          0
JSON                         8      7281        0         0     7281          0
Groovy                       7       198       22       156       20          0
XML                          5       351        0         0      351          0
Zsh                          5       294       58       200       36          4
Properties File              3        45        9        21       15          0
Makefile                     2       466       92        51      323         22
Autoconf                     1       865      143        95      627         78
Bitbucket Pipeline           1        38        5        24        9          0
Docker ignore                1      4007     1022      1528     1457          0
Gemfile                      1        33        6        23        4          0
License                      1         7        3         0        4          0
Python                       1        43        1        20       22          0
Ruby                         1        31        1        24        6          0
SQL                          1        21        2        15        4          0
Vim Script                   1       797       98       252      447         28
───────────────────────────────────────────────────────────────────────────────
Total                     1582    141295    21919     37609    81767       9391
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $2,752,975
Estimated Schedule Effort (organic) 20.21 months
Estimated People Required (organic) 12.10
───────────────────────────────────────────────────────────────────────────────
Processed 4365112 bytes, 4.365 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────

The code line count is also off by a couple of thousand lines in my DevOps-Python-tools repo (27k from cloc vs 29k from scc):

git clone https://github.com/HariSekhon/DevOps-Python-tools pytools
cd pytools
$ cloc . 
     402 text files.
     347 unique files.                                          
      56 files ignored.

github.com/AlDanial/cloc v 2.02  T=0.12 s (2982.2 files/s, 352022.7 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Python                          124           2743           5264          12484
JSON                             26              0              0           5471
Bourne Shell                     83           1322           1819           4467
YAML                             70            371           1419           2802
Markdown                          7            183             23            614
XML                               4              0              3            566
Text                             15             79              0            367
make                              2             40             73            165
Bourne Again Shell                2             41            129             78
INI                               2             14              3             77
Pig Latin                         2             36             88             46
Expect                            1              2             14             31
Jinja Template                    1              1              0             17
TOML                              1              8              0             17
Properties                        2              9             20             14
CSV                               4              0              0             10
Ruby                              1              1             23              6
--------------------------------------------------------------------------------
SUM:                            347           4850           8878          27232
--------------------------------------------------------------------------------
$ scc .
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Python                     124     20381     1125      4655    14601       1828
Shell                       83      7608     1299      1896     4413        579
YAML                        69      4554      366      1395     2793          0
JSON                        27      5506        0         0     5506          0
Plain Text                  16       448       79         0      369          0
Markdown                     7       820      183         0      637          0
CSV                          4        10        0         0       10          0
Makefile                     4       322       46        97      179         28
XML                          4       569        0         3      566          0
BASH                         2       248       41       131       76         17
Properties File              2        43        9        20       14          0
Bitbucket Pipeline           1        38        5        24        9          0
Expect                       1        47        2        14       31          1
INI                          1         9        1         3        5          0
Jinja                        1        18        1         0       17          0
License                      1         7        3         0        4          0
Ruby                         1        30        1        24        5          0
TOML                         1        25        8         0       17          0
───────────────────────────────────────────────────────────────────────────────
Total                      349     40683     3169      8262    29252       2453
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $935,532
Estimated Schedule Effort (organic) 13.41 months
Estimated People Required (organic) 6.20
───────────────────────────────────────────────────────────────────────────────
Processed 1408944 bytes, 1.409 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────

I've tried it on some of my other smaller simpler public GitHub repos like Jenkins (Groovy)
and GitHub-Actions (YAML) and the results are very close in those cases.

I decided to try a random public Swift repo in case the effect was more pronounced there. There is a discrepancy there too, 52k from cloc vs 58k from scc:

git clone https://github.com/tensorflow/swift
cd swift
$ cloc . 
      88 text files.
      82 unique files.                              
      20 files ignored.

github.com/AlDanial/cloc v 2.02  T=0.09 s (878.8 files/s, 657474.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Jupyter Notebook                15              0           6111          43429
Markdown                        35           1721              0           5235
Swift                           27            373            625           3610
YAML                             2              5              1             90
Bourne Shell                     2             14             26             64
Dockerfile                       1              5              5             33
-------------------------------------------------------------------------------
SUM:                            82           2118           6768          52461
-------------------------------------------------------------------------------
$ scc . 
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Markdown                    35      6956     1721         0     5235          0
Swift                       27      4608      370       625     3613        424
Jupyter                     15     49540        0         0    49540          0
Shell                        2       104       14        28       62          2
YAML                         2        96        5         1       90          0
Dockerfile                   1        43        5         5       33          4
License                      1       201       32         0      169          0
───────────────────────────────────────────────────────────────────────────────
Total                       83     61548     2147       659    58742        430
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop (organic) $1,945,321
Estimated Schedule Effort (organic) 17.71 months
Estimated People Required (organic) 9.76
───────────────────────────────────────────────────────────────────────────────
Processed 2637520 bytes, 2.638 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
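
For this repo the gap can be broken down per language. A minimal Python sketch, again using only the code counts from the two tables above; the name pairing (`Jupyter Notebook` with `Jupyter`, `Bourne Shell` with `Shell`) is an assumption, and `code_deltas` is a hypothetical helper name.

```python
# Code lines per language, copied from the cloc and scc tables above.
CLOC_CODE = {"Jupyter Notebook": 43429, "Markdown": 5235, "Swift": 3610,
             "YAML": 90, "Bourne Shell": 64, "Dockerfile": 33}
SCC_CODE = {"Markdown": 5235, "Swift": 3613, "Jupyter": 49540, "Shell": 62,
            "YAML": 90, "Dockerfile": 33, "License": 169}

# Assumed mapping from scc language names back to cloc language names.
SCC_TO_CLOC = {"Jupyter": "Jupyter Notebook", "Shell": "Bourne Shell"}


def code_deltas(cloc, scc, scc_to_cloc):
    """Return {scc language: scc code - cloc code}, leaving out zero deltas."""
    deltas = {}
    for lang, scc_code in scc.items():
        cloc_code = cloc.get(scc_to_cloc.get(lang, lang), 0)
        if scc_code != cloc_code:
            deltas[lang] = scc_code - cloc_code
    return deltas


deltas = code_deltas(CLOC_CODE, SCC_CODE, SCC_TO_CLOC)
print(deltas)                # {'Swift': 3, 'Jupyter': 6111, 'Shell': -2, 'License': 169}
print(sum(deltas.values()))  # 6281, which equals 58742 - 52461
```

On these numbers the License file contributes 169 lines, and the largest single term is 6111 notebook lines, exactly the count that cloc reports as comments for Jupyter and that scc reports as code.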

Expected behavior

Expected the results from both tools to be closer than they were, especially for the work repo.

I appreciate there may be small differences in how the two tools count, and the numbers only need to give a ballpark idea rather than be perfectly accurate, but I'm trying to understand why the gap can be thousands of lines or, in the first example, about 1.87M lines.

Desktop (please complete the following information):

  • OS: macOS
  • Version 14.1

boyter (Owner) commented Feb 9, 2025

I don't have time right now to look through this, but I suspect it comes down to either symlinks or proper .ignore/.gitignore file support. I am not sure if cloc does this correctly.
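
One way to check the symlink theory is to list the symlinks in the repo and see whether any point at large directories. A minimal Python sketch using only the standard library; `find_symlinks` is a hypothetical helper name, and this does not reproduce either tool's actual traversal logic.

```python
import os


def find_symlinks(root):
    """Return sorted paths, relative to root, of all symlinks under root.

    os.walk does not follow directory symlinks by default, so each link is
    reported once instead of being traversed.
    """
    links = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # skip Git metadata
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append(os.path.relpath(path, root))
    return sorted(links)


if __name__ == "__main__":
    for link in find_symlinks("."):
        print(link, "->", os.readlink(link))
```

For the ignore-file theory, rerunning scc with its `--no-ignore` and `--no-gitignore` flags, or cloc with `--vcs=git`, should bring the two counts closer together if ignored files are the cause.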

Looking at the last example though, the difference comes down to the license file being picked up by scc, which is not supported by cloc. Note that the two tools support different languages, so there will be differences there. Also, because cloc uses regex to count code, it can be wrong from time to time. scc, by contrast, uses a small state machine, as it is designed to be more accurate.
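
The class of error that a per-line regex can make, and that carrying state across lines avoids, can be shown with a toy example. This is a minimal Python sketch and not how cloc or scc are actually implemented; both function names are hypothetical, and the state machine only tracks Swift-style triple-quoted strings.

```python
import re

LINE_COMMENT = re.compile(r"^\s*//")


def count_code_regex(lines):
    """Per-line regex: a non-blank line counts as code unless it starts with //."""
    return sum(1 for line in lines
               if line.strip() and not LINE_COMMENT.match(line))


def count_code_state_machine(lines):
    """Carry one bit of state across lines: are we inside a triple-quoted string?"""
    in_string = False
    code = 0
    for line in lines:
        stripped = line.strip()
        # A leading // is only a comment when we are outside a string literal.
        is_comment = stripped.startswith("//") and not in_string
        if stripped and not is_comment:
            code += 1
        # An odd number of triple-quote delimiters toggles the string state.
        if not is_comment and line.count('"""') % 2 == 1:
            in_string = not in_string
    return code


SAMPLE = [
    'let banner = """',
    '// printed verbatim, not a comment',
    '"""',
    '// a real comment',
    'print(banner)',
]

print(count_code_regex(SAMPLE))          # 3
print(count_code_state_machine(SAMPLE))  # 4
```

The regex version miscounts the second line as a comment because it looks at each line in isolation, while the state machine knows that line sits inside a string literal.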
