SLOC the Web
Notes on trying to measure the Web's client, aka the browser
Published: Saturday, Jul 4, 2020 Last modified: Wednesday, Oct 2, 2024
Since my video on measuring the SLOC using ohcount, I’ve been wanting to tackle the elephant in the room. The Web!!
You can moan about the Web all day, but if you are not looking at the codebase, then you’re just a consumer and you’re not really helping. Don’t worry, I’ve been guilty of this too, though I’m trying to change.
The gist of SLOC is that if there is more code, there is more complexity. As The Notorious B.I.G. said: Mo Code, Mo Problems.
There are two thorny issues to be aware of whilst counting, before we get started:
- Tests, I made a testbloat.sh script to rip them out
- Dependencies (look at Arch’s PKGBUILD for some clues)
Firefox & Chromium actually have few dependencies compared to surf’s Webkit. This suggests that {Firefox,Chromium} pull a lot of dependencies into their source distributions and compile them in. Hence we should see a much bigger SLOC for {Firefox,Chromium}…
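The real testbloat.sh is not reproduced in these notes, so the following is only a minimal sketch of the idea. It assumes tests live in directories literally named test, tests or testing, which is a guess since every project lays its tree out differently, and the function name size_with_without_tests is made up for illustration. It reports sizes in kilobytes rather than du's human-readable units so the subtraction is exact.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the author's testbloat.sh.
# Assumption: tests live in directories named test, tests or testing.
size_with_without_tests() {
	src=$1
	# Size of the whole tree in kilobytes
	total=$(du -sk "$src" | cut -f1)
	# -prune stops find descending into a matched directory, so nested
	# test directories are not counted twice
	tests=$(find "$src" -type d \( -name test -o -name tests -o -name testing \) \
		-prune -exec du -sk {} + | awk '{ s += $1 } END { print s + 0 }')
	echo "${total}K src with tests"
	echo "$((total - tests))K src without tests"
}
```

Something like `size_with_without_tests chromium-83.0.4103.116` would then print the two sizes without modifying the tree.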
Blink
- Used in the market share leader “Chrome” (Chromium in OSS) by Google
- Hard Fork of Webkit in 2013, though I can’t find/trace a single similar file between Webkit and Blink in 2020
- Webkit can be seen to have merged in 2001
git clone https://chromium.googlesource.com/chromium/src blink
Warning: Takes forever
- Web view: https://chromium.googlesource.com/chromium/src.git/+/refs/heads/master/content/
- Archlinux package
testbloat.sh chromium:
6.2G src with tests
3.7G src without tests
Flamegraph via https://github.com/brendangregg/FlameGraph
Webkit
- Originated from Konqueror and the KDE project (grep for kde.org), Apple hired Antti Koivisto
git clone git://git.webkit.org/WebKit-https.git WebKit
- Web view: https://trac.webkit.org/browser
- Archlinux package
testbloat.sh webkit2gtk:
227M src with tests
201M src without tests
Gecko
- Originated from Netscape in 1998
- Used in Firefox
git clone git@github.com:mozilla/gecko-dev.git
Note: the VCS backend is actually Mercurial?!
- Web view: https://github.com/mozilla/gecko-dev
- Archlinux package
testbloat.sh firefox:
2.7G src with tests
1.2G src without tests
More than half the code base is… tests!
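To put numbers on that, the share of each tree that the test pruning removed can be worked out from the sizes listed above. This is a small sketch; the function name test_share is mine, and the inputs are the rounded du figures, so treat the percentages as approximate.

```shell
#!/bin/sh
# Share of a tree made up of tests, from its size with and without them.
# Both sizes must be in the same unit (G or M); the unit cancels out.
test_share() {
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f%%\n", (a - b) / a * 100 }'
}

test_share 6.2 3.7   # chromium, sizes in G
test_share 227 201   # webkit2gtk, sizes in M
test_share 2.7 1.2   # firefox, sizes in G
```

That works out to roughly 40% for Chromium, 11% for webkit2gtk and 56% for Firefox.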
Source distribution
Using Archlinux, I grabbed the {firefox,webkit2gtk,chromium} distributions like so:
asp checkout $1
cd $1/trunk
makepkg -os --skippgpcheck
cd src/$1*
Let’s look at:
- number of files
find chromium-83.0.4103.116/ -type f | wc -l
- cloc’s number of files
- number of lines
find chromium-83.0.4103.116 -type f -exec cat {} + | wc -l
- cloc’s number of lines
I chose cloc over ohcount since it was faster. It ignores files that it doesn’t deem to be source. Tbh, I think every file in a source distribution should be considered source, since pruning non-source files, tests and documentation is a slippery slope!
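To see how much the choice of “what counts as source” moves the number, here is a sketch that counts lines twice: once over every file, as in the find pipeline above, and once skipping files that grep considers binary. This is only a rough stand-in for cloc’s filtering (cloc works from the languages it recognises and also drops comments and blank lines), and both function names are made up for illustration.

```shell
#!/bin/sh
# Lines in every file: the same pipeline used for the raw count.
count_all_lines() {
	find "$1" -type f -exec cat {} + | wc -l
}

# Lines in files grep does not consider binary.
# -I treats binary files as non-matching, -c counts matching lines,
# -H forces the "file:count" form, and the empty pattern matches any line.
count_text_lines() {
	find "$1" -type f -exec grep -IcH '' {} + |
		awk -F: '{ s += $NF } END { print s + 0 }'
}
```

Running both on the same tree, for example `count_all_lines firefox-78.0.1` and `count_text_lines firefox-78.0.1`, gives a feel for how much of the raw total is not text at all.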
I was surprised to see chromium/blink to be more LOC than Firefox!
Thanks to Stack Overflow for tips on how to plot the above.
Git checkout
From which the source distribution (x86 target) is somehow 🤷 derived!
So with the git history, the projects weigh in at:
hendry@knuckles ~/sloc $ du -sh *
11G WebKit
26G blink
5.6G gecko
$Data <<EOD
browser srcurl rev commitcount files lines
WebKit git://git.webkit.org/WebKit-https.git 3a2f99102aca947abcf9f70d0785dc3e5c073560 226748 310036 43304162
blink https://chromium.googlesource.com/chromium/src fa66724154f74bab505fe38c4b6d0d31b5a83ed0 906439 330146 42244206
gecko git@github.com:mozilla/gecko-dev.git 668686ae0504450a8c93501d1eb115f201eb982d 716280 281292 43547855
EOD
set terminal svg
set datafile separator ' '
set title 'git source'
set yrange [1000:*]
set logscale y
set ytics format "%.0s%c"
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.9
set key left top
plot $Data u 4:xtic(1) ti col,\
'' u 5 ti col,\
'' u 6 ti col
It’s an incredible coincidence how the sloc (without .git) is roughly the same between the three git checkouts!
blink/ 42244206
gecko/ 43547855
WebKit/ 43298218
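To quantify “roughly the same”, the gap between the smallest and the largest of the three counts can be worked out directly. A small awk sketch over the figures above (the function name spread_percent is mine):

```shell
#!/bin/sh
# Percentage by which the largest count exceeds the smallest.
# Reads "name count" pairs on stdin.
spread_percent() {
	awk 'NR == 1 { min = max = $2 }
	     { if ($2 < min) min = $2; if ($2 > max) max = $2 }
	     END { printf "%.1f\n", (max - min) / min * 100 }'
}

spread_percent <<EOD
blink/ 42244206
gecko/ 43547855
WebKit/ 43298218
EOD
```

The three checkouts sit within about 3% of each other.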
Data was generated by:
hendry@knuckles ~/sloc $ cat wc.sh
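# Header row; gnuplot uses these column names as key titles
echo browser srcurl rev commitcount files lines
for i in WebKit blink gecko
do
# Left unquoted below on purpose so it splits into two arguments
G="--git-dir ./$i/.git"
commit=$(git $G rev-parse HEAD)
commitcount=$(git $G rev-list --count "$commit")
srcurl=$(git $G config --get remote.origin.url)
# Count every file and every line outside .git, with no source/test filtering
files=$(find "$i/" -not -path '*/\.git/*' -type f | wc -l)
lines=$(find "$i/" -not -path '*/\.git/*' -type f -exec cat {} + | wc -l)
echo "$i" "$srcurl" "$commit" "$commitcount" "$files" "$lines"
done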
TODO: Check above
Blink
Webkit
Gecko
LOC over time
I’m also keen to examine their sloc over time using a tool I wrote https://github.com/kaihendry/graphsloc
Blink
Can anyone make collect-stats.sh faster? These took DAYS to gather the data.
45M lines of code when I add up all the lines of all the commits. However, that falls considerably short of the 100M from find chromium-83.0.4103.116 -type f -exec cat {} + | wc -l. If you look at cloc’s analysis of 49M, though… it’s pretty close to 45M!
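collect-stats.sh is not reproduced here, so as a rough cross-check of the “add up all the commits” total, here is a sketch that sums added minus removed lines over the whole history in one pass. Treating that net figure as comparable to the total above is my assumption, not something taken from the script, and the function name net_lines is made up. It relies on git log --numstat, which prints added and removed counts per file for each commit.

```shell
#!/bin/sh
# Net lines (added minus removed) summed over every commit reachable from HEAD.
# Assumption: this approximates the "add up all the commits" total; it is not
# taken from collect-stats.sh. Binary files show up as "-" and count as 0.
# Merge commits print no numstat lines by default, so they contribute nothing.
net_lines() {
	git -C "$1" log --numstat --format= |
		awk 'NF >= 3 { added += $1; removed += $2 } END { print added - removed }'
}
```

Usage would be `net_lines blink`. It still has to walk every commit, so expect it to be slow on a history of this size.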
Gecko
I’m not quite sure why there are horizontal lines. Do please look at the gecko source CSV and collect-stats.sh.
The total is ~130M, but when I look at just the firefox-78.0.1 source distribution… it’s just 43M, or if you just want to look at source files with cloc… 23M. I suspect the git-Mercurial bridge is problematic.
Webkit
40M doesn’t come close to the 5-3M of code I got from the https://webkitgtk.org sources… that’s because a release is cut from the git repo, a process documented at https://trac.webkit.org/wiki/WebKitGTK/Releasing
Concluding remarks
Firefox has the cleanest git to source mapping. Firefox has Rust / Python tool chains in the source that bloat it quite a bit. Firefox has the largest amount of sloc in git, but Chromium has more in its source distribution because Blink’s git repo is not a “monorepo” like Firefox’s. For example, their JS runtime (V8?) is not in the Blink git repo.
Webkit is just a kit/library, and whilst it is the smallest code base, there are a lot of dynamically linked dependencies that are difficult to sloc / measure.
Blink’s source, especially when you take gclient et al. into consideration, is expansive and firmly under the grip of Google.
Sidenote: I wanted to follow the Webkit source through Blink, for example git log --follow -- ./third_party/blink/renderer/core/layout/layout_table_cell.h | tig, but it appears firmly forked!