Context (otherwise known as the why)
After roughly a week of using NixOS full time on my laptop (and about a month on my desktop), I have been amazed at how smooth the experience has generally been, despite most accounts from new NixOS users complaining about the “horrible” switch. Yes, it’s a different way to think about using a computer, but after mucking about for a bit (and lots and lots of reading tutorials and whatnot) it really isn’t that bad, and I can get pretty much everything I need done with low to moderate effort.
One thing I have noticed (at least on the laptop) is that boot times are quite long for a machine with an NVMe SSD. Running sudo systemd-analyze gives the total boot time (firmware, loader, kernel and userspace combined): 49 seconds!
sudo systemd-analyze
Startup finished in 5.340s (firmware) + 1.997s (loader) + 9.314s (kernel) + 32.873s (userspace) = 49.525s
graphical.target reached after 4.987s in userspace.
Obviously this is far too long for a modern system, and while I don’t mind the startup time, I think we can make it much better. I haven’t started the process yet, so hopefully the rest of the post has a positive outcome. I should mention that this adventure started because, in my quest for maximum free RAM (see my previous two posts if interested), I had noticed that syncthing was always starting at boot and using more RAM than I wanted. While I was (and still am) trying to find a way to stop it running at boot, I stumbled across the systemd-analyze command, and so here we are.
The process (otherwise known as the how)
The first logical step, I think, is to not show the generation picker at boot, so let’s add boot.loader.timeout = 0 to the Nix configuration file. The picker shows for 5 seconds by default, so this line should shave up to 5 seconds (about 10%) off the boot time! Let’s run some more systemd-analyze commands and see where else we can shave some time off.
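For reference, here is that change as a minimal NixOS configuration fragment. It is a sketch assuming the systemd-boot loader (an assumption; the post doesn’t say which loader is in use), and the comments describe systemd-boot’s behaviour.

```nix
{
  # Boot the default (newest) generation immediately instead of showing
  # the generation picker for the default 5 seconds.
  # Assumption (systemd-boot): holding a key such as space while the loader
  # starts should still bring up the menu, so older generations remain
  # reachable for rollbacks.
  boot.loader.timeout = 0;
}
```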
sudo systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
graphical.target @4.987s
└─display-manager.service @4.683s +303ms
  └─systemd-user-sessions.service @4.674s +7ms
    └─nss-user-lookup.target @16.962s
      └─nscd.service @16.931s +30ms
        └─basic.target @3.809s
          └─sockets.target @3.809s
            └─systemd-hostnamed.socket @3.809s
              └─sysinit.target @3.808s
                └─systemd-vconsole-setup.service @4.468s +44ms
                  └─systemd-journald.socket @258ms
                    └─system.slice @226ms
                      └─-.slice @226ms
Based on the above output, display-manager.service (+303ms) looks like the slowest unit on the critical chain, but let’s take a look at the other commands first.
sudo systemd-analyze blame
29.033s nix-gc.service
5.538s fstrim.service
5.099s NetworkManager-wait-online.service
3.573s firewall.service
653ms NetworkManager.service
312ms tlp.service
303ms display-manager.service
274ms mpd.service
264ms dev-disk-by\x2duuid-8cc603c9\x2d273e\x2d4718\x2dad8a\x2d2d3f7a44894f.device
199ms accounts-daemon.service
Here are the 10 slowest-starting units according to systemd-analyze blame. Let’s focus on the top 4, as they are the only ones taking longer than a second. For those curious, you can also run sudo systemd-analyze plot > <insert image name> to get an SVG picture of this, and below is what I get.
I won’t really focus on the picture, as it’s basically the blame output drawn as a timeline, but the bright red bars are mostly the same top 4 units we’re going to focus on. The longest one, nix-gc.service, is essential for keeping the Nix store from growing out of control and taking up all my disk space, so that’s got to stay, sadly. Let’s go through the other three in reverse order, as I think that also runs from easiest to hardest.
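One guess worth checking, which I haven’t verified on this machine: with nix.gc.automatic enabled, nix-gc.service is started by a systemd timer on the schedule in nix.gc.dates, and nix.gc.persistent defaults to true. That means a collection missed while the laptop was switched off runs shortly after the next boot. If that is what happened here, garbage collection can stay while the catch-up at boot goes away. A minimal sketch; the schedule and retention values are placeholders, not my actual settings:

```nix
{
  nix.gc = {
    automatic = true;
    # Placeholder schedule in systemd OnCalendar syntax; pick a time the
    # laptop is usually switched on.
    dates = "Mon 12:00";
    # Placeholder retention policy, passed on to nix-collect-garbage.
    options = "--delete-older-than 14d";
    # Default is true: a run missed while powered off starts right after boot.
    # With false, collections only happen when the timer fires on schedule.
    persistent = false;
  };
}
```

The trade-off is that if the machine is never on at the scheduled time, automatic collection never runs, so with persistent = false the choice of schedule matters more.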
firewall.service: Nothing to do here since my firewall configuration is commented out in my nix config file… off to a wonderful start.
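A possible explanation for why firewall.service still runs: as far as I know, NixOS enables its firewall by default, so commenting out the firewall options in the config doesn’t stop the service from starting. The fragment below only spells out that default; it is shown for illustration, not as a recommended change.

```nix
{
  # NixOS default: the firewall is on unless explicitly disabled, so
  # firewall.service starts even with every other firewall option
  # commented out. Setting this to false would drop the unit from the
  # boot, but leaves the laptop without a packet filter.
  networking.firewall.enable = true;
}
```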
NetworkManager-wait-online.service: According to the NetworkManager documentation, this service is not itself the thing slowing anything down, so further investigation will be required: “In the best case, all services on the system can react to networking changes dynamically and no service orders itself after network-online.target. That way, NetworkManager-wait-online.service has no effect and, for example, does not delay the boot. That means, if the problem is a long boot time related to NetworkManager-wait-online.service, a possible solution is to investigate the services that claim to require network and fix those”. Another failure for now.
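The documentation’s suggestion is to find the units that pull in network-online.target (systemctl list-dependencies --reverse network-online.target lists the units that want or require it) and fix those. A blunter workaround that is commonly used on NixOS is to switch the wait service off altogether. A sketch, with the caveat spelled out in the comment:

```nix
{
  # Stop NetworkManager-wait-online.service from running at all.
  # Caveat: units ordered after network-online.target will no longer wait
  # for a connection, so anything that really needs the network during
  # startup may fail and have to retry.
  systemd.services.NetworkManager-wait-online.enable = false;
}
```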
fstrim.service: The man page says “fstrim is used on a mounted filesystem to discard (or "trim") blocks which are not in use by the filesystem. This is useful for solid-state drives (SSDs) and thinly-provisioned storage.” It also mentions something about an offset, so maybe the following block in my Nix config has something to do with it?
# free up to 10gb when there is less than 1gb available
nix.extraOptions = ''
min-free = ${toString (1024 * 1024 * 1024)}
max-free = ${toString (10 * 1024 * 1024 * 1024)}
'';
After commenting out the above code and setting the generation picker timeout to 0, my system rebuilt fine, so let’s reboot and see what happens.
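Two notes on that experiment, hedged since I haven’t traced them on this machine. First, min-free and max-free only act while Nix is building: if free space in the store drops below min-free during a build, Nix collects garbage until max-free bytes are free (or there is no more garbage), so they are unlikely to be what starts fstrim.service. Second, fstrim.service is normally started by fstrim.timer, which NixOS enables through services.fstrim; as far as I know the upstream util-linux timer runs weekly with Persistent=true, so a missed run is also made up right after boot. A sketch that keeps both features, where the timer override is an assumption:

```nix
{
  # The commented-out limits, written with nix.settings instead of
  # nix.extraOptions.
  nix.settings = {
    min-free = 1024 * 1024 * 1024;       # start collecting below 1 GiB free
    max-free = 10 * 1024 * 1024 * 1024;  # stop once 10 GiB are free
  };

  # Keep the periodic TRIM of mounted filesystems.
  services.fstrim.enable = true;
  # Assumption: overriding the upstream fstrim.timer like this stops a
  # missed weekly run from being started immediately after boot.
  systemd.timers.fstrim.timerConfig.Persistent = false;
}
```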
Results
Good news! The total boot time went from 49.5 seconds to 28.4 seconds, a saving of about 21 seconds (roughly 43%). That is still quite slow for an NVMe SSD, but way faster. Here are the new numbers.
sudo systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
graphical.target @3.682s
└─display-manager.service @3.190s +490ms
  └─systemd-user-sessions.service @3.182s +7ms
    └─nss-user-lookup.target @10.946s
      └─nscd.service @10.910s +35ms
        └─basic.target @2.397s
          └─sockets.target @2.397s
            └─systemd-hostnamed.socket @2.397s
              └─sysinit.target @2.395s
                └─systemd-vconsole-setup.service @3.622s +56ms
                  └─systemd-journald.socket @273ms
                    └─system.slice @233ms
                      └─-.slice @233ms
That looks pretty much identical, so let’s look at the blame output.
sudo systemd-analyze blame
7.798s NetworkManager-wait-online.service
1.939s firewall.service
653ms NetworkManager.service
490ms display-manager.service
442ms mpd.service
361ms tlp.service
318ms network-addresses-vboxnet0.service
291ms dev-disk-by\x2duuid-8cc603c9\x2d273e\x2d4718\x2dad8a\x2d2d3f7a44894f.device
273ms user@1000.service
163ms systemd-udev-trigger.service
The new top 10 slowest-starting units look much more reasonable, and nix-gc.service and fstrim.service no longer appear at all. And finally, the total time spent booting:
sudo systemd-analyze
Startup finished in 8.185s (firmware) + 873ms (loader) + 8.318s (kernel) + 10.980s (userspace) = 28.359s
graphical.target reached after 3.682s in userspace.
Conclusion
As is tradition, thank you for reading! I have kind of run out of time for the day, but I will do some more research and probably write a follow-up post documenting my findings on both the NetworkManager-wait-online.service behaviour and how I manage (if I do) to stop syncthing from starting at boot. I also hope to have the table of contents thing working by the next post!