tl;dr code with ELI5 comments: https://gist.github.com/dj-mcculloch/9e097535ea35df8e2ec1e6e32f7f73ac
My media server is a constant target of my need to tinker and perfect. I cannot help but try to make it as fire-and-forget as I possibly can. I just want it to always work. Turns out making something that is reliable (consistently good) and functional (provides utility) is hard — there’s an entire profession dedicated to this (ehlo, SREs).
For my use case I want my Synology DS1819+ NAS to go to sleep, like all the way, when it is not being used for extended periods of time, and to wake up automatically when I need it. I don’t want to waste electricity on 8 disks and some logic boards that hardly do anything between 5AM and 3PM. Sure, I could power the NAS off and on via a schedule, but that’s dumb.
Enter Wake-on-LAN.
How does Wake-on-LAN (WOL) work? When properly configured, devices that support WOL will power on when they receive a special packet called a Magic Packet. On Ubuntu this can be accomplished by running etherwake, which is a utility for sending Magic Packets.
sudo etherwake 00:11:aa:bb:cc:55
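For the curious, the payload of a Magic Packet is simple: 6 bytes of 0xFF followed by the target MAC address repeated 16 times, 102 bytes in total. Here is a minimal sketch that only builds the hex form of that payload so you can see the layout. The function name magic_packet_hex is mine, not part of etherwake, and it does not send anything.

```shell
#!/bin/sh
# Build the hex representation of a Magic Packet payload for a MAC address.
# Layout: 6 bytes of 0xFF, then the 6-byte MAC repeated 16 times (102 bytes).
# magic_packet_hex is a hypothetical helper for illustration only.
magic_packet_hex() {
    # Strip the colons and lowercase the hex digits: 00:11:AA -> 0011aa
    mac_hex=$(printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f')

    # Reject anything that is not exactly 12 hex digits.
    case "$mac_hex" in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#mac_hex}" -eq 12 ] || return 1

    # Synchronization stream, then 16 copies of the MAC.
    payload="ffffffffffff"
    i=0
    while [ "$i" -lt 16 ]; do
        payload="${payload}${mac_hex}"
        i=$((i + 1))
    done
    printf '%s\n' "$payload"
}

# Print the payload for the dummy MAC used in this post.
magic_packet_hex 00:11:aa:bb:cc:55
```

In practice you never need to build this yourself; etherwake does it and puts the result on the wire for you.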
etherwake works by sending an AMD format Magic Packet to the target MAC address; in this case, the dummy MAC address 00:11:aa:bb:cc:55.
So how does this work with NFS mounts in Ubuntu?
If you’re mounting NFS volumes in Ubuntu you could be using autofs. I recently learned that autofs does not handle WOL for you, but it gives you enough flexibility to handle it automatically. autofs describes its mounts and their local paths in /etc/auto.master. For your NFS volume, your auto.master should end up with something like this in it:
/nfs /etc/auto.nfs --timeout=5
That line, /nfs /etc/auto.nfs --timeout=5, tells autofs where to go to find the details on where and how to mount the NFS volume under /nfs.
--timeout=5 tells autofs when to unmount a volume, in this case after 5 seconds of inactivity. 5 seconds may seem quick, but if your NAS enters ACPI S3 power (otherwise known as sleep) and something tries to access /nfs, it may take a long time for /nfs to become unmounted and then mounted again, or it may fail entirely to create a new mount. An aggressive timeout here is crucial for this to work.
/etc/auto.nfs is a map file that details the mount options and target. In general, map entries look something like this:
video -fstype=nfs4,retry=0,timeo=50,hard,intr,tcp 192.168.1.102:/volume1/video
Instead of pointing auto.master to a file that simply describes the NFS volume like the example above, we point it to an executable map: a script that sends a Magic Packet to the NAS (or whatever) to wake the device up, and then returns the options and target for the mount. The full script is in the gist linked at the top.
An important note on executable maps: autofs handles these files differently than normal maps in some not-so-obvious ways.
- autofs passes in a key (sub-directory) as an argument to the executable map
- The executable map must return (echo) the mount options and target, but not the key (sub-directory)
What does this mean? echoing the following…
video -fstype=nfs4,retry=0,timeo=50,hard,intr,tcp 192.168.1.102:/volume1/video
…is illegal. Providing the key video will give you the following error, as seen via sudo journalctl --unit=autofs.service -f:
validate_location: invalid character " " found in location video -fstype=nfs4 192.168.1.102:/volume1/video
So when you’re using an executable map, any key you send to autofs by accessing the mount, for example ls /nfs/video or cd /nfs/fubar, will create the subdirectories video and fubar as long as the executable map returns mount options and a target. Because of this, the script I’ve created will not echo mount options and a target unless the key argument is what we expect.
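To make the rules above concrete, here is a stripped-down sketch of an executable map. It is not the script from the gist, just an illustration of the same idea under a few assumptions: the function and variable names are mine, the values come from the examples in this post, etherwake is assumed to be on the PATH, and automount is assumed to run the script as root (so no sudo).

```shell
#!/bin/sh
# Sketch of an autofs executable map. autofs invokes it with the requested
# key (the sub-directory under /nfs) as $1 and reads the map entry from stdout.
# All names below are illustrative, not taken from the original gist.

# Values taken from the examples in this post; adjust for your setup.
NAS_MAC="00:11:aa:bb:cc:55"
EXPECTED_KEY="video"
MOUNT_OPTS="-fstype=nfs4,retry=0,timeo=50,hard,intr,tcp"
NFS_TARGET="192.168.1.102:/volume1/video"

# Send the Magic Packet. Output is discarded so stdout carries only the entry.
wake_nas() {
    etherwake "$NAS_MAC" >/dev/null 2>&1
}

# Print options and target (never the key) for the expected key only.
map_entry() {
    case "$1" in
        "$EXPECTED_KEY") printf '%s %s\n' "$MOUNT_OPTS" "$NFS_TARGET" ;;
        *) return 1 ;;
    esac
}

main() {
    # Unknown key: print nothing so the lookup fails and no directory appears.
    entry=$(map_entry "$1") || exit 1
    # A failed wake attempt should not stop autofs from trying the mount.
    wake_nas || true
    printf '%s\n' "$entry"
}

# Only act when called with a key, which is how autofs calls it.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

Checking the key before waking the NAS means a stray cd /nfs/fubar neither creates a bogus mount point nor powers on 8 disks for nothing.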
Once you’ve configured all the variables in the executable map, set your executable map to be executable (surprise) via sudo chmod 755, update the file’s ownership via sudo chown root:root, and you’re good to go.
If you’re successful, a quick look at the logs for autofs will show you the executable map at work:
Oct 26 23:48:55 Media systemd[1]: Starting Automounts filesystems on demand...
Oct 26 23:48:55 Media automount[20560]: Starting automounter version 5.1.2, master map /etc/auto.master
Oct 26 23:48:55 Media automount[20560]: using kernel protocol version 5.02
...
Oct 26 23:48:55 Media automount[20560]: mounted indirect on /nfs with timeout 5, freq 2 seconds
Oct 26 23:48:55 Media sudo[20593]: root : TTY=unknown ; PWD=/nfs ; USER=root ; COMMAND=/usr/sbin/etherwake 00:11:aa:bb:cc:55
Oct 26 23:48:55 Media sudo[20593]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 26 23:48:55 Media sudo[20593]: pam_unix(sudo:session): session closed for user root
Nice.
Caution
If you’re like me you probably thought, “I can programmatically grab the MAC address!”
arp -an | grep $nfs_ipv4 | awk '{print $4}'
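In situations where your target NAS (or whatever) has been offline for more than 60 seconds, arp may not have the MAC address of your NAS. arp entries in Ubuntu have a time-to-live (TTL) of roughly 60 seconds. In other words, the lookup tends to fail exactly when you need it: when the NAS is asleep.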
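If you still want the lookup, one option is to treat it as a nice-to-have and fall back to a hard-coded MAC. The sketch below is one way to do that, assuming the usual arp -an line format ("? (IP) at MAC [ether] on IFACE"); the names mac_for_ip and FALLBACK_MAC are hypothetical. It also matches the IP exactly, which the grep one-liner above does not (192.168.1.10 would also match 192.168.1.102).

```shell
#!/bin/sh
# Hard-coded MAC used whenever the ARP cache has no usable entry.
# FALLBACK_MAC and mac_for_ip are illustrative names, not from the gist.
FALLBACK_MAC="00:11:aa:bb:cc:55"

# Read `arp -an` style output on stdin and print the MAC for the exact IP
# given as $1, or the fallback if the entry is missing or incomplete.
mac_for_ip() {
    # arp -an prints the address in parentheses as the second field.
    mac=$(awk -v ip="($1)" '$2 == ip { print $4; exit }')
    case "$mac" in
        ??:??:??:??:??:??) printf '%s\n' "$mac" ;;
        *) printf '%s\n' "$FALLBACK_MAC" ;;
    esac
}
```

Usage would look like arp -an | mac_for_ip 192.168.1.102. Honestly, once you have a fallback you trust, you may as well just hard-code the MAC and skip the lookup.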
Troubleshooting
Turning on verbose logging output, or even debug logging output, for autofs is critical for troubleshooting any issues with this. Open the config with sudo vim /etc/default/autofs: OPTIONS="--verbose" or OPTIONS="--debug" is your friend.
Additionally, dmesg output will help you track the status or errors of things trying to interact with the NFS mount itself.
dmesg -H -w
If your --timeout in auto.master is too high or set to the default value, you may see a lot of messages like this:
[Oct26 22:47] nfs: server 192.168.1.102 not responding, still trying
[ +6.144254] nfs: server 192.168.1.102 not responding, still trying
[Oct26 22:48] nfs: server 192.168.1.102 not responding, still trying
[Oct26 22:49] nfs: server 192.168.1.102 not responding, still trying
[ +36.860554] nfs: server 192.168.1.102 not responding, timed out
...
[ +24.576531] nfs: server 192.168.1.102 not responding, timed out
[Oct26 23:04] nfs: server 192.168.1.102 not responding, timed out
[ +0.000300] nfs: server 192.168.1.102 not responding, still trying
[ +15.360871] nfs: server 192.168.1.102 not responding, still trying
This means that autofs has not executed umount successfully on the mount, and because the mount has gone stale and not been removed, anything trying to use it will time out over and over and over… You may think, “oh, well I should use soft for the mount options,” but that’s not a good idea with read/write mounts. That’s why we opt for hard and intr in our mount options instead, which helps to alleviate this issue but doesn’t totally solve it alone.
One more thing…
This took me a hilariously long time to figure out. If you find any bugs or can suggest improvements to this entire thing, including burning autofs to the ground, I am open to feedback.
Thanks for stopping by!