Hephaestian Sea

`mount` in Python

The way people usually control filesystem mounts in a Linux environment is with the mount and umount command line tools. These are present on practically every Linux system you will ever touch, since they are even part of busybox.

These tools are fine. Nothing wrong with them. But what if you wanted to mount a file system from Python? You could (and should?) just subprocess mount. This would be the efficient production-ready thing to do, all ready in like 5 seconds. Look:

import subprocess
subprocess.check_call(["mount", "/dev/sdb1", "/mnt"])

Very high quality engineering—I even took the time to use check_call instead of call so failures are not silent.
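For the record, subprocess can also keep the child's output off your terminal and hand it back as a string. A minimal sketch, where run_quietly is a made-up helper name and a harmless Python one-liner stands in for mount so it runs without root:

```python
import subprocess
import sys


def run_quietly(cmd):
    """Run cmd, capture its output instead of printing it, raise on failure."""
    result = subprocess.run(
        cmd,
        capture_output=True,  # keep stdout/stderr off our terminal
        text=True,            # decode bytes to str
        check=True,           # non-zero exit raises CalledProcessError
    )
    return result.stdout


# Harmless stand-in so the sketch runs without root. The real thing would be
# run_quietly(["mount", "/dev/sdb1", "/mnt"]).
print(run_quietly([sys.executable, "-c", "print('pretend this is mount')"]))
```

On failure, the captured stderr is available on the CalledProcessError, so you can still see what mount was complaining about.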


So, anyway, I hate it. I'm too good for subprocesses, they are not cool and aesthetic. They print garbage to my standard IO. What does mount actually do? Can we do it in a way that enlarges our bloated egos?

We turn to the masochistic art of reading source code. In the util-linux version of mount, there are like 100 layers of indirection with a "hooks" dispatch system that tries to guess something about filesystems or whatever and then calls one of a dozen different libc functions. This is convincing me that we are in fact morally correct and superior for trying to do things differently. Skipping all the pointless grepping, here is the "legacy" hook which calls mount(2). We're not gonna support all the other crap because, realistically, nobody uses it.

BTW the version from busybox is much much simpler, as it seems they entirely forgot to obfuscate their program and left the mount(2) call out in the open.

mount(2)

The "2" in the name means "System Call" according to the man man. This means it's a kernel thing and we have to stop our bicycle reinvention here.


OK so we need to call mount(2). You can't really do a syscall from Python because there is no inline assembly, but thankfully the C "standard" library has wrappers for this stuff.

To use the C "standard" library, we need to do FFI to it. There are a bunch of ways of doing that but they are all annoying except for ctypes. It'll be OK for really simple stuff like making one syscall but FYI it's probably the worst option for properly interfacing with a C library from Python.
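To get a feel for what "really simple stuff" looks like, here is a warm-up sketch that calls a syscall wrapper which can't hurt anything. It assumes a Linux box and uses CDLL(None) to grab the C library already loaded into the process, which is my shortcut and not necessarily how you'd want to load it:

```python
import ctypes
import os

# Assumption: CDLL(None) hands back the C library already loaded into this
# process (works on Linux). Loading "libc.so.6" by name is the other option.
libc = ctypes.CDLL(None)

# getpid(2) takes no arguments and returns an int, which is ctypes' default
# return type, so no extra declarations are needed.
pid = libc.getpid()
print(pid == os.getpid())
```

Anything with pointers, structs, or non-int return values needs more ceremony than this, which is where ctypes starts being annoying.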

One more thing before we start, man the mount to do some due diligence on the API. The discovery is entirely predictable but no less disgusting: errors are returned in the accursed errno global variable. Since errno is really a thread-local macro and not a plain exported variable that ctypes could just read, we need to find and use the magical use_errno parameter, which makes ctypes stash a copy that get_errno() can fetch after the call.
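A small sketch of the errno dance on a call that fails safely, before pointing it at anything that needs root. The check helper and the bogus path are made up for illustration, and CDLL(None) is assumed to return the already-loaded libc on Linux:

```python
import errno
import os
from ctypes import CDLL, get_errno

# Assumption: CDLL(None) gives us the libc already loaded in this process.
libc = CDLL(None, use_errno=True)


def check(res):
    """Turn the C convention (-1 plus errno) into a Python OSError."""
    if res == -1:
        err = get_errno()
        raise OSError(err, os.strerror(err))
    return res


try:
    # chdir(2) to a path that does not exist: fails, touches nothing.
    check(libc.chdir(b"/definitely/not/a/real/path"))
except OSError as exc:
    print(errno.errorcode[exc.errno], exc.strerror)
```

Raising OSError with the errno number means Python picks the matching subclass (FileNotFoundError, PermissionError, and so on) on its own.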

At last, the spell:

from ctypes import CDLL, get_errno
import errno

libc = CDLL("libc.so.6", use_errno=True)
res = libc.mount(
    # source
    b"/dev/sdb1",
    # target
    b"/mnt",
    # file system type
    b"ext4",
    # flags
    0,
    # options (data)
    None,
)

if res == -1:
    err = get_errno()
    raise RuntimeError(
        f"failed to mount device: {errno.errorcode.get(err, '?')} ({err})"
    )

It is buzzing with efficiency gains!!! We probably saved like 30ns or something crazy like that. Think about it—no process structures allocated in the kernel, no memory management, no dynamic linker program, no command line parsing. CDLL might do some of those things but I'm not gonna spoil the celebration. It's definitely much faster, believe me. No I'm not gonna benchmark it.


Note that the string parameters have to be bytes and not str, so call encode("utf-8") on them first (os.fsencode also works and is meant for paths).
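If you get tired of typing b"" everywhere, the encoding can be hidden in a wrapper. This is only a sketch: the mount function below is a hypothetical helper of mine, MS_RDONLY is the read-only flag value from <sys/mount.h>, and CDLL(None) is assumed to be the same libc as before:

```python
import os
from ctypes import CDLL, c_ulong, get_errno

MS_RDONLY = 1  # mount read-only; value from <sys/mount.h>

# Assumption: CDLL(None) gives us the libc already loaded in this process.
libc = CDLL(None, use_errno=True)


def mount(source, target, fstype, flags=0, data=None):
    """Hypothetical wrapper around mount(2) that accepts str or bytes."""
    res = libc.mount(
        os.fsencode(source),
        os.fsencode(target),
        os.fsencode(fstype),
        c_ulong(flags),  # mountflags is an unsigned long, not an int
        None if data is None else os.fsencode(data),
    )
    if res == -1:
        err = get_errno()
        raise OSError(err, os.strerror(err), os.fsdecode(target))


# Usage, left commented out because it needs root and a real device:
# mount("/dev/sdb1", "/mnt", "ext4", MS_RDONLY)
```

The third OSError argument is the filename, so the target shows up in the error message for free.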

You also have to know your filesystem, but I'll give you this one for free:

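from ctypes import CDLL, c_char_p

libblkid = CDLL("libblkid.so.1")
# Without this, ctypes assumes an int return and truncates the pointer.
libblkid.blkid_get_tag_value.restype = c_char_p

fs_type = libblkid.blkid_get_tag_value(
    # cache
    None,
    # tag name
    b"TYPE",
    # device path
    "/dev/sdb1".encode("ascii"),
)
# fs_type is bytes such as b"ext4", or None if the type can't be determined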

Everything is also very not type safe so if you try to pass in a sufficiently weird parameter it will simply SEGFAULT.

Yay! So efficient! No exception overhead! So exciting!
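If you'd rather have the exception overhead back, ctypes can check arguments for you once it knows the prototype. A sketch, assuming the signature printed in the mount(2) man page and the CDLL(None) shortcut for loading libc. The bad call below is rejected by ctypes itself, so nothing gets mounted:

```python
import ctypes
from ctypes import CDLL, c_char_p, c_ulong, c_void_p

# Assumption: CDLL(None) gives us the libc already loaded in this process.
libc = CDLL(None, use_errno=True)

# Prototype from the mount(2) man page:
# int mount(const char *source, const char *target,
#           const char *filesystemtype, unsigned long mountflags,
#           const void *data);
libc.mount.argtypes = [c_char_p, c_char_p, c_char_p, c_ulong, c_void_p]
libc.mount.restype = ctypes.c_int

try:
    # str instead of bytes: rejected by ctypes before the C code ever runs
    libc.mount("/dev/sdb1", b"/mnt", b"ext4", 0, None)
except ctypes.ArgumentError as exc:
    print(f"caught before libc saw it: {exc}")
```

This only catches wrong types, not wrong values, so a well-typed pointer to garbage will still take the process down.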

#programming #python #stupid