How to check the temperature of a Mellanox InfiniBand NIC as a non-root user (Harder than you think!)

Recently, I had some cooling issues with Mellanox InfiniBand NICs. I had installed passively cooled NICs in water-cooled servers. Due to the lack of airflow, those NICs start to throttle after reaching 80°C. I had to check their temperature frequently so that bandwidth degradation would not interfere with my experimental results.

Mellanox provides a tool, mget_temp, that prints the current temperature of the NIC. You simply pass the PCI BDF (bus:device.function) of the NIC to the tool:

$ lspci | grep Mellanox
c4:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
$ mget_temp -d c4:00.0
52

The problem is that the tool touches sysfs, so non-root users get permission errors. But Linux provides a solution for this exact situation: setuid. If an executable owned by root is flagged with setuid, it runs with root as its effective uid even when a non-root user executes it. Linux ignores the setuid bit on scripts, and mget_temp is merely a script that actually executes mget_temp_ext, so I copied that binary to a public directory and set the setuid flag on the copy.

$ cp /usr/bin/mget_temp_ext /tmp/mget_temp
$ ls -al /tmp/mget_temp
-rwxr-xr-x 1 root root 514016 Jun 18 14:48 /tmp/mget_temp
$ chmod u+s /tmp/mget_temp
$ ls -al /tmp/mget_temp
-rwsr-xr-x 1 root root 514016 Jun 18 14:48 /tmp/mget_temp
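
The s that replaces the owner's x in the second listing is the setuid bit. As a minimal sketch, the same check can be done in Python with the standard os and stat modules. The helper names mode_has_setuid and has_setuid are my own, not part of any Mellanox tool.

```python
import os
import stat


def mode_has_setuid(mode):
    """Return True if the setuid bit is set in a file mode value."""
    return bool(mode & stat.S_ISUID)


def has_setuid(path):
    """Return True if the file at `path` has the setuid bit set."""
    return mode_has_setuid(os.stat(path).st_mode)


def mode_string(mode):
    """Render a mode the way `ls -l` does, e.g. for comparison with its output."""
    return stat.filemode(mode)
```

For example, has_setuid("/tmp/mget_temp") should return True only after the chmod u+s step.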

Strangely, it still printed a permission error when executed by non-root users.

$ su heehoon
$ /tmp/mget_temp -d c4:00.0
mopen: Permission denied

How is this possible? After debugging, I found that the executable checks whether the uid is zero and prints the permission error if it is not. While the effective uid (euid) is zero thanks to the setuid flag, the real uid returned by getuid is still the original user’s uid. The following is the relevant code section:

419013:       c6 84 24 bf 02 00 00    movb   $0x0,0x2bf(%rsp)
41901a:       00 
41901b:       e8 50 b5 fe ff          callq  404570 <getuid@plt>
419020:       85 c0                   test   %eax,%eax
419022:       0f 85 08 03 00 00       jne    419330 <_ZNSs7reserveEm@plt+0x145c0>
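
To make the distinction concrete, here is a small Python sketch of the same logic. The function passes_root_check is a hypothetical stand-in for the test/jne pair above, and the uid 1000 in the usage note is only an example value. Python scripts cannot be setuid themselves, so the sketch takes the uid as an argument instead of relying on the process's own credentials.

```python
import os


def passes_root_check(uid):
    """Mimic the binary's check: continue only when the given uid is 0."""
    return uid == 0


if __name__ == "__main__":
    # os.getuid() is the real uid (who started the process);
    # os.geteuid() is the effective uid (what permission checks use).
    print("real uid:", os.getuid(), "effective uid:", os.geteuid())
```

Under setuid, a user with real uid 1000 has an effective uid of 0, so passes_root_check(1000) fails while passes_root_check(0) succeeds. The binary feeds the real uid into this kind of check, which is why the setuid flag alone does not help.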

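So… if I remove the call to getuid, will it work? I replaced the 5-byte call instruction with xor %eax,%eax (31 c0), which zeroes the register that would hold the return value, followed by three nop (90) instructions as padding so that the following instructions stay at the same addresses. I used hexedit to modify the binary. The code looks like this after the patch: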

419013:       c6 84 24 bf 02 00 00    movb   $0x0,0x2bf(%rsp)
41901a:       00 
41901b:       31 c0                   xor    %eax,%eax
41901d:       90                      nop
41901e:       90                      nop
41901f:       90                      nop
419020:       85 c0                   test   %eax,%eax
419022:       0f 85 08 03 00 00       jne    419330 <_ZNSs7reserveEm@plt+0x145c0>
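
I made the edit by hand in hexedit, but the same patch could be scripted. The following is only a sketch: patch_bytes and patch_file are hypothetical helpers, and the file offset is left as a parameter because it is not the same as the virtual address 0x41901b shown in the listing and has to be worked out for the actual binary. The function refuses to write unless the bytes at that offset match the original call instruction.

```python
CALL_GETUID = bytes.fromhex("e850b5feff")   # callq getuid@plt, bytes from the listing
XOR_EAX_NOPS = bytes.fromhex("31c0909090")  # xor %eax,%eax followed by three nops


def patch_bytes(image, offset, old=CALL_GETUID, new=XOR_EAX_NOPS):
    """Replace `old` with `new` at `offset` in a bytearray, in place."""
    if len(old) != len(new):
        raise ValueError("patch must not change the size of the code")
    if bytes(image[offset:offset + len(old)]) != old:
        raise ValueError("unexpected bytes at offset; refusing to patch")
    image[offset:offset + len(new)] = new
    return image


def patch_file(path, offset):
    """Read the binary at `path`, patch it, and write it back."""
    with open(path, "rb") as f:
        image = bytearray(f.read())
    patch_bytes(image, offset)
    with open(path, "wb") as f:
        f.write(image)
```

Keeping the replacement exactly five bytes long is what allows the rest of the binary, including the jne target, to stay untouched.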

Now it works like a charm even with non-root accounts!
