Netlink Sockets: Linux Kernel-User communication (PART I)

  • Ubuntu 14.04, Kernel version 3.11
  • Netlink sockets provide a full-duplex, asynchronous, low-overhead communication channel between user-space processes and the kernel.
  • Other mechanisms such as ioctl(), sysfs and UDP sockets are either blocking (hence expensive), or slower (UDP carries more overhead than netlink) and more complex.
  • Netlink can carry data buffers from the kernel to user space and vice versa.
  • By nature, netlink messaging is non-blocking for the sender: a message is queued on the receiver's socket rather than handled synchronously.
  • It provides send and receive queues to absorb bursts of messages.
  • The user-space API is the ordinary socket API; you only have to specify AF_NETLINK as the socket family.
  • On the kernel side, the socket is created with netlink_kernel_create().
  • The default netlink queue size is 208 KB (212992 bytes). To raise it, run the following commands as root:
#default buffer size = 212992
echo 425984 > /proc/sys/net/core/wmem_max
echo 425984 > /proc/sys/net/core/wmem_default  
echo 425984 > /proc/sys/net/core/rmem_default
echo 425984 > /proc/sys/net/core/rmem_max
  • Netlink does not care about the contents of the data buffer you send or receive; the payload is opaque to it.
  • It allows unicasting, multicasting and broadcasting of messages.

Extents in “ext3” file system

My Ubuntu Linux ships with the ext3 file system. This FS is very similar to the classical model described for UNIX. A file is logically arranged as a set of blocks, managed through an array of block pointers. In ext3, each inode has an array of fifteen elements. The first twelve elements each point directly to a disk block. Usually, a disk block is configured to be 4 KB. These twelve 4 KB blocks are the L0 (level 0) blocks, so a file of up to 48 KB is held entirely in L0 blocks.

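As soon as a thirteenth block is allocated for the file, the thirteenth element of the array comes into play. It points to an L1 (level 1) block, which holds pointers to 1024 further data blocks (a 4 KB block holds 1024 four-byte block pointers). The thirteenth allocation therefore creates the first entry in the L1 block. The fourteenth and fifteenth elements serve the L2 and L3 levels, each adding one more layer of pointer blocks.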

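Instead of allocating one block at a time, Linux has optimized the way disk blocks are allocated for a file with “extents”. An extent is a contiguous run of disk blocks, described by a starting block and a length, and the kernel tries to allocate a file's blocks as such a run. Strictly speaking, ext4 is the file system that stores extents directly in the inode; ext3 keeps the block-pointer scheme described above, although a contiguous run of its blocks can still be reported as one extent.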

Let's create a file and see how Linux handles block allocation:

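#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
  int i;
  char buf[4096];

  memset(buf, 'a', sizeof(buf));
  /* O_CREAT needs an explicit mode; if it is omitted, the new file gets
     arbitrary permission bits (possibly none at all). */
  int fd = open("foo.txt", O_CREAT|O_WRONLY|O_TRUNC, 0644);
  if (fd < 0)
    return 1;

  /* Two full 4 KB blocks... */
  for (i = 0; i < 2; i++) {
    write(fd, buf, sizeof(buf));
  }
  /* ...plus three bytes that spill into a third block. */
  write(fd, "end", 3);
  close(fd);
  return 0;
}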

We have created a file of 8192 + 3 = 8195 bytes.

kanaujia@ubuntu:~/Desktop/ToKeep/cprogs$ ls -l foo.txt
---------- 1 kanaujia kanaujia 8195 2011-10-16 10:42 foo.txt

How much space does this file occupy on disk?

kanaujia@ubuntu:~/Desktop/ToKeep/cprogs$ du -h !$
du -h foo.txt
12K    foo.txt

That means I need three disk blocks of 4 KB to store this file. Now let’s see how Linux allots these blocks in a single extent. The ioctl() call accepts the FS_IOC_FIEMAP request, which makes this information available from user space. A file extent map structure is defined as follows:


struct fiemap {
        __u64 fm_start;          /* logical offset (inclusive) at
                                  * which to start mapping (in) */
        __u64 fm_length;         /* logical length of mapping which
                                  * userspace wants (in) */
        __u32 fm_flags;          /* FIEMAP_FLAG_* flags for request (in/out) */
        __u32 fm_mapped_extents; /* number of extents that were mapped (out) */
        __u32 fm_extent_count;   /* size of fm_extents array (in) */
        __u32 fm_reserved;
        struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
};

A simple C program that fills in this structure and passes it to ioctl() will fetch the extent information.

For my file, I got the following data:

kanaujia@ubuntu:~/Desktop/ToKeep/cprogs$ ./fiemap ./foo.txt
File ./foo.txt has 1 extents:
#    Logical                        Physical                           Length           Flags
0:    0000000000000000 0000000000000000 0000000000003000 0007

We have only one extent to accommodate this file. This extent spans three disk blocks, as its length is 0x3000 bytes, or 12 KB.