Kubernetes for Dummies: StatefulSet

Kubernetes provides virtualized infrastructure components such as storage, compute and network. Imagine an operating system that allows users to allocate resources and run their applications.

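A StatefulSet is the Kubernetes workload object for applications that need to persist information across pod restarts and rescheduling. Each pod gets a stable identity and its own persistent storage 🙂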

Storage is exposed through named storage classes. A StorageClass describes a kind of storage: which provisioner creates it, provider-specific parameters such as the disk type, and what happens to a volume once it is released. The size is not part of the class; it comes from the claim. Kubernetes follows a plugin-based approach, much like the Linux kernel VFS :-). You can act as a storage provider and define a class for your storage.

A volume can be presented as a raw block device or as a filesystem. As on Linux, a filesystem volume gets a mount point.

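The following snippet defines an SSD-backed storage class using Google's persistent disks. reclaimPolicy: Retain means the underlying volume and its data are kept even after the claim that used it is deleted; the default policy, Delete, removes the disk together with the claim.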

  ---
  kind: StorageClass
  apiVersion: storage.k8s.io/v1
  metadata:
    name: my-storage-class
  provisioner: kubernetes.io/gce-pd
  reclaimPolicy: Retain
  parameters:
    type: pd-ssd
    replication-type: none
  ---

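gce-pd stands for Google Compute Engine Persistent Disk. Newer clusters use the CSI driver pd.csi.storage.gke.io instead of the in-tree gce-pd plugin.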

The StatefulSet spec then says how each pod uses that storage through a PersistentVolumeClaim. Under volumeClaimTemplates, each pod claims a persistent volume of a given class and states the access mode and the size it needs; the StatefulSet controller creates one claim per pod from this template.

volumeClaimTemplates:
  - metadata:
      name: a-new-claim
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"
      resources:
        requests:
          storage: 1Gi

ReadWriteOnce means the volume can be mounted read-write by a single node at a time, which is what an ordinary disk allows.

Then mount the claim in the containers section, referring to it by the template's name:

containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        volumeMounts:
        - name: a-new-claim
          mountPath: /usr/share/nginx/html

Each pod now sees its own persistent volume mounted at /usr/share/nginx/html. To the container it behaves like any other mounted filesystem, much as an NFS export looks like a local directory 🙂

Why Is Using Golang sync.Pool a Bad Idea?

It is not always a bad idea, but you need to understand sync.Pool carefully before using it.

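Golang's sync.Pool provides a self-managed object pool through a small API: a New field that creates objects, plus Get and Put methods. A Pool is safe for concurrent use by multiple goroutines; internally it keeps per-processor caches so that most Get and Put calls avoid contending on a shared lock.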

Design Principles
  • sync.Pool relies on the garbage collector to manage pooled objects. The pool is essentially a cache of user-defined objects that the GC may clear at any time.
  • The application uses Get to take an object from the Pool and Put to return it.
  • The application sets the New field to a function that creates a new object.
  • The application does not manage the objects' memory itself.
  • Objects sitting idle in the Pool (not referenced anywhere else) may be freed by the GC without notice.
  • There is no finalizer or callback when the Pool drops an object.
  • Get returns an arbitrary object from the Pool; if none is available and New is set, it calls New and returns the result (otherwise it returns nil).
  • Put places an object back in the Pool.
  • The design avoids hotspots, so callers must not assume any relationship between Put and Get calls.
    • Get may ignore the Pool and treat it as empty, and when a program is built with the race detector, Put randomly drops some objects instead of keeping them.
    • This is deliberate: it exposes code that wrongly relies on getting back the object it just Put.
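A minimal sketch of the Get/Put/New cycle described above, assuming a pool of *bytes.Buffer values (bufPool and render are illustrative names, not from the original post; the any type needs Go 1.18 or later). The buffer is Reset after Get because a reused object still carries whatever its previous user left in it.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out *bytes.Buffer values. Get calls New only when it
// has no cached object to return.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render builds a greeting in a pooled buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a reused buffer may still hold a previous caller's data
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	// String copies the bytes, so the result stays valid after Put.
	return buf.String()
}

func main() {
	fmt.Println(render("gopher"))
	fmt.Println(render("pool"))
}
```

The deferred Put runs after the return value has been computed, so the caller never sees a buffer that is already back in the pool.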
What are Potential Problems with Pool
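  • If a pooled object holds resources that need explicit release (sockets, file descriptors, manually allocated buffers), there is no hook to release them when the Pool drops the object!
  • If the application hands an object to asynchronous work and returns it to the Pool while that work is still running, another goroutine can Get the same object, and the two users will corrupt each other's data.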
Best Use cases for Pool
  • Plain buffer management
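  • Lifecycle management of simple, self-contained objects
Not to Use If
  • Objects that hold resources needing explicit release (sockets, file handles, externally allocated buffers)
  • Objects that may still be in use by asynchronous operations when they would be returned to the Pool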

Notes on Python Decorator

A Python decorator is a function wrapper. It lets you wrap common tasks such as logging, rate limiting and metrics updates around your functions. However, its behavior can be surprising, because decorators are applied at run time, when the def statement executes.

If you run the following code, what is the expected output?

def register(func):
    print("hello to decorator!")
    def another():
        print("wrapped now!")
        func()
    return another

@register
def test():
    print("I am a test function")
    return ""

The interesting thing about this code is that it prints “hello to decorator!” even though test() is never called.

But Why?

The reason is that @register above def test is shorthand for test = register(test). When Python executes the def statement, it immediately calls register with the new function object, so every statement in register's body runs right then, including the print. register returns a new function object (another), which is bound to the name test. The body of another runs only later, when test() is actually called.

Another example

import sys

def first(func):
    print("hello sir!")
    return func


def second(func):
    print(sys.argv[0])
    for i in range(1, 5):
        print(i)

    def another():
        print("hello sir!")
        func()
    return another


@first
def test():
    print("I am a test function1")
    return ""

# Not an error: Python simply rebinds the name test
@first
def test():
    print("I am a test function1")
    return ""


# A third function with the same name. Again no error: the name
# test is rebound, this time to the wrapper (another) that second returns
@second
def test():
    print("I am a test function2")
    return ""

Output

$ python mydecor.py
hello sir!
hello sir!
mydecor.py
1
2
3
4

Things to look for

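  • Redefining a function name is never an error in Python, decorated or not; each def simply rebinds the name
  • After decoration, the name refers to whatever the decorator returned: here the wrapper, whose __name__ is "another" and whose repr shows second.<locals>.another and a memory address (functools.wraps copies the original name onto the wrapper)
  • Decorator code runs when the module is loaded, before main() or any other call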

Git Cheat Sheet

  • Show a commit by its hash
    $ git show dc54076

  • Reset the current branch to origin/master (discards local commits and uncommitted changes)

    $ git fetch origin
    $ git reset --hard origin/master
    
  • Keep your branch in sync with the origin master branch
    $ git pull origin master

  • Replay your branch's commits on top of the latest origin/master, which keeps the log linear
    $ git rebase origin/master wip_my_work

  • Squash commits into one. HEAD~2 lists the last 2 commits in the editor; change pick to squash on the second one.
    $ git rebase -i HEAD~2

  • Compare two branches
    $ git diff master..my_branch

  • Modify the last commit's message
    $ git commit --amend

  • Find diff of local file against remote master

    $ git fetch origin master
    $ git diff origin/master -- [local-path]
    
  • Stash uncommitted changes, list stashes, and re-apply one

    $ git stash
    $ git stash list
    $ git stash apply stash@{0}
    
  • Restore a file from origin/master, or discard local edits

    # Replace the file with its version on origin/master
    $ git checkout origin/master -- filename

    # Discard uncommitted changes to a file
    $ git checkout -- filename
    
  • Delete a local branch (-d refuses if the branch is not merged; -D forces deletion)

    $ git branch -d branch_name
    $ git branch -D branch_name
    
  • Squash the last 3 local commits into one
    $ git reset --soft HEAD~3 && git commit

  • Unstage a file (your edits are kept)

    $ git reset file_name
    
  • Bring master’s changes into your local branch
    $ git checkout my_branch
    $ git rebase master

  • Merge your local branch’s changes into master
    $ git checkout master
    $ git merge my_branch

  • Alias to stage and commit all changes in one step (usage: git c "message")

    $ git config --global alias.c '!git add -A && git commit -m'
    

Why is ‘cd’ not an external command like ls?

The cd command changes the shell's current working directory to a given valid path. The shell is a process that runs user commands: it runs an external command such as ls by forking a child process and exec'ing the program in it. The child then runs that program and exits when it finishes.

The current working directory is per-process state. If cd were an external command, the usual fork->exec flow would change the working directory of the child only; the child would exit and the parent shell's directory would remain unchanged.

Hence cd is built into the shell and runs inside the shell process itself.

Golang Essentials for C & C++ Developers: Part III

The Empty Interface

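  • An empty interface (interface{}, also spelled any since Go 1.18) can hold a value of any type, so it is used when the type of a variable is not known in advance.
  • Examples are printf-style functions (fmt.Println accepts ...any) and raw buffers, where you do not know the type.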

Use case

package main

import "fmt"

// A map is declared as: var m map[KeyType]ValueType
// If we do not know the type of the value and want
// to use the map for generic types, an empty
// interface helps.

func main() {
	// scenario 1: map value type is unknown
	var mymap map[int]interface{}
	mymap = make(map[int]interface{})
	mymap[10] = "python"
	mymap[21] = 22

	fmt.Println(mymap[10])
	fmt.Println(mymap[21])

	// scenario 2: map key and value types are unknown.
	// Keys must still be comparable at run time: a slice
	// used as a key compiles but panics.
	var mymap1 map[interface{}]interface{}
	mymap1 = make(map[interface{}]interface{})
	mymap1[10] = "python"
	mymap1[21] = 22
	mymap1["test"] = 10
	fmt.Println(mymap1[10])
	fmt.Println(mymap1["test"])
}

An array (slice) of empty interfaces

	var arr []interface{}
	arr = append(arr, 10, "python", 3.5)
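Getting a concrete value back out of an interface{} needs a type assertion or a type switch. A small sketch (describe is a hypothetical helper) that walks a slice of mixed values:

```go
package main

import "fmt"

// describe recovers the dynamic type stored in an empty interface
// using a type switch.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int %d", x)
	case string:
		return fmt.Sprintf("string %q", x)
	case nil:
		return "nil"
	default:
		return fmt.Sprintf("other %T", x)
	}
}

func main() {
	values := []interface{}{22, "python", 3.5, nil}
	for _, v := range values {
		fmt.Println(describe(v))
	}
}
```

For a single expected type, the comma-ok form s, ok := v.(string) avoids the panic that a plain v.(string) causes when the type does not match.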

Why use Base64 Encoding?

What is Base64 encoding?

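  • Given a stream of bytes, Base64 splits the bits into 6-bit groups and maps each group to one of 2^6 = 64 characters (A-Z, a-z, 0-9, + and /). Every 3 input bytes (24 bits) become 4 output characters.
  • Example: “ABCD”; the ASCII codes are 65 66 67 68.
  • In binary: [01000001][01000010][01000011][01000100]
  • Base64 picks six consecutive bits at a time:
  • 010000|010100|001001|000011|010001|00xxxx, where the xxxx bits are filled with 0 (padding)
  • The groups are 16, 20, 9, 3, 17 and 0, giving QUJDRA; two = characters pad the output to a multiple of 4: QUJDRA==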

Why use base64 encoding?

  • Transferring binary data in URLs (using the URL-safe variant, which replaces + and / with - and _)
  • Transferring binary data such as images as text
  • Transmitting and storing text that might otherwise cause delimiter collision
    • Example: a random string is followed by a delimiter (_) and a pattern, and the code searches for the delimiter to separate the pattern.
    • The _ can appear in the random string too.
    • Encoding the random string in standard Base64 avoids this, since _ is not in its alphabet.
  • Embedding an image in XML

Time based Key Expiry in Redis

https://redis.io/commands/expire
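Redis can delete a key automatically once a timeout set with EXPIRE has passed. The timeout counts from when it was set, not from the key's last access: reading the key does not extend it. We can use this to build features such as rate limits.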

There are various rate-limiting implementations, for example:
https://github.com/redislabsdemo/RateLimiter/tree/master/src/com/redislabs/metering/ratelimiter

Netlink Sockets: Linux Kernel-User communication (PART I)

  • Environment: Ubuntu 14.04, kernel version 3.11.
  • Netlink sockets provide a full-duplex, asynchronous, low-overhead communication channel between user-space processes and the kernel.
  • Alternatives such as ioctl(), sysfs and UDP sockets are either blocking (hence expensive), slower (UDP has more overhead than Netlink) or more complex.
  • Netlink can carry data buffers in both directions: from kernel to user space and vice versa.
  • Messaging is asynchronous: the sender does not wait for the receiver, because messages are queued at the receiving socket.
  • Netlink provides send and receive queues to absorb bursts of messages.
  • The user-space API is the ordinary socket API; you specify AF_NETLINK as the socket family.
  • The kernel-space API is netlink_kernel_create().
  • The default netlink queue size is 208 KiB (212992 bytes), which comes from the system-wide socket buffer defaults. To allow larger queues, raise those limits (as root):
#default buffer size = 212992
echo 425984 > /proc/sys/net/core/wmem_max
echo 425984 > /proc/sys/net/core/wmem_default  
echo 425984 > /proc/sys/net/core/rmem_default
echo 425984 > /proc/sys/net/core/rmem_max
  • Netlink does not interpret the payload you send or receive.
  • It supports unicast, multicast and broadcast messages.

Linux Device Driver Development: Block Device Driver

This is my very first interaction with the Linux kernel at the device driver level. My objective is to develop a very simple block device driver that just forwards I/O requests to a virtual device. This post records my observations, limited to what I needed to attack the problem.

Block v/s Character Device

Linux supports block and character device drivers. Only block devices can host a filesystem. Block devices support random reads and writes. Each block is composed of sectors, usually 512 bytes long and individually addressable. A block is a logical entity: filesystems usually use 4096-byte blocks, that is 8 sectors (8*512). In the Linux kernel, a block device is itself a logical entity (just a C structure), so we can export anything as a device as long as we can serve reads and writes at sector granularity.

The device driver is the layer that glues the Linux kernel to the device. The kernel receives I/O requests targeted at the device from applications. Requests pass through the page cache (unless they use direct I/O) and the I/O scheduler. The scheduler reorders requests to reduce seek time, assuming they will run on a rotating disk. Linux has several I/O schedulers, so the order in which requests reach the driver depends on which one is active.

A request-based driver implements a request queue. The I/O scheduler enqueues requests in the driver's queue; how to serve them is the driver's headache. The queue is represented by struct request_queue, defined in blkdev.h. The driver dequeues requests, sends them to the device, and then completes each request with an error status.

If a device does not need optimized I/O ordering, its driver may handle I/O requests directly. An excellent example of such a driver is the loopback driver (loop.c, loop.h). It handles struct bio, which stands for block I/O. A bio describes one I/O as a scatter-gather list of page segments (pages are usually 4K). Handling a bio is almost the same as handling a struct request.

What are the requirements for my driver?

 

  • Runs on flash storage drives
  • Perform plain I/O forwarding
  • Minimal overhead, minimal code size

In my next post, I will discuss the design of my driver.