Why Is Redis Pipelining a Good Idea?

What is Pipelining?

Pipelining is a form of asynchronous task execution. A pipeline is a task composed of many subtasks, and each subtask may depend on earlier subtasks.

t1 => t2 => t3
            ^
        t4 =|

t2 depends on t1. t3 is dependent on t2 & t4.

While subtask t3 of primary task S1 is executing, we can also execute subtask t1 of another primary task S2. This is pipelined execution.

How Does Redis Pipelining Help in Client-Server Communication?
Each network request between the client and server incurs a latency called the round-trip time (RTT). The Redis server reads a request from a client, and the client waits until the server writes back the response.
Instead, the client can send many requests at once and wait for all the responses collectively.

This helps us achieve:

  • It counters the RTT of the network.
  • Better throughput for the client.
  • It saves multiple reads from server to client.
  • The server sends a single write with all responses.
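To make this concrete, here is a small Go sketch (not a real Redis client; the encoder below is a simplified illustration of the RESP wire format) that batches several commands into one buffer, so they could be shipped with a single write instead of one round trip each:

```go
package main

import (
	"fmt"
	"strings"
)

// encodeRESP encodes one command (e.g. ["SET", "k", "v"]) in the RESP
// wire format that Redis understands: an array of bulk strings.
func encodeRESP(args ...string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "*%d\r\n", len(args))
	for _, a := range args {
		fmt.Fprintf(&b, "$%d\r\n%s\r\n", len(a), a)
	}
	return b.String()
}

func main() {
	// Pipelining: concatenate many commands and send them with ONE write,
	// instead of paying one round trip per command.
	pipeline := encodeRESP("SET", "k1", "v1") +
		encodeRESP("SET", "k2", "v2") +
		encodeRESP("GET", "k1")
	// A real client would now do conn.Write([]byte(pipeline)) and then
	// read all three replies back in one pass.
	fmt.Printf("%q\n", pipeline)
}
```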

Classic Usage

  • POP3 uses pipelining


Simplifying go-kit toolkit for Microservices – Part I

Introduction

go-kit is one of the most complete and flexible toolkits for developing microservices in Go. At the same time, its learning curve is steep. In this post, I try to explain go-kit's fundamental components using the general-purpose client-server model of Linux.

General Purpose Client-Server Architecture

A server on Linux binds and listens on a port, accepts connections, and serves requests. A client creates its own socket and makes a request to the server (IP:PORT). The request is just a buffer. The server spawns a thread or runs the handler in the same process. The handler is a function that understands the request parameters and casts them to the expected type.
The client receives a response from the server. The response is read into a buffer and then interpreted.
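A minimal sketch of this flow, using Go's net package (an ephemeral port and an echo handler are illustrative choices):

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	// Server: bind & listen on a port, accept a connection, serve the request.
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0 = any free port
	if err != nil {
		panic(err)
	}
	go func() {
		conn, _ := ln.Accept()
		defer conn.Close()
		// The request is just a buffer; the handler interprets it.
		req, _ := bufio.NewReader(conn).ReadString('\n')
		fmt.Fprintf(conn, "echo: %s", req)
	}()

	// Client: create a socket and make a request to the server (IP:PORT).
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	fmt.Fprintf(conn, "hello\n")
	// The response is read into a buffer and then interpreted.
	resp, _ := bufio.NewReader(conn).ReadString('\n')
	fmt.Print(resp) // prints "echo: hello"
}
```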

go-kit in Client-Server Paradigm

go-kit helps create a service. A service listens on a well-known IP and port; it is the equivalent of the Linux server. The service defines handlers for different types of requests (PUT, GET, POST). A handler is a function called for a service request; it is a generic wrapper over the service-specific business-logic implementation. A handler for a GET call may eventually call get_status_reservation() in a sample reservation system.

go-kit intends to keep the core logic free of go-kit influence. The core logic (the set of functions your service implements) stays in a file called service.go. The remaining go-kit code accesses these functions in an abstract manner, through entities called endpoints, transport, and server. Each of these allows generic interaction with the service functions over HTTP (or gRPC).

The overall objective is to expose service functionality through go-kit toolchain.

Transport

The transport defines the send and receive buffer structures. Each service API may expect and send different data, so the transport defines the request and response structures for each API.
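As an illustration (the reservation API and all names below are hypothetical, and the sketch uses only the standard library rather than go-kit itself), transport-level structures might look like:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Transport-level types for a hypothetical "GET reservation status" API.
// In go-kit, such structs are paired with decode/encode functions that
// the HTTP transport calls before and after the endpoint runs.
type statusRequest struct {
	ReservationID string `json:"reservation_id"`
}

type statusResponse struct {
	Status string `json:"status"`
	Err    string `json:"err,omitempty"`
}

// decodeStatusRequest turns the raw request buffer into a typed request.
func decodeStatusRequest(body string) (statusRequest, error) {
	var req statusRequest
	err := json.NewDecoder(strings.NewReader(body)).Decode(&req)
	return req, err
}

func main() {
	req, err := decodeStatusRequest(`{"reservation_id": "r-42"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.ReservationID) // prints "r-42"

	out, _ := json.Marshal(statusResponse{Status: "confirmed"})
	fmt.Println(string(out)) // prints {"status":"confirmed"}
}
```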

In the next post, I will share endpoints, the most interesting part of go-kit.

Why Is Using Golang sync.Pool a Bad Idea?

It is not an absolutely bad idea, but you need a careful understanding of sync.Pool before using it.

Golang's sync.Pool creates a self-managed object pool with just New/Get/Put functions. The Pool implementation uses locking internally for thread-safe operation from multiple goroutines.

Design Principles
  • sync.Pool uses garbage collection to manage objects. The pool is essentially a list of user-defined blob objects that are GC'ed periodically.
  • The application uses Get/Put to borrow an object from the Pool and return it.
  • The application defines a New method to create a new object.
  • The application does not have to manage object memory.
  • Objects left unreferenced in the Pool at a given time are GC'ed.
  • There is no finalizer hook for object deletion.
  • Get returns an arbitrary object from the list; if no object is available, it creates a new one (via New) and returns it.
  • Put places an object back into the Pool.
  • The design avoids hotspots by randomizing object selection, so Put and Get calls may not be correlated.
    • A Put op may return without placing the object back into the Pool.
    • This intentional drop helps a subsequent Get pick a fresh object.
What Are the Potential Problems with Pool?
  • If the object holds allocated resources (a socket, a buffer), there is no way to free them!
  • If the application gets an object and processes it with async operations, there is a risk of data corruption if it returns the object to the Pool while those operations are still running.
Best Use cases for Pool
  • Plain buffer management
  • Lifecycle management of non-composite objects
Not to Use If
  • Objects that dynamically allocate resources (buffer, sockets)
  • Objects that are processed asynchronously.

Notes on Python Decorator

A Python decorator is a function wrapper. It helps you wrap essential tasks such as logging, rate limits, and metrics updates around your functions. However, it has a peculiar behavior due to its dynamic implementation.

If you run the following code, what is the expected output?

def register(func):
    print("hello to decorator!")
    def another():
        print("wrapped now!")
        func()
    return another

@register
def test():
    print("I am a test function")
    return ""

The interesting thing about this code is that it prints the line “hello to decorator!” even without any function invocation.

But Why?

The reason is that while applying the decorator to the function test, the interpreter runs all the code inside the decorator and prepares a new function object. A decorator essentially creates a new function. Since the new function is created dynamically, the Python VM runs all the code inside the wrapper (decorator) function and ultimately returns a new function object.

Another example

import sys

def first(func):
    print("hello sir!")
    return func


def second(func):
    print(sys.argv[0])
    for i in range(1,5):
        print(i)

    def another():
        print("hello sir!")
        func()
    return another


@first
def test():
    print("I am a test function1")
    return ""

# Should cause conflict, but doesn't
@first
def test():
    print("I am a test function1")
    return ""


# Another function with the same name
# Python VM does not complain of name conflict as
# the decorated function would assume a new name
@second
def test():
    print("I am a test function2")
    return ""

Output

$ python mydecor.py
hello sir!
hello sir!
mydecor.py
1
2
3
4

Things to look for

  • There is no name-collision problem for a decorated function.
  • A decorated function gets the name of the wrapper function (qualified by the decorator's scope) and a memory offset.
  • Decorator code runs at function-definition time, before any function is explicitly called.

Written with StackEdit.

Why Is ‘cd’ Not an External Command Like ls?

The cd command changes the current working directory to a desired valid path. The shell is a process that runs user commands. Each external command runs by forking a new process and exec'ing the command binary; when the command finishes, the child process dies.

cd changes the current environment of the shell. If cd were an external command, the usual fork->exec flow would change the directory only in the child process; the parent shell would remain unchanged.

Hence cd is a shell builtin, implemented by the shell itself.


Go Runtime Scheduler Design Internals

Concurrency is one of the most exciting features of the Go language. A single-threaded program runs serially, but if you have tasks that can run concurrently, you create threads for them. Threads execute independently and make progress concurrently. Go supports the creation of thousands of such concurrent tasks in an application! How is this possible? It's Go's runtime. Go programs are compiled, and the executable is self-contained (the runtime is linked into the binary).

Let’s understand the design motivation of the Go runtime. The runtime must work within the system's resource constraints while running multiple threads. A CPU core can run only one thread at a time, and if there are more threads than available cores, threads are paused/resumed (context-switched). During a context switch, the state of the running thread is preserved and another thread is loaded. Creating a thread requires resources, and hence there is a limit on their number.

Under these constraints, the Go runtime maximises CPU utilisation and minimises latency and memory footprint.

Go provides concurrency with the language primitives of goroutines and channels. Using goroutines, applications can grow dynamically (by forking new goroutines). Channels are internal to the Go runtime; the OS has no knowledge of channels.
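A tiny sketch of these primitives: goroutines are forked with the go keyword and synchronise over a channel:

```go
package main

import "fmt"

// squareSum forks one goroutine per input value; each is a lightweight
// user-space task scheduled by the Go runtime, not directly by the OS.
// The goroutines report their results over a channel.
func squareSum(nums []int) int {
	results := make(chan int)
	for _, n := range nums {
		go func(v int) {
			results <- v * v // the channel synchronises the goroutines
		}(n)
	}
	sum := 0
	for range nums {
		sum += <-results
	}
	return sum
}

func main() {
	fmt.Println(squareSum([]int{1, 2, 3})) // prints 14 (1 + 4 + 9)
}
```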

Let’s understand goroutines in detail. A goroutine is essentially a lightweight thread that exists in user space. Goroutines are frugal with resources. Unlike system threads, a goroutine starts with a small stack that grows as needed. This is one of the reasons Go can support the creation of thousands of goroutines.

So how are thousands of goroutines managed by the runtime? Instead of delegating the responsibility to the system scheduler, Go uses its own scheduler, coded in Go itself.

How does the Go scheduler work? Let’s first understand the thread models used in applications. An application can use system threads, managed by the OS. These threads can make system calls and access system resources (e.g. CPU). However, they are expensive: each consumes system resources such as signal masks, a PID, cgroup bookkeeping, etc., and a context switch is costly because it traps into the kernel. In contrast, user threads are created and managed by the application, consume fewer resources, and context-switch quickly because they do not go through the kernel. A user thread still needs a system thread to execute code on a CPU or to access any other system resource.

The next decision is the ratio of user threads to system threads. The first model runs N user threads on one system thread. It gives fast context switching but cannot use multiple cores (if available); also, if a user thread blocks, the single system thread blocks with it, and all other user threads wait. The second scheme is a 1-to-1 mapping of user threads to system threads. It provides good CPU utilisation, but context switching is slow. The third option is a many-to-many mapping.

Go takes the third option. Goroutines are distributed over a set of OS threads. There are three major entities in the Go scheduler: machines (M), goroutines (G), and processors (P). There are also minor entities such as the global and local run queues and the thread cache.

Let’s understand “MGP”. A machine (M) is a representation of an OS thread, part of a pool of worker threads; on Linux, it is a standard POSIX thread. An M runs a set of Gs. A G represents a user-space goroutine; it has its own instruction pointer, stack, and blocking info. A P represents a logical processor, i.e. a scheduling context. Every worker (M) needs a P to run a G. The number of Ps is pre-decided (GOMAXPROCS) and fixed during the run.

Consider a setup with two processors (P), as defined by GOMAXPROCS, and two worker machines (M). Each P has a local run queue. As new goroutines are created and become runnable, they are added to the local run queue. If the local run queue is full, new Gs are added to the global run queue. Idle Gs are kept in an idle queue.

What happens when a G makes a blocking system call? The scheduler knows that the G is blocked, and hence its M is blocked too. The P is not being utilised and can be used by some other M.

The scheduler takes back the P and assigns it to a new worker machine (M), creating one if needed. The runnable queue also moves to the new worker. When the original M comes back from the syscall, it tries to find an idle P for its G; if that is not possible, it moves the G to the global queue and parks itself in the thread cache. The scheduler makes sure there are enough threads to run all contexts; there can be more Ms than Ps, even for P=1, because a worker might get stuck in a syscall.

A scenario can occur in which a P is idle because its run queue is exhausted. In such a case, it tries to pick Gs from the global queue. What if the global queue is also empty? There are two major scheduling paradigms for distributing work. The first is work-sharing, in which runnable Gs are proactively distributed to other Ps. In contrast, in a work-stealing scheduler, an idle processor steals Gs from another P's run queue. As long as a P is busy, there is no G movement; an idle P steals about half of the Gs from another P. Work-stealing gives better resource utilisation and lower migration of Gs.

So, when a P is idle and both its local and the global run queues are empty, it randomly picks another P and steals half of its Gs.

Go has a sophisticated scheduler: it offers massive concurrency and intelligent scheduling, and always tries to achieve maximum utilisation and minimum latency.