How to Select a Random Key from a Hash Map in Constant Time?

Premise

A hashmap has a time complexity of O(1) for all operations.

Problem

You have to find a constant time method to uniformly random selection of a key in a Hash map.

Assumptions

  1. The map can grow to memory size.
  2. You can use any readymade hash map.

Solution

I’m discussing the pseudo code for a Python solution. All operations work in constant time except remove().

randMap = {}

def insert(k, v):
    randMap[k] = v    

The random must pick a random key. But how? We can store all the keys in a list and then run random() on the list indexes.

randMap = {}
indexMap = {}
keyList = []
last = 0
def insert(k, v):
    randMap[k] = v
    keyList.append(k)
    last += 1
    
    # Maintain index of each key
    indexMap[k] = len(keyList)    
def getRandom():
    start = 0
    end = len(keyList) - 1
    randomIndex = random(start, end)
    key = keyList[randomIndex]
    
    return key

How do you delete a key?

This is the crux of the problem. Deletion of a key would need us to change the index array too. That means we need to shift all elements of the array and complexity would shoot to O(n).

def remove(k):
    # Remove from the hash map
    randMap.pop(k, None)
    
    # Remove from key list
    index = indexMap[k]
    
    # This is O(n) operation
    keyList.remove(index)

    # Pop the index too
    indexMap.pop(k) 

How to improve deletion?

  1. Delete the entry and mark the entry invalid. The probability distribution does not change, however with more and more deletions, you have a sparse array. Thus you would need multiple getRandom() to get a valid key.
  2. Move the last element to the deleted element. Adjust the index of the last element.
def remove(k):
    # Remove from the hash map
    randMap.pop(k, None)

    index = indexMap[k]
    indexMap[last] = index
    
    # This is O(1) operation
    keyList[index] = keyLast[last]

    # Pop the index too
    indexMap.pop(k)
    last -= 1 

This is a constant time solution.

Advertisements

How to Write a Conference Scheduler

The conference schedule is an NP problem since it is O(2**n). A greedy approach is pragmatic to solve the problem. We can pick various strategies to prepare a schedule such as the biggest talk first, smallest talk first, random selection, etc.

The Python code uses the Longest talk first and random selection to prepare schedules.

Github Code

https://github.com/vishalkanaujia/dev-tools/tree/master/conference_scheduler

Written with StackEdit.

Internals of Linux Process Signals

Linux Signals

  • A process in Linux is represented as task_struct.
  • A process A sends a signal to process B using system call kill() or kill -<sig num> <pid>.
  • Kernel updates the task_struct of the receiving process B. It changes the signal field with the passed value.
  • The scheduler checks the signals on a process before running the process. If the signal is set, it calls the signal handler.
  • A process can define its handlers for all signals except SIGKILL and SIGSTOP.
  • There is always a default signal handler for each signal.

References

Notes on String & Encoding Techniques

String and their encoding decide the languages the code can support.

Introduction

  • We have many languages and their symbols that need more than 8-bits (ASCII) for binary representation.
  • Encoding adds semantics to a set of bytes.
  • Unicode is a table of all characters and their numeric equivalent.
  • Since there are more than 100k symbols, 8-bits are not enough.

What is UTF-8

  • Unicode representation of a string needs 12-bits, padded to 32-bit.
  • The idea of UTF-8 is a variable encoding of symbols. The common ASCII symbols can use just 8-bits and other extended symbols can use up 1,2,3 or 4-bytes.
  • UTF-8 is more memory efficient compared to UTF-16 which uses minimum two bytes for a symbol.

How do Programs Use an Encoding?

  • The program picks a library for string encoding.
  • UTF is the most common library used.
  • The number of bytes in encoded string is decided by number of leading 1s in the array.

Test

见/見
Unicode= \U+FFE8\U+FFA7/\U+FFE8\U+FFA6

References

Written with StackEdit.

How to Change View of Coding Block in WordPress?

Problem

Many times a good WordPress theme does not render code properly. There are problems like too big font making it harder to read the text. I use StackEdit to prepare posts and publish to my blog.

Solution

There is a way to solve this problem with custom CSS.

Description

The posting source is in Markdown. StackEdit translates MarkDown to HTML and posts to the destination. There is not much I could do with the Markdown. However, the theme on the WordPress blog allows me to set a CSS for particular content.
The code blocks are translated to pre blocks in the HTML. So adding a custom rendering for pre block should fix the problem.
I found a CSS snippet from https://wp-mix.com/css-style-pre-tags/.

pre {
	box-sizing: border-box;
	width: 100%;
	padding: 0;
	margin: 0;
	overflow: auto;
	overflow-y: hidden;
	font-size: 12px;
	line-height: 20px;
	background: #efefef;
	border: 1px solid #777;
	background: url('lines.png') repeat 0 0;
	padding: 10px;
	color: #333;
}

Another good reference is https://perishablepress.com/perfect-pre-tags/

This made the code look much much better.

Written with StackEdit.

How to Pick a Commit of Git Submodule?

Current State

There is a new submodule ‘test’ and we want to add it to the repo.

Steps

Git add submodule

$ cd my_repo
$ git submodule add git@github.com:vishalkanaujia/dev-tools.git

Initialize the repo

$ git submodule update --init --recursive

cd to the submodule

$ cd dev-tools

By default, the latest version is picked. We can checkout to a commit for moving a different version.

$ git checkout abdcca6

Written with StackEdit.

The Best Go Library for Redis

There are two popular choices:

  • go-redis
  • redigo

I prefer redigo over go-redis.

Cons with go-redis

  • Not enough documentation for APIs
  • Difficult to understand naming conventions for APIs
  • I’m still finding how to refresh expiry time of a key using go-redis
  • Needs Cmdable interface for calling Redis commands. The interface is not so well designed.

An interface in go-redis

   type Cmdable interface {
   TTL(key string) *DurationCmd
   Restore(key string, ttl time.Duration, value string) *StatusCmd

Pros with redigo

  • Easy to understand APIs
  • Ability to call Redis commands from the client so you never have to learn a new wrapper
  • No extra interface like go-redis

A sample client with go-redis is as following:

  package main

  import (
      "fmt"

      "github.com/go-redis/redis"
  )

  func HandleErr(err error) {
      if err != nil {
          panic(err)
      }
  }

  /*
  type DurationCmd struct {
      baseCmd

      val       time.Duration
      precision time.Duration
  }
  */

  func ExampleNewClient() {
      client := redis.NewClient(&redis.Options{
          Addr:     "localhost:6379",
          Password: "", // no password set
          DB:       0,  // use default DB
      })
      
      err = client.Set("key", "value", 0).Err()
      HandleErr(err)

      val, err := client.Get("key").Result()
      HandleErr(err)
      fmt.Println("key", val)

      ttl := client.TTL("key")
      fmt.Println("ttl", ttl)
  }

  func main() {
      ExampleNewClient()
  }

Simplifying go-kit toolkit for Microservices – Part I

Introduction

go-kit is one of the most complete and flexible toolkits for developing microservices in Go language. At the same time, the learning curve of go-kit is steep. In this post, I’m trying to explain go-kit fundamental components using general purpose client-server model of Linux.

General Purpose Client-Server Architecture

A server in Linux binds & listens on a port, accepts connections and server requests. A client creates its own socket, makes a request to the server (IP: PORT). The request is just a buffer. The server spawns a thread or runs the handler in the same process. The handler is a function that understands request parameters and typecast it to its expected type.
The client receives a response from the server. The response is read into a buffer and then interpreted.

go-kit in Client-Server Paradigm

go-kit helps create a service. A service listens on a well-known port and IP. It is the server equivalent of Linux server. The server defines handlers for different types of request (PUT, GET, POST). A handler is a function called for a service request. The handler is a generic wrapper over service specific business logic implementation. A handler for GET call may eventually call get_status_reservation() in a sample reservation system.

go-kit intends to keep the core logic away from go-kit influence. The core logic (set of functions that your service implements) stays in a file called service.go. The remaining go-kit code tries to access these functions in an abstract manner. There are entities called endpoints, transport and server. Each of these allows a generic interaction of service functions through HTTP (or gRPC).

The overall objective is to expose service functionality through go-kit toolchain.

Transport

It defines the send and receiver buffer structure. Each service API could expect and send different data. The transport defines structures that define request and response for each API

In the next post, I will share endpoints, the most interesting part of go-kit.