Tag Archives: python

Python Code To Find Median of Sorted Arrays of Equal Size

The following code has two implementations of finding median of two sorted arrays in O(n) and O(log n).

Reference

Advertisements

How to Select a Random Key from a Hash Map in Constant Time?

Premise

A hashmap has a time complexity of O(1) for all operations.

Problem

You have to find a constant time method to uniformly random selection of a key in a Hash map.

Assumptions

  1. The map can grow to memory size.
  2. You can use any readymade hash map.

Solution

I’m discussing the pseudo code for a Python solution. All operations work in constant time except remove().

randMap = {}

def insert(k, v):
    randMap[k] = v    

The random must pick a random key. But how? We can store all the keys in a list and then run random() on the list indexes.

randMap = {}
indexMap = {}
keyList = []
last = 0
def insert(k, v):
    randMap[k] = v
    keyList.append(k)
    last += 1
    
    # Maintain index of each key
    indexMap[k] = len(keyList)    
def getRandom():
    start = 0
    end = len(keyList) - 1
    randomIndex = random(start, end)
    key = keyList[randomIndex]
    
    return key

How do you delete a key?

This is the crux of the problem. Deletion of a key would need us to change the index array too. That means we need to shift all elements of the array and complexity would shoot to O(n).

def remove(k):
    # Remove from the hash map
    randMap.pop(k, None)
    
    # Remove from key list
    index = indexMap[k]
    
    # This is O(n) operation
    keyList.remove(index)

    # Pop the index too
    indexMap.pop(k) 

How to improve deletion?

  1. Delete the entry and mark the entry invalid. The probability distribution does not change, however with more and more deletions, you have a sparse array. Thus you would need multiple getRandom() to get a valid key.
  2. Move the last element to the deleted element. Adjust the index of the last element.
def remove(k):
    # Remove from the hash map
    randMap.pop(k, None)

    index = indexMap[k]
    indexMap[last] = index
    
    # This is O(1) operation
    keyList[index] = keyLast[last]

    # Pop the index too
    indexMap.pop(k)
    last -= 1 

This is a constant time solution.

How to Write a Conference Scheduler

The conference schedule is an NP problem since it is O(2**n). A greedy approach is pragmatic to solve the problem. We can pick various strategies to prepare a schedule such as the biggest talk first, smallest talk first, random selection, etc.

The Python code uses the Longest talk first and random selection to prepare schedules.

Github Code

https://github.com/vishalkanaujia/dev-tools/tree/master/conference_scheduler

Written with StackEdit.

Notes on Python Decorator

Notes on Python Decorator

Python decorator is a function wrapper. It helps you wrap essential tasks around your functions such as logging, ratelimits, metrics updates etc. However, it has a peculiar behavior due to its dynamic implementation.

If you run the following code, what is the expected output?

def register(func):
    print("hello to decorator!")
    def another():
    ┆   print("wrapped now!")
    ┆   func()
    return another

@register
def test():
    print("I am a test function")
    return ""

The interesting thing about this code is that it prints the line “hello to decorator” even without any function invocation.

But Why?

The reason is that during the decorator application to the function test interpreter would run all the code inside decorator and prepare a new function object. Decorator essentially creates new functions. Since the new function is created dynamically, Python VM runs all the code inside the wrapper (decorator) function and ultimately retunrs a new function object.

Another example

import sys

def first(func):
    print("hello sir!")
    return func


def second(func):
    print(sys.argv[0])
    for i in range(1,5):
    ┆   print(i)

    def another():
    ┆   print("hello sir!")
    ┆   func()
    return another


@first
def test():
    print("I am a test function1")
    return ""

# Should cause conflict, but doesn't
@first
def test():
    print("I am a test function1")
    return ""


# Another function with the same name
# Python VM does not complain of name conflict as
# the decorated function would assume a new name
@second
def test():
    print("I am a test function2")
    return ""

Output

$ python mydecor.py
hello sir!
hello sir!
mydecor.py
1
2
3
4

Things to look for

  • There is no name collison problem for a decorated function
  • A decorated function get a name of parent function and and offset
  • Decorator code is run even before main() is called

References

Sample code

Written with StackEdit.

MySQL & Python- Error: 2006 mysql has gone away

MySQL & Python: Error: 2006 mysql has gone away

This problem occurs for multiple reasons such as DB connection problem. In our code, we hit this issue due to a subtle problem with DB cursor.

The code was as following:

with conn as cur:
    try:
        print "hello"
    except:
        print "sorry"
    finally: 
        conn.close()

The above code would throw the error 2016 mysql has gone away exception. The problem lies in with conn as cur. This statement creates a cursor on the DB and the cursor object autmatically gets destroyed.
Here, we are closing the DB connection before the automatic destruction happened.

So since connection was invalid(closed), cursor deletion hit an exception.

The solution is to close the connection after cursor object deletion.

try:
    with conn as cur:
        print "hello"
except:
    print "sorry"
finally: 
    conn.close()

Written with StackEdit.

Google App Engine: Data store with async calls

  • Google Data Store provides single core CPU  (F class) for applications.
  • Hence heavily multi-threaded applications can’t scale enough
  • DataStoreprovidesasync calls for DB access. It has:
    • Async queries
    • Async transactions
    • Async read/writes
  • Documentation is good but lacks clarity on usage and benefits on async calls. It says that async queries can run in parallel after immediately returning a temporary result object. But how?
  • I think Data Store can time share multiple requests, hence improving responsiveness and fairness.
  • How to convert sync calls to async:
  • Please refer Source: Google Developer website
    • I got confused at async queries. It uses run() menthod on a query to make it async. A get() method is a sync call.
    • Difference between run() and get() is that:
      • run() returns an iterator over results
      • run() pre-fetches results with a batch read, so access to elements in results is faster
      • get() is a sync call and returns the first result from result set
      • run(limit=1) == get()

 

q1 = Salesperson.all().filter('date_of_hire <', one_month_ago)
# Returns instantly, query is executing in the background.
recent_hires = q1.run()

# q1 and recent_hires are both iterable objects. Your application should iterate over 'recent_hires' to enjoy benefits of pre-fetched reads.
##### for h in recent_hires:
# instead of
##### for h in q1:

Dissecting Python List allocator (CPython)

List is one of the most frequently used data structure in Python applications. The code for Python list object allocator is present in Python-2.7.5/Objects/listobject.c.

Python allocates list objects in heap and uses underlying malloc based APIs. List is a container of Python object references. In plain C terminology. it is a dynamic array of pointers.

Let’s understand basic creation and insertion APIs.

  • list_resize(): A list is of type PyListObject and is equivalent to PyObject. Every resize request allocates more than asked size.

Allocation unit size is : (size_requested + (size_requested << 3 + (size_requested < 9 : 3 or 6) ).

It gives a growth pattern of (0, 4, 8, 16, 25, 35, 46, 58, 72, 88).

  • free_list: Python keeps PyList_MAXFREELIST (defined to value 80) of free lists to save on lists allocation and deallocation.
  • PyList_New(): A new list allocation requests is served from either a free list or a new GC tracked object creation of type PyListObject. Next, a malloc request is made for requested list size.

Size calculated: nbytes = size * sizeof(PyObject *);

Object allocation: PyListObject *op = free_list[numfree] OR PyObject_GC_New(PyListObject, &PyList_Type);

Memory allocation for references: op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);