# Tagpython

## Python Code To Find Median of Sorted Arrays of Equal Size

The following code has two implementations of finding median of two sorted arrays in `O(n)` and `O(log n)`.

## How to Select a Random Key from a Hash Map in Constant Time?

### Premise

A hashmap has a time complexity of O(1) for all operations.

### Problem

You have to find a constant time method to uniformly random selection of a key in a Hash map.

### Assumptions

1. The map can grow to memory size.

### Solution

I’m discussing the pseudo code for a Python solution. All operations work in constant time except `remove()`.

``````randMap = {}

def insert(k, v):
randMap[k] = v
``````

The random must pick a random key. But how? We can store all the keys in a list and then run `random()` on the list indexes.

``````randMap = {}
indexMap = {}
keyList = []
last = 0
def insert(k, v):
randMap[k] = v
keyList.append(k)
last += 1

# Maintain index of each key
indexMap[k] = len(keyList)
``````
``````def getRandom():
start = 0
end = len(keyList) - 1
randomIndex = random(start, end)
key = keyList[randomIndex]

return key
``````

### How do you delete a key?

This is the crux of the problem. Deletion of a key would need us to change the index array too. That means we need to shift all elements of the array and complexity would shoot to O(n).

``````def remove(k):
# Remove from the hash map
randMap.pop(k, None)

# Remove from key list
index = indexMap[k]

# This is O(n) operation
keyList.remove(index)

# Pop the index too
indexMap.pop(k)
``````

### How to improve deletion?

1. Delete the entry and mark the entry invalid. The probability distribution does not change, however with more and more deletions, you have a sparse array. Thus you would need multiple `getRandom()` to get a valid key.
2. Move the last element to the deleted element. Adjust the index of the last element.
``````def remove(k):
# Remove from the hash map
randMap.pop(k, None)

index = indexMap[k]
indexMap[last] = index

# This is O(1) operation
keyList[index] = keyLast[last]

# Pop the index too
indexMap.pop(k)
last -= 1
``````

This is a constant time solution.

## How to Write a Conference Scheduler

The conference schedule is an NP problem since it is O(2**n). A greedy approach is pragmatic to solve the problem. We can pick various strategies to prepare a schedule such as the biggest talk first, smallest talk first, random selection, etc.

The Python code uses the Longest talk first and random selection to prepare schedules.

## Github Code

https://github.com/vishalkanaujia/dev-tools/tree/master/conference_scheduler

Written with StackEdit.

## Notes on Python Decorator

Notes on Python Decorator

Python decorator is a function wrapper. It helps you wrap essential tasks around your functions such as logging, ratelimits, metrics updates etc. However, it has a peculiar behavior due to its dynamic implementation.

If you run the following code, what is the expected output?

``````def register(func):
print("hello to decorator!")
def another():
┆   print("wrapped now!")
┆   func()
return another

@register
def test():
print("I am a test function")
return ""
``````

The interesting thing about this code is that it prints the line “hello to decorator” even without any function invocation.

#### But Why?

The reason is that during the decorator application to the function test interpreter would run all the code inside decorator and prepare a new function object. Decorator essentially creates new functions. Since the new function is created dynamically, Python VM runs all the code inside the wrapper (decorator) function and ultimately retunrs a new function object.

#### Another example

``````import sys

def first(func):
print("hello sir!")
return func

def second(func):
print(sys.argv[0])
for i in range(1,5):
┆   print(i)

def another():
┆   print("hello sir!")
┆   func()
return another

@first
def test():
print("I am a test function1")
return ""

# Should cause conflict, but doesn't
@first
def test():
print("I am a test function1")
return ""

# Another function with the same name
# Python VM does not complain of name conflict as
# the decorated function would assume a new name
@second
def test():
print("I am a test function2")
return ""
``````

Output

``````\$ python mydecor.py
hello sir!
hello sir!
mydecor.py
1
2
3
4
``````

#### Things to look for

• There is no name collison problem for a decorated function
• A decorated function get a name of parent function and and offset
• Decorator code is run even before `main()` is called

#### References

Sample code

Written with StackEdit.

## MySQL & Python: Error: 2006 mysql has gone away

This problem occurs for multiple reasons such as DB connection problem. In our code, we hit this issue due to a subtle problem with DB cursor.

The code was as following:

``````with conn as cur:
try:
print "hello"
except:
print "sorry"
finally:
conn.close()
``````

The above code would throw the error 2016 mysql has gone away exception. The problem lies in `with conn as cur`. This statement creates a cursor on the DB and the cursor object autmatically gets destroyed.
Here, we are closing the DB connection before the automatic destruction happened.

So since connection was invalid(closed), cursor deletion hit an exception.

The solution is to close the connection after cursor object deletion.

``````try:
with conn as cur:
print "hello"
except:
print "sorry"
finally:
conn.close()
``````

Written with StackEdit.

## Google App Engine: Data store with async calls

• Google Data Store provides single core CPU  (F class) for applications.
• Hence heavily multi-threaded applications can’t scale enough
• DataStoreprovidesasync calls for DB access. It has:
• Async queries
• Async transactions
• Documentation is good but lacks clarity on usage and benefits on async calls. It says that async queries can run in parallel after immediately returning a temporary result object. But how?
• I think Data Store can time share multiple requests, hence improving responsiveness and fairness.
• How to convert sync calls to async:
• I got confused at async queries. It uses run() menthod on a query to make it async. A get() method is a sync call.
• Difference between run() and get() is that:
• run() returns an iterator over results
• get() is a sync call and returns the first result from result set
• run(limit=1) == get()

```q1 = Salesperson.all().filter('date_of_hire <', one_month_ago)
# Returns instantly, query is executing in the background.
recent_hires = q1.run()

# q1 and recent_hires are both iterable objects. Your application should iterate over 'recent_hires' to enjoy benefits of pre-fetched reads.
##### for h in recent_hires:
##### for h in q1:
```

## Dissecting Python List allocator (CPython)

List is one of the most frequently used data structure in Python applications. The code for Python list object allocator is present in Python-2.7.5/Objects/listobject.c.

Python allocates list objects in heap and uses underlying malloc based APIs. List is a container of Python object references. In plain C terminology. it is a dynamic array of pointers.

Let’s understand basic creation and insertion APIs.

• list_resize(): A list is of type PyListObject and is equivalent to PyObject. Every resize request allocates more than asked size.

Allocation unit size is : (size_requested + (size_requested << 3 + (size_requested < 9 : 3 or 6) ).

It gives a growth pattern of (0, 4, 8, 16, 25, 35, 46, 58, 72, 88).

• free_list: Python keeps PyList_MAXFREELIST (defined to value 80) of free lists to save on lists allocation and deallocation.
• PyList_New(): A new list allocation requests is served from either a free list or a new GC tracked object creation of type PyListObject. Next, a malloc request is made for requested list size.

Size calculated: nbytes = size * sizeof(PyObject *);

Object allocation: PyListObject *op = free_list[numfree] OR PyObject_GC_New(PyListObject, &PyList_Type);

Memory allocation for references: op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);

## How to limit heap size of a Python process on Linux?

Python allows customization to resource allocation for a Python VM instance.

http://docs.python.org/2/library/resource.html

``````import resource
resource.setrlimit(resource.RLIMIT_AS, (megs * 1048576L, -1L))``````

## memory_profiler: NameError: name ‘profile’ is not defined

I was experimenting with Python’s memory_profiler module and suddenly started getting following error:

\$ python -m memory_profiler memProf.py
Traceback (most recent call last):
File “/usr/lib/python2.7/runpy.py”, line 162, in _run_module_as_main
File “/usr/lib/python2.7/runpy.py”, line 72, in _run_code
exec code in run_globals
File “/usr/local/lib/python2.7/dist-packages/memory_profiler.py”, line 14, in <module>
import subprocess
File “/usr/lib/python2.7/subprocess.py”, line 432, in <module>
import pickle
File “pickle.py”, line 8, in <module>

NameError: name ‘profile’ is not defined
Error in sys.excepthook:
Traceback (most recent call last):
File “/usr/lib/python2.7/dist-packages/apport_python_hook.py”, line 66, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File “/usr/lib/python2.7/dist-packages/apport/__init__.py”, line 1, in <module>
from apport.report import Report
File “/usr/lib/python2.7/dist-packages/apport/report.py”, line 12, in <module>
import subprocess, tempfile, os.path, urllib, re, pwd, grp, os
File “/usr/lib/python2.7/subprocess.py”, line 432, in <module>
import pickle
File “pickle.py”, line 8, in <module>

NameError: name ‘profile’ is not defined

Original exception was:
Traceback (most recent call last):
File “/usr/lib/python2.7/runpy.py”, line 162, in _run_module_as_main
File “/usr/lib/python2.7/runpy.py”, line 72, in _run_code
exec code in run_globals
File “/usr/local/lib/python2.7/dist-packages/memory_profiler.py”, line 14, in <module>
import subprocess
File “/usr/lib/python2.7/subprocess.py”, line 432, in <module>
import pickle
File “pickle.py”, line 8, in <module>

NameError: name ‘profile’ is not defined

I checked the installation and validated the run-time environment, but everything looked fine. Then I took a closer look at the error traces.

File “pickle.py”, line 8, in <module>

NameError: name ‘profile’ is not defined

There was no pickling in my code! Why is pickle module in the error log?

Then I checked the current directory and saw a file with name “pickle.py”, Bingo! I had created this file for another test and mistakenly named as “pickle.py” that coincides with standard Python pickle module.

So, fix was to rename this file to a modest name and delete its “pyc” file.

## Python Internals: Understanding Python data model (I)

Python sees everything as object. Every object has an identity, value and a type. Object identity and type are invariable.

Object type determines if value is mutable or otherwise.

Lifetime of object is based on reference count mechanism.

### Object Container

Object containers are: list, dictionary, tuple, set. Containers keep reference (object identity) to objects. Mutability of container is based on references. The value of the referred object could be mutable.

Let’s see how they behave:

```from sys import getrefcount

a = 1
b = 1

list1 = []
list2 = []

t1 = (a, b)
t2 = (a, b)

# a and b share reference to same object ID
print "a=", id(a)
print "b=", id(b)

# Constant 1 has ref count of +2 (a and b)
print "getrefcount(1)=", getrefcount(1)

# Constant 1 has ref count of +3 now (a,b and c)
c = 1
print "getrefcount(1)=", getrefcount(1)

# Decrement the object ref count
del c
print "getrefcount(1)=", getrefcount(1)

# Default ref count of an unused new integer object is 3. But, why?
print "getrefcount(999999)=", getrefcount(999999)

print ""

# Mutable objects like list do not refer to same object ID, even
# though value of objects are same!
print "list1=", id(list1)
print "list2=", id(list2)
print "getrefcount(list1)=", getrefcount(list1)
print "getrefcount(list2)=", getrefcount(list2)

print ""

print "t1=", id(t1)
print "t2=" ,id(t2)

# Containers have default ref count of 2
print "getrefcount(t1)=", getrefcount(t1)
print "getrefcount(t2)=", getrefcount(t2)

# Changing contained object values do not modiy immutable container
a = 3
b = 10

print "t1=", id(t1)
print "t2=" ,id(t2)

# String literals are constant and are referred to like numbers
s1 = "hello"
s2 = "hello"

# s1 and s2 refer to same object
print "s1=", id(s1)
print "s2=" ,id(s2)
print "getrefcount(hello)=", getrefcount("hello")

# The first use of a literal uses 3 ref count
print "getrefcount(hello!!)=", getrefcount("hello!!")

s3 = "hello!"
print "getrefcount(hello!)=", getrefcount("hello!")```

## Parenthesize an expression in Python

```    def pref(op):
print "called with op", op
ret = -1
if op == '+':
print "matched +"
ret = 1
if op == '-':
print "matched -"
ret = 2
if op == '*':
print "matched *"
ret = 3
if op == '/':
print "matched /"
ret = 4

return ret

def evaluate(expr, operand_stack, operator_stack):
print "**In evaluate**"
print operator_stack
print operand_stack

expr1 = operand_stack.pop()
expr2 = operand_stack.pop()
op    = operator_stack.pop()

# Parenthesize the expression
expr = "(" + expr2 + op + expr1 + ")"
print "expr1", expr1
print "expr2", expr2
print "expr", expr

# Push the result back on the stack
operand_stack.append(expr)

print operator_stack
print operand_stack
print "**Out evaluate**"
return expr

def looper(str, expr, operator_stack, operand_stack):
l = 0
cnt = len(str)

# Loop over the input string
while  l < cnt:
if str[l] in ('+', '-', '*', '/'):
print "operator found: op, index", str[l], l
print operator_stack, len(operator_stack)

x = len(operator_stack) - 1
if x > 0:
print "Comparing:", operator_stack[x], str[l]

# If op on stack has higher preference than the op in question
if (pref(operator_stack[x]) > pref(str[l])):
expr = evaluate(expr, operand_stack, operator_stack)
operator_stack.append(str[l])
else:
# Add the operand to operand stack
operand_stack.append(str[l])
l += 1

print operator_stack
print operand_stack

print "Take care of last elements"
op_cnt = len(operator_stack)
while op_cnt:
expr = evaluate(expr, operand_stack, operator_stack)
op_cnt -= 1

print operator_stack
print operand_stack

if __name__ == '__main__':
str = "a+c*d-e/w*x+a-s"
cnt = len(str)

operand_stack  = []
operator_stack  = []
expr = ""
looper(str, expr, operator_stack, operand_stack)

print "Output=>", operand_stack[0]

```

## Pylucene- Part II: Searching index

In the last post, we discussed how to create an index over a directory. Now, let’s search our index.

```from lucene import \
TermPositionVector, SimpleFSDirectory, File, MoreLikeThis, \
VERSION, initVM, Version
import sys

FIELD_CONTENTS = "contents"
FIELD_PATH = "path"

QUERY_STRING = "lucene and restored"

STORE_DIR = "/home/kanaujia/lucene_index"

if __name__ == '__main__':
initVM()
print 'lucene', VERSION

# Get handle to index directory
directory = SimpleFSDirectory(File(STORE_DIR))

# Creates a searcher searching the provided index.

# Implements search over a single IndexReader.
# Use a single instance and use it across queries
# to improve performance.

# Get the analyzer
analyzer = StandardAnalyzer(Version.LUCENE_CURRENT)

# Constructs a query parser. We specify what field to search into.
queryParser = QueryParser(Version.LUCENE_CURRENT,
FIELD_CONTENTS, analyzer)

# Create the query
query = queryParser.parse(QUERY_STRING)

# Run the query and get top 50 results
topDocs = searcher.search(query, 50)

# Get top hits
scoreDocs = topDocs.scoreDocs
print "%s total matching documents." % len(scoreDocs)

for scoreDoc in scoreDocs:
doc = searcher.doc(scoreDoc.doc)
print doc.get(FIELD_PATH)```

## Pylucene- Part I: Creating index

How to write a simple index generator with pylucene

```  1 import lucene
2
3 if __name__ == '__main__':
4     INDEX_DIR = "/home/kanaujia/lucene_index"
5
6     # Initialize lucene and JVM
7     lucene.initVM()
8
9     print "lucene version is:", lucene.VERSION
10
11     # Get the analyzer
12     analyzer = lucene.StandardAnalyzer(lucene.Version.LUCENE_CURRENT)
13
14     # Get index storage
15     store = lucene.SimpleFSDirectory(lucene.File(INDEX_DIR))
16
17     # Get index writer
18     writer = lucene.IndexWriter(store, analyzer, True, lucene.IndexWriter.MaxFieldLength.LIMITED)
19
20     try:
21         # create a document that would we added to the index
22         doc = lucene.Document()
23
24         # Add a field to this document
25         field = lucene.Field("title", "India", lucene.Field.Store.YES, lucene.Field.Index.ANALYZED)
26
27         # Add this field to the document
29
30         # Add the document to the index
32     except Exception, e:
33         print "Failed in indexDocs:", e

```

# Fundamentals

• An index is created with an IndexWriter
• An index is a collection of documents
• A document represents a file, or data in terms of fields
• A field is a tuple of field name, data

Let’s understand the above program:

1. We provide a location of index as INDEX_DIR = “/home/kanaujia/lucene_index”
2. Start and initialize the Java VM
3. Get the lucene’s standard analyzer for fields
4. This example keeps the index on disk, so the `SimpleFSDirectory` class is used to get a handle to this index.
5. IndexWriter creates and maintains an index. The constructor is as follows:

`IndexWriter(Directory d, Analyzer a, boolean create, IndexDeletionPolicy deletionPolicy, IndexWriter.MaxFieldLength mfl)`

• Directory is handle to index location
• ‘create’ tells if a new index object is created for every user request
```# Get index writer
writer = lucene.IndexWriter(store, analyzer, True, lucene.IndexWriter.MaxFieldLength.LIMITED)```
• Create a document that would become part in the index
• Create a field, add it to a document.
• Add the document to the index.
• Run the program
```kanaujia@ubuntu:~/work/Py\$ python example1.py
lucene version is: 3.6.1
```
```kanaujia@ubuntu:~/work/Py\$ ls /home/kanaujia/lucene_index/
_0.fdt  _0.fdx  write.lock```

## Python goof ups of a C/C++ programmer

Python is a new age language when compared to good old C. Writing code in Python needs a subtle shift from C mindset. Python offers so many things ready-made that make you feel that you wrote very less code. Anyway, I goofed up while using a very common feature of C: pass by reference.

Python too offers it, but with a caveat: you should know mutable and immutable data objects. When you pass a mutable object like list, dictionary; they can be changed. But, if you pass an immutable data object like string, tuple; they are kept unchanged in caller’s scope. This is very similar to passing “references” (C++) or constant pointers.

```  1 class Node:
2     def __init__(self, value, data=[]):
3         self.char = value
4         # We intend to keep a list of values for a key
5         self.data = data
6         # XXX: List of Node
7         self.children = []
8
9
10 n = Node('a')
11 m = Node('b')
12
13 (n.data).append("north")
14 (m.data).append("south")
15
16 print n.data
17 print m.data```

The output of the above code is:

```ubuntu@ubuntu:~/Desktop/ToKeep/cprogs/pycon\$ python pybug.py
['north', 'south']
['north', 'south']```

I don’t quite understand why Python keeps a single instance of default argument(list). Nonetheless, it is interesting.

This thread on Stackoverflow is very informational.

• # Omitting “this” pointer

In C++, “this” which is pointer to the object is passed implicitly. Python has similar keyword “self” for this. But, contrary to an implicit understanding, “self” should be an argument to a function, else you would see an error:

NameError: global name ‘self’ is not defined

```  1 class pydbg:
2     def __init__(self, modname=None):
3         self.level = 0
4         self.modname = modname
5
6     def DLOG(self, lvl=None, fmt=None, *args):
7         self.level = lvl
8         print '%s' %self.modname
9         print fmt
10         for arg in args:
11             print arg```