Notes on Python Decorator

A Python decorator is a function wrapper. It helps you wrap cross-cutting tasks around your functions, such as logging, rate limiting, and metrics updates. However, it has a peculiar behavior due to its dynamic implementation.

If you run the following code, what is the expected output?

def register(func):
    print("hello to decorator!")
    def another():
        print("wrapped now!")
        func()
    return another

@register
def test():
    print("I am a test function")
    return ""

The interesting thing about this code is that it prints “hello to decorator!” even without any function invocation.

But Why?

The reason is that when the decorator is applied to the function test, the interpreter runs all the code inside the decorator and prepares a new function object. A decorator essentially creates a new function: since that function is created dynamically, the Python VM executes the decorator's body at definition time and ultimately returns the new function object.
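
The @ syntax is just shorthand for calling the decorator with the function object. Here is a minimal sketch of what the interpreter effectively does at definition time (using the same register decorator as above):

def register(func):
    print("hello to decorator!")      # runs at decoration time
    def another():
        print("wrapped now!")
        func()
    return another

def test():
    print("I am a test function")

# '@register' above 'def test' is equivalent to this explicit call,
# which is why "hello to decorator!" appears without test() being invoked.
test = register(test)   # prints: hello to decorator!
test()                  # prints: wrapped now! / I am a test function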

Another example

import sys

def first(func):
    print("hello sir!")
    return func


def second(func):
    print(sys.argv[0])
    for i in range(1,5):
        print(i)

    def another():
        print("hello sir!")
        func()
    return another


@first
def test():
    print("I am a test function1")
    return ""

# Should cause conflict, but doesn't
@first
def test():
    print("I am a test function1")
    return ""


# Another function with the same name.
# Python does not complain about a name conflict: the name 'test'
# is simply rebound to the new (decorated) function object.
@second
def test():
    print("I am a test function2")
    return ""

Output

$ python mydecor.py
hello sir!
hello sir!
mydecor.py
1
2
3
4

Things to look for

  • There is no name collision problem for a decorated function
  • A decorated function takes the name of its wrapper function in its repr (e.g. <function another at 0x...>), not the original name (see the sketch after this list)
  • Decorator code runs at import time, even before main() is called
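
A small sketch of that naming behavior, together with the usual way to keep the original name using functools.wraps (the wraps part is an addition, not something the post above uses):

import functools

def register(func):
    def another():
        return func()
    return another

def register_with_wraps(func):
    @functools.wraps(func)       # copies __name__, __doc__, etc. onto the wrapper
    def another():
        return func()
    return another

@register
def test():
    pass

@register_with_wraps
def test2():
    pass

print(test.__name__)    # 'another' -> the wrapper's name, not 'test'
print(test2.__name__)   # 'test2'   -> preserved by functools.wraps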

MySQL & Python: Error: 2006 mysql has gone away

This problem occurs for multiple reasons, such as a dropped or timed-out DB connection. In our code, we hit this issue due to a subtle problem with the DB cursor.

The code was as follows:

with conn as cur:
    try:
        print "hello"
    except:
        print "sorry"
    finally: 
        conn.close()

The above code would throw the 2006 mysql has gone away exception. The problem lies in with conn as cur. This statement creates a cursor on the DB connection, and the cursor object gets destroyed automatically when the with block exits. Here, we were closing the DB connection before that automatic destruction happened.

So, since the connection was already invalid (closed), the cursor cleanup hit an exception.

The solution is to close the connection after the cursor object has been destroyed, i.e. outside the with block.

try:
    with conn as cur:
        print "hello"
except:
    print "sorry"
finally: 
    conn.close()

Google App Engine: Data store with async calls

  • Google App Engine provides a single-core CPU (F-class frontend instances) for applications.
  • Hence, heavily multi-threaded applications can’t scale enough.
  • Datastore provides async calls for DB access. It has:
    • Async queries
    • Async transactions
    • Async read/writes
  • The documentation is good but lacks clarity on the usage and benefits of async calls. It says that async queries can run in parallel after immediately returning a temporary result object. But how?
  • I think Datastore can time-share multiple requests, hence improving responsiveness and fairness.
  • How to convert sync calls to async: please refer to the Google Developers website (a sketch of async reads/writes follows the query example below).
    • I got confused by async queries. The run() method on a query makes it async; the get() method is a sync call.
    • The differences between run() and get() are:
      • run() returns an iterator over the results
      • run() pre-fetches results with a batch read, so access to elements in the results is faster
      • get() is a sync call and returns the first result from the result set
      • run(limit=1) is effectively get()

 

q1 = Salesperson.all().filter('date_of_hire <', one_month_ago)
# Returns instantly; the query executes in the background.
recent_hires = q1.run()

# q1 and recent_hires are both iterable objects. Your application should
# iterate over 'recent_hires' to enjoy the benefits of pre-fetched reads:
for h in recent_hires:
    pass                 # process each pre-fetched result here
# instead of:
# for h in q1:
#     pass
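
For the async reads/writes mentioned above, the old db API also exposes *_async variants of get, put, and delete. Below is a minimal sketch, assuming the google.appengine.ext.db module, a hypothetical Salesperson model definition (the post only shows the query), and a made-up key name; get_result() blocks only when the result is actually needed.

from google.appengine.ext import db

class Salesperson(db.Model):
    name = db.StringProperty()
    date_of_hire = db.DateProperty()

person = Salesperson(name='Alice')

# Both calls return immediately; the RPCs run in the background.
put_rpc = db.put_async(person)
get_rpc = db.get_async(db.Key.from_path('Salesperson', 'some-key-name'))

# ... do other useful work here ...

# get_result() blocks until the corresponding RPC has finished.
stored_key = put_rpc.get_result()   # the key of the stored entity
entity = get_rpc.get_result()       # None if no such entity exists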

Dissecting Python List allocator (CPython)

List is one of the most frequently used data structures in Python applications. The code for the Python list object allocator is in Python-2.7.5/Objects/listobject.c.

Python allocates list objects on the heap using the underlying malloc-based APIs. A list is a container of Python object references; in plain C terminology, it is a dynamic array of pointers.

Let’s understand basic creation and insertion APIs.

  • list_resize(): A list is of type PyListObject, which can be treated as a PyObject. Every resize request allocates more than the requested size.

The allocation size is: new_allocated = size_requested + (size_requested >> 3) + (size_requested < 9 ? 3 : 6).

It gives a growth pattern of (0, 4, 8, 16, 25, 35, 46, 58, 72, 88).

  • free_list: Python keeps up to PyList_MAXFREELIST (defined as 80) freed list objects on a free list, to save on list allocation and deallocation.
  • PyList_New(): A new list allocation request is served either from the free list or by creating a new GC-tracked object of type PyListObject. Next, a malloc request is made for the requested list size.

Size calculated: nbytes = size * sizeof(PyObject *);

Object allocation: PyListObject *op = free_list[numfree] OR PyObject_GC_New(PyListObject, &PyList_Type);

Memory allocation for references: op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);
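
The over-allocation is easy to observe from the Python side. Here is a small sketch (CPython-specific; exact byte sizes depend on the build and pointer width) that appends items and reports every length at which the backing array was actually reallocated:

import sys

lst = []
last = sys.getsizeof(lst)
for i in range(30):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:
        # list_resize() grew the backing array at this length
        print "grew at len=%d, size=%d bytes" % (len(lst), size)
        last = size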

memory_profiler: NameError: name ‘profile’ is not defined

I was experimenting with Python’s memory_profiler module and suddenly started getting the following error:

$ python -m memory_profiler memProf.py
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/memory_profiler.py", line 14, in <module>
    import subprocess
  File "/usr/lib/python2.7/subprocess.py", line 432, in <module>
    import pickle
  File "pickle.py", line 8, in <module>

NameError: name 'profile' is not defined
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python2.7/dist-packages/apport/__init__.py", line 1, in <module>
    from apport.report import Report
  File "/usr/lib/python2.7/dist-packages/apport/report.py", line 12, in <module>
    import subprocess, tempfile, os.path, urllib, re, pwd, grp, os
  File "/usr/lib/python2.7/subprocess.py", line 432, in <module>
    import pickle
  File "pickle.py", line 8, in <module>

NameError: name 'profile' is not defined

Original exception was:
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/memory_profiler.py", line 14, in <module>
    import subprocess
  File "/usr/lib/python2.7/subprocess.py", line 432, in <module>
    import pickle
  File "pickle.py", line 8, in <module>

NameError: name 'profile' is not defined

I checked the installation and validated the run-time environment, but everything looked fine. Then I took a closer look at the error traces.

  File "pickle.py", line 8, in <module>

NameError: name 'profile' is not defined

There was no pickling in my code! Why did the pickle module show up in the error log?

Then I checked the current directory and saw a file named "pickle.py". Bingo! I had created this file for another test and mistakenly named it "pickle.py", which coincides with the standard Python pickle module and shadows it.

So the fix was to rename this file to something else and delete its ".pyc" file.
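
A quick way to catch this kind of shadowing is to ask Python where a module is actually imported from; a minimal sketch:

import pickle

# If this prints a path inside your working directory instead of the
# standard library, a local file is shadowing the real module.
print pickle.__file__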

Python Internals: Understanding Python data model (I)

Python sees everything as an object. Every object has an identity, a value, and a type. An object’s identity and type never change.

[Figure: Python data model]

An object’s type determines whether its value is mutable.

An object’s lifetime is governed by the reference counting mechanism.

Object Containers

Object containers are list, dictionary, tuple, and set. Containers keep references (object identities) to objects; a container’s mutability concerns those references, while the value of a referred object may itself be mutable.
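
A short sketch of that last point: an immutable container (a tuple) fixes its references, but an object it refers to can still change its value.

t = ([1, 2],)       # a tuple holding a reference to a list
t[0].append(3)      # mutating the referred list is allowed
print t             # ([1, 2, 3],)

# Rebinding a slot would alter the tuple's references and is rejected:
# t[0] = [9]        # raises TypeError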

Let’s see how they behave:

from sys import getrefcount

a = 1
b = 1

list1 = [] 
list2 = [] 

t1 = (a, b)
t2 = (a, b)

# a and b share reference to same object ID 
print "a=", id(a)
print "b=", id(b)

# Constant 1 has ref count of +2 (a and b)
print "getrefcount(1)=", getrefcount(1)

# Constant 1 has ref count of +3 now (a,b and c)
c = 1
print "getrefcount(1)=", getrefcount(1)

# Decrement the object ref count
del c
print "getrefcount(1)=", getrefcount(1)

# Default ref count of an unused new integer object is 3. But, why?
print "getrefcount(999999)=", getrefcount(999999)

print ""

# Mutable objects like list do not refer to same object ID, even
# though value of objects are same!
print "list1=", id(list1)
print "list2=", id(list2)
print "getrefcount(list1)=", getrefcount(list1)
print "getrefcount(list2)=", getrefcount(list2)

print ""

print "t1=", id(t1)
print "t2=" ,id(t2)

# Containers have default ref count of 2
print "getrefcount(t1)=", getrefcount(t1)
print "getrefcount(t2)=", getrefcount(t2)

# Rebinding the names a and b does not modify the immutable containers;
# t1 and t2 still reference the original objects
a = 3
b = 10

print "t1=", id(t1)
print "t2=" ,id(t2)

# String literals are constant and are referred to like numbers
s1 = "hello"
s2 = "hello"

# s1 and s2 refer to same object
print "s1=", id(s1)
print "s2=" ,id(s2)
print "getrefcount(hello)=", getrefcount("hello")

# The first use of a literal uses 3 ref count
print "getrefcount(hello!!)=", getrefcount("hello!!")

s3 = "hello!"
print "getrefcount(hello!)=", getrefcount("hello!")

Parenthesize an expression in Python

The program below fully parenthesizes an infix expression using two stacks (operands and operators) and a small operator-precedence function.

    def pref(op):
        print "called with op", op
        ret = -1
        if op == '+':
            print "matched +"
            ret = 1
        if op == '-':
            print "matched -"
            ret = 2
        if op == '*':
            print "matched *"
            ret = 3
        if op == '/':
            print "matched /"
            ret = 4
    
        return ret
    
    def evaluate(expr, operand_stack, operator_stack):
        print "**In evaluate**"
        print operator_stack
        print operand_stack
    
        expr1 = operand_stack.pop()
        expr2 = operand_stack.pop()
        op    = operator_stack.pop()
    
        # Parenthesize the expression
        expr = "(" + expr2 + op + expr1 + ")"
        print "expr1", expr1
        print "expr2", expr2
        print "expr", expr
    
        # Push the result back on the stack
        operand_stack.append(expr)
    
        print operator_stack
        print operand_stack
        print "**Out evaluate**"
        return expr
    
    def looper(str, expr, operator_stack, operand_stack):
        l = 0
        cnt = len(str)
    
        # Loop over the input string
        while  l < cnt:
            if str[l] in ('+', '-', '*', '/'):
                print "operator found: op, index", str[l], l
                print operator_stack, len(operator_stack)
    
                # Look at the operator on top of the stack (index 0 is a valid top)
                x = len(operator_stack) - 1
                if x >= 0:
                    print "Comparing:", operator_stack[x], str[l]
    
                    # If op on stack has higher preference than the op in question
                    if (pref(operator_stack[x]) > pref(str[l])):
                        expr = evaluate(expr, operand_stack, operator_stack)
                operator_stack.append(str[l])
            else:
                # Add the operand to operand stack
                operand_stack.append(str[l])
            l += 1
    
        print operator_stack
        print operand_stack
    
        print "Take care of last elements"
        op_cnt = len(operator_stack)
        while op_cnt:
            expr = evaluate(expr, operand_stack, operator_stack)
            op_cnt -= 1
    
        print operator_stack
        print operand_stack
    
    if __name__ == '__main__':
        str = "a+c*d-e/w*x+a-s"
        cnt = len(str)
    
        operand_stack  = []
        operator_stack  = []
        expr = ""
        looper(str, expr, operator_stack, operand_stack)
    
        print "Output=>", operand_stack[0]

Pylucene- Part II: Searching index

In the last post, we discussed how to create an index over a directory. Now, let’s search our index.

from lucene import \
    QueryParser, IndexSearcher, IndexReader, StandardAnalyzer, \
    TermPositionVector, SimpleFSDirectory, File, MoreLikeThis, \
    VERSION, initVM, Version
import sys

FIELD_CONTENTS = "contents"
FIELD_PATH = "path"

QUERY_STRING = "lucene and restored"

STORE_DIR = "/home/kanaujia/lucene_index"

if __name__ == '__main__':
    initVM()
    print 'lucene', VERSION

    # Get handle to index directory
    directory = SimpleFSDirectory(File(STORE_DIR))

    # Open a read-only IndexReader on the index directory
    ireader  = IndexReader.open(directory, True)

    # Implements search over a single IndexReader.
    # Use a single instance and use it across queries
    # to improve performance.
    searcher = IndexSearcher(ireader)

    # Get the analyzer
    analyzer = StandardAnalyzer(Version.LUCENE_CURRENT)

    # Constructs a query parser. We specify what field to search into.
    queryParser = QueryParser(Version.LUCENE_CURRENT,
                              FIELD_CONTENTS, analyzer)

    # Create the query
    query = queryParser.parse(QUERY_STRING)

    # Run the query and get top 50 results
    topDocs = searcher.search(query, 50)

    # Get top hits
    scoreDocs = topDocs.scoreDocs
    print "%s total matching documents." % len(scoreDocs)

    for scoreDoc in scoreDocs:
        doc = searcher.doc(scoreDoc.doc)
        print doc.get(FIELD_PATH)
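
Each hit also carries a relevance score in ScoreDoc.score, so the loop above can be extended to show it. A small sketch, assuming the searcher and scoreDocs objects from the listing:

for scoreDoc in scoreDocs:
    doc = searcher.doc(scoreDoc.doc)
    print doc.get(FIELD_PATH), scoreDoc.score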

Pylucene- Part I: Creating index

How to write a simple index generator with pylucene

import lucene

if __name__ == '__main__':
    INDEX_DIR = "/home/kanaujia/lucene_index"

    # Initialize lucene and JVM
    lucene.initVM()

    print "lucene version is:", lucene.VERSION

    # Get the analyzer
    analyzer = lucene.StandardAnalyzer(lucene.Version.LUCENE_CURRENT)

    # Get index storage
    store = lucene.SimpleFSDirectory(lucene.File(INDEX_DIR))

    # Get index writer
    writer = lucene.IndexWriter(store, analyzer, True, lucene.IndexWriter.MaxFieldLength.LIMITED)

    try:
        # Create a document that will be added to the index
        doc = lucene.Document()

        # Add a field to this document
        field = lucene.Field("title", "India", lucene.Field.Store.YES, lucene.Field.Index.ANALYZED)

        # Add this field to the document
        doc.add(field)

        # Add the document to the index
        writer.addDocument(doc)
    except Exception, e:
        print "Failed in indexDocs:", e

Fundamentals

  • An index is created with an IndexWriter
  • An index is a collection of documents
  • A document represents a file, or data in terms of fields
  • A field is a (field name, data) pair

Let’s understand the above program:

  1. We provide the location of the index as INDEX_DIR = "/home/kanaujia/lucene_index"
  2. Start and initialize the Java VM
  3. Get the lucene’s standard analyzer for fields
  4. This example keeps the index on disk, so the SimpleFSDirectory class is used to get a handle to this index.
  5. IndexWriter creates and maintains an index. The constructor is as follows:

IndexWriter(Directory d, Analyzer a, boolean create, IndexWriter.MaxFieldLength mfl)

  • Directory is a handle to the index location
  • ‘create’ tells whether to create a new index, overwriting an existing one, or open the existing index

# Get index writer
writer = lucene.IndexWriter(store, analyzer, True, lucene.IndexWriter.MaxFieldLength.LIMITED)

  • Create a document that will become part of the index
  • Create a field, add it to a document.
  • Add the document to the index.
  • Run the program
kanaujia@ubuntu:~/work/Py$ python example1.py
lucene version is: 3.6.1
kanaujia@ubuntu:~/work/Py$ ls /home/kanaujia/lucene_index/
_0.fdt  _0.fdx  write.lock
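
One thing to note: the listing above never closes the writer, which is why write.lock is still present in the directory. A minimal cleanup sketch (Lucene 3.x API), assuming the writer object from the example; in practice this would go in a finally block:

# Commit the added documents and release write.lock
writer.close()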