Tag Archives: python

How to limit heap size of a Python process on Linux?

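Python's resource module lets a process limit its own resource usage. On Linux, RLIMIT_AS caps the process's total virtual address space, which includes the heap; once the cap is reached, further allocations fail and Python raises MemoryError.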

http://docs.python.org/2/library/resource.html

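import resource
megs = 512  # example cap, in megabytes
soft, hard = resource.getrlimit(resource.RLIMIT_AS)  # keep the hard limit: raising it needs privileges
resource.setrlimit(resource.RLIMIT_AS, (megs * 1048576, hard))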
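To see the limit take effect, here is a minimal Python 3 sketch, assuming Linux. It runs the limited code in a child interpreter so the calling process keeps its own limits. The helper name run_with_memory_limit and the sizes are illustrative assumptions, not part of the resource API.

```python
import subprocess
import sys

# Child script: cap its own address space, then try one large allocation.
CHILD = """
import resource, sys
limit_mb, alloc_mb = int(sys.argv[1]), int(sys.argv[2])
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
new_soft = limit_mb * 1024 * 1024
if hard != resource.RLIM_INFINITY:
    new_soft = min(new_soft, hard)  # the soft limit may not exceed the hard limit
resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
try:
    buf = bytes(alloc_mb * 1024 * 1024)  # zero-filled buffer of alloc_mb megabytes
    print("ok")
except MemoryError:
    print("MemoryError")
"""

def run_with_memory_limit(limit_mb, alloc_mb):
    """Run CHILD in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(limit_mb), str(alloc_mb)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(run_with_memory_limit(1024, 8192))  # allocation larger than the cap
    print(run_with_memory_limit(1024, 1))     # small allocation under the cap
```

Running the limit in a subprocess is a design choice for the demo only; in a real program you would call setrlimit early in the process you want to constrain.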

memory_profiler: NameError: name ‘profile’ is not defined

I was experimenting with Python’s memory_profiler module and suddenly started getting the following error:

$ python -m memory_profiler memProf.py
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/memory_profiler.py", line 14, in <module>
    import subprocess
  File "/usr/lib/python2.7/subprocess.py", line 432, in <module>
    import pickle
  File "pickle.py", line 8, in <module>

NameError: name 'profile' is not defined
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python2.7/dist-packages/apport/__init__.py", line 1, in <module>
    from apport.report import Report
  File "/usr/lib/python2.7/dist-packages/apport/report.py", line 12, in <module>
    import subprocess, tempfile, os.path, urllib, re, pwd, grp, os
  File "/usr/lib/python2.7/subprocess.py", line 432, in <module>
    import pickle
  File "pickle.py", line 8, in <module>

NameError: name 'profile' is not defined

Original exception was:
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/memory_profiler.py", line 14, in <module>
    import subprocess
  File "/usr/lib/python2.7/subprocess.py", line 432, in <module>
    import pickle
  File "pickle.py", line 8, in <module>

NameError: name 'profile' is not defined

I checked the installation and validated the run-time environment, but everything looked fine. Then I took a closer look at the error traces.

  File "pickle.py", line 8, in <module>

NameError: name 'profile' is not defined

There was no pickling in my code! Why was the pickle module in the error log?

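Then I checked the current directory and saw a file named “pickle.py”. Bingo! I had created this file for another test and mistakenly named it “pickle.py”, the same name as the standard Python pickle module. With python -m, the current directory comes first on sys.path, so the import pickle inside subprocess.py loaded my file instead of the standard library module.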

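So the fix was to rename the file to something that does not clash with a standard module and to delete its stale “pickle.pyc” file, which Python 2 would otherwise still import.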

Python Internals: Understanding Python data model (I)

In Python, everything is an object. Every object has an identity, a type and a value. An object's identity and type never change once it is created.

Python Data Model

An object's type determines whether its value is mutable.
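A short Python 3 illustration: mutating a list changes its value but keeps its identity, while "changing" an int actually binds the name to a different object. The variable names are arbitrary.

```python
# A list is mutable: append changes the value in place, so the identity is kept.
items = [1, 2]
before = id(items)
items.append(3)
print(id(items) == before)   # True

# An int is immutable: += cannot modify it, so the name is rebound to a new object.
count = 1000
alias = count                # a second name for the same int object
count += 1
print(count, alias)          # 1001 1000
```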

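In CPython, an object's lifetime is managed by reference counting: an object is freed as soon as its reference count drops to zero, and a separate cycle-detecting garbage collector reclaims groups of objects that only refer to each other.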
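A small CPython sketch of that mechanism, using weakref.finalize to observe the moment an object is reclaimed. The class Payload is a placeholder; other implementations such as PyPy use different collectors, so the immediate cleanup shown here is CPython behaviour.

```python
import sys
import weakref

class Payload:
    """Plain class instances support weak references (ints and tuples do not)."""

freed = []

obj = Payload()
weakref.finalize(obj, freed.append, "payload freed")  # runs when obj is reclaimed
alias = obj                   # a second reference to the same object
print(sys.getrefcount(obj))   # the count includes getrefcount's own temporary argument

del obj                       # one reference (alias) remains: still alive
print(freed)                  # []
del alias                     # count drops to zero: CPython frees it right away
print(freed)                  # ['payload freed']
```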

Object Container

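Containers include lists, dictionaries, tuples and sets. A container holds references (object identities) to other objects, not copies of their values. A container's mutability concerns those references: an immutable container such as a tuple cannot change which objects it refers to, but the referenced objects themselves may still be mutable.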

Let’s see how they behave:

from sys import getrefcount

a = 1
b = 1

list1 = [] 
list2 = [] 

t1 = (a, b)
t2 = (a, b)

# a and b refer to the same object: CPython caches small integers
print "a=", id(a)
print "b=", id(b)

# a and b each add a reference to the cached int 1 (the absolute count is
# large because the interpreter itself refers to 1 in many places)
print "getrefcount(1)=", getrefcount(1)

# Binding c adds one more reference to the same object
c = 1
print "getrefcount(1)=", getrefcount(1)

# del removes the name c, decrementing the object's ref count
del c
print "getrefcount(1)=", getrefcount(1)

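# An otherwise unused integer literal still reports more than one reference,
# e.g. from the compiled code's constants and getrefcount's temporary argument
# (exact numbers vary between CPython versions)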
print "getrefcount(999999)=", getrefcount(999999)

print ""

# Each [] creates a new list object, so list1 and list2 have different IDs
# even though their values are equal
print "list1=", id(list1)
print "list2=", id(list2)
print "getrefcount(list1)=", getrefcount(list1)
print "getrefcount(list2)=", getrefcount(list2)

print ""

print "t1=", id(t1)
print "t2=" ,id(t2)

# Each tuple reports 2: one for its name, one for getrefcount's temporary argument
print "getrefcount(t1)=", getrefcount(t1)
print "getrefcount(t2)=", getrefcount(t2)

# Rebinding a and b does not modify the tuples: t1 and t2 still refer to the
# original objects, and their IDs stay the same
a = 3
b = 10

print "t1=", id(t1)
print "t2=" ,id(t2)

# Identifier-like string literals are interned by CPython, so equal literals
# share one object, much like the cached small integers
s1 = "hello"
s2 = "hello"

# s1 and s2 refer to same object
print "s1=", id(s1)
print "s2=" ,id(s2)
print "getrefcount(hello)=", getrefcount("hello")

# A literal used nowhere else is referenced only by the compiled code's
# constants and getrefcount's temporary argument
print "getrefcount(hello!!)=", getrefcount("hello!!")

# Binding s3 adds a reference to the shared "hello!" constant
s3 = "hello!"
print "getrefcount(hello!)=", getrefcount("hello!")
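The experiment above only rebinds the names a and b; it never touches the objects stored in t1 and t2. The Python 3 sketch below shows the other half of the container rule: a tuple's references are fixed, but a mutable object it refers to can still change. The names inner and t are arbitrary.

```python
# A tuple stores references, and those references cannot be replaced...
inner = [1, 2]
t = (inner, "fixed")
try:
    t[0] = [9]
except TypeError as exc:
    print("TypeError:", exc)    # TypeError: 'tuple' object does not support item assignment

# ...but the list it refers to is still mutable, and the change shows through t.
t_id = id(t)
inner.append(3)
print(t)                        # ([1, 2, 3], 'fixed')
print(id(t) == t_id)            # True
```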

Parenthesize an expression in Python

    def pref(op):
        """Return the precedence of a binary operator; higher binds tighter."""
        # + and - share a level, as do * and /
        if op in ('+', '-'):
            return 1
        if op in ('*', '/'):
            return 2
        return -1

    def evaluate(operand_stack, operator_stack):
        """Pop one operator and two operands, push the parenthesized result."""
        right = operand_stack.pop()
        left = operand_stack.pop()
        op = operator_stack.pop()

        # Parenthesize the expression and push it back on the stack
        expr = "(" + left + op + right + ")"
        operand_stack.append(expr)
        return expr

    def looper(text, operator_stack, operand_stack):
        """Parenthesize `text`, whose operands are single characters."""
        for ch in text:
            if ch in ('+', '-', '*', '/'):
                # Reduce while the operator on top of the stack binds at least as
                # tightly as this one (>= gives left associativity: a-b-c is (a-b)-c)
                while operator_stack and pref(operator_stack[-1]) >= pref(ch):
                    evaluate(operand_stack, operator_stack)
                operator_stack.append(ch)
            else:
                # Add the operand to the operand stack
                operand_stack.append(ch)

        # Take care of the remaining operators
        while operator_stack:
            evaluate(operand_stack, operator_stack)

    if __name__ == '__main__':
        text = "a+c*d-e/w*x+a-s"
        operand_stack = []
        operator_stack = []
        looper(text, operator_stack, operand_stack)

        # Prints: Output=> ((((a+(c*d))-((e/w)*x))+a)-s)
        print("Output=> " + operand_stack[0])
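A Python 3 variant of the same two-stack (shunting-yard style) idea, wrapped in a function that returns the string so it is easy to test. Beyond the single-character version, it accepts multi-character names, integers, spaces and explicit parentheses; the tokenizer and the names tokenize and parenthesize are illustrative choices, not part of the original post.

```python
import re

PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[-+*/()]")

def tokenize(text):
    """Split text into names, integers, operators and parentheses."""
    tokens = TOKEN_RE.findall(text)
    # Reject any character the pattern skipped (other than whitespace)
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise ValueError("unexpected character in %r" % text)
    return tokens

def parenthesize(text):
    """Return text fully parenthesized, e.g. 'a+b*c' -> '(a+(b*c))'."""
    operands, operators = [], []

    def reduce_top():
        # Combine the top operator with its two operands
        if len(operands) < 2:
            raise ValueError("missing operand in %r" % text)
        right = operands.pop()
        left = operands.pop()
        operands.append("(" + left + operators.pop() + right + ")")

    for tok in tokenize(text):
        if tok in PRECEDENCE:
            # Left associativity: reduce while the top binds at least as tightly
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]):
                reduce_top()
            operators.append(tok)
        elif tok == "(":
            operators.append(tok)
        elif tok == ")":
            while operators and operators[-1] != "(":
                reduce_top()
            if not operators:
                raise ValueError("unbalanced ')' in %r" % text)
            operators.pop()  # discard the matching "("
        else:
            operands.append(tok)

    while operators:
        if operators[-1] == "(":
            raise ValueError("unbalanced '(' in %r" % text)
        reduce_top()
    if len(operands) != 1:
        raise ValueError("malformed expression %r" % text)
    return operands[0]

if __name__ == "__main__":
    print(parenthesize("a+c*d-e/w*x+a-s"))   # ((((a+(c*d))-((e/w)*x))+a)-s)
```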

Pylucene- Part II: Searching index

In the last post, we discussed how to create an index over a directory. Now, let’s search our index.

from lucene import \
    QueryParser, IndexSearcher, IndexReader, StandardAnalyzer, \
    TermPositionVector, SimpleFSDirectory, File, MoreLikeThis, \
    VERSION, initVM, Version
import sys

FIELD_CONTENTS = "contents"
FIELD_PATH = "path"

QUERY_STRING = "lucene and restored"

STORE_DIR = "/home/kanaujia/lucene_index"

if __name__ == '__main__':
    initVM()
    print 'lucene', VERSION

    # Get handle to index directory
    directory = SimpleFSDirectory(File(STORE_DIR))

    # Open a read-only reader on the index (second argument: readOnly=True)
    ireader  = IndexReader.open(directory, True)

    # Implements search over a single IndexReader.
    # Use a single instance and use it across queries
    # to improve performance.
    searcher = IndexSearcher(ireader)

    # Get the analyzer
    analyzer = StandardAnalyzer(Version.LUCENE_CURRENT)

    # Constructs a query parser. We specify what field to search into.
    queryParser = QueryParser(Version.LUCENE_CURRENT,
                              FIELD_CONTENTS, analyzer)

    # Create the query
    query = queryParser.parse(QUERY_STRING)

    # Run the query and get top 50 results
    topDocs = searcher.search(query, 50)

    # Get top hits: scoreDocs holds at most 50, totalHits counts all matches
    scoreDocs = topDocs.scoreDocs
    print "%s total matching documents." % topDocs.totalHits

    for scoreDoc in scoreDocs:
        doc = searcher.doc(scoreDoc.doc)
        print doc.get(FIELD_PATH)

Pylucene- Part I: Creating index

How to write a simple index generator with pylucene

import lucene

if __name__ == '__main__':
    INDEX_DIR = "/home/kanaujia/lucene_index"

    # Initialize lucene and JVM
    lucene.initVM()

    print "lucene version is:", lucene.VERSION

    # Get the analyzer
    analyzer = lucene.StandardAnalyzer(lucene.Version.LUCENE_CURRENT)

    # Get index storage
    store = lucene.SimpleFSDirectory(lucene.File(INDEX_DIR))

    # Get index writer
    writer = lucene.IndexWriter(store, analyzer, True, lucene.IndexWriter.MaxFieldLength.LIMITED)

    try:
        # Create a document that will be added to the index
        doc = lucene.Document()

        # Create a field named "title" with value "India", stored and analyzed
        field = lucene.Field("title", "India", lucene.Field.Store.YES, lucene.Field.Index.ANALYZED)

        # Add this field to the document
        doc.add(field)

        # Add the document to the index
        writer.addDocument(doc)
    except Exception, e:
        print "Failed in indexDocs:", e
    finally:
        # Commit pending changes and release write.lock
        writer.close()

Fundamentals

  • An index is created with an IndexWriter
  • An index is a collection of documents
  • A document represents a file, or any other piece of data, as a set of fields
  • A field is a (name, value) pair
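To make these four ideas concrete without a JVM, here is a toy pure-Python model. It is not PyLucene and not how Lucene stores data on disk: the index holds documents, each document is a dict of named fields, and an inverted map from (field, term) to document ids answers queries. The class and method names are hypothetical, and the whitespace-splitting "analyzer" is a crude stand-in for StandardAnalyzer.

```python
from collections import defaultdict

class ToyIndex:
    """In-memory stand-in for an index: documents are dicts of field -> text."""

    def __init__(self):
        self.documents = []                # stored documents; list position = doc id
        self.postings = defaultdict(set)   # (field, term) -> ids of matching docs

    def add_document(self, fields):
        doc_id = len(self.documents)
        self.documents.append(fields)
        for name, text in fields.items():
            # Crude "analyzer": lowercase and split on whitespace
            for term in text.lower().split():
                self.postings[(name, term)].add(doc_id)
        return doc_id

    def search(self, field, query):
        """Return documents whose `field` contains any query term (OR semantics)."""
        hits = set()
        for term in query.lower().split():
            hits |= self.postings.get((field, term), set())
        return [self.documents[i] for i in sorted(hits)]

if __name__ == "__main__":
    index = ToyIndex()
    index.add_document({"title": "India", "path": "/docs/india.txt"})
    index.add_document({"title": "Nepal", "path": "/docs/nepal.txt"})
    print(index.search("title", "india"))  # [{'title': 'India', 'path': '/docs/india.txt'}]
```

The OR semantics mirror QueryParser's default operator; a real analyzer would also drop stop words and normalize tokens.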

Let’s understand the above program:

  1. We set the index location: INDEX_DIR = "/home/kanaujia/lucene_index"
  2. Start and initialize the Java VM
  3. Get Lucene’s StandardAnalyzer, which tokenizes the text of each field
  4. This example keeps the index on disk, so the SimpleFSDirectory class is used to get a handle to this index.
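  5. IndexWriter creates and maintains an index. The program calls this constructor:

IndexWriter(Directory d, Analyzer a, boolean create, IndexWriter.MaxFieldLength mfl)

(An overload, IndexWriter(Directory d, Analyzer a, boolean create, IndexDeletionPolicy deletionPolicy, IndexWriter.MaxFieldLength mfl), additionally takes a deletion policy.)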

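  • d is the handle to the index location
  • a is the analyzer used to tokenize field text
  • create = True creates a new index, overwriting any existing index at that location; False opens the existing index and adds to it
  • mfl limits how many terms are indexed per field; MaxFieldLength.LIMITED allows 10,000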
    # Get index writer
    writer = lucene.IndexWriter(store, analyzer, True, lucene.IndexWriter.MaxFieldLength.LIMITED)
  • Create a document that will become part of the index
  • Create a field and add it to the document.
  • Add the document to the index.
  • Run the program
kanaujia@ubuntu:~/work/Py$ python example1.py
lucene version is: 3.6.1
kanaujia@ubuntu:~/work/Py$ ls /home/kanaujia/lucene_index/
_0.fdt  _0.fdx  write.lock

Python goof ups of a C/C++ programmer

Python is a modern language compared to good old C, and writing Python needs a subtle shift from the C mindset. Python offers so much ready-made that you often feel you have written very little code. Still, I goofed up while using a very common C/C++ idiom: passing by reference.

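Python passes references to objects (sometimes called “call by sharing”), with a caveat: you must know which objects are mutable and which are immutable. If you pass a mutable object such as a list or dictionary, the function can modify it in place and the caller sees the change. If you pass an immutable object such as a string or tuple, it cannot be modified, so the caller’s value stays unchanged. In both cases, assigning a new object to the parameter only rebinds the local name. Mutable arguments therefore behave roughly like C++ references, and immutable ones like pointers to const.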
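A minimal Python 3 sketch of those three cases; the function names are hypothetical.

```python
def add_item(items):
    items.append("new")      # mutates the caller's list in place

def rebind(items):
    items = ["replaced"]     # only rebinds the local name; the caller is unaffected

def shout(text):
    text = text.upper()      # str is immutable: upper() returns a new object
    return text

values = ["old"]
add_item(values)
rebind(values)
print(values)                # ['old', 'new']

greeting = "hi"
loud = shout(greeting)
print(greeting, loud)        # hi HI
```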

class Node:
    def __init__(self, value, data=[]):
        self.char = value
        # We intend to keep a list of values for a key
        self.data = data
        # XXX: List of Node
        self.children = []


n = Node('a')
m = Node('b')

(n.data).append("north")
(m.data).append("south")

print n.data
print m.data

The output of the above code is:

ubuntu@ubuntu:~/Desktop/ToKeep/cprogs/pycon$ python pybug.py
['north', 'south']
['north', 'south']

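I didn’t expect Python to keep a single instance of the default argument (the list), but that is the rule: default values are evaluated once, when the def statement runs, so every Node created without an explicit data argument shares the same list object. Interesting, and easy to trip over.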

This thread on Stack Overflow is very informative.
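A common fix is to use None as the default and create the list inside the method. A Python 3 sketch of the Node class above rewritten that way:

```python
class Node:
    def __init__(self, value, data=None):
        self.char = value
        # None is a sentinel: build a fresh list on every call instead of
        # sharing the single list object created when `def` ran
        self.data = [] if data is None else data
        self.children = []

n = Node('a')
m = Node('b')
n.data.append("north")
m.data.append("south")
print(n.data)   # ['north']
print(m.data)   # ['south']
```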

  • Omitting “this” pointer

In C++, “this”, the pointer to the current object, is passed implicitly. Python’s counterpart is “self”, but it is not a keyword: it is simply the conventional name of a method’s first parameter, which Python fills with the instance. It must be declared explicitly in every method; otherwise referring to it inside the method gives an error:

NameError: global name 'self' is not defined

class pydbg:
    def __init__(self, modname=None):
        self.level = 0
        self.modname = modname

    def DLOG(self, lvl=None, fmt=None, *args):
        self.level = lvl
        print '%s' %self.modname
        print fmt
        for arg in args:
            print arg

So always declare “self” as the first parameter when you add a method to a class.
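A minimal Python 3 sketch reproducing the error; Broken is a hypothetical cut-down pydbg with self missing from DLOG. Python 3 words the message slightly differently from Python 2 (no “global”).

```python
class Broken:
    def __init__(self, modname=None):
        self.modname = modname

    # Missing `self`: Python still passes the instance, which lands in `lvl`
    def DLOG(lvl=None, fmt=None):
        return self.modname   # `self` is neither a local nor a global name here

try:
    Broken("core").DLOG()
except NameError as exc:
    message = str(exc)
print(message)                # name 'self' is not defined
```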