This is the second part of my notes for Python interview prep. It was written against Python 2.7; capturing it here for future reference.
Table of Contents
- What is bytecode?
- How to check memory address of a variable in python?
- How to iterate over indices and list?
- What is Classmethod?
- What is Staticmethod?
- What is docstring?
- What is map?
- Difference between shallow copy and deep copy?
- Insertion Sort in python?
- Heap sort and priority queues?
- Binary Tree?
- Huffman Encoding?
- Graph in python?
- Generators in python?
- Decorators in python?
- Function Overloading in python?
- Notes on process vs threads
- Notes on the global interpreter lock
- What is coupling?
- What is cohesion?
What is bytecode?
Bytecode is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) that encode the result of the compiler parsing and performing semantic analysis of things like type, scope, and nesting depth of program objects.
How to check memory address of a variable in python?
Using the id function. The id function returns the identity of the object in the Python code.
my_list = [1, 2]
id(my_list)
2113236931200
# Make use of hex to get the hexadecimal address
hex(id(my_list))
'0x1ec06bc9280'
How to iterate over indices and list?
enumerate is a way to iterate over indices and items of a list.
For example:
alist = ['a1', 'a2', 'a3']
for i, a in enumerate(alist):
    print(i, a)
Results:
0 a1
1 a2
2 a3
zip is a way to iterate over two lists in parallel in python.
alist = ['a1', 'a2', 'a3']
blist = ['b1', 'b2', 'b3']
for a, b in zip(alist, blist):
    print(a, b)
Results:
a1 b1
a2 b2
a3 b3
zip() in conjunction with the * operator can be used to unzip a list.
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = list(zip(x, y))  # in Python 3, zip returns an iterator, so wrap it in list()
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> x2, y2 = zip(*zipped)
What is Classmethod?
A classmethod differs from a staticmethod in that the first argument to a classmethod is a reference to the class object. The following example helps in understanding the difference between classmethod and staticmethod.
class Date(object):
    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

    @classmethod
    def from_string(cls, date_as_string):
        day, month, year = map(int, date_as_string.split('-'))
        date1 = cls(day, month, year)
        return date1

    @staticmethod
    def is_date_valid(date_as_string):
        day, month, year = map(int, date_as_string.split('-'))
        return day <= 31 and month <= 12 and year <= 3999

date2 = Date.from_string('11-09-2012')
is_date = Date.is_date_valid('11-09-2012')
Note the cls reference passed to the classmethod; it is missing in the staticmethod.
The following excerpt explains how this can be used. Since function overloading is not available in Python, a classmethod can be used to provide alternate ways of instantiating class objects.
Let's assume that we want to create a lot of Date class instances having date information coming from an outer source encoded as a string with format 'dd-mm-yyyy'. Suppose we have to do this in different places in the source code of our project.
So what we must do here is:
Parse the string to get day, month and year as three integer variables, or a 3-item tuple consisting of those variables.
Instantiate Date by passing those values to the initialization call.
This will look like:
day, month, year = map(int, string_date.split('-'))
date1 = Date(day, month, year)
For this purpose, C++ implements such a feature with constructor overloading, but Python lacks overloading. Instead, we can use a classmethod. Let's create another constructor.
@classmethod
def from_string(cls, date_as_string):
    day, month, year = map(int, date_as_string.split('-'))
    date1 = cls(day, month, year)
    return date1

date2 = Date.from_string('11-09-2012')
So what has happened here?
- We've implemented date string parsing in one place and it's reusable now.
- Encapsulation works fine here (if you think that you could implement string parsing as a single function elsewhere, this solution fits the OOP paradigm far better).
- cls is the class itself, not an instance of the class. It's pretty cool because if we inherit our Date class, all children will have from_string defined also.
What is Staticmethod?
It is pretty similar to a classmethod but doesn't take any compulsory parameters (the way a classmethod or instance method does). Let's look at the next use case.
Following is an excerpt taken from here.
We have a date string that we want to validate somehow. This task is also logically bound to the Date class we've used so far, but doesn't require instantiation of it.
Here is where staticmethod can be useful. Let's look at the next piece of code:
@staticmethod
def is_date_valid(date_as_string):
    day, month, year = map(int, date_as_string.split('-'))
    return day <= 31 and month <= 12 and year <= 3999
# usage:
is_date = Date.is_date_valid('11-09-2012')
What is docstring?
A docstring is the documentation string for a function. It can be accessed by
function_name.__doc__
It is declared as:
>>> def function_name():
...     """your docstring"""
Writing documentation for your programs is a good habit and makes the code more understandable and reusable.
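A minimal runnable sketch (the greet function here is my own example):

```python
def greet(name):
    """Return a friendly greeting for the given name."""
    return "Hello, " + name

# The docstring is available at runtime via the __doc__ attribute
print(greet.__doc__)  # Return a friendly greeting for the given name.
print(greet("Ada"))   # Hello, Ada
```

The same string is what help(greet) displays in the interactive interpreter.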
What is map?
map executes the function given as its first argument on every element of the iterable given as its second argument. If the function takes more than one argument, then several iterables are given. For example:
>>> a = 'ayush'
>>> list(map(ord, a))  # list() needed in Python 3, where map returns an iterator
[97, 121, 117, 115, 104]
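When the function takes two arguments, map walks two iterables in parallel; a small illustration of my own using the built-in pow:

```python
# pow is applied pairwise: pow(2, 3), pow(3, 2), pow(4, 1)
result = list(map(pow, [2, 3, 4], [3, 2, 1]))
print(result)  # [8, 9, 4]
```

map stops at the end of the shortest iterable when the lengths differ.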
Difference between shallow copy and deep copy?
The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original. A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
>>> colours1 = ["red", "green"]
>>> colours2 = colours1
>>> colours2[1] = "blue"
>>> colours1
['red', 'blue']
It's possible to completely copy shallow list structures with the slice operator without having any of the side effects, which we have described above:
>>> list1 = ['a','b','c','d']
>>> list2 = list1[:]
>>> list2[1] = 'x'
>>> print(list2)
['a', 'x', 'c', 'd']
>>> print(list1)
['a', 'b', 'c', 'd']
If you assign a new value to the 0th element of one of the two lists, there is no side effect. Problems arise if you change one of the elements of a sublist.
>>> lst1 = ['a','b',['ab','ba']]
>>> lst2 = lst1[:]
>>> lst2[0] = 'c'
>>> lst2[2][1] = 'd'
>>> print(lst1)
['a', 'b', ['ab', 'd']]
- Assignment points both names at the same object (same id); nothing is copied.
- copy.copy() - creates a shallow copy: the outer object gets a new id, but nested mutable objects are still shared.
- copy.deepcopy() - recursively copies nested objects as well, so nothing is shared with the original.
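A small sketch tying these three cases together, using the standard-library copy module:

```python
import copy

lst1 = ['a', 'b', ['ab', 'ba']]

alias = lst1                # assignment: both names, one object
shallow = copy.copy(lst1)   # new outer list, nested list still shared
deep = copy.deepcopy(lst1)  # new outer list AND a new nested list

print(alias is lst1)          # True  -- same object
print(shallow is lst1)        # False -- outer list was copied
print(shallow[2] is lst1[2])  # True  -- nested list is shared
print(deep[2] is lst1[2])     # False -- nested list was copied too
```

This is why mutating shallow[2] would show up in lst1, while mutating deep[2] would not.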
Insertion Sort in python?
The insertion sort pseudocode looks like below:
Insertion-Sort(A)
    for j <- 2 to length[A]
        do key <- A[j]
           i <- j - 1
           while i > 0 and A[i] > key
               do A[i+1] <- A[i]
                  i <- i - 1
           A[i+1] <- key
Python code
def InsertionSort(A):
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while (i >= 0) and (A[i] > key):
            A[i+1] = A[i]
            i = i - 1
        A[i+1] = key
>>> x = [2,7,3,8,1] # create test case
>>> InsertionSort(x) # call routine
>>> x # look at result
[1, 2, 3, 7, 8]
Heap sort and priority queues?
I have always struggled to understand heap sort, but here is the best paragraph I have found. Okay, it could be simplified, but still.
A heap is a data structure that represents a nearly balanced binary tree using an array A[1..n], where the left and right children of an element A[i] are located at A[2i], A[2i+1], respectively, and A[i] >= A[2i], A[2i+1]. HeapSort builds the sorted subarray from the back of the array towards the front one element at a time by extracting the largest element from the front of the heap. Initially the sorted portion is empty, and a call to BuildHeap turns A[1..n] into a heap. Since the heap part puts the largest element at A[1], in the first iteration we extract it and put it in A[n], which is its correct sorted position. The next iteration extracts the second largest element (from A[1] again) and puts it in A[n-1], etc, and it continues until all of A is sorted. Note that Heapify is called as part of each extraction step. This is because if we swap A[1] and A[h], then A[1..h-1] no longer satisfies the heap property, but since it is still "almost" a heap -- that is, all except the root position are still subheaps -- it can be fixed efficiently in O(lg h) time by calling Heapify without having to rebuild the heap in O(h) time.
def Parent(i): return i // 2
def Left(i): return 2 * i
def Right(i): return 2 * i + 1

def Heapify(A, i, n):  # A is "almost a heap" (except at root i); fix it so all of A is a heap
    l = Left(i)
    r = Right(i)
    if l <= n and A[l] > A[i]:
        largest = l
    else:
        largest = i
    if r <= n and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        Heapify(A, largest, n)

def HeapLength(A): return len(A) - 1  # A[0] is unused; the heap lives in A[1..n]

def BuildHeap(A):  # build a heap A from an unsorted array
    n = HeapLength(A)
    for i in range(n // 2, 0, -1):
        Heapify(A, i, n)

def HeapSort(A):  # use a heap to build the sorted array from the end
    BuildHeap(A)
    HeapSize = HeapLength(A)
    for i in range(HeapSize, 1, -1):
        A[1], A[i] = A[i], A[1]  # largest element is the root of the heap; put it at the end
        HeapSize = HeapSize - 1  # shrink heap size by 1 to get the next largest element
        Heapify(A, 1, HeapSize)
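In practice, Python's standard-library heapq module provides a heap-based priority queue over a plain list (a min-heap, unlike the max-heap above); a small sketch:

```python
import heapq

data = [2, 7, 3, 8, 1]
heapq.heapify(data)      # rearrange in place into a min-heap, O(n)
heapq.heappush(data, 0)  # insert a new item, O(log n)

# popping always returns the smallest remaining element, O(log n) each
sorted_out = [heapq.heappop(data) for _ in range(len(data))]
print(sorted_out)  # [0, 1, 2, 3, 7, 8]
```

Repeatedly popping the heap yields the elements in sorted order, which is exactly the heap sort idea.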
Binary Tree?
Huffman Encoding?
Graph in python?
Generators in python?
In Python, a generator is a special type of function that acts as an iterator, allowing you to produce a sequence of values on the fly, one at a time. The key characteristic of a generator is that it doesn't store the entire sequence in memory at once. This makes them incredibly memory-efficient, especially when dealing with very large datasets or infinite sequences.
The yield Keyword
- The "magic" behind generators is the yield keyword, which is used instead of return.
- A normal function, when it encounters a return statement, exits and returns a single value. Its local state is then discarded.
- A generator function, when it encounters a yield statement, produces a value and pauses its execution. The function's entire local state (all local variables and the point of execution) is saved.
- When you ask for the next value from the generator (e.g., in a for loop or by calling next()), the function resumes from where it left off, continues its execution, and runs until it hits the next yield statement.
- This process continues until the generator function finishes, at which point it automatically raises a StopIteration exception, signaling that there are no more values.
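A minimal sketch of yield in action (the countdown generator is my own example):

```python
def countdown(n):
    # execution pauses at each yield; local state (n) is preserved between calls
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2 -- resumed right after the yield
print(list(gen))  # [1] -- consumes the remaining values
# one more next(gen) would now raise StopIteration
```

Only one value exists in memory at a time, which is why generators scale to very large or infinite sequences.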
Decorators in python?
Decorators are a way to modify or extend the behavior of a function or class without permanently altering its source code.
The fundamental concept behind decorators relies on two key features of Python:
- Functions are First-Class Objects: This means you can treat functions just like any other object. You can pass them as arguments to other functions, return them from functions, and assign them to variables.
- Nested Functions (Closures): You can define a function inside another function. The inner function can access variables from the outer function's scope, even after the outer function has finished executing.
A decorator is a function that takes another function as input, defines an inner "wrapper" function, adds some new functionality within the wrapper, and then returns the wrapper function.
# A simple decorator function
def my_decorator(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()  # Call the original function
        print("Something is happening after the function is called.")
    return wrapper

# Using the decorator with the @ syntax
@my_decorator
def say_hello():
    print("Hello!")

# When you call say_hello(), you are actually calling the wrapper function
say_hello()
# Output:
# Something is happening before the function is called.
# Hello!
# Something is happening after the function is called.
This is what is actually happening:
def say_hello():
    print("Hello!")

# This is what the decorator syntax does under the hood
say_hello = my_decorator(say_hello)
say_hello()
Decorators are used to implement "cross-cutting concerns"—functionality that affects many parts of a program but is not part of its core logic. Some practical examples include:
- Logging: Automatically log function calls, their arguments, and return values for debugging or monitoring.
- Timing: Measure the execution time of a function to identify performance bottlenecks.
- Authentication and Authorization: Restrict access to a function based on user permissions or whether a user is logged in (common in web frameworks like Django and Flask).
- Caching/Memoization: Store the result of an expensive function call so that subsequent calls with the same arguments can return the cached value instantly.
- Retries: Automatically retry a function call a number of times if it fails.
- Error Handling: Wrap a function in a try...except block to gracefully handle exceptions.
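As a sketch of the timing use case from the list above (the timed decorator name is my own; functools.wraps keeps the wrapped function's __name__ and docstring intact):

```python
import functools
import time

def timed(func):
    @functools.wraps(func)  # preserve func's __name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f}s")
        return result
    return wrapper

@timed
def add(a, b):
    return a + b

print(add(2, 3))  # prints the timing line, then 5
```

Without functools.wraps, add.__name__ would report "wrapper", which confuses debugging and documentation tools.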
Function Overloading in python?
There is no function overloading in Python, meaning that one can't have multiple functions with the same name but different arguments; defining a second function with the same name simply rebinds the name to the new function. Function overloading is basically the same function name with arguments differing in their data type or their number.
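Common workarounds are default arguments, *args, or functools.singledispatch from the standard library, which dispatches on the type of the first argument; a small sketch of my own:

```python
from functools import singledispatch

@singledispatch
def describe(value):
    # fallback for types without a registered handler
    return "something else"

@describe.register(int)
def _(value):
    return "an int"

@describe.register(str)
def _(value):
    return "a string"

print(describe(42))    # an int
print(describe("hi"))  # a string
print(describe(3.5))   # something else -- falls through to the default
```

This gives overloading-by-type for one argument; overloading by argument count is usually handled with defaults or *args instead.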
Notes on Process vs Threads in Python?
The threading module uses threads, the multiprocessing uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.
Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference.
However, Python (CPython, specifically) has an added issue: there's a Global Interpreter Lock that prevents two threads in the same process from running Python code at the same time. This means that if you have 8 cores, and change your code to use 8 threads, it won't be able to use 800% CPU and run 8x faster; it'll use the same 100% CPU and run at the same speed. (In reality, it'll run a little slower, because there's extra overhead from threading, even if you don't have any shared data, but ignore that for now.)
There are exceptions to this. If your code's heavy computation doesn't actually happen in Python, but in some library with custom C code that does proper GIL handling, like a numpy app, you will get the expected performance benefit from threading. The same is true if the heavy computation is done by some subprocess that you run and wait on.
More importantly, there are cases where this doesn't matter. For example, a network server spends most of its time reading packets off the network, and a GUI app spends most of its time waiting for user events. One reason to use threads in a network server or GUI app is to allow you to do long-running "background tasks" without stopping the main thread from continuing to service network packets or GUI events. And that works just fine with Python threads. (In technical terms, this means Python threads give you concurrency, even though they don't give you core-parallelism.)
But if you're writing a CPU-bound program in pure Python, using more threads is generally not helpful.
Using separate processes has no such problems with the GIL, because each process has its own separate GIL. Of course you still have all the same tradeoffs between threads and processes as in any other language—it's more difficult and more expensive to share data between processes than between threads, it can be costly to run a huge number of processes or to create and destroy them frequently, etc. But the GIL weighs heavily on the balance toward processes, in a way that isn't true for, say, C or Java. So, you will find yourself using multiprocessing a lot more often in Python than you would in C or Java.
Meanwhile, Python's "batteries included" philosophy brings some good news: It's very easy to write code that can be switched back and forth between threads and processes with a one-liner change.
If you design your code in terms of self-contained "jobs" that don't share anything with other jobs (or the main program) except input and output, you can use the concurrent.futures library to write your code around a thread pool.
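A minimal sketch of that one-liner switch (the job function is my own):

```python
from concurrent.futures import ThreadPoolExecutor
# Switching to processes is a one-line change:
# from concurrent.futures import ProcessPoolExecutor

def job(n):
    # self-contained "job": shares nothing with other jobs except input and output
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(job, range(5)))

print(results)  # [0, 1, 4, 9, 16]
```

With ProcessPoolExecutor, pool creation should sit under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork processes.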
Notes on Global Interpreter Lock
A global interpreter lock (GIL) is a mutual-exclusion lock held by a programming language interpreter to synchronize the execution of threads so that only one native thread can execute at a time, even on a multi-core processor, thereby avoiding the sharing of code that is not thread-safe. In implementations with a GIL, there is always one GIL per interpreter process. CPython and Ruby MRI are popular interpreters that use GILs.
Applications running on implementations with a GIL can be designed to use separate processes to achieve full parallelism, as each process has its own interpreter and in turn has its own GIL. Otherwise, the GIL can be a significant barrier to parallelism.
Use of a global interpreter lock in a language effectively limits the amount of parallelism reachable through concurrency of a single interpreter process with multiple threads. If the process is almost purely made up of interpreted code and does not make calls outside of the interpreter for long periods of time (which can release the GIL on that thread while it processes), there is likely to be very little increase in speed when running the process on a multiprocessor machine. Due to signaling overhead with a CPU-bound thread, it can even cause a significant slowdown on single processors.
Reasons for employing such a lock include:
- increased speed of single-threaded programs (no necessity to acquire or release locks on all data structures separately),
- easy integration of C libraries that usually are not thread-safe,
- ease of implementation (having a single GIL is much simpler to implement than a lock-free interpreter or one using fine-grained locks).
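A hedged illustration of the limitation: two threads running a pure-Python CPU-bound loop produce correct results, but under CPython's GIL they typically show no wall-clock speedup over running the loops sequentially (timings vary by machine, so none are asserted here):

```python
import threading

def sum_to(n, results, idx):
    # pure-Python CPU-bound loop; only one thread executes bytecode at a time
    total = 0
    while n > 0:
        total += n
        n -= 1
    results[idx] = total

results = [0, 0]
threads = [threading.Thread(target=sum_to, args=(100000, results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # both threads computed the same sum of 1..100000
```

Running the same two jobs via multiprocessing would let them execute on two cores, since each process carries its own GIL.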
What is coupling?
Coupling - A measure of how much a module (package, class, method) relies on other modules. It is desirable to reduce coupling, or reduce the amount that a given module relies on the other modules of a system.
What is cohesion?
Cohesion - A measure of how closely related the members (classes, methods, functionality within a method) of a module are to the other members of the same module. It is desirable to increase cohesion as that indicates that a module has a very specific task and does only that task.