On being pythonic
I recently was given a Python coding challenge during a job interview. I was criticized for not writing "pythonic" enough code. The good news is that I did great with the other interviewers in the panel, so I still got the job \o/. Afterwards, though, I decided to use the criticism as a learning opportunity, revisit the challenge and write this blog post with the resulting code, while making my best effort to be "pythonic". To that end, here's a basic definition: pythonic code uses the powerful idioms of Python that distinguishes it from other languages to produce elegant and concise programs. To guide me on the road to being "pythonic", I am using the following references:
- Transforming Code into Beautiful, Idiomatic Python (slides) by Raymond Hettinger, who is a Python core developer.
- Since in this exercise we are going to use generators and coroutines, my other reference is Making sense of generators, coroutines, and “yield from” in Python by Reuven Lerner, who teaches Python all over the world.
In the code that follows I will cover the following topics
- One of the aspects that strongly distinguishes Python from other programming languages is its approach to iterations. As Raymond Hettinger argues in his presentation, Python's
for
statement should have been calledfor each
to better reflect its semantics. I will talk about iterables, iterators, the iterator protocol, generators and co-routines. - Python's regular expresions module.
- Improving clarity with named tuples.
- Factoring out administrative logic with decorators.
The coding problem was stated as follows:
# Model a computing device that can load third party programs.
# PROGRAMS consist of ACTIONS that share an execution ENVIRONMENT,
# particularly computer STATE. ACTIONS belong to a limited set supported
# by this device. ACTIONS can communicate with each other through
# REGISTERS.
# PROGRAM execution results in computer STATE change. The STATE change
# can then be inspected.
# Device should allow to LOAD multiple PROGRAMS and execute them
# SEQUENTIALLY.
# Typical API of the computing device would include:
# - load(_program)
# - execute
# - (get_)state
# Implement the device, define two programs (see below), load them
# and execute, then show results.
# program1: receive int through INPUT; add +10 to the INPUT; store result.
# program2: load result of previous program execution; depending on if
# it's > 50, set a register in machine state to 100; otherwise to 0.
# usage example
# c = Computer(...)
# p1 = Program(...)
# p2 = Program(...)
# c.state() # -> {...}
# c.load(p1, input=...)
# c.load(p2, input=...)
# c.execute()
# c.state() # -> {...0 or 100...}
Recognizing instructions with Python regular expressions¶
To make parsing programs easy, the computing device (henceforth the computer) will have one operand instructions and an accumulator register. The add
instruction, for example, adds its operand to the accumulator register. This means that we are also going to need lda
and sto
instructions, that load a value to the accumulator and store the value of the accumulator in another register, respectively. Those registers are part of the computer's state and are described further below. To be able to write program1
and program2
in the coding challenge, the following is the minimum action set necessary:
ACTION_SET = set(['lda', 'sto', 'bgt', 'cmp', 'add'])
ACTIONS_NO_LITERAL = set(['sto', 'bgt'])
BRANCHES = set(['bgt'])
Some instructions, like lda
or add
can take two types of operands, registers and literals:
-
lda r0
loads the content of registerr0
to the acumulator. -
lda #10
loads the value10
to the acumulator.
In the case of the sto
instruction, though, it doesn't make sense to have literal operands. This instruction always stores the value in the acumulator in one of the registers. The set ACTIONS_NO_LITERAL
will help in processing this type of instructions.
Similarly, branches only accept destination operands, which is an integer i
from 0
to n - 1
, where n
is the total number of instructions in the program. Branches, like bgt
, direct the computer to continue fetching instructions from the location specified by the destination operand. In other words, the computer is expected to fetch next the ith
instruction from the beginning of the program. The set BRANCHES
will help in processing this type of instructions.
To parse a programs, we are going to use Python's re
(regular expressions) module:
import re
RE_LINE = re.compile(
r'^(?P<action>[a-z]{3})\s+(?P<arg>r[io0-9]$|\#-?[0-9]+$|[0-9]+$)')
This regular expression parses one instruction at a time. It can be divided in three parts:
- The named capture group
^(?P<action>[a-z]{3})
recognizes sequences of 3 alphabetic characters that correspond to members ofACTION_SET
. Note that it is anchored to the beginning of the string with^
. - One or more whitespaces
\s+
that separate theaction
from thearg
. - The named capture group
(?P<arg>r[io0-9]$|\#-?[0-9]+$|[0-9]+$)
. This group can be one of three alternatives, separated by|
in the regular expression (note that each alternative is anchored to the end of the string with$
):
-
r[io0-9]$
, one of the computer's registersri
,ro
,r0
,...r9
or... -
#-?[0-9]+$
, a literal beginning with#
as explained above or... -
[0-9]+$
, a branch destination as explained above.
These are a few examples of the instructions that can be parsed:
# Instruction with register operand
match = RE_LINE.search('lda r0')
print(match.group('action'))
print(match.group('arg'), '\n')
# Instruction with literal operand
match = RE_LINE.search('add #16')
print(match.group('action'))
print(match.group('arg'), '\n')
# Instruction with branch destination operand
match = RE_LINE.search('bgt 23')
print(match.group('action'))
print(match.group('arg'))
If the provided string doesn't conform with the regular expression, no match object will be returned by the search
method:
# Invalid action
match = RE_LINE.search('ldab r0')
print(match, '\n')
# Invalid operand
match = RE_LINE.search('lda ##10')
print(match, '\n')
# No operand
match = RE_LINE.search('lda')
print(match)
Using coroutines to implement programs¶
Python lists have two characteristics that make them very suitable to represent the programs as defined in this coding challenge:
- They are an ordered collection of objects. A program is an ordered collection of instructions, with each instruction being a string that can be recognized using the regular expresion
RE_LINE
defined above. - Their individual elements can be directly accesed using and integer index. This index will serve as the program counter, i.e. the pointer to the next instruction within a program to be executed.
We are going to tweak the behavior of lists a little bit, though, to control better the execution of programs. So instead of using lists directly, we are going to use collections.UserList, a useful base class for user defined list-like classes which can inherit from them and override existing methods or add new ones. To be more precise, there will be a Program
class that will inherit from collections.UserList
and override the __iter__()
method. As can be seen below, the original __iter__()
method initializes an index i
with 0
and proceeds to iterate sequentially over the list, returning the next element in each iteration:
import collections
import inspect
print(inspect.getsource(collections.UserList.__iter__))
The __iter__()
method in our Program
class will be a coroutine that will enable the implementation of branch instructions by changing the value of the index variable to any value between 0
and the length - 1
of the program. To see how, let's take a little detour to review in more detail some Python concepts and constructs.
The __iter__()
method defined by collections.UserList
is a generator function. As explained in Making sense of generators, coroutines, and “yield from” in Python:
- A generator function, when executed, returns a generator object.
- A generator object implements the iterator protocol, meaning that it knows what to do in a
for
loop. - Each time a generator object reaches a
yield
statement, it returns the yielded value to thefor
loop that is invoking it, and goes to sleep. - With each successive iteration, the generator object starts running from where it paused (i.e., just after the most recent
yield
statement) - When the generator object reaches the end of the function, or encounters a
return
statement, it raises aStopIteration
exception, which is how Python iterators indicate tofor
loops that they’ve reached the end of the line.
Let's define a collections.UserList
and iterate over it with a for
loop:
a_list = collections.UserList([0, 1, 2])
for item in a_list:
print(item)
We can also interact with the generator function directly, similar to what the for
statement does behind the scenes. An iterable object, such as a collection.UserList
, produces a fresh new iterator or generator object each time you pass it to the iter() built-in function
a_list = collections.UserList([0, 1, 2])
g = iter(a_list)
g
Notice that when iter()
was called, it returned a generator object. This of course happened because iter()
in turn called the __iter__()
in collections.UserList
. We now execute the generator object with the next()
built-in function:
print(next(g))
print(next(g))
print(next(g))
try:
next(g)
except StopIteration:
print('End of a_list')
Generator functions use the statement yield
to produce a stream of objects so their callers can iterate over them (in a for
loop for example). yield
, though, can also be an expression and be placed on the right hand side of an assigment. This allows functions to consume values sent to them with their send()
method by their callers. Such functions are called coroutines and open the possibility of concurrent programming in a single thread. Let's look at a simple example, borrowed from A Curious Course on Coroutines and Concurrency:
def grep(pattern):
print ("Looking for %s" % pattern)
while True:
line = (yield)
if pattern in line:
print(line)
Execution of a coroutine is similar to a generator. When a coroutine is called, it returns a generator:
c = grep("python")
c
Coroutines only run in response to the next()
statement (like generators) and their send()
method. All coroutines must be "primed" first by calling next()
on them. This advances execution to the first yield
expression. At this, point a coroutine is ready to receive a value:
# Prime the coroutine so execution advances to the yield expression
next(c)
# Send lines to the coroutine, so it can search for the pattern "python"
c.send("Yeah, but no, but yeah, but no")
c.send("A series of tubes")
c.send("python generators and coroutines rock!")
As indicated above, our Program
class inherits from collections.UserList
and refines it as follows:
- It overrides the
__iter__()
method with a coroutine. This enables theComputer
class that we will define below to iterate sequentially over the instructions in aProgram
, while making it possible to change the next instruction to be returned as a result of a branch instruction execution. Anint
value produced by the coroutine'syield
expression is used as a pointer to the next instruction to be retrieved, thus altering the flow of execution. - It adds a
_parse()
method, which parses each instruction using the regular expression presented above, performs additional checks and returns aParsedInstruction
named tuple with two attributes,action
andargument
, for ease of execution by theComputer
class.
s = collections.namedtuple("ParsedInstruction",
"action, argument")
class Program(collections.UserList):
def __iter__(self):
next_instruction = 0
try:
while True:
instruction = self[next_instruction]
next_instruction += 1
branch_dest = yield self._parse(instruction)
# Flow of execution is altered when the preceding yield
# expression produces a value of type int
if isinstance(branch_dest, int):
next_instruction = branch_dest
except IndexError:
return
def _parse(self, instruction):
match = RE_LINE.search(instruction.lower())
if not match:
raise ValueError('Invalid instruction "%s"' % instruction)
action = match.group('action')
arg = match.group('arg')
is_arg_literal = arg[0] == '#'
is_branch_destination = arg.isnumeric()
if action not in ACTION_SET:
raise ValueError('Action "%s" not in actions set' %
match.group('action'))
if is_arg_literal:
if action in ACTIONS_NO_LITERAL:
raise ValueError('Action "%s" cannot have a literal argument' %
match.group('action'))
arg = int(arg[1:])
if action in BRANCHES:
if not is_branch_destination:
raise ValueError(
'Branch "%s" at %s has non branch destination "%s"' %
(action, self._next, arg))
arg = int(arg)
if arg >= len(self) or arg < 0:
raise ValueError(
'Branch "%s" to destination "%s" jumps outside program '
'boundaries' % (action, arg))
return ParsedInstruction(action, arg)
Let's create a Program
and its generator (coroutine) and "prime" the latter to start fetching instructions:
# Define a program with 3 instructions
p = Program(['add #10', # add 10 to the accumulator
'cmp #50', # if the accumulator is greater than 50...
'bgt 0']) # ... loop back to the first instruction
# Get the Program's iterator (coroutine)
coroutine = iter(p)
# "Prime" the coroutine and fetch the first instruction
parsed_instruction = next(coroutine)
parsed_instruction
We now fetch the other two instructions in the program and, for the bgt
instruction, alter the flow of execution by sending a 0
(the first instruction's location) back to the coroutine:
# Fetch the cmp instruction
parsed_instruction = coroutine.send(None)
print(parsed_instruction)
# Fetch the bgt instruction
parsed_instruction = coroutine.send(None)
print(parsed_instruction)
# Branch back to the add instruction at location 0
parsed_instruction = coroutine.send(0)
print(parsed_instruction)
Using decorators to factor-out logic¶
By definition, a decorator is a function that takes another function as argument and extends the behavior of the latter function without explicitly modifying it. This makes decorators a great mechanism to factor-out logic that is repeated across several functions or methods. Frequently, this is "administrative" logic that is not part of the main functionality that we are implementing. In our case, we are implementing two decorators that are used by the Computer
class defined below:
- As mentioned above, many
Computer
instructions can take register or literal operands. In theComputer
class we have, for each instruction, a method that implements it. The methods for instructions that can take both types of operands are decorated withget_argument
. This decorator retrieves the actual value to be operated on from the register specified in the instruction, when that is the case. This way, the methods can assume they only receive literal operands. - For debugging purposes, we may choose to log to the console the name of methods that implement instructions as they are executed, along with the arguments received. To accomplish this, we use the
log
decorator together with aLOG
flag that is configured withTrue
orFalse
in theComputer
class.
import functools
def get_argument(func):
functools.wraps(func)
def wrapper(self, argument):
to_pass = argument
if not isinstance(argument, int):
to_pass = self._state[argument]
return func(self, to_pass)
return wrapper
def log(func):
functools.wraps(func)
def wrapper(self, argument):
if LOG:
print(func.__name__, argument)
return func(self, argument)
return wrapper
Implementing the computer¶
We can now implement the Computer
class. Its core is the execute()
method, which iterates over the instructions of each program by interacting with the Program
's __iter__()
coroutine in the same manner we interacted with it manually above. For each parsed_instruction
yielded by the coroutine, execute()
passes parsed_instruction.action
as input to Python's getattr()
to select and execute the specific method that implements the instruction, which receives parsed_instruction.argument
as input. Most of the methods that implement instructions return None
to execute()
, which in turn sends it back to the coroutine. The exceptions are the branch instructions, of which we only have bgt
. These branch methods return the integer representing the location of the next instruction to be executed.
Following the iterator protocol, execute()
catches the StopIteration
exception, which is the mechanism that the coroutine uses to signal the end of a program.
LOG = True
class Computer(object):
def __init__(self):
self._state = {
"ri": None,
"ro": None,
"r0": None,
"r1": None,
"r2": None,
"r3": None,
"r4": None,
"r5": None,
"r6": None,
"r7": None,
"r8": None,
"r9": None,
"zero": None,
"negative": None,
"accumulator": 0
}
self._programs = []
self._inputs = []
def load(self, program, input=0):
self._programs.append(iter(program))
self._inputs.append(input)
def execute(self):
for i, program in enumerate(self._programs):
self._state['ri'] = self._inputs[i]
parsed_instruction = next(program)
while True:
branch_dest = getattr(self, parsed_instruction.action)(
parsed_instruction.argument)
try:
parsed_instruction = program.send(branch_dest)
except StopIteration:
# End of program
break
@get_argument
@log
def lda(self, argument):
self._state['accumulator'] = argument
@log
def sto(self, argument):
self._state[argument] = self._state['accumulator']
@log
def bgt(self, argument):
if not (self._state['zero'] or self._state['negative']):
return argument
@get_argument
@log
def cmp(self, argument):
v = self._state['accumulator'] - argument
# Set flags in the computer state that are used by the logic
# of branch methods
self._state['zero'] = not v
self._state['negative'] = v < 0
@get_argument
@log
def add(self, argument):
self._state['accumulator'] += argument
def state(self):
return self._state
We can now define a function that creates a computer and the two programs defined by the coding challenge above:
def create_computer_and_programs():
c = Computer()
# p1: receive int through INPUT; add +10 to the INPUT; store result.
p1 = Program(['lda ri', # load input
'add #10', # add 10 to it
'sto r0']) # store result for next program
# p2: load result of previous program execution; depending on if
# it's > 50, set a register (r1) in machine state to 100; otherwise to 0.
p2 = Program(['lda #100', # assume p1 output > 50, so...
'sto r1', # ... set r1 to 100
'lda r0', # load p1 output
'cmp #50', # if p1 output...
'bgt 7', # ... is greater than 50, r1 already set up
'lda #0', # p1 output is less than 50 so ...
'sto r1', # ... set r1 to 0
'cmp #0']) # noop to have destination for bgt above
return c, p1, p2
If we load p1
with an input value of 40
, its output (stored in r0
) will be 50
. As a consequence, p2
will set r1
to 0
:
c, p1, p2 = create_computer_and_programs()
c.load(p1, input=40)
c.load(p2)
c.execute()
print("p2's output = %s" % c.state()['r1'])
Whereas if p1
is loaded with an input value of 41
, its output will be 51
and p2
will set r1
to 100
:
c, p1, p2 = create_computer_and_programs()
c.load(p1, input=41)
c.load(p2)
c.execute()
print("p2's output = %s" % c.state()['r1'])
With this we conclude the implementation of this coding challenge.
Comments
Comments powered by Disqus