On being pythonic

I recently was given a Python coding challenge during a job interview. I was criticized for not writing "pythonic" enough code. The good news is that I did great with the other interviewers in the panel, so I still got the job \o/. Afterwards, though, I decided to use the criticism as a learning opportunity, revisit the challenge and write this blog post with the resulting code, while making my best effort to be "pythonic". To that end, here's a basic definition: pythonic code uses the powerful idioms of Python that distinguishes it from other languages to produce elegant and concise programs. To guide me on the road to being "pythonic", I am using the following references:

In the code that follows I will cover the following topics

  • One of the aspects that strongly distinguishes Python from other programming languages is its approach to iterations. As Raymond Hettinger argues in his presentation, Python's for statement should have been called for each to better reflect its semantics. I will talk about iterables, iterators, the iterator protocol, generators and co-routines.
  • Python's regular expresions module.
  • Improving clarity with named tuples.
  • Factoring out administrative logic with decorators.

The coding problem was stated as follows:

In [46]:
# Model a computing device that can load third party programs.

# PROGRAMS consist of ACTIONS that share an execution ENVIRONMENT,
# particularly computer STATE. ACTIONS belong to a limited set supported
# by this device. ACTIONS can communicate with each other through
# REGISTERS.

# PROGRAM execution results in computer STATE change. The STATE change
# can then be inspected.

# Device should allow to LOAD multiple PROGRAMS and execute them
# SEQUENTIALLY.

# Typical API of the computing device would include:
# - load(_program)
# - execute
# - (get_)state

# Implement the device, define two programs (see below), load them
# and execute, then show results.

# program1: receive int through INPUT; add +10 to the INPUT; store result.
# program2: load result of previous program execution; depending on if
#           it's > 50, set a register in machine state to 100; otherwise to 0.

# usage example
# c = Computer(...)
# p1 = Program(...)
# p2 = Program(...)
# c.state()  # -> {...}
# c.load(p1, input=...)
# c.load(p2, input=...)
# c.execute()
# c.state()  # -> {...0 or 100...}

Recognizing instructions with Python regular expressions

To make parsing programs easy, the computing device (henceforth the computer) will have one operand instructions and an accumulator register. The add instruction, for example, adds its operand to the accumulator register. This means that we are also going to need lda and sto instructions, that load a value to the accumulator and store the value of the accumulator in another register, respectively. Those registers are part of the computer's state and are described further below. To be able to write program1 and program2 in the coding challenge, the following is the minimum action set necessary:

In [47]:
ACTION_SET = set(['lda', 'sto', 'bgt', 'cmp', 'add'])
ACTIONS_NO_LITERAL = set(['sto', 'bgt'])
BRANCHES = set(['bgt'])

Some instructions, like lda or add can take two types of operands, registers and literals:

  • lda r0 loads the content of register r0 to the acumulator.
  • lda #10 loads the value 10 to the acumulator.

In the case of the sto instruction, though, it doesn't make sense to have literal operands. This instruction always stores the value in the acumulator in one of the registers. The set ACTIONS_NO_LITERAL will help in processing this type of instructions.

Similarly, branches only accept destination operands, which is an integer i from 0 to n - 1, where n is the total number of instructions in the program. Branches, like bgt, direct the computer to continue fetching instructions from the location specified by the destination operand. In other words, the computer is expected to fetch next the ith instruction from the beginning of the program. The set BRANCHES will help in processing this type of instructions.

To parse a programs, we are going to use Python's re (regular expressions) module:

In [48]:
import re

RE_LINE = re.compile(
    r'^(?P<action>[a-z]{3})\s+(?P<arg>r[io0-9]$|\#-?[0-9]+$|[0-9]+$)')

This regular expression parses one instruction at a time. It can be divided in three parts:

  1. The named capture group ^(?P<action>[a-z]{3}) recognizes sequences of 3 alphabetic characters that correspond to members of ACTION_SET. Note that it is anchored to the beginning of the string with ^.
  2. One or more whitespaces \s+ that separate the action from the arg.
  3. The named capture group (?P<arg>r[io0-9]$|\#-?[0-9]+$|[0-9]+$). This group can be one of three alternatives, separated by | in the regular expression (note that each alternative is anchored to the end of the string with $):
  • r[io0-9]$, one of the computer's registers ri, ro, r0,... r9 or...
  • #-?[0-9]+$, a literal beginning with # as explained above or...
  • [0-9]+$, a branch destination as explained above.

These are a few examples of the instructions that can be parsed:

In [49]:
# Instruction with register operand
match = RE_LINE.search('lda r0')
print(match.group('action'))
print(match.group('arg'), '\n')

# Instruction with literal operand
match = RE_LINE.search('add      #16')
print(match.group('action'))
print(match.group('arg'), '\n')

# Instruction with branch destination operand
match = RE_LINE.search('bgt 23')
print(match.group('action'))
print(match.group('arg'))
lda
r0 

add
#16 

bgt
23

If the provided string doesn't conform with the regular expression, no match object will be returned by the search method:

In [50]:
# Invalid action
match = RE_LINE.search('ldab r0')
print(match, '\n')

# Invalid operand
match = RE_LINE.search('lda ##10')
print(match, '\n')

# No operand
match = RE_LINE.search('lda')
print(match)
None 

None 

None

For a deeper look into Python's regular expressions, here's part 1 and part 2 of an excellent tutorial.

Using coroutines to implement programs

Python lists have two characteristics that make them very suitable to represent the programs as defined in this coding challenge:

  • They are an ordered collection of objects. A program is an ordered collection of instructions, with each instruction being a string that can be recognized using the regular expresion RE_LINE defined above.
  • Their individual elements can be directly accesed using and integer index. This index will serve as the program counter, i.e. the pointer to the next instruction within a program to be executed.

We are going to tweak the behavior of lists a little bit, though, to control better the execution of programs. So instead of using lists directly, we are going to use collections.UserList, a useful base class for user defined list-like classes which can inherit from them and override existing methods or add new ones. To be more precise, there will be a Program class that will inherit from collections.UserList and override the __iter__() method. As can be seen below, the original __iter__() method initializes an index i with 0 and proceeds to iterate sequentially over the list, returning the next element in each iteration:

In [51]:
import collections
import inspect

print(inspect.getsource(collections.UserList.__iter__))
    def __iter__(self):
        i = 0
        try:
            while True:
                v = self[i]
                yield v
                i += 1
        except IndexError:
            return

The __iter__() method in our Program class will be a coroutine that will enable the implementation of branch instructions by changing the value of the index variable to any value between 0 and the length - 1 of the program. To see how, let's take a little detour to review in more detail some Python concepts and constructs.

The __iter__() method defined by collections.UserList is a generator function. As explained in Making sense of generators, coroutines, and “yield from” in Python:

  • A generator function, when executed, returns a generator object.
  • A generator object implements the iterator protocol, meaning that it knows what to do in a for loop.
  • Each time a generator object reaches a yield statement, it returns the yielded value to the for loop that is invoking it, and goes to sleep.
  • With each successive iteration, the generator object starts running from where it paused (i.e., just after the most recent yield statement)
  • When the generator object reaches the end of the function, or encounters a return statement, it raises a StopIteration exception, which is how Python iterators indicate to for loops that they’ve reached the end of the line.

Let's define a collections.UserList and iterate over it with a for loop:

In [52]:
a_list = collections.UserList([0, 1, 2])

for item in a_list:
    print(item)
0
1
2

We can also interact with the generator function directly, similar to what the for statement does behind the scenes. An iterable object, such as a collection.UserList, produces a fresh new iterator or generator object each time you pass it to the iter() built-in function

In [53]:
a_list = collections.UserList([0, 1, 2])

g = iter(a_list)
g
Out[53]:
<generator object Sequence.__iter__ at 0x7f121e2693c0>

Notice that when iter() was called, it returned a generator object. This of course happened because iter() in turn called the __iter__() in collections.UserList. We now execute the generator object with the next() built-in function:

In [54]:
print(next(g))
print(next(g))
print(next(g))

try:
    next(g)
except StopIteration:
    print('End of a_list')
0
1
2
End of a_list

Generator functions use the statement yield to produce a stream of objects so their callers can iterate over them (in a for loop for example). yield, though, can also be an expression and be placed on the right hand side of an assigment. This allows functions to consume values sent to them with their send() method by their callers. Such functions are called coroutines and open the possibility of concurrent programming in a single thread. Let's look at a simple example, borrowed from A Curious Course on Coroutines and Concurrency:

In [55]:
def grep(pattern):
    print ("Looking for %s" % pattern)
    while True:
        line = (yield)
        if pattern in line:
            print(line)

Execution of a coroutine is similar to a generator. When a coroutine is called, it returns a generator:

In [56]:
c = grep("python")
c
Out[56]:
<generator object grep at 0x7f121e269580>

Coroutines only run in response to the next() statement (like generators) and their send() method. All coroutines must be "primed" first by calling next() on them. This advances execution to the first yield expression. At this, point a coroutine is ready to receive a value:

In [57]:
# Prime the coroutine so execution advances to the yield expression
next(c)
Looking for python
In [58]:
# Send lines to the coroutine, so it can search for the pattern "python"
c.send("Yeah, but no, but yeah, but no")
c.send("A series of tubes")
c.send("python generators and coroutines rock!")
python generators and coroutines rock!

As indicated above, our Program class inherits from collections.UserList and refines it as follows:

  • It overrides the __iter__() method with a coroutine. This enables the Computer class that we will define below to iterate sequentially over the instructions in a Program, while making it possible to change the next instruction to be returned as a result of a branch instruction execution. An int value produced by the coroutine's yield expression is used as a pointer to the next instruction to be retrieved, thus altering the flow of execution.
  • It adds a _parse() method, which parses each instruction using the regular expression presented above, performs additional checks and returns a ParsedInstruction named tuple with two attributes, action and argument, for ease of execution by the Computer class.
In [59]:
s = collections.namedtuple("ParsedInstruction",
                           "action, argument")


class Program(collections.UserList):

    def __iter__(self):
        next_instruction = 0
        try:
            while True:
                instruction = self[next_instruction]
                next_instruction += 1
                branch_dest = yield self._parse(instruction)
                
                # Flow of execution is altered when the preceding yield
                # expression produces a value of type int
                if isinstance(branch_dest, int):
                    next_instruction = branch_dest
        except IndexError:
            return

    def _parse(self, instruction):
        match = RE_LINE.search(instruction.lower())
        if not match:
            raise ValueError('Invalid instruction "%s"' % instruction)
        action = match.group('action')
        arg = match.group('arg')
        is_arg_literal = arg[0] == '#'
        is_branch_destination = arg.isnumeric()
        if action not in ACTION_SET:
            raise ValueError('Action "%s" not in actions set' %
                             match.group('action'))
        if is_arg_literal:
            if action in ACTIONS_NO_LITERAL:
                raise ValueError('Action "%s" cannot have a literal argument' %
                                 match.group('action'))
            arg = int(arg[1:])
        if action in BRANCHES:
            if not is_branch_destination:
                raise ValueError(
                    'Branch "%s" at %s has non branch destination "%s"' %
                    (action, self._next, arg))
            arg = int(arg)
            if arg >= len(self) or arg < 0:
                raise ValueError(
                    'Branch "%s" to destination "%s" jumps outside program '
                    'boundaries' % (action, arg))
        return ParsedInstruction(action, arg)

Let's create a Program and its generator (coroutine) and "prime" the latter to start fetching instructions:

In [60]:
# Define a program with 3 instructions
p = Program(['add #10',     # add 10 to the accumulator
             'cmp #50',     # if the accumulator is greater than 50...
             'bgt 0'])      # ... loop back to the first instruction

# Get the Program's iterator (coroutine)
coroutine = iter(p)

# "Prime" the coroutine and fetch the first instruction
parsed_instruction = next(coroutine)
parsed_instruction
Out[60]:
ParsedInstruction(action='add', argument=10)

We now fetch the other two instructions in the program and, for the bgt instruction, alter the flow of execution by sending a 0 (the first instruction's location) back to the coroutine:

In [61]:
# Fetch the cmp instruction
parsed_instruction = coroutine.send(None)
print(parsed_instruction)

# Fetch the bgt instruction
parsed_instruction = coroutine.send(None)
print(parsed_instruction)

# Branch back to the add instruction at location 0
parsed_instruction = coroutine.send(0)
print(parsed_instruction)
ParsedInstruction(action='cmp', argument=50)
ParsedInstruction(action='bgt', argument=0)
ParsedInstruction(action='add', argument=10)

Using decorators to factor-out logic

By definition, a decorator is a function that takes another function as argument and extends the behavior of the latter function without explicitly modifying it. This makes decorators a great mechanism to factor-out logic that is repeated across several functions or methods. Frequently, this is "administrative" logic that is not part of the main functionality that we are implementing. In our case, we are implementing two decorators that are used by the Computer class defined below:

  • As mentioned above, many Computer instructions can take register or literal operands. In the Computer class we have, for each instruction, a method that implements it. The methods for instructions that can take both types of operands are decorated with get_argument. This decorator retrieves the actual value to be operated on from the register specified in the instruction, when that is the case. This way, the methods can assume they only receive literal operands.
  • For debugging purposes, we may choose to log to the console the name of methods that implement instructions as they are executed, along with the arguments received. To accomplish this, we use the log decorator together with a LOG flag that is configured with True or False in the Computer class.
In [62]:
import functools


def get_argument(func):
    functools.wraps(func)

    def wrapper(self, argument):
        to_pass = argument
        if not isinstance(argument, int):
            to_pass = self._state[argument]
        return func(self, to_pass)

    return wrapper


def log(func):
    functools.wraps(func)

    def wrapper(self, argument):
        if LOG:
            print(func.__name__, argument)
        return func(self, argument)

    return wrapper

Implementing the computer

We can now implement the Computer class. Its core is the execute() method, which iterates over the instructions of each program by interacting with the Program's __iter__() coroutine in the same manner we interacted with it manually above. For each parsed_instruction yielded by the coroutine, execute() passes parsed_instruction.action as input to Python's getattr() to select and execute the specific method that implements the instruction, which receives parsed_instruction.argument as input. Most of the methods that implement instructions return None to execute(), which in turn sends it back to the coroutine. The exceptions are the branch instructions, of which we only have bgt. These branch methods return the integer representing the location of the next instruction to be executed.

Following the iterator protocol, execute() catches the StopIteration exception, which is the mechanism that the coroutine uses to signal the end of a program.

In [63]:
LOG = True 


class Computer(object):
    def __init__(self):
        self._state = {
            "ri": None,
            "ro": None,
            "r0": None,
            "r1": None,
            "r2": None,
            "r3": None,
            "r4": None,
            "r5": None,
            "r6": None,
            "r7": None,
            "r8": None,
            "r9": None,
            "zero": None,
            "negative": None,
            "accumulator": 0
        }

        self._programs = []
        self._inputs = []

    def load(self, program, input=0):
        self._programs.append(iter(program))
        self._inputs.append(input)

    def execute(self):
        for i, program in enumerate(self._programs):
            self._state['ri'] = self._inputs[i]
            parsed_instruction = next(program)
            while True:
                branch_dest = getattr(self, parsed_instruction.action)(
                    parsed_instruction.argument)
                try:
                    parsed_instruction = program.send(branch_dest)
                except StopIteration:
                    # End of program
                    break

    @get_argument
    @log
    def lda(self, argument):
        self._state['accumulator'] = argument

    @log
    def sto(self, argument):
        self._state[argument] = self._state['accumulator']

    @log
    def bgt(self, argument):
        if not (self._state['zero'] or self._state['negative']):
            return argument

    @get_argument
    @log
    def cmp(self, argument):
        v = self._state['accumulator'] - argument
        # Set flags in the computer state that are used by the logic
        # of branch methods
        self._state['zero'] = not v
        self._state['negative'] = v < 0

    @get_argument
    @log
    def add(self, argument):
        self._state['accumulator'] += argument

    def state(self):
        return self._state

We can now define a function that creates a computer and the two programs defined by the coding challenge above:

In [64]:
def create_computer_and_programs():
    c = Computer()

    # p1: receive int through INPUT; add +10 to the INPUT; store result.
    p1 = Program(['lda ri',     # load input
                  'add #10',    # add 10 to it
                  'sto r0'])    # store result for next program

    # p2: load result of previous program execution; depending on if
    #     it's > 50, set a register (r1) in machine state to 100; otherwise to 0.
    p2 = Program(['lda #100',   # assume p1 output > 50, so...
                  'sto r1',     # ... set r1 to 100
                  'lda r0',     # load p1 output
                  'cmp #50',    # if p1 output...
                  'bgt 7',      # ... is greater than 50, r1 already set up
                  'lda #0',     # p1 output is less than 50 so ...
                  'sto r1',     # ... set r1 to 0
                  'cmp #0'])    # noop to have destination for bgt above
    return c, p1, p2

If we load p1 with an input value of 40, its output (stored in r0) will be 50. As a consequence, p2 will set r1 to 0:

In [65]:
c, p1, p2 = create_computer_and_programs()
c.load(p1, input=40)
c.load(p2)
c.execute()
print("p2's output = %s" % c.state()['r1'])
lda 40
add 10
sto r0
lda 100
sto r1
lda 50
cmp 50
bgt 7
lda 0
sto r1
cmp 0
p2's output = 0

Whereas if p1 is loaded with an input value of 41, its output will be 51 and p2 will set r1 to 100:

In [66]:
c, p1, p2 = create_computer_and_programs()
c.load(p1, input=41)
c.load(p2)
c.execute()
print("p2's output = %s" % c.state()['r1'])
lda 41
add 10
sto r0
lda 100
sto r1
lda 51
cmp 50
bgt 7
cmp 0
p2's output = 100

With this we conclude the implementation of this coding challenge.

Comments

Comments powered by Disqus