
In this post, I'll deviate from my previous application-focused posts on data mining. I wanted to create a simple guide to one of the most useful features of the Python language: the list comprehension, which provides a concise way to create lists. Let's start with a simple example to illustrate how it works. A list comprehension can be broken down into three parts: iteration, conditional filtering, and processing. Take a look at the code below, which builds a list with an ordinary for loop.

my_list = []
for i in xrange(0, 100):   # iteration
  if i % 2 == 0:           # conditional filtering
    my_list.append(i)      # processing

Basically, we initialize a new empty list and use a for loop to iterate through the numbers 0 through 99 (xrange(0, 100) stops just before 100). Then, we use a conditional that checks whether the number is even (this is easily done with the % or modulo operator, which gives the remainder when dividing by 2; a remainder of 0 means the number is even). If this condition is satisfied, we append the number to the list. Now let's take a step-by-step look at how to collapse this into a one-line statement using a list comprehension:

my_list = [i for i in xrange(0, 100) if i % 2 == 0]
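As a quick sanity check, the sketch below confirms that the loop and the one-line version build the same list. It assumes Python 3, where range takes the place of xrange, and the names loop_result and comp_result are only illustrative.

```python
# Assumption: Python 3, so range() is used where the post uses xrange().

# Loop version: iterate, filter, then append.
loop_result = []
for i in range(0, 100):
    if i % 2 == 0:
        loop_result.append(i)

# List comprehension version: the same three parts on one line.
comp_result = [i for i in range(0, 100) if i % 2 == 0]

print(loop_result == comp_result)  # True
print(comp_result[:5])             # [0, 2, 4, 6, 8]
print(len(comp_result))            # 50
```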

The first thing to note is that a list comprehension is always enclosed within square brackets [] to indicate that you want to generate a list. If you're familiar with generator expressions in Python, they work in much the same way, except that a generator expression is written in parentheses and returns a generator object that produces its items lazily, one at a time, whereas a list comprehension builds the full list in memory right away. List comprehensions can also run faster than the equivalent for loop (up to about 35% in some cases, though the exact gain depends on the workload and the Python version), and they let us make our code more concise and readable so that we aren't sticking verbose for loops all over the place.
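To make both points concrete, here is a minimal sketch, assuming Python 3 (range instead of xrange). It first contrasts a list comprehension with a generator expression, then times the loop against the comprehension using the standard timeit module. The helper names build_with_loop and build_with_comprehension are made up for illustration, and the timings will differ from machine to machine, so treat them as a rough indication rather than a benchmark.

```python
# Assumption: Python 3 (range instead of xrange); names below are illustrative.
import timeit

# Square brackets build the whole list immediately.
evens_list = [i for i in range(0, 100) if i % 2 == 0]

# Parentheses create a generator object; nothing is computed yet.
evens_gen = (i for i in range(0, 100) if i % 2 == 0)

print(type(evens_list).__name__)  # list
print(type(evens_gen).__name__)   # generator

# Items come out of the generator one at a time, on demand.
first = next(evens_gen)
second = next(evens_gen)
print(first, second)              # 0 2

# A generator can only be consumed once; list() drains what is left.
rest = list(evens_gen)
print(len(rest))                  # 48


def build_with_loop():
    result = []
    for i in range(0, 100):
        if i % 2 == 0:
            result.append(i)
    return result


def build_with_comprehension():
    return [i for i in range(0, 100) if i % 2 == 0]


# Rough timing; the numbers vary by machine and Python version.
loop_time = timeit.timeit(build_with_loop, number=10000)
comp_time = timeit.timeit(build_with_comprehension, number=10000)
print("loop: %.4fs, comprehension: %.4fs" % (loop_time, comp_time))
```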

Let's break down the code within the brackets above to get a better idea of how to construct a list comprehension. The first part is the evaluating expression, which is computed for each iteration and becomes an item of the new list. In this case, our evaluating expression is nothing more than the variable i, which takes each value in the sequence in turn. The next part, for i in xrange(0, 100), is the loop expression, which indicates the type of iteration we are doing to generate the list. Thus, you can see how the list comprehension condenses list creation by allowing you to place the for loop statement on the same line as your evaluating expression. Finally, we have if i % 2 == 0, which is our conditional filter. Note that the conditional filter is optional, in the same way that the if statement is optional in the for loop version of the code above. It all depends on the list you are trying to generate: if items should be included only when they meet a certain condition or set of conditions, then you need the conditional filter.
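The three parts can be varied independently. The short sketch below, again assuming Python 3 and using illustrative variable names, drops the filter, then changes the evaluating expression, then uses both together.

```python
# Assumption: Python 3 (range instead of xrange); names below are illustrative.

# No filter: every item from the iteration is kept.
all_numbers = [i for i in range(0, 5)]
print(all_numbers)    # [0, 1, 2, 3, 4]

# The evaluating expression can do real work, here squaring each item.
squares = [i * i for i in range(0, 5)]
print(squares)        # [0, 1, 4, 9, 16]

# Expression and filter together: squares of the even numbers only.
even_squares = [i * i for i in range(0, 10) if i % 2 == 0]
print(even_squares)   # [0, 4, 16, 36, 64]
```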

You can easily extend the above example to many other situations, as long as you understand the basic breakdown of a list comprehension and how it translates to a normal iterative loop procedure. Now let's take a look at a slightly more complicated example - one where we generate a 2-dimensional list or matrix.

my_list = []
for i in xrange(0, 10):
  temp_list = []
  for j in xrange(0, 10):
    temp_list.append(j)
  my_list.append(temp_list)

It's not quite as intuitive to write a list comprehension for this example. Here, we're creating a 2-dimensional list, or matrix, which is 10x10 based on the for loops shown above: ten rows, each holding the numbers 0 through 9. Let's see how we would translate this to a more concise list-comprehension syntax.

my_list = [[j for j in xrange(0, 10)] for i in xrange(0, 10)]
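A quick check of this one-liner against the nested loops, sketched under the assumption of Python 3 (range instead of xrange) and with illustrative names. The last few lines add a contrast that is not part of the example above: a single flat comprehension with two for clauses, which reads in the same order as the nested loops and produces a flat list rather than a matrix.

```python
# Assumption: Python 3 (range instead of xrange); names below are illustrative.

# Nested-loop version of the 10x10 matrix.
loop_matrix = []
for i in range(0, 10):
    row = []
    for j in range(0, 10):
        row.append(j)
    loop_matrix.append(row)

# Nested comprehension: inner loop on the left, outer loop on the right.
comp_matrix = [[j for j in range(0, 10)] for i in range(0, 10)]

print(loop_matrix == comp_matrix)             # True
print(len(comp_matrix), len(comp_matrix[0]))  # 10 10
print(comp_matrix[3])                         # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Each row is a separate list, so changing one row leaves the others alone.
comp_matrix[0][0] = 99
print(comp_matrix[1][0])                      # 0

# Contrast: a single flat comprehension with two for clauses reads in the
# same order as the nested loops (outer first) and yields one flat list.
pairs = [(i, j) for i in range(0, 2) for j in range(0, 3)]
print(pairs)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```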

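From the list comprehension we can see that with nested loops, the inner loop becomes the inner list comprehension, which is written first (on the left), and the outer loop follows it on the right. In other words, we construct the nested list comprehension starting with the innermost loop and work outward, going from left to right. So there you have it - a brief introduction to list comprehensions in Python!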