Python and Immutability

December 21, 2012 • Programming • Technology

When I first started learning Python, one of the first non-trivial ideas that I came across was the concept of immutability. I quickly became confused about the following issue, and I thought the resolution had to do with immutability. Here is a somewhat counter-intuitive snippet that every Python programmer should be aware of:

def add_one(num):
    num = num + 1

a = 5
add_one(a)
print(a)
# Output: 5


The question is: why didn't the function add 1 to the number a? Naively, we would have expected the print to result in 6, but it seems that a has not been changed. A person who programs in a language like C would simply say, "Well, clearly a has been passed by value. Therefore, a local copy of a was made within the function, and THAT is what was incremented, not the original a itself." In C and C++, one has the option to pass objects to functions either by value or by reference (I'm being general and sweeping the whole pointer issue under the rug).

But that's not really what's happening here. a isn't being passed by value. We can clearly see this by modifying the function a bit:

def add_one(num):
    print("In the function:", id(num))
    num = num + 1

a = 5
print("Before the function:", id(a))
add_one(a)
print("After the function:", id(a))
print(a)

# Output (the exact id values will differ from run to run):
# Before the function: 4298188648
# In the function: 4298188648
# After the function: 4298188648
# 5


As one can see, the id of the object that a points to is the same both outside and within the function. So, Python isn't copying a when it is passed to the function. This is good, because unnecessarily copying large objects should certainly be avoided. But our first example indicates that it isn't (exactly) passing it by reference, either. So, what is happening?

I have seen this issue resolved by invoking the concepts of mutability and immutability. In Python, an immutable object is one that can't be changed once it has been created. All other objects are mutable. Some people confuse this issue with the one that we're facing with our add_one function. For example, see the explanation here:

http://forums.udacity.com/cs101/questions/1012/python-and-pass-by-reference

"You've got to watch it. It's not consistent. Literals are pass-by-value, as in Manmeet's example. I this case, the inc(n) does nothing to the value of n. However, other things are all pass by reference -- pretty much any class you write will have its objects passed around by reference. In Python, things which are immutable types are passed by value, and things which are mutable are passed by reference. So lists, while a built-in-type are pass-by-reference because they are a mutable type."

Or here:

http://bogdan.org.ua/2008/02/11/python-passing-by-value-vs-passing-by-reference.html

"It is important to understand mutable and immutable objects. Some objects, like strings, tuples, and numbers, are immutable. Altering them inside a function/method will create a new instance and the original instance outside the function/method is not changed. Other objects, like lists and dictionaries are mutable, which means you can change the object in-place. Therefore, altering an object inside a function/method will also change the original object outside. Immutable variables such as integers [strings, numerics and tuples are immutables] are passed by value. That is, if your function accepts some integer argument, you are safe assuming that your function wont be able to modify your integer. Mutable variables such as dictionaries and lists are passed by reference, and so if your function accepts mutable argument, it may modify the contents of that mutable variable outside the scope of the function."

These explanations are confusing, confused, and simply wrong. But they were what I encountered when first trying to understand Python. I was thrown off by the concept of mutability vs. immutability, and I thought that I had to be aware of it whenever passing things to functions, thinking that it would affect how functions worked. But it's easy to show that this is wrong. We can do the same experiment with a type that we know is mutable, namely a list:

def add_one(container):
    container = container + [1]

a = [5]
add_one(a)
print(a)
# Output: [5]


We see identical behavior for both mutable and immutable objects. The behavior we're seeing can't be explained by immutability vs. mutability alone (some details can, as we'll talk about later, but they will turn out to be syntactic differences and not at the heart of the issue).

So, what does Python do when passing an object to a function? It's not pass-by-reference, it's not pass-by-value, and it's not a hybrid depending on mutability. Really, Python does something pretty simple and easy to understand once one has the right picture in one's head. I'm not going to try to force a label on the system; that seems to make it more confusing. Rather, I'll simply draw what happens.

Some people say "everything in Python is an object." That's not quite the whole story. The names we write in our code are references to objects, or pointers to objects, or labels for objects, or variables representing objects, or however you'd like to describe it. But, whatever you call it, the picture that one should have in one's head is the following:

Label       Object    Value
A     -->   Int       5
B     -->   List      [3, 4, 5]
C     -->   (the same List object as B)

This is the result of the following snippet:

A = 5
B = [3, 4, 5]
C = B


When one assigns to a variable, two things are happening. Let's look at the statement B = [3, 4, 5]. This says two things: 1) Go into Python's memory and create a list that contains 3, 4, and 5. 2) Create a variable (or label) called B and have it point to that object. The statement C = B doesn't actually create anything in memory; it says, "Create a variable called C and have it point to whatever B points to." Notice that C doesn't point to B; it points to the object that B is pointing to. Variables can never point to one another; they only point to objects in memory.
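This picture can be checked directly with the `is` operator, which tests whether two labels point to the same object. A minimal sketch, reusing the names from the snippet above (the extra names `equal_copy`, `same_object`, and `is_same_as_copy` are only there for illustration):

```python
A = 5
B = [3, 4, 5]
C = B  # no new list is created; C points to the object B points to

same_object = C is B               # two labels, one list
equal_copy = [3, 4, 5]             # a separate list with equal contents
is_same_as_copy = B is equal_copy  # equal values, but different objects

print(same_object)      # True
print(is_same_as_copy)  # False
print(B == equal_copy)  # True
```

The `==` operator compares values, while `is` compares identity, which is the thing the table is drawing.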

Now, imagine that I do the following:

B = {"fish": 7, "dog": 8}

What does my picture look like now?

Label       Object    Value
A     -->   Int       5
C     -->   List      [3, 4, 5]
B     -->   Dict      {"fish": 7, "dog": 8}

In our magical Python table, we told the variable (or label) B to point to a new dictionary that we created. It's clear from this picture that C is still pointing to the list that B was previously pointing to. This all makes sense if one reinterprets the = operator. In Python, = is the assignment operator. It takes a label on the left and makes that label point to an object on the right. One can't assign an object to another object; objects live on the right side of our table. One can only assign labels to objects. That's why you can't do:

[3, 4, 5] = [3, 4, 5, 6]


But you can do:

a = [3, 4, 5]
a = [3, 4, 5, 6]


In Python, calculations take place on the right side of an assignment, and then the result is assigned to the variable on the left side. If you think about this, the whole lvalue and rvalue issue becomes much simpler (that's a whole other can of worms that is also simpler than it is commonly made out to be).

Okay, with all this in mind, let's go back to our original issue:
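A short sketch of that order of evaluation: the right side is computed first and produces a new object, and only then is the label on the left moved to it. The second label `c` is added here purely so that the original list stays observable.

```python
a = [3, 4, 5]
c = a            # c and a label the same list

# The right side builds a brand-new list; then a is re-pointed to it.
a = a + [6]

print(a)         # [3, 4, 5, 6]
print(c)         # [3, 4, 5]  (the original list is untouched)
print(a is c)    # False
```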

def add_one(num):
    num = num + 1

a = 5
add_one(a)
print(a)
# Output: 5


What happens in the function? When we do add_one(a), Python creates a new variable and has it point to the object that a is pointing to. Let's call that variable num, since that's the variable that appears in the signature of the add_one function. So, at the beginning of the function, our table looks like this:

Label       Object    Value
a     -->   Int       5
num   -->   (the same Int object as a)

Simple. It isn't really pass-by-reference or pass-by-value. It's really create-new-local-variable-that-points-to-the-same-object-in-memory. I guess that's not quite as catchy. This is almost exactly the same as pass-by-reference, but with the caveat that the = operator can't change objects; instead, it reassigns variables. So, when we do num = num + 1, we are really creating a new integer object whose value is equal to the value of the object that num points to plus one, and then we are assigning num to that new object. Therefore, our table after this step looks like this:

Label       Object    Value
a     -->   Int       5
num   -->   Int       6
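The two tables can be observed from inside the function by comparing id(num) before and after the reassignment. This is only a sketch; the helper name `add_one_traced` is made up for the illustration, and it returns the ids instead of printing them because the actual numbers differ from run to run.

```python
def add_one_traced(num):
    """Return (id before, id after) for the local label num."""
    before = id(num)   # the same object the caller's variable points to
    num = num + 1      # builds a new object and re-points num to it
    after = id(num)
    return before, after

a = 5
before, after = add_one_traced(a)

print(before == id(a))  # True: num started out on a's object
print(after == id(a))   # False: num was moved to a different object
print(a)                # 5
```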

This is just a direct result of the fact that one can't change the contents of an object using the assignment operator =. That only changes what variables point to. The object remains constant. One can only change the contents of an object using certain methods of that object. This is where the difference between immutability and mutability becomes clear: mutable objects have methods that can modify the object (meaning they can change the underlying contents of the object in memory) and immutable objects don't. It has nothing to do with how objects are passed to functions; rather, it involves the limitations of that object's ability to change itself.

In other words, an integer in Python is immutable because it doesn't have a set_val method, or an increment method, or something like that. One could imagine a version of Python where the following was possible:
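As a rough illustration of that distinction, using only built-in types: a list offers a method that changes the list itself, while a string's methods hand back new objects and a tuple has no mutating methods at all. The variable names here are arbitrary.

```python
numbers = [5]
numbers_id = id(numbers)
numbers.append(1)                      # mutates the list in place
same_list = id(numbers) == numbers_id  # still the very same object

word = "abc"
shouted = word.upper()                 # returns a new string; word is unchanged

print(numbers, same_list)              # [5, 1] True
print(word, shouted)                   # abc ABC
print(hasattr(numbers, "append"))      # True
print(hasattr((5,), "append"))         # False: tuples have no append method
```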

a = 5
# NOT REAL !!!
a.set_val(6)
print(a)
# Output: 6
# NOT REAL !!!
a.increment()
print(a)
# Output: 7


If this were the case, then one could do the following:

def add_one(num):
    # NOT REAL !!!
    num.increment()

a = 5
add_one(a)
print(a)
# Output: 6


The difference is that the function is (hypothetically) using a method that changes the underlying object. This would make our table look like this:

Before the increment:

Label       Object    Value
a     -->   Int       5
num   -->   (the same Int object as a)

After the increment:

Label       Object    Value
a     -->   Int       6
num   -->   (the same Int object as a)

See the difference? We didn't use the assignment operator (or the re-assignment operator), so num remains assigned to the same object that a is assigned to. However, we went to the object itself, using a method, and asked it to change itself.

Of course, integers can't really do this. But mutable objects, such as lists, can. So, we CAN actually do the following:
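The hypothetical can be made concrete with a small user-defined class. This is only a sketch: `MutableInt` and its `increment` method are invented for this example and are not part of Python.

```python
class MutableInt:
    """Hypothetical mutable number; not part of the standard library."""

    def __init__(self, value):
        self.value = value

    def increment(self):
        # Changes the state stored inside this object;
        # the caller's label is never reassigned.
        self.value += 1


def add_one(num):
    num.increment()


a = MutableInt(5)
add_one(a)
print(a.value)  # 6
```

Because add_one never uses = on num, both labels keep pointing at one object, and the change made through num is visible through a.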

def add_one(container):
    container.append(1)

a = [5]
add_one(a)
print(a)
# Output: [5, 1]


Before appending:

Label             Object    Value
a           -->   List      [5]
container   -->   (the same List object as a)

After appending:

Label             Object    Value
a           -->   List      [5, 1]
container   -->   (the same List object as a)

So, using a list's append method can change the contents of the list. This has nothing to do with which variables are assigned to what objects, so both a and container stay assigned to the same list. That list simply gets modified.
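Putting the two list versions of the function side by side makes the contrast explicit. A minimal sketch; the function names `rebind` and `mutate` and the snapshot variable `after_rebind` are chosen here for illustration.

```python
def rebind(container):
    # Re-points the local label only; the caller's list is untouched.
    container = container + [1]

def mutate(container):
    # Asks the list object to change itself.
    container.append(1)

a = [5]
rebind(a)
after_rebind = list(a)   # snapshot copy of a's contents at this point
mutate(a)

print(after_rebind)  # [5]
print(a)             # [5, 1]
```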

Appendix A:

One question that one may ask is: why are integers immutable? Why don't they have the hypothetical set_val or increment methods that I described? Part of the answer is that all integers are objects, and that assigning the same integer value to two variables (usually) makes them point to the same object:

a = 5
b = 5
print(id(a), id(b))
# Output: 4298188616 4298188616


Here, a and b point to the same underlying object in memory (this didn't have to be the case, but it's an optimization that makes Python faster; CPython does it for small integers). Contrast this with how lists work:

a = [5]
b = [5]
print(id(a), id(b))
# Output: 4299804472 4299837520


The general Python rule of thumb says that declaring objects will create new objects in memory, and we can see that this happens with lists. We want this; otherwise we wouldn't be able to append to list a without affecting list b. We shouldn't have to think, when declaring a list, about whether that list happens to be identical to another list and whether that will change the behavior of our program. That would be a nightmare.

But because integers can't be changed, it makes no difference if all integers of the same value are represented by the same object. The same thing can happen with strings. Notice that strings, too, are immutable (one can of course do operations on strings, but these always return a new string, not a modified version of the original string). This sharing is for speed as well, as creating new objects can be slow.
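The comparison can also be written with `is` instead of printing ids. A hedged sketch: whether two equal immutable values share one object is an implementation detail (CPython happens to cache small integers), so the first result is not something the language guarantees. Two separately built lists, on the other hand, are always distinct objects. The names `x` and `y` are arbitrary.

```python
a = 5
b = 5
ints_shared = a is b    # True on CPython, but implementation-specific

x = [5]
y = [5]
lists_shared = x is y   # each [5] builds a new list, so these never share

x.append(1)             # safe: y is a different object
print(ints_shared, lists_shared)
print(x, y)             # [5, 1] [5]
```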

Appendix B:

There ARE a few issues with Python that make it somewhat complicated and that I haven't addressed above. Two examples are global variables and function closures. The issue with globals revolves around what happens when one assigns to a variable in a function that has the same name as one defined in the global scope. Is the global variable reassigned, or is a local variable created and assigned? This issue comes from the fact that there is no var keyword in Python that distinguishes declaring a new variable from reassigning an existing one. Such a keyword was left out for the sake of cleanliness, but I personally think the language would be clearer with one (the presence of a global keyword is pretty ugly in my opinion).

Function closures describe a similar issue, but they're a bit more complicated and can be more surprising. The issue arises when one uses a non-local variable within the definition of a function. The ambiguity is which scope that function uses when it is called. Perhaps I'll have more on this later...