Big picture overview and contextualization of Python:
Conceptually, 90% of Python can be explained in terms of these things.
We’ll fill in the final 90%, and start building our own data structures using Object Oriented Programming (OOP).
These concepts can be explained largely in terms of those from this session.
Please refer to the Codecademy Python course for more practice and fluency with these concepts, and how they relate to each other.
Why do we care about Python?
R’s data.frames are great.
However, R has some pain points…
Not so good for actual logic or anything too complex
Don’t limit yourself unnecessarily to one tool…
Living in Unix makes it easy to use both R and Python in the context of a single project.
There are other languages that fit some of these characteristics, but python is nice because
In which we investigate the building blocks of logic
REPL is short for Read Evaluate Print Loop.
python
> print "Hello world"
Now you can enter little bits of code as we go (type Ctrl-d or exit() to get out)
The elements of logic; our nouns
The Unix philosophy embraces plain text data.
Data types let us more easily manage information, and relate it to the things in the world it represents.
This comes at the expense of universal interoperability.
The most basic data types are scalars. We think of them as atomic values.
"this is a string"
42
3.14159
True
Collections are “containers” for other types of data (scalars, other collections, more complex data types…).
("Bob Jones", 42)
{"name": "Bob Jones", "age": 42}
["Bob Jones", "Jane Doe", "Ralph Nader"]
Tuples and dictionaries for organizing data about some single thing:
("Bob Jones", 42)
{"name": "Bob Jones", "age": 42}
Lists for arbitrary collections of individual data points:
["Bob Jones", "Jane Doe", "Ralph Nader"]
[("Bob Jones", 42), ("Jane Doe", 31), ("Ralph N.", 85)]
Dictionaries can also hold whole collections. Here we use one more or less like a list, but where we index entries by name instead of position:
{"Bob Jones": {"age": 42, "occupation": "haxxor"},
"Jane Doe": {"age": 31, "occupation": "mathematician"},
"Ralph Nader": {"age": 85, "occupation": "politician"}}
We won’t go into these as deeply today, but in short, we tend to use them like lists:
In which we manipulate and reason about objects; our verbs
3 + 4
3 - 1
4 * 5
6 / 5
10 ** 2
42 % 5
You can create more complex expressions using parentheses for grouping: (4 + 5) * 2
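One thing to watch for, sketched below on the assumption that you are following along in Python 2: dividing two ints with / drops the remainder there, so 6 / 5 gives 1. The forms in this sketch behave the same on Python 2 and Python 3; the variable names are only for illustration.

```python
# Floor division: always drops the remainder.
whole = 6 // 5

# Making one operand a float keeps the fractional part.
fraction = 6 / 5.0

# Remainder and power, as in the examples above.
remainder = 42 % 5
power = 10 ** 2

# Parentheses are evaluated first.
grouped = (4 + 5) * 2

print(whole)
print(fraction)
print(remainder)
print(power)
print(grouped)
```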
+ works for strings as well:
"this" + "that"
=> "thisthat"
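And: written as & or as the keyword and. Examples: True & False; True and True. The result is True only if both operands are True.
Or (not exclusive…): written as | or as the keyword or. Examples: False | True; False or False. The result is True if at least one of the operands is True.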
Predict what these evaluate to: True and False, True and True, False and False, True or False, True or True, False or False
Python treats True and False like ints 1 and 0, respectively, and vice versa (try True + True).
Anything other than 0, "", False, None, or an empty collection will be treated as True in a logical expression (“truthy”).
Like in the shell, we can give things names:
age = 42
name = "Bob Jones"
person = {"name": name, "age": age}
# We can use variables just as though they were their values
age / 4
Variable names can contain letters, digits, and underscores (_). Example: bobs_occupation = "haxxor"
We’ll mostly focus on lists and dictionaries
Writing xs = [1, 2, 3, 4, 5, 6, 7] defines a list xs.
+ concatenates lists: [1, 2] + [3, 4] => [1, 2, 3, 4]
Get an element by position: xs[4]
Set an element by position: xs[3] = 999
Note: Python has 0-based indexing.
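A minimal sketch of what 0-based indexing means in practice, reusing the xs list from above. The names first, fifth and combined are just illustrative, and the code runs on both Python 2 and 3.

```python
xs = [1, 2, 3, 4, 5, 6, 7]

# Indexing starts at 0, so xs[0] is the first element and xs[4] the fifth.
first = xs[0]
fifth = xs[4]

# Assignment replaces the element at that position (the fourth one here).
xs[3] = 999

# + builds a new list; the two operands are left unchanged.
combined = [1, 2] + [3, 4]

print(first)
print(fifth)
print(xs)
print(combined)
```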
Dictionaries work similarly to lists, but we have more flexibility over the indices:
person["name"]
person["occupation"] = "haxxor"
Here "occupation" is the key, and "haxxor" is the value. Together, they form a key-value pair.
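A short sketch pulling these pieces together. It rebuilds person so the block stands on its own; people is a hypothetical name for a small nested dictionary in the style of the earlier example. It should run on both Python 2 and 3.

```python
person = {"name": "Bob Jones", "age": 42}

# Look up a value by its key.
name = person["name"]

# Assigning to a key that isn't there yet adds a new key-value pair.
person["occupation"] = "haxxor"

# Assigning to an existing key replaces its value.
person["age"] = 43

# With nested dictionaries, chain the lookups: outer key first, then inner.
people = {"Jane Doe": {"age": 31, "occupation": "mathematician"}}
janes_age = people["Jane Doe"]["age"]

print(name)
print(person["occupation"])
print(person["age"])
print(janes_age)
```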
Keys are often strings, but don’t have to be:
crazy_dict = {4: "some string", (1,2,3): 999}
crazy_dict[4]
crazy_dict[(1,2,3)]
They can be numbers, tuples or anything hashable:
You can check whether x is hashable by running hash(x): it returns an integer for hashable values and raises a TypeError otherwise.
("this", 1234)
Is it hashable?
(4545, ["a", "b"])
Is it hashable?
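If you would rather get a True/False answer than an error message, something like the sketch below should work. is_hashable is a hypothetical helper, not a built-in, and it uses try/except, which we have not covered in this session. Try it on the two tuples above.

```python
def is_hashable(x):
    # Hypothetical helper, not part of Python itself.
    # hash() raises TypeError when x (or something inside x) is unhashable.
    try:
        hash(x)
    except TypeError:
        return False
    return True

print(is_hashable("a string"))
print(is_hashable(42))
print(is_hashable(["a", "b"]))
```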
A single DNA sequence?
String
seq = "AGCTAGCTACGT"
A DNA sequence, together with sequence name and other metadata?
Dictionary
seqrecord = {"seq": "AGCTAGCTACGT", "name": "MBG234"}
A collection of DNA sequences and names?
List (of dictionaries)
seqrecords = [{"seq": "AGCTAGCTACGT", "name": "MBG234"},
{"seq": "AGCTTCCCACGT", "name": "MBG235"},
{"seq": "AGATTCCTCCGT", "name": "MBG236"}]
A collection of DNA sequences and names, together with metadata about the collection (sampling date and such)?
Dictionary
{"seqrecords": [{"seq": "AGCTAGCTACGT", "name": "MBG234"},
{"seq": "AGCTTCCCACGT", "name": "MBG235"},
{"seq": "AGATTCCTCCGT", "name": "MBG236"}],
"sampling_location": "Bangladesh",
"technician": "Bob Jones"}
In which from our rules of logic we compose spells
def square(n):
    ans = n * n
    return ans
square(4)
When we call square with the argument 4, that value gets passed in for the variable n in the body of the function. return hands the result back to whoever called the function.
Functions are “first class”, in that they can be passed around as data just like numbers, strings, etc.
# Using our square function as a value
print square
[square, "data"]
We’ll see how this is useful later.
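As a small preview, here is a sketch of a function that takes another function as an argument. apply_twice and funcs are hypothetical names made up for this example; abs is a built-in. The code runs on both Python 2 and 3.

```python
def square(n):
    return n * n

def apply_twice(f, x):
    # Hypothetical helper: takes a function f as data
    # and calls it two times in a row.
    return f(f(x))

# Functions can be stored in collections like any other value.
funcs = [square, abs]

result = apply_twice(square, 3)   # same as square(square(3))
first_result = funcs[0](-4)       # funcs[0] is square, so this calls square(-4)

print(result)
print(first_result)
```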
Some types have functions associated with them, such as str.upper and list.append:
str.upper("this")
xs = [1, 2, 3]
list.append(xs, 9)
print xs
You can see what functions are associated with a type t using dir(t) (example: dir(list) shows all the list functions)
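A rough sketch of putting dir to use: dir gives back a list of names, so you can check whether a function is there and then call it through its type, in the same style as above. The variable names are illustrative only.

```python
# dir(t) returns a list of attribute names as strings.
list_functions = dir(list)
has_append = "append" in list_functions
has_upper = "upper" in dir(str)

# Functions found this way are called through the type, as before.
xs = [3, 1, 2]
list.sort(xs)       # sorts xs in place
list.reverse(xs)    # reverses xs in place
shouted = str.upper("this")

print(has_append)
print(has_upper)
print(xs)
print(shouted)
```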
In which we gain mastery over the rules of logic
x = 3
if x < 4:
    print "You should see this"
else:
    print "You should NOT see this!"
Things to note:
The < (less than) operator behaves just as you’d expect, returning True/False depending on the operand values.
elif is a combination of an else and an if.
def f(n):
    if n < 5:
        print "Cond 1"
    elif n > 10:
        print "Cond 2"
    else:
        print "Cond 3"
f(3)
f(7)
f(12)
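To check which branch runs, it can help to return the label instead of printing it. classify below is a hypothetical variant of f written that way; it is a sketch for experimenting, and it runs on both Python 2 and 3.

```python
def classify(n):
    # Hypothetical variant of f that returns its label.
    # Branches are tested top to bottom; the first one that matches wins.
    if n < 5:
        return "Cond 1"
    elif n > 10:
        return "Cond 2"
    else:
        # Reached only when n is between 5 and 10, inclusive.
        return "Cond 3"

print(classify(3))
print(classify(7))
print(classify(12))
```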
What do you think will happen below?
data = []
if data:
    print "catgifs"
list.append(data, "puppies?")
if data:
    print "yay internetz!"
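One way to check your prediction is the built-in bool, which shows how a value will be treated in an if condition. The sketch below is illustrative; the variable names are made up for the example.

```python
# Empty or zero-like values count as False...
print(bool([]))
print(bool(""))
print(bool(0))
print(bool(None))

# ...and non-empty values count as True, even when the contents look "false".
print(bool("0"))
print(bool([0]))

# The same idea applied to the data list from above.
data = []
empty_is_truthy = bool(data)
list.append(data, "puppies?")
filled_is_truthy = bool(data)

print(empty_is_truthy)
print(filled_is_truthy)
```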
Do something for every thing in a collection:
for n in xs:
    print "On int:", n
dict has an items function that gives all of the key, value pairs. Try:
dict.items(person)
Now loop over them:
for k, v in dict.items(person):
    print "key:", k, "val:", v
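Dictionaries in Python 2 do not promise any particular order for their pairs, so the loop above may print them in an order you did not expect. If order matters, one option is to sort the pairs first, as in this sketch. lines is a hypothetical name, and str converts the value to text so it can be joined with +.

```python
person = {"name": "Bob Jones", "age": 42}

lines = []
# sorted() orders the (key, value) pairs by key.
for k, v in sorted(dict.items(person)):
    list.append(lines, "key: " + k + " val: " + str(v))

for line in lines:
    print(line)
```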
Using just what we’ve learned already, how would you create a list ys that contained the square of every number in the list xs?
Using for to build results:
ys = []
for x in xs:
    y = square(x)
    list.append(ys, y)
print ys
This is imperative; we’re telling the computer how to build ys.
List comprehensions:
ys = [square(x) for x in xs]
print ys
This is more declarative.
Functional approach, using map:
ys = map(square, xs)
print ys
Note that we’re passing the function square as an argument to the function map.
List comprehension and functional approaches are more declarative.
We tell the computer what to compute, not how we want it computed. This is usually a good thing (cleaner code; sometimes better performance).
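The three approaches side by side, as a sketch you can run to confirm they agree. The names ys_loop, ys_comp and ys_map are made up for the comparison. The list(...) around map is an addition so the same code also runs on Python 3, where map returns a lazy iterator; on Python 2 it is harmless.

```python
def square(n):
    return n * n

xs = [1, 2, 3, 4, 5, 6, 7]

# Imperative: spell out each step of building the list.
ys_loop = []
for x in xs:
    list.append(ys_loop, square(x))

# List comprehension: describe the list we want.
ys_comp = [square(x) for x in xs]

# Functional: hand the function itself to map.
ys_map = list(map(square, xs))

print(ys_loop)
print(ys_loop == ys_comp)
print(ys_comp == ys_map)
```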
In which we rise to the level of systems
import math
dir(math)
help(math.log)
math.log(3.14)
math is a module. Modules, just like objects, have attributes, and these attributes are typically functions that we can use in our programs (though sometimes they’re just useful data, like math.e).
We also talk about modules as being namespaces. A namespace is just a “path” to some data or function or module. In this case math can also be thought of as a namespace.
Namespaces help us avoid naming conflicts. If we have a function called square, and we import a module (like math) that has its own function called square, namespaces let us distinguish between them.
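A sketch of the idea with a real clash. The math module does not actually define square, so this example uses log instead: our own log below is a hypothetical function that formats a message, while math.log is the natural logarithm. The two coexist because one of them lives in the math namespace.

```python
import math

def log(message):
    # Our own, hypothetical log function: unrelated to math.log.
    return "LOG: " + message

ours = log("starting analysis")   # plain name: our function
theirs = math.log(1)              # math. prefix: the module's function

print(ours)
print(theirs)
```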
Collections of modules are called packages.
You can install Python packages using the pip command line tool (the “App Store” for Python packages). From your Unix shell:
pip install --user biopython
The --user flag tells pip to install things in your local Python lib directory. You would only omit that if you were installing system-wide on your own computer.
We’re using Python 2. Python 3 has been out for YEARS, but adoption has been slow. Eventually, Python 3 will become the system default and more widely used, but till then…
Bottom line: Python 3 is rather different in a number of ways, so be aware.
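Three of the differences you are most likely to run into, given what this session covered, sketched in code that runs on either version. is_py3 is a hypothetical name used only to report which interpreter is running.

```python
import sys

is_py3 = sys.version_info[0] >= 3

# 1. print is a statement in Python 2 and a function in Python 3.
#    The parenthesized, single-argument form works on both.
print("Hello world")

# 2. / on two ints drops the remainder in Python 2 but not in Python 3.
result = 6 / 5
floor_result = 6 // 5      # drops the remainder on both versions
float_result = 6 / 5.0     # keeps the fraction on both versions

# 3. map returns a list in Python 2 and a lazy iterator in Python 3.
#    Wrapping it in list() gives a list on both.
ys = list(map(abs, [-1, 2, -3]))

print(is_py3)
print(result)
print(floor_result)
print(float_result)
print(ys)
```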