![logo](https://www.python.org/static/community_logos/python-logo-master-v3-TM.png)

# Python is a powerful, expressive, and easy-to-learn programming language. It is widely used in computational biology. There is a wealth of packages (numpy, scipy, pandas, statsmodels, scikit-learn) for data analysis in python. So it's **totally** worth learning!

[Official python tutorial](https://docs.python.org/3/tutorial/)


As we work through this notebook, you might want to open up a second one for trying things out. You can also fiddle with the code here.

Here are a few tips for working with notebooks:

* There are two modes: *command mode* and *edit mode*. 
* In edit mode you can edit the contents of individual cells. The selected cell is marked by a green border. 
* In command mode you can move up and down between cells, add and delete cells, switch a cell's *type* between code and markdown, etc. 
* To enter command mode, hit `Esc` (the Escape key).
* To enter edit mode with the selected cell (the one with the blue border), hit `Enter`. The border color will change to green.
* To run (execute) a cell (or selected cells), type `Ctrl-Enter`. `Shift-Enter` will run the current cell and select the next one.
* To see the other keyboard shortcuts, click on the `Help` menu.
* If you get stuck executing an infinite loop, choose `Interrupt` from the `Kernel` menu. 
* If you are in command mode, hitting the `a` key will add a new cell above the currently selected cell; `b` will add a new cell below the current cell.


# numeric types (int, float, complex)

In [None]:
# In python, everything on a given line that follows a # is a COMMENT
#
# create an integer variable called 'a' and assign it a value of 4
a = 4

# the output or result of the last command we type in a notebook cell will appear below the
# cell when we execute it
a

**DIGRESSION on NAMES and OBJECTS**

It's worth digging into this simple command `a = 4` a little more and learning some special
python terms. What this command actually does is create an OBJECT of TYPE `int` (i.e., an integer)
with the value of `4` and also a NAME (`a`) which points to the OBJECT.

In python, objects exist and have individuality (they live somewhere in a specific place in the
computer's memory). Multiple names can point to the same object. For simple objects like integers
this doesn't matter so much, but we will see later that having two names referring ('pointing') to the
same object can occasionally create confusion.

It's OK to think of names as variables, if you are comfortable with the notion of a variable from
another programming language or from math. As you become more proficient with python programming the
NAME versus OBJECT distinction will be more important.
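
Here is a small sketch of the NAME versus OBJECT idea. It uses a list (lists are introduced later in this notebook) because lists can be changed in place, which makes the effect visible. The `is` operator, which is not used elsewhere in this notebook, asks whether two names point to the same object.

```python
# Two names, one object: 'b = a' does not make a copy
a = [1, 2, 3]
b = a
print(a is b)   # True: both names point to the same object

# A separately created list with equal contents is a different object
c = [1, 2, 3]
print(a == c)   # True: the contents are equal
print(a is c)   # False: they are two distinct objects

# Changing the object through one name is visible through the other name
b.append(4)
print(a)        # [1, 2, 3, 4]
```

So `==` compares values, while `is` compares identity.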

In [None]:
# type is a python FUNCTION that will tell us the python type of an object
#
# FUNCTIONs are little snippets of previously defined python code that have a name
# and can be executed by typing that name followed by parentheses. FUNCTIONs can
# do stuff (like printing output) and also RETURN VALUES for us to use. FUNCTIONS
# may take ARGUMENTS that are passed in to the function to control their behavior. 
# ARGUMENTS (like 'a' below) go inside the parentheses. We will see later how to 
# define our own functions.

# (Like everything else in python, FUNCTIONs are actually just OBJECTs)
#
type(a)

In [None]:
a = 4
a += 3 # += adds the value (or thing) on the right to the variable on the left; shorthand for "a = a+3"

# print is a handy python FUNCTION that prints output to the terminal/notebook
# Note that the output of print() appears in a different place from the last value in the cell
# so we can have multiple outputs printed in a single cell
print(a)

# print can take multiple ARGUMENTS (things we pass in to FUNCTIONs to modify their behavior)
# FUNCTION ARGUMENTS are separated by commas
print('a+10 =', a+10)

In [None]:
# a decimal point tells python this is an object of type float
b = 3.5
type(b)

In [None]:
a/3 # In python 3, integer division returns floats rather than truncating

In [None]:
a//3 # floor division: the result is rounded down to a whole number (this is what a/3 does with integers in python2)
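
A quick sketch comparing `/` and `//`, including a negative number, where rounding down and discarding the fractional part give different answers. The variable names here are just for illustration.

```python
true_div = 7 / 2          # 3.5
floor_div = 7 // 2        # 3
neg_floor = -7 // 2       # -4: // rounds DOWN (toward negative infinity)
neg_trunc = int(-7 / 2)   # -3: int() discards the fractional part instead

print(true_div, floor_div, neg_floor, neg_trunc)
```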

In [None]:
## a % b gives the remainder from dividing a by b (aka modulus)
a = 17
b = 3
print(a, 'divided by', b, 'is', a//b, 'remainder', a%b)

In [None]:
a=3
b=0.1
a+b # integer + float = float

In [None]:
c = 3 - 4j # we can do complex numbers, too. j is the special symbol for the square root of -1

# abs() is a built-in python FUNCTION that works with all numeric types
# abs of a complex number is the length of the hypotenuse when we plot the number in 2D,
# that's why this comes out so nicely (3-4-5 triangle)
print(c, 'absolute value of c=', abs(c))
type(c)


# bools and comparisons

In [None]:
# One equals sign for assignment, two for comparison
a = 2
b = 4
a==b

In [None]:
c = ( a == b ) # we can store the result of a comparison in a variable of type 'bool'
print(c)
type(c)

In [None]:
print ( a > b )  # greater than
print ( a <= b ) # less than OR equal to
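
Boolean values can be combined with the python keywords `and`, `or`, and `not`. These are not covered in the cells above, so treat this as a small optional sketch; the variable names are made up for the example.

```python
a = 2
b = 4

both = (b > a) and (10 > b)     # True only if both comparisons are True
either = (a > b) or (b == 4)    # True if at least one comparison is True
negated = not (a == b)          # flips True to False and vice versa
chained = b > a > 0             # comparisons can be chained, like in math

print(both, either, negated, chained)   # True True True True
```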

# strings

In [None]:
s = 'hello'
s

In [None]:
# len is a SUPER IMPORTANT built-in python FUNCTION that tells us the length of OBJECTs like strings
print(len(s))

# strings have TYPE str
type(s)

In [None]:
print (s) # print doesn't show the enclosing quotes

In [None]:
s = "ain't isn't a word" # double quotes work fine too and are useful for strings with single quotes
print (s)

In [None]:
s = "\"ain't\" isn't a word" # we can use backslash to escape quotes 
print(s)


In [None]:
s = "two-line\nstring" # \n is a newline
print (s)

**OBJECTS CAN HAVE FUNCTIONS "ATTACHED" TO THEM**

These special functions are called *methods*. To call a method, we use the '.' symbol
after the object name, like: `object.function()`

In [None]:
# create a new string object with the text 'hi there', assign the name 's' to point to it
s = 'hi there'

# call the upper() method of s. It returns an all-upper-case version of s
s.upper()

**INDEXING**

Access individual characters in a string using square brackets ([]). **The numbering for indices starts at 0, not 1**. Negative indices count back from the end, with `-1` being the last element.

In [None]:
s='rhino'
print ('s[0]=',s[0]) # As we saw above, the print FUNCTION can take multiple ARGUMENTS separated by commas
print('s[2]=', s[2]) # the python language doesn't care about spacing around and within the parentheses
print('s[-2]=', s[-2]) # there are style suggestions here: https://www.python.org/dev/peps/pep-0008/
print('s[-1]=', s[-1]) # spaces after commas make it more readable

# indexing into a string returns a new string object containing the requested character
type(s[1])

**SLICING**

Strings can be *sliced* to generate sub-strings, as shown in the next cell.
Slicing can be applied to all sorts of python objects that are made of multiple
individual elements (strings, lists, tuples, ...)

In [None]:
s='peter pan'
print ( s[0:5] ) # 0 is the start; slice runs from first index up to but not including the second index
print ( len( s[2:-1]) ) # len tells us the length of objects like strings
print ( s[:5] ) # python assumes a missing first index is 0 (ie slice starts at the beginning)
print ( s[6:] ) # a missing second index means go all the way to the end
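
Slices can also take an optional third number, the *step*. This is not used in the cell above, so here is a short sketch of how it behaves on the same string.

```python
s = 'peter pan'

every_other = s[::2]    # start to end, taking every 2nd character
backwards = s[::-1]     # a step of -1 walks through the string in reverse
stepped = s[1:8:3]      # indices 1, 4, 7
last_three = s[-3:]     # negative indices work in slices, too

print(every_other)   # ptrpn
print(backwards)     # nap retep
print(stepped)       # era
print(last_three)    # pan
```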

Strings can be *added* and *multiplied*

In [None]:
'hello'+' goodbye'

In [None]:
'whoah!'*3

Strings are immutable, ie they can't be changed

In [None]:
s = 'hello'
#s[0] = 'j' #this would give an error (try it)

If you want to change a string, make a new one

In [None]:
s = 'hello'
print(s)
s  = 'j' + s[1:]
print(s)

### Converting between different types of objects.

If we want to turn a string into an integer or vice versa, or more generally create an object of one type from an object of another, we can use the name of the desired TYPE as a FUNCTION and pass it the OBJECT of the other TYPE, and it will return a new OBJECT of the desired TYPE. Which sounds kind of complicated but is actually pretty simple:

In [None]:
# create integer object 
a = 4

# create string from 'a' -- see how this looks like we are calling a function called "str"
b = str(a)

b

In [None]:
# create string object 
a = '57'

# create integer from 'a' -- looks like we are calling the 'int' function
b = int(a)

b
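
The same pattern works with other types, for example `float`. A short sketch (the values are arbitrary examples); note that `int()` applied to a float throws away the fractional part rather than rounding.

```python
x = float('3.75')           # string -> float
n = int(3.75)               # float -> int: the fractional part is discarded
r = round(3.75)             # round() is a built-in FUNCTION that rounds to the nearest integer
label = str(2.5) + ' mM'    # float -> string, so we can add it to another string

print(x, n, r, label)       # 3.75 3 4 2.5 mM

# int('3.75') would give an error (try it): the string doesn't look like an integer
```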

### fstrings

python3 (version 3.6 and later) gave us a SUPER convenient way of creating nicely formatted strings from other objects, called `f-strings`. You don't need to know the details but it's helpful to know that `f-strings` exist.

In [None]:
# f-strings start with an f before the quotes, and contain {} expressions that are replaced with variables
#  or other strings. The {} expressions can contain a ':' that is followed by cryptic instructions for
#  how python should format the thing before the ':'

name = 'Phil'
age = 29

message = f'My name is {name}, and I am {age} years old.' # yeah right
message

In [None]:
# f-strings are useful for formatting numeric values nicely, among many other things
# here the :5d tells python to format an integer (d) at least 5 characters wide, padding with whitespace if necessary
# and :9.2e says use scientific notation (e) with a minimum total width of 9 and 2 digits after the 
# decimal point
#
p_value = 0.000003545
num_trials = 154
print(f'number of trials: {num_trials:5d} P-value: {p_value:9.2e}')

p_value = 0.000243
num_trials = 12
print(f'number of trials: {num_trials:5d} P-value: {p_value:9.2e}')

p_value = 1.7456e-20
num_trials = 12043
print(f'number of trials: {num_trials:5d} P-value: {p_value:9.2e}')

# Practice time

In [None]:
# In one line, create the string 'elephant' by combining slices from the two strings a and b below 
# Try doing it two different ways, once with and once without negative indices

a = 'telephone'
b = 'santa'
#c = ?

In [None]:
# In one line, create a boolean variable that will be True if the dna sequence below 
#  preserves the reading frame when inserted into a transcript
#
# Use the function len() which returns the length of a string. ie, len('hello') returns the integer 5,
#  as well as the % operator introduced above.
#
dna_seq = 'TTATACGCGACTATCATATCGCCAGCCTTTGGAGTGTCAC'

#in_frame = ?


# lists

In [None]:
# I am in the habit of using the name 'l' for lists.
# unfortunately an 'l' looks a lot like a '1' -- apologies in advance!
#
l = [1,2,3,4]
print (l)

Like strings, lists are python objects that have some useful *methods* (functions attached to objects). To see these methods interactively, try creating a list called `l` and typing `l.` and then pressing tab.

By doing so you will see, for example, that `l` has a method called `append` which will add things to the end of the list:

In [None]:
# This shows that lists, unlike strings, are MUTABLE OBJECTS, which means that their contents
# can be modified during the lifetime of the object
#
l.append( 5 )
l

You can get information about a function or object in the notebook by typing `<name>?` and pressing enter. For example, try typing `l.append?` and pressing enter. This will work for functions you write in the future too, provided you write a little help message called a *docstring* at the beginning (more on this later).

In [None]:
l.append?


Note that you can also use tab-completion: `l.appe[TAB]` will complete to `l.append`

Lists can contain a mixture of types:

In [None]:
l = [1,2,3,4,5]
l.append( 'horse') # like strings
print(l)
l.append( [11,13,15] ) # or even other lists
print(l)

Indexing works pretty much the same for lists as for strings (but note that you *can* change elements of a list):

In [None]:
print('l[0]=',l[0])
print('l[-1]=',l[-1])
l[2] = 'pony'
print('now, l=',l)

Another very useful list method is `sort`

In [None]:
l = [3,2,11,5,10,-15]
l.sort()
print(l)

A useful `string` method is `split`: it returns a `list` of strings generated by splitting the initial string, by default on whitespace, but we can ask it to split on any string.

In [None]:
s = 'hi   there tim' # note multiple spaces between hi and there
print(s.split()) # no whitespace in the output strings
print(s.split('t'))
print(s.split('the'))
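
The string method `join` goes in the other direction: it glues a list of strings back together. It is not used elsewhere in this notebook, so consider this an optional sketch.

```python
s = 'hi   there tim'
words = s.split()              # ['hi', 'there', 'tim']

rejoined = ' '.join(words)     # the string we call join on goes BETWEEN the pieces
csv_line = ','.join(words)

print(rejoined)    # hi there tim
print(csv_line)    # hi,there,tim

# splitting on an explicit separator keeps empty strings between repeated separators
fields = 'a,b,,c'.split(',')
print(fields)      # ['a', 'b', '', 'c']
```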



Assignment (e.g., setting `x=y`) doesn't generally make a new copy of the object that `y` points to. Use slicing (`x=y[:]`) or explicit creation of a new object with the object typename (e.g. `x=list(y)`) to create a copy.

In [None]:
# create a new object of type list with the contents [1,2,3,4,5] and assign the name 'a' to point to it
a = [1,2,3,4,5]
print('a=',a)
b = a # create a new name 'b' and point it at the object that the name a points to
b[0] = 1000
print('b=',b)
print('a=',a) # a was changed, too
c = a[:] # slicing a list makes a copy
c[0] = 0
print('c=',c)
print('a=',a) # a still the same as before we modified c
d = list(a)
d[0] = -1
print('d=',d)
print('a=',a) # a still the same as before we modified d



# Practice time

In [None]:
# How many methods like "append" does a list have? Create a list and type l.[TAB] and count how many there are. 

#ans=?

In [None]:
# Can you find the string method that returns a lower-case copy of the string?
#
# Again, you can define a string s and then type s.[TAB] to see the method names.
# IMPORTANT: Note that this method does not change the starting string. strings are 'immutable' so their
# methods will not change them, rather they return new copies. 
#
# This is different from how many of the list methods behave.
#

In [None]:
# What is the difference between list.append() and list.extend()?
# First read the help messages for each using, for example, l.append?[ENTER]
# Then use trial and error by appending / extending different things onto a list and seeing what happens. 
# Does extend work with all input types?
#

In [None]:
# Create a new list called l2 which has the same elements as the list l1 below but in decreasing
#  order, using two list methods: one we've talked about and one we haven't
#
# Don't change l1 in the process! This will take a couple lines.

l1 = [5, 1, 1, 13, 9, 3, 18, 15, 10, 8, 15, 7, 15, 10, 1, 13, 4, 15, 18, 9, 7, 3, 19, 8, 0, 14, 11, 0, 15, 16]

#l2 = ... #


## OPTIONAL DIGRESSION: object introspection

In programming, introspection refers to the ability to investigate the type and/or other properties of objects at runtime. Remember that just about everything in python is an object! Two useful functions for introspection are `dir` and `id`.

`dir` shows all the methods (functions attached to the object) and attributes (accessible internal data) of an object.

`id` returns an identifier for the given object (in the standard CPython interpreter, this is the location in memory where it is stored). No two objects that exist at the same time will have the same `id`. `id` can be useful for figuring out whether two names point to the same underlying object.


In [None]:
# create list
l = [1,2]

# show attributes and methods of l
print(dir(l))

In [None]:
a = [4,5,6]
b = a # did we make a copy or not??
b.append(3)

print('id(a)=', id(a))
print('id(b)=', id(b))

# here we store the result of a comparison (checking whether two things are equal) in a boolean variable (ie, object)
the_same = id(a) == id(b)
print('a and b point to the same object:', the_same)


# flow of control: `for` loops
Python `for` loops let you repeat a block of code while changing the value of a LOOPING VARIABLE. The code to be repeated is INDENTED. This is a really important and distinctive thing about python: the formatting of the code, specifically the level of indenting, actually affects the syntax. Some people hate this. Some people love it. I like it and think it makes code more readable. But it takes some getting used to if you are coming from other languages like C/C++, java, etc. 

In [None]:
# a simple for-loop over the elements of the list

pets = ['dog', 'cat', 'goldfish']

# Below, 'pet' is the looping variable (name)
# it is equal to each of the elements of the list, in turn. First 'dog', then 'cat', then 'goldfish'
# In other words, the first time through the loop, pet is a name that points to the first object in the list
# the second time through the loop, pet points to the second object in the list. And so on.
#
for pet in pets:
    print('A', pet, 'is a great pet, if you happen to like pets.')

print('Those are the only pets I know.') # this only gets printed at the end; try indenting it and re-running...
    

In [None]:
#  range(num) gives us a loop from 0 to num-1
#  It's kind of like having a list of numbers running from 0 to num-1 (but not exactly the same)

range(5)
#list(range(5))
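
`range` can also take a starting value and a step size. A short sketch, wrapping each range in `list()` so we can see the numbers:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 6)))       # [2, 3, 4, 5]: start at 2, stop BEFORE 6
print(list(range(1, 10, 3)))   # [1, 4, 7]: third argument is the step
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]: a negative step counts down
```

Just like slices, the stop value is never included.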

In [None]:
product = 1

## here i is the looping variable (name) that has a different value each time through the loop
## 
for i in range(10):
    print('2 to the',i,'power=',product)
    product *= 2
    
print('At the end, product=', product)
    

In [None]:
# Here we added an assert statement, which is super-important as a debugging tool as you start to write
#  more complex programs. assert will halt the execution of the program if the statement that follows is false.
#
# Also we introduce a more convenient way to get exponents, a**b = a to the power b 
#

product = 1
for i in range(10):
    print('2 to the',i,'power=',product)
    assert product == 2**i # assert is a very very very important and useful debugging statement
    product *= 2
    

In [None]:
# What happens when we run this version?

product = 1
for i in range(10):
    product *= 2
    print('2 to the',i,'power=',product)
    assert product == 2**i # assert is a very very very important and useful debugging statement
    

In [None]:
# The for statement works with strings, lists, and all kinds of other python objects as you will see
message = 'FRED HUTCH'

for letter in message:
    print('Give me a/an',letter,'!')
    
# recall that the string method 'split' returns a list of strings we get by splitting the initial string on whitespace 
for word in message.split():
    print(word)

# `if` statements

In [None]:
# if statements let you choose between two (or more) outcomes based on a boolean expression 
#
for i in range(10):
    if i%2==0:
        print(i,'is even')
    else:
        print(i,'is odd')


In [None]:
# if statements let you choose between two (or more) outcomes based on a boolean expression 
#
# Notice what happens for 6:
#

for i in range(10):
    if i%2==0:
        print(i,'is even')
    elif i%3==0:
        print(i,'is divisible by 3')
    else:
        print(i,'is not')


## the `break` statement gets us out of a loop, while `continue` jumps directly to the next cycle through.

In [None]:
for i in range(1000):
    if i == 7:
        continue # 7 is a bad number

    print('3 to the',i,'power=',3**i)
    
    if i >= 9: # we don't actually want that much output
        break

## `for` is an incredibly versatile statement: you can loop over almost any kind of object that has multiple items or components


In [None]:
s = 'string'
for a in s:
    print(a) #try using the optional end= argument
    
    
words = 'the quick brown fox jumps over the lazy dog'.split() # create a list by splitting a string


counter=0
# we can loop over the elements in a  list
for word in words:
    print('The',counter,'word is',word)
    counter +=1
    
# we can loop over a range of numbers
print('\nAgain,') # using \n to add some space
for i in range(len(words)):
    print('The',i,'word is',words[i])

# enumerate allows us to get the index and the element when we loop over a list
print('\nOnce more,')
for i,word in enumerate(words): ## NEW FUNCTION: enumerate
    print('The',i,'word is',word)
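
The comment in the cell above suggests trying the optional `end=` argument of `print`. Here is a small sketch of what it does, together with the related `sep=` argument:

```python
# by default print finishes with a newline; end= replaces that with something else
for letter in 'string':
    print(letter, end=' ')
print()   # an empty print() just ends the line

# sep= controls what goes BETWEEN the arguments (the default is a single space)
print('A', 'C', 'G', 'T', sep='-')   # A-C-G-T
```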
    
    

# `while` loops

`while` loops repeat a block of indented code as long as a boolean expression evaluates to `True`

In [None]:
# SUPER-HANDY TIP:
# you can assign two (or more) variables at the same time. This is called 'unpacking' in python.
a, b = 0, 1

while a<1000:
    print('a=',a)
    # Here unpacking saves us from having to use a temporary variable
    a, b = b, a+b


# Practice time

In [None]:
# Write a little program to generate the list of all the even, non-negative integers less than 100.
#
# Do this two ways,
#  1) by using if, %, and append inside a for loop up to 100
#  2) just by using append and a shorter loop (even numbers are all multiples of...)
#



In [None]:
# Write a little program to generate the reverse complement of the following nucleotide sequence
# Recall that A pairs with T and C pairs with G
# 
forward_dna_seq = 'TTATACGCGACTATCATATCGCCAGCCTTTGGAGTGTCAC'



In [None]:
# Write a little program to generate the list of all the prime numbers less than 100.
# Again, % is your friend here. You will probably need to do a for loop inside a for loop!
#

# Defining new functions

In [None]:
def update(a,b):
    """Generate a new element in the sequence based on the last two elements"""
    return a+b

a, b = 0, 1

while a <= 1000:
    print('a=', a)
    a, b = b, update(a,b)
        


Functions can have OPTIONAL ARGUMENTS whose DEFAULT VALUES are pre-specified in the function definition.

In [None]:
# here a_factor and b_factor are optional arguments to the update function
# if we call the function and we don't pass in values for them, the default value of 1 is used.
def update(a, b, a_factor=1, b_factor=1):
    """Generate a new element in the sequence based on the last two elements"""
    return a_factor * a + b_factor * b

a, b = 0, 1

while a <= 100:
    print('a=', a)
    a, b = b, update(a, b, a_factor=2)
        


Functions can call other functions, even themselves:

In [None]:
def factorial(n):
    """Calculate the factorial of a number recursively. Bad things will happen 
    if the number is negative or not an integer """
    if n==0:
        return 1
    else:
        return n * factorial(n-1) # this is called "recursion"
    
# the help function prints the docstring
help(factorial)

for i in range(10):
    print(i,'factorial =',factorial(i))
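
The docstring warns that bad things will happen for negative or non-integer input (the recursion never reaches `n==0`). One way to guard against that is with `assert`, as sketched below. The name `factorial_checked` is made up for this example; it is not part of the notebook or of python.

```python
def factorial_checked(n):
    """Calculate the factorial of a non-negative integer recursively,
    halting with an AssertionError on bad input"""
    assert type(n) is int, 'n must be an integer'
    assert n >= 0, 'n must be non-negative'
    if n == 0:
        return 1
    else:
        return n * factorial_checked(n - 1)

print(factorial_checked(5))   # 120

# factorial_checked(-1) or factorial_checked(2.5) would give an AssertionError (try it)
```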

Try using `factorial?` and `factorial??` to see the docstring and source code of our new function. 

In [None]:
factorial??


# Using a function in a built-in module

Python has an extensive collection of built-in **modules** which contain all sorts of useful special purpose functions and objects. A few favorites:

* `math`: mathematical functions and constants
* `re`: regular expression searches.
* `os`: access to operating system routines (e.g., `os.path.exists` function to check if a file exists)
* `csv`: routines for reading/writing delimited data
* `sys`: special variables used or maintained by the interpreter (`sys.path`, `sys.stdout`, `sys.argv`, ...)
* `random`: random number generators
* `glob`: file-finding functions with wild-cards (like using `*` on the command line)
* `pdb`: python debugger
* `timeit`: for profiling (timing) code snippets
* `tkinter`: python interface to tcl/tk graphics library
* `xml`: xml processing modules
* `string`: common string operations and constants


[The full list is here](https://docs.python.org/3/library/)
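
To give a flavor, here is a short sketch that uses three of the modules listed above. Functions inside a module are accessed with the same `.` notation we used for methods. The seed value and the numbers are arbitrary choices for the example.

```python
import math
import os
import random

print(math.pi)                        # a constant stored in the math module
root = math.floor(math.sqrt(17))      # floor rounds down to an integer
print(root)                           # 4

here_exists = os.path.exists('.')     # '.' means the current directory
print(here_exists)                    # True

random.seed(42)                       # setting the seed makes the "random" numbers reproducible
roll = random.randint(1, 6)           # a random integer from 1 to 6, inclusive
print(roll)
```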

In [None]:
# Here we are "importing" a module called math which contains some math-y functions like sqrt and floor
import math


def is_prime( num ):
    """Figure out whether num is prime"""
    assert type(num) is int
    assert num >= 1
    if num == 1:
        return False # 1 is not a prime number
    
    max_factor = math.floor( math.sqrt( num ) )
    #max_factor = num-1
    
    for i in range(2,max_factor+1):
        if num%i == 0:
            return False
    return True

for i in range(1,50):
    if is_prime(i):
        print(i)
        
# look at the distribution of primes less than 1000

prime_counts_per_100 = [0]*10 # we need 10 bins for this

for i in range(1,1000):
    if is_prime(i):
        counts_bin = i//100
        prime_counts_per_100[ counts_bin ] += 1
        
print( 'prime_counts_per_100:',prime_counts_per_100 )

In [None]:
## try this to see how long the is_prime function takes (remove the #):
# %timeit is_prime(997)

In [None]:
def is_prime_v2( num, smaller_primes ):
    """Figure out if num is prime based on the set of smaller primes"""
    max_factor = math.floor( math.sqrt( num ) )
    for p in smaller_primes:
        if p>max_factor: break
        if num%p == 0:
            return False
    return True


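# test the prime number theorem which says that the number of primes less than N
# is asymptotically N/log(N), where log is the natural logarithm (math.log);
# in other words, the ratio actual/estimate should slowly approach 1 as N grows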
#

primes = []

for num in range(2,10000):
    if is_prime_v2( num, primes ):
        primes.append( num )
    if num%100==0:
        actual = len(primes)
        estimate = num / math.log(num)
        ratio = actual / estimate
        print( num, actual, estimate, ratio )


        
            


# `Dictionaries`
Python provides an associative mapping object called a `dictionary` to easily hold key-value pairs. 

In [None]:
base_partner = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
base_partner['A']

# we can assign new key,value pairs using [] notation:
base_partner['N'] = 'N'

print('base_partner=', base_partner)

In [None]:
num_hbonds = {'A':2,'T':2,'C':3,'G':3}
num_hbonds['A']


In [None]:
#num_hbonds['a']  # this gives an error: 'a' is not one of the keys.
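
To avoid that error, we can first check whether a key is present using `in`, or use the dictionary method `get`, which returns a fallback value instead of raising an error. A small sketch:

```python
num_hbonds = {'A': 2, 'T': 2, 'C': 3, 'G': 3}

print('A' in num_hbonds)         # True
print('a' in num_hbonds)         # False: keys are case-sensitive

print(num_hbonds.get('a'))       # None: the default fallback value
print(num_hbonds.get('a', 0))    # 0: we can choose our own fallback
print(len(num_hbonds))           # 4: len works on dictionaries, too
```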

In [None]:
base_partner = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
num_hbonds = {'A':2,'T':2,'C':3,'G':3}


def reverse_complement( seq ):
    """returns the reverse complement of a nucleic acid sequence"""
    rseq = ''
    for a in reversed( seq ):
        rseq += base_partner[a]
    return rseq

def total_hbonds( seq ):
    """returns the total number of base-pairing hydrogen bonds for a nucleic acid sequence"""
    total=0
    for a in seq:
        total += num_hbonds[a]
    return total

fwd = 'ACGGTAATGATCCTCAG'
rev = reverse_complement( fwd )

print ('fwd=',fwd,'rev=',rev,'num_hbonds=',total_hbonds(fwd))

assert total_hbonds(fwd) == total_hbonds(rev)


We can loop over dictionaries quite easily

In [None]:
base_partner = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}

for a in base_partner:
    print( a, 'pairs with', base_partner[a] )
    
print()

# Or this way to get the keys and values at the same time:

for a, a_partner in base_partner.items():
    print( a, 'pairs with', a_partner )
    

# `tuples`

Dictionary keys can be any *immutable* objects, where immutable objects are ones (like integers, floats, strings) that can't be changed after they are created. So lists are no good. But, there's something called a `tuple` which looks a lot like a list and can be used as a dictionary key, and in lots of other places where multiple items are passed around.

In [None]:
# create a tuple
t = (4,5,'a')

# create an empty dictionary
D = {}

# set the value 4 for the key t in the dictionary D
D[t] = 4

# print stuff
print('t=',t)
print('D=',D)

`tuples` can be **unpacked** by assigning them to a sequence of variables (names) matching their length: 

In [None]:
nums = ( 4, 11, 17 )

a,b,c = nums

print('nums=',nums)
print('a=',a)
print('b=',b)
print('c=',c)
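
Two places where tuples come in handy are returning more than one value from a function and building dictionary keys out of several items. A sketch of both follows; the function name `quotient_and_remainder` and the dictionary `pair_hbonds` are made up for this example.

```python
def quotient_and_remainder(a, b):
    """Return two values at once, packed together in a tuple"""
    return a // b, a % b   # the comma makes a tuple; the parentheses are optional here

# unpack the returned tuple into two names
q, r = quotient_and_remainder(17, 3)
print('q=', q, 'r=', r)    # q= 5 r= 2

# tuples as dictionary keys: number of hydrogen bonds for each base pair
pair_hbonds = {('A', 'T'): 2, ('C', 'G'): 3}
print(pair_hbonds[('C', 'G')])   # 3
```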

# running shell commands in notebooks and getting the results
We can run (most) shell commands from within the notebook by preceding them with `!`

In [None]:
!ls

In [None]:
!pwd

Even cooler, we can get the results of these commands back in python:

In [None]:
l = !pwd
print(l)
print(l[0].split('/'))

In [None]:
l = !find ../
print( len(l), l)
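
The `!` syntax only works inside notebooks (and the IPython shell), not in ordinary python scripts. For simple cases like the ones above, the built-in `os` module offers rough equivalents that work anywhere. This is just a sketch of one alternative, not the only way to do it.

```python
import os

cwd = os.getcwd()              # like !pwd: the current working directory, as a string
print(cwd)
parts = cwd.split(os.sep)      # os.sep is the path separator ('/' on Linux and Mac)
print(parts)

entries = os.listdir('.')      # like !ls: a list of the names in the current directory
print(len(entries), 'entries in the current directory')
```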