Python Fundamentals

Learn the fundamentals of the Python programming language, including variables, data types, and basic operations.
Author

John Inston

Published

April 2, 2026

Modified

April 3, 2026

Important: Aims
  • Understand basic python syntax and programming concepts.
  • Python variables and data types.
  • Branching and iteration.
  • numpy arrays.

Python Fundamentals

Now that we have finished our setup, we are ready to begin coding in Python. We will start with the basics, including variable assignment, data types and basic operations.

Variables and Assignment

A variable is a named container for storing a value. In Python, we create one by assigning a value with =. Here we create three variables called name, age and height:

name = "Alice"
age = 25
height = 5.6

print(name, "is", age, "years old and", height, "feet tall.")
Alice is 25 years old and 5.6 feet tall.

Variable names should be lowercase with underscores for spaces (e.g. my_variable) and cannot start with a number (same as R).

Python Data Types

Python has several built-in data types. The most common are:

Type    Example       Description
int     42            Whole numbers
float   3.14          Decimal numbers
str     "hello"       Text (string)
bool    True/False    Boolean (true/false)

x = 42           # int
pi = 3.14159     # float
greeting = "Hi"  # str
is_fun = True    # bool

# Check the type of any variable with type()
print(type(x))
print(type(pi))
print(type(greeting))
print(type(is_fun))
<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>

You can convert between types (explored further in our lab on data preparation):

number = 7
as_float = float(number)
as_string = str(number)

print(as_float)   # 7.0
print(as_string)  # "7"
7.0
7
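
Conversion also works in the other direction, from text to numbers. The following is a minimal sketch; the variable names are illustrative and not part of the original lab:

```python
# Converting text to numbers, e.g. values read from a file or typed by a user
text = "42"
as_int = int(text)         # the integer 42
as_number = float("3.5")   # the float 3.5

# int() drops the fractional part of a float (it does not round)
truncated = int(7.9)       # 7

# A conversion that makes no sense raises a ValueError
try:
    int("seven")
except ValueError as error:
    print("Could not convert:", error)

print(as_int + 1, as_number * 2, truncated)
```

Wrapping a risky conversion in try/except, as above, lets the program report the problem instead of stopping.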

Strings

Strings are sequences of characters. Python (unlike R) has a lot of built-in functionality for handling strings:

s = "Hello, Python!"

print(len(s))          # Length of the string
print(s.upper())       # All uppercase
print(s.lower())       # All lowercase
print(s.replace("Python", "World"))  # Replace a word
print(s[0])            # First character: "H"
print(s[0:5])          # Slice: "Hello"
14
HELLO, PYTHON!
hello, python!
Hello, World!
H
Hello

Note that in Python some functions are attached directly to a value or variable and called using dot notation: variable.method(). These are called methods: functions that belong to a specific data type. For example, upper() is a method of the string s. This is our first look at object-oriented programming, something that we will explore as we advance in Python.

The cleanest way to embed variables inside strings is to use f-strings (formatted strings):

name = "Alice"
age = 25

message = f"My name is {name} and I am {age} years old."
print(message)
My name is Alice and I am 25 years old.

This is particularly helpful when combined with looping, for example to efficiently generate labels for multiple plots.
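
As a rough sketch of that idea, the loop below builds one label per variable. The column names are hypothetical, and enumerate() (which supplies a running counter) is a built-in not covered elsewhere in this note:

```python
# Hypothetical column names we might want one plot for
variables = ["height", "weight", "age"]

labels = []
for i, variable in enumerate(variables, start=1):
    # The expressions inside {} are evaluated and inserted into the string
    labels.append(f"Plot {i}: distribution of {variable}")

for label in labels:
    print(label)
```

Each pass through the loop fills the same template with different values, which is what makes f-strings convenient for titles and file names.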

Lists

A list is an ordered, changeable collection of items. Lists can hold any data type, even mixed types (in this way they differ from R vectors), and as such are very flexible for data storage. They are created using square brackets [...] with items separated by commas:

fruits = ["apple", "banana", "cherry"]

print(fruits[0])    # First item: "apple"
print(fruits[-1])   # Last item: "cherry"
print(len(fruits))  # Number of items: 3

# Modifying a list
fruits.append("mango")      # Add to end
fruits.insert(1, "grape")   # Insert at index 1
fruits.remove("banana")     # Remove by value

print(fruits)
apple
cherry
3
['apple', 'grape', 'cherry', 'mango']

To index items from a list numerically we again use square brackets (not curly braces { }). Note that (unlike R) Python indices start from 0. We can index from the end of the list using negative indices, so -1 refers to the last item:

print(fruits[0])
print(fruits[-1])
apple
mango

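We can slice a list, which means returning a sub-list of its elements, using the colon syntax start:stop:step. The start index is included, the stop index is excluded, and any of the three parts can be omitted: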

numbers = [0, 1, 2, 3, 4, 5]

print(numbers[1:4])   # [1, 2, 3]
print(numbers[:3])    # [0, 1, 2]
print(numbers[3:])    # [3, 4, 5]
print(numbers[::2])   # Every other item: [0, 2, 4]
[1, 2, 3]
[0, 1, 2]
[3, 4, 5]
[0, 2, 4]

Dictionaries

A dictionary stores data as key-value pairs, like a real dictionary where you look up a word (key) to find its definition (value). Dictionaries are defined with curly braces {...}; in the example below the keys are strings:

person = {
    "name": "Alice",
    "age": 25,
    "city": "New York"
}

print(person["name"])   # Access by key
print(person["age"])

# Add or update a key
person["email"] = "alice@example.com"
person["age"] = 26

print(person)

# Check if a key exists
print("city" in person)   # True
Alice
25
{'name': 'Alice', 'age': 26, 'city': 'New York', 'email': 'alice@example.com'}
True
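
Two further dictionary operations are often useful: looking up a key that may be missing, and visiting every key-value pair. The sketch below assumes a smaller version of the person dictionary and uses a for loop, which is introduced later in this note:

```python
person = {"name": "Alice", "age": 26, "city": "New York"}

# .get() returns a fallback value instead of raising KeyError for a missing key
phone = person.get("phone", "unknown")
print(phone)

# .items() yields (key, value) pairs in insertion order
lines = []
for key, value in person.items():
    lines.append(f"{key}: {value}")

print(lines)
```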

Basic Math

Python supports all standard arithmetic operations:

a = 10
b = 3

print(a + b)   # Addition:       13
print(a - b)   # Subtraction:    7
print(a * b)   # Multiplication: 30
print(a / b)   # Division:       3.333...
print(a // b)  # Floor division: 3  (rounds down)
print(a % b)   # Modulo:         1  (remainder)
print(a ** b)  # Exponent:       1000
13
7
30
3.3333333333333335
3
1
1000

Branching

Branching is a fundamental concept in programming that allows you to control the flow of your program based on certain conditions. In Python, you can use if, elif, and else statements to create conditional logic:

temperature = 72

if temperature > 90:
    print("It's very hot outside!")
elif temperature > 65:
    print("It's a nice day.")
elif temperature > 40:
    print("It's a bit chilly.")
else:
    print("It's cold outside!")
It's a nice day.

Note that, unlike R, Python's branching statements need no parentheses around the condition and no braces around the body; indentation defines each block, which gives a cleaner syntax. The branching conditions are logical statements that evaluate to a single True/False value. Python uses the standard programming syntax for comparison operators:

Operator   Meaning
==         Equal to
!=         Not equal to
>          Greater than
<          Less than
>=         Greater than or equal to
<=         Less than or equal to

x = 10
print(x == 10)  # True
print(x != 5)   # True
print(x > 20)   # False
True
True
False

Here Python diverges from R again: it uses the keywords not, and and or rather than the symbols !, & and |. As a result, Python conditions typically read more like plain English.

is_raining = False

# Preferred
if not is_raining:
    print("No umbrella needed!")

# Also common: checking if a list is empty
items = []

if not items:
    print("The list is empty.")

# Combining with and / or
logged_in = True
is_admin = False

if logged_in and not is_admin:
    print("Welcome, regular user!")
No umbrella needed!
The list is empty.
Welcome, regular user!

Here are a few rules of thumb for when to use not:

  • Use not x instead of x == False.
  • Use not items instead of len(items) == 0 to check for empty objects.
  • Use not x in my_list - or better x not in my_list - to check for absence.
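
The rules of thumb above can be checked directly. This is a small sketch with made-up values; each printed pair compares the preferred form with the longer alternative:

```python
is_raining = False
items = []
my_list = [1, 2, 3]
x = 5

# Each pair of expressions gives the same answer; the first is the preferred style
print(not is_raining, is_raining == False)   # True True
print(not items, len(items) == 0)            # True True
print(x not in my_list, not x in my_list)    # True True

# Caveat: "not" tests truthiness, so it also treats None, 0 and "" as false
value = None
print(not value, value == False)             # True False
```

The last line shows why the two forms are not always interchangeable: not value is True for None, while None == False is False.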

Iteration

A for loop iterates over a sequence. As with branching, Python's looping syntax has no parentheses or braces:

fruits = ["apple", "banana", "cherry"]

for fruit in fruits:
    print(fruit)
apple
banana
cherry

Use range() to loop a specific number of times:

for i in range(5):
    print(f"Step {i}")
Step 0
Step 1
Step 2
Step 3
Step 4

A while loop repeats as long as a condition is true:

count = 0

while count < 5:
    print(f"Count is {count}")
    count += 1

print("Done!")
Count is 0
Count is 1
Count is 2
Count is 3
Count is 4
Done!

Example: Write a for loop that runs through all of the integers from 1 to 100 and appends them to two lists:

  1. nine_multiples if the number is divisible by 9.
  2. sevens_list if the number contains a 7.

# Solution
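
One possible solution is sketched below; it is not necessarily the intended one. It assumes "contains a 7" means that the digit 7 appears anywhere in the number, which is checked by converting the number to a string:

```python
nine_multiples = []
sevens_list = []

for number in range(1, 101):      # range stops before 101, so this covers 1..100
    if number % 9 == 0:           # remainder 0 means divisible by 9
        nine_multiples.append(number)
    if "7" in str(number):        # digit check on the text form of the number
        sevens_list.append(number)

print(nine_multiples)
print(sevens_list)
```

Two separate if statements are used rather than if/elif because a number such as 27 or 72 belongs in both lists.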

Functions

A function is a reusable block of code. You define it once and call it as many times as you like. The syntax is very simple:

def greet(name):
    print(f"Hello, {name}!")

greet("Alice")
greet("Bob")
Hello, Alice!
Hello, Bob!

Functions can send a value back using return:

def add(a, b):
    return a + b

result = add(3, 7)
print(result)  # 10
10

You can set default values for parameters:

def greet(name, greeting="Hello"):
    print(f"{greeting}, {name}!")

greet("Alice")              # Uses default: "Hello, Alice!"
greet("Bob", "Good morning")  # Overrides default
Hello, Alice!
Good morning, Bob!

Example: Define a function called compute_radius() that computes the radius r of a sphere for some specified volume V. Recall that the volume \(V\) of a sphere of radius \(r\) is given by \[ V = \frac{4}{3}\pi r^3. \]

# Solution
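
A possible solution, offered as a sketch: rearranging the volume formula gives \(r = \left(\frac{3V}{4\pi}\right)^{1/3}\). The parameter name volume and the use of the standard math module are choices made here, not requirements of the exercise:

```python
import math

def compute_radius(volume):
    """Return the radius of a sphere with the given volume."""
    return (3 * volume / (4 * math.pi)) ** (1 / 3)

# Sanity check: a sphere of radius 3 has volume 36*pi
radius = compute_radius(36 * math.pi)
print(f"{radius:.3f}")
```

Because the cube root is computed in floating point, compare results with math.isclose rather than ==.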

Example: Write a function summation that evaluates the following summation for \(n \geq 1\): \[\sum_{i=1}^{n} \left(i^3 + 5i\right)\]

# Solution
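
A possible solution using a plain for loop is sketched below. The check values are small cases that can be verified by hand:

```python
def summation(n):
    """Compute the sum of (i**3 + 5*i) for i from 1 to n (inclusive)."""
    total = 0
    for i in range(1, n + 1):     # n + 1 so that n itself is included
        total += i**3 + 5 * i
    return total

print(summation(1))   # 1 + 5 = 6
print(summation(3))   # 6 + 18 + 42 = 66
```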

Importing Libraries and Magic Commands

In this course, we will be using common Python libraries to help us process data. By convention, we import all libraries at the very top of the notebook. There is also a set of standard aliases that are used to shorten the library names. Below are some of the libraries that you may encounter throughout the course, along with their respective aliases.

import pandas as pd
import numpy as np

A useful magic command is %%time, which times the execution of that cell. You can use this by writing it as the first line of a cell. (Note that %% is used for cell magic commands that apply to the entire cell, whereas % is used for line magic commands that only apply to a single line. If you are interested, you can read more about the magic commands in this Tutorials Point article).

%%time

lst = []
for i in range(100):
    lst.append(i)
CPU times: user 14 μs, sys: 0 ns, total: 14 μs
Wall time: 19.1 μs

Example: Complete summation(n) in the next cell (same formula as above), then use it for the values of n below.

def summation(n):
    """Compute sum of (i**3 + 5*i) for i from 1 to n (inclusive)."""
    ...

Use your function to compute the sum for…

# n = 2
...
# n = 20
...

List comprehension

In Python, normally you can fill a list with elements using a for loop as seen in the example below.

squares = []
# Add the squares of the integers 1 to 100 (inclusive) to the list "squares" if the square ends in the digit 4
for i in range(1,101):
    if (i**2)%10 == 4:
        squares.append(i**2)
print(squares)
[4, 64, 144, 324, 484, 784, 1024, 1444, 1764, 2304, 2704, 3364, 3844, 4624, 5184, 6084, 6724, 7744, 8464, 9604]

Alternatively, you can create this same list in a single line of code by moving the for-loop and if-statement inside the list's creation. This is called a list comprehension.

The syntax for a list comprehension is this: [value for-loop condition]

  • value is the value you want to put into the list
  • for-loop is the for-loop that iterates through a list or a range
  • condition is the if-statement that determines if the value is allowed to be inserted into the list

For more information, you can read this beginner’s tutorial on list comprehensions

# value: i**2
# for-loop: for i in range(1,101)
# condition: if (i**2)%10 == 4
squares_list_comprehension = [i**2 for i in range(1,101) if (i**2)%10 == 4]
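
To confirm that the two approaches agree, the sketch below rebuilds the list both ways and compares them. The name first_cubes is illustrative; it shows that the condition part of a comprehension is optional:

```python
# Loop version
squares_loop = []
for i in range(1, 101):
    if (i**2) % 10 == 4:
        squares_loop.append(i**2)

# Comprehension version: [value for-loop condition]
squares_comprehension = [i**2 for i in range(1, 101) if (i**2) % 10 == 4]

print(squares_loop == squares_comprehension)   # True
print(len(squares_comprehension))              # 20

# The condition can be left out when every value should be kept
first_cubes = [i**3 for i in range(1, 6)]
print(first_cubes)                             # [1, 8, 27, 64, 125]
```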

Aligning lists with zip()

If you need to line up 2 or more lists, Python has a built-in function called zip(). This allows you to loop through more than one list at a time such that you get values in each list that have the same index in the other lists. This can also be used within for-loops to make them even more powerful.

a = [1,2,3,4]
b = [2,3,4,5]
# Print a[0]*b[0], a[1]*b[1],...
for x, y in zip(a,b):
    print(x*y)
2
6
12
20

Example: Write a function list_sum that computes the square of each value in list_1, the cube of each value in list_2, and returns a list containing the element-wise sum of these results. Assume that list_1 and list_2 have the same number of elements. Try to use a list comprehension to write it all on one line. I have started the function for you below:

# SOLUTION

def list_sum(list_1, list_2):
    """Compute x^2 + y^3 for each x, y in list_1, list_2.

    Assume list_1 and list_2 have the same length.
    """
    assert len(list_1) == len(list_2), "both args must have the same number of elements"
    return [x**2 + y**3 for x, y in zip(list_1, list_2)]

To test this function we define two lists and call the function:

list_1 = [1, 2, 3, 4, 5]
list_2 = [6, 7, 8, 9, 10]
print(list_sum(list_1, list_2))
[217, 347, 521, 745, 1025]

numpy Package

NumPy (pronounced “NUM-pie”) is the numerical computing module, which we will be using a lot in this course. Here’s a quick recap of NumPy. For more review, read the following materials.

Arrays

The core of NumPy is the array. Like Python lists, arrays store data; however, they store data in a more efficient manner. In many cases, this allows for faster computation and data manipulation.

Let’s use np.array to create an array. It takes a sequence, such as a list or range (remember that list elements are included between the square brackets [ and ]).

Example: Create an array arr containing the values 1, 2, 3, 4, and 5 (in that order).

arr = np.array([1, 2, 3, 4, 5])

In addition to the values in the array, we can access attributes such as the array's shape and data type. A full list of attributes can be found here.

Indexing

NumPy arrays are integer-indexed by position, with the first element indexed as position 0. Elements can be retrieved by enclosing the desired positions in brackets [].

arr[3]
4

To retrieve consecutive positions, specify the starting index and the ending index separated by a colon, e.g. arr[from:to]. This syntax is non-inclusive of the right endpoint; notice below that the element at the ending index is not included in the output.

arr[2:4]
array([3, 4])

Attributes

NumPy arrays have several attributes that can be retrieved by name using syntax of the form arr.attr. Some useful attributes are:

  • .shape, a tuple with the length of each array dimension
  • .size, the total number of elements in the array
  • .dtype, the data type of the entries (float, integer, etc.)
arr.shape
(5,)
arr.size
5
arr.dtype
dtype('int64')
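
For a one-dimensional array, the size and the length of the first dimension coincide, so a two-dimensional example separates them more clearly. This is a small sketch; the array grid is made up for illustration:

```python
import numpy as np

# A 2-by-3 array: two rows, three columns
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.shape)   # (2, 3): the length of each dimension
print(grid.size)    # 6: the total number of elements
print(len(grid))    # 2: the length of the first dimension only
print(grid.ndim)    # 2: the number of dimensions
```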

Arrays, unlike Python lists, cannot store items of different data types.

# A regular Python list can store items of different data types
[1, '3']
[1, '3']
# Arrays will convert everything to the same data type
np.array([1, '3'])
array(['1', '3'], dtype='<U21')
# Another example of array type conversion
np.array([5, 8.3])
array([5. , 8.3])

Operations on arrays

Arrays are also useful in performing vectorized operations. Given two or more arrays of equal length, arithmetic will perform element-wise computations across the arrays.

For example, observe the following:

# Python list addition will concatenate the two lists
[1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]
# NumPy array addition will add them element-wise
np.array([1, 2, 3]) + np.array([4, 5, 6])
array([5, 7, 9])

Example: Given the array random_arr, assign valid_values to an array containing all values \(x\) of random_arr such that \(2x^4 > 1\).

# for reproducibility - setting the seed will result in the same random draw each time
np.random.seed(67)

# draw 60 uniformly random numbers between 0 and 1
random_arr = np.random.rand(60)

# solution
valid_values = random_arr[2*random_arr**4 > 1]
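
The solution relies on boolean masking: the comparison produces an array of True/False values, and indexing with that array keeps only the positions marked True. The sketch below breaks this into two steps on a small hand-picked array (values is illustrative, not part of the exercise):

```python
import numpy as np

values = np.array([0.2, 0.9, 0.5, 1.0])

# Step 1: the comparison is applied element-wise and returns a boolean array
mask = 2 * values**4 > 1
print(mask)           # [False  True False  True]

# Step 2: indexing with the mask keeps the entries where the mask is True
print(values[mask])   # [0.9 1. ]
```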

Example: Recreate the list_sum function using numpy arrays, calling your new function array_sum.

def array_sum(array_1, array_2):
    return np.array([x**2 + y**3 for x, y in zip(array_1, array_2)])

You might have been told that Python is slow, but array arithmetic is carried out very fast, even for large arrays.

For ten numbers, list_sum and array_sum both take a similar amount of time.

sample_list_1 = list(range(10))
sample_array_1 = np.arange(10)
%%time
list_sum(sample_list_1, sample_list_1)
CPU times: user 7 μs, sys: 0 ns, total: 7 μs
Wall time: 9.06 μs
[0, 2, 12, 36, 80, 150, 252, 392, 576, 810]
%%time
array_sum(sample_array_1, sample_array_1)
CPU times: user 25 μs, sys: 20 μs, total: 45 μs
Wall time: 48.9 μs
array([  0,   2,  12,  36,  80, 150, 252, 392, 576, 810])

The time difference seems negligible for a list/array of size 10; depending on your setup, you may even observe that list_sum executes faster than array_sum! However, we will commonly be working with much larger datasets:

sample_list_2 = list(range(100000))
sample_array_2 = np.arange(100000)
%%time
list_sum(sample_list_2, sample_list_2)
; # The semicolon hides the output
CPU times: user 20.6 ms, sys: 3.61 ms, total: 24.2 ms
Wall time: 24.1 ms
%%time
array_sum(sample_array_2, sample_array_2)
;
CPU times: user 32.9 ms, sys: 952 μs, total: 33.8 ms
Wall time: 33.7 ms

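In the timings above, array_sum is no faster than list_sum on the larger dataset (33.7 ms versus 24.1 ms), because this implementation still loops over the elements one at a time in Python using zip. The speedup NumPy is known for comes from vectorized expressions such as array_1**2 + array_2**3, which carry out the loop in compiled code and are typically far faster on large arrays; the exact factor depends on your machine. Throughout this course (and in the real world), you will find that writing efficient code will be important; arrays and vectorized operations are the most common way of making Python programs run quickly.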
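
A fully vectorized version can be sketched as follows. The name array_sum_vectorized is introduced here for illustration, and the standard timeit module is used instead of %%time so that the code also runs outside a notebook. No timings are quoted because they vary between machines:

```python
import timeit

import numpy as np

def list_sum(list_1, list_2):
    """Element-wise x^2 + y^3 using a list comprehension."""
    return [x**2 + y**3 for x, y in zip(list_1, list_2)]

def array_sum_vectorized(array_1, array_2):
    """Element-wise x^2 + y^3 using array arithmetic (no Python-level loop)."""
    return array_1**2 + array_2**3

sample_list = list(range(100000))
sample_array = np.arange(100000, dtype=np.int64)   # 64-bit integers avoid overflow when cubing

# Both approaches should give identical values
same = list_sum(sample_list, sample_list) == array_sum_vectorized(sample_array, sample_array).tolist()
print(same)

# Time ten repetitions of each
list_time = timeit.timeit(lambda: list_sum(sample_list, sample_list), number=10)
array_time = timeit.timeit(lambda: array_sum_vectorized(sample_array, sample_array), number=10)
print(f"list_sum: {list_time:.4f} s, array_sum_vectorized: {array_time:.4f} s")
```

On most setups the vectorized version should come out well ahead, since the arithmetic runs in compiled code rather than in a Python loop.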

A note on np.arange and np.linspace

Usually we use np.arange to return an array that steps from a to b with a fixed step size s. While this is fine in some cases, we sometimes prefer to use np.linspace(a, b, N), which divides the interval [a, b] into N equally spaced points.

np.arange(start, stop, step) produces an array with all the numbers starting at start, incremented up by step, stopping before stop is reached. For example, the value of np.arange(1, 6, 2) is an array with elements 1, 3, and 5 – it starts at 1 and counts up by 2, then stops before 6. np.arange(4, 9, 1) is an array with elements 4, 5, 6, 7, and 8. (It doesn’t contain 9 because np.arange stops before the stop value is reached.)

By default, np.linspace includes both end points, while np.arange never includes the second end point b. For this reason, especially when we are plotting ranges of values, we tend to prefer np.linspace.

Notice how the following two statements have different parameters but return the same result.

np.arange(-5, 6, 1.0)
array([-5., -4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.,  5.])
np.linspace(-5, 5, 11)
array([-5., -4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.,  5.])
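
The endpoint difference is easiest to see when both functions are given the same interval. A minimal sketch on the interval from 0 to 1 (the variable names are illustrative):

```python
import numpy as np

# arange: fixed step of 0.25, stops before reaching 1
stepped = np.arange(0, 1, 0.25)
print(stepped)   # [0.   0.25 0.5  0.75]

# linspace: 5 equally spaced points, including both 0 and 1
spaced = np.linspace(0, 1, 5)
print(spaced)    # [0.   0.25 0.5  0.75 1.  ]
```

With np.linspace you choose the number of points and the step follows; with np.arange you choose the step and the number of points follows.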

Matrices

Arrays are not limited to one dimension (i.e. vectors) but can also be used to define matrices. For example, we define a \(3\times 3\) matrix A and a \(3\times 1\) vector b (the right-hand side of a linear system).

A = np.array([[2, 1, 3],
              [4, 3, 2],
              [1, 2, 2]], dtype=float)
              
b = np.array([34, 46, 26], dtype=float)

Elementwise multiplication can be performed with * as normal and matrix multiplication by @:

# Computations with matrices
print(A*A)  
print(A@A)
print(A@b)
[[ 4.  1.  9.]
 [16.  9.  4.]
 [ 1.  4.  4.]]
[[11. 11. 14.]
 [22. 17. 22.]
 [12. 11. 11.]]
[192. 326. 178.]

As one final example of just some of the functionality of numpy we demonstrate its ability to solve linear systems. Consider the following system of linear equations:

\[ \begin{aligned} 2x + y + 3z & = 34 \\ 4x + 3y + 2z & = 46 \\ x + 2y + 2z & = 26. \end{aligned} \]

To find solutions \(x\), \(y\) and \(z\) we can use the np.linalg.solve function:

A = np.array([[2, 1, 3],
              [4, 3, 2],
              [1, 2, 2]], dtype=float)

b = np.array([34, 46, 26], dtype=float)

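# Solve the linear system A x = b
x = np.linalg.solve(A, b)
print(f"x: {x[0]:.1f}, y: {x[1]:.1f}, z: {x[2]:.1f}")

# Verify the solution by substituting it back into the system
print(f"Verification passed: {np.allclose(A @ x, b)}")

# Alternative: multiply b by the inverse of A (generally slower and less accurate than solve)
x_inv = np.linalg.inv(A) @ b
print(f"Via inverse — x: {x_inv[0]:.1f}, y: {x_inv[1]:.1f}, z: {x_inv[2]:.1f}")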
x: 5.4, y: 3.8, z: 6.5
Verification passed: True
Via inverse — x: 5.4, y: 3.8, z: 6.5

Parting Message

This note only scratches the surface of what we can do with Python, but thankfully you never need to remember everything! Part of being a data scientist is becoming efficient at reading and understanding new code quickly. Do not prioritize memorizing code; prioritize writing code and, more importantly, using code to solve problems. Programming is a tool you sharpen through application!
