print("Hello" + " " + "world!")
Hello world!
python programming language including variables, data types, and basic operations.
John Inston
April 2, 2026
April 3, 2026
python syntax and programming concepts.
numpy arrays.

Now that we have finished our setup, we are ready to begin coding in Python. We will start with the basics, including variable assignment, data types and basic operations.

print() Function

As with every introduction to Python programming, we start with the print() function, which displays its arguments as output:
A variable is a named container for storing a value. In Python, we create one by assigning a value with =. Here we create three variables called name, age and height:
Alice is 25 years old and 5.6 feet tall.
Variable names should be lowercase with underscores for spaces (e.g. my_variable) and cannot start with a number (same as R).
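The cell that produced the output above is not reproduced in these notes. A minimal sketch consistent with it, assuming the three values are assigned directly and printed with a single print() call, might look like this:

```python
# Assumed values, chosen to match the output shown above
name = "Alice"   # a string (str)
age = 25         # a whole number (int)
height = 5.6     # a decimal number (float)

# print() separates its arguments with a single space by default
print(name, "is", age, "years old and", height, "feet tall.")
```

All three names follow the lowercase naming convention described above.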
Python has several built-in data types. The most common are:
| Type | Example | Description |
|---|---|---|
| int | 42 | Whole numbers |
| float | 3.14 | Decimal numbers |
| str | "hello" | Text (string) |
| bool | True/False | Boolean (true/false) |
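The type of any value can be checked with the built-in type() function. The sketch below uses the example values from the table (an assumption about the original cell) and prints the four class names listed next:

```python
# Example values taken from the table of data types
print(type(42))       # <class 'int'>
print(type(3.14))     # <class 'float'>
print(type("hello"))  # <class 'str'>
print(type(True))     # <class 'bool'>
```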
<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
You can convert between types (explored further in our lab on data preparation):
7.0
7
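The two outputs above are consistent with a conversion in each direction. The input values in this sketch are assumptions, since the original cell is not shown:

```python
x = float(7)   # int -> float
y = int(7.9)   # float -> int: truncates toward zero, it does not round

print(x)  # 7.0
print(y)  # 7
```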
Strings are sequences of characters. Python (unlike R) has a lot of built-in functionality for handling strings:
14
HELLO, PYTHON!
hello, python!
Hello, World!
H
Hello
Note that in Python some functions are attached directly to a value or variable and called using dot notation: variable.method(). These are called methods: functions that belong to a specific data type. For example, upper() is a method of the string s. This is our first look at object-oriented programming, something that we will explore as we advance in Python.
The cleanest way to embed variables inside strings is to use f-strings (formatted string literals), written by prefixing the string with f and placing variables inside curly braces:
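As an illustration of the string outputs listed above, here is a minimal sketch. The string s is an assumption inferred from those outputs (it has 14 characters), not the original cell:

```python
s = "Hello, Python!"  # assumed string, inferred from the outputs above

print(len(s))                        # 14 (len is a function, not a method)
print(s.upper())                     # HELLO, PYTHON!
print(s.lower())                     # hello, python!
print(s.replace("Python", "World"))  # Hello, World!
print(s[0])                          # H (first character)
print(s[:5])                         # Hello (first five characters)
```

String methods return new strings, so s itself is unchanged after these calls.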
My name is Alice and I am 25 years old.
This is particularly helpful when combined with looping, for example to efficiently generate labels for multiple plots.
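A short sketch of both uses follows. The variable values are assumed to match the sentence printed above, and the plot labels are purely hypothetical:

```python
name = "Alice"
age = 25

# Expressions inside {} are evaluated and inserted into the string
sentence = f"My name is {name} and I am {age} years old."
print(sentence)

# Building hypothetical plot labels in a loop
labels = []
for i in range(1, 4):
    labels.append(f"Plot {i}")
print(labels)  # ['Plot 1', 'Plot 2', 'Plot 3']
```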
A list is an ordered, changeable collection of items. Lists can hold any data type, even mixed types (in this way they differ from vectors in R), and as such are very flexible for data storage. They are created using square brackets [...] with items separated by commas:
fruits = ["apple", "banana", "cherry"]
print(fruits[0]) # First item: "apple"
print(fruits[-1]) # Last item: "cherry"
print(len(fruits)) # Number of items: 3
# Modifying a list
fruits.append("mango") # Add to end
fruits.insert(1, "grape") # Insert at index 1
fruits.remove("banana") # Remove by value
print(fruits)

apple
cherry
3
['apple', 'grape', 'cherry', 'mango']
To index items from a list numerically we again use square brackets (not curly braces { }), as in fruits[0] above. Note that (unlike R) Python indices start from 0. We can index from the end of the list using negative indices, as in fruits[-1] above.
We can slice a list, which means returning a contiguous range of elements (for example, everything before or after a certain position), by using a colon (:) inside the square brackets:
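A minimal sketch of slicing, reusing the final state of the fruits list from the example above:

```python
fruits = ["apple", "grape", "cherry", "mango"]

print(fruits[1:3])  # ['grape', 'cherry']  (index 1 up to, but not including, 3)
print(fruits[:2])   # ['apple', 'grape']   (everything before index 2)
print(fruits[2:])   # ['cherry', 'mango']  (everything from index 2 onwards)
print(fruits[-2:])  # ['cherry', 'mango']  (the last two items)
```

A slice returns a new list and leaves the original untouched.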
A dictionary stores data as key-value pairs, like a real dictionary where you look up a word (key) to find its definition (value). Dictionaries are defined with curly braces {...}; here the keys are strings:
Alice
25
{'name': 'Alice', 'age': 26, 'city': 'New York', 'email': 'alice@example.com'}
True
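The four outputs above are consistent with the following sketch. The variable name person and the order of operations are assumptions, as the original cell is not shown:

```python
person = {"name": "Alice", "age": 25, "city": "New York"}

print(person["name"])  # Alice (look up a value by its key)
print(person["age"])   # 25

person["age"] = 26                     # update an existing key
person["email"] = "alice@example.com"  # add a new key-value pair
print(person)

print("name" in person)  # True (the in operator checks keys)
```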
Python supports all standard arithmetic operations:
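A brief sketch of the arithmetic operators, using arbitrary illustrative values:

```python
a = 17
b = 5

print(a + b)   # 22   addition
print(a - b)   # 12   subtraction
print(a * b)   # 85   multiplication
print(a / b)   # 3.4  true division (always returns a float)
print(a // b)  # 3    floor division
print(a % b)   # 2    remainder (modulo)
print(a ** 2)  # 289  exponentiation
```

Note that exponentiation uses ** in Python; the ^ symbol is a different (bitwise) operator.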
Branching is a fundamental concept in programming that allows you to control the flow of your program based on certain conditions. In Python, you can use if, elif, and else statements to create conditional logic:
It's a nice day.
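A minimal sketch of an if/elif/else chain that would print the message above. The variable temperature and the thresholds are assumptions for illustration:

```python
temperature = 22  # assumed value, in degrees Celsius

if temperature > 30:
    message = "It's hot outside."
elif temperature > 15:
    message = "It's a nice day."
else:
    message = "It's cold."

print(message)  # It's a nice day.
```

Only the first branch whose condition is True runs; the indentation defines which lines belong to each branch.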
Note that, unlike in R, the condition in a Python branching statement needs no parentheses and the body is not wrapped in braces; indentation marks the block instead, which gives a cleaner and more intuitive syntax. The branching conditions are logical statements with a single True/False output. Python uses the standard programming syntax for comparison operators:
| Operator | Meaning |
|---|---|
| == | Equal to |
| != | Not equal to |
| > | Greater than |
| < | Less than |
| >= | Greater than or equal to |
| <= | Less than or equal to |
We note here another divergence from R: Python uses the keywords not, and and or rather than symbols such as !, & and |. As a result, Python typically reads more like plain English.
No umbrella needed!
The list is empty.
Welcome, regular user!
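The three messages above could come from conditions like the following. The variable names and values are assumptions chosen to reproduce those messages:

```python
is_raining = False
if not is_raining:
    print("No umbrella needed!")

items = []
if not items:  # an empty list counts as False in a condition
    print("The list is empty.")

is_logged_in = True
is_admin = False
if is_logged_in and not is_admin:
    print("Welcome, regular user!")
```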
Here are a few rules of thumb for when to use not:
* not x instead of x == False.
* not items instead of len(items) == 0 to check for empty objects.
* not x in my_list, or better x not in my_list, to check for absence.

A for loop iterates over a sequence. Similar to branching, we see that Python looping syntax has no parentheses or braces:
Use range() to loop a specific number of times:
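A short sketch of both forms, looping over a list and looping with range(); the list contents are illustrative:

```python
fruits = ["apple", "banana", "cherry"]

# Loop directly over the items of a list
for fruit in fruits:
    print(fruit)

# range(5) yields 0, 1, 2, 3, 4 (the stop value is excluded)
total = 0
for i in range(5):
    total += i
print(total)  # 10
```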
A while loop repeats as long as a condition is true:
Count is 0
Count is 1
Count is 2
Count is 3
Count is 4
Done!
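A loop of the following shape produces exactly the output above (a sketch; the variable name count is inferred from the printed text):

```python
count = 0
while count < 5:
    print(f"Count is {count}")
    count += 1  # without this update the loop would never end

print("Done!")
```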
Example: Write a for loop that runs through all of the integers from 1 to 100 and appends each number to one or both of two lists:
1. nine_multiples if the number is divisible by 9.
2. sevens_list if the number contains a 7.
A function is a reusable block of code. You define it once and call it as many times as you like. The syntax is very simple:
Functions can send a value back using return:
You can set default values for parameters:
Hello, Alice!
Good morning, Bob!
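The points above (definition, return, and default values) can be sketched in a single function. The name greet and its parameters are assumptions chosen to reproduce the two greetings shown:

```python
def greet(name, greeting="Hello"):
    """Return a greeting; greeting falls back to "Hello" when omitted."""
    return f"{greeting}, {name}!"

print(greet("Alice"))                # Hello, Alice!
print(greet("Bob", "Good morning"))  # Good morning, Bob!
```

Parameters with default values must come after those without defaults in the function signature.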
Example: Define a function called compute_radius() that computes the radius r of a sphere for some specified volume V. Recall that the volume \(V\) of a sphere of radius \(r\) is given by \[
V = \frac{4}{3}\pi r^3.
\]
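Rearranging the formula for \(r\) gives \(r = \left(\frac{3V}{4\pi}\right)^{1/3}\). One possible sketch of the function, assuming \(V \geq 0\) and using math.pi from the standard library, is:

```python
import math

def compute_radius(V):
    """Radius of a sphere with volume V (assumes V >= 0)."""
    return (3 * V / (4 * math.pi)) ** (1 / 3)

# Check against the unit sphere, whose volume is 4/3 * pi
r = compute_radius(4 / 3 * math.pi)
print(round(r, 6))  # 1.0
```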
Example: Write a function summation that evaluates the following summation for \(n \geq 1\): \[\sum_{i=1}^{n} \left(i^3 + 5i\right)\]
In this course, we will be using common Python libraries to help us process data. By convention, we import all libraries at the very top of the notebook. There is also a set of standard aliases that are used to shorten the library names. Below are some of the libraries that you may encounter throughout the course, along with their respective aliases.
A useful magic command is %%time, which times the execution of that cell. You can use this by writing it as the first line of a cell. (Note that %% is used for cell magic commands that apply to the entire cell, whereas % is used for line magic commands that only apply to a single line. If you are interested, you can read more about the magic commands in this Tutorials Point article).
CPU times: user 14 μs, sys: 0 ns, total: 14 μs
Wall time: 19.1 μs
Example: Complete summation(n) in the next cell (same formula as above), then use it for the values of n below.
Use your function to compute the sum for…
In Python, normally you can fill a list with elements using a for loop as seen in the example below.
[4, 64, 144, 324, 484, 784, 1024, 1444, 1764, 2304, 2704, 3364, 3844, 4624, 5184, 6084, 6724, 7744, 8464, 9604]
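The loop that generated this list is not shown. The values are the squares of the integers below 100 whose square ends in 4, so one loop that reproduces them (an inferred rule and a hypothetical variable name) is:

```python
squares = []  # hypothetical name
for i in range(100):
    if i**2 % 10 == 4:  # keep only squares whose last digit is 4
        squares.append(i**2)

print(squares)
```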
Alternatively, you can create this same list in a single line of code by moving the for-loop and if-statement inside the square brackets that create the list. This is called a list comprehension.
The syntax for a list comprehension is this: [value for-loop condition]
* value is the value you want to put into the list
* for-loop is the for-loop that iterates through a list or a range
* condition is the if-statement that determines if the value is allowed to be inserted into the list
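As a sketch of this syntax, the list printed earlier (the squares of the integers below 100 whose square ends in 4, a rule inferred from the output) can be written as a single comprehension:

```python
# [value          for-loop            condition       ]
squares = [i**2 for i in range(100) if i**2 % 10 == 4]

print(squares[:5])   # [4, 64, 144, 324, 484]
print(len(squares))  # 20
```

The condition is optional: [i**2 for i in range(100)] would keep every square.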
For more information, you can read this beginner’s tutorial on list comprehensions
zip()

If you need to line up two or more lists, Python has a built-in function called zip(). This allows you to loop through more than one list at a time, so that at each step you get the values that share the same index across the lists. This can also be used within for-loops to make them even more powerful.
2
6
12
20
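The four numbers above are consistent with multiplying paired elements of two lists. The lists in this sketch are assumptions chosen to reproduce them:

```python
xs = [1, 2, 3, 4]
ys = [2, 3, 4, 5]

# zip pairs up elements that share the same index
for x, y in zip(xs, ys):
    print(x * y)

# The same pairing inside a list comprehension
products = [x * y for x, y in zip(xs, ys)]
print(products)  # [2, 6, 12, 20]
```

If the lists have different lengths, zip() stops at the end of the shortest one.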
Example: Write a function list_sum that computes the square of each value in list_1, the cube of each value in list_2, and returns a list containing the element-wise sum of these results. Assume that list_1 and list_2 have the same number of elements. Try to use a list comprehension to write it all on one line. I have started the function for you below:
To test this function we define two lists and call the function:
numpy Package

NumPy (pronounced “NUM-pie”) is the numerical computing module, which we will be using a lot in this course. Here’s a quick recap of NumPy. For more review, read the following materials.
The core of NumPy is the array. Like Python lists, arrays store data; however, they store data in a more efficient manner. In many cases, this allows for faster computation and data manipulation.
Let’s use np.array to create an array. It takes a sequence, such as a list or range (remember that list elements are included between the square brackets [ and ]).
Example: Create an array arr containing the values 1, 2, 3, 4, and 5 (in that order).
In addition to the values in the array, we can access attributes such as the array’s shape and data type. A full list of attributes can be found here.
NumPy arrays are integer-indexed by position, with the first element indexed as position 0. Elements can be retrieved by enclosing the desired positions in brackets [].
To retrieve consecutive positions, specify the starting index and the ending index separated by : (e.g., arr[from:to]). This syntax is non-inclusive of the right endpoint: the element at the starting index is included in the output, but the element at the ending index is not.
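A minimal sketch of creating, indexing and slicing an array, using the values 1 to 5 as illustrative data:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr[0])    # 1 (first element, position 0)
print(arr[-1])   # 5 (last element)
print(arr[1:4])  # [2 3 4] (positions 1, 2 and 3; position 4 is excluded)

# np.array also accepts other sequences, such as a range
print(np.array(range(3)))  # [0 1 2]
```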
NumPy arrays have several attributes that can be retrieved by name using syntax of the form arr.attr. Some useful attributes are:
* .shape, a tuple with the length of each array dimension
* .size, the total number of elements in the array
* .dtype, the data type of the entries (float, integer, etc.)

Arrays, unlike Python lists, cannot store items of different data types; mixed inputs are converted to a single common type:
array(['1', '3'], dtype='<U21')
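A short sketch of these attributes and of the type conversion seen in the output above. The arrays are illustrative, and the exact integer dtype printed depends on the platform:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.shape)  # (2, 3): two rows, three columns
print(grid.size)   # 6
print(grid.dtype)  # an integer type such as int64 (platform dependent)

# Mixing an integer and a string: everything is converted to strings
mixed = np.array([1, "3"])
print(mixed.dtype.kind)  # U (a Unicode string dtype)
```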
Arrays are also useful in performing vectorized operations. Given two or more arrays of equal length, arithmetic will perform element-wise computations across the arrays.
For example, observe the following:
array([5, 7, 9])
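The result above is what element-wise addition of two three-element arrays would give. The arrays in this sketch are assumptions chosen to reproduce it:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)   # [5 7 9]
print(a * b)   # [ 4 10 18]
print(a ** 2)  # [1 4 9]
```

No explicit loop is needed: each operator is applied to every pair of elements at once.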
Example: Given the array random_arr, assign valid_values to an array containing all values \(x\) of random_arr such that \(2x^4 > 1\).
Example: Recreate the list_sum function using numpy arrays, calling your new function array_sum.
You might have been told that Python is slow, but array arithmetic is carried out very fast, even for large arrays.
For ten numbers, list_sum and array_sum both take a similar amount of time.
CPU times: user 7 μs, sys: 0 ns, total: 7 μs
Wall time: 9.06 μs
[0, 2, 12, 36, 80, 150, 252, 392, 576, 810]
CPU times: user 25 μs, sys: 20 μs, total: 45 μs
Wall time: 48.9 μs
array([ 0, 2, 12, 36, 80, 150, 252, 392, 576, 810])
The time difference seems negligible for a list/array of size 10; depending on your setup, you may even observe that list_sum executes faster than array_sum! However, we will commonly be working with much larger datasets:
CPU times: user 20.6 ms, sys: 3.61 ms, total: 24.2 ms
Wall time: 24.1 ms
CPU times: user 32.9 ms, sys: 952 μs, total: 33.8 ms
Wall time: 33.7 ms
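With larger datasets, vectorized NumPy code typically runs much faster than the equivalent Python loop, although the exact speed-up depends on your setup and on what the timed cell includes (converting a list to an array inside the timed cell, for example, adds overhead). Throughout this course (and in the real world), you will find that writing efficient code is important; arrays and vectorized operations are the most common way of making Python programs run quickly.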
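Outside a notebook, where %%time is not available, a comparison of this kind can be sketched with time.perf_counter. The two function bodies below are hypothetical implementations (they are set as exercises earlier), and the input size is an assumption:

```python
import time
import numpy as np

def list_sum(list_1, list_2):
    """Hypothetical pure-Python version: squares plus cubes, element by element."""
    return [x**2 + y**3 for x, y in zip(list_1, list_2)]

def array_sum(arr_1, arr_2):
    """Hypothetical vectorized version operating on NumPy arrays."""
    return arr_1**2 + arr_2**3

def elapsed(func, *args):
    """Wall-clock seconds taken by a single call of func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start

n = 100_000  # assumed size; the size used in the original cells is not shown
values = list(range(n))
arr = np.arange(n, dtype=np.int64)  # int64 so the cubes (up to about 1e15) do not overflow

print(f"list_sum:  {elapsed(list_sum, values, values):.4f} s")
print(f"array_sum: {elapsed(array_sum, arr, arr):.4f} s")
```

The arrays are built before timing starts, so only the arithmetic is measured. A single call gives a rough figure only; repeated runs give more stable timings.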
np.arange and np.linspace

Usually we use np.arange to return an array that steps from a to b with a fixed step size s. While this is fine in some cases, we sometimes prefer to use np.linspace(a, b, N), which divides the interval [a, b] into N equally spaced points.
np.arange(start, stop, step) produces an array with all the numbers starting at start, incremented up by step, stopping before stop is reached. For example, the value of np.arange(1, 6, 2) is an array with elements 1, 3, and 5 – it starts at 1 and counts up by 2, then stops before 6. np.arange(4, 9, 1) is an array with elements 4, 5, 6, 7, and 8. (It doesn’t contain 9 because np.arange stops before the stop value is reached.)
np.linspace always includes both end points while np.arange will not include the second end point b. For this reason, especially when we are plotting ranges of values we tend to prefer np.linspace.
Notice how the following two statements have different parameters but return the same result.
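The original pair of statements is not shown; one illustrative pair with this property (the endpoints are assumptions) is:

```python
import numpy as np

a = np.arange(-5, 6, 1.0)   # start, stop (excluded), step
b = np.linspace(-5, 5, 11)  # start, stop (included), number of points

print(a)
print(b)
print(np.allclose(a, b))  # True
```

With np.arange the stop value must be set past the last point wanted, whereas np.linspace takes the last point itself plus a count.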
Arrays are not limited to one dimension (i.e. vectors) but can also be used to define matrices. For example, we define a \(3\times 3\) matrix A and a \(3\times 1\) vector b (the right-hand side of a linear system).
Elementwise multiplication can be performed with * as normal and matrix multiplication by @:
[[ 4. 1. 9.]
[16. 9. 4.]
[ 1. 4. 4.]]
[[11. 11. 14.]
[22. 17. 22.]
[12. 11. 11.]]
[192. 326. 178.]
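The three results above are consistent with A * A, A @ A and A @ b for the matrix and vector below. Their entries are inferred from those outputs, so treat this as a sketch rather than the original cell:

```python
import numpy as np

A = np.array([[2, 1, 3],
              [4, 3, 2],
              [1, 2, 2]], dtype=float)
b = np.array([34, 46, 26], dtype=float)

print(A * A)  # element-wise: each entry is squared
print(A @ A)  # matrix product of A with itself
print(A @ b)  # matrix-vector product
```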
As one final example of just some of the functionality of numpy we demonstrate its ability to solve linear systems. Consider the following system of linear equations:
\[ \begin{aligned} 2x + y + 3z & = 34 \\ 4x + 3y + 2z & = 46 \\ x + 2y + 2z & = 26. \end{aligned} \]
To find solutions \(x\), \(y\) and \(z\) we can use the np.linalg.solve function:
import numpy as np

A = np.array([[2, 1, 3],
              [4, 3, 2],
              [1, 2, 2]], dtype=float)
b = np.array([34, 46, 26], dtype=float)

# Solve the linear system A x = b
x = np.linalg.solve(A, b)
print(f"x: {x[0]:.1f}, y: {x[1]:.1f}, z: {x[2]:.1f}")

# Verify the solution by substituting it back into the system
print(f"Verification passed: {np.allclose(A @ x, b)}")

# Alternative: multiply b by the inverse of A (solve is generally preferred)
x_inv = np.linalg.inv(A) @ b
print(f"Via inverse — x: {x_inv[0]:.1f}, y: {x_inv[1]:.1f}, z: {x_inv[2]:.1f}")

x: 5.4, y: 3.8, z: 6.5
Verification passed: True
Via inverse — x: 5.4, y: 3.8, z: 6.5
This note only scratches the surface of what we can do with Python but, thankfully, you never need to remember everything! Part of being a data scientist is becoming efficient at reading and understanding new code quickly. Never prioritize memorizing code; prioritize writing code and, more importantly, using code to solve problems. Programming is a tool you sharpen through application!