PY-L4-26 · Pandas — Selecting Rows & Columns

Learning Goals

3 min

Pick one or many columns with df["col"] and df[["a", "b"]].
Pick rows by label with df.loc[row]; ranges with df.loc[a:b] (inclusive!).
Pick rows by integer position with df.iloc[i] and df.iloc[a:b] (exclusive).
Combine row + column picks in one expression.

Warm-Up · Series vs DataFrame

5 min

import pandas as pd
df = pd.read_csv("students.csv")

print(type(df["name"]))       # → <class 'pandas.core.series.Series'>
print(type(df[["name"]]))     # → <class 'pandas.core.frame.DataFrame'>

Single brackets give back a 1-D Series. Double brackets (a list of columns) give back a DataFrame — even if the list has one element. Two different types; pick the one your next step expects.

Today's big idea

loc = labels, iloc = integer positions. Memorise the prefix and you stop confusing the two. loc is also the one to use when assigning: it is the fix for pandas's "chained-assignment" warning, not the cause of it.
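The difference only becomes visible when the index labels are not 0, 1, 2. A minimal sketch, assuming a made-up three-row frame (the names, scores and index labels 10, 20, 30 are illustrative, not from students.csv):

```python
import pandas as pd

# Hypothetical frame whose index labels are 10, 20, 30 rather than 0, 1, 2
df = pd.DataFrame(
    {"name": ["Aisyah", "Ben", "Chen"], "score": [91, 78, 85]},
    index=[10, 20, 30],
)

by_label = df.loc[10]      # the row whose LABEL is 10
by_position = df.iloc[0]   # the row in POSITION 0 (the same row here)
print(by_label["name"], by_position["name"])

# No row is labelled 0, so a label lookup for 0 fails
try:
    df.loc[0]
    found_label_zero = True
except KeyError:
    found_label_zero = False
print(found_label_zero)
```

On a freshly read CSV the labels and positions happen to match, which is why the two accessors look interchangeable until the index changes.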

New Concept · Columns, .loc, .iloc

14 min

Columns

df["name"]              # one column → Series
df[["name", "score"]]   # two columns → DataFrame

.loc — by label

df.loc[0]                  # row with label 0 (Series)
df.loc[0:2]                # rows 0, 1, 2 — INCLUSIVE
df.loc[0, "score"]         # one cell
df.loc[:, "score"]         # every row, score column
df.loc[0:2, ["name", "score"]]   # block

# loc accepts boolean masks too (next lesson)
df.loc[df["score"] > 80]

.iloc — by integer position

df.iloc[0]                 # first row
df.iloc[0:2]               # rows 0 and 1 — EXCLUSIVE (like normal Python slicing)
df.iloc[-1]                # last row
df.iloc[0, 2]              # row 0, column 2
df.iloc[:, 1:3]            # every row, columns 1-2
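To see the inclusive and exclusive ends side by side, here is a small sketch on a made-up four-row frame (the data is an assumption for illustration). The same 0:2 slice returns a different number of rows depending on the accessor:

```python
import pandas as pd

# Hypothetical data with the default 0, 1, 2, 3 index
df = pd.DataFrame({
    "name": ["Aisyah", "Ben", "Chen", "Devi"],
    "score": [91, 78, 85, 66],
})

label_slice = df.loc[0:2]       # labels 0, 1 and 2: end label included
position_slice = df.iloc[0:2]   # positions 0 and 1: end position excluded

print(len(label_slice), len(position_slice))
```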

Set the index for clean .loc work

By default the index is 0, 1, 2, and so on. Calling .set_index swaps in a meaningful column and returns a new DataFrame:

students = df.set_index("name")
print(students.loc["Aisyah"])
print(students.loc["Aisyah", "score"])
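A self-contained version of the same idea, assuming a small made-up frame in place of students.csv. It also shows that set_index leaves the original frame alone and moves the chosen column out of the regular columns:

```python
import pandas as pd

# Hypothetical stand-in for students.csv
df = pd.DataFrame({
    "name": ["Aisyah", "Ben", "Chen"],
    "score": [91, 78, 85],
})

students = df.set_index("name")   # new frame; df itself is unchanged

row = students.loc["Aisyah"]             # one row, as a Series
cell = students.loc["Aisyah", "score"]   # one cell, as a scalar

print(cell)
print(list(df.columns))         # name is still a column in df
print(list(students.columns))   # name is now the index, not a column
```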

Edit safely — always via .loc

# ✗ Bad: pandas may warn about chained assignment, and df can be left unchanged
df[df["name"] == "Aisyah"]["score"] = 100

# ✓ Good
df.loc[df["name"] == "Aisyah", "score"] = 100

If you only remember one rule from this lesson: when assigning, use .loc.
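The sketch below makes the problem concrete on a made-up frame. It assumes default pandas settings; the warning is silenced only to keep the demo output clean. With a boolean mask, the chained form writes to a temporary copy, so the original value survives:

```python
import warnings
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "name": ["Aisyah", "Ben", "Chen"],
    "score": [91, 78, 85],
})
mask = df["name"] == "Aisyah"

# Chained form: df[mask] is a copy, so the assignment never reaches df
with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # hide the chained-assignment warning here only
    df[mask]["score"] = 100
after_chained = df.loc[mask, "score"].iloc[0]

# Single .loc call: rows and column chosen together, df is updated
df.loc[mask, "score"] = 100
after_loc = df.loc[mask, "score"].iloc[0]

print(after_chained, after_loc)
```

In real code, leave the warning switched on; it is the signal that an edit may have gone nowhere.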

Worked Example · Slice a Bigger Frame

12 min

import pandas as pd

df = pd.read_csv("clean.csv", parse_dates=["date"])

# 1. The first three rows, just two columns
print(df.loc[:2, ["product", "quantity"]])

# 2. Every row of one column as a Series
total_qty = df["quantity"].sum()
print(f"total quantity: {total_qty}")

# 3. By position — middle row, every column
print(df.iloc[len(df) // 2])

# 4. Set a meaningful index, then look someone up by name
by_customer = df.set_index("customer")
print(by_customer.loc["Ahmad"])

# 5. Multi-row + column block via .loc with a label-based slice
#    (assumes each customer label appears once; repeated, unsorted labels can raise KeyError)
print(by_customer.loc["Ahmad":"Mei", ["product", "price"]])

Read aloud

1. "first three rows, product and quantity columns"
2. "every quantity, then sum"
3. "the middle row's full record"
4. "the row labelled Ahmad"
5. "rows from Ahmad through Mei, product and price"

Read the diff

Five steps, five distinct slices. Notice how loc is inclusive at both ends: on the default index, :2 gives rows 0, 1, AND 2, and "Ahmad":"Mei" includes Mei's row. That trips up everyone once.
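A label slice on a text index follows the order of the rows in the frame, not alphabetical order. A minimal sketch, assuming a made-up orders frame in which every customer appears exactly once (the rows and prices are illustrative, not taken from clean.csv):

```python
import pandas as pd

# Hypothetical orders; each customer label is unique
orders = pd.DataFrame({
    "customer": ["Siti", "Ahmad", "Raj", "Mei", "Lina"],
    "product": ["Teh", "Nasi", "Roti", "Mee", "Kopi"],
    "price": [2.0, 8.5, 3.0, 7.0, 2.5],
})
by_customer = orders.set_index("customer")

# Everything from the Ahmad row through the Mei row, both ends included
block = by_customer.loc["Ahmad":"Mei", ["product", "price"]]
print(list(block.index))
```

Raj is included because his row sits between Ahmad's and Mei's in the frame, even though "Raj" sorts after "Mei" alphabetically.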

Try It Yourself

13 min

01 🟢 First column, last 3 rows

Return the last 3 rows of the first column in one expression.

Hint

df.iloc[-3:, 0]

02 🟡 Two columns, two named rows

Set the index to the customer column. Print Ahmad's and Mei's product and price.

Hint

bc = df.set_index("customer")
print(bc.loc[["Ahmad", "Mei"], ["product", "price"]])

03 🔴 Safe assignment

Add a new column tax equal to 6% of price. Then bump every row where product == "Nasi" to a tax rate of 8% — using .loc properly.

Hint

df["tax"] = df["price"] * 0.06
df.loc[df["product"] == "Nasi", "tax"] = df.loc[df["product"] == "Nasi", "price"] * 0.08
print(df[["product", "price", "tax"]])

Mini-Challenge · Re-Order the Frame

8 min

Build a one-line expression that returns the DataFrame with columns in a custom order: customer, product, quantity, price, total, date, order_id. (Compute total first if it doesn't exist.)

Show one possible solution

df["total"] = df["quantity"] * df["price"]
df = df[["customer", "product", "quantity", "price", "total", "date", "order_id"]]

Selecting with a column-name list returns a new DataFrame in that order. The simplest re-ordering trick in pandas.
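The same trick on a tiny made-up frame, so it runs without a CSV. The column names are a subset of the ones above and the values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical two-row orders frame
df = pd.DataFrame({
    "order_id": [1, 2],
    "customer": ["Ahmad", "Mei"],
    "quantity": [2, 3],
    "price": [8.5, 7.0],
})

df["total"] = df["quantity"] * df["price"]   # compute total first

# A list of column names selects AND orders in one step
reordered = df[["customer", "quantity", "price", "total", "order_id"]]
print(list(reordered.columns))
```

Assigning the result to a new name keeps the original column order available in df, which can help while debugging.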

Recap

3 min

Three notations, three roles. df["col"] picks columns. df.loc picks by label (inclusive). df.iloc picks by integer position (exclusive). Always assign via .loc. Tomorrow we filter rows with boolean masks.

Homework

4 min

Take your real CSV from yesterday. Produce five slices:

Two columns only.
Every row, but in reverse column order.
The first 10 rows of a specific column.
A single cell by label.
A single cell by integer position.

Write the line and the one-sentence English meaning side by side. Submit a markdown file.

df[["customer", "product"]]            # 2 cols
df.iloc[:, ::-1]                       # columns reversed
df.loc[:9, "price"]                    # first 10 rows of price (labels 0-9 on the default index)
df.set_index("customer").loc["Mei", "product"]   # cell by label
df.iat[2, 3]                           # cell by int position (faster than iloc[r,c])
