Learning Goals 5 min
Before we use ML on a Nano 33 BLE Sense, we need the vocabulary: just enough to discuss training, classification, neural networks, and overfitting, without scary maths. By the end of this lesson you will:
- Define feature, label, dataset, training, inference, model.
- Describe a simple neural network (input layer → hidden layer → output layer) with one analogy each.
- Explain three common ML failure modes: underfitting, overfitting, training-on-bad-data.
Warm-Up 10 min
No hardware. Just paper.
The classic ML problem
Imagine a basket of fruit. You want a machine to sort "apple" from "banana". You measure 50 examples (length, width, weight), each labelled apple or banana, and feed them to a learning algorithm. It builds a model that, given new measurements, predicts the label.
That's machine learning in one paragraph. Let's unpack each piece.
New Concept · The vocabulary 25 min
Five terms
| Term | Meaning | Fruit example |
|---|---|---|
| Feature | A measurable property of one example | Length, width, weight |
| Label | The answer for one example | "apple" or "banana" |
| Dataset | A collection of (features, label) pairs | 50 measured fruits |
| Training | The process of learning a model from a dataset | Algorithm finds rules separating apples from bananas |
| Inference | Using a trained model to predict on new data | New fruit: length 18 cm, width 3 cm, weight 110 g → predicts banana |
The simplest model: a rule
Even before neural networks, classification can be a simple rule:
- If length / width > 2 → banana.
- Else → apple.
Suppose this works about 95% of the time. Some apples are oblong and some bananas are short, so accuracy isn't 100%. ML algorithms can find better rules, especially when features interact in non-obvious ways.
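The rule above fits in a few lines of Python. This is only a sketch: the function name `classify_by_ratio` and the sample measurements are made up for illustration.

```python
def classify_by_ratio(length_cm, width_cm, threshold=2.0):
    """Return "banana" if the fruit is more than `threshold` times as long as it is wide."""
    if length_cm / width_cm > threshold:
        return "banana"
    return "apple"


# Hypothetical measurements (length, width) in cm.
print(classify_by_ratio(18, 3))  # long and thin
print(classify_by_ratio(7, 7))   # roughly round
```

Nothing is learned here: a human picked the threshold of 2. Training is the step where an algorithm picks such numbers from the data.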
The neural network analogy
Imagine 3 layers of small judges:
- Input layer: takes the features (length, width, weight).
- Hidden layer: each "neuron" looks at all inputs, weights them, decides "does this look like X?" (X might be "long thing" or "light thing" or "round thing" — the network learns what X means).
- Output layer: takes the hidden layer's votes, decides "apple" or "banana".
Training adjusts the weights between layers so the predictions match the labels. Modern deep networks have many hidden layers and millions of weights — but the principle is the same.
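To make the three layers concrete, here is a minimal sketch of one prediction passing through a tiny network in plain Python. The weights are hand-picked assumptions, not trained values, and the helper names (`sigmoid`, `neuron`, `predict`) are invented for this example.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)


# Hand-picked (NOT trained) weights, chosen only to keep the example readable.
HIDDEN = [
    ([0.5, 0.0, 0.0], -6.0),  # responds to "long thing"
    ([0.0, 1.0, 0.0], -5.0),  # responds to "wide thing"
]
OUTPUT = ([4.0, -4.0], 0.0)   # long pushes towards banana, wide towards apple


def predict(length_cm, width_cm, weight_g):
    features = [length_cm, width_cm, weight_g]            # input layer
    hidden = [neuron(features, w, b) for w, b in HIDDEN]  # hidden layer
    banana_score = neuron(hidden, *OUTPUT)                # output layer
    return "banana" if banana_score > 0.5 else "apple"


print(predict(18, 3, 110))
print(predict(7, 7, 140))
```

In a real network, training would find the numbers in `HIDDEN` and `OUTPUT` automatically. Here they were typed in by hand.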
Three failure modes
- Underfitting: model too simple. Can't separate apples from bananas. Solution: more features, bigger model.
- Overfitting: model memorises the training data. Perfect on training fruits, terrible on new ones. Solution: more data, simpler model, regularisation.
- Training-on-bad-data: if every banana in your dataset was yellow and every apple red, the model may learn "colour" instead of shape. A yellow apple then looks like a banana to it. Solution: representative training data.
The training/test split
Take your 50 fruits. Use 40 for training and hold back 10 for testing. Train on the 40, then predict on the 10. The test accuracy is your honest estimate of how good the model is on fruit it has never seen. If training accuracy is 100% but test accuracy is 60%, you've overfit.
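The sketch below shows a 40/10 split and why test accuracy is the number to trust. The 50 measurements are generated by a made-up formula, not real fruit, and the two "models" (`memoriser`, `length_rule`) are illustrative stand-ins for an overfit model and a model that generalises.

```python
import random

# Made-up measurements: ((length_cm, weight_g), label). 25 apples, 25 bananas.
fruits = [((6 + (i % 5) * 0.5, 120 + 2 * i), "apple") for i in range(25)]
fruits += [((17 + (i % 6), 100 + i), "banana") for i in range(25)]

random.Random(0).shuffle(fruits)        # shuffle so the split is not "apples first"
train, test = fruits[:40], fruits[40:]  # 40 for training, 10 held back


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


# Model A: memorises the training set; guesses "apple" for anything unseen.
memory = dict(train)


def memoriser(features):
    return memory.get(features, "apple")


# Model B: a general rule.
def length_rule(features):
    length_cm, _weight_g = features
    return "banana" if length_cm > 10 else "apple"


for name, model in [("memoriser", memoriser), ("length rule", length_rule)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The memoriser scores 100% on the training set by construction, but on the held-back fruit it can only guess. The gap between the two numbers is the signature of overfitting.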
Edge AI workflow recap
From L04-31:
- Collect features (sensor readings) labelled with the right class.
- Train a small model.
- Validate with a held-out test set.
- Convert the model to TensorFlow Lite format so it can run with TensorFlow Lite for Microcontrollers (TFLite Micro).
- Deploy onto Nano 33 BLE Sense.
- Inference at runtime on live sensor data.
Worked Example · Tiny dataset on paper 25 min
Imagine 10 fruits:
| # | Length (cm) | Weight (g) | Label |
|---|---|---|---|
| 1 | 6 | 120 | apple |
| 2 | 7 | 140 | apple |
| 3 | 6.5 | 130 | apple |
| 4 | 8 | 180 | apple |
| 5 | 7.5 | 150 | apple |
| 6 | 20 | 120 | banana |
| 7 | 18 | 110 | banana |
| 8 | 22 | 130 | banana |
| 9 | 17 | 100 | banana |
| 10 | 19 | 105 | banana |
Find a rule by inspection
Length > 10 → banana. Else → apple. 100% on this dataset.
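The rule can be checked against the table in a few lines of Python. The data is copied from the ten rows above; the names `FRUITS` and `length_rule` are just labels chosen for this sketch.

```python
# The ten fruits from the table: (length_cm, weight_g, label).
FRUITS = [
    (6, 120, "apple"), (7, 140, "apple"), (6.5, 130, "apple"),
    (8, 180, "apple"), (7.5, 150, "apple"),
    (20, 120, "banana"), (18, 110, "banana"), (22, 130, "banana"),
    (17, 100, "banana"), (19, 105, "banana"),
]


def length_rule(length_cm):
    return "banana" if length_cm > 10 else "apple"


correct = sum(1 for length, _weight, label in FRUITS if length_rule(length) == label)
accuracy = correct / len(FRUITS)
print(f"{correct}/{len(FRUITS)} correct")  # 10/10 correct
```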
Is weight useful?
Apple weights (120–180 g) and banana weights (100–130 g) overlap between 120 g and 130 g, so no single weight threshold separates the two classes. Weight alone is a worse feature than length. ML algorithms automatically weigh features by usefulness.
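One way to compare features is to search for the best single threshold on each and see how accurate it gets. The sketch below does that for the ten fruits. The helper `best_threshold` is hypothetical, and the direction of each rule (longer means banana, heavier means apple) is an assumption read off the table.

```python
# The ten fruits from the table: (length_cm, weight_g, label).
FRUITS = [
    (6, 120, "apple"), (7, 140, "apple"), (6.5, 130, "apple"),
    (8, 180, "apple"), (7.5, 150, "apple"),
    (20, 120, "banana"), (18, 110, "banana"), (22, 130, "banana"),
    (17, 100, "banana"), (19, 105, "banana"),
]
lengths = [f[0] for f in FRUITS]
weights = [f[1] for f in FRUITS]
labels = [f[2] for f in FRUITS]


def best_threshold(values, labels, above_label, below_label):
    """Try a cut between each pair of neighbouring values; return (accuracy, threshold)."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = (0.0, None)
    for t in candidates:
        predictions = [above_label if v > t else below_label for v in values]
        acc = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        if acc > best[0]:
            best = (acc, t)
    return best


length_acc, length_cut = best_threshold(lengths, labels, "banana", "apple")
weight_acc, weight_cut = best_threshold(weights, labels, "apple", "banana")
print("length:", length_acc, "at", length_cut)
print("weight:", weight_acc, "at", weight_cut)
```

On this table the best length threshold classifies all ten fruits correctly, while the best weight threshold gets only eight of ten, because of the overlap between 120 g and 130 g.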
What if you add "orange"?
An orange might be 7 cm long and weigh 150 g, which makes it indistinguishable from an apple by length and weight. You need a third feature: colour, shape, surface texture. Algorithms can handle tens or hundreds of features.
What if a label is wrong?
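Suppose you accidentally label fruit #6 (the 20 cm one) as "apple". Now the "length > 10" rule disagrees with its label. With enough examples, ML training tends to be robust to a few wrong labels because they are averaged out across the many correct ones. But many wrong labels = bad model, and in a dataset of only 10 fruits a single wrong label is already 10% of the data.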
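The "averaging out" idea can be sketched with a very simple learner that places its length threshold halfway between the mean apple length and the mean banana length. This nearest-mean approach is chosen here only for illustration; it is not necessarily what a real training tool does.

```python
LENGTHS = [6, 7, 6.5, 8, 7.5, 20, 18, 22, 17, 19]  # from the table
TRUE_LABELS = ["apple"] * 5 + ["banana"] * 5


def train_midpoint(lengths, labels):
    """Learn a length threshold halfway between the two class means."""
    apple = [x for x, y in zip(lengths, labels) if y == "apple"]
    banana = [x for x, y in zip(lengths, labels) if y == "banana"]
    return (sum(apple) / len(apple) + sum(banana) / len(banana)) / 2


def accuracy(threshold, lengths, labels):
    predictions = ["banana" if x > threshold else "apple" for x in lengths]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


noisy_labels = list(TRUE_LABELS)
noisy_labels[5] = "apple"  # fruit #6 (20 cm) mislabelled

clean_t = train_midpoint(LENGTHS, TRUE_LABELS)
noisy_t = train_midpoint(LENGTHS, noisy_labels)
print("threshold, clean labels:", round(clean_t, 2))
print("threshold, one bad label:", round(noisy_t, 2))
print("accuracy vs true labels:", accuracy(noisy_t, LENGTHS, TRUE_LABELS))
```

The bad label drags the learned threshold upwards a little, but it still falls in the gap between the longest apple and the shortest banana, so every fruit is still classified correctly against its true label.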
Try a tiny neural network mentally
Hidden neuron: "is length > 12?". Output: "if hidden neuron says yes, banana; else apple". That's a 1-neuron network. Real networks have hundreds, learning combinations like "long AND lightweight = banana; round AND heavy = apple".
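Written as code, the 1-neuron network looks like the sketch below. The weight of 1.0 and bias of -12.0 are hand-picked to encode "is length > 12?", and the function names are illustrative.

```python
def step(x):
    """Fire (1) if the input is positive, otherwise stay silent (0)."""
    return 1 if x > 0 else 0


def hidden_neuron(length_cm, weight=1.0, bias=-12.0):
    # weight * length + bias > 0 is the same question as "is length > 12?"
    return step(weight * length_cm + bias)


def output(hidden):
    return "banana" if hidden == 1 else "apple"


for length in (6, 8, 17, 22):
    print(length, "cm ->", output(hidden_neuron(length)))
```

Changing `bias` moves the threshold, and changing `weight` changes how strongly length counts. Training is the search for good values of both.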
Try It Yourself · Pen-and-paper 15 min
You collect 5 examples of waving and 5 of clapping using an accelerometer. What's the "feature" for each?
Reveal
A sequence of (x, y, z) acceleration values over time. This might be 50 samples × 3 channels = 150 features per example. Or it might be a summary: peak frequency, mean magnitude, etc.
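Both representations can be sketched in Python. The recording below is synthetic (a sine wave standing in for a wave gesture), and the two summary statistics are just examples of what one could compute.

```python
import math

# Fake recording: 50 samples of (x, y, z) acceleration in g, for illustration only.
window = [(0.1 * math.sin(i / 5), 0.05, 1.0) for i in range(50)]

# Option 1: the raw sequence, flattened into one long feature vector.
raw_features = [value for sample in window for value in sample]

# Option 2: a few summary statistics of the acceleration magnitude.
magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
mean_magnitude = sum(magnitudes) / len(magnitudes)
peak_magnitude = max(magnitudes)
summary_features = [mean_magnitude, peak_magnitude]

print(len(raw_features), "raw features")          # 150
print(len(summary_features), "summary features")  # 2
```

Raw features keep all the detail but need more data and a bigger model. Summaries are compact but throw information away.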
You trained a model with 90% accuracy on the training set and 50% on the test set. Underfit or overfit?
Reveal
Overfit: the model memorised the training set and can't generalise to new examples.
You are right-handed and collect 100 examples of "wave hand" using only your right hand. The deployed product fails for left-handed users. What went wrong?
Reveal
Training data not representative. Solution: collect from both hands; even better, include variation in motion paths and speeds.
Why does 99% accuracy not mean the model is good if the "positive" class is rare?
Reveal
If positives are 1 in 100, predicting "always negative" gives 99% accuracy and is useless. Look at precision / recall / F1 score / confusion matrix instead. Class imbalance hides under accuracy.
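The arithmetic is easy to check in Python. This sketch builds 100 examples with a single positive and a "model" that always predicts negative. Treating an undefined precision (no positive predictions at all) as 0.0 is a convention assumed here.

```python
# 100 examples, 1 positive. The "model" always predicts negative (0).
y_true = [1] + [0] * 99
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0  # undefined here; reported as 0.0

print("confusion matrix (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy, "recall:", recall, "precision:", precision)
```

Accuracy comes out at 0.99 while recall is 0: the model never finds the one example you care about.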
Mini-Challenge · Design a dataset 10 min
You want to recognise three gestures with the Nano 33 BLE Sense: wave, punch, clap.
- How many examples per gesture do you need? (50 is a starting point.)
- What variation should each gesture include? (Different speeds? Both hands? Multiple people?)
- What's the feature representation? (Raw accel sequence? Summary statistics?)
- How would you split into train / test? (80 / 20 standard.)
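For the last question, one option is a stratified 80/20 split, meaning each gesture is split separately so all three appear in the test set in equal proportion. Stratification is a suggestion added here, not something the lesson requires. The sketch uses placeholder IDs instead of real recordings, and `stratified_split` is a hypothetical helper.

```python
import random

GESTURES = ["wave", "punch", "clap"]
# Placeholder dataset: 50 example IDs per gesture stand in for real recordings.
dataset = [(f"{g}_{i:02d}", g) for g in GESTURES for i in range(50)]


def stratified_split(examples, test_fraction=0.2, seed=42):
    """Split per label so every gesture keeps the same train/test ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({lab for _, lab in examples}):
        group = [ex for ex in examples if ex[1] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test += group[:n_test]
        train += group[n_test:]
    return train, test


train, test = stratified_split(dataset)
print(len(train), "train /", len(test), "test")  # 120 train / 30 test
```

If several people record gestures, consider holding back one whole person for the test set, so the test measures how well the model handles someone it has never seen.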
Recap 5 min
ML vocabulary: feature, label, dataset, training, inference. Models learn from labelled examples; performance depends on data quality + representation. Watch for underfit / overfit / bad data. Tomorrow we collect real training data from the Nano 33 BLE Sense.
- Feature: A measurable input variable. For gesture recognition: accelerometer values over time.
- Label: The correct answer for one example ("wave", "punch").
- Dataset: A collection of labelled examples.
- Training: The process of learning model parameters from the dataset.
- Inference: Using the trained model to predict the label for new data.
- Model: The learned mathematical function mapping features to predictions.
- Underfitting: Model too simple; poor on both train and test.
- Overfitting: Model memorises training data; great on train, bad on test.
- Train / test split: Holding back some data to honestly evaluate the model. 80/20 typical.
- Class imbalance: One label has many more examples than another. Skews training; needs careful handling.
- Neural network: A model made of layered "neurons" that combine features non-linearly. The dominant model class today.
Homework 5 min
- Sign up for Google Colab if you haven't already. It is a free, browser-based Python environment that works well for ML.
- Read ahead to ARD-L04-33 (Capturing Training Data). Bring the Nano 33 BLE Sense + a laptop.