Learning Goals 5 min
"AI" usually means "data uploaded to a cloud server with a GPU". Edge AI moves that inference onto small devices — phones, doorbells, sensors. With TensorFlow Lite Micro a Nano 33 BLE Sense can run a real neural network and recognise gestures. By the end of this lesson you will:
- Compare cloud AI vs edge AI and explain why each fits different problems.
- Identify the resource constraints of edge devices (RAM, flash, no GPU, milliwatt power budget).
- Name three real edge AI uses: keyword spotting ("Hey Google"), gesture recognition, image classification.
Warm-Up 10 min
No new hardware today: this is a conceptual lesson. Bring a Nano 33 BLE Sense if you have one; you'll meet its sensors.
Where AI runs today
| Location | Hardware | Model size | Latency |
|---|---|---|---|
| Cloud (OpenAI, Google) | A100 GPU clusters | up to ~1 TB (largest LLMs) | 200 ms – seconds + network |
| Phone (Siri / Google Lens) | Mobile NPU | ~100 MB | ~50 ms |
| Edge AI (Nano 33 BLE Sense) | nRF52840 ARM Cortex-M4 | ~50 KB | ~10 ms |
Edge AI runs the smallest models on the smallest chips. The trade-off is scale: roughly 50 KB of model against up to 1 TB. You can't run ChatGPT on a Nano, but you can classify gestures, detect keywords ("hey"), and recognise objects from a tiny camera. The use cases just have to be small enough.
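To get a feel for what 50 KB buys, the arithmetic can be sketched in Python. The bytes-per-weight figures (1 for int8, 4 for float32) are standard; treating the whole file as weights is a simplifying assumption, since a real model file also carries some metadata.

```python
# Rough model-size arithmetic. Assumes the whole file is weights
# (real model files also hold metadata, so these are upper bounds).

KB = 1024
TB = 1024 ** 4

def max_parameters(model_bytes: int, bytes_per_weight: int) -> int:
    """Upper bound on the number of weights that fit in model_bytes."""
    return model_bytes // bytes_per_weight

edge_model = 50 * KB   # the ~50 KB figure from the table
cloud_model = 1 * TB   # the ~1 TB figure from the table

print(max_parameters(edge_model, 1))  # int8 weights: 51200
print(max_parameters(edge_model, 4))  # float32 weights: 12800
print(cloud_model // edge_model)      # size ratio: 21474836
```

So a 50 KB model holds at most a few tens of thousands of weights, and the largest cloud models are about twenty million times bigger.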
New Concept · Edge vs cloud trade-offs 25 min
Why edge over cloud
- Privacy: data never leaves the device. Camera frames, microphone audio, biometric sensors — all stay local.
- Latency: no round trip to the cloud. 10 ms decisions instead of 200+ ms.
- Reliability: works offline. The doorbell that says "Amazon delivery" doesn't care if your WiFi is down.
- Cost: no per-API-call fee. One ~£30 board, then inference is free for the life of the device.
- Power: edge inference can run on milliwatts. Cloud inference runs on GPU servers that draw hundreds of watts to kilowatts.
Why cloud over edge
- Model size: GPT-class models simply don't fit on edge hardware.
- Updates: cloud models update centrally; edge models need OTA flashes.
- Collective data: cloud sees all users; edge sees one.
Edge AI examples in the wild
| Product | Model | Chip class |
|---|---|---|
| Alexa "wake word" | ~50 KB keyword spotter | ARM Cortex-M4 |
| Apple Face ID | Custom NN on the Neural Engine | Apple A-series NPU |
| Ring doorbell person-detect | ~200 KB image classifier | NPU on the doorbell SoC |
| Smart-watch heart-rhythm detection | Tiny LSTM | ARM Cortex-M |
| Industrial "is this machine failing" vibration analyser | ~10 KB autoencoder | STM32 |
The hardware: Nano 33 BLE Sense
Arduino's edge-AI poster board. Includes:
- nRF52840 ARM Cortex-M4 + 1 MB flash + 256 KB RAM.
- IMU (LSM9DS1 — 9-axis: accel + gyro + magnetometer).
- Microphone (MP34DT05).
- Temperature + humidity sensor (HTS221).
- Pressure sensor (LPS22HB).
- Gesture / proximity / light (APDS-9960).
- BLE radio.
That's a wearable's worth of sensors on a £30 board, which makes it a good fit for recognising gestures, sounds, breathing, and similar signals.
The Edge AI workflow
- Collect training data on the device (acceleration during "wave" gesture, etc.).
- Train a small model on a laptop or in the cloud (Google Colab, Edge Impulse).
- Convert the trained model to the TensorFlow Lite format, which TensorFlow Lite Micro (a tiny inference runtime) can execute.
- Deploy the model + runtime onto the device as a C array.
- Run the model on live sensor data in your Arduino sketch.
We'll work through this end-to-end over L04-32 to L04-36.
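The "deploy as a C array" step can look mysterious, so here is a minimal sketch of what it amounts to: the model file's bytes are written out as C source that the Arduino sketch compiles in. The function name `to_c_array`, the array name `g_model`, and the stand-in bytes are all hypothetical; a real project would read the bytes from the converted model file.

```python
# Sketch: turn model bytes into C source for embedding in an Arduino sketch.
# The names to_c_array and g_model, and the stand-in bytes, are hypothetical.

def to_c_array(data: bytes, name: str = "g_model", per_line: int = 12) -> str:
    """Return C source declaring `data` as a const byte array plus its length."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )

# Stand-in for the contents of a converted model file.
fake_model = bytes([0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33])
print(to_c_array(fake_model))
```

On a machine with the `xxd` tool, `xxd -i` on the model file produces similar output; the Python version is shown only to make the idea visible.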
Worked Example · Run the bundled gesture-classifier demo 20 min
If you have a Nano 33 BLE Sense
- Install the "Arduino_TensorFlowLite" library (Tools → Manage Libraries).
- File → Examples → Arduino_TensorFlowLite → magic_wand.
- Upload to your Nano 33 BLE Sense.
- Open Serial Monitor at 9600 baud.
- Hold the board flat. Move it in a clear "wing" (W shape), "ring" (circle), or "slope" (angle) gesture.
- The serial monitor prints which gesture it recognised (with confidence).
That's a real neural network classifying real-time IMU data on a £30 board with no internet. The TensorFlow Lite runtime occupies ~80 KB of flash; the model itself is ~20 KB.
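As a rough check on how much of the board those figures consume, the flash budget can be worked out as below. This sketch uses only the approximate sizes quoted in this lesson and ignores the rest of the sketch code and the RAM the model needs at run time, so treat the result as a lower bound.

```python
# Flash budget for the demo, using the approximate figures from this lesson.
KB = 1024
FLASH_BYTES = 1024 * KB   # nRF52840: 1 MB flash
RUNTIME_BYTES = 80 * KB   # ~80 KB runtime (approximate)
MODEL_BYTES = 20 * KB     # ~20 KB gesture model (approximate)

def flash_used_fraction(*parts: int, total: int = FLASH_BYTES) -> float:
    """Fraction of flash taken up by the given parts."""
    return sum(parts) / total

used = flash_used_fraction(RUNTIME_BYTES, MODEL_BYTES)
print(f"{used:.1%} of flash used by runtime + model")  # 9.8% of flash ...
```

Roughly a tenth of the flash goes to the runtime and model, which leaves plenty of room for the rest of the sketch.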
If you don't have one
You can train + simulate models on Google Colab and watch the inference. The hardware is needed only for the actual on-device deployment. We'll provide simulation paths in L04-33.
Try It Yourself · Paper exercises 15 min
Edge or cloud: a real-time pedestrian detector on a car dashboard. Why?
Answer: Edge. Latency matters (cars move fast); reliability matters (mid-tunnel = no cloud); privacy matters. NVIDIA Drive / Tesla FSD chips run these on-vehicle.
Edge or cloud: GPT-style chatbot. Why?
Answer: Cloud. The model is too big (up to ~1 TB of weights) and needs heavy GPUs. Even LLMs shrunk to run on a phone or laptop need gigabytes of RAM, far beyond the 256 KB on a Nano-class edge device.
Why is the "Alexa" wake word detected on-device, while the actual command interpretation happens in the cloud?
Answer: The mic must listen 24/7. Streaming audio to the cloud constantly would mean a privacy violation, bandwidth cost, and latency. A tiny on-device wake-word model runs 100% of the time; only after it triggers does the device stream the next 2–5 s to the cloud for full interpretation. Edge + cloud combined.
You want a baby monitor that detects when the baby is crying. Edge, cloud, or both? Justify.
Answer: Pure edge. Privacy (no streaming a baby's sounds to the cloud), reliability (works on flaky WiFi), latency (instant alert). A small audio model (~50 KB), trained to tell baby cries from other sounds, fits on a Nano 33 BLE Sense.
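The edge + cloud split from the wake-word question can be simulated in a few lines. This is a sketch under stated assumptions, not how any real product is implemented: the scores stand in for the output of an on-device model, and the threshold and frame count are illustrative values.

```python
# Simulation of wake-word gating: a tiny always-on detector decides
# when audio may leave the device. All numbers here are illustrative.

THRESHOLD = 0.8     # hypothetical detector score needed to wake up
STREAM_FRAMES = 3   # hypothetical number of frames sent after waking

def frames_sent_to_cloud(scores, threshold=THRESHOLD, stream_frames=STREAM_FRAMES):
    """Return the indices of audio frames that would be streamed to the cloud.

    `scores` holds one wake-word score per audio frame. The frame that
    triggers the wake-up stays local; only the frames after it are sent.
    """
    sent = []
    remaining = 0  # frames still to stream after the last wake-up
    for i, score in enumerate(scores):
        if remaining > 0:
            sent.append(i)
            remaining -= 1
        elif score >= threshold:
            remaining = stream_frames
    return sent

scores = [0.1, 0.2, 0.95, 0.3, 0.1, 0.2, 0.1, 0.05]
print(frames_sent_to_cloud(scores))  # [3, 4, 5]
```

Every frame is scored on the device, but only three of the eight leave it. That is the privacy and bandwidth saving in miniature.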
Mini-Challenge · Plan your edge AI project 10 min
Sketch in your notebook one product idea for each of:
- Gesture recognition (accelerometer).
- Keyword spotting (microphone).
- Vibration anomaly detection (industrial).
For each, note: model size, training data needed, why edge over cloud, target hardware.
Recap 5 min
Edge AI = small models on small chips. Wins on privacy, latency, offline reliability, cost. Loses on model size and central updates. Nano 33 BLE Sense is Arduino's edge AI flagship — Cortex-M4 + 9-axis IMU + mic + climate + light. Tomorrow we learn ML basics; the rest of Cluster F applies them.
- Edge AI: running inference on the device that generates the data, not in the cloud.
- Inference: running a trained model to make a prediction. Distinct from training, which produces the model.
- TensorFlow Lite Micro: Google's ML runtime for microcontrollers. The core runtime fits in ~16 KB; the operator kernels a model needs add to that. Runs on Cortex-M0+ and up.
- NPU (Neural Processing Unit): specialised AI chip. Phones (Apple Neural Engine, Google Tensor) include them; Arduino-class boards don't.
- Model size: how many bytes the trained weights occupy. Tiny models: < 100 KB. Big models: GB+.
- Wake word: a short phrase ("Hey Google") detected on-device before activating cloud-based interpretation.
- Keyword spotting: detecting one of a small vocabulary of words. Doable on edge.
- Anomaly detection: spotting "this isn't the normal pattern". Small autoencoder models excel at it.
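To make the anomaly-detection idea concrete, here is a deliberately simplified sketch. It does not use an autoencoder: it swaps in a plain mean-and-standard-deviation baseline, which captures the same "learn normal, flag what deviates" logic. The function names, readings, and the 3-sigma limit are illustrative assumptions.

```python
import statistics

# Simplified stand-in for an autoencoder: learn what "normal" looks like,
# then flag readings that sit too far from it. All values are illustrative.

def fit_baseline(normal_readings):
    """Learn the mean and standard deviation of normal readings."""
    return statistics.mean(normal_readings), statistics.stdev(normal_readings)

def is_anomaly(reading, baseline, max_sigma=3.0):
    """Flag a reading more than max_sigma standard deviations from the mean."""
    mean, stdev = baseline
    return abs(reading - mean) > max_sigma * stdev

normal = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0]  # e.g. vibration levels, healthy machine
baseline = fit_baseline(normal)
print(is_anomaly(1.02, baseline))  # False
print(is_anomaly(2.5, baseline))   # True
```

An autoencoder plays the same role with a richer notion of "normal": its reconstruction error replaces the distance from the mean.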
Homework 5 min
- If you have a Nano 33 BLE Sense, upload the magic_wand example. Try the gestures.
- Read ahead to ARD-L04-32 (ML Fundamentals).
- If you have a laptop, install Python 3 and the tensorflow package (or sign up for Google Colab, which is easier).