A06 - Counting Cars
Assignment: GitHub Classroom
Late Policy
- You have until the assigned due date; after that, you will receive 0 points.
A city transportation department has deployed NVIDIA Jetson devices at major intersections to collect video feeds. They want to run real-time vehicle counting on-device (at the edge) instead of streaming all video back to the cloud. This saves bandwidth, reduces latency, and keeps data local.
Your team has written a vehicle-counting script that uses a YOLOv8 model. It works on your laptop, but deploying it to dozens of Jetson devices in the field is painful:
- Each Jetson may run a different JetPack version.
- Installing PyTorch, OpenCV, and CUDA libraries by hand takes hours per device.
- “Works on my machine” bugs appear constantly.
- Updating the software across the fleet is a nightmare.
Your job: containerize the pipeline with Docker so it can be built once and deployed reliably to any Jetson in the fleet — whether it is a Jetson Orin Nano, Orin NX, or the new Jetson Thor.
Lab Setup
You will work on a shared Jetson Thor device (JetPack 7.x, L4T R38.4.0, CUDA 13.0). The instructor has pre-pulled the community base image (dustynv/ultralytics:r37.0, ~26 GB), so you do not need to download it.
Note on the container tag: The device runs L4T R38.4.0, but the container is tagged r37.0. There is no r38-tagged ultralytics image available; r37.0 is the closest match and is backward-compatible with the R38.x host. CUDA 13.0 is available and GPU inference works correctly with this image.
Because you share the device with other students:
- Tag your Docker images with your name: docker build -t carcounter-YOURNAME:latest ...
- Write output to your own directory: logs-YOURNAME/
- Do NOT delete shared images (docker system prune is off-limits).
- The sample videos are in a shared location; mount them read-only (see below).
Shared dataset on athor01.evl.uic.edu:
/opt/CarCounter/data/
├── niu00.mp4 ← sample traffic video 1
└── niu01.mp4 ← sample traffic video 2
When running containers, mount this directory read-only:
-v /opt/CarCounter/data:/app/data:ro
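The shared-device rules above (per-student image tag, per-student log directory, read-only data mount) can be captured in one place. The sketch below is only an illustration: `build_run_command` is a hypothetical helper, not part of the repository, and it assumes the image has already been built under the `carcounter-YOURNAME:latest` naming scheme.

```python
import os
import shlex


def build_run_command(student, script_args, repo_dir="."):
    """Assemble a `docker run` argv that follows the shared-device rules.

    Hypothetical helper for illustration; the flags mirror the ones
    used elsewhere in this README.
    """
    # Host-side output directory, equivalent to $(pwd)/logs-YOURNAME.
    logs_dir = os.path.join(os.path.abspath(repo_dir), f"logs-{student}")
    return [
        "docker", "run", "--rm", "--runtime=nvidia",
        # Shared sample videos, mounted read-only.
        "-v", "/opt/CarCounter/data:/app/data:ro",
        # Per-student output directory.
        "-v", f"{logs_dir}:/app/logs",
        # Per-student image tag (Docker image names must be lowercase).
        f"carcounter-{student.lower()}:latest",
        *script_args,
    ]


cmd = build_run_command(
    "alice",
    ["scripts/countCars.py", "--video", "/app/data/niu00.mp4", "--output", "/app/logs"],
)
# Print a copy-pasteable shell command.
print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; on the Jetson you could paste the printed line into a shell.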
Docker & Edge Computing Background
What Is Docker?
Docker packages an application together with all its dependencies (libraries, system tools, configuration) into a container — a lightweight, portable unit that runs the same way everywhere. Unlike a virtual machine, a container shares the host OS kernel, so there is almost no performance overhead.
Key concepts:
| Term | Meaning |
|---|---|
| Image | A read-only template containing the OS, libraries, and code |
| Container | A running instance of an image |
| Dockerfile | A recipe that builds an image layer by layer |
| Registry | A repository for sharing images (Docker Hub, NGC, GHCR) |
| Volume | A directory shared between the host and container |
| Runtime | The engine that executes containers (Docker, containerd) |
Why Docker on Edge Devices?
Edge devices live in the real world — rooftops, intersections, factory floors. You cannot SSH into every device to debug a broken pip install. Docker solves this:
| Problem | Docker Solution |
|---|---|
| Dependency hell across devices | Everything bundled in the image |
| Hours of manual setup per node | docker pull + docker run |
| JetPack version mismatches | Image tagged per JetPack version |
| Risky in-place updates | Atomic: pull new image, stop old, start new |
| Rollback after bad update | Keep previous image, switch back instantly |
| Reproducibility | Same image = same behavior everywhere |
NVIDIA Container Runtime
Standard Docker cannot access the GPU. NVIDIA provides nvidia-container-runtime (part of the NVIDIA Container Toolkit), which injects the GPU driver and CUDA libraries into containers at launch time.
On Jetson, this is pre-installed with JetPack. You enable it with:
docker run --runtime=nvidia ...
Note: On the shared device, the instructor has already configured nvidia as the default Docker runtime, so GPU access works automatically and you do not need to pass --runtime=nvidia on every command. However, it is good practice to include it explicitly (and you will see it in the examples below), because on other devices the default may be runc, which silently falls back to CPU. If you ever see torch.cuda.is_available() return False, a missing --runtime=nvidia flag is the most likely cause.
The default runtime is configured system-wide in /etc/docker/daemon.json (managed by the instructor; students do not need to edit this file).
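To make the default-runtime setting concrete, here is a minimal sketch that reads the `default-runtime` key from daemon.json-style text. The sample JSON is an assumption about what a typical NVIDIA-enabled configuration looks like, not a copy of the file on the shared device, and `default_runtime` is a hypothetical helper name.

```python
import json


def default_runtime(daemon_json_text):
    """Return the configured default runtime, or "runc" if none is set."""
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    # Docker falls back to runc when "default-runtime" is absent.
    return config.get("default-runtime", "runc")


# Assumed example of an NVIDIA-enabled daemon.json (illustrative only).
sample = """
{
  "runtimes": {
    "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
  },
  "default-runtime": "nvidia"
}
"""

print(default_runtime(sample))              # nvidia
print(default_runtime('{"runtimes": {}}'))  # runc
```

On a real device you would pass in the contents of /etc/docker/daemon.json; a result of `runc` means `--runtime=nvidia` has to be given explicitly.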
Community Containers: jetson-containers
Building GPU-accelerated Python libraries from source on Jetson takes hours. The community project jetson-containers maintains pre-built Docker images with popular ML frameworks optimized for Jetson:
| Container | Contents |
|---|---|
| dustynv/l4t-pytorch | PyTorch + torchvision on L4T |
| dustynv/ultralytics | YOLOv8 (ultralytics) + PyTorch + OpenCV |
| dustynv/l4t-ml | Full ML stack (PyTorch, TF, ONNX, etc.) |
| dustynv/deepstream | NVIDIA DeepStream for video analytics |
Images are tagged by JetPack version. You must match the tag to your device:
| JetPack | L4T Version | Example Tag | Devices |
|---|---|---|---|
| 7.x | R38.x | r37.0 * | Jetson Thor (this device) |
| 6.x | R36.x | r36.4.0 | Orin Nano/NX (new JetPack) |
| 5.x | R35.x | r35.4.1 | Orin Nano/NX (older JetPack) |
* No r38-tagged ultralytics image exists. The r37.0 tag is the closest available and is backward-compatible with the R38.x host. If no exact-match tag exists for your L4T version, use the closest lower version.
Critical: A container built for JetPack 6 will NOT work on JetPack 5 or 7. The CUDA toolkit and driver versions must match. Our shared Thor is JetPack 7.x (L4T R38.4.0); use the r37.0 tag, which is the latest available and works correctly on this host.
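The "closest lower version" rule can be sketched as a small version comparison. This is an illustration under stated assumptions: `parse_version` and `closest_lower_tag` are hypothetical helpers, the tag list is the three example tags from the table, and picking a tag this way does not by itself guarantee compatibility across JetPack major versions.

```python
def parse_version(text):
    """Turn 'r36.4.0' or 'R38.4.0' into a 3-tuple such as (36, 4, 0)."""
    parts = [int(p) for p in text.lstrip("rR").split(".")]
    # Pad short versions like 'r37.0' so tuples compare consistently.
    return tuple(parts + [0] * (3 - len(parts)))


def closest_lower_tag(host_l4t, available_tags):
    """Pick the highest tag that does not exceed the host L4T version."""
    host = parse_version(host_l4t)
    candidates = [t for t in available_tags if parse_version(t) <= host]
    return max(candidates, key=parse_version) if candidates else None


tags = ["r35.4.1", "r36.4.0", "r37.0"]
print(closest_lower_tag("R38.4.0", tags))  # r37.0
print(closest_lower_tag("R36.4.0", tags))  # r36.4.0
print(closest_lower_tag("R35.3.1", tags))  # None
```

The last call returns `None` because every listed tag is newer than the host, which is the case where you would need to look for a different image rather than guess.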
Container Image Size on Edge
A full ML container can be 15–30 GB. The dustynv/ultralytics:r37.0 image used in this assignment is ~26 GB. Jetson devices often have limited storage (16–64 GB eMMC/NVMe). This means:
- You cannot casually pull five different containers.
- Multi-stage builds and .dockerignore help reduce image size.
- Extending an existing community image (instead of building from scratch) is both faster and smaller.
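Before pulling a large image onto a storage-constrained device, it can help to check free space first. The sketch below is a rough illustration: `can_fit_image` is a hypothetical helper, the 2 GB headroom is an arbitrary assumption, sizes use decimal gigabytes, and checking `/` assumes Docker's data directory lives on the root filesystem.

```python
import shutil

GB = 10**9  # decimal gigabytes (assumption)


def can_fit_image(free_bytes, image_gb, headroom_gb=2.0):
    """Return True if the image plus some headroom fits in free_bytes."""
    return free_bytes >= (image_gb + headroom_gb) * GB


if __name__ == "__main__":
    # Free space on the root filesystem of the machine running this script.
    free = shutil.disk_usage("/").free
    print(f"free: {free / GB:.1f} GB, "
          f"room for a 26 GB image: {can_fit_image(free, 26)}")
```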
Dockerfile: FROM, the Most Important Line
When you write FROM dustynv/ultralytics:r37.0, you inherit everything in that community image — PyTorch, CUDA, OpenCV, ultralytics — for free. Your Dockerfile only adds your application code on top:
Community image (~26 GB): CUDA, cuDNN, PyTorch, OpenCV, ultralytics
↓
Your layer (< 10 MB): countCars.py, config, sample data
↓
Final image (~26 GB): Ready to deploy
If you tried FROM ubuntu:22.04 and installed everything yourself, you would spend hours building and end up with a larger, less optimized image.
Docker Compose for Orchestration
docker-compose.yml lets you define multi-container applications declaratively. Even for a single service, it is useful because it captures the exact docker run flags (runtime, volumes, environment) in a file that can be version-controlled and shared:
# Instead of remembering this every time:
# docker run --runtime=nvidia -v ./data:/app/data -v ./logs:/app/logs ...
# You write it once in docker-compose.yml and run:
# docker compose up
Repository Structure
carcounter/
├── README.md ← You are here
├── DOCKER_SETUP.md ← Docker + NVIDIA runtime setup guide
├── EXAMPLES.md ← Diagrams and visual explanations
├── requirements.txt ← Python dependencies (for reference)
├── .gitignore
│
├── data/
│ ├── trafficVideoManifest.json ← Describes expected sample video
│ └── downloadSample.sh ← Downloads a sample traffic video
│
├── scripts/
│ ├── countCars.py ← Vehicle counting pipeline (TODOs 3a–3d)
│ └── benchmarkEdge.py ← Model-size benchmark (TODOs 4a–4d)
│
├── docker/
│ ├── Dockerfile ← Container build recipe (TODO 2)
│ └── docker-compose.yml ← Orchestration config (TODO 5)
│
├── reflection.txt ← Reflection questions (TODO 6)
└── logs/ ← Output directory (git-ignored)
Your Tasks
TODO 1 — Verify Docker & NVIDIA Container Runtime
Follow DOCKER_SETUP.md to verify (or install) Docker and the NVIDIA container runtime on your Jetson device. By the end of this step you should be able to run:
# Verify Docker
docker --version
# Verify NVIDIA runtime (Thor = CUDA 13.0)
docker run --rm --runtime=nvidia nvidia/cuda:13.0.0-base-ubuntu22.04 nvidia-smi
And see the Thor GPU listed in the nvidia-smi output.
Verification:
docker info | grep -i runtime
# Should show "nvidia" in the list of runtimes
# NOTE: On the shared device, the default runtime is "nvidia" (set by the instructor).
# The --runtime=nvidia flag is optional here but good practice for portability.
# Confirm the base image is already cached (pre-pulled by instructor):
docker images | grep ultralytics
# Should show: dustynv/ultralytics r37.0 ... ~26 GB
TODO 2 — Complete the Dockerfile
Open docker/Dockerfile. It contains a skeleton with comments explaining each section. Your job:
- Choose the right FROM image and tag for Thor (r37.0).
- Set the working directory.
- Copy application files into the container.
- Install any additional pip packages needed.
- Set the default command.
Note on Step 4: The base image (dustynv/ultralytics:r37.0) already includes all packages listed in requirements.txt: ultralytics, OpenCV, and NumPy are pre-installed. Check whether anything is actually missing before adding a RUN pip3 install line. If everything is already provided, you can skip this step entirely.
Verification (use YOUR name in the tag):
docker build -t carcounter-YOURNAME:latest -f docker/Dockerfile .
# Test that ultralytics is available inside the container.
# Do NOT prepend "python3" to the command — the ENTRYPOINT already handles that.
# The correct command is:
docker run --rm --runtime=nvidia carcounter-YOURNAME:latest -c "import ultralytics; print('OK')"
# Common mistake: if you accidentally run "docker run ... python3 -c ...",
# you will get an error like "can't open file '/app/python3'" because the
# ENTRYPOINT (python3) receives "python3" as a filename argument.
# To fix this, simply omit "python3" from the command, or override the
# entrypoint explicitly:
# docker run --rm --runtime=nvidia --entrypoint python3 carcounter-YOURNAME:latest -c "import ultralytics; print('OK')"
TODO 3 — Implement Vehicle Counting (scripts/countCars.py)
The pipeline script has four functions for you to implement:
| TODO | Function | Purpose |
|---|---|---|
| 3a | resolveDevice() | Map "cpu"/"cuda"/"auto" to YOLO device string |
| 3b | loadModel() | Load a YOLOv8 model |
| 3c | detectVehicles() | Run YOLO on one frame, filter for vehicle classes |
| 3d | processVideo() | Loop through video frames, collect per-frame counts |
Everything else (argument parsing, result saving, frame annotation) is provided.
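As a starting point for thinking about the device mapping in TODO 3a, here is one possible shape for the logic. It is a sketch, not the expected solution: `resolve_device_sketch` is a hypothetical name, GPU availability is passed in as a flag (in the real script it would presumably come from `torch.cuda.is_available()`), and the returned strings `"cpu"` and `"cuda:0"` are assumed to be acceptable device values for your ultralytics version.

```python
def resolve_device_sketch(requested, cuda_available):
    """Map a user-facing device choice to a concrete device string."""
    choice = requested.strip().lower()
    if choice == "cpu":
        return "cpu"
    if choice == "auto":
        # Prefer the GPU when present, otherwise fall back to CPU.
        return "cuda:0" if cuda_available else "cpu"
    if choice == "cuda":
        if not cuda_available:
            # Fail loudly instead of silently running on CPU.
            raise RuntimeError("CUDA requested but no GPU is visible")
        return "cuda:0"
    raise ValueError(f"unknown device option: {requested!r}")


print(resolve_device_sketch("auto", cuda_available=True))   # cuda:0
print(resolve_device_sketch("auto", cuda_available=False))  # cpu
```

Raising an error for an explicit `cuda` request without a GPU is a design choice made here because a silent CPU fallback is easy to miss; your implementation may reasonably choose otherwise.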
Verification:
# Run inside your container:
docker run --rm --runtime=nvidia \
-v /opt/CarCounter/data:/app/data:ro -v $(pwd)/logs-YOURNAME:/app/logs \
carcounter-YOURNAME:latest \
scripts/countCars.py --video /app/data/niu00.mp4 --output /app/logs --device auto
# Expected output in logs-YOURNAME/:
# carCounts.csv — per-frame vehicle counts
# summary.json — aggregate statistics
# timing.json — performance metrics
# detection_*.png — annotated sample frames
Note: YOLO model weights (e.g., yolov8n.pt) are downloaded from GitHub on the first run inside the container. Since the container filesystem is ephemeral, this download happens on every docker run. The downloads are small and fast (~6 MB for yolov8n), but will fail without internet access. To avoid repeated downloads, you can pre-download models during docker build by adding a RUN layer, or mount a persistent volume for the model cache directory.
TODO 4 — Implement Edge Benchmarking (scripts/benchmarkEdge.py)
This script compares different YOLO model sizes on your Jetson to find the right accuracy–speed tradeoff for edge deployment.
| TODO | Function | Purpose |
|---|---|---|
| 4a | warmupModel() | Run warmup inference with proper GPU sync |
| 4b | benchmarkModel() | Time inference across N frames, compute FPS |
| 4c | generateReport() | Format a human-readable comparison table |
| 4d | main() | Parse args, benchmark each model, save results |
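The warmup-then-time pattern behind TODOs 4a and 4b can be illustrated without a GPU. The sketch below is generic and makes several assumptions: `time_inference` is a hypothetical helper, `infer` stands in for a model call, and `sync` stands in for a GPU synchronization call (on a CUDA device you would likely pass `torch.cuda.synchronize`). It is not the structure the provided skeleton requires.

```python
import time


def time_inference(infer, frames, warmup=3, sync=None):
    """Time `infer` over `frames` after a short warmup; return FPS stats."""
    sync = sync or (lambda: None)  # no-op when running on CPU

    # Warmup runs absorb one-time setup cost and are not timed.
    for frame in frames[:warmup]:
        infer(frame)
    sync()  # make sure warmup work has finished before starting the clock

    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    sync()  # wait for queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start

    fps = len(frames) / elapsed if elapsed > 0 else float("inf")
    return {"frames": len(frames), "seconds": elapsed, "fps": fps}


calls = []


def fake_infer(frame):
    """Stand-in for a model call: record the frame and sleep briefly."""
    calls.append(frame)
    time.sleep(0.001)


result = time_inference(fake_infer, list(range(10)), warmup=3)
print(f"{result['frames']} frames, {result['fps']:.1f} FPS")
```

Without the second `sync()` call, asynchronous GPU execution could make the loop appear faster than it is, because the timer would stop before the queued work completes.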
Verification:
docker run --rm --runtime=nvidia \
-v /opt/CarCounter/data:/app/data:ro -v $(pwd)/logs-YOURNAME:/app/logs \
carcounter-YOURNAME:latest \
scripts/benchmarkEdge.py --video /app/data/niu00.mp4 --output /app/logs
# Expected output in logs-YOURNAME/:
# benchmarkReport.txt — human-readable comparison
# benchmarkResults.json — structured timing data
TODO 5 — Complete docker-compose.yml
Open docker/docker-compose.yml. Complete the service definition so that docker compose up builds and runs the entire pipeline:
- Build from the Dockerfile in this directory.
- Enable NVIDIA GPU runtime.
- Mount data/ and logs-YOURNAME/ as volumes.
- Pass the video path and output directory as command arguments.
- Set the image name to carcounter-YOURNAME:latest.
GPU access in Compose: Use runtime: nvidia in your service definition. This is the standard approach on Jetson with Docker Compose v2 (which is what this device runs). You may see references to an alternative deploy.resources.reservations block. That syntax comes from the deploy section of the Compose file format, which originated with Docker Swarm mode, and it is not needed here.
Verification:
cd docker
STUDENT=YOURNAME docker compose up --build
# Should process the video and write results to ../logs-YOURNAME/
TODO 6 — Reflection
Fill out reflection.txt with your real observations and measurements.
Evaluation Criteria
| Criterion | Weight |
|---|---|
| Docker + NVIDIA runtime verified (nvidia-smi in container) | 10% |
| Dockerfile builds and runs correctly | 20% |
| countCars.py produces correct vehicle counts | 25% |
| benchmarkEdge.py compares model sizes with valid timings | 20% |
| docker-compose.yml orchestrates full pipeline | 10% |
| Reflection with real measurements and analysis | 15% |
Tips
- Always check your JetPack version first. Every container choice depends on it. On this device: JetPack 7.x, L4T R38.4.0, CUDA 13.0.
- Start small. Get docker run --runtime=nvidia working before writing the Dockerfile.
- Include --runtime=nvidia for portability. The shared device defaults to the nvidia runtime, but other devices may not. Including the flag explicitly is good practice and ensures GPU access everywhere.
- Read the community docs. The jetson-containers README has a compatibility matrix and usage examples.
- Watch image sizes. Run docker images to see how much space you are using. The base image alone is ~26 GB.
- Use --maxFrames 100 during development to speed up iteration.
- GPU warmup matters. The first inference on the GPU includes a one-time CUDA setup. See EXAMPLES.md for details.
Additional Resources
- Docker Documentation: Getting Started. Official Docker documentation covering images, containers, Dockerfiles, volumes, and Compose. This is the authoritative reference for all Docker concepts used in this assignment, from writing your first Dockerfile to orchestrating services with docker-compose.yml.
- NVIDIA Container Toolkit Documentation. Official NVIDIA documentation for enabling GPU access inside Docker containers. Covers installation, runtime configuration, and troubleshooting for the nvidia-container-runtime that makes --runtime=nvidia work on Jetson and desktop GPUs.
- jetson-containers (GitHub). Community project by Dustin Franklin (NVIDIA) providing pre-built Docker images for Jetson devices. Includes a compatibility matrix mapping JetPack versions to container tags, build instructions, and a catalog of available ML framework containers (PyTorch, ultralytics, TensorFlow, etc.).
- Ultralytics YOLOv8 Documentation. Official documentation for the ultralytics library and YOLOv8 models. Covers model loading, inference, detection result parsing, export to TensorRT, and deployment on edge devices. This is the primary reference for understanding the YOLO API used in countCars.py.
- NVIDIA Jetson Developer Guide. Official NVIDIA documentation covering Jetson hardware platforms, JetPack SDK components, CUDA toolkit versions, and system configuration. Essential for understanding the relationship between JetPack version, L4T version, and CUDA version that determines which container tags to use.
- Docker Compose Specification. Official reference for docker-compose.yml syntax, including service definitions, volume mounts, runtime configuration, environment variables, and the deploy/resources section for GPU access. Use this when writing and debugging your docker-compose.yml in TODO 5.
- PyTorch CUDA Semantics. Official PyTorch documentation explaining asynchronous GPU execution, torch.cuda.synchronize(), device placement, and memory management. This is critical for understanding why GPU warmup and synchronization are required for accurate benchmarking in benchmarkEdge.py.
