A city transportation department has deployed NVIDIA Jetson devices at major intersections to collect video feeds. They want to run real-time vehicle counting on-device (at the edge) instead of streaming all video back to the cloud. This saves bandwidth, reduces latency, and keeps data local.

Your team has written a vehicle-counting script that uses a YOLOv8 model. It works on your laptop, but deploying it to dozens of Jetson devices in the field is painful:

  • Each Jetson may run a different JetPack version.
  • Installing PyTorch, OpenCV, and CUDA libraries by hand takes hours per device.
  • “Works on my machine” bugs appear constantly.
  • Updating the software across the fleet is a nightmare.

Your job: containerize the pipeline with Docker so it can be built once and deployed reliably to any Jetson in the fleet — whether it is a Jetson Orin Nano, Orin NX, or the new Jetson Thor.

Lab Setup

You will work on a shared Jetson Thor device (JetPack 7.x, L4T R38.4.0, CUDA 13.0). The instructor has pre-pulled the community base image (dustynv/ultralytics:r37.0, ~26 GB), so you do not need to download it.

Note on the container tag: The device runs L4T R38.4.0, but the container is tagged r37.0. There is no r38-tagged ultralytics image available — r37.0 is the closest match and is backward-compatible with the R38.x host. CUDA 13.0 is available and GPU inference works correctly with this image.

Because you share the device with other students:

  • Tag your Docker images with your name: docker build -t carcounter-YOURNAME:latest ...
  • Write output to your own directory: logs-YOURNAME/
  • Do NOT delete shared images (docker system prune is off-limits).
  • The sample videos are in a shared location — mount read-only (see below).

Shared dataset on athor01.evl.uic.edu:

/opt/CarCounter/data/
├── niu00.mp4    ← sample traffic video 1
└── niu01.mp4    ← sample traffic video 2

When running containers, mount this directory read-only:

-v /opt/CarCounter/data:/app/data:ro

Docker & Edge Computing Background

What Is Docker?

Docker packages an application together with all its dependencies (libraries, system tools, configuration) into a container — a lightweight, portable unit that runs the same way everywhere. Unlike a virtual machine, a container shares the host OS kernel, so there is almost no performance overhead.

Key concepts:

Term         Meaning
Image        A read-only template containing the OS, libraries, and code
Container    A running instance of an image
Dockerfile   A recipe that builds an image layer by layer
Registry     A repository for sharing images (Docker Hub, NGC, GHCR)
Volume       A directory shared between the host and container
Runtime      The engine that executes containers (Docker, containerd)

Why Docker on Edge Devices?

Edge devices live in the real world — rooftops, intersections, factory floors. You cannot SSH into every device to debug a broken pip install. Docker solves this:

Problem                          Docker Solution
Dependency hell across devices   Everything bundled in the image
Hours of manual setup per node   docker pull + docker run
JetPack version mismatches       Image tagged per JetPack version
Risky in-place updates           Atomic: pull new image, stop old, start new
Rollback after bad update        Keep previous image, switch back instantly
Reproducibility                  Same image = same behavior everywhere

NVIDIA Container Runtime

Standard Docker cannot access the GPU. NVIDIA provides nvidia-container-runtime (part of the NVIDIA Container Toolkit), which injects the GPU driver and CUDA libraries into containers at launch time.

On Jetson, this is pre-installed with JetPack. You enable it with:

docker run --runtime=nvidia ...

Note: On the shared device, the instructor has already configured nvidia as the default Docker runtime, so GPU access works automatically — you do not need to pass --runtime=nvidia on every command. However, it is good practice to include it explicitly (and you will see it in the examples below), because on other devices the default may be runc, in which case the container has no GPU access and frameworks silently fall back to CPU. If you ever see torch.cuda.is_available() return False, a missing --runtime=nvidia flag is the most likely cause.

The default runtime is configured system-wide in /etc/docker/daemon.json (managed by the instructor — students do not need to edit this file).
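
For reference, the NVIDIA-documented daemon.json for this setup looks roughly like the sketch below (shown for understanding only — do not edit the file on the shared device):

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}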

Community Containers: jetson-containers

Building GPU-accelerated Python libraries from source on Jetson takes hours. The community project jetson-containers maintains pre-built Docker images with popular ML frameworks optimized for Jetson:

Container             Contents
dustynv/l4t-pytorch   PyTorch + torchvision on L4T
dustynv/ultralytics   YOLOv8 (ultralytics) + PyTorch + OpenCV
dustynv/l4t-ml        Full ML stack (PyTorch, TF, ONNX, etc.)
dustynv/deepstream    NVIDIA DeepStream for video analytics

Images are tagged by JetPack version. You must match the tag to your device:

JetPack   L4T Version   Example Tag   Devices
7.x       R38.x         r37.0 *       Jetson Thor (this device)
6.x       R36.x         r36.4.0       Orin Nano/NX (new JetPack)
5.x       R35.x         r35.4.1       Orin Nano/NX (older JetPack)

* No r38-tagged ultralytics image exists. The r37.0 tag is the closest available and is backward-compatible with the R38.x host. If no exact-match tag exists for your L4T version, use the closest lower version.

Critical: container and host versions must be compatible. An image built for JetPack 6 will NOT run on a JetPack 5 host (and vice versa), because the CUDA toolkit and driver versions must match. Our shared Thor is JetPack 7.x (L4T R38.4.0) — use the r37.0 tag, which is the latest available and works correctly on this host.

Container Image Size on Edge

A full ML container can be 15–30 GB. The dustynv/ultralytics:r37.0 image used in this assignment is ~26 GB. Jetson devices often have limited storage (16–64 GB eMMC/NVMe). This means:

  • You cannot casually pull five different containers.
  • Multi-stage builds and .dockerignore help reduce image size (a sample .dockerignore is sketched after this list).
  • Extending an existing community image (instead of building from scratch) is both faster and smaller.
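
A minimal .dockerignore sketch for a project shaped like this one (the entries are illustrative, not prescribed by the assignment — adjust to your actual repo contents):

# Keep large or irrelevant files out of the Docker build context
.git
__pycache__
logs*
data/*.mp4
*.pt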

Dockerfile: FROM, the Most Important Line

When you write FROM dustynv/ultralytics:r37.0, you inherit everything in that community image — PyTorch, CUDA, OpenCV, ultralytics — for free. Your Dockerfile only adds your application code on top:

Community image (~26 GB):  CUDA, cuDNN, PyTorch, OpenCV, ultralytics
         ↓
Your layer (< 10 MB):      countCars.py, config, sample data
         ↓
Final image (~26 GB):      Ready to deploy

If you tried FROM ubuntu:22.04 and installed everything yourself, you would spend hours building and end up with a larger, less optimized image.
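
To make the layering concrete, here is a hedged Dockerfile sketch that extends the community image (paths follow the repository layout below; the skeleton in docker/Dockerfile is the authoritative structure for TODO 2):

# Inherit CUDA, cuDNN, PyTorch, OpenCV, and ultralytics from the community image
FROM dustynv/ultralytics:r37.0

# Run everything from /app inside the container
WORKDIR /app

# Add only your application code on top: a thin, fast-to-rebuild layer
COPY scripts/ /app/scripts/

# Run python3 by default; script and arguments are appended at docker run time
ENTRYPOINT ["python3"]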

Docker Compose for Orchestration

docker-compose.yml lets you define multi-container applications declaratively. Even for a single service, it is useful because it captures the exact docker run flags (runtime, volumes, environment) in a file that can be version-controlled and shared:

# Instead of remembering this every time:
# docker run --runtime=nvidia -v ./data:/app/data -v ./logs:/app/logs ...

# You write it once in docker-compose.yml and run:
# docker compose up
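
A hedged single-service sketch (the service name and paths are illustrative; you will complete the real file in TODO 5):

services:
  carcounter:
    image: carcounter-YOURNAME:latest
    build:
      context: ..
      dockerfile: docker/Dockerfile
    runtime: nvidia                        # GPU access (Compose v2 syntax)
    volumes:
      - /opt/CarCounter/data:/app/data:ro  # shared videos, read-only
      - ../logs-YOURNAME:/app/logs         # your output directory
    command: ["scripts/countCars.py", "--video", "/app/data/niu00.mp4", "--output", "/app/logs"]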

Repository Structure

carcounter/
├── README.md                     ← You are here
├── DOCKER_SETUP.md               ← Docker + NVIDIA runtime setup guide
├── EXAMPLES.md                   ← Diagrams and visual explanations
├── requirements.txt              ← Python dependencies (for reference)
├── .gitignore
│
├── data/
│   ├── trafficVideoManifest.json    ← Describes expected sample video
│   └── downloadSample.sh           ← Downloads a sample traffic video
│
├── scripts/
│   ├── countCars.py                 ← Vehicle counting pipeline (TODOs 3a–3d)
│   └── benchmarkEdge.py            ← Model-size benchmark (TODOs 4a–4d)
│
├── docker/
│   ├── Dockerfile                   ← Container build recipe (TODO 2)
│   └── docker-compose.yml          ← Orchestration config (TODO 5)
│
├── reflection.txt                   ← Reflection questions (TODO 6)
└── logs/                            ← Output directory (git-ignored)

Your Tasks

TODO 1 — Verify Docker & NVIDIA Container Runtime

Follow DOCKER_SETUP.md to verify (or install) Docker and the NVIDIA container runtime on your Jetson device. By the end of this step you should be able to run:

# Verify Docker
docker --version

# Verify NVIDIA runtime (Thor = CUDA 13.0)
docker run --rm --runtime=nvidia nvidia/cuda:13.0.0-base-ubuntu22.04 nvidia-smi

And see the Thor GPU listed in the nvidia-smi output.

Verification:

docker info | grep -i runtime
# Should show "nvidia" in the list of runtimes
# NOTE: On the shared device, the default runtime is "nvidia" (set by the instructor).
# The --runtime=nvidia flag is optional here but good practice for portability.

# Confirm the base image is already cached (pre-pulled by instructor):
docker images | grep ultralytics
# Should show: dustynv/ultralytics   r37.0   ...   ~26 GB

TODO 2 — Complete the Dockerfile

Open docker/Dockerfile. It contains a skeleton with comments explaining each section. Your job:

  1. Choose the right FROM image and tag for Thor (r37.0).
  2. Set the working directory.
  3. Copy application files into the container.
  4. Install any additional pip packages needed.
  5. Set the default command.

Note on Step 4: The base image (dustynv/ultralytics:r37.0) already includes all packages listed in requirements.txt — ultralytics, OpenCV, and NumPy are pre-installed. Check whether anything is actually missing before adding a RUN pip3 install line. If everything is already provided, you can skip this step entirely.
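
One quick way to check is to import the packages inside the base image (the --entrypoint override makes this work regardless of the image's default entrypoint):

docker run --rm --runtime=nvidia --entrypoint python3 dustynv/ultralytics:r37.0 \
    -c "import ultralytics, cv2, numpy; print('all preinstalled')"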

Verification (use YOUR name in the tag):

docker build -t carcounter-YOURNAME:latest -f docker/Dockerfile .

# Test that ultralytics is available inside the container.
# Do NOT prepend "python3" to the command — the ENTRYPOINT already handles that.
# The correct command is:
docker run --rm --runtime=nvidia carcounter-YOURNAME:latest -c "import ultralytics; print('OK')"

# Common mistake: if you accidentally run "docker run ... python3 -c ...",
# you will get an error like "can't open file '/app/python3'" because the
# ENTRYPOINT (python3) receives "python3" as a filename argument.
# To fix this, simply omit "python3" from the command, or override the
# entrypoint explicitly:
# docker run --rm --runtime=nvidia --entrypoint python3 carcounter-YOURNAME:latest -c "import ultralytics; print('OK')"

TODO 3 — Implement Vehicle Counting (scripts/countCars.py)

The pipeline script has four functions for you to implement:

TODO   Function           Purpose
3a     resolveDevice()    Map "cpu"/"cuda"/"auto" to a YOLO device string
3b     loadModel()        Load a YOLOv8 model
3c     detectVehicles()   Run YOLO on one frame, filter for vehicle classes
3d     processVideo()     Loop through video frames, collect per-frame counts

Everything else (argument parsing, result saving, frame annotation) is provided.
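
To make the expected shape concrete, here is a hedged sketch of 3a and 3c using the standard ultralytics API (COCO class IDs 2/3/5/7 correspond to car/motorcycle/bus/truck; the names and details are illustrative, so follow the docstrings in countCars.py):

import torch
from ultralytics import YOLO

# COCO vehicle class IDs: 2 = car, 3 = motorcycle, 5 = bus, 7 = truck
VEHICLE_CLASSES = {2, 3, 5, 7}

def resolveDevice(device: str) -> str:
    """TODO 3a: map 'cpu'/'cuda'/'auto' to a YOLO device string."""
    if device == "auto":
        return "cuda" if torch.cuda.is_available() else "cpu"
    return device

def detectVehicles(model: YOLO, frame, device: str) -> list:
    """TODO 3c: run YOLO on one frame, keep only vehicle detections."""
    results = model.predict(frame, device=device, verbose=False)
    boxes = results[0].boxes
    # boxes.cls holds one class index per detection
    return [box for box in boxes if int(box.cls) in VEHICLE_CLASSES]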

Verification:

# Run inside your container:
docker run --rm --runtime=nvidia \
    -v /opt/CarCounter/data:/app/data:ro -v $(pwd)/logs-YOURNAME:/app/logs \
    carcounter-YOURNAME:latest \
    scripts/countCars.py --video /app/data/niu00.mp4 --output /app/logs --device auto

# Expected output in logs-YOURNAME/:
#   carCounts.csv      — per-frame vehicle counts
#   summary.json       — aggregate statistics
#   timing.json        — performance metrics
#   detection_*.png    — annotated sample frames

Note: YOLO model weights (e.g., yolov8n.pt) are downloaded from GitHub on the first run inside the container. Since the container filesystem is ephemeral, this download happens on every docker run. The downloads are small and fast (~6 MB for yolov8n), but will fail without internet access. To avoid repeated downloads, you can pre-download models during docker build by adding a RUN layer, or mount a persistent volume for the model cache directory.
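
If the repeated download becomes a problem, both options can be sketched as follows (the weight paths here are assumptions — verify where the file actually lands in your container first):

# Option 1 (Dockerfile): bake the weights into the image at build time.
# YOLO('yolov8n.pt') downloads the file if it is not already present.
RUN python3 -c "from ultralytics import YOLO; YOLO('yolov8n.pt')"

# Option 2 (docker run): mount a persistent host directory at the location
# where the weights are saved, so the first run's download is reused, e.g.:
#   -v $(pwd)/models-YOURNAME:/app/models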


TODO 4 — Implement Edge Benchmarking (scripts/benchmarkEdge.py)

This script compares different YOLO model sizes on your Jetson to find the right accuracy–speed tradeoff for edge deployment.

TODO   Function            Purpose
4a     warmupModel()       Run warmup inference with proper GPU sync
4b     benchmarkModel()    Time inference across N frames, compute FPS
4c     generateReport()    Format a human-readable comparison table
4d     main()              Parse args, benchmark each model, save results
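
As with TODO 3, here is a hedged sketch of the warmup-and-timing pattern for 4a and 4b (signatures are illustrative; follow the docstrings in benchmarkEdge.py):

import time
import torch
from ultralytics import YOLO

def warmupModel(model: YOLO, frame, device: str, iterations: int = 3) -> None:
    """TODO 4a: absorb one-time CUDA setup before timing anything."""
    for _ in range(iterations):
        model.predict(frame, device=device, verbose=False)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work to finish

def benchmarkModel(model: YOLO, frames, device: str) -> float:
    """TODO 4b: time inference across N frames, return FPS."""
    start = time.perf_counter()
    for frame in frames:
        model.predict(frame, device=device, verbose=False)
    if device == "cuda":
        torch.cuda.synchronize()  # GPU work is asynchronous; sync before stopping the clock
    return len(frames) / (time.perf_counter() - start)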

Verification:

docker run --rm --runtime=nvidia \
    -v /opt/CarCounter/data:/app/data:ro -v $(pwd)/logs-YOURNAME:/app/logs \
    carcounter-YOURNAME:latest \
    scripts/benchmarkEdge.py --video /app/data/niu00.mp4 --output /app/logs

# Expected output in logs-YOURNAME/:
#   benchmarkReport.txt    — human-readable comparison
#   benchmarkResults.json  — structured timing data

TODO 5 — Complete docker-compose.yml

Open docker/docker-compose.yml. Complete the service definition so that docker compose up builds and runs the entire pipeline:

  • Build from the Dockerfile in this directory.
  • Enable NVIDIA GPU runtime.
  • Mount data/ and logs-YOURNAME/ as volumes.
  • Pass the video path and output directory as command arguments.
  • Set the image name to carcounter-YOURNAME:latest.

GPU access in Compose: Use runtime: nvidia in your service definition. This is the standard approach for Docker Compose v2 (which is what this device runs). You may see references to an alternative deploy.resources.reservations block — that syntax is for Compose v3+ / Docker Swarm mode and is not needed here.

Verification:

cd docker
STUDENT=YOURNAME docker compose up --build
# Should process the video and write results to ../logs-YOURNAME/

TODO 6 — Reflection

Fill out reflection.txt with your real observations and measurements.


Evaluation Criteria

Criterion                                                     Weight
Docker + NVIDIA runtime verified (nvidia-smi in container)    10%
Dockerfile builds and runs correctly                          20%
countCars.py produces correct vehicle counts                  25%
benchmarkEdge.py compares model sizes with valid timings      20%
docker-compose.yml orchestrates full pipeline                 10%
Reflection with real measurements and analysis                15%

Tips

  • Always check your JetPack version first. Every container choice depends on it. On this device: JetPack 7.x, L4T R38.4.0, CUDA 13.0.
  • Start small. Get docker run --runtime=nvidia working before writing the Dockerfile.
  • Include --runtime=nvidia for portability. The shared device defaults to the nvidia runtime, but other devices may not. Including the flag explicitly is good practice and ensures GPU access everywhere.
  • Read the community docs. The jetson-containers README has a compatibility matrix and usage examples.
  • Watch image sizes. Run docker images to see how much space you are using. The base image alone is ~26 GB.
  • Use --maxFrames 100 during development to speed up iteration.
  • GPU warmup matters. The first inference on the GPU includes a one-time CUDA setup. See EXAMPLES.md for details.

Additional Resources

  • Docker Documentation (Getting Started). Official Docker documentation covering images, containers, Dockerfiles, volumes, and Compose. This is the authoritative reference for all Docker concepts used in this assignment, from writing your first Dockerfile to orchestrating services with docker-compose.yml.
  • NVIDIA Container Toolkit Documentation. Official NVIDIA documentation for enabling GPU access inside Docker containers. Covers installation, runtime configuration, and troubleshooting for the nvidia-container-runtime that makes --runtime=nvidia work on Jetson and desktop GPUs.
  • jetson-containers (GitHub). Community project by Dustin Franklin (NVIDIA) providing pre-built Docker images for Jetson devices. Includes a compatibility matrix mapping JetPack versions to container tags, build instructions, and a catalog of available ML framework containers (PyTorch, ultralytics, TensorFlow, etc.).
  • Ultralytics YOLOv8 Documentation. Official documentation for the ultralytics library and YOLOv8 models. Covers model loading, inference, detection result parsing, export to TensorRT, and deployment on edge devices. This is the primary reference for understanding the YOLO API used in countCars.py.
  • NVIDIA Jetson Developer Guide. Official NVIDIA documentation covering Jetson hardware platforms, JetPack SDK components, CUDA toolkit versions, and system configuration. Essential for understanding the relationship between JetPack version, L4T version, and CUDA version that determines which container tags to use.
  • Docker Compose Specification. Official reference for docker-compose.yml syntax, including service definitions, volume mounts, runtime configuration, environment variables, and the deploy/resources section for GPU access. Use this when writing and debugging your docker-compose.yml in TODO 5.
  • PyTorch CUDA Semantics. Official PyTorch documentation explaining asynchronous GPU execution, torch.cuda.synchronize(), device placement, and memory management. This is critical for understanding why GPU warmup and synchronization are required for accurate benchmarking in benchmarkEdge.py.