Edge AI workloads rely heavily on GPU acceleration for neural network inference. Unlike datacenter GPUs (A100, H100), NVIDIA Jetson devices have integrated GPUs with a unified memory architecture, in which the GPU and CPU share the same physical RAM.

You will:

  • Use tegrastats to query GPU metrics
  • Use jtop to monitor GPU utilization visually
  • Collect GPU metrics during a synthetic workload
  • Distinguish between GPU utilization and memory utilization
  • Identify whether a workload is GPU-bound or memory-bound

These skills are essential for optimizing edge AI inference pipelines and understanding GPU bottlenecks.


Objective and Expected Learning Outcomes

By completing this assignment, you will be able to:

  1. Use tegrastats to query GPU metrics on Jetson devices.
  2. Monitor GPU utilization with jtop during workloads.
  3. Collect GPU metrics programmatically using Python.
  4. Distinguish between GPU compute utilization and memory bandwidth utilization.
  5. Identify GPU bottlenecks in edge inference workloads.
  6. Understand unified memory architecture on edge SoCs.

Edge Devices

This lab must be completed on an NVIDIA Jetson device:

  • Jetson Nano (4GB or 2GB)
  • Jetson Xavier NX
  • Jetson AGX Xavier
  • Jetson AGX Orin
  • Jetson Orin Nano
  • Jetson Orin NX
  • Jetson Thor (if available)

Your results will vary with the number of CUDA cores, GPU clock speeds, and memory bandwidth of your device.


What You Are Given

Your repository includes:

  • scripts/monitorGPU.py (has TODOs for GPU monitoring)
  • scripts/stressGPU.py (GPU workload generator - complete, no edits needed)
  • requirements.txt (placeholder)
  • reflection.txt (reflection template)

Rules

  • Follow course guidelines about working on the development branch!
  • Do all work on your Jetson device (Nano, Orin, Thor, etc.).
  • You must commit logs/GPUMetrics.csv.
  • You must commit logs/GPUMetrics.png (visualization).
  • Do not commit .venv.
  • Ensure your code handles missing GPU data gracefully (some metrics not available on all Jetsons).

Prerequisites

Your instructor has pre-installed system tools (jtop, tegrastats). You will set up your Python environment.

Step 1: Verify System Tools

# Verify system tools are installed
jtop --version
tegrastats    # streams continuously; press Ctrl+C to stop

If these fail, contact your instructor.

Step 2: Create Python Virtual Environment

# Create virtual environment
python3 -m venv .venv

# Activate it
source .venv/bin/activate

# Upgrade build tools first
pip install --upgrade pip setuptools wheel

# Install dependencies
pip install -r requirements.txt

Step 3: Verify Python Packages

python3 -c "from jtop import jtop; print('jtop OK')"
python3 -c "import matplotlib; print('matplotlib OK')"
python3 -c "import numpy; print('numpy OK')"

Instructions

Follow the steps below carefully and in order.


Part A: Verify GPU Access

Before collecting metrics, verify your Jetson GPU is accessible.

Step A1: Check tegrastats

tegrastats

tegrastats prints one status line per second (press Ctrl+C to stop). You should see fields including:

  • GR3D_FREQ: GPU (3D engine) utilization
  • RAM: memory usage (shared between the CPU and GPU)
  • Temperatures (e.g., GPU@44C; may not be reported on all models)

Note: Some fields may show [N/A] on Jetson devices due to unified memory architecture.
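If you later want to script against tegrastats directly, its output is plain text and can be parsed with a regular expression. A minimal sketch (the GR3D_FREQ format shown is typical, but field names and suffixes vary by JetPack version):

```python
import re

def parseGr3d(line: str):
    """Extract GPU utilization (%) from one tegrastats line, or None if absent."""
    # GR3D_FREQ reports the GPU 3D engine load, e.g. "GR3D_FREQ 45%";
    # some JetPack versions append the clock, e.g. "GR3D_FREQ 45%@1300"
    match = re.search(r'GR3D_FREQ (\d+)%', line)
    return int(match.group(1)) if match else None

# Example line (abbreviated from real tegrastats output)
line = "RAM 2678/7772MB (lfb 4x1MB) GR3D_FREQ 45%@1300 GPU@44C"
print(parseGr3d(line))  # 45
```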

Step A2: Check jtop GPU page

jtop

Press 2 or navigate to the GPU page. You should see:

  • GPU utilization percentage
  • GPU frequency (current and max)
  • GPU memory usage (shared with system RAM)

Press q to quit.


Part B: Observe GPU Metrics Interactively

Step B1: Observe idle GPU

While jtop is running with no workloads:

  • Note idle GPU% (should be 0% or very low)
  • Note GPU frequency (may be throttled down when idle)
  • Note memory usage

Step B2: Create a GPU workload

Open a second terminal and run:

python3 scripts/stressGPU.py --duration 30 --intensity medium

Observe in jtop:

  • Does GPU% increase to 80-100%?
  • Does GPU frequency increase to maximum?
  • Does temperature rise (if data is available)?
  • Does memory usage increase?

This demonstrates what GPU-bound workloads look like.


Part C: Complete the GPU Monitoring Script

Open scripts/monitorGPU.py. There are three TODOs.

TODO 1: Collect GPU metrics

In the collectGpuMetrics function, extract GPU data using the provided helper:

Note: A helper function getGpuMetrics(stats, gpu) is provided in the starter code. This handles differences between Jetson platforms (older Jetsons, Orin, and Thor).

# TODO: Extract GPU metrics using the helper
gpuUtil, gpuFreq, gpuFreqMax = getGpuMetrics(jetson.stats, jetson.gpu)
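You do not need to write the helper yourself, but its defensive pattern is worth understanding. A hypothetical sketch (the key names 'GPU', 'freq', 'cur', and 'max' are assumptions based on the jetson-stats API; the starter code's actual implementation may differ):

```python
def getGpuMetricsSketch(stats, gpu):
    """Illustrative only: pull utilization and frequency with safe defaults.

    `stats` is assumed to be a flat dict like jetson.stats (e.g. {'GPU': 42.0});
    `gpu` a nested dict like jetson.gpu. Missing keys fall back to 0 so the
    script keeps running on Jetson models that do not expose every metric.
    """
    gpuUtil = stats.get('GPU', 0) or 0
    # jetson.gpu is keyed by GPU name; take the first entry if present
    gpuEntry = next(iter(gpu.values()), {}) if gpu else {}
    freq = gpuEntry.get('freq', {})
    gpuFreq = freq.get('cur', 0) / 1000      # jetson-stats reports kHz; convert to MHz
    gpuFreqMax = freq.get('max', 0) / 1000
    return gpuUtil, gpuFreq, gpuFreqMax

# Degrades gracefully when data is missing:
print(getGpuMetricsSketch({}, {}))  # (0, 0.0, 0.0)
```

The important habit for this lab is the `.get()` calls with defaults, which is exactly what keeps the script alive on Jetson models with limited metrics.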

TODO 2: Store sample data

# TODO: Create sample dictionary
# Required keys: timestamp, gpuUtil, gpuFreq, gpuFreqPct
sample = {
    'timestamp': timestamp,
    'gpuUtil': gpuUtil,            # GPU utilization (0-100%)
    'gpuFreq': gpuFreq,            # Current frequency (MHz)
    'gpuFreqPct': (gpuFreq / gpuFreqMax * 100) if gpuFreqMax > 0 else 0
}
samples.append(sample)
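If the starter code leaves the CSV writing to you, csv.DictWriter pairs naturally with this dictionary shape. A sketch (the output filename here is illustrative; the real script writes to the path given by --output):

```python
import csv

def writeSamples(samples, path):
    """Write collected sample dicts to CSV with a header row."""
    fieldnames = ['timestamp', 'gpuUtil', 'gpuFreq', 'gpuFreqPct']
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(samples)

# Hypothetical single-sample example
writeSamples([{'timestamp': 0.0, 'gpuUtil': 0.0, 'gpuFreq': 114, 'gpuFreqPct': 5.2}],
             'logs-demo.csv')
```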

TODO 3: Create visualization

In the plotGpuMetrics function:

# TODO: Plot GPU utilization and frequency
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

# Plot GPU utilization
ax1.plot(timestamps, gpuUtils, 'g-', linewidth=2, label='GPU Util')
ax1.axhline(y=meanUtil, color='r', linestyle='--', label=f'Mean: {meanUtil:.1f}%')
ax1.set_ylabel('GPU Utilization (%)')
ax1.set_title(f'GPU Performance - {platformName}')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_ylim([0, 100])

# Plot GPU frequency percentage
ax2.plot(timestamps, freqPcts, 'b-', linewidth=2, label='GPU Freq %')
ax2.axhline(y=meanFreq, color='r', linestyle='--', label=f'Mean: {meanFreq:.1f}%')
ax2.set_xlabel('Time (seconds)')
ax2.set_ylabel('GPU Frequency (% of max)')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_ylim([0, 100])

# Save the figure so logs/GPUMetrics.png is produced
# (adjust the path if your starter code derives it from --output)
plt.tight_layout()
fig.savefig('logs/GPUMetrics.png', dpi=150)

Part D: Run the Monitoring Script

Step D1: Collect baseline (idle) metrics

python3 scripts/monitorGPU.py --duration 30 --interval 1.0 --output logs/GPUMetrics.csv

This collects 30 seconds of idle GPU data.

Step D2: Verify output

head -n 5 logs/GPUMetrics.csv

Expected format:

timestamp,gpuUtil,gpuFreq,gpuFreqPct
0.0,0.0,114,5.2
1.0,2.1,114,5.2

Step D3: Check visualization

ls -lh logs/GPUMetrics.png

The plot should show GPU utilization and frequency over time.


Part E: Collect Under GPU Load

Run the monitoring script while simultaneously running a GPU workload:

Terminal 1:

python3 scripts/monitorGPU.py --duration 30 --interval 1.0 --output logs/GPUMetricsLoad.csv

Terminal 2 (start within 5 seconds):

python3 scripts/stressGPU.py --duration 20 --intensity medium

This captures GPU metrics during a compute-intensive workload.

Compare:

head -n 10 logs/GPUMetrics.csv       # Idle
head -n 10 logs/GPUMetricsLoad.csv   # Under load

You should see much higher GPU utilization and frequency under load.


Part F: Analysis and Reflection

Complete reflection.txt with:

  • Platform name.
  • GPU name.
  • Max GPU frequency.
  • Idle GPU average.
  • Load GPU average.
  • Observations during stress.
  • Reflection on identifying GPU bottlenecks.
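For the idle and load averages, you can compute them directly from your CSVs rather than reading them off the plot. A small sketch assuming the CSV format shown in Part D:

```python
import csv

def meanColumn(path, column):
    """Mean of one numeric column in a metrics CSV."""
    with open(path, newline='') as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return sum(values) / len(values) if values else 0.0

# e.g. meanColumn('logs/GPUMetrics.csv', 'gpuUtil') for the idle average,
#      meanColumn('logs/GPUMetricsLoad.csv', 'gpuUtil') for the load average
```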

Part G: Submission

Verify Required Files

ls -lh logs/
ls -lh scripts/monitorGPU.py
ls -lh reflection.txt
ls -lh requirements.txt

Required:

  • logs/GPUMetrics.csv
  • logs/GPUMetrics.png
  • scripts/monitorGPU.py
  • reflection.txt

Optional but recommended:

  • logs/GPUMetricsLoad.csv

On development Branch: Commit and Push

All work for this lab must be committed to your development branch, not directly on main.

  1. Verify you are on the development branch:

git branch

If needed:

git checkout development

  2. Stage your completed files. You may run git add multiple times as you work:

git add scripts/monitorGPU.py
git add reflection.txt
git add logs/GPUMetrics.csv logs/GPUMetrics.png

  3. Commit your changes once the lab is complete:

git commit -m "Complete Lab 2: GPU Performance Measurement."

  4. Push your development branch to GitHub:

git push origin development

  5. Open a pull request from development to main on GitHub. This pull request is your official submission and must follow the course assignment guidelines.

Do not merge the pull request yourself unless explicitly instructed.


Submission Requirements

To receive credit, you must:

  1. Complete scripts/monitorGPU.py with all TODOs implemented.
  2. Generate logs/GPUMetrics.csv with at least 30 seconds of data.
  3. Generate logs/GPUMetrics.png visualization with both utilization and frequency plots.
  4. Complete reflection.txt with platform info and analysis.
  5. Commit and push all required files.

Evaluation Criteria

You will receive full credit if:

  1. scripts/monitorGPU.py runs successfully on a Jetson device.
  2. logs/GPUMetrics.csv contains valid GPU metrics with timestamp, utilization, and frequency.
  3. logs/GPUMetrics.png shows dual-plot visualization (utilization + frequency).
  4. reflection.txt is filled out with a thoughtful analysis of GPU-bound vs memory-bound.
  5. Code handles missing GPU data gracefully (some Jetson models have limited metrics).
  6. .venv is not committed.

Common Issues

“tegrastats: command not found” or “jtop: command not found”

  • Contact your instructor; these are system tools that must be pre-installed.

“Import error: No module named ‘jtop’“

  • Ensure your virtual environment is activated: source .venv/bin/activate
  • Reinstall: pip install jetson-stats

“GPU metrics show [N/A]”

  • Normal on Jetson devices with unified memory architecture
  • Use jtop instead for comprehensive metrics

“GPU utilization always shows 0%”

  • Ensure GPU workload is actually running
  • Check with: jtop and navigate to GPU page
  • Verify CUDA is installed: nvcc --version

“Script works on Orin but not Nano”

  • Different Jetson models have different GPU metrics available
  • Ensure code uses .get() with defaults to handle missing keys

“GPU frequency not scaling up”

  • Check power mode: sudo nvpmodel -q
  • Set to max performance: sudo nvpmodel -m 0
  • Enable jetson_clocks: sudo jetson_clocks

Additional Resources

  • jetson-stats Documentation Official documentation for jetson-stats, including the jtop interactive monitoring tool and programmatic access via Python. This is the primary reference for collecting CPU, GPU, memory, and power metrics on NVIDIA Jetson devices and is well-suited for both interactive exploration and scripted data collection.
  • NVIDIA Jetson Platform Guide Official NVIDIA documentation describing the hardware specifications and architectural details of all Jetson platforms. This resource is useful for understanding differences in CPU core counts, GPU configurations, memory capacity, and performance characteristics across Jetson devices.
  • NVIDIA tegrastats Utility Official NVIDIA documentation for the tegrastats command-line utility, which provides low-level, real-time CPU, GPU, memory, power, and thermal statistics on Jetson devices. This is useful for lightweight monitoring, scripting, and validation when jtop is not available.
  • nvidia-ml-py Documentation Official Python bindings for the NVIDIA Management Library (NVML), which replace the deprecated pynvml package. This library is intended for datacenter and desktop NVIDIA GPUs rather than Jetson platforms, and is useful when collecting GPU utilization and power metrics on systems with discrete GPUs.
  • NVML API Reference Low-level reference documentation for the NVIDIA Management Library. This resource is useful for understanding the underlying GPU metrics exposed through nvidia-ml-py and for advanced monitoring or tooling on non-Jetson systems.
  • CUDA Programming Guide Official CUDA documentation describing the GPU execution model, memory hierarchy, and kernel behavior. This guide helps interpret GPU utilization results and understand how different workloads map to GPU resources.
  • TensorRT Best Practices NVIDIA documentation on optimizing deep learning inference using TensorRT. This resource provides guidance on performance tuning, precision selection, and deployment strategies that can influence GPU utilization and power behavior during inference workloads.

GPU Monitoring Library Note:

For your future work, use the appropriate library for your platform:

  • Jetson devices: Use jetson-stats (jtop)
  • Datacenter/Desktop NVIDIA GPUs: Use nvidia-ml-py (not pynvml, which is deprecated)

If you find older tutorials referencing pynvml, that package is now deprecated.