A03 - GPU Performance Measurement
Assignment: GitHub Classroom
Late Policy
- You have until the assigned due date, after that you will receive 0 points.
Edge AI workloads rely heavily on GPU acceleration for neural network inference. Unlike datacenter GPUs (A100, H100), NVIDIA Jetson devices have integrated GPUs with unified memory architecture, where the GPU and CPU share the same RAM.
You will:
- Use tegrastats to query GPU metrics
- Use jtop to monitor GPU utilization visually
- Collect GPU metrics during a synthetic workload
- Distinguish between GPU utilization and memory utilization
- Identify whether a workload is GPU-bound or memory-bound
These skills are essential for optimizing edge AI inference pipelines and understanding GPU bottlenecks.
Objective and Expected Learning Outcomes
By completing this assignment, you will be able to:
- Use tegrastats to query GPU metrics on Jetson devices.
- Monitor GPU utilization with jtop during workloads.
- Collect GPU metrics programmatically using Python.
- Distinguish between GPU compute utilization and memory bandwidth utilization.
- Identify GPU bottlenecks in edge inference workloads.
- Understand unified memory architecture on edge SoCs.
Edge Devices
This lab must be completed on an NVIDIA Jetson device:
- Jetson Nano (4GB or 2GB)
- Jetson Xavier NX
- Jetson AGX Xavier
- Jetson AGX Orin
- Jetson Orin Nano
- Jetson Orin NX
- Jetson Thor (if available)
Your results will vary based on the number of CUDA cores, maximum GPU frequency, and memory bandwidth available on your device.
What You Are Given
Your repository includes:
- scripts/monitorGPU.py (has TODOs for GPU monitoring)
- scripts/stressGPU.py (GPU workload generator - complete, no edits needed)
- requirements.txt (placeholder)
- reflection.txt (reflection template)
Rules
- Follow course guidelines about working on the development branch!
- Do all work on your Jetson device (Nano, Orin, Thor, etc.).
- You must commit logs/GPUMetrics.csv.
- You must commit logs/GPUMetrics.png (visualization).
- Do not commit .venv.
- Ensure your code handles missing GPU data gracefully (some metrics are not available on all Jetsons).
Prerequisites
Your instructor has pre-installed system tools (jtop, tegrastats). You will set up your Python environment.
Step 1: Verify System Tools
# Verify system tools are installed
jtop --version
tegrastats   # streams continuously; press Ctrl+C to stop
If these fail, contact your instructor.
Step 2: Create Python Virtual Environment
# Create virtual environment
python3 -m venv .venv
# Activate it
source .venv/bin/activate
# Upgrade build tools first
pip install --upgrade pip setuptools wheel
# Install dependencies
pip install -r requirements.txt
Step 3: Verify Python Packages
python3 -c "from jtop import jtop; print('jtop OK')"
python3 -c "import matplotlib; print('matplotlib OK')"
python3 -c "import numpy; print('numpy OK')"
Instructions
Follow the steps below carefully and in order.
Part A: Verify GPU Access
Before collecting metrics, verify your Jetson GPU is accessible.
Step A1: Check tegrastats
tegrastats
tegrastats prints a new line of statistics roughly once per second until you stop it with Ctrl+C. You should see output showing:
- GPU name (e.g., “NVIDIA Tegra Orin”)
- GPU utilization
- Memory usage
- Temperature (may not be available)
Note: Some fields may show [N/A] on Jetson devices due to unified memory architecture.
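If you want to capture tegrastats output programmatically rather than reading it in the terminal, a minimal sketch like the one below can pull the GPU utilization field (GR3D_FREQ) out of a single line of output. This is only an illustration; field formatting varies across JetPack releases, and the monitoring script in this lab uses jtop instead.
import re
import subprocess

# Read one line of tegrastats output and parse the GPU utilization field.
# GR3D_FREQ is the GPU load field on Jetson; formatting varies by JetPack version.
proc = subprocess.Popen(["tegrastats"], stdout=subprocess.PIPE, text=True)
line = proc.stdout.readline()
proc.terminate()

match = re.search(r"GR3D_FREQ (\d+)%", line)
if match:
    print(f"GPU utilization: {match.group(1)}%")
else:
    print("GR3D_FREQ field not found; output format may differ on this device")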
Step A2: Check jtop GPU page
jtop
Press 2 or navigate to the GPU page. You should see:
- GPU utilization percentage
- GPU frequency (current and max)
- GPU memory usage (shared with system RAM)
Press q to quit.
Part B: Observe GPU Metrics Interactively
Step B1: Observe idle GPU
While jtop is running with no workloads:
- Note idle GPU% (should be 0% or very low)
- Note GPU frequency (may be throttled down when idle)
- Note memory usage
Step B2: Create a GPU workload
Open a second terminal and run:
python3 scripts/stressGPU.py --duration 30 --intensity medium
Observe in jtop:
- Does GPU% increase to 80-100%?
- Does GPU frequency increase to maximum?
- Does temperature rise (if data is available)?
- Does memory usage increase?
This demonstrates what GPU-bound workloads look like.
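For reference, scripts/stressGPU.py is already complete and needs no edits, but the general idea behind a GPU-bound workload is simply to keep the GPU busy with large compute operations. The sketch below shows one possible approach, assuming a CUDA-enabled PyTorch build is installed on your Jetson (PyTorch is not required for this lab, and this is not necessarily how the provided script works).
import time
import torch

# Keep the GPU busy with repeated large matrix multiplications for ~20 seconds.
# Assumes a CUDA-enabled PyTorch build is available on the device.
a = torch.rand((2048, 2048), device="cuda")
b = torch.rand((2048, 2048), device="cuda")

end = time.time() + 20
while time.time() < end:
    a = torch.matmul(a, b)
    a = a / a.norm()              # keep values bounded between iterations
    torch.cuda.synchronize()      # wait for GPU work so the loop paces itself
print("GPU workload finished")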
Part C: Complete the GPU Monitoring Script
Open scripts/monitorGPU.py. There are three TODOs.
TODO 1: Collect GPU metrics
In the collectGpuMetrics function, extract GPU data using the provided helper:
Note: A helper function getGpuMetrics(stats, gpu) is provided in the starter code. This handles differences between Jetson platforms (older Jetsons, Orin, and Thor).
# TODO: Extract GPU metrics using the helper
gpuUtil, gpuFreq, gpuFreqMax = getGpuMetrics(jetson.stats, jetson.gpu)
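Before filling in TODO 1, it can help to see the raw data jtop exposes. The short sketch below (a minimal illustration, not part of the starter code) opens a jtop session and prints a few samples of jetson.stats and jetson.gpu so you can see which GPU fields your particular platform provides:
from jtop import jtop

# Print a few raw samples so you can see which GPU fields this Jetson exposes.
# jetson.ok() returns True while the jtop service is delivering fresh data.
with jtop() as jetson:
    count = 0
    while jetson.ok() and count < 5:
        print("stats:", jetson.stats)   # flat dictionary of summary metrics
        print("gpu:  ", jetson.gpu)     # detailed per-GPU information
        count += 1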
TODO 2: Store sample data
# TODO: Create sample dictionary
# Required keys: timestamp, gpuUtil, gpuFreq, gpuFreqPct
sample = {
    'timestamp': timestamp,
    'gpuUtil': gpuUtil,    # GPU utilization (0-100%)
    'gpuFreq': gpuFreq,    # Current frequency (MHz)
    'gpuFreqPct': (gpuFreq / gpuFreqMax * 100) if gpuFreqMax > 0 else 0
}
samples.append(sample)
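The starter code likely handles writing the collected samples out to the CSV for you; if you need to do it yourself, a minimal sketch using Python's csv module might look like this (field names taken from the expected format in Part D):
import csv

# Write the collected samples to the output CSV with the expected header row.
def writeCsv(samples, path):
    fieldnames = ["timestamp", "gpuUtil", "gpuFreq", "gpuFreqPct"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(samples)

# Example: writeCsv(samples, "logs/GPUMetrics.csv")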
TODO 3: Create visualization
In the plotGpuMetrics function:
# TODO: Plot GPU utilization and frequency
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
# Plot GPU utilization
ax1.plot(timestamps, gpuUtils, 'g-', linewidth=2, label='GPU Util')
ax1.axhline(y=meanUtil, color='r', linestyle='--', label=f'Mean: {meanUtil:.1f}%')
ax1.set_ylabel('GPU Utilization (%)')
ax1.set_title(f'GPU Performance - {platformName}')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_ylim([0, 100])
# Plot GPU frequency percentage
ax2.plot(timestamps, freqPcts, 'b-', linewidth=2, label='GPU Freq %')
ax2.axhline(y=meanFreq, color='r', linestyle='--', label=f'Mean: {meanFreq:.1f}%')
ax2.set_xlabel('Time (seconds)')
ax2.set_ylabel('GPU Frequency (% of max)')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_ylim([0, 100])
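If your version of the starter code does not already save the figure, something along these lines after the plotting code will produce the required PNG:
# Save the figure as the required PNG (adjust if your starter code already saves it).
plt.tight_layout()
fig.savefig("logs/GPUMetrics.png", dpi=150)
plt.close(fig)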
Part D: Run the Monitoring Script
Step D1: Collect baseline (idle) metrics
python3 scripts/monitorGPU.py --duration 30 --interval 1.0 --output logs/GPUMetrics.csv
This collects 30 seconds of idle GPU data.
Step D2: Verify output
head -n 5 logs/GPUMetrics.csv
Expected format:
timestamp,gpuUtil,gpuFreq,gpuFreqPct
0.0,0.0,114,5.2
1.0,2.1,114,5.2
Step D3: Check visualization
ls -lh logs/GPUMetrics.png
The plot should show GPU utilization and frequency over time.
Part E: Collect Under GPU Load
Run the monitoring script while simultaneously running a GPU workload:
Terminal 1:
python3 scripts/monitorGPU.py --duration 30 --interval 1.0 --output logs/GPUMetricsLoad.csv
Terminal 2 (start within 5 seconds):
python3 scripts/stressGPU.py --duration 20 --intensity medium
This captures GPU metrics during a compute-intensive workload.
Compare:
head -n 10 logs/GPUMetrics.csv # Idle
head -n 10 logs/GPUMetricsLoad.csv # Under load
You should see much higher GPU utilization and frequency under load.
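Optionally, you can quantify the difference instead of eyeballing the first ten rows. A small sketch using numpy (which Step 3 verified is installed) compares the mean utilization of the two runs:
import numpy as np

# Compare mean GPU utilization between the idle and loaded runs.
for name in ("logs/GPUMetrics.csv", "logs/GPUMetricsLoad.csv"):
    data = np.genfromtxt(name, delimiter=",", names=True)
    print(f"{name}: mean GPU util = {data['gpuUtil'].mean():.1f}%")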
Part F: Analysis and Reflection
Complete reflection.txt with:
- Platform name.
- GPU name.
- Max GPU frequency.
- Idle GPU average.
- Load GPU average.
- Observations during stress.
- Reflection on identifying GPU bottlenecks.
Part G: Submission
Verify Required Files
ls -lh logs/
ls -lh scripts/monitorGPU.py
ls -lh reflection.txt
ls -lh requirements.txt
Required:
- logs/GPUMetrics.csv
- logs/GPUMetrics.png
- scripts/monitorGPU.py
- reflection.txt
- logs/GPUMetricsLoad.csv is optional but recommended
On development Branch: Commit and Push
All work for this lab must be committed to your development branch, not directly on main.
- Verify you are on the development branch:
git branch
If needed:
git checkout development
- Stage your completed files. You may run git add multiple times as you work:
git add scripts/monitorGPU.py
git add reflection.txt
git add logs/GPUMetrics.csv logs/GPUMetrics.png
- Commit your changes once the lab is complete:
git commit -m "Complete Lab 2: GPU Performance Measurement."
- Push your development branch to GitHub:
git push origin development
- Open a pull request from development to main on GitHub. This pull request is your official submission and must follow the course assignment guidelines.
Do not merge the pull request yourself unless explicitly instructed.
Submission Requirements
To receive credit, you must:
- Complete scripts/monitorGPU.py with all TODOs implemented.
- Generate logs/GPUMetrics.csv with at least 30 seconds of data.
- Generate logs/GPUMetrics.png visualization with both utilization and frequency plots.
- Complete reflection.txt with platform info and analysis.
- Commit and push all required files.
Evaluation Criteria
You will receive full credit if:
- scripts/monitorGPU.py runs successfully on a Jetson device.
- logs/GPUMetrics.csv contains valid GPU metrics with timestamp, utilization, and frequency.
- logs/GPUMetrics.png shows dual-plot visualization (utilization + frequency).
- reflection.txt is filled out with a thoughtful analysis of GPU-bound vs memory-bound behavior.
- Code handles missing GPU data gracefully (some Jetson models have limited metrics).
- .venv is not committed.
Common Issues
“tegrastats: command not found” or “jtop: command not found”
- Contact your instructor - these are system tools that must be pre-installed
“Import error: No module named ‘jtop’”
- Ensure your virtual environment is activated: source .venv/bin/activate
- Reinstall: pip install jetson-stats
“GPU metrics show [N/A]”
- Normal on Jetson devices with unified memory architecture
- Use jtop instead for comprehensive metrics
“GPU utilization always shows 0%”
- Ensure GPU workload is actually running
- Check with jtop and navigate to the GPU page
- Verify CUDA is installed: nvcc --version
“Script works on Orin but not Nano”
- Different Jetson models have different GPU metrics available
- Ensure code uses .get() with defaults to handle missing keys
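For example, reading a field defensively (the field name below is illustrative) returns a default instead of crashing on models that do not report it:
# Defensive read: returns 0 instead of raising KeyError when a field is missing.
gpuUtil = stats.get("GPU", 0)   # instead of stats["GPU"]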
“GPU frequency not scaling up”
- Check power mode: sudo nvpmodel -q
- Set to max performance: sudo nvpmodel -m 0
- Enable jetson_clocks: sudo jetson_clocks
Additional Resources
- jetson-stats Documentation Official documentation for jetson-stats, including the jtop interactive monitoring tool and programmatic access via Python. This is the primary reference for collecting CPU, GPU, memory, and power metrics on NVIDIA Jetson devices and is well-suited for both interactive exploration and scripted data collection.
- NVIDIA Jetson Platform Guide Official NVIDIA documentation describing the hardware specifications and architectural details of all Jetson platforms. This resource is useful for understanding differences in CPU core counts, GPU configurations, memory capacity, and performance characteristics across Jetson devices.
- NVIDIA tegrastats Utility Official NVIDIA documentation for the tegrastats command-line utility, which provides low-level, real-time CPU, GPU, memory, power, and thermal statistics on Jetson devices. This is useful for lightweight monitoring, scripting, and validation when jtop is not available.
- nvidia-ml-py Documentation Official Python bindings for the NVIDIA Management Library (NVML), which replace the deprecated pynvml package. This library is intended for datacenter and desktop NVIDIA GPUs rather than Jetson platforms, and is useful when collecting GPU utilization and power metrics on systems with discrete GPUs.
- NVML API Reference Low-level reference documentation for the NVIDIA Management Library. This resource is useful for understanding the underlying GPU metrics exposed through nvidia-ml-py and for advanced monitoring or tooling on non-Jetson systems.
- CUDA Programming Guide Official CUDA documentation describing the GPU execution model, memory hierarchy, and kernel behavior. This guide helps interpret GPU utilization results and understand how different workloads map to GPU resources.
- TensorRT Best Practices NVIDIA documentation on optimizing deep learning inference using TensorRT. This resource provides guidance on performance tuning, precision selection, and deployment strategies that can influence GPU utilization and power behavior during inference workloads.
GPU Monitoring Library Note:
For your future work, use the appropriate library for your platform:
- Jetson devices: Use jetson-stats (jtop)
- Datacenter/Desktop NVIDIA GPUs: Use nvidia-ml-py (not pynvml, which is deprecated)
If you find older tutorials referencing pynvml, that package is now deprecated.
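For completeness, here is what a minimal query looks like on a machine with a discrete NVIDIA GPU using the official nvidia-ml-py bindings (installed with pip install nvidia-ml-py; the module is still imported under the name pynvml). On Jetson devices, use jtop instead, as noted above.
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                    nvmlDeviceGetUtilizationRates)

# Query compute and memory utilization of the first discrete NVIDIA GPU.
nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)
util = nvmlDeviceGetUtilizationRates(handle)
print(f"GPU: {util.gpu}%  Memory: {util.memory}%")
nvmlShutdown()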
