Rendering note: The worked examples (in repo EXAMPLES.md) use Mermaid diagrams and LaTeX math blocks. They render natively on GitHub and in VS Code with the appropriate extensions. If diagrams or equations are not rendering correctly, view the files on GitHub.

A municipal water utility operates unmanned pumping stations across the region. Many of these sites have unreliable or expensive connectivity. When a pump fails unexpectedly, the cost is high: emergency dispatches, water service disruptions, and potential equipment damage that could have been caught early.

The utility wants to deploy small AI-capable devices (NVIDIA Jetson) at each station to monitor pump health using sensor telemetry: vibration, motor temperature, and motor current. The system must detect early signs of problems locally and send only alerts and summaries to the cloud, not raw data streams.

Your dataset captures 30 minutes of pump operation at 50 Hz, including three realistic failure scenarios:

  • Cavitation bursts (360-540 s): vapor bubbles form and collapse in the pump impeller, causing intermittent vibration spikes. Left uncaught, cavitation erodes impeller blades over weeks.
  • Bearing wear ramp (840-1200 s): a gradually worsening bearing increases vibration and current draw. Early detection avoids a catastrophic bearing seizure.
  • Overheat step (1440-1620 s): motor temperature rises suddenly, perhaps from a blocked cooling vent or overload. This can trigger thermal shutdown within minutes.

In this assignment, you will build a normal-only anomaly detector for the PumpWatch system using a tiny autoencoder trained on healthy operation data. You will implement windowed feature extraction, train the model, choose a detection threshold, and produce a plot that shows when your detector flags each failure event.


Objective and Expected Learning Outcomes

By completing this assignment, you will be able to:

  • Extract windowed features (mean and standard deviation) from time-series sensor data.
  • Train a tiny autoencoder on normal-only data using PyTorch.
  • Use reconstruction error (MSE) as an anomaly score.
  • Select a detection threshold using percentiles of training scores.
  • Produce a clear anomaly detection plot with ground truth comparison.
  • Explain detection timing, false positives, and false negatives.

What You Are Given

Your repository includes:

  • data/pumpWatchTelemetry_30min_50Hz.csv – 30 minutes of pump telemetry at 50 Hz
  • data/pumpWatchTelemetry_manifest.json – ground truth anomaly intervals and dataset metadata
  • scripts/detectAnomalies.py – starter code with 4 TODOs
  • requirements.txt – Python dependencies
  • reflection.txt – reflection questions to answer after running

Rules

  • Follow course guidelines (on the course website) about working on the development branch.
  • Keep commits small and meaningful, and ensure your pull request description clearly summarizes the changes and intent.
  • Do not commit .venv or any virtual environment directory.

Setup

Create a virtual environment and install dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Dataset

The telemetry CSV contains 50 Hz samples over 30 minutes (~90,000 rows) with three sensor channels: vibration, motor temperature proxy, and motor current proxy.

The manifest defines three anomaly events:

| Event | Start (s) | End (s) | Pattern |
| --- | --- | --- | --- |
| Cavitation bursts | 360 | 540 | Intermittent (2 s on, 6 s off) |
| Bearing wear ramp | 840 | 1200 | Gradual drift |
| Overheat step | 1440 | 1620 | Sudden step |

The recommended training window is 0 to 330 seconds (normal-only, before first anomaly).


Step-by-Step Walkthrough

Part A: Feature Extraction (TODO 1)

Open scripts/detectAnomalies.py and implement windowFeatures().

Each window is a (windowSize, 3) array of sensor values. Compute:

  • Mean of each channel (3 values)
  • Standard deviation of each channel (3 values)

This produces a 6-dimensional feature vector per window.

meanVals = window.mean(axis=0)
stdVals = window.std(axis=0) + 1e-6
feats = np.concatenate([meanVals, stdVals], axis=0).astype(np.float32)
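To see how this function is applied across the whole recording, here is a self-contained sketch of a sliding-window loop over a synthetic (N, 3) array. The window size and stride below are illustrative assumptions, not the starter code's values.

```python
import numpy as np

def windowFeatures(window: np.ndarray) -> np.ndarray:
    """Mean and std of each channel, concatenated into a 6D feature vector."""
    meanVals = window.mean(axis=0)
    stdVals = window.std(axis=0) + 1e-6
    return np.concatenate([meanVals, stdVals], axis=0).astype(np.float32)

# Synthetic stand-in for the telemetry: 1000 samples x 3 channels.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3)).astype(np.float32)

windowSize, stride = 100, 50  # illustrative values only
feats = np.stack([
    windowFeatures(data[i:i + windowSize])
    for i in range(0, len(data) - windowSize + 1, stride)
])
print(feats.shape)  # one 6D feature row per window
```

Each row of `feats` is one window's feature vector, which is exactly the matrix the later normalization and training steps consume.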

Concept Diagram

flowchart LR
    A["Sliding Window<br/>(windowSize × 3)<br/>Raw Sensor Data"]
    B["vibration<br/>Mean + Std"]
    C["motorTempProxy<br/>Mean + Std"]
    D["motorCurrentProxy<br/>Mean + Std"]
    E["Feature Vector<br/>(6 values)<br/>[μ1, μ2, μ3, σ1, σ2, σ3]"]

    A --> B
    A --> C
    A --> D

    B --> E
    C --> E
    D --> E

Numerical Example

Assume windowSize = 4 and 3 sensor channels:

  1. vibration
  2. motorTempProxy
  3. motorCurrentProxy

So one window has the shape (4, 3).

Fake Sensor Window

| time step | vibration | motorTempProxy | motorCurrentProxy |
| --- | --- | --- | --- |
| 1 | 1.0 | 10.0 | 100.0 |
| 2 | 2.0 | 12.0 | 110.0 |
| 3 | 3.0 | 11.0 | 90.0 |
| 4 | 4.0 | 13.0 | 95.0 |

As a NumPy array:

window = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 12.0, 110.0],
    [3.0, 11.0,  90.0],
    [4.0, 13.0,  95.0],
], dtype=np.float32)

Step 1: Mean of Each Channel

Compute the mean across the time dimension (axis=0).

  • Vibration mean:
\[\mu_1 = (1 + 2 + 3 + 4) / 4 = 2.5\]
  • Temperature mean:
\[\mu_2 = (10 + 12 + 11 + 13) / 4 = 11.5\]
  • Current mean:
\[\mu_3 = (100 + 110 + 90 + 95) / 4 = 98.75\]
meanVals = [2.5, 11.5, 98.75]

Step 2: Standard Deviation of Each Channel

NumPy uses population standard deviation by default (ddof=0):

\[\sigma = \sqrt{\frac{1}{N} \sum (x_i - \mu)^2}\]
  • Vibration standard deviation:
\[\sigma_1 = \sqrt{1.25} \approx 1.1180\]
  • Temperature standard deviation:
\[\sigma_2 = \sqrt{1.25} \approx 1.1180\]
  • Current standard deviation:
\[\sigma_3 = \sqrt{54.6875} \approx 7.3951\]
stdVals = [1.1180, 1.1180, 7.3951]

Step 3: Final 6D Feature Vector

The final feature vector is the concatenation:

[ mean(vibration), mean(temp), mean(current),
  std(vibration),  std(temp),  std(current) ]

Numerically:

features =
[2.5, 11.5, 98.75, 1.1180, 1.1180, 7.3951]

Corresponding Code

meanVals = window.mean(axis=0)
stdVals = window.std(axis=0) + 1e-6
feats = np.concatenate([meanVals, stdVals], axis=0).astype(np.float32)
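You can check the worked numbers above directly in NumPy:

```python
import numpy as np

window = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 12.0, 110.0],
    [3.0, 11.0,  90.0],
    [4.0, 13.0,  95.0],
], dtype=np.float32)

meanVals = window.mean(axis=0)            # [2.5, 11.5, 98.75]
stdVals = window.std(axis=0) + 1e-6       # ~[1.1180, 1.1180, 7.3951]
feats = np.concatenate([meanVals, stdVals], axis=0)
print(np.round(feats, 4))
```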

Common Mistakes to Avoid

  • Using axis=1 instead of axis=0. axis=0 computes the mean/std across time steps (down each column). axis=1 would compute across the 3 channels per time step, which is not what you want.

  • Forgetting to concatenate mean and std into a single vector. The output must be a 1D array of length 6, not two separate arrays or a (2, 3) matrix.

  • Omitting + 1e-6 on the standard deviation. Without epsilon, a perfectly constant channel produces std = 0, which causes division-by-zero during normalization in TODO 2.


Part B: Training Data Selection and Normalization (TODO 2)

Select normal-only windows from the pre-anomaly period and normalize features using statistics computed from training windows only.

Training windows must satisfy both:

  • tSec <= trainEndSec
  • isAnomaly == 0

Then:

  1. Build trainFeat from those windows only
  2. Compute mu and sigma from trainFeat
  3. Normalize training windows and all windows using:
trainXn = (trainFeat - mu) / (sigma + 1e-6)
allXn   = (allFeat - mu) / (sigma + 1e-6)

Concept Diagram

flowchart LR
    A["All windows + features<br/>(tSec, isAnomaly, 6D feats)"]
    B["Filter: tSec <= trainEndSec"]
    C["Filter: isAnomaly == 0"]
    D["Training feature matrix<br/>(Ntrain × 6)"]
    E["Compute mu, sigma"]
    F["Normalize ALL windows<br/>(training stats only)"]

    A --> B --> C --> D --> E --> F

Tiny Example: Filtering Training Windows

Imagine your per-window DataFrame looks like this:

| windowId | tSec | isAnomaly |
| --- | --- | --- |
| 0 | 10 | 0 |
| 1 | 20 | 0 |
| 2 | 340 | 0 |
| 3 | 380 | 1 |

If trainEndSec = 330, then training rows must satisfy:

  • tSec <= 330 and isAnomaly == 0

So windows 0 and 1 are training windows. Windows 2 and 3 are excluded from training.
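In pandas, that filter is a single boolean mask. This sketch uses the column names from the example above:

```python
import pandas as pd

windows = pd.DataFrame({
    "windowId": [0, 1, 2, 3],
    "tSec": [10, 20, 340, 380],
    "isAnomaly": [0, 0, 0, 1],
})

trainEndSec = 330
# Both conditions must hold: pre-anomaly time AND labeled normal.
trainMask = (windows["tSec"] <= trainEndSec) & (windows["isAnomaly"] == 0)
trainWindows = windows[trainMask]
print(trainWindows["windowId"].tolist())  # [0, 1]
```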


Tiny Example: Normalization Numbers

Suppose training windows have 6D features:

trainFeat =
[ 2.0, 10.0, 100.0, 1.0, 0.5, 5.0 ]
[ 4.0, 14.0, 120.0, 3.0, 1.5, 7.0 ]

Compute per-feature mean and std:

  • mu = [3.0, 12.0, 110.0, 2.0, 1.0, 6.0]
  • sigma = [1.0, 2.0, 10.0, 1.0, 0.5, 1.0]

Normalize a sample window:

x = [4.0, 10.0, 100.0, 1.0, 1.5, 5.0]

xNorm = (x - mu) / sigma
      = [ 1, -1, -1, -1, 1, -1 ]
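The same arithmetic in NumPy:

```python
import numpy as np

trainFeat = np.array([
    [2.0, 10.0, 100.0, 1.0, 0.5, 5.0],
    [4.0, 14.0, 120.0, 3.0, 1.5, 7.0],
])

mu = trainFeat.mean(axis=0)      # [3, 12, 110, 2, 1, 6]
sigma = trainFeat.std(axis=0)    # [1, 2, 10, 1, 0.5, 1]

x = np.array([4.0, 10.0, 100.0, 1.0, 1.5, 5.0])
xNorm = (x - mu) / (sigma + 1e-6)
print(np.round(xNorm, 3))        # [ 1. -1. -1. -1.  1. -1.]
```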

Common Mistakes to Avoid

  • Do not compute normalization stats from the whole dataset. That leaks anomaly information into training.

  • Always add the epsilon when dividing: (sigma + 1e-6).
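The autoencoder itself is defined for you in the starter code. For intuition, a tiny 6 → 3 → 6 PyTorch model of the kind this assignment uses might look like the following sketch; the layer sizes and activation here are illustrative assumptions, not the starter code's exact architecture:

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Compress the 6D feature vector to a small bottleneck, then reconstruct it."""
    def __init__(self, inDim: int = 6, hiddenDim: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(inDim, hiddenDim), nn.ReLU())
        self.decoder = nn.Linear(hiddenDim, inDim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Reconstruction error per window = mean squared error over the 6 features.
model = TinyAutoencoder()
x = torch.randn(5, 6)                      # 5 normalized feature windows
mse = ((model(x) - x) ** 2).mean(dim=1)    # one anomaly score per window
print(mse.shape)                           # torch.Size([5])
```

Because the model is trained only on normal windows, it reconstructs normal feature vectors well (low MSE) and anomalous ones poorly (high MSE), which is what makes MSE usable as an anomaly score in Part C.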

Part C: Threshold Selection via Training Percentile (TODO 3)

After training, compute reconstruction MSE for training windows. Use a high percentile of training scores as the anomaly threshold.

threshold = float(np.percentile(trainMse, cfg.percentile))
predicted = (allMse > threshold).astype(np.int32)

Concept Diagram

flowchart LR
    A["Training windows (normal only)"]
    B["Autoencoder reconstructs features"]
    C["Compute MSE per window<br/>(trainScores)"]
    D["threshold = percentile(trainScores, 99.5)"]
    E["Predict anomaly if score > threshold"]

    A --> B --> C --> D --> E

Numerical Example: Percentile Threshold

Suppose your training reconstruction errors are:

trainScores = [0.08, 0.10, 0.11, 0.09, 0.12, 0.10, 0.13, 0.09]

A 99.5th percentile threshold will be near the maximum value, meaning:

  • Most normal windows stay below the threshold
  • Only very unusual behavior gets flagged

In code:

threshold = float(np.percentile(trainMse, cfg.percentile))

Prediction rule:

predicted = (allMse > threshold).astype(np.int32)
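Applied to the example scores above:

```python
import numpy as np

trainScores = np.array([0.08, 0.10, 0.11, 0.09, 0.12, 0.10, 0.13, 0.09])

# A 99.5th-percentile threshold lands just below the training maximum of 0.13.
threshold = float(np.percentile(trainScores, 99.5))
print(threshold)

# Score three hypothetical windows against it: [0.10, 0.13, 0.80] -> [0, 1, 1]
allMse = np.array([0.10, 0.13, 0.80])
predicted = (allMse > threshold).astype(np.int32)
print(predicted)
```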

Why This Works

Because training is normal-only, the training score distribution represents:

“How weird normal can look.”

A high percentile says:

“Flag something only if it is worse than almost all normal windows.”

Common Mistakes to Avoid

  • Do not compute the threshold from all windows. It must come from training windows only.

  • Ensure you use the same MSE definition for training and full scoring.


Part D: Save Outputs and Generate Plot (TODO 4)

Save the per-window anomaly scores to CSV and generate the final plot.

The starter code builds outDf with these columns:

| Column | Description |
| --- | --- |
| tSec | Window center time |
| mse | Reconstruction MSE (anomaly score) |
| threshold | Detection threshold (same value for every row) |
| predictedAnomaly | 0 or 1 |
| isAnomaly | Ground truth label |

Save it with outDf.to_csv(cfg.outputCsv, index=False) and call makePlot(outDf, manifest, cfg.outputPlot).

Concept Diagram

flowchart LR
    A["All windows + normalized features"]
    B["Autoencoder reconstruction"]
    C["Score each window (MSE)"]
    D["predicted = (mse > threshold)"]
    E["Save logs/anomalyScores.csv"]
    F["makePlot() -> logs/anomalyDetection.png"]

    A --> B --> C --> D --> E
    D --> F

What Your Output CSV Should Contain


Example rows:

| tSec | mse | threshold | predictedAnomaly | isAnomaly |
| --- | --- | --- | --- | --- |
| 100.0 | 0.11 | 0.30 | 0 | 0 |
| 361.0 | 0.80 | 0.30 | 1 | 1 |
| 900.0 | 0.35 | 0.30 | 1 | 1 |

Tiny Example: Turning Scores into Predictions

If:

  • mse = 0.42
  • threshold = 0.30

Then:

  • predictedAnomaly = 1 because 0.42 > 0.30

That single rule is the entire anomaly detector.
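Putting the pieces together, here is a minimal sketch of building and saving the output table. The column names match the table documented above; the score values are made up for illustration:

```python
import numpy as np
import pandas as pd

tSec = np.array([100.0, 361.0, 900.0])
mse = np.array([0.11, 0.80, 0.35])
threshold = 0.30
isAnomaly = np.array([0, 1, 1])

outDf = pd.DataFrame({
    "tSec": tSec,
    "mse": mse,
    "threshold": threshold,  # scalar is broadcast to every row
    "predictedAnomaly": (mse > threshold).astype(np.int32),
    "isAnomaly": isAnomaly,
})
# outDf.to_csv("logs/anomalyScores.csv", index=False)  # requires logs/ to exist
print(outDf)
```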

Common Mistakes to Avoid

  • Forgetting to create the output directory (logs/)
  • Saving a CSV without tSec (making plots hard to interpret)
  • Generating the plot but not saving it to the required path
  • Never calling makePlot() in the script

Running

mkdir -p logs

python3 scripts/detectAnomalies.py \
  --csv data/pumpWatchTelemetry_30min_50Hz.csv \
  --manifest data/pumpWatchTelemetry_manifest.json \
  --outputCsv logs/anomalyScores.csv \
  --outputPlot logs/anomalyDetection.png

What Your Plot Must Show

Your anomaly detection plot must include:

  • Anomaly score (reconstruction MSE) over time
  • Threshold line
  • Clearly marked predicted anomaly regions
  • Ground truth anomaly intervals from the manifest

A viewer must be able to answer: what happened, when it started, and when your detector flagged it.


Submission

Verify Required Files

Before committing, verify that your script produced the required outputs:

ls -lh logs/
ls -lh scripts/detectAnomalies.py
ls -lh reflection.txt
ls -lh requirements.txt

Required files:

  • logs/anomalyScores.csv
  • logs/anomalyDetection.png
  • scripts/detectAnomalies.py
  • reflection.txt

On development Branch: Commit and Push

All work for this lab must be committed to your development branch, not directly on main.

  1. Verify you are on the development branch:

git branch

If needed:

git checkout development

  2. Stage your completed files:

git add scripts/detectAnomalies.py
git add reflection.txt
git add logs/anomalyScores.csv logs/anomalyDetection.png

  3. Commit your changes once the lab is complete:

git commit -m "Complete PumpWatch anomaly detection."

  4. Push your development branch to GitHub:

git push origin development

  5. Open a pull request from development to main on GitHub. This pull request is your official submission and must follow the course assignment guidelines.

Do not merge the pull request yourself unless explicitly instructed.


Evaluation Criteria

You receive full credit if:

  1. All 4 TODOs are correctly implemented.
  2. The script runs without errors and produces valid outputs.
  3. Your detection plot clearly shows scores, threshold, predictions, and ground truth.
  4. Reflection answers include specific numbers and timestamps.
  5. Your work is committed correctly, and .venv is not committed.

Optional: Try a Second Dataset

Once your detector works on the primary dataset, you can test it on a second telemetry file that contains different failure types. No code changes are needed — just swap the --csv and --manifest arguments:

python3 scripts/detectAnomalies.py \
  --csv data/pumpWatchTelemetry2_30min_50Hz.csv \
  --manifest data/pumpWatchTelemetry2_manifest.json \
  --outputCsv logs/anomalyScores2.csv \
  --outputPlot logs/anomalyDetection2.png

This dataset has the same format (30 min, 50 Hz, same sensor channels) but simulates three different pump problems:

| Event | Start (s) | End (s) | Pattern |
| --- | --- | --- | --- |
| Impeller imbalance | 360 | 540 | Harmonic vibration ramp-up then sustained |
| Seal degradation | 780 | 1020 | Gradual current drop + vibration/temp rise |
| Dry run | 1320 | 1500 | Sharp temp spike + erratic vibration + current drop |

Your detector retrains on the new dataset’s normal period and evaluates against its anomalies. Compare how well it detects these different failure modes versus the original three — do some patterns produce higher scores? Are any harder to catch?

Additional Resources