Building a Temporal CNN for DDoS Detection

Published on 2026-02-18

The capstone started as a straightforward ML project: train a model to detect DDoS attacks. It became something more interesting when I realized the real problem wasn't the model — it was everything that happens before and after the model.

---

The Dataset

BCCC-Cloud-DDoS-2024. 540,000+ network flow records — real DDoS traffic and real benign traffic captured from cloud infrastructure. Not synthetic. Not replayed. Actual attack captures, which means actual noise, actual label ambiguity, actual edge cases.

Packet sizes, inter-arrival times, TCP flag distributions, flow duration statistics, byte ratios, protocol breakdowns. That's the raw representation of a network flow if you extract everything you possibly can.

---

Why Temporal CNN Over LSTM

The obvious choice for sequential network traffic data is LSTM. Most papers use it. Here's why I didn't:

Network flows have variable-length sequences with non-uniform spacing. An LSTM processes these step by step, and on sparse data — flows with few packets but long durations — it either overfits the sparse signal or gets lost in the padding.

A Temporal Convolutional Network uses dilated causal convolutions with fixed receptive fields. It sees the sequence in parallel, the dilation controls how far back it looks, and it's stable on variable-length inputs. For traffic classification where the pattern lives in how packet arrivals cluster over time rather than in long-range dependencies, TCN is the better architectural fit.

Training is also faster. Parallelizable across the sequence dimension, unlike the sequential hidden state updates in LSTM.
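The dilation mechanics are easy to sketch. Here's a minimal pure-NumPy illustration (my own sketch, not the project's model code) of a causal dilated convolution and the receptive-field arithmetic behind a TCN stack:

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """1-D causal convolution: output at t depends only on x[t], x[t-d], x[t-2d], ..."""
    k = len(kernel)
    pad = (k - 1) * dilation  # left-pad so no future samples leak in
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

def receptive_field(kernel_size, dilations):
    """How far back a stack of dilated causal conv layers can see."""
    return 1 + (kernel_size - 1) * sum(dilations)

# A 4-layer stack with kernel size 3 and dilations 1, 2, 4, 8 sees 31 time steps.
print(receptive_field(3, [1, 2, 4, 8]))  # → 31
```

Doubling the dilations adds receptive field exponentially per layer, which is why a shallow TCN can cover the arrival-clustering window without the step-by-step recurrence an LSTM needs.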

---

The Feature Selection Problem

317 features sounds like an asset. It's actually a liability.

Features derived from the same underlying signal — like multiple byte-count metrics — are correlated. Correlated features don't add information; they add noise and inflate model complexity. A model trained on 317 features memorizes the training distribution instead of learning the signal.

I built an automated feature selection pipeline: variance thresholding to remove near-zero-variance features, then mutual information ranking against attack labels, then recursive elimination with cross-validation. Final result: 32 features out of 317. 90% reduction.
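The three stages compose naturally in scikit-learn. A sketch with synthetic stand-in data (the thresholds, estimator, and dimensions here are illustrative assumptions, not the project's actual pipeline):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif, RFECV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))           # stand-in for the 317-feature flow matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in for the attack labels
X[:, -1] = 0.0                           # one dead (near-zero-variance) column

# Stage 1: drop near-zero-variance features.
vt = VarianceThreshold(threshold=1e-3)
X_vt = vt.fit_transform(X)

# Stage 2: keep the top-k features by mutual information with the label.
mi = mutual_info_classif(X_vt, y, random_state=0)
top_k = np.argsort(mi)[::-1][:10]
X_mi = X_vt[:, top_k]

# Stage 3: recursive feature elimination with cross-validation.
rfecv = RFECV(LogisticRegression(max_iter=1000), cv=3)
X_final = rfecv.fit_transform(X_mi, y)
print(X.shape[1], "->", X_final.shape[1])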

What survived: inter-arrival time statistics, SYN/ACK ratio, packets-per-second ramp rate, flow duration, bytes-per-packet distribution. The features that describe how a flow behaves over time — not just what it looks like at a snapshot.

Training efficiency improved by 85%. Detection accuracy held. The 32-feature model generalizes better than the 317-feature model because it's not memorizing noise.

The lesson: feature selection is the work. Not a preprocessing checkbox before the "real" ML begins.

---

Making Alerts Actionable

Detection accuracy at 95%+ sounds like success. It isn't, not if all a SOC analyst gets is an alert that says:

```
ALERT | flow_id=47293 | label=DDoS | confidence=0.94
```

That tells them nothing actionable. Which source IPs? What attack vector? What rate? How long?

I integrated the Gemini API into the post-detection layer. The model produces a classification and confidence. A structured prompt feeds the flow's key feature values into Gemini, which outputs a natural-language explanation:

> "Coordinated SYN flood — 47 source IPs, packet rate 400× baseline, sustained 3-minute window targeting port 443. Consistent with volumetric DDoS. Source IPs show no prior legitimate traffic in session history."

The detection finds the attack. The explanation tells you what to do about it.
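The prompt assembly is the part worth showing; the Gemini call itself is a thin wrapper around it. A minimal sketch (the field names, wording, and `build_alert_prompt` helper are my own illustration, not the project's actual prompt):

```python
def build_alert_prompt(flow: dict) -> str:
    """Turn a flagged flow's key feature values into a structured LLM prompt."""
    lines = [
        "You are assisting a SOC analyst. Explain this flagged network flow",
        "in 2-3 sentences: likely attack vector, scale, and recommended action.",
        "",
    ]
    for key, value in flow.items():
        lines.append(f"{key}: {value}")
    return "\n".join(lines)

# Illustrative values echoing the example alert above.
alert = {
    "label": "DDoS",
    "confidence": 0.94,
    "unique_src_ips": 47,
    "pkts_per_sec_vs_baseline": "400x",
    "dst_port": 443,
    "duration_sec": 180,
}
prompt = build_alert_prompt(alert)
```

A string like this would then be passed to the Gemini client (e.g. a `generate_content` call), and the response becomes the analyst-facing explanation.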

---

What's Left

Tuning the false positive rate on the 32-feature model. High detection with low false positives is the actual benchmark in a real SOC environment, not accuracy on a balanced test set. I'm currently profiling which benign traffic patterns the model misclassifies, and whether those are fixable with better feature engineering or need a different classification threshold per traffic class.
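One way to frame the per-class threshold question: fix a false positive budget on benign traffic and back the threshold out of the benign score distribution. A quick sketch with synthetic scores (`threshold_for_fpr` is my own helper, not project code):

```python
import numpy as np

def threshold_for_fpr(scores, labels, max_fpr=0.01):
    """scores: model confidence a flow is an attack; labels: 1=attack, 0=benign.
    Returns the decision threshold whose FPR on benign flows is ~max_fpr."""
    benign = np.asarray(scores)[np.asarray(labels) == 0]
    # FPR at threshold t is the fraction of benign scores >= t, so take the
    # (1 - max_fpr) quantile of the benign score distribution.
    return float(np.quantile(benign, 1.0 - max_fpr))

rng = np.random.default_rng(1)
scores = np.concatenate([rng.beta(2, 8, 1000), rng.beta(8, 2, 1000)])  # benign, attack
labels = np.concatenate([np.zeros(1000), np.ones(1000)])
t = threshold_for_fpr(scores, labels, max_fpr=0.01)
fpr = np.mean(scores[labels == 0] >= t)
```

Computing one threshold per benign traffic class instead of one global cutoff is the same calculation, just grouped by class before taking the quantile.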

---

Takeaways