Notes from Security Internships

Published on 2025-10-15

Three internships, three domains: identity and access management, cloud security, network intrusion detection. Each one taught me something about security that I couldn't have learned from a textbook or a lab environment — because real systems are messier, more constrained, and more interesting than anything designed for learning.

---

Identity Automation — iMerit

The problem wasn't technical at the start. It was organizational: Ivanti, Google Workspace, and Active Directory were all supposed to reflect the same state of who works at the company and what they have access to. They didn't. Employees who had left still had active accounts in one system. New hires were provisioned in two of three systems and waiting on the third. Access modifications happened manually, which meant they happened inconsistently.

I built Python pipelines that automated the full identity lifecycle — provision on hire, modify on role change, deprovision on exit — across all three systems via REST APIs. The pipelines didn't just execute actions; they checked consistency first: if the state in one system diverged from the source of truth, the pipeline flagged it before making changes.
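
The consistency check can be sketched in miniature. Everything here is illustrative: the `Identity` type and the diff function are hypothetical stand-ins for the real connectors, and no actual API calls are made. The idea is simply to diff each downstream system against the HR source of truth before acting.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Identity:
    email: str
    active: bool
    role: str

def diff_against_source(source: dict[str, Identity],
                        system: dict[str, Identity],
                        system_name: str) -> list[str]:
    """Return human-readable discrepancies between one system and the source of truth."""
    findings = []
    for email, truth in source.items():
        seen = system.get(email)
        if seen is None:
            findings.append(f"{system_name}: {email} missing (not provisioned)")
        elif seen.active != truth.active:
            findings.append(f"{system_name}: {email} active={seen.active}, expected {truth.active}")
        elif seen.role != truth.role:
            findings.append(f"{system_name}: {email} role={seen.role!r}, expected {truth.role!r}")
    # Accounts that exist downstream but not upstream are the classic offboarding gap.
    for email in system.keys() - source.keys():
        findings.append(f"{system_name}: {email} exists but is not in the source of truth")
    return findings
```

Running this per system before every provisioning action is what turns "the pipeline flagged it" into a concrete, auditable list of discrepancies rather than a silent overwrite.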

Result: Onboarding time cut ~60%. Access errors dropped significantly. More importantly, access state became auditable — you could query it and trust the answer.

What I learned: RBAC looks clean in theory. In practice, it has years of accumulated exceptions, role sprawl, and legacy permissions that nobody wants to touch in case something breaks. Automation forces you to confront the actual state of the system, not the intended state.

---

Cloud Security — Invisbl

The goal was to make AWS IAM align with NIST 800-53 AC controls and the CIS AWS Foundations Benchmark. That means: least privilege, separation of duties, no wildcard permissions in production, MFA everywhere, no root API keys in use.

Mapping the current state to those controls required building detection first. I built pipelines using Python, Lambda, and Elasticsearch that continuously flagged deviations from those controls: wildcard permissions in production policies, root API key usage, and principals operating without MFA.
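
The "no wildcard permissions in production" control is the most direct one to sketch as code. This is a simplified illustration, not the production check: it scans an IAM policy document for Allow statements with wildcard actions or resources, and ignores harder cases like `NotAction` and condition keys.

```python
def wildcard_findings(policy: dict) -> list[str]:
    """Flag Allow statements that grant wildcard actions or resources."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear as a bare object
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = stmt.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources
        # "s3:*" (all actions in a service) and "*" (all actions) both count.
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"statement {i}: wildcard action {actions}")
        if "*" in resources:
            findings.append(f"statement {i}: wildcard resource")
    return findings
```

A check like this is cheap enough to run on every policy change event rather than on a periodic audit schedule.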

The detection pipeline didn't just log events — it enriched them with context: which IAM principal, what action, what resource, when, from where, and whether this behavior had precedent in the account's history.
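
The enrichment step can be sketched as follows. The event fields (`userIdentity.arn`, `eventName`, `eventTime`, `sourceIPAddress`) follow CloudTrail's record shape, but the "precedent" logic here is deliberately reduced to a seen-before set; the real pipeline's history lookup was richer than this.

```python
def enrich(event: dict, history: set[tuple[str, str]]) -> dict:
    """Attach analyst-facing context to a raw CloudTrail-style event."""
    identity = event.get("userIdentity", {})
    principal = identity.get("arn", "unknown")
    action = event.get("eventName", "unknown")
    key = (principal, action)
    enriched = {
        "principal": principal,
        "action": action,
        # Illustrative: pulls one common parameter; real events vary by action.
        "resource": event.get("requestParameters", {}).get("roleName", "n/a"),
        "when": event.get("eventTime"),
        "source_ip": event.get("sourceIPAddress"),
        # Has this principal ever performed this action before?
        "has_precedent": key in history,
    }
    history.add(key)
    return enriched
```

The `has_precedent` flag is the part that turns a log line into a signal: an `AttachRolePolicy` call with no precedent for that principal is worth a human's attention; the hundredth one probably isn't.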

What I learned: Cloud security is fundamentally a logging and detection problem. The attack surface — IAM policies, service roles, cross-account trust — is too large to manually audit. You need continuous visibility. And visibility requires that your logs are structured, centralized, and queryable. If you can't search your logs in under 30 seconds, you don't have visibility; you have a pile of data.
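
What "queryable" means in practice can be sketched as an Elasticsearch-style bool query: "everything this principal did in the last hour," answerable in one request. The index field names (`principal.keyword`, `when`) are illustrative, not the real schema.

```python
def principal_activity_query(principal_arn: str, minutes: int = 60) -> dict:
    """Build an Elasticsearch bool query for one principal's recent activity."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact match on the principal's ARN (keyword sub-field,
                    # assuming the standard text/keyword mapping).
                    {"term": {"principal.keyword": principal_arn}},
                    # Date-math relative window: now minus N minutes.
                    {"range": {"when": {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
        "sort": [{"when": "desc"}],
        "size": 100,
    }
```

If answering that question requires grepping archived files instead of one query like this, the 30-second bar is already lost.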

Also: CIS benchmarks are a floor, not a ceiling. They tell you what not to do. They don't tell you what your threat model actually requires.

---

Network Intrusion Detection — Anna University

This was research, not a production system — but the constraints were real. The dataset was UNSW-NB15: 2.5 million network flow records, 9 attack categories (DoS, exploits, reconnaissance, analysis, backdoors, shellcode, worms, fuzzers, generic). The task was multi-class classification from flow-level features.

I built a hybrid LSTM-CNN architecture. The LSTM captured sequential behavior across packets in a flow; the CNN extracted local feature patterns. Together they handled both the temporal dynamics and the feature interactions that matter for distinguishing, say, a reconnaissance scan from a fuzzing run.
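
The shape of the hybrid can be sketched in PyTorch. This is a minimal illustration, not the research architecture: layer widths are arbitrary, and the 10 output classes (9 attack categories plus normal traffic) are an assumption about how the labels were framed.

```python
import torch
import torch.nn as nn

class HybridLSTMCNN(nn.Module):
    def __init__(self, n_features: int = 18, n_classes: int = 10, hidden: int = 64):
        super().__init__()
        # CNN branch: treat per-timestep features as channels and extract
        # local patterns across the flow, then pool to a fixed-size vector.
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # LSTM branch: sequential behavior across packets in a flow.
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(32 + hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, timesteps, n_features)
        c = self.conv(x.transpose(1, 2)).squeeze(-1)    # (batch, 32)
        _, (h, _) = self.lstm(x)                        # h: (1, batch, hidden)
        return self.head(torch.cat([c, h[-1]], dim=1))  # (batch, n_classes)
```

Concatenating the two branch outputs before the classifier head is what lets the model weigh local feature patterns and temporal dynamics jointly rather than in separate stages.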

The interesting work was feature selection. 42 features after initial processing. Many were correlated — multiple byte count metrics measuring the same underlying signal. I used a two-stage metaheuristic pipeline: Sine Cosine Algorithm for global search across the feature space, then Particle Swarm Optimization for local refinement. Result: 42 → 18 features. False positive rate down 25%. Detection latency improved.
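
The two-stage pipeline can be sketched with a toy objective. Everything below is illustrative: the fitness function rewards a known-informative subset and penalizes extra features instead of scoring an actual classifier on UNSW-NB15, and the population sizes and iteration counts are arbitrary. The structure is the real point: SCA explores positions in [0, 1]^42 globally, PSO refines around SCA's best solution, and thresholding a position at 0.5 yields a binary feature mask.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, POP = 42, 30
INFORMATIVE = set(range(18))  # toy ground truth: features 0-17 carry signal

def to_mask(pos: np.ndarray) -> np.ndarray:
    """Threshold a continuous position into a binary feature mask."""
    return (pos > 0.5).astype(int)

def fitness(mask: np.ndarray) -> float:
    """Toy stand-in for classifier performance: reward signal, penalize noise."""
    chosen = set(np.flatnonzero(mask))
    return len(chosen & INFORMATIVE) - 0.5 * len(chosen - INFORMATIVE)

def sca(iters: int = 60):
    """Sine Cosine Algorithm: global search over the feature-subset space."""
    X = rng.random((POP, N_FEATURES))
    best = max(X, key=lambda p: fitness(to_mask(p))).copy()
    for t in range(iters):
        r1 = 2 * (1 - t / iters)  # exploration amplitude shrinks over time
        for i in range(POP):
            r2 = rng.uniform(0, 2 * np.pi, N_FEATURES)
            r3, r4 = rng.uniform(0, 2, N_FEATURES), rng.random(N_FEATURES)
            step = np.where(r4 < 0.5, np.sin(r2), np.cos(r2))
            X[i] = np.clip(X[i] + r1 * step * np.abs(r3 * best - X[i]), 0, 1)
            if fitness(to_mask(X[i])) > fitness(to_mask(best)):
                best = X[i].copy()
    return X, best

def pso(X, gbest, iters: int = 60, w: float = 0.7, c1: float = 1.5, c2: float = 1.5):
    """Particle Swarm Optimization: local refinement seeded with SCA's output."""
    V = np.zeros_like(X)
    pbest = X.copy()
    for _ in range(iters):
        for i in range(POP):
            V[i] = (w * V[i]
                    + c1 * rng.random(N_FEATURES) * (pbest[i] - X[i])
                    + c2 * rng.random(N_FEATURES) * (gbest - X[i]))
            X[i] = np.clip(X[i] + V[i], 0, 1)
            if fitness(to_mask(X[i])) > fitness(to_mask(pbest[i])):
                pbest[i] = X[i].copy()
            if fitness(to_mask(X[i])) > fitness(to_mask(gbest)):
                gbest = X[i].copy()
    return gbest

swarm, seed = sca()
final_mask = to_mask(pso(swarm, seed))
```

Swapping `fitness` for "train the detector on this subset and score it" is the only change needed to turn the toy into the real experiment; the search machinery is identical either way.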

The insight that stayed with me: SCA and PSO come from unrelated lineages (one driven by oscillating sine and cosine functions, the other by the flocking behavior of swarms in nature), and together they solved a network security feature selection problem better than manual, domain-guided selection. The abstraction of "search over a feature subset space" doesn't care where the search algorithm came from.

What I learned: More features is not better signal. Most features in high-dimensional datasets are correlated noise. The model that trains on 18 carefully selected features generalizes better than the model that trains on all 42, because it's learning the signal instead of memorizing the noise distribution. Feature selection is the work, not a step before the work.

---

The Thread Across All Three

Every one of these problems was fundamentally about knowing the actual state of a system — not its intended state, its documented state, or its theoretical state. The IAM system that's supposed to be clean but has years of drift. The AWS account that's supposed to follow least privilege but has accumulated exceptions. The network that's supposed to be normal but has flows that don't match any known-good behavior.

Security is what happens when the gap between intended state and actual state gets exploited. The job is to close that gap, and to detect when it opens again.