🖥️ Building a Real Lab: VLANs, DNS Control, and Domestic SLAs
How I built an enterprise-style home lab with VLAN segmentation, conditional DNS forwarding, WireGuard ingress, and observability - and what broke along the way.
For privacy and security reasons, VLAN names and certain domain details have been altered or obfuscated. The architecture, however, is functionally accurate.
Over the past year, I rebuilt my home lab to better reflect the kind of networks I work with regularly - segmented environments, defined ingress points, and enough observability to catch things when they break. This isn’t a post about specs or rack lighting. It’s a practical write-up on what I built, what went wrong, and what turned out to be worth doing properly.
Network Segmentation: More Than Just VLANs
Challenge: I needed to enforce clear boundaries between device types. Media services shouldn't be able to discover IoT devices. Infrastructure systems shouldn't assume the workstation VLAN is trusted. The aim was to simulate the kind of trust separation you’d expect in any reasonably cautious environment.
What I did:
I deployed VLANs across a pair of Cisco Catalyst 9200 switches, trunked via 802.1Q back to a Mikrotik RB5009 acting as the central L3 router and firewall. VLANs were carved out to mirror realistic segmentation you'd see in a small-scale enterprise setup:
VLAN 10 – Infra: Proxmox, hypervisor interfaces, DNS
VLAN 20 – Media: Jellyfin, Sonarr, DLNA, AirPlay
VLAN 30 – IoT: Smart home devices and bridged automations, isolated
VLAN 40 – VPN Ingress: WireGuard zone, no lateral trust
VLAN 50 – Clients: Laptops, mobile devices, trusted clients
VLAN 60 – Management: Switch/router GUIs, SSH access, Proxmox API
VLAN 70 – VoIP: SIP lab setups, isolated call routing
VLAN 80 – Guest: Internet-only, filtered via DNS sinkholing
VLAN 90 – Dev/Test: Scratchpad for short-lived services
VLAN 100 – Backup: NAS sync, snapshot transfers, rsync jobs
VLAN 110 – Honeypot: A sink for traffic that shouldn’t exist, with alerting on packets identified in PCAP captures.
Routing, filtering, and NAT were all handled at the Mikrotik edge with L3 firewalling rather than trusting the switches.
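To make the "L3 at the edge" idea concrete, here's a rough RouterOS 7 sketch of the pattern: allow return traffic, permit specific inter-VLAN flows, and default-deny the rest. Interface names and subnets are illustrative, not the lab's actual values.

```
# Group the VLAN interfaces so the default-deny rule stays short
/interface list add name=VLANS
/interface list member add list=VLANS interface=vlan10-infra
/interface list member add list=VLANS interface=vlan30-iot

/ip firewall filter
add chain=forward connection-state=established,related action=accept comment="allow return traffic"
add chain=forward in-interface=vlan50-clients out-interface=vlan20-media action=accept comment="clients -> media"
add chain=forward in-interface-list=VLANS out-interface-list=VLANS action=drop comment="default deny between VLANs"
```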
What went wrong:
DHCP failover drift: Transitioning from a Vigor3900 to the RB5009 caused issues - stale leases and mismatched gateways lingered because VRRP didn’t cleanly hand off state.
Overhelpful IGMP proxy: mDNS and discovery traffic leaked between VLANs thanks to aggressive IGMP proxy settings. Printers and TVs showed up where they shouldn't have.
Jumbo frame trouble: VLAN 100 suffered packet loss during heavy backup windows. Root cause was older Catalyst stack members choking on unaligned MTUs and exhausting buffer pools.
DNS recursion headaches: Overlapping static records created caching mismatches. Unbound’s recursive logic didn’t play nice with mixed A and CNAME zones, especially under `.lan.beachill.net`.
Silent mis-tagging in Proxmox: Initially, VMs bridged into the wrong VLANs because `bridge-vlan-aware` wasn’t enabled and tagging was inconsistent across trunks.
Fixes:
Enabled `bridge-vlan-aware yes` in Proxmox and reworked bridge configs to respect tagging. VM NICs now bind cleanly to expected trunks (see the interfaces sketch after this list).
The RB5009 now owns DHCP across all VLANs, with centralised lease tracking and logs.
mDNS and LLMNR are now explicitly blocked between VLANs. An old Sky+ HD box was not happy to have its LLDP and mDNS blocked, so it was removed and replaced with a FreeSat box instead.
Cleaned up Unbound zones and introduced a more predictable naming pattern under `.lan.beachill.net` to avoid recursive overlaps.
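For reference, a minimal sketch of the relevant part of a VLAN-aware bridge in Proxmox's /etc/network/interfaces - the NIC name and VLAN range are assumptions, adapt to your hardware:

```
# /etc/network/interfaces (sketch) - VLAN-aware bridge carrying all lab VLANs
auto vmbr0
iface vmbr0 inet manual
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 10-110
```

With this in place, each VM NIC carries an explicit `tag=<vlan>` in its Proxmox config rather than relying on whatever the trunk happens to pass through untagged.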
DNS Architecture: Conditional Forwarding with Control
Goal: Build an internal DNS setup that didn’t leak queries externally, could resolve internal zones using `.lan.beachill.net` (a real domain, internally scoped), and gave visibility into what each VLAN was doing. Nothing fancy - just reliable resolution, clean separation, and auditability.
What I deployed:
AdGuard Home: Handles client-level filtering and logs, with upstream queries forwarded to Unbound. It runs on its own VM, with VLAN-specific access rules.
Unbound: Runs as the recursive resolver and handles all internal DNS logic. It serves `.lan.beachill.net` zones directly via `local-zone` rules.
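As a rough illustration of how the `local-zone` approach can look in unbound.conf - the hostnames and addresses below are placeholders, not the lab's real records:

```
server:
    # Answer the internal zone from local data; unknown names under it return NXDOMAIN
    local-zone: "lan.beachill.net." static
    local-data: "nas01.lan.beachill.net. IN A 10.0.100.10"
    local-data: "jellyfin.lan.beachill.net. IN A 10.0.20.10"
    local-data-ptr: "10.0.100.10 nas01.lan.beachill.net"
```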
Issue #1: Local zones wouldn’t resolve from IoT VLAN.
Unbound was only bound to loopback by default. So IoT VLAN clients hit AdGuard, but the AdGuard → Unbound hop failed. It was an easy fix - bound Unbound to all internal interfaces and explicitly allowed each subnet under `access-control:`.
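The fix amounted to something along these lines in unbound.conf - the subnets shown are placeholders:

```
server:
    # Listen beyond loopback so AdGuard and other VLANs can actually reach us
    interface: 0.0.0.0
    access-control: 127.0.0.0/8 allow
    access-control: 10.0.10.0/24 allow    # Infra
    access-control: 10.0.50.0/24 allow    # Clients
    access-control: 0.0.0.0/0 refuse      # everything else
```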
Issue #2: AdGuard forwarding was slow.
AdGuard started timing out on upstream lookups, particularly during DHCP storms or when media boxes woke up. Running `dig +trace` showed the delay was always in the upstream hop. Switched Unbound to be the direct resolver for critical services (infra, backup VLANs), and kept AdGuard as a filter for general and IoT clients.
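The checks that help localise a slow hop look roughly like this - the resolver IPs are placeholders:

```
dig +trace example.com            # full recursive walk - shows where the delay accumulates
dig @10.0.10.53 example.com       # straight to Unbound; note the "Query time" in the footer
dig @10.0.10.54 example.com       # same name via AdGuard, for comparison
```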
What worked well:
Used `tcpdump` to verify IoT clients were hitting AdGuard on port 53 and nowhere else. Blocked all direct DNS to WAN via the Mikrotik firewall, and confirmed nothing slipped through to 1.1.1.1 or 9.9.9.9.
Set up Grafana dashboards to break down query types and volume per VLAN. The volume of rubbish traffic from Firesticks, TVs, and Alexa devices is staggering. Easily 50x more than regular clients - most of it pointless telemetry or ads!
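A rough version of both checks - the tcpdump filter and a Mikrotik rule dropping direct DNS to WAN. Interface names and addresses are illustrative:

```
# Watch the IoT VLAN for any DNS that isn't heading to AdGuard
tcpdump -ni vlan30 'port 53 and not host 10.0.10.54'

# RouterOS: drop DNS going straight out the WAN from anything except the resolver subnet
# (repeat for tcp/53 and 853 if you also want to catch DoT)
/ip firewall filter add chain=forward out-interface=ether1-wan protocol=udp dst-port=53 \
    src-address=!10.0.10.0/24 action=drop comment="no direct DNS to WAN"
```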
Remote Access: WireGuard with Explicit Routing
Target: Enable remote access to key internal services like Jellyfin, NAS, and Grafana — without falling into the “just VPN to everything” trap. The goal was to apply a basic Zero Trust model: no assumed access, explicit route scopes, and isolation of VPN ingress from trusted VLANs.
Setup:
WireGuard server lives in VLAN 40, which is treated like an external ingress zone. It gets no special treatment internally.
Clients are assigned scoped `AllowedIPs` - only routes relevant to their use case (e.g. media VLAN or infra subnet) are pushed (see the sketch after this list).
NAT rules on the RB5009 only allow specific ports per peer, rate-limited with connection tracking to throttle brute-force attempts or noisy devices.
Peer identity is enforced by public key, and every peer has its own firewall and routing profile — nothing flat, nothing assumed.
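A client-side config following this model might look like the sketch below - keys, the endpoint, and subnets are placeholders:

```
[Interface]
PrivateKey = <client-private-key>
Address = 10.0.40.11/32
DNS = 10.0.10.53                            # internal Unbound only while the tunnel is up

[Peer]
PublicKey = <server-public-key>
Endpoint = vpn.example.net:51820
AllowedIPs = 10.0.20.0/24, 10.0.10.53/32    # media VLAN + internal DNS, nothing else
PersistentKeepalive = 25
```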
Problem:
Clients could connect, but hostname resolution failed - `nas01.lan.beachill.net` and similar names wouldn’t resolve.
Fix:
Pushed internal DNS config via client-specific configs, depending on platform (Travel Firestick, iOS).
Ensured clients used Unbound exclusively while the tunnel was up, and blocked fallback to WAN DNS.
Added logging rules on the Mikrotik to trace any cross-VLAN attempts or DNS leaks. This also doubled as a useful audit layer to detect unexpected access paths.
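The logging layer can be as simple as a couple of RouterOS rules like these, placed after the explicit per-peer accepts - interface and subnet names are illustrative:

```
/ip firewall filter
add chain=forward in-interface=wg-ingress dst-address=10.0.10.0/24 \
    action=log log-prefix="wg->infra " comment="audit: VPN peers touching infra"
add chain=forward in-interface=wg-ingress action=drop comment="default deny from VPN ingress"
```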
This setup now acts as a lightweight Zero Trust enforcement point - no blanket access, no open trust from the tunnel, and no chance of accidentally bridging VPN clients into the wrong VLAN without explicitly defined policy😂
VM Deployment and Observability
Virtualisation stack:
The core virtualisation runs on Proxmox VE, ZFS-backed with ECC memory and hardware passthrough where needed. It's stable, fast, and gives me the flexibility to spin up and tear down services quickly without getting stuck manually managing each one.
Gotchas:
Proxmox doesn’t expose tagged VLANs well via the UI. You’ll need to manually edit `bridge-ports` and explicitly define tag behaviour in `/etc/network/interfaces`. Otherwise, VMs end up untagged or bridged incorrectly - and debugging that is a bit of a pain.
The GitHub Actions runner tried to pull dependencies over random high ports, which were being denied by the Mikrotik’s egress filtering. I ended up defining per-VM outbound rules with specific port and DNS profiles to avoid this silently failing in pipelines.
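The per-VM egress rules follow roughly this shape on RouterOS - the address-list name, VM IP, and port set are assumptions:

```
# Scope the CI runner's egress instead of opening broad high-port ranges
/ip firewall address-list add list=ci-runners address=10.0.90.20
/ip firewall filter
add chain=forward src-address-list=ci-runners protocol=tcp dst-port=22,80,443 action=accept \
    comment="runner: git/https out"
add chain=forward src-address-list=ci-runners action=drop comment="runner: deny the rest"
```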
Monitoring:
Node Exporter + Prometheus for system metrics collection across VMs.
DNS telemetry from AdGuard, streamed into Loki, with dashboards in Grafana to visualise DNS behaviour by VLAN.
Currently testing packet-level mirroring to a TAP VM, for deeper analysis with Wireshark and possibly Suricata later.
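The Prometheus side is nothing exotic - a static scrape job per VM is enough at this scale. A minimal sketch, with placeholder targets:

```
# prometheus.yml (sketch) - node_exporter across the VM fleet, reached via the management VLAN
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets:
          - "pve01.lan.beachill.net:9100"
          - "dns01.lan.beachill.net:9100"
          - "nas01.lan.beachill.net:9100"
```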
Lessons Learned
Default configs will burn you: VLANs, DHCP, and Proxmox bridges all need explicit tagging and tight interface control. Assumptions lead to silent breakage.
DNS is brittle when split across tools: If you're mixing Unbound, AdGuard, and multiple VLANs, verify resolution paths with `dig` and `tcpdump` - not just the browser. Misrouting one record can cause hours of silent failure. If you don't think it's DNS, it's DNS haha.
Mikrotik logs are your friend: They’ll catch things like NAT floods, interface flapping, or rogue services broadcasting where they shouldn't. Enable them early, filter them well.
Treat the lab like production: if it's easy to misconfigure here, it’s even easier at scale. The stakes are a lot more forgiving, though - family unable to watch their anticipated series, rather than co-workers unable to log expenses🤣.
Next Steps
Enable IPv6 and test dual-stack exposure: Introduce IPv6 across core VLANs, validate NAT64/DNS64 behaviour, and test service reachability with fallback detection.
Deploy `k3s` to replace static VMs: Begin migrating less stateful services to containers via `k3s`. Focus on enforcing Kubernetes network policies per VLAN and testing ingress/egress boundaries (a rough policy sketch follows this list).
Side-channel detection lab: Build a testbed using Frida, `tcpdump`, and intentionally noisy payloads to simulate exfil techniques. Monitor for data leakage patterns and validate detection workflows.
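For the k3s step, the likely starting point is a namespace-level default-deny policy that mirrors the VLAN stance - a rough sketch, with an illustrative namespace name:

```
# NetworkPolicy (sketch): deny all ingress to the namespace unless a later policy allows it
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: media
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```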
Closing
Yes, monitoring uptime for Jellyfin has become my new SRE priority - not because of SLA breaches, but because someone in the house needs their "MAFS: Australia" fix without buffering or outages.
This lab isn’t about RGB racks or screenshot-worthy dashboards. It’s about recreating the kind of weird edge cases that production environments love to throw at you, learning how things break under pressure, and stress-testing the tools and patterns I use in actual day jobs.
If you're building something similar, want config snippets, or just want to share your own weird bugs - get in touch, or check out the other posts.