Apache CloudStack serves as a robust orchestrator for Infrastructure as a Service (IaaS), where the management of network services such as DHCP and DNS is critical to guest VM lifecycle operations. In a standard CloudStack deployment these services are not centralized in the physical layer; instead, they are pushed to the edge of the virtual network through the Virtual Router (VR). This architectural choice contains broadcast traffic and avoids a single, exhausted central address pool by scoping network services to individual accounts or VPCs. The core problem it solves is bridging the gap between the CloudStack Management Server, which holds IP state in its database, and the guest VM, which requires a network identity the moment it is instantiated. By delegating DHCP and DNS to the VR, CloudStack achieves high concurrency and low latency during the boot sequence. This manual explores the mechanics of the dnsmasq process within the VR and the orchestration logic that synchronizes database state with the active network configuration.
Technical Specifications
| Requirement | Default Port/Operating Range | Protocol/Standard | Impact Level (1-10) | Recommended Resources |
| :--- | :--- | :--- | :--- | :--- |
| DHCP Service | UDP 67 (Server), UDP 68 (Client) | RFC 2131 | 10 | 1 vCPU / 256MB RAM |
| DNS Resolution | UDP/TCP 53 | RFC 1035 | 9 | Low Latency Storage |
| Meta Data Service | TCP 80 | HTTP/1.1 | 8 | Persistent VR Boot |
| Bridge Interface | N/A | IEEE 802.1Q (VLAN) | 10 | 1Gbps+ Throughput |
| Management Link | TCP 3922 | SSH/Proprietary | 9 | Control Plane Access |
The Configuration Protocol
Environment Prerequisites:
To implement the CloudStack DHCP and DNS architecture, the environment must meet specific baseline standards. The primary requirement is a functional CloudStack Management Server version 4.15 or higher, running on a distribution compatible with the cloudstack-common library. Network-wise, the physical switches must support 802.1Q VLAN tagging or VXLAN encapsulation, depending on the chosen isolation method. The hypervisor (KVM, XenServer, or VMware) must have the cloud-managed bridge or distributed virtual switch configured. User permissions must allow the Management Server to execute sudo commands on the hypervisor hosts to manage the life cycle of the System VMs.
Section A: Implementation Logic:
The engineering design of CloudStack relies on a decentralized model. When a Guest VM is created, the Management Server calculates the next available IP within the assigned CIDR. This data is not sent directly to the VM. Instead, the Management Server uses a "push" orchestration model. It identifies the Virtual Router associated with the Guest VM network and transmits a JSON payload containing the MAC address, IP address, and hostname via the hypervisor-specific control channel (e.g., virtio-serial or the link-local network). Inside the VR, a Python-based agent receives this payload and updates the /etc/dhcphosts.txt and /etc/hosts files. The dnsmasq service is then signaled to reload its configuration without dropping current connections. This design keeps DHCP provisioning idempotent; even if the VR is rebooted, the persistent state on the Management Server can re-provision the local configuration files, maintaining high availability without requiring a massive central DHCP pool.
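The VR-side update path can be sketched as a small shell routine. This is an illustrative sketch, not the actual CloudStack agent code: the file paths follow those named above, but the MAC/IP/hostname values are invented, and local working copies stand in for the real files so the sketch runs anywhere.

```shell
#!/bin/sh
# Illustrative sketch of the VR-side update; not the real CloudStack agent.
# On a live VR these would be /etc/dhcphosts.txt and /etc/hosts; local
# copies are used here for demonstration.
DHCP_HOSTS=./dhcphosts.txt
HOSTS_FILE=./hosts
touch "$DHCP_HOSTS" "$HOSTS_FILE"

# These values normally arrive in the JSON payload from the Management Server.
MAC="02:00:4c:7f:00:01"
IP="10.1.1.25"
NAME="web-01"

# Drop any stale entry for this MAC, then append the new static binding.
sed -i "/^$MAC,/d" "$DHCP_HOSTS"
echo "$MAC,$IP,$NAME,infinite" >> "$DHCP_HOSTS"

# Keep the local DNS view in sync for internal name resolution.
sed -i "/ $NAME\$/d" "$HOSTS_FILE"
echo "$IP $NAME" >> "$HOSTS_FILE"

# SIGHUP makes dnsmasq re-read /etc/hosts and the dhcp-hostsfile without
# dropping existing leases; skip the signal if the daemon is not running.
if pid=$(pidof dnsmasq 2>/dev/null); then
  kill -HUP $pid
fi
```

Because the hosts file is regenerated from database state, re-running the routine for the same MAC is harmless, which is what makes the provisioning idempotent.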
Step-By-Step Execution
Step 1: Virtual Router Instantiation
The deployment begins when the orchestration engine triggers the creation of the System VM based on the systemvm-template.
System Note: This action initiates the cloud-early-config script within the VR initrd. It sets up the basic networking interfaces (eth0 for management, eth1 for public, and eth2 for guest traffic). The kernel uses iptables to ensure that DHCP requests from the guest network are only accepted on the eth2 interface, preventing cross-tenant interference.
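The interface-level isolation can be expressed as rules like the following iptables-restore fragment. This is a hedged sketch of the idea only; the actual rule set installed by cloud-early-config is considerably more extensive.

```
# Sketch of DHCP isolation rules (iptables-restore format); illustrative,
# not the exact rules generated by cloud-early-config.
*filter
# Accept DHCP server traffic only on the guest-facing interface.
-A INPUT -i eth2 -p udp --dport 67 -j ACCEPT
# Drop DHCP arriving on the management and public interfaces.
-A INPUT -i eth0 -p udp --dport 67 -j DROP
-A INPUT -i eth1 -p udp --dport 67 -j DROP
COMMIT
```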
Step 2: Provisioning Guest IP via Management Server
The Management Server executes an API call to the hypervisor, passing specialized metadata.
System Note: On KVM, this uses the libvirt API to write to the VR disk or pass parameters through a temporary ISO. The logic controller on the VR, cloud-python, monitors for these changes. This step is critical because it binds the virtual MAC address of the guest to a specific IP before the guest even sends its first DHCPDISCOVER packet.
Step 3: Dnsmasq Configuration Update
The VR agent updates the internal configuration files to reflect the new guest.
System Note: The agent modifies /etc/dnsmasq.d/cloudstack.conf and the host files it references. Permissions are set with chmod 644 so the dnsmasq process can read the files while write access remains with root. Signaling the daemon with systemctl kill -s SIGHUP dnsmasq makes dnsmasq clear its cache and re-read /etc/hosts and the dhcp-hostsfile without a full restart, which prevents packet loss for existing VM lease renewals. Note that SIGHUP does not re-read the main configuration file itself; structural changes to cloudstack.conf still require a service restart.
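The generated configuration typically resembles the fragment below. The addresses are illustrative placeholders; the exact directives CloudStack emits vary by version and network offering.

```
# /etc/dnsmasq.d/cloudstack.conf (illustrative fragment)
# Static leases only: every guest must match an entry in the hostsfile.
dhcp-range=set:interface-eth2,10.1.1.1,static
dhcp-hostsfile=/etc/dhcphosts.txt
# Default gateway and DNS server handed to guests (the VR itself).
dhcp-option=option:router,10.1.1.1
dhcp-option=6,10.1.1.1
```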
Step 4: Hostname and DNS Mapping
The VR populates the local DNS cache to allow internal service discovery.
System Note: The entry is added to /etc/hosts in the format: “[IP Address] [Hostname] [Internal Domain]”. This facilitates high throughput for internal name resolution, as the guest does not need to query external recursive servers for local resources. The dnsmasq service acts as both a forwarder and an authoritative source for the local zone.
Step 5: DHCP Lease Assignment and Verification
The Guest VM broadcasts a discovery packet, and the VR responds with the DHCPOFFER.
System Note: Use tcpdump -i eth2 port 67 or port 68 inside the VR to verify the four-way handshake (Discover, Offer, Request, Acknowledge). The packet payload contains the gateway IP (normally the VR IP on eth2), the netmask, and the DNS server addresses. If frames are dropped due to bridge congestion or a misplugged VIF, the guest may fail to obtain an IP and fall back to an APIPA (169.254.x.x) address.
Section B: Dependency Fault-Lines:
The most frequent failure point is a "stuck" Virtual Router that fails to process the configuration payload. This often stems from storage latency; if the VR disk is on a slow SAN, the cloud-python agent may time out during the boot-up configuration phase. Another common bottleneck is the concurrency limit of the dnsmasq process: in environments with thousands of VMs per network, the default file descriptor limit in the Linux kernel may be exceeded. If the hypervisor bridge's MAC address table is misbehaving (for example, due to aggressive aging or a forwarding loop), MAC address flapping can occur, causing DHCP packets to be delivered to the wrong virtual port. Lastly, firewall rules on the Guest VM itself may inadvertently block UDP port 68, preventing the lease from being accepted even if the VR is functioning perfectly.
THE TROUBLESHOOTING MATRIX
Section C: Logs & Debugging:
When a Guest VM fails to receive an IP, the architect must perform a bottom-up analysis.
1. Check the Management Server Logs: Examine /var/log/cloudstack/management/management-server.log. Look for “ApplyNetworkConfigCommand” failures. This indicates the server could not communicate with the VR.
2. Inspect the Virtual Router State: SSH into the VR using the link-local IP (typically on the 169.254.0.0/16 range) and check /var/log/cloud.log. This log tracks the execution of the configuration scripts.
3. Validate Dnsmasq Integrity: Run dnsmasq --test to check for syntax errors in the generated configuration files. Check /var/log/messages for "address in use" or "permission denied" errors related to the DHCP socket.
4. Verify Hypervisor Bridging: On the host, use brctl show or ovs-vsctl show to ensure the guest’s VIF (Virtual Interface) is plugged into the correct bridge instance corresponding to the VR eth2 interface.
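The log checks above can be wrapped into a quick triage script. The log paths and search patterns follow the steps in this section; treat this as a starting sketch run on the relevant node, not an official CloudStack tool.

```shell
#!/bin/sh
# Triage sketch for DHCP/DNS failures; paths follow the steps above.
check() {
  # check <label> <file> <pattern> - warn if the pattern appears.
  if [ -f "$2" ]; then
    if grep -q "$3" "$2"; then
      echo "WARN: $1"
    else
      echo "OK: $1"
    fi
  else
    echo "SKIP: $1 ($2 not present on this node)"
  fi
}

check "ApplyNetworkConfigCommand entries" \
  /var/log/cloudstack/management/management-server.log \
  "ApplyNetworkConfigCommand"
check "VR configuration script errors" /var/log/cloud.log "ERROR"
check "dnsmasq socket errors" /var/log/messages \
  "address in use\|permission denied"

# Validate the generated dnsmasq configuration, if dnsmasq is present.
if command -v dnsmasq >/dev/null 2>&1; then
  dnsmasq --test || echo "WARN: dnsmasq configuration failed syntax check"
else
  echo "SKIP: dnsmasq not installed on this node"
fi
```

Note that the Management Server check runs on the management node, while the cloud.log and dnsmasq checks apply inside the VR.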
OPTIMIZATION & HARDENING
Performance Tuning:
To minimize latency and maximize throughput for DNS queries, increase the dns-forward-max value in the VR’s dnsmasq.conf. This allows more concurrent upstream queries. Furthermore, optimizing the conntrack table in the Linux kernel on the VR is essential. High-traffic environments should increase net.netfilter.nf_conntrack_max to prevent packet drops during heavy network churn.
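The two knobs above can be set as follows. The values are illustrative starting points for a busy network, not benchmarked recommendations.

```
# In the VR's dnsmasq configuration: allow more concurrent upstream
# DNS queries (the dnsmasq default is 150).
dns-forward-max=500

# In /etc/sysctl.conf on the VR: enlarge the connection-tracking table
# for high-churn networks, then apply with `sysctl -p`.
net.netfilter.nf_conntrack_max=262144
```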
Security Hardening:
Security is maintained through strict encapsulation. The VR should be configured with iptables rules that permit DHCP/DNS traffic only from the local guest network. Ensure that the metadata service (port 80) is filtered so it is reachable only from guests, preventing external observers from scraping VM instance data. Combine dnsmasq's -z (--bind-interfaces) flag with interface= directives so the resolver listens only on guest-facing interfaces; this keeps the VR from answering external queries and being abused in DNS amplification attacks.
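In dnsmasq terms, the listening restriction looks like the fragment below; the interface name follows the VR layout described earlier.

```
# Listen only on the guest-facing interface. With bind-interfaces
# (the -z flag), dnsmasq binds its sockets to eth2 alone instead of
# binding the wildcard address and filtering afterwards.
interface=eth2
bind-interfaces
# Answer DNS queries only from directly attached subnets.
local-service
```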
Scaling Logic:
As the guest population grows, a single VR can become a bottleneck. For high-scale environments, transition from a basic Shared Network to a Redundant Virtual Router setup. This creates an active-passive pair using keepalived and VRRP. If the primary VR suffers a kernel panic or host failure, the secondary takes over the virtual IP and continues serving DHCP/DNS requests; because both routers are provisioned from the same persistent state on the Management Server, failover preserves the network configuration.
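A minimal keepalived VRRP instance for the guest interface could look like the following. This is an illustrative sketch only; the redundant-router templates CloudStack actually deploys manage considerably more state (including connection-tracking synchronization in some versions), and the addresses and IDs here are placeholders.

```
# /etc/keepalived/keepalived.conf (illustrative sketch)
vrrp_instance inside_network {
    state BACKUP          # both routers start as BACKUP; priority decides
    interface eth2        # guest-facing interface carries the virtual IP
    virtual_router_id 51
    priority 100          # the peer VR is configured with a lower priority
    advert_int 1
    nopreempt
    virtual_ipaddress {
        10.1.1.1/24       # gateway/DHCP/DNS address the guests point at
    }
}
```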
THE ADMIN DESK
How do I force-sync a VM’s DHCP lease?
Restart the dnsmasq service in the VR and then trigger a “Reconnect” on the VM’s NIC via the CloudStack UI. This forces the Management Server to re-send the metadata and triggers a new DHCPDISCOVER from the guest.
What if the DNS is slow for external sites?
Check the /etc/resolv.conf inside the Virtual Router. It inherits DNS from the physical host or the Zone settings. If the upstream servers have high latency, the VR’s forwarding will also be delayed.
Can I use a custom DNS server instead of the VR?
Yes. In the Network Domain settings or the Zone configuration, specify the “Internal DNS” and “External DNS” fields. CloudStack will push these specific IPs to the guests via DHCP Option 6.
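On the VR this surfaces as a standard dnsmasq directive; the addresses below are placeholders for the configured Internal/External DNS values.

```
# DHCP Option 6 (domain-name-server): hand guests custom resolvers
# instead of the VR address. Placeholder addresses.
dhcp-option=6,10.147.28.6,10.147.28.7
```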
Why does my VM show an IP in the UI but not internal to the OS?
This usually indicates the VR agent successfully updated its local files, but the guest OS failed the DHCP handshake. Check for firewall blocks on the guest or bridge mismatches on the hypervisor.
How do I increase the DHCP lease time?
Modify the dhcp-range parameter in the VR’s template or manually edit /etc/dnsmasq.conf. Note that manual changes are lost upon VR reboot; use global “Configuration” settings in the CloudStack UI for persistence.
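In dnsmasq, the lease time is given as the last field of the dhcp-range directive; the default is one hour if omitted. An illustrative static-lease range with a 12-hour lease:

```
# Last field of dhcp-range sets the lease time; "infinite" is also
# accepted. Address and duration are illustrative.
dhcp-range=10.1.1.0,static,12h
```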