CloudStack Direct Networking is the architectural bridge between a virtualized control plane and the raw performance of physical hardware. In a standard cloud environment, network isolation is typically achieved through Layer 3 overlays or Virtual Routers that handle DHCP, NAT, and firewalling. These layers add latency and processing overhead that high-performance computing (HPC) and database-heavy workloads cannot tolerate. Direct Networking bypasses these virtualized constraints by placing instances directly on the physical network segment. For Bare Metal deployments, this is not merely an optimization but a foundational requirement: the Bare Metal service in CloudStack relies on Direct Networking to facilitate PXE booting, IPMI power management, and unencapsulated data transfer. Removing the software-defined network stack from the data path eliminates its processing bottleneck, so traffic reaches the hardware interface with minimal added latency or jitter. By mapping logical cloud resources to physical switch ports, administrators achieve a deterministic environment in which throughput is bounded only by physical wire speed and the efficiency of the network interface controller.
TECHNICAL SPECIFICATIONS
| Requirement | Default Port/Range | Protocol/Standard | Impact Level (1-10) | Recommended Resources |
| :--- | :--- | :--- | :--- | :--- |
| IPMI Management | 623 (UDP) | IPMI v2.0 / RMCP+ | 10 | 1 vCPU / 2GB RAM (Proxy) |
| PXE Booting | 67/68 (UDP), 69 (UDP) | DHCP / TFTP | 9 | High-speed SSD for Images |
| CloudStack API | 8080, 8443 | HTTP/HTTPS | 7 | 4 Cores / 8GB RAM |
| Direct Network VLAN | 1 – 4094 | IEEE 802.1Q | 8 | 10GbE SFP+ Hardware |
| Agent Communication | 8250 | TCP | 6 | Minimal (Kernel Overhead) |
THE CONFIGURATION PROTOCOL
Environment Prerequisites:
Before initiating a Bare Metal deployment via CloudStack Direct Networking, the infrastructure must meet specific hardware and software baselines. The management server must run CloudStack 4.15 or higher on a Java 11 runtime. On the hardware side, all target servers must support IPMI v2.0 with LANplus (cipher suite 3 or 17) enabled in the BIOS/UEFI. The physical switch infrastructure must support IEEE 802.1Q tagging and provide a dedicated VLAN for management traffic, separate from public and guest traffic. Ensure that the ipmitool utility is installed on the management server or the designated Bare Metal proxy host to facilitate out-of-band communication.
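A quick pre-flight check along these lines can catch missing pieces early. This is a sketch, assuming a RHEL/CentOS-family host; adjust package names and tooling for your distribution:

```bash
# Pre-flight sanity check (package names assume a RHEL/CentOS-family host)
java -version 2>&1 | head -n 1            # expect a Java 11 runtime
rpm -q cloudstack-management              # expect 4.15 or later
command -v ipmitool >/dev/null || yum install -y ipmitool
```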
Section A: Implementation Logic:
The engineering logic behind Direct Networking centers on the principle of network transparency. In a virtualized KVM environment the hypervisor acts as a bridge; a Bare Metal host possesses no such intermediary. Therefore, the CloudStack Management Server must orchestrate the network state by interacting directly with the physical switch (via plug-ins) or by assuming that the network is “flat.” By utilizing Direct Networking, we eliminate the Virtual Router, which is the traditional source of latency and packet loss under heavy concurrency. Because the payload remains unencapsulated, no CPU cycles are wasted on header stripping and encapsulation lookups, maximizing throughput. In dense rack configurations this matters twice over: lower per-packet CPU overhead also translates directly into a lower thermal footprint per unit of compute.
Step-By-Step Execution
1. Enable the Bare Metal Service Provider
The first action is to activate the Bare Metal plugin within the CloudStack global configuration. This is an idempotent operation that prepares the database schema for physical host entries.
Command: cmk update configuration name=baremetal.enabled value=true
System Note: This command updates the configuration table in the cloud database. It triggers a listener within the management server to start the BareMetalPlanner, which replaces the standard VM-centric deployment logic with hardware-direct logic.
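To confirm the flag was persisted, you can read it back with CloudMonkey. Whether a restart is strictly required depends on whether the setting is dynamic; restarting is the conservative default. A sketch, assuming your cmk profile already points at the management server:

```bash
# Read the setting back to confirm the update was persisted
cmk list configurations name=baremetal.enabled
# Restart so the management server reloads its deployment planners
systemctl restart cloudstack-management
```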
2. Configure the PXE Server Environment
Bare Metal provisioning requires a reliable Preboot Execution Environment (PXE). You must set up a TFTP and DHCP server that the CloudStack management server can control.
Command: yum install tftp-server dhcp -y
System Note: The installation modifies /etc/xinetd.d/tftp and /etc/dhcp/dhcpd.conf. The management server will eventually inject host-specific reservations into these files to map MAC addresses to specific OS images.
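As a sketch, a minimal PXE scope in dhcpd.conf looks like the following. Every address here is an illustrative placeholder, and in steady state CloudStack injects the host-specific reservations itself; on EL7-era hosts tftp-server runs under xinetd, as the paths above suggest:

```bash
# Append a minimal PXE scope (all addresses are illustrative placeholders)
cat >> /etc/dhcp/dhcpd.conf <<'EOF'
subnet 10.1.1.0 netmask 255.255.255.0 {
  range 10.1.1.100 10.1.1.200;
  next-server 10.1.1.10;     # TFTP server address
  filename "pxelinux.0";     # bootloader requested during the PXE handshake
}
EOF
systemctl enable --now xinetd dhcpd   # tftp runs under xinetd on EL7-era hosts
```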
3. Initialize the Physical Network for Direct Mode
Navigate to the Infrastructure section and define a new physical network, then attach a guest network using the “Direct” (Shared) model rather than “Isolated” or “VPC.”
Command: cmk create physicalnetwork zoneid=<zone-id> name=BM-Direct isolationmethods=VLAN (the remaining CloudMonkey calls are sketched after the System Note)
System Note: This step populates the physical_network table. By selecting the Direct model, you are instructing the CloudStack agent to skip the creation of per-guest isolation bridges and vnet interfaces, opting instead for a direct binding to the physical eth0 or bond0 interface.
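The follow-up calls can be scripted with CloudMonkey. This is a sketch using the standard createPhysicalNetwork/addTrafficType/createNetwork API family; every ID, VLAN, and address range is a placeholder you must substitute:

```bash
# Attach guest traffic and enable the physical network (IDs are placeholders)
cmk add traffictype physicalnetworkid=<physnet-id> traffictype=Guest
cmk update physicalnetwork id=<physnet-id> state=Enabled
# Create the shared (Direct) guest network on a specific VLAN
cmk create network zoneid=<zone-id> networkofferingid=<shared-offering-id> \
    name=direct-net displaytext=direct-net vlan=100 \
    gateway=10.1.2.1 netmask=255.255.255.0 startip=10.1.2.50 endip=10.1.2.200
```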
4. Provide the Bare Metal ISO and Template
Unlike virtual templates, Bare Metal templates are often raw disk images or Kickstart files. Register the template and ensure it is marked as “Bare Metal.”
Command: cp /path/to/image.iso /var/lib/tftpboot/; chmod 644 /var/lib/tftpboot/image.iso
System Note: The chmod command ensures that the TFTP service handles the file with the correct read permissions, preventing “Access Denied” errors during the initial PXE handshake when the hardware requests the bootloader.
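Before pointing CloudStack at the image, it is worth confirming the TFTP daemon will actually serve it. A one-line check with the standard tftp client (server IP is illustrative), fetching the file exactly as the PXE firmware would:

```bash
# Fetch the image over TFTP exactly as the PXE firmware would
tftp 10.1.1.10 -c get image.iso /tmp/image.iso && ls -lh /tmp/image.iso
```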
5. Add the Physical Host via IPMI
Register the bare metal host by providing its IPMI IP address, username, and password. This allows CloudStack to control the power state.
Command: ipmitool -I lanplus -H <ipmi-ip> -U <username> -P <password> chassis power status
System Note: Executing this manually first verifies that the out-of-band network is reachable. When added to CloudStack, the cloudstack-management service will use this connection to trigger a “Power On” event during the deployment phase.
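Once the manual IPMI check succeeds, the host can be registered through CloudMonkey. This is a sketch using the addHost API; the Bare Metal detail parameters (CPU count, speed, memory, boot-NIC MAC) are illustrative values, and all IDs and credentials are placeholders:

```bash
# Register the physical node (all IDs, credentials, and sizes are placeholders)
cmk add host zoneid=<zone-id> podid=<pod-id> clusterid=<cluster-id> \
    hypervisor=BareMetal url="http://<ipmi-ip>" \
    username=<ipmi-user> password=<ipmi-password> \
    cpunumber=16 cpuspeed=2400 memory=65536 hostmac=<boot-nic-mac>
```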
Section B: Dependency Fault-Lines:
The most common point of failure in Direct Networking is the DHCP/PXE handshake. If the Management Server cannot modify the DHCP configuration due to permission issues on /etc/dhcp/dhcpd.conf, the host will boot into a “No Boot Device Found” state. Another bottleneck is an MTU (Maximum Transmission Unit) mismatch: if the physical switch is configured for Jumbo Frames (9000 bytes) but the CloudStack Direct Network is left at the default 1500, packets will be dropped during large data transfers, leading to corrupted OS installations. Always ensure the bridge-utils package is up to date to avoid race conditions during interface initialization.
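An MTU mismatch of the kind described above can be detected before an install is attempted. A quick path test with the don’t-fragment bit set (interface name and target IP are illustrative):

```bash
# 8972 = 9000-byte frame minus 20 bytes IP header and 8 bytes ICMP header
ip link show bond0 | grep -o 'mtu [0-9]*'   # what the NIC is actually set to
ping -M do -s 8972 -c 3 10.1.2.1            # fails fast if the path cannot carry jumbo frames
```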
THE TROUBLESHOOTING MATRIX
Section C: Logs & Debugging:
When a Bare Metal deployment fails, the primary source of truth is the /var/log/cloudstack/management/management-server.log. Look for strings such as “BareMetalPostMigrationCommand” or “Unable to power on host via IPMI.”
1. IPMI Errors: If you see “Unable to establish IPMI v2 / RMCP+ session,” verify the cipher suite on the physical BMC (Baseboard Management Controller). You can test a specific suite directly with ipmitool -I lanplus -H <ipmi-ip> -U <username> -P <password> -C 3 chassis status, where -C selects the cipher suite.
2. PXE Timeouts: Check /var/log/messages for tftpd entries. If the log shows “File not found” but the file exists, verify the path in the CloudStack global setting baremetal.tftp.dir; a wire-level capture (sketched after this list) confirms whether the request ever reaches the daemon.
3. Network Isolation: If the host boots but cannot reach the internet, verify the VLAN tagging on the physical switch port. The port must be in “Trunk” mode if multiple VLANs are used, or “Access” mode if the Direct Network is untagged.
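When the log files are inconclusive, watching the handshake on the wire usually settles it. A capture sketch (interface name is illustrative):

```bash
# Watch DHCP and TFTP traffic during a PXE attempt
tcpdump -ni eth0 port 67 or port 68 or port 69
# In a second terminal, follow the daemon-side view
tail -f /var/log/messages | grep -Ei 'dhcpd|in.tftpd'
```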
OPTIMIZATION & HARDENING
#### Performance Tuning
To maximize throughput and minimize latency in Direct Networking, the system architect should tune the kernel parameters of the proxy host. Increasing the net.core.netdev_max_backlog to 5000 and the net.ipv4.tcp_max_syn_backlog to 10000 ensures that the system can handle high concurrency during “boot storms” where dozens of bare metal nodes are provisioned simultaneously. Additionally, disabling the irqbalance service and manually pinning network interrupts to specific CPU cores can reduce context-switching overhead.
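A sketch of persisting that tuning (the drop-in file name is arbitrary; the values mirror the prose above):

```bash
# Persist the suggested kernel tuning
cat >> /etc/sysctl.d/90-baremetal-tuning.conf <<'EOF'
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_max_syn_backlog = 10000
EOF
sysctl --system                      # load the new values
systemctl disable --now irqbalance   # then pin NIC IRQs by hand via /proc/irq/<n>/smp_affinity
```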
#### Security Hardening
Direct Networking exposes the management network to potential risks. Hardening is mandatory. Implement strict iptables or nftables rules on the management server to allow IPMI traffic (623) only from known management IPs. Ensure that the IPMI interface on the physical hardware is on a completely isolated “OOB” (Out-of-Band) network that is not routable from the guest network. Change default IPMI passwords immediately using ipmitool user set password.
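A minimal sketch of both controls, assuming iptables and a management subnet of 10.0.0.0/24 (substitute your own ranges, and verify the BMC user ID before setting the password):

```bash
# Allow IPMI (UDP 623) only from the management subnet, drop everything else
iptables -A INPUT -p udp --dport 623 -s 10.0.0.0/24 -j ACCEPT
iptables -A INPUT -p udp --dport 623 -j DROP
# Rotate the default BMC credential; user ID 2 is common for the admin account, but check first
ipmitool user list 1
ipmitool user set password 2 '<new-strong-password>'
```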
#### Scaling Logic
As the Bare Metal footprint expands, a single TFTP/DHCP server becomes a bottleneck. The scaling logic dictates moving to a distributed PXE architecture in which per-rack PXE relays handle local traffic while reporting back to the central CloudStack Management Server. This keeps latency-sensitive control traffic off long-haul links and ensures that the OS image payload is delivered over a local, high-bandwidth path.
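One way to realize the relay tier is ISC’s dhcrelay on a per-rack node, with TFTP content mirrored locally. A sketch (interface name, hostnames, and central server IP are illustrative):

```bash
# Forward rack-local DHCP/PXE broadcasts to the central provisioning server
dhcrelay -i eth0 10.1.1.10
# Keep the local TFTP mirror in sync with the central boot image repository
rsync -a central-pxe:/var/lib/tftpboot/ /var/lib/tftpboot/
```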
THE ADMIN DESK
Q: Why does the host fail to boot even after successful IPMI power-on?
A: This usually indicates the TFTP server is unreachable or the file path is incorrect. Verify that the tftp-server service is running and that the firewall allows incoming UDP traffic on port 69.
Q: Can I use Direct Networking with existing virtual machines?
A: Yes. Direct Networking in CloudStack allows for a “Shared” network model where both Bare Metal hosts and VMs reside on the same physical VLAN, facilitating high-speed communication between virtual and physical tiers.
Q: How do I handle MTU issues in a Direct Network environment?
A: Ensure the physical switch and the host’s NIC are configured for the same MTU. In CloudStack, you can set the MTU in the physical network settings so that tagging and header overhead are accounted for.
Q: Is it possible to automate the registration of dozens of hosts?
A: Use the CloudStack API or the cmk (CloudMonkey) CLI tool to script the addHost command. This ensures an idempotent and repeatable deployment process, reducing manual configuration errors in large-scale infrastructures.
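A hedged sketch of such a script, assuming a CSV of ipmi_ip,mac pairs (the file format and all IDs are illustrative placeholders):

```bash
# Bulk-register hosts from a CSV of "ipmi_ip,mac" pairs
while IFS=, read -r ipmi_ip mac; do
  cmk add host zoneid=<zone-id> podid=<pod-id> clusterid=<cluster-id> \
      hypervisor=BareMetal url="http://${ipmi_ip}" \
      username=<ipmi-user> password=<ipmi-password> hostmac="${mac}"
done < hosts.csv
```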