Managing the Secondary Storage VM (SSVM)

The CloudStack Secondary Storage VM (SSVM) serves as the persistent orchestration layer for image management within a distributed cloud environment. It moves ISO images, templates, and snapshots between secondary storage and primary storage. By acting as a proxy for hypervisor communication, it reduces the complexity of managing storage across multiple clusters and zones. This design addresses cross-network storage bottlenecks by localizing the heavy lifting of payload transfers to specialized virtual appliances. In high-demand environments, the SSVM keeps template downloads from saturating the management network or interfering with tenant traffic. Its agent maintains a long-lived connection to the management server on TCP port 8250 over the management network, while the hypervisor host reaches the VM over the Link Local CIDR; the agent reconnects after network interruptions so that storage operations can resume. The SSVM provides the necessary abstraction for multi-hypervisor support, allowing XenServer, KVM, and VMware hosts to interact with a centralized image repository without requiring proprietary drivers on every hardware node.

The SSVM handles all interactions with the Object Store or NFS (Network File System) mounts, providing an idempotent interface for the Management Server to request disk volume operations. Without a functional SSVM, the cloud infrastructure loses its ability to provision new instances from templates or back up existing volumes. It is the “circulatory system” of the cloud storage layer, ensuring that data resides where it is needed for compute execution.

Technical Specifications

| Requirement | Default Port / Operating Range | Protocol / Standard | Impact Level (1-10) | Recommended Resources |
| :--- | :--- | :--- | :--- | :--- |
| NFS Connectivity | Port 2049 (TCP/UDP) | NFS v3 / v4 | 10 | 100 Mbps Min Bandwidth |
| Management Link | Port 8250 | TCP / Java Agent | 9 | 1 vCPU / 2GB RAM |
| Public Internet | Port 80 / 443 | HTTP / HTTPS | 7 | Static Public IP |
| Internal Communication | Port 3922 | SSH / Port Forward | 8 | Link Local (169.254.x.x) |
| Console Proxy | Port 443 / 8080 | Websockets / VNC | 6 | High IOPS Storage |
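The ports in the table can be probed from inside the SSVM before digging into logs. The following is a minimal sketch, assuming bash and the coreutils timeout command are available in the VM; the function names check_port and report are hypothetical helpers, and the host addresses you pass in are your own.

```shell
# check_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened, non-zero otherwise.
check_port() {
    local host="$1" port="$2" wait_s="${3:-3}"
    # bash's /dev/tcp pseudo-device opens a TCP socket; timeout bounds the wait.
    timeout "$wait_s" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# report HOST PORT LABEL
# Prints a one-line verdict for a single endpoint.
report() {
    if check_port "$1" "$2"; then
        echo "OK      $3 ($1:$2)"
    else
        echo "BLOCKED $3 ($1:$2)"
    fi
}
```

For example, report 10.0.0.5 2049 NFS followed by report 10.0.0.10 8250 Management gives a quick view of which paths are open. This only covers TCP; a UDP-only NFS export would need a different probe.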

THE CONFIGURATION PROTOCOL

Environment Prerequisites:

Before deploying or modifying the CloudStack Secondary Storage VM, the infrastructure must meet the following criteria:
1. The Management Server must be running CloudStack version 4.15 or higher to support the latest system VM templates.
2. A valid System VM Template must be registered and seeded into the secondary storage area. Use the cloud-install-sys-tmplt script for this purpose.
3. The network must permit the 169.254.0.0/16 CIDR range for the Link Local interface between the hypervisor and the VM.
4. NFS shares must be exported with no_root_squash permissions to allow the SSVM root user to write template metadata.
5. DNS servers provided to the SSVM must be capable of resolving external template sources like download.cloudstack.org.
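Several of these prerequisites can be checked mechanically. The sketch below uses hypothetical helper names (has_no_root_squash, is_link_local, resolves) and assumes getent is available for the DNS check; it inspects a single /etc/exports-style line rather than querying the NFS server itself.

```shell
# has_no_root_squash EXPORT_LINE
# Succeeds when a non-comment exports line includes the no_root_squash option.
has_no_root_squash() {
    case "$1" in
        \#*) return 1 ;;
        *no_root_squash*) return 0 ;;
        *) return 1 ;;
    esac
}

# is_link_local IPV4
# Succeeds when the address falls inside 169.254.0.0/16.
is_link_local() {
    case "$1" in
        169.254.*.*) return 0 ;;
        *) return 1 ;;
    esac
}

# resolves HOSTNAME
# Succeeds when the system resolver returns at least one address.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}
```

On the NFS server, something like grep secondary /etc/exports | while read -r line; do has_no_root_squash "$line" && echo ok; done confirms item 4, and resolves download.cloudstack.org inside the SSVM covers item 5.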

Section A: Implementation Logic:

The engineering design of the SSVM relies on network encapsulation and traffic isolation. The VM is typically equipped with three or four network interfaces: Public, Private, Link-Local, and Storage. This multi-homed approach ensures that external template downloads (Public) do not compete for bandwidth with disk-to-disk transfers (Storage) or management commands (Private). The logic follows a “Control vs. Data” plane separation; the Management Server sends small control packets to the SSVM agent, which then executes data-heavy tasks like the decompression of large QCOW2 or VHD files. By offloading these CPU-intensive decompression tasks to the SSVM, the host hypervisor’s resources remain dedicated to tenant workloads, which prevents CPU starvation on the compute nodes.

Step-By-Step Execution

Step 1: Verification of System VM Template Registration

Ensure the system template is correctly flagged as “Ready” in the database before attempting an SSVM restart.
System Note: The Management Server queries the vm_template table to identify the latest version of the System VM image. If the template is not marked ready, the Management Server cannot deploy a working SSVM from it, and the VM may never reach the Running state. Use cmk list templates templatefilter=system to verify.
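The readiness check can be scripted. This is a sketch that assumes cmk is configured for JSON output and that each template entry carries an isready field; count_not_ready is a hypothetical helper that deliberately uses only grep, wc, and tr so it works without JSON tooling.

```shell
# count_not_ready
# Reads listTemplates JSON on stdin and prints how many entries
# report "isready": false.
count_not_ready() {
    grep -o '"isready"[[:space:]]*:[[:space:]]*false' | wc -l | tr -d ' '
}
```

Piping the command from the note above into it, as in cmk list templates templatefilter=system | count_not_ready, should print 0 before you proceed to restart the SSVM.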

Step 2: Global Setting Calibration

Adjust the global configuration parameters to define how the SSVM interacts with the network.
System Note: Modifying secstorage.allowed.internal.sites or secstorage.copy.password updates the agent.properties file within the VM via an encrypted payload during the next boot cycle. This ensures that the SSVM agent services are aware of the security boundaries.
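Because a malformed value only surfaces after the next SSVM boot, it can help to validate the CIDR before applying it. The sketch below is illustrative: valid_cidr and build_update_cmd are hypothetical helpers, and the printed command assumes cmk exposes the updateConfiguration API as update configuration with name and value arguments.

```shell
# valid_cidr CIDR
# Succeeds for a dotted-quad IPv4 network with a /0 to /32 prefix.
valid_cidr() {
    local octet
    echo "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]|[12][0-9]|3[0-2])$' || return 1
    # Reject octets above 255, which the pattern alone would accept.
    for octet in $(echo "${1%/*}" | tr '.' ' '); do
        [ "$octet" -le 255 ] || return 1
    done
}

# build_update_cmd NAME VALUE
# Prints (does not run) the cmk call that would change a global setting.
build_update_cmd() {
    printf 'cmk update configuration name=%s value=%s\n' "$1" "$2"
}
```

A typical use is valid_cidr 192.168.10.0/24 && build_update_cmd secstorage.allowed.internal.sites 192.168.10.0/24, reviewing the printed command before running it.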

Step 3: Triggering SSVM Redeployment

When encountering persistent agent state issues, the VM must be destroyed to trigger an idempotent recreation.
System Note: Execute cloud-sysvmadm -d [db_ip] -u [user] -p [password] -a -r. Note that -a covers all SSVMs, console proxies, and virtual routers, which makes -r redundant and briefly interrupts router services as well; to limit the operation to SSVMs and console proxies, use -s instead. The command instructs the management server to stop the system VM, after which the management server detects the missing state and initiates a cold boot using the latest gold-master template.
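After triggering the redeployment, it is useful to wait until the new SSVM actually reports Running rather than checking the UI by hand. The following sketch assumes cmk returns JSON from list systemvms with a state field per VM; extract_states, all_running, and wait_for_ssvm are hypothetical helper names, and the 10-second poll interval is arbitrary.

```shell
# extract_states
# Reads listSystemVms JSON on stdin and prints one VM state per line.
extract_states() {
    grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Za-z]*"' \
        | sed 's/.*"\([A-Za-z]*\)"$/\1/'
}

# all_running
# Succeeds when stdin lists at least one VM and every one is Running.
all_running() {
    local states
    states=$(extract_states)
    [ -n "$states" ] || return 1
    ! echo "$states" | grep -qv '^Running$'
}

# wait_for_ssvm [TRIES]
# Polls cmk every 10 seconds until all SSVMs are Running or tries run out.
wait_for_ssvm() {
    local tries="${1:-30}"
    while [ "$tries" -gt 0 ]; do
        if cmk list systemvms systemvmtype=secondarystoragevm | all_running; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 10
    done
    return 1
}
```

Running wait_for_ssvm 30 && echo "SSVM is back" gives roughly a five-minute window before giving up.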

Step 4: Validate Internal Service Status

Once the VM is running, log into the Link Local address to verify the health of the storage agent.
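System Note: Use ssh -i /var/cloudstack/management/.ssh/id_rsa -p 3922 root@[Link_Local_IP]. The Link Local address is only reachable from the hypervisor host that runs the SSVM, so run the command there; on KVM hosts the key is typically /root/.ssh/id_rsa.cloud. Once inside, run systemctl status cloud. This verifies that the Java-based agent has successfully completed its handshake with the Management Server and joined the “Up” state in the database.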
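The login and status check can be collapsed into one non-interactive call. This is a sketch with a hypothetical helper, agent_status; the key path and Link Local IP are whatever applies on your host, and the service name is a parameter (defaulting to cloud, the unit restarted in the Admin Desk section) because it can differ between system VM template versions. Setting DRY_RUN=1 prints the command instead of running it.

```shell
# agent_status KEY_PATH LINK_LOCAL_IP [SERVICE]
# Runs "systemctl is-active SERVICE" inside the SSVM over SSH port 3922.
# With DRY_RUN=1 the command is printed rather than executed.
agent_status() {
    local key="$1" ip="$2" service="${3:-cloud}"
    set -- ssh -i "$key" -p 3922 -o ConnectTimeout=5 "root@$ip" \
        systemctl is-active "$service"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, agent_status /root/.ssh/id_rsa.cloud 169.254.1.20 exits 0 only when systemd reports the agent unit as active.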

Step 5: Network Connectivity Self-Test

Run the built-in diagnostic script within the SSVM to ensure all required paths are open.
System Note: Execute /usr/local/cloud/systemvm/ssvm-check.sh. This script tests DNS resolution, NFS mount points, and portal connectivity. It checks if the iptables rules correctly route traffic through the eth1 (private) or eth2 (public) interfaces to prevent packet-loss during image syncing.
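The script's output is long, so a small filter helps when running it across many zones. The sketch below assumes that failing checks are printed on lines beginning with ERROR: and softer findings on lines beginning with WARNING:, which is an assumption about the output format you should confirm against your template version; summarize_check is a hypothetical helper.

```shell
# summarize_check
# Reads ssvm-check.sh output on stdin, prints error and warning counts,
# and returns non-zero when at least one ERROR: line is present.
summarize_check() {
    local out errors warnings
    out=$(cat)
    errors=$(printf '%s\n' "$out" | grep -c '^ERROR:' || true)
    warnings=$(printf '%s\n' "$out" | grep -c '^WARNING:' || true)
    echo "errors=$errors warnings=$warnings"
    [ "$errors" -eq 0 ]
}
```

Inside the SSVM this would be used as /usr/local/cloud/systemvm/ssvm-check.sh | summarize_check. Anchoring the patterns to the start of the line avoids counting summary text that merely mentions the words ERROR or WARNING.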

Section B: Dependency Fault-Lines:

The most common point of failure for the SSVM is the storage mount state. If the NFS server undergoes a reboot or network interruption, the SSVM may hold a stale file handle. This causes the secstorage service to hang while waiting for I/O, leading to a reporting status of “Down” in the CloudStack UI. Another significant bottleneck is DNS resolution; if the SSVM cannot resolve the hostname of the management server or the template source, the image download will time out. Ensure that the resolv.conf within the VM is correctly populated by the Management Server during the provisioning phase.
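Both fault lines can be detected before they show up as a “Down” status. This sketch assumes the coreutils timeout and stat commands exist in the SSVM; mount_responds and nameserver_count are hypothetical helpers, and the mount path you pass is whatever directory your secondary storage is mounted under.

```shell
# mount_responds PATH [TIMEOUT_SECONDS]
# A stale NFS handle typically makes stat hang or fail; timeout bounds the wait.
mount_responds() {
    timeout "${2:-5}" stat "$1" >/dev/null 2>&1
}

# nameserver_count FILE
# Prints how many nameserver entries a resolv.conf-style file contains.
nameserver_count() {
    grep -c '^[[:space:]]*nameserver[[:space:]]' "$1" 2>/dev/null || true
}
```

A check such as mount_responds /mnt/SecStorage 5 || echo "possible stale handle" can run from cron, and a nameserver_count /etc/resolv.conf result of 0 points to a provisioning problem rather than a network one.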

THE TROUBLESHOOTING MATRIX

Section C: Logs & Debugging:

Effective debugging requires access to the internal logs of the SSVM. The primary log file is located at /var/log/cloud/cloud.log. This file records the ingestion of management commands and the output of shell scripts used for volume manipulation.

1. Error: “Unable to mount secondary storage”
Check for firewall blocks on port 2049. Use rpcinfo -p [NFS_IP] from within the SSVM to see if the mount services are visible.
2. Error: “Connection refused on port 8250”
This indicates the SSVM can see the Management Server but cannot establish the agent session. Verify that the Management Server is listening on the expected IP and that the SSL certificates (if used) are not expired.
3. Error: “Storage health check failed”
Look for the string Exception: java.io.IOException in the logs. This often points to a full disk on the secondary storage or incorrect directory permissions preventing the creation of the template/tmp folder.
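A quick triage pass over the log can show which of the three cases you are dealing with. The sketch below counts lines containing each signature; triage_log is a hypothetical helper, and the search strings are taken from the error headings above, so the exact wording in your log may differ.

```shell
# triage_log FILE
# Prints "<count><TAB><signature>" for each known failure signature.
triage_log() {
    local log="$1" pattern count
    for pattern in \
        "Unable to mount secondary storage" \
        "Connection refused" \
        "java.io.IOException"
    do
        # -F treats the signature as a fixed string, -c counts matching lines.
        count=$(grep -cF "$pattern" "$log" 2>/dev/null || true)
        printf '%s\t%s\n' "${count:-0}" "$pattern"
    done
}
```

Run it as triage_log /var/log/cloud/cloud.log and start with whichever signature has the highest count.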

To monitor real-time throughput and identify latency issues, use nload eth1 or nload eth2 inside the SSVM. If throughput is far below the link speed, or ip -s link shows dropped or errored packets, investigate the physical switch ports or the virtual bridge (cloudbr0) on the hypervisor.

OPTIMIZATION & HARDENING

Performance Tuning: To increase concurrency for template downloads, adjust the storage.max.concurrent.copy.tasks setting in the Global Settings. For high-bandwidth environments, increase the RAM for the SSVM to 4GB and set the -Xmx flag in the cloud-agent startup script to allow for larger Java heap sizes. This reduces GC (Garbage Collection) pauses during heavy I/O.

Security Hardening: Implement strict iptables rules on the Management Server network to only permit traffic from the SSVM’s private IP on port 8250. Within the SSVM, ensure that the ssh-keygen generated keys for the root user are rotated regularly. Disable unneeded services like apache2 if only using the VM for basic NFS management.

Scaling Logic: In environments with multiple zones, deploy at least two SSVMs per zone. CloudStack will automatically load-balance tasks between them. If one SSVM is under high CPU load, the Management Server diverts new copy tasks to the second instance, ensuring high availability and consistent throughput.
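For the Security Hardening item, the rule set can be generated and reviewed before it is applied. This is a sketch only: print_agent_rules is a hypothetical helper that prints iptables commands rather than running them, and it assumes the INPUT chain is where you filter management traffic. Port 8250 is also used by other agents (for example hypervisor hosts and console proxy VMs), so their addresses need to be in the list too or they will be cut off by the final DROP.

```shell
# print_agent_rules IP...
# Emits iptables commands that allow tcp/8250 only from the listed sources.
print_agent_rules() {
    local ip
    for ip in "$@"; do
        echo "iptables -A INPUT -p tcp -s $ip --dport 8250 -j ACCEPT"
    done
    # Everything not explicitly allowed above is dropped.
    echo "iptables -A INPUT -p tcp --dport 8250 -j DROP"
}
```

Calling print_agent_rules 10.0.0.21 10.0.0.22 shows two ACCEPT rules followed by the DROP rule; pipe the output to sh only after reviewing it.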

THE ADMIN DESK

How do I restart the SSVM without deleting it?
Log into the SSVM via SSH on the Link Local interface and run systemctl restart cloud. This restarts the agent without needing to wait for a full VM reboot, minimizing downtime during minor configuration changes or log rotations.

Why is my template stuck at 0% download?
This usually indicates a DNS failure or an inability to reach the public gateway. Check /var/log/cloud/cloud.log for “UnknownHostException.” Verify that the SSVM has a valid public IP and that the gateway is reachable via ping 8.8.8.8.
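To see exactly which hostnames failed to resolve, the log can be filtered for the exception text. This sketch assumes the usual Java message shape, where the hostname follows "UnknownHostException: "; unresolved_hosts is a hypothetical helper.

```shell
# unresolved_hosts FILE
# Prints the unique hostnames that appear after "UnknownHostException: ".
unresolved_hosts() {
    grep -o 'UnknownHostException: [A-Za-z0-9._-]*' "$1" \
        | sed 's/^UnknownHostException: //' \
        | sort -u
}
```

Running unresolved_hosts /var/log/cloud/cloud.log lists each failing name once, which makes it easy to test them individually against the DNS servers configured in the SSVM.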

Can I use a custom resource offering for the SSVM?
Yes. In Global Settings, update secstorage.service.offering with the ID of the service offering you want the SSVM to use. This allows you to assign more CPU and RAM to the SSVM, which is essential for high-concurrency environments processing multiple large snapshots or templates simultaneously across the cloud. The change takes effect when the SSVM is next recreated.

How do I clean up stale mounts in the SSVM?
If the SSVM shows a “Stale File Handle” error, run umount -f /mnt/SecStorage/ followed by mount -a. If the lock persists, you must destroy the VM using cloud-sysvmadm to force a fresh mount on a new instance.
