Tracking and Billing CloudStack Network Usage

CloudStack Network Usage is the core telemetry engine for quantifying resource consumption in high-density Infrastructure-as-a-Service (IaaS) deployments. In a multi-tenant cloud, the ability to distinguish administrative overhead from billable customer traffic is essential for financial auditing and capacity planning. Tracking network usage means aggregating traffic metadata from distributed hypervisor agents into a centralized, queryable database. The process must account for several traffic types, including Public IP traffic, Private Gateway egress, and Virtual Private Cloud (VPC) site-to-site VPN flows. The primary challenge is collecting this data at high throughput without adding latency to the virtualized networking stack. By deploying a dedicated Usage Server, administrators create an idempotent pipeline that transforms raw traffic counters into granular billing records and offloads heavy analytical processing from the primary Management Server.

Technical Specifications

| Requirement | Default Port/Operating Range | Protocol/Standard | Impact Level (1-10) | Recommended Resources |
| :--- | :--- | :--- | :--- | :--- |
| Usage Server Service | Port 8080 / 443 | Java/Jetty | 8 | 4 vCPU / 8GB RAM |
| Database Connection | Port 3306 | MySQL/MariaDB | 9 | SSD-backed Storage |
| Usage Job Interval | 1800 – 3600 seconds | Cron/Internal Scheduler | 6 | Minimum 100 IOPS |
| Management API | Port 8096 | HTTP/REST | 7 | Low Latency Link |
| Network Telemetry | Layer 2 through 4 | IEEE 802.1Q / VXLAN | 10 | 10GbE Interface |

The Configuration Protocol

Environment Prerequisites:

Successful deployment of CloudStack Network Usage tracking requires a functioning Apache CloudStack environment, version 4.15 or higher. The underlying operating system should be a Linux distribution such as RHEL 8 or Ubuntu 20.04 LTS that supports the cloudstack-usage package. Database schemas must be synchronized using the cloudstack-setup-databases script so that the cloud_usage database is present. Precise time synchronization via NTP is mandatory across all Management Servers, Usage Servers, and Hypervisors to prevent discrepancies in timestamped billing records. Furthermore, the firewall on the database host must permit ingress traffic on port 3306 so that the Usage Server can communicate with the central database.

Section A: Implementation Logic:

CloudStack Network Usage tracking relies on an asynchronous polling mechanism. Rather than capturing packets in real time, which would add significant processing overhead and risk packet loss, the system queries the hypervisor agents for cumulative statistics. These statistics represent the total bytes transmitted (TX) and received (RX) on specific virtual interfaces. The Usage Server then performs a delta calculation between the current poll and the previous record. This design is inherently idempotent: if a collection job fails, the subsequent job captures the cumulative total, so no data is lost. The system accounts for encapsulation overhead by calculating the difference between raw eth0 statistics and the virtualized payload within GRE or VXLAN tunnels.
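The delta calculation described above can be illustrated with a minimal Python sketch. The CounterSample type, the usage_delta function, and the counter-reset rule are assumptions made for illustration; they are not taken from the CloudStack code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """Cumulative byte counters reported for one virtual interface."""
    tx_bytes: int
    rx_bytes: int


def usage_delta(previous: CounterSample, current: CounterSample) -> CounterSample:
    """Return the bytes transferred between two polls of cumulative counters."""
    def diff(prev: int, cur: int) -> int:
        # Assumption: a reading lower than the previous one means the counter
        # was reset (for example, the interface was recreated), so the current
        # value is treated as the whole delta.
        return cur - prev if cur >= prev else cur

    return CounterSample(
        tx_bytes=diff(previous.tx_bytes, current.tx_bytes),
        rx_bytes=diff(previous.rx_bytes, current.rx_bytes),
    )


# The poll between these two samples failed and was never recorded.
first_poll = CounterSample(tx_bytes=1_000, rx_bytes=5_000)
third_poll = CounterSample(tx_bytes=4_500, rx_bytes=9_000)
delta = usage_delta(first_poll, third_poll)
```

Because the counters are cumulative, the delta between the first and third polls still covers the traffic from the missed interval, which is the property the paragraph relies on.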

Step-By-Step Execution

1. Installation of the Usage Component

On the designated analytics node, run yum install cloudstack-usage (RHEL-based systems) or apt-get install cloudstack-usage (Debian-based systems).
System Note: This action installs the necessary Java archive files and creates the cloud system user. It also registers the cloudstack-usage service with systemd, so the service can be started automatically after host reboots.

2. Database Connectivity Configuration

Navigate to the configuration directory and modify the file at /etc/cloudstack/usage/db.properties. Ensure the variables db.usage.host, db.usage.username, and db.usage.password match the credentials created during the initial CloudStack installation.
System Note: Correcting these parameters allows the Usage Server to connect to the cloud_usage schema. Secure this file with chmod 600; otherwise the database credentials may be readable by non-privileged users on the host.
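As a rough pre-flight check for this step, the following Python sketch verifies that the three keys named above are present and that the file is not accessible to group or other users. The function names are hypothetical, and the parser assumes plain key=value lines with # comments.

```python
import stat
from pathlib import Path

# Keys named in step 2 of this guide.
REQUIRED_KEYS = ("db.usage.host", "db.usage.username", "db.usage.password")


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_db_properties(path: Path) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: list[str] = []
    props = parse_properties(path.read_text())
    for key in REQUIRED_KEYS:
        if not props.get(key):
            problems.append(f"missing or empty: {key}")
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"file mode {mode:o} grants group/other access; expected 600")
    return problems
```

Calling check_db_properties(Path("/etc/cloudstack/usage/db.properties")) on the usage node would list anything that needs attention before the service is started.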

3. Enabling Usage Collection in Global Settings

Access the CloudStack Management Console, navigate to Global Settings, and search for the parameter usage.execution.timezone. Set this to the time zone used for your regional billing cycle. Additionally, ensure enable.usage.server is set to true.
System Note: Changing these values triggers an update in the configuration table of the cloud database. This forces the Management Server to start advertising usage records for the Usage Server to pick up.

4. Service Initialization and Verification

Execute the command systemctl start cloudstack-usage followed by systemctl enable cloudstack-usage. Verify the status using systemctl status cloudstack-usage.
System Note: Upon startup, the service launches a Java Virtual Machine (JVM) process, which runs in user space. It begins scanning the usage_event table to identify unprocessed network events, such as IP assignment or network creation.

5. Configuring Traffic Type Aggregation

Utilize the CloudStack API or the cloudstack-config tool to define which traffic types are billable. Use the command update traffic type to specify whether Public, Guest, or Management traffic should be tracked independently.
System Note: This configuration instructs the hypervisor agents to maintain separate counters for different VLAN tags, which reduces the risk of billing customers for internal management traffic.
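Step 5 refers to the CloudStack API. As a hedged sketch, the Python below builds a signed query string following the request-signing scheme described in the CloudStack developer documentation (sort the parameters, URL-encode the values, lowercase the string, then HMAC-SHA1 and Base64). The listTrafficTypes command is used only to inspect existing traffic types; the API key, secret key, network ID, and host name are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def build_signed_query(params: dict[str, str], secret_key: str) -> str:
    """Return a query string with a CloudStack-style signature appended."""
    # Sort parameters by key (case-insensitively) and URL-encode each value.
    pairs = sorted(params.items(), key=lambda item: item[0].lower())
    query = "&".join(f"{key}={quote(str(value), safe='')}" for key, value in pairs)
    # The signature is computed over the lowercased query string.
    digest = hmac.new(
        secret_key.encode("utf-8"),
        query.lower().encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"{query}&signature={quote(signature, safe='')}"


# Placeholder values; substitute real credentials and IDs.
query_string = build_signed_query(
    {
        "command": "listTrafficTypes",
        "physicalnetworkid": "PHYSICAL-NETWORK-UUID",
        "response": "json",
        "apikey": "YOUR-API-KEY",
    },
    secret_key="YOUR-SECRET-KEY",
)
# Hypothetical management server address.
url = f"http://mgmt.example.internal:8080/client/api?{query_string}"
```

The request is sent with the original parameter casing; only the string that is signed is lowercased.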

Section B: Dependency Fault-Lines:

The most frequent bottleneck in CloudStack Network Usage tracking is exhaustion of the database connection pool. If the max_connections setting in my.cnf is too low, the Usage Server will fail to commit records, leading to gaps in billing data. Another critical fault point is desynchronization of the local system clock. If the Usage Server clock drifts backward, the delta calculation for network bytes may produce negative values, stalling the usage job entirely. Finally, ensure that the libvirt or XenServer agents are not experiencing packet loss or high latency on the management network, as this prevents traffic statistics from reaching the collector.

THE TROUBLESHOOTING MATRIX

Section C: Logs & Debugging:

The primary log file for diagnosing usage failures is located at /var/log/cloudstack/usage/usage.log. When investigating missing data, search for the string “Usage server is not running” or “Exception in usage job”. If the service starts but no data is generated, inspect the cloud_usage.usage_network table directly using the MySQL client with the command SELECT * FROM cloud_usage.usage_network WHERE device_id = 'X';.
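The log search described above can be scripted. This Python sketch counts occurrences of the two strings named in the paragraph; the function names are hypothetical, and nothing is assumed about the log line format beyond the strings appearing somewhere in the line.

```python
from collections import Counter
from typing import Iterable

# Failure strings quoted in the troubleshooting paragraph.
PATTERNS = ("Usage server is not running", "Exception in usage job")


def scan_usage_log(lines: Iterable[str]) -> Counter:
    """Count the lines that contain each known failure string."""
    hits: Counter = Counter()
    for line in lines:
        for pattern in PATTERNS:
            if pattern in line:
                hits[pattern] += 1
    return hits


def scan_file(path: str = "/var/log/cloudstack/usage/usage.log") -> Counter:
    """Scan the usage log on disk, tolerating undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_usage_log(handle)
```

A non-empty result from scan_file() narrows the investigation to the usage job itself before any database tables are inspected.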

Common error codes and their physical/logic counterparts:
1. Error “Connection Refused”: Check if the MySQL service is bound only to 127.0.0.1 in /etc/my.cnf.d/server.cnf.
2. Error “Data Integrity Violation”: This usually indicates a duplicate entry in the usage_event table, which requires a manual cleanup of orphaned records.
3. High CPU Load on Usage Node: This often correlates with a large backlog of records. Consider increasing usage.stats.job.aggregation.range in Global Settings to process data in larger, less frequent batches.

OPTIMIZATION & HARDENING

Performance Tuning:
To enhance throughput, raise usage.stats.job.aggregation.range, the time range that each usage job aggregates. A higher value reduces the frequency of database commits, which lowers disk I/O pressure but increases the memory footprint of each run. For large-scale deployments, place the cloud_usage database on a separate physical volume to minimize contention with the operational cloud database. Distributing write operations this way reduces the load on the primary storage array.
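The trade-off between commit frequency and batch size can be made concrete with simple arithmetic. This Python sketch assumes the aggregation range is expressed in minutes and uses a hypothetical steady rate of raw records per minute; both the function and the rate are illustrative only.

```python
MINUTES_PER_DAY = 24 * 60


def batch_profile(range_minutes: int, records_per_minute: int) -> tuple[float, int]:
    """Return (batches per day, raw records per batch) for a given range."""
    if range_minutes <= 0:
        raise ValueError("range_minutes must be positive")
    batches_per_day = MINUTES_PER_DAY / range_minutes
    records_per_batch = records_per_minute * range_minutes
    return batches_per_day, records_per_batch


# Hypothetical load of 50 raw records per minute.
hourly = batch_profile(range_minutes=60, records_per_minute=50)
daily = batch_profile(range_minutes=1440, records_per_minute=50)
```

Moving from an hourly to a daily range cuts the number of batches by a factor of 24 while each batch holds 24 times as many records in memory, which is the cost the paragraph warns about.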

Security Hardening:
Restrict access to the Usage Server by implementing iptables or nftables rules that only allow the Management Server and local admins to connect. Ensure that the database user for usage tracking has limited permissions: it only requires SELECT, INSERT, and UPDATE privileges on the cloud_usage and cloud databases. Disable all unused protocols on the usage node to reduce the attack surface.

Scaling Logic:
As the cloud environment expands to thousands of virtual machines, a single Usage Server may become a bottleneck. While CloudStack traditionally uses a single active Usage Server, you can scale by optimizing the underlying database performance. Implement read replicas so the billing team can query usage data without impacting the primary ingestion service. Ensure that network throughput between the hypervisors and the Usage Server remains high to prevent packet loss during the transmission of telemetry metadata.

THE ADMIN DESK

How do I restart the usage job manually?
Stop the service with systemctl stop cloudstack-usage, clear the usage_job table in the database, and restart the service. This forces the system to re-evaluate the timelines for all pending usage records from the last successful checkpoint.

What happens if the Usage Server is offline for 24 hours?
No data is lost. The CloudStack Network Usage system is designed to be resilient: upon restart, the server identifies the last processed timestamp and aggregates all data accumulated by the hypervisor agents since that point.
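The catch-up behavior described in this answer can be pictured as enumerating every complete aggregation window between the last processed timestamp and the current time. The Python sketch below is an illustration of that idea, not CloudStack's actual implementation; pending_windows and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def pending_windows(
    last_processed: datetime, now: datetime, range_minutes: int
) -> list[tuple[datetime, datetime]]:
    """List complete (start, end) aggregation windows not yet processed."""
    step = timedelta(minutes=range_minutes)
    windows = []
    start = last_processed
    # Only windows that have fully elapsed are eligible for processing.
    while start + step <= now:
        windows.append((start, start + step))
        start += step
    return windows


# Example: the server was offline for just over 24 hours with a 60-minute range.
checkpoint = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
restart_time = datetime(2024, 1, 2, 0, 30, tzinfo=timezone.utc)
backlog = pending_windows(checkpoint, restart_time, range_minutes=60)
```

In this example the backlog contains 24 windows, and the final partial half hour waits until its window completes.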

Why is my network usage showing zero for all users?
Verify that the enable.usage.server flag is set to true in Global Settings. Additionally, ensure that the Virtual Routers are running, as they are responsible for reporting the ingress/egress statistics for guest networks in most CloudStack zones.

Can I track usage for internal Private Gateway traffic?
Yes. You must configure the traffic type for the Private Gateway in the Infrastructure section of the UI. Once set, the Usage Server will begin recording TX/RX bytes for the specific VLAN associated with the gateway.

How do I verify if the Usage Server is connected to the DB?
Run netstat -antp | grep 3306. You should see an ESTABLISHED connection from the java process (Usage Server) to the MySQL host. If the state is SYN_SENT or TIME_WAIT, check your network routing and firewall rules.
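If netstat shows no connection at all, a quick reachability test from the usage node can separate network problems from service problems. This Python sketch only checks that a TCP connection to the database port can be opened; it does not authenticate. The function name and the example host are hypothetical.

```python
import socket


def can_connect(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_connect("db.example.internal") returning False points to routing, firewall, or bind-address issues rather than to the Usage Server itself.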
