Setting Up the CloudStack Usage Server for Tracking

The CloudStack Usage Server acts as the central telemetry and accounting hub for large-scale infrastructure deployments. Its primary function is to aggregate and normalize the raw event data generated by the CloudStack Management Server. In a production cloud environment, resource consumption across compute, storage, and network layers occurs asynchronously; the Usage Server provides the idempotent processing logic that transforms these discrete events into billable records. Without this service, infrastructure providers have no consolidated view of resource consumption and allocation. It covers granular tracking of virtual machine uptime, volume snapshots, public IP usage, and network ingress/egress metrics. Because usage processing is decoupled from the core management logic, it adds little overhead to the control plane: the platform keeps API latency low while high-volume accounting tasks are handled by a dedicated background process that interacts directly with the database.

TECHNICAL SPECIFICATIONS

| Requirement | Default Port/Range | Protocol/Standard | Impact Level (1-10) | Recommended Resources |
| :--- | :--- | :--- | :--- | :--- |
| Java Runtime Environment | N/A | OpenJDK 11 or 17 | 10 | 2 vCPU / 4GB RAM |
| Database Connectivity | 3306 | MySQL/MariaDB | 9 | Low Latency SSD |
| Management Interaction | 8080/8250 | TCP/REST | 7 | 1Gbps Uplink |
| Usage Job Interval | 1800s (Default) | Internal Scheduler | 6 | N/A |
| Storage Capacity | N/A | SQL Schema | 8 | 50GB+ (Growth dependent) |

THE CONFIGURATION PROTOCOL

Environment Prerequisites:

Successful deployment requires a pre-existing Apache CloudStack Management Server installation. The host operating system should be a Linux distribution such as RHEL 8/9 or Ubuntu 22.04 LTS. Ensure that the mysql-connector-java library is present and that the system user has sudo or root privileges. Firewall rules must allow bidirectional traffic between the Usage Server and the MySQL host. Security standards dictate that the database user for usage processing must have specific grants for the cloud_usage and cloud databases; general administrative access is discouraged to minimize the security blast radius.
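The firewall requirement above can be checked before installing anything. The following is a minimal Python sketch, not part of CloudStack; the function name port_open and the host name in the usage note are hypothetical, and port 3306 is taken from the specifications table.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from the Usage Server host, a call such as port_open("db.example.internal", 3306) returning False would suggest a firewall or routing problem rather than a credentials problem. It only tests the outbound TCP handshake, not the grants on the database.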

Section A: Implementation Logic:

The engineering design of the CloudStack Usage Server centers on a scheduled polling mechanism. It does not receive live streams; instead, it queries the usage_event table within the primary cloud database. The logic uses a “Last Processed ID” marker to ensure that no event is counted twice, which keeps processing idempotent. Once events are fetched, the server categorizes them into usage types (e.g., Running VM, Allocated VM, IP Address Usage) and calculates the elapsed time between related event timestamps. The final records are written to a separate schema named cloud_usage, so that heavy read/write operations for billing do not lock tables required for real-time cloud management.
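The polling logic described above can be illustrated with a small Python sketch. It is a simplified model, not the Usage Server's actual code: the UsageEvent class, the in-memory event list, and the pairing of start and stop events are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical, simplified stand-in for one row of usage_event."""
    id: int
    type: str            # illustrative event names: "VM.START", "VM.STOP"
    resource_id: int
    created: datetime


def process(events, last_processed_id, open_starts, records):
    """Pair start/stop events newer than the marker into duration records.

    Returns the new marker. Calling again with that marker processes
    nothing twice, which is the idempotency property described above.
    """
    pending = sorted((e for e in events if e.id > last_processed_id),
                     key=lambda e: e.id)
    for event in pending:
        if event.type == "VM.START":
            open_starts[event.resource_id] = event.created
        elif event.type == "VM.STOP" and event.resource_id in open_starts:
            started = open_starts.pop(event.resource_id)
            seconds = (event.created - started).total_seconds()
            records.append((event.resource_id, seconds))
        last_processed_id = event.id
    return last_processed_id


events = [
    UsageEvent(1, "VM.START", 42, datetime(2024, 1, 1, 10, 0)),
    UsageEvent(2, "VM.STOP", 42, datetime(2024, 1, 1, 12, 30)),
]
open_starts, records = {}, []
marker = process(events, 0, open_starts, records)
marker = process(events, marker, open_starts, records)  # second pass is a no-op
print(marker, records)
```

The second call adds no records because every event id is at or below the marker, so re-running a job after a crash does not double-count usage in this model.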

Step-By-Step Execution

1. Repository Synchronization and Package Acquisition

Configure the official CloudStack package repository entries within /etc/yum.repos.d/cloudstack.repo or /etc/apt/sources.list.d/cloudstack.list. Execute yum install cloudstack-usage or apt-get install cloudstack-usage to pull the binary assets and service scripts.

System Note: This action populates the /usr/share/cloudstack-usage/ directory with the necessary JAR files. The package manager creates the cloud user and group if they do not exist, ensuring that the service runs under a restricted security context rather than as a privileged root user.

2. Database Connectivity Mapping

Open /etc/cloudstack/usage/db.properties. Update the db.usage.host, db.usage.username, and db.usage.password fields to match your MySQL environment. Ensure that db.usage.name points to the cloud_usage database, where processed records are written, and that the db.cloud.* entries point to the cloud database, from which events are read.

System Note: This file defines the connection settings for the persistence layer. The Usage Server process uses these credentials, through its JDBC driver, when it opens connections to MySQL. Restrict the file's permissions (for example, chmod 600 with the file owned by the account the service runs as) to prevent other users from reading the database credentials.
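A quick sanity check of the properties file can catch empty or missing entries before the service is started. The sketch below is a minimal Python example; the list of required keys and the sample values are assumptions for illustration and should be adjusted to match the keys present in your installed db.properties.

```python
# Keys assumed for illustration; extend to match your installed file.
REQUIRED_KEYS = ("db.usage.host", "db.usage.password", "db.usage.name")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not props.get(key)]


# Illustrative content only; these are not real credentials or hosts.
sample = """
# usage database settings
db.usage.host=db.example.internal
db.usage.password=
db.usage.name=cloud_usage
"""
print(missing_keys(parse_properties(sample)))
```

In this sample the password is left empty on purpose, so the check reports db.usage.password as the only problem.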

3. Java Virtual Machine Heap Adjustment

Edit the service configuration file located at /etc/default/cloudstack-usage or /etc/sysconfig/cloudstack-usage. Locate the JAVA_OPTS variable and set the memory parameters, for example: -Xms2g -Xmx4g.

System Note: This step sets the initial (-Xms) and maximum (-Xmx) size of the JVM heap. A generous initial heap means the JVM has to grow the heap less often during heavy processing spikes, which can reduce garbage collection pauses. Keep -Xmx comfortably below the host's physical memory so that the operating system and any co-located services are not starved.
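Typos in heap flags are easy to make and stop the JVM from starting. The following Python sketch checks a JAVA_OPTS string for the two flags discussed above; the helper names heap_bytes and check_heap are hypothetical, and only the common k, m, and g size suffixes are handled.

```python
import re

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_bytes(java_opts, flag):
    """Return the size in bytes given for -Xms or -Xmx, or None if absent."""
    match = re.search(rf"-{flag}(\d+)([kKmMgG]?)(?=\s|$)", java_opts)
    if not match:
        return None
    number, unit = match.groups()
    # No suffix means the value is already expressed in bytes.
    return int(number) * _UNITS.get(unit.lower(), 1)


def check_heap(java_opts):
    """Return a list of human-readable problems with the heap flags."""
    xms = heap_bytes(java_opts, "Xms")
    xmx = heap_bytes(java_opts, "Xmx")
    if xms is None or xmx is None:
        return ["both -Xms and -Xmx should be set"]
    if xms > xmx:
        return ["-Xms is larger than -Xmx"]
    return []


print(check_heap("-Xms2g -Xmx4g"))
print(check_heap("-Xms8g -Xmx4g"))
```

The first call reports no problems for the example values from this step; the second flags an initial heap that exceeds the maximum, a combination the JVM rejects at startup.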

4. Service Initialization and Status Check

Execute systemctl enable cloudstack-usage followed by systemctl start cloudstack-usage. Verify the operational state using systemctl status cloudstack-usage.

System Note: systemctl enable registers the unit with systemd so that the usage tracker starts again after a reboot, and systemctl start launches it immediately. systemd tracks the service's main process, and systemctl status reports its PID and current state.
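For monitoring scripts, the same status check can be automated. This is a minimal Python sketch that wraps systemctl is-active; the function name service_state and the injectable run parameter are illustrative choices, and the unit name is the one used in this step.

```python
import subprocess


def service_state(unit="cloudstack-usage", run=subprocess.run):
    """Return the state string printed by `systemctl is-active <unit>`.

    The `run` parameter exists only so the call can be replaced in tests.
    """
    result = run(["systemctl", "is-active", unit],
                 capture_output=True, text=True)
    # Typical values include "active", "inactive", and "failed".
    return result.stdout.strip()
```

On a host with systemd, calling service_state() and comparing the result with "active" gives a simple health probe. The return code is deliberately ignored here because systemctl is-active exits non-zero for any state other than active.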

Section B: Dependency Fault-Lines:

Hardware bottlenecks often manifest as slow database response times. If the MySQL server is hosted on a high-latency network segment, the Usage Server may encounter timeout errors during bulk inserts. Furthermore, a mismatch between the installed OpenJDK version and the versions supported by the CloudStack release can cause the service to crash immediately on startup. Always ensure that the mysql-connector-java version is compatible with the MySQL server version to avoid handshake failures.

THE TROUBLESHOOTING MATRIX

Section C: Logs & Debugging:

The primary forensic resource is the log file located at /var/log/cloudstack/usage/usage.log. When diagnosing failures, look for specific exception strings.

  • Connection Refused: This indicates a network-level blockage or a misconfigured host or port in db.properties. Use nmap -p 3306 <database-host> to verify connectivity.
  • Access Denied for User: This confirms that while the database is reachable, the credentials provided lack the necessary permissions to select from the usage_event table or insert into cloud_usage.
  • UsageJob failed to start: This often results from a locked record in the cloud.configuration table. Check the usage.stats.job.executing parameter in the database; if it is stuck at “1” during a crash, the next job will not initiate.
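The first two signatures above lend themselves to a simple automated scan of usage.log. The Python sketch below is only an illustration: the pattern table, the hints, and the sample log lines are assumptions, and real log lines may be formatted differently.

```python
# Lower-cased substrings mapped to a short hint (illustrative wording).
PATTERNS = {
    "connection refused": "network or port problem; check db.properties and firewall rules",
    "access denied for user": "credentials or grants problem on cloud / cloud_usage",
}


def triage(log_lines):
    """Return (line_number, hint) for every line matching a known pattern."""
    findings = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        for needle, hint in PATTERNS.items():
            if needle in lowered:
                findings.append((number, hint))
    return findings


# Made-up sample lines, not copied from a real usage.log.
sample_log = [
    "INFO  starting usage job",
    "ERROR java.net.ConnectException: Connection refused",
    "ERROR Access denied for user 'cloud'@'10.0.0.5' (using password: YES)",
]
for number, hint in triage(sample_log):
    print(number, hint)
```

Matching is case-insensitive so that variations in capitalization between the driver and the server messages do not hide a hit.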

Visible symptoms of failure often include a flat line in resource reports or “0” values across all billing categories despite active VM instances. To determine whether the server is actually moving data through the pipeline, run a query such as SELECT * FROM cloud_usage.usage_event WHERE processed = 0; (the processed column is a 0/1 flag in the stock schema). A result set that keeps growing suggests that events are not being processed.

OPTIMIZATION & HARDENING

Performance Tuning:
To increase throughput for environments with thousands of concurrent virtual machines, adjust usage.stats.job.aggregation.range in the global settings. Reducing the aggregation window from the default 1440 minutes to a shorter interval prevents the server from attempting to process a massive payload in a single burst. Additionally, ensure the MySQL server has optimized indexes on the created and removed columns within the usage_event table to minimize disk I/O wait times.
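The effect of a shorter aggregation range can be pictured as slicing the same time span into more, smaller batches. The following Python sketch illustrates that idea only; the function aggregation_windows is hypothetical and does not reproduce how the Usage Server schedules its jobs.

```python
from datetime import datetime, timedelta


def aggregation_windows(start, end, range_minutes):
    """Yield (window_start, window_end) pairs that cover [start, end)."""
    if range_minutes <= 0:
        raise ValueError("range_minutes must be positive")
    step = timedelta(minutes=range_minutes)
    cursor = start
    while cursor < end:
        # The final window is clipped so it never runs past `end`.
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


day_start = datetime(2024, 1, 1)
day_end = datetime(2024, 1, 2)
print(len(list(aggregation_windows(day_start, day_end, 1440))))
print(len(list(aggregation_windows(day_start, day_end, 60))))
```

With a 1440-minute range the whole day is one batch, while a 60-minute range splits it into 24 batches, each covering one twenty-fourth of the time span.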

Security Hardening:
Enforce TLS encryption for all database traffic by appending useSSL=true to the connection string in the properties file. Apply strict firewall rules via iptables or nftables to restrict access to the usage server host only from the management network. Encapsulation of this traffic within a management-specific VLAN prevents sniffing of sensitive infrastructure metrics by unauthorized workloads on the public network.

Scaling Logic:
The CloudStack Usage Server is typically a singleton process per CloudStack deployment. For high availability, do not run multiple instances concurrently against the same database, as this can lead to race conditions and duplicate billing. Instead, deploy the service on a high-availability cluster or use an orchestrator to ensure that if the primary node fails, the service is restarted on a standby node with the same configuration file state.

THE ADMIN DESK

How do I restart a stuck usage job?
Stop the cloudstack-usage service. Access the cloud database and update the configuration table, clearing the usage.stats.job.executing flag (reset the stuck “1” value to its false/“0” state). Restart the service to trigger a new parsing cycle.

Where can I find the raw usage records?
All processed records are stored in the cloud_usage schema. Common tables include usage_vm_instance, usage_network_offering, and usage_volume. Access these via a standard SQL client using the credentials defined in the properties file.

Can I run the usage server on the same host as the management server?
Yes, for small to medium deployments. However, for high-traffic environments, separate the services to prevent resource contention. The Usage Server can consume significant CPU during the aggregation phase of millions of events.

What is the impact of a stopped usage server?
Usage events will accumulate in the usage_event table until the service is restored. No data is lost; however, billing reports will remain stale until the service processes the backlog. Storage consumption on the database will grow in the meantime.

How do I verify if the usage server is actually working?
Check the cloud_usage.cloud_usage table for the most recent start_date. If the date is more than 24 hours behind current system time, the aggregation engine is likely stalled or hitting a logic exception.
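The 24-hour rule of thumb above is easy to script. This Python sketch assumes the latest start_date has already been fetched, for example with SELECT MAX(start_date) FROM cloud_usage.cloud_usage; the function name is_stalled and the default threshold parameter are illustrative.

```python
from datetime import datetime, timedelta


def is_stalled(latest_start_date, now, max_lag=timedelta(hours=24)):
    """Return True when the newest record is missing or older than max_lag."""
    if latest_start_date is None:
        # An empty table is treated as stalled in this sketch.
        return True
    return (now - latest_start_date) > max_lag


now = datetime(2024, 3, 10, 12, 0)
print(is_stalled(datetime(2024, 3, 10, 0, 0), now))
print(is_stalled(datetime(2024, 3, 8, 0, 0), now))
```

The first record is 12 hours old and passes; the second is 60 hours old and is flagged. Make sure both timestamps use the same time zone before comparing them.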
