System Architecture – High Availability (HA) Cluster

A High Availability (HA) deployment of XPLG is achieved through a multi-node clustered architecture, designed to provide maximum resilience, scalability, and operational continuity for enterprise environments.

XPLG’s cluster management is fully automated, dynamically distributing workloads across all available nodes and processes to ensure optimal performance, resource utilization, and fault tolerance.


Key Benefits of HA Deployment

  • Horizontal Scalability
    XPLG scales linearly by adding nodes or processes, enabling support for growing data volumes and increasing user demand without architectural changes.

  • Separation of Concerns
    User-facing workloads (UI, queries, dashboards) are decoupled from backend processing (ingestion, indexing, analytics), ensuring consistent performance and high quality of service.

  • High Availability by Design
    All nodes can execute core functions such as ingestion, processing, and alerting, so no single node is a point of failure.

  • Fault Tolerance
    In case of node or process failure, workloads are automatically redistributed across the cluster, ensuring uninterrupted data flow and service continuity.

  • Disaster Recovery (DR)
    Built-in mechanisms continuously maintain configuration backups, enabling rapid and reliable system restoration.

  • Resilient Processing
    The system ensures completion of in-flight operations (data ingestion, indexing, alerts, reports) even after failures, maintaining data integrity and operational consistency.
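
The redistribution behavior described under Fault Tolerance can be pictured with a small sketch. This is not XPLG's actual scheduler: the `redistribute` function, the node names, and the task names are hypothetical, and the "least-loaded survivor" placement rule is an assumption chosen only for illustration.

```python
def redistribute(assignments, failed_node):
    """Return a new task map with the failed node's tasks moved to survivors.

    assignments: dict mapping node name -> list of task names (hypothetical model).
    Each orphaned task goes to the surviving node that currently holds the
    fewest tasks; ties are broken by node name so the result is deterministic.
    """
    # Copy the surviving nodes' task lists so the input is not mutated.
    survivors = {
        node: list(tasks)
        for node, tasks in assignments.items()
        if node != failed_node
    }
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the workload")

    for task in assignments.get(failed_node, []):
        target = min(survivors, key=lambda node: (len(survivors[node]), node))
        survivors[target].append(task)
    return survivors


cluster = {
    "node-a": ["ingest-1", "index-1"],
    "node-b": ["alert-1"],
    "node-c": ["ingest-2", "report-1", "index-2"],
}
after = redistribute(cluster, "node-c")
for node, tasks in sorted(after.items()):
    print(node, tasks)
```

In this example the three tasks from the failed `node-c` end up spread over `node-a` and `node-b`, so both survivors carry three tasks each. A real cluster would also weigh CPU, memory, and task type, which this sketch ignores.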


Architecture Overview

The diagram below illustrates a fully distributed XPLG deployment, including:

  • Multiple UI nodes handling user interaction

  • Dedicated management and processing nodes

  • Distributed data ingestion (listeners)

  • Integration with both on-premise and cloud data sources

  • Optional load balancing layers for user traffic and ingestion pipelines

This architecture enables seamless operation across hybrid environments, combining on-premise infrastructure with cloud-native data sources.

 

XPLG_ARCH.png

 

 

 

Diagram Annotations

[0] Authentication & Access Control
Integration with Active Directory / SSO enables centralized authentication and secure user access management.

[1] User Traffic Load Balancer (Optional)
Distributes user traffic across multiple UI nodes to ensure efficient resource utilization and consistent performance.

[2] UI Nodes Array
A set of processes dedicated to handling user interaction, including queries, dashboards, and visualization.

[3] Storage
A shared file system accessible to all cluster nodes, holding both:

  • Hot data (frequently accessed)

  • Cold data (long-term retention)

[4] Management & Processing Nodes Array
Responsible for cluster orchestration and data processing:

  • M (Master) – orchestrates and manages cluster operations

  • P (Processors) – handle ingestion, indexing, monitoring, and reporting tasks

  • L (Listeners) – receive incoming data from external sources and shippers

[5] Listeners Load Balancer (Optional)
Distributes incoming data streams across multiple listeners to ensure balanced ingestion and processing.
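
As a rough illustration of what such a balancer does, the sketch below rotates incoming streams across listeners and skips any listener marked as unavailable. The document does not specify the balancing algorithm or product; round-robin, the `RoundRobinBalancer` class, and the listener names are assumptions made for this example. The same idea applies to the user traffic load balancer in front of the UI nodes.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips unhealthy listeners."""

    def __init__(self, listeners):
        self._listeners = list(listeners)
        if not self._listeners:
            raise ValueError("at least one listener is required")
        self._healthy = set(self._listeners)
        self._ring = cycle(self._listeners)

    def mark_down(self, listener):
        """Stop routing to a listener, for example after a failed health check."""
        self._healthy.discard(listener)

    def mark_up(self, listener):
        """Resume routing to a known listener."""
        if listener in self._listeners:
            self._healthy.add(listener)

    def pick(self):
        """Return the next healthy listener in rotation."""
        # One full pass over the ring is enough to find a healthy listener.
        for _ in range(len(self._listeners)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy listeners available")


lb = RoundRobinBalancer(["listener-1", "listener-2", "listener-3"])
first = [lb.pick() for _ in range(4)]
lb.mark_down("listener-2")
then = [lb.pick() for _ in range(3)]
print(first)
print(then)
```

After `listener-2` is marked down, traffic keeps flowing to the remaining two listeners without any change on the sending side.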

Cloud Connectivity Architecture

[6] PortX
An XPLG PortX instance or cluster deployed in the cloud (VM, Docker, or Kubernetes/OpenShift). PortX acts as a data gateway, enabling secure and efficient data streaming to the central XPLG cluster.

[7] Cloud Services APIs
PortX integrates with native cloud service APIs to collect telemetry and operational data.

[8] Cloud Data Sources
Includes cloud-based infrastructure, applications, Kubernetes/OpenShift clusters, and other services generating telemetry data.

[9] Cloud Storage
PortX enables intelligent data routing: it determines which data is forwarded to the central cluster and which remains in cloud storage for short- or long-term retention.
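
The routing decision can be thought of as a rule applied to each event. The sketch below is not PortX configuration or a PortX API; the `Event` fields, the `route` function, the destination labels, and the severity-based rule are all hypothetical and only show the shape of such a decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical telemetry event; field names are illustrative only."""
    source: str
    severity: str
    message: str


# Assumed policy: only higher-severity events go to the central cluster.
FORWARD_SEVERITIES = frozenset({"error", "critical"})


def route(event):
    """Return the destination label for an event under the assumed policy."""
    if event.severity in FORWARD_SEVERITIES:
        return "central-cluster"
    return "cloud-storage"


events = [
    Event("k8s/payments", "critical", "pod crash loop"),
    Event("k8s/payments", "debug", "cache warmed"),
    Event("vm/gateway", "error", "upstream timeout"),
]
destinations = [route(e) for e in events]
print(destinations)
```

Here the two high-severity events would be forwarded and the debug event would stay in cloud storage. Actual routing criteria depend on how PortX is configured in a given deployment.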

 

Sizing & Capacity Planning

Sizing an XPLG deployment depends on the following key parameters:

  • Daily Data Volume (ingestion rate)

  • Retention Requirements (hot and cold tiers)

  • Concurrent Users and Query Load

Key considerations:

  • Compute (CPU & Memory)
    Scales with ingestion throughput and query complexity. Additional processing nodes increase indexing and analytics capacity.

  • Storage
    Determined by daily volume and retention period. Tiered storage (hot/cold) enables performance optimization and cost control.

  • Network Throughput
    Critical for distributed environments and high-ingestion scenarios.

  • Cluster Design
    Independent scaling of UI, ingestion, and processing layers allows precise performance tuning.
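
The storage consideration above comes down to simple arithmetic: daily volume multiplied by the retention period of each tier. The sketch below shows that calculation under stated assumptions. The `overhead` and `cold_ratio` factors are placeholders, not XPLG figures, and the example numbers are invented for illustration.

```python
def storage_estimate_gb(daily_gb, hot_days, cold_days, overhead=1.0, cold_ratio=1.0):
    """Estimate hot, cold, and total storage in GB.

    daily_gb   -- raw data ingested per day
    hot_days   -- days kept in the hot tier
    cold_days  -- additional days kept in the cold tier
    overhead   -- assumed multiplier for indexes and metadata (placeholder)
    cold_ratio -- assumed size of cold data relative to hot data (placeholder)
    """
    if min(daily_gb, hot_days, cold_days, overhead, cold_ratio) < 0:
        raise ValueError("all inputs must be non-negative")
    hot = daily_gb * hot_days * overhead
    cold = daily_gb * cold_days * overhead * cold_ratio
    return {"hot_gb": hot, "cold_gb": cold, "total_gb": hot + cold}


# Example: 200 GB/day, 14 days hot, 76 more days cold at an assumed 0.5 ratio.
estimate = storage_estimate_gb(200, hot_days=14, cold_days=76, cold_ratio=0.5)
print(estimate)
```

With these inputs the estimate is 2,800 GB hot and 7,600 GB cold, 10,400 GB in total. Treat the result as a starting point for discussion and take the actual factors from the System Requirements documentation.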

For detailed sizing guidelines and recommended configurations, refer to the System Requirements documentation.

 

Getting Started

For deployment instructions and best practices, refer to the installation instructions.
For help with architecture planning, sizing, or deployment, contact the XPLG support team: support@xplg.com