System Architecture - HA cluster - AWS Spot Instances

AWS Spot instances background

AWS Spot Instances can reduce EC2 costs by up to 90% compared with On-Demand pricing. A growing number of companies, from SMBs to enterprises, use Spot Instances even for mission-critical production workloads in order to optimize their cloud costs.

Spot Instances represent AWS’s excess capacity. As a cloud provider, AWS must keep spare capacity available for surges in customer demand. To offset the cost of idle infrastructure, AWS offers this excess capacity at a steep discount to drive usage. That is why Spot pricing is so much lower than EC2 On-Demand pricing.

This discounted pricing comes with a caveat: AWS can “pull the plug” and terminate Spot Instances with only a two-minute warning. These interruptions occur when AWS needs to reclaim the excess capacity to serve customers who purchased Reserved Instances, Savings Plans, or On-Demand Instances.

AWS also offers rebalance recommendation signals (used by the “capacity rebalancing” feature), which can notify you that a Spot Instance is at elevated risk of interruption. However, AWS does not guarantee that these signals will arrive early enough for you to take action.

Here are some of the reasons Amazon EC2 may interrupt a spot instance:

  • Capacity - EC2 does not have enough unused capacity to serve On-Demand or Reserved Instance requests, so it reclaims Spot Instances to fulfill them.

  • Constraint - if the Spot request specifies a launch group or Availability Zone group and these constraints can no longer be met, the instances are terminated as a group.

When a Spot Instance is interrupted, one of three actions is applied, depending on the interruption behavior you selected: terminating the instance (the default), stopping it (so that it can be restarted later with the same launch specification), or hibernating it.
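As an illustration of how a node can notice the two-minute warning, the sketch below reads the Spot interruption notice from the EC2 instance metadata service (IMDSv2). It is a minimal example, not part of XPLG: the function names `parse_instance_action` and `fetch_instance_action` are hypothetical, and the code assumes it runs on the Spot Instance itself with the metadata service enabled.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service


def parse_instance_action(body: str):
    """Return (action, scheduled UTC time) from an instance-action document."""
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], when.replace(tzinfo=timezone.utc)


def fetch_instance_action(timeout: float = 2.0):
    """Return the pending interruption, or None if none is scheduled."""
    # IMDSv2: obtain a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_instance_action(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption is currently scheduled
            return None
        raise
```

A node agent could call `fetch_instance_action()` every few seconds and, when it returns an action such as `terminate`, `stop`, or `hibernate`, start draining work before the scheduled time.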

XPLG is a stateful application: it requires data and IP persistence. With automated solutions, even when AWS replaces a Spot Instance, the workload restarts immediately in the desired Availability Zone from exactly the same data point, keeping its root and data volumes as well as its private and public IPs.

 

XPLG Architecture

The most efficient way to deploy XPLG is as a multi-machine cluster.
The XPLG clustering mechanism manages all tasks automatically and dynamically, based on the available cluster processes.

A clustered deployment has some key advantages:

  • Scalability - add processes to support any required data volume.

  • User activity vs. back-end processing - user activity is completely separated from back-end data processing in order to maintain a high quality of service.

  • No single point of failure - any cluster node may function as an alerting and processing node when failures occur, avoiding loss of data or loss of service.

  • Fast Disaster Recovery (DR) - XpoLog has automated procedures that maintain configuration backups, which can easily be used to restore a system.

  • High Availability (HA) - when a cluster node fails, the cluster manager immediately identifies the failure and raises an alert. Until the failed node resumes, its processes are automatically assigned to another node so that all activities are still performed.

  • Fault tolerance - during a cluster node failure, or following an entire cluster failure, XpoLog recovers immediately and accurately, completing any undigested data, reports, and monitors.
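The High Availability behavior described above (a failed node's processes are handed to another node) can be pictured with a small sketch. This is only an illustration of the idea and not XPLG's actual algorithm: the function `reassign_tasks`, the node names, and the least-loaded placement rule are all assumptions made for the example.

```python
def reassign_tasks(assignments, failed_node, healthy_nodes):
    """Move every task of failed_node to the least-loaded healthy node.

    assignments maps node name -> list of task names.
    Returns a new mapping that contains only the healthy nodes.
    """
    if not healthy_nodes:
        raise ValueError("no healthy nodes left to take over")

    # Start from a copy of what each healthy node already runs.
    result = {node: list(assignments.get(node, [])) for node in healthy_nodes}

    for task in assignments.get(failed_node, []):
        # Hypothetical placement rule: pick the node with the fewest tasks.
        target = min(healthy_nodes, key=lambda node: len(result[node]))
        result[target].append(task)
    return result


if __name__ == "__main__":
    before = {"p1": ["collect", "index"], "p2": ["monitor"], "p3": ["report"]}
    print(reassign_tasks(before, "p1", ["p2", "p3"]))
```

When the failed node resumes, the same kind of rule could be applied in reverse to rebalance the tasks across all nodes again.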

HA cluster - AWS Spot Instances Architecture

The following diagram presents a Spot-based clustered environment with multiple user interface nodes and data processing nodes.

 

[0] Users Web Browser Access - integration with Active Directory for authentication and authorization.

[1] Users Traffic Load Balancer (Reserved) - when there are multiple UI nodes, this LB distributes user activity between the available Spot UI nodes.

[2] UI Nodes Array (Spot) - processes dedicated to serving user activity. UI nodes go up and down depending on demand and load.

[3] EFS (Reserved) - Elastic File System accessible by all cluster nodes (Spot and Reserved).

[4] Processing Nodes Array - the cluster management, processing, and listener nodes.
M (Reserved) = Master (the process that orchestrates the cluster). This is the only cluster machine that is reserved, because it continuously manages the cluster’s available resources.
P (Spot) = Processors (the processes that process data: collect, index, monitor, report, etc.). All core tasks are performed by the available Spot processor nodes.
L (Spot) = Listeners (the processes that receive data from various data shippers). Data is sent to the Listeners LB, which distributes the records between the available Spot listener nodes.

[5] Listeners Load Balancer (Reserved, optional) - when there are multiple Listeners, this LB distributes traffic between the nodes.
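To make the split between Spot and Reserved components easier to see, the legend above can be written down as a small data structure. This is only a descriptive sketch: the `Component` class, the `TOPOLOGY` list, and the `interruptible` helper are hypothetical names for illustration and do not correspond to any XPLG configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    ref: str       # reference used in the diagram legend
    name: str
    purchase: str  # "spot" or "reserved"


# Components as listed in the legend ([0], browser access, is not an AWS resource).
TOPOLOGY = [
    Component("[1]", "Users traffic load balancer", "reserved"),
    Component("[2]", "UI nodes", "spot"),
    Component("[3]", "EFS", "reserved"),
    Component("[4] M", "Master", "reserved"),
    Component("[4] P", "Processors", "spot"),
    Component("[4] L", "Listeners", "spot"),
    Component("[5]", "Listeners load balancer", "reserved"),
]


def interruptible(topology):
    """Names of the components that AWS may reclaim at short notice."""
    return [c.name for c in topology if c.purchase == "spot"]


if __name__ == "__main__":
    print(interruptible(TOPOLOGY))
```

Listing the components this way highlights the design choice in the architecture: everything that holds state or routes traffic (master, EFS, load balancers) is reserved, while the worker roles run on Spot.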

 

For installation steps, see AWS Spot Instances based HA Cluster Installation (multi-machines). Our support team is happy to consult and assist when needed; contact us at support@xplg.com.

 

 

XPLG AWS Infrastructure Costs (examples)

The tables below show example cluster costs for running XPLG on AWS, using Spot Instances where possible.

Up to 50 GB/day, 30 days retention

| Resource | Spec | Type | Estimated Total Annual Cost (Spot or Reserved / On-Demand) |
|---|---|---|---|
| 2 UI nodes | t3.2xlarge (8 vCPU, 32 GiB memory) | Spot instance | $1,748 / $5,830 |
| 2 Processor nodes | t3.2xlarge (8 vCPU, 32 GiB memory) | Spot instance | $1,748 / $5,830 |
| 1 Master node | t3.large (2 vCPU, 8 GiB memory) | Reserved instance | $728 |
| 1 ELB | N.A. | Reserved | $246 |
| 1 EFS | Enhanced Elastic Throughput* | Reserved | $2,160 |
| Total Cost | | | $6,630 / $14,794 (56% saving, Spot vs. On-Demand) |

* In Elastic throughput mode, throughput scales automatically and you pay only for what you use (to be added to the TCO).

Up to 100 GB/day, 30 days retention

| Resource | Spec | Type | Estimated Total Annual Cost (Spot or Reserved / On-Demand) |
|---|---|---|---|
| 2 UI nodes | t3.2xlarge (8 vCPU, 32 GiB memory) | Spot instance | $1,748 / $5,830 |
| 3 Processor nodes | t3.2xlarge (8 vCPU, 32 GiB memory) | Spot instance | $2,622 / $8,745 |
| 1 Master node | t3.large (2 vCPU, 8 GiB memory) | Reserved instance | $728 |
| 1 ELB | N.A. | Reserved | $246 |
| 1 EFS | Enhanced Elastic Throughput* | Reserved | $4,320 |
| Total Cost | | | $9,664 / $19,869 (52% saving, Spot vs. On-Demand) |

* In Elastic throughput mode, throughput scales automatically and you pay only for what you use (to be added to the TCO).

Up to 500 GB/day, 30 days retention

| Resource | Spec | Type | Estimated Total Annual Cost (Spot or Reserved / On-Demand) |
|---|---|---|---|
| 2 UI nodes | t3.2xlarge (8 vCPU, 32 GiB memory) | Spot instance | $1,748 / $5,830 |
| 5 Processor nodes | t3.2xlarge (8 vCPU, 32 GiB memory) | Spot instance | $4,370 / $14,575 |
| 1 Master node | t3.large (2 vCPU, 8 GiB memory) | Reserved instance | $728 |
| 1 ELB | N.A. | Reserved | $246 |
| 1 EFS | Enhanced Elastic Throughput* | Reserved | $21,600 |
| Total Cost | | | $28,692 / $42,979 (34% saving, Spot vs. On-Demand) |

* In Elastic throughput mode, throughput scales automatically and you pay only for what you use (to be added to the TCO).

Up to 1,000 GB/day, 30 days retention

| Resource | Spec | Type | Estimated Total Annual Cost (Spot or Reserved / On-Demand) |
|---|---|---|---|
| 2 UI nodes | t3.2xlarge (8 vCPU, 32 GiB memory) | Spot instance | $1,748 / $5,830 |
| 8 Processor nodes | t3.2xlarge (8 vCPU, 32 GiB memory) | Spot instance | $6,992 / $23,320 |
| 1 Master node | t3.large (2 vCPU, 8 GiB memory) | Reserved instance | $728 |
| 1 ELB | N.A. | Reserved | $246 |
| 1 EFS | Enhanced Elastic Throughput* | Reserved | $43,200 |
| Total Cost | | | $52,914 / $73,324 (28% saving, Spot vs. On-Demand) |

* In Elastic throughput mode, throughput scales automatically and you pay only for what you use (to be added to the TCO).
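The totals in the four examples can be reproduced from the per-resource figures. The sketch below is a worked check, not an official pricing tool: the per-node costs are derived from the 2-node UI row ($1,748 Spot and $5,830 On-Demand for two t3.2xlarge nodes), and the names `TIERS` and `tier_cost` are hypothetical. The published saving percentages are consistent with rounding the computed saving up to the next whole percent.

```python
import math

# Annual costs in USD taken from the tables above.
MASTER, ELB = 728, 246
SPOT_NODE = 1748 / 2        # one t3.2xlarge on Spot
ON_DEMAND_NODE = 5830 / 2   # one t3.2xlarge On-Demand

# tier -> (UI nodes, processor nodes, annual EFS cost)
TIERS = {
    "50 GB/day": (2, 2, 2160),
    "100 GB/day": (2, 3, 4320),
    "500 GB/day": (2, 5, 21600),
    "1,000 GB/day": (2, 8, 43200),
}


def tier_cost(ui_nodes, processor_nodes, efs):
    """Return (Spot-based total, On-Demand total, saving as a fraction)."""
    nodes = ui_nodes + processor_nodes
    fixed = MASTER + ELB + efs  # identical in both scenarios
    spot_total = nodes * SPOT_NODE + fixed
    on_demand_total = nodes * ON_DEMAND_NODE + fixed
    return spot_total, on_demand_total, 1 - spot_total / on_demand_total


if __name__ == "__main__":
    for name, spec in TIERS.items():
        spot, on_demand, saving = tier_cost(*spec)
        print(f"{name}: ${spot:,.0f} / ${on_demand:,.0f} "
              f"({math.ceil(saving * 100)}% saving)")
```

Because the EFS cost is the same in both scenarios and grows with data volume, it dominates the larger tiers, which is why the relative saving falls from 56% to 28% as daily volume increases.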