AWS Spot Instances based HA Cluster Installation (multi-machines)


XPLG runs as a stateful application, so most of the cluster resources (specifically the processing units, which are typically the most expensive) can be based on spot instances.
The guide below describes how to deploy a spot-instance-based cluster on AWS. Before you start, review the architecture and determine the required number of machines based on the daily digestion capacity.

 

Creating Spot Instances (UI and Processor nodes)

 

  1. In the AWS console, go to EC2 tab

  2. Click on

  3. Provide a name for your instance and choose relevant OS and OS Architecture

 

4. Choose the instance type according to the XPLG hardware requirements, which are based on the cluster capacity, and choose the SSH key that will be used to connect to the instance.

5. Choose your relevant network settings according to your architecture and topology:

 

6. Configure the instance disk. Make sure that ‘Delete on termination’ is set to No, so that the disk, and with it the spot instance state, is preserved:

7. Click on advanced settings:

8. Click on Request Spot Instances, then click on ‘Customize’ and select:

●     Request type: Persistent

●     Interruption behavior: Stop

●     Shutdown behavior: Stop

9. Click on ‘Launch Instance’


10. Your spot instance is ready. Its internal IP and attached disk remain the same after the instance is stopped and started.

11. Create as many spot instances as you need (UI nodes, Processor nodes).

12. Connect to each instance and install the XPLG node with the relevant role (UI or Processor).

13. After installation, attach the EFS storage and configure the node in Cluster Mode with the share path, as detailed below.
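For teams that prefer to script the console steps above, the same spot settings can be expressed as launch parameters. The following is a minimal sketch, assuming the boto3 EC2 `run_instances` call is used to send the request. The function name, AMI ID, instance type, key name, device name, and disk size are placeholder assumptions, not XPLG recommendations, and network settings are left out.

```python
def build_spot_launch_params(ami_id, instance_type, key_name, name,
                             device_name="/dev/xvda", disk_gib=100):
    """Return keyword arguments for boto3's EC2 run_instances call.

    Every value is supplied by the caller; the device name and disk size
    defaults are assumptions made only for this illustration.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
        # Shutdown behavior: Stop
        "InstanceInitiatedShutdownBehavior": "stop",
        # 'Delete on termination' set to No, so the disk is kept
        "BlockDeviceMappings": [{
            "DeviceName": device_name,
            "Ebs": {"VolumeSize": disk_gib, "DeleteOnTermination": False},
        }],
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "persistent",        # Request type: Persistent
                "InstanceInterruptionBehavior": "stop",  # Interruption behavior: Stop
            },
        },
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": name}],
        }],
    }


params = build_spot_launch_params(
    ami_id="ami-EXAMPLE", instance_type="m5.2xlarge",
    key_name="my-key", name="xplg-processor-1")

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("ec2").run_instances(**params)
```

Building the parameters separately from the API call makes it easy to review the spot settings before anything is launched.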

 

Creating a Reserved MASTER node

The only reserved machine should be the MASTER node. The MASTER orchestrates the cluster and identifies which processors are available for processing jobs.
During a processor downtime/replacement the MASTER will allocate jobs to the other available nodes to ensure continuity.

The MASTER node does not perform any processing, so a small machine (4 CPUs, 8GB RAM) is sufficient and keeps costs down, since it is a reserved instance.

  1. In the AWS console, go to EC2 tab.

  2. Create an instance.

  3. Connect to the instance and install the XPLG node with the MASTER role.

  4. After installation attach the EFS storage and configure the node in Cluster Mode with share path as detailed below.

Create EFS for the cluster

 

  1. In the AWS console go to Amazon EFS and click on

  2. Provide a name, VPC and storage class as required for your company:

 

3. Click on ‘Customize’ and go to Performance settings.

4. Choose ‘Enhanced’ and then ‘Elastic’. With Elastic mode, your throughput scales automatically and you pay only for what you use.

5. Click on Next and Create the EFS Storage

6. Inside newly created EFS storage click on

7. Mount the EFS to your Spot Instances with one of the following commands:

 

8. Important: Make sure your fstab is updated accordingly so that the mount persists across reboots:

a. Open the /etc/fstab file in an editor.

b. To automatically mount the file system using NFS instead of the EFS mount helper, add the following line to the /etc/fstab file:

file_system_id.efs.aws-region.amazonaws.com:/ mount_point nfs4 nfsvers=4.0,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,_netdev 0 0

c. In that line:

●     Replace file_system_id with the ID of the file system you are mounting.

●     Replace aws-region with the AWS Region that the file system is in, such as us-east-1.

●     Replace mount_point with the file system's mount point.
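When many nodes have to be configured, the fstab entry above can be generated instead of typed by hand. The following is a minimal sketch that fills in the three placeholders; the function names, the file system ID, and the mount point are hypothetical. The mount command it builds simply reuses the NFS options from the fstab line (without `_netdev`) and is an assumption, not necessarily the command shown in the EFS console.

```python
# NFS options copied from the fstab line in this guide (without _netdev).
NFS_OPTIONS = ("nfsvers=4.0,rsize=1048576,wsize=1048576,"
               "hard,timeo=600,retrans=2,noresvport")


def efs_dns_name(file_system_id, aws_region):
    """DNS name of the file system, e.g. fs-....efs.us-east-1.amazonaws.com."""
    return f"{file_system_id}.efs.{aws_region}.amazonaws.com"


def build_fstab_line(file_system_id, aws_region, mount_point):
    """Return the /etc/fstab entry for mounting the EFS over NFS."""
    return (f"{efs_dns_name(file_system_id, aws_region)}:/ {mount_point} "
            f"nfs4 {NFS_OPTIONS},_netdev 0 0")


def build_mount_command(file_system_id, aws_region, mount_point):
    """Return a one-off mount command as an argument list (not executed here)."""
    return ["sudo", "mount", "-t", "nfs4", "-o", NFS_OPTIONS,
            f"{efs_dns_name(file_system_id, aws_region)}:/", mount_point]


# Hypothetical values for illustration only.
fstab_line = build_fstab_line("fs-0123456789abcdef0", "us-east-1", "/mnt/xplg-shared")
mount_cmd = build_mount_command("fs-0123456789abcdef0", "us-east-1", "/mnt/xplg-shared")
print(fstab_line)
print(" ".join(mount_cmd))
```

The script only prints the line and the command; appending to /etc/fstab and running the mount are left to the operator.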

 

Create ELB for UI Instances

1. In the AWS console, go to EC2 and, in the left navigation menu under Load Balancing, click on Load Balancers:

2. Click on

3. Choose Application Load balancer:

●     Provide a name for the LB

●     Choose ‘Internet Facing’ for WAN access or Internal for your LAN VPC access

●     Choose the VPC in which your spot instances were created

●     Attach your Spot Instances security group

●     Choose the port for the listener (this port will be used by external/internal clients). In this example, choose port 80:

●     Click on Create Target Group

●     In the new window select Target Type: Instances

●     Provide Target Group name (xplg-spots-ui)

●     Choose the port on which traffic will be sent to the group (30303) and leave the rest of the configuration at its defaults

●     Click on Next

●     Choose the relevant instances that will sit in the group:

●     Click Include as pending below

●     Click on

●     Return to your LB creation screen and choose the group you created:

  • Click on

4. Once your LB is created, check its DNS name:

 

5. Copy the URL and append the port you mapped for external use (for port 80 you do not need to add anything).

6. When you browse to the URL, you will be routed to one of the XPLG instances.
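The target group values and the URL rule from the steps above can be captured in a short script. This is a minimal sketch, assuming the target group would be created with the boto3 `elbv2` `create_target_group` call and that the UI is served over plain HTTP (the guide fixes the ports but not the protocol). The function names, the VPC ID, and the DNS name are hypothetical.

```python
def build_target_group_params(vpc_id, name="xplg-spots-ui", port=30303):
    """Keyword arguments for the elbv2 create_target_group call.

    Protocol "HTTP" is an assumption; the name and port come from this guide.
    """
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": port,
        "VpcId": vpc_id,
        "TargetType": "instance",  # Target Type: Instances
    }


def build_ui_url(lb_dns_name, listener_port=80):
    """Return the URL clients use; port 80 is the HTTP default and is omitted."""
    if listener_port == 80:
        return f"http://{lb_dns_name}"
    return f"http://{lb_dns_name}:{listener_port}"


# Hypothetical values for illustration only.
tg_params = build_target_group_params(vpc_id="vpc-EXAMPLE")
print(build_ui_url("xplg-lb-EXAMPLE.us-east-1.elb.amazonaws.com"))
print(build_ui_url("xplg-lb-EXAMPLE.us-east-1.elb.amazonaws.com", 8080))
```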

 

 

How spot instances work and when they might stop

The following are the possible reasons that Amazon EC2 might interrupt your Spot Instances:

Capacity

Amazon EC2 can interrupt your Spot Instance when it needs it back. EC2 reclaims your instance mainly to repurpose capacity, but it can also occur for other reasons such as host maintenance or hardware decommission.

Price

You can specify the maximum price in your Spot request. However, if you specify a maximum price, your instances will be interrupted more frequently than if you do not specify it.

Constraints

If your Spot request includes a constraint such as a launch group or an Availability Zone group, the Spot Instances are terminated as a group when the constraint can no longer be met.

Based on the above:

  1. A spot instance will be stopped and then restarted by AWS once resources are available again in the Region where you created it. Because spot instances are low cost, we recommend planning the cluster with 10%-15% “spare” resources to avoid interruptions while spots are being replaced.

  2. If needed, you can start the instances manually right after they are stopped, or use automation scripts, and in this way achieve near-zero downtime for the cluster as a whole.
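The 10%-15% spare recommendation can be turned into a node count with simple arithmetic. The following is a minimal sketch, assuming capacity is expressed as a number of equally sized processor nodes; the function name is hypothetical, and the result is rounded up so the spare margin is never smaller than requested.

```python
def nodes_with_spare(required_nodes, spare_percent=15):
    """Return the node count after adding spare capacity, rounded up.

    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    if required_nodes < 0 or spare_percent < 0:
        raise ValueError("required_nodes and spare_percent must be non-negative")
    # Ceiling division: -(-a // b) rounds a / b up for positive b.
    return -(-required_nodes * (100 + spare_percent) // 100)


for required, spare in [(20, 10), (10, 15), (8, 10)]:
    print(required, "nodes +", spare, "% spare ->", nodes_with_spare(required, spare))
```

For example, a cluster sized at 20 processor nodes would be planned with 22 nodes at 10% spare.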