Monday, January 20, 2014

Architecting Highly Available ElastiCache Redis replication cluster in AWS VPC

In this post, let's explore how to architect and create a highly available and scalable Redis cache cluster for your web application in an AWS VPC. The ElastiCache Redis cluster is assembled in the following architecture:

  • Redis Cache Cluster inside Amazon VPC for better control and security
  • Master Redis Node 1 will be created in AZ-1 of US-West
  • Redis Read Replica Node 2 will be created in AZ-2 of US-West
  • Redis Read Replica Node 3 will be created in AZ-3 of US-West



You can place all 3 Redis nodes in different Availability Zones to achieve high availability, or you can place the master and read replica 1 in AZ-1 and read replica 2 in AZ-2. The second layout reduces inter-AZ latency and might give better performance for heavily used clusters.
Step 1: Creating Cache Subnet groups:
To create a Cache Subnet Group, navigate to the ElastiCache dashboard, select Cache Subnet Groups, and then click "Create Cache Subnet Group". Add the subnet ID and Availability Zone of each subnet you want to use for the ElastiCache cluster.
We have created an Amazon VPC spread across 3 Availability Zones. In this post we are going to place the Redis master and 2 Redis read replicas in these 3 Availability Zones. Since Redis will mostly be accessed by your application tier, it is better to place the nodes in a private subnet of your VPC.
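The same step can also be scripted instead of clicked through. Here is a minimal sketch using boto3, assuming AWS credentials are already configured. The group name and subnet IDs are hypothetical placeholders, not values from this setup.

```python
def subnet_group_params(name, description, subnet_ids):
    """Build keyword arguments for ElastiCache create_cache_subnet_group."""
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    return {
        "CacheSubnetGroupName": name,
        "CacheSubnetGroupDescription": description,
        "SubnetIds": list(subnet_ids),
    }


def create_subnet_group(params, region="us-west-2"):
    """Send the request. Needs boto3 installed and AWS credentials."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("elasticache", region_name=region)
    return client.create_cache_subnet_group(**params)


# Hypothetical private subnet IDs, one per Availability Zone.
params = subnet_group_params(
    "redis-subnet-group",
    "Private subnets for Redis in three AZs",
    ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
)
```

Calling create_subnet_group(params) would submit the request. Building the parameters separately makes it easy to review them before anything is created.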
Step 2: Creating Redis Cache Cluster: 
To create a Cache Cluster, navigate to the ElastiCache dashboard, select Launch Cache Cluster, and provide the necessary details. We are launching it inside the Amazon VPC, so we have to select the Cache Subnet Group.
Note: You must create the Cache Subnet Group before launch if you need the ElastiCache Redis cluster inside an Amazon VPC.
For test purposes I have used an m1.small instance for the Redis node. Since this is a fresh Redis installation, I have not specified an S3 bucket containing a persistent Redis snapshot to seed the cluster. On successful creation of the Cache Cluster you can see its details in the dashboard.
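For reference, the launch can be sketched with boto3 as well. The cluster ID "redisinsidevpc" and the us-west-2a placement come from this post. The subnet group name is a hypothetical placeholder, and the node type string "cache.m1.small" is my assumption for how the m1.small size is named in the API (it may no longer be offered).

```python
def redis_cluster_params(cluster_id, subnet_group, az, node_type="cache.m1.small"):
    """Build keyword arguments for ElastiCache create_cache_cluster."""
    return {
        "CacheClusterId": cluster_id,
        "Engine": "redis",
        "CacheNodeType": node_type,
        "NumCacheNodes": 1,  # a Redis cache cluster created this way has one node
        "CacheSubnetGroupName": subnet_group,  # places the node inside the VPC
        "PreferredAvailabilityZone": az,
        "Port": 6379,
    }


def launch_cluster(params, region="us-west-2"):
    """Send the request. Needs boto3 installed and AWS credentials."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("elasticache", region_name=region)
    return client.create_cache_cluster(**params)


# "redis-subnet-group" is a placeholder for the group created in Step 1.
params = redis_cluster_params("redisinsidevpc", "redis-subnet-group", "us-west-2a")
```

No snapshot parameter is passed here, which matches the fresh installation described above.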
Step 3: Replication Group Creation:
To create a Replication Group, select Replication Groups from the dashboard and then select "Create Replication Group".

Select the master Redis node "redisinsidevpc" created previously as the primary cluster ID. Provide the Replication Group ID and description as illustrated below.
Note: The Replication Group should be created only after the primary Cache Cluster node is up and running; otherwise you will get the error shown below.
On successful creation of the Replication Group you can see the following details. Observe in the screenshot below that there is only one primary node, in us-west-2a, and no Redis read replicas are attached to it.
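If you script this step, the ordering constraint in the note can be handled by waiting for the primary before creating the group. The following is a rough sketch with boto3. The group ID "redis-replication" is inferred from the endpoint name used later in this post, and the description text is made up.

```python
def replication_group_params(group_id, description, primary_cluster_id):
    """Build keyword arguments for ElastiCache create_replication_group."""
    return {
        "ReplicationGroupId": group_id,
        "ReplicationGroupDescription": description,
        "PrimaryClusterId": primary_cluster_id,
    }


def create_replication_group(params, region="us-west-2"):
    """Wait for the primary, then create the group. Needs boto3 and credentials."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("elasticache", region_name=region)
    # Block until the primary cache cluster reports the "available" status,
    # since creating the group earlier fails.
    waiter = client.get_waiter("cache_cluster_available")
    waiter.wait(CacheClusterId=params["PrimaryClusterId"])
    return client.create_replication_group(**params)


params = replication_group_params(
    "redis-replication",          # inferred from the endpoint name, an assumption
    "Redis replication group inside the VPC",
    "redisinsidevpc",
)
```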

Step 4: Adding Read Replica Nodes:
When you select the Replication Group, you can see the option to add a Redis read replica. We are adding 2 Redis read replicas named Redis-RR1 (in us-west-2b) and Redis-RR2 (in us-west-2c). Both read replicas point to the master node "redisinsidevpc". Currently we can add up to 5 read replica nodes to a Redis master node. This is more than enough to handle thousands of messages per second. If you combine it with Redis pipelining, handling 100K messages per second from a node is a cakewalk.
Adding Read Replica 1 in us-west-2b
Adding Read Replica 2 in us-west-2c
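Adding the replicas can be sketched in boto3 too. As far as I know, passing ReplicationGroupId to create_cache_cluster adds the new node to that group as a read replica. The replica IDs are lowercased versions of the names above, the group ID "redis-replication" is an assumption, and the limit of 5 simply mirrors the figure quoted in this post.

```python
MAX_READ_REPLICAS = 5  # limit quoted in this post; check the current service limit


def read_replica_params(replica_id, replication_group_id, az):
    """Build keyword arguments for create_cache_cluster to add a read replica."""
    return {
        "CacheClusterId": replica_id,
        "ReplicationGroupId": replication_group_id,  # joins the group as a replica
        "PreferredAvailabilityZone": az,
    }


def plan_replicas(replication_group_id, placements):
    """Turn (replica_id, az) pairs into one request per replica."""
    if len(placements) > MAX_READ_REPLICAS:
        raise ValueError("too many read replicas for one master")
    return [
        read_replica_params(replica_id, replication_group_id, az)
        for replica_id, az in placements
    ]


def add_replicas(plan, region="us-west-2"):
    """Send the requests. Needs boto3 installed and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("elasticache", region_name=region)
    return [client.create_cache_cluster(**params) for params in plan]


plan = plan_replicas(
    "redis-replication",
    [("redis-rr1", "us-west-2b"), ("redis-rr2", "us-west-2c")],
)
```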

On successful creation you can see the following details of the Replication Group in the dashboard. There are now 3 Redis nodes listed, with the number of read replicas shown as 2. Placing the read replicas and the master node in multiple AZs increases availability and protects you from node-level and AZ-level failures. In our sample tests, inter-AZ replication deployments had under 2 seconds of replication lag for massive writes on the master, and deployments with master and replica inside the same AZ had under 1 second. We pumped messages at 100K per second for a few minutes into an m1.large Redis instance cluster.
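If you want to get a rough feel for replication lag in your own deployment, one simple approach is to write a marker key on the master and poll a replica until it shows up. The sketch below is not the method used for the numbers above, just an illustration. It accepts any client objects with redis-py style set and get methods, and ships with an in-memory stand-in so it runs without a Redis server.

```python
import time
import uuid


def measure_replication_lag(master, replica, timeout=5.0, poll_interval=0.01):
    """Return seconds until a fresh write on master is readable on replica.

    Returns None if the value does not appear within `timeout` seconds.
    """
    key = "lag-probe:" + uuid.uuid4().hex
    value = uuid.uuid4().hex
    start = time.monotonic()
    master.set(key, value)
    while time.monotonic() - start < timeout:
        got = replica.get(key)
        if isinstance(got, bytes):  # redis-py returns bytes by default
            got = got.decode()
        if got == value:
            return time.monotonic() - start
        time.sleep(poll_interval)
    return None


class FakeNode:
    """In-memory stand-in for a Redis client, used only to exercise the sketch."""

    def __init__(self, store, delayed_reads=0):
        self.store = store
        self.delayed_reads = delayed_reads  # reads that miss before data "arrives"

    def set(self, key, value):
        self.store[key] = value

    def get(self, key):
        if self.delayed_reads > 0:
            self.delayed_reads -= 1
            return None
        return self.store.get(key)


shared = {}
lag = measure_replication_lag(
    FakeNode(shared), FakeNode(shared, delayed_reads=3), timeout=1.0, poll_interval=0.001
)
```

The measured value includes the write round trip and the polling interval, so treat it as an upper estimate rather than an exact figure.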
If you need additional read scalability, I recommend adding more read replicas to the master.
In your application tier, use the primary endpoint "redis-replication.qcdze2.0001.usw2.cache.amazonaws.com:6379" shown below to connect to Redis.
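On the application side, connecting comes down to splitting the endpoint into host and port and handing them to a Redis client. Here is a small sketch, assuming the redis-py library is installed. The endpoint string in it is a placeholder, so paste the primary endpoint from your own dashboard.

```python
def parse_endpoint(endpoint):
    """Split a "host:port" endpoint string into (host, port)."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host:
        raise ValueError("expected an endpoint of the form host:port")
    return host, int(port)


def connect(endpoint):
    """Return a redis-py client for the endpoint. Needs the redis package."""
    import redis  # imported lazily so parse_endpoint works without it

    host, port = parse_endpoint(endpoint)
    return redis.Redis(host=host, port=port)


# Placeholder endpoint, replace it with the primary endpoint of your group.
PRIMARY_ENDPOINT = "primary.example.internal:6379"
host, port = parse_endpoint(PRIMARY_ENDPOINT)
```

Writes should go to the primary endpoint. Reads that can tolerate some lag could be sent to clients created from the read replica endpoints in the same way.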
If you need to delete, reboot, or modify nodes, you can do so through the options available here.
Step 5: Promoting the Read replica:
You can also promote any read replica to primary using the Promote/Demote option. There will be only one primary node at any time.
Note: This step is not part of the cluster creation process.



Carry out this promotion with caution and a proper understanding of its effect on data consistency.
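For completeness, a promotion can also be requested through the API. This is a minimal sketch using boto3's modify_replication_group, where PrimaryClusterId names the replica to promote. The group ID "redis-replication" and replica ID "redis-rr1" are assumptions based on the names used in this post.

```python
def promotion_params(replication_group_id, new_primary_cluster_id, apply_immediately=True):
    """Build keyword arguments for ElastiCache modify_replication_group."""
    return {
        "ReplicationGroupId": replication_group_id,
        "PrimaryClusterId": new_primary_cluster_id,  # replica to promote
        "ApplyImmediately": apply_immediately,
    }


def promote(params, region="us-west-2"):
    """Send the request. Needs boto3 installed and AWS credentials."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("elasticache", region_name=region)
    return client.modify_replication_group(**params)


params = promotion_params("redis-replication", "redis-rr1")
```

Because replication is asynchronous, writes that had not yet reached the promoted replica can be lost, which is why the caution above applies to scripted promotions as well.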

This post was co-authored with Senthil, 8KMiles.

Other related posts:

Billion Messages - Art of Architecting scalable ElastiCache Redis tier

2 comments:

Redsmin {Redis GUI} said...

Nice article! You will be featured in our next RedisWeekly (http://redisweekly.com) !

Itamar Haber said...

This setup is certainly scalable in terms of read requests, but I'm not convinced regarding its (high) availability. AFAIK, the service doesn't feature an auto-failover mechanism.
