Sunday, February 3, 2013

Part 4: Economics behind choosing Amazon CloudSearch vs Apache Solr



In this article we explore the economics of choosing Amazon CloudSearch vs Apache Solr (v3.6) on EC2 for the search tier. The cost comparison is based on three scenarios.

Scenario 1: A small application with constant load pattern

Application nature:
·         Load Volatility pattern: Constant
·         Utilization: Low to Medium
·         Dependency on Search Layer: Low
Data requirements:
·         Each document size is ~ 1 KB (For easy calculation purposes)
·         500 MB of Search Index data
·         50K-100K requests per day
·         Low concurrency
Batch and Index Rebuilds:
·         24 Batch uploads per day (each batch 100 documents of 1KB each).
·         Explicit Index Rebuild once/twice a month
·         50 MB increase in Search index data per month
Administration Efforts:
·         Initial Provisioning (one time)
·         Monitoring
·         Regular Backups
·         Index Rebuilds


Item                                     Amazon CloudSearch    Apache Solr v3.6
Compute                                  74.4                  48.36
Storage (EBS)                            -                     12
Batch upload                             0.10                  0
Index Rebuild                            4                     0
Data In/Out                              0                     0
Admin efforts (Person Hrs/Month @ 75)    2                     10
Administration cost                      150                   750
Total                                    ~230                  ~811

·         Amazon EC2 US-East region; costs in USD; 1 month of compute = 744 hrs; instance type: Small
·         EBS: 10 GB volume + 100 million I/O requests per month + 10 GB snapshot for Apache Solr on EC2
·         Pricing reference: http://aws.amazon.com/cloudsearch/
·         Search expert administration effort is priced at a minimum average of 75 USD/hr
·         The infrastructure costs are almost the same, but the provisioning/administration/management costs spike for Apache Solr on EC2; Amazon CloudSearch keeps them to a minimum.
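The Scenario 1 totals can be re-derived from the line items in the table. The sketch below is only an illustration: the function name and argument names are made up here, and the figures are copied from the table rather than looked up on the pricing page.

```python
# Line items copied from the Scenario 1 table (USD per month).
HOURLY_ADMIN_RATE = 75  # USD per person-hour, as stated in the notes


def monthly_total(compute, storage, batch_upload, index_rebuild,
                  data_transfer, admin_hours):
    """Return (infrastructure cost, admin cost, total) for one month."""
    infra = compute + storage + batch_upload + index_rebuild + data_transfer
    admin = admin_hours * HOURLY_ADMIN_RATE
    return infra, admin, infra + admin


# "-" entries in the table are treated as 0 here (an assumption).
cloudsearch = monthly_total(74.4, 0, 0.10, 4, 0, admin_hours=2)
solr = monthly_total(48.36, 12, 0, 0, 0, admin_hours=10)

for name, (infra, admin, total) in (("Amazon CloudSearch", cloudsearch),
                                    ("Apache Solr on EC2", solr)):
    print(f"{name}: infra={infra:.2f} admin={admin:.2f} total={total:.2f}")
```

With these inputs the infrastructure part comes to about 78.50 vs 60.36 USD, while the administration part is 150 vs 750 USD, which is where the gap in the totals comes from.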


Scenario 2: Heavily utilized Search tier
Application nature:
·         Load Volatility pattern: Peaks and valleys within a day
·         Utilization: High
·         Dependency on Search Layer: High
Data requirements:
·         Each document size is ~ 1 KB (For easy calculation purposes)
·         50 GB of Search data (Index)
·         10 million requests per day; each response is 10 KB, so ~100 GB of data out per day (~3 TB per month)
·         High concurrency
Batch and Index Rebuilds:
·         2048 Batch uploads per Month (each batch 5 MB of data).
·         Explicit Index Rebuild 12 times a month
·         Search Data growth: 10 GB index added every month
Administration Efforts:
·         Initial Provisioning
·         Partitioning and read scaling frequently
·         Monitoring, Regular Backups and maintaining the HA
·         Index Rebuilds
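The data-out and index-growth figures in the data requirements above follow from simple arithmetic. Here is a minimal sketch, assuming decimal units (1 GB = 1,000,000 KB) and a 30-day month; the constant and function names are illustrative only.

```python
REQUESTS_PER_DAY = 10_000_000
RESPONSE_KB = 10
DAYS_PER_MONTH = 30        # assumption behind the ~3 TB per month figure
KB_PER_GB = 1_000_000      # decimal units, an assumption

# Outbound data volume implied by the request rate and response size.
data_out_gb_per_day = REQUESTS_PER_DAY * RESPONSE_KB / KB_PER_GB
data_out_tb_per_month = data_out_gb_per_day * DAYS_PER_MONTH / 1000


def index_size_gb(months_elapsed, start_gb=50, growth_gb_per_month=10):
    """Index size after a number of months of linear growth."""
    return start_gb + growth_gb_per_month * months_elapsed


print(f"Data out per day:   {data_out_gb_per_day:.0f} GB")
print(f"Data out per month: {data_out_tb_per_month:.0f} TB")
for month in (0, 3, 6):
    print(f"Index after {month} months: {index_size_gb(month)} GB")
```

Under these assumptions the index grows from 50 GB to 110 GB over six months, which is the growth that any fixed-size Solr node has to absorb.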

Item                                     Amazon CloudSearch    Apache Solr v3.6
Compute                                  ~2865                 ~2440
Storage (EBS)                            -                     140
Batch upload                             0.30                  -
Index Rebuild                            50                    -
Data In/Out                              -                     -
Admin efforts (Person Hrs/Month @ 75)    8                     24
Administration cost                      600                   1800
Total                                    ~3515                 ~4380

·         Amazon EC2 US-East region; costs in USD; 1 month of compute = 744 hrs; instance types as listed below
·         7 XLarge search instances in Amazon CloudSearch, plus more instances depending on growth
·         EBS: 100 GB volume + 500 million I/O requests per month + X GB snapshot for Apache Solr on EC2
·         Apache Solr on EC2 costs:
o   If we scale up the Solr nodes in multiple phases from m1.xlarge to m2.4xlarge as the index grows (10 GB every month in our case), a lot of manual administration effort is needed. This adds to our administration cost.
o   If we decide not to scale up frequently and instead start Solr on m2.4xlarge from day one, we are over-provisioned for the first 5 months, which is cost leakage again. Monitoring, backups, etc. still have to be done on Apache Solr on EC2. The table above shows the cost calculated with this approach. At the end of the 6th month we still need to either shard Solr on EC2, because the index will exceed m2.4xlarge capacity, or scale up again to more costly EC2 instances to keep up with the growth. Either way, more labour effort has to be added.
o   Even after all this, there is no guarantee that Apache Solr on EC2 can handle the load. Since the load is spiky within a day, there could be times when Solr is pounded and performs poorly. This under-performance may lead to losing customers.
·         Amazon CloudSearch handles the scaling up/out and partitioning complexities automatically. Labour is one of the major costs in large-scale search tier setups, and Amazon CloudSearch helps us keep it to a minimum as we grow. The larger and more elastic our search requirements, the more easily Amazon CloudSearch beats Apache Solr on EC2.
·         Scale-out during heavy load is automatic in CloudSearch, whereas it is a cumbersome manual effort in Apache Solr on EC2. Note that load-based scale-out costs are not included for either Apache Solr or Amazon CloudSearch.
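Since the Scenario 2 comparison hinges on administration effort, one way to read the table is to ask how many Solr admin hours per month would make the two totals equal. The sketch below is a rough sensitivity check built only from the table's figures; the break-even calculation is an addition for illustration, not part of the original comparison.

```python
ADMIN_RATE = 75  # USD per person-hour, from the table

# Scenario 2 line items (USD per month), copied from the table.
cloudsearch_infra = 2865 + 0.30 + 50   # compute + batch upload + index rebuild
cloudsearch_admin_hours = 8
solr_infra = 2440 + 140                # compute + EBS storage
solr_admin_hours = 24

cloudsearch_total = cloudsearch_infra + cloudsearch_admin_hours * ADMIN_RATE
solr_total = solr_infra + solr_admin_hours * ADMIN_RATE

# Solr admin hours per month at which both monthly totals would be equal.
break_even_hours = (cloudsearch_total - solr_infra) / ADMIN_RATE

print(f"CloudSearch total: {cloudsearch_total:.2f}")
print(f"Solr total:        {solr_total:.2f}")
print(f"Break-even Solr admin hours/month: {break_even_hours:.1f}")
```

On these numbers Solr on EC2 has the lower infrastructure bill, and it would only come out cheaper overall if its administration effort stayed below roughly 12.5 hours a month, about half of the 24 hours estimated in the table.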

Scenario 3: Seasonal Loads 

Application nature:
·         Load Volatility pattern: Seasonal load (1 week campaign every 2 months), other times minimal activity
·         Utilization: High
·         Dependency on Search Layer: High
Data requirements:
·         Each document size is ~ 1 KB (For easy calculation purposes)
·         Getting started with 5 GB of Search data
·         ~750 million requests or more during the campaign week
·         12 hours heavy utilization and 12 hours under utilization during campaign days.
·         10 million requests during normal days
·         High concurrency during campaign week
Batch and Index Rebuilds:
·         512 Batch uploads per Month (each batch 5 MB of data).
·         Explicit Index Rebuild 12 times a month
·         Search Data growth: 2.5 GB added every month
Administration Efforts:
·         Initial Provisioning
·         Partitioning and read scaling frequently
·         Monitoring, Regular Backups and maintaining the HA
·         Index Rebuilds

Item                                     Amazon CloudSearch        Apache Solr v3.6
Compute                                  ~410 + 150 (scale out)    ~357 + 70 (scale out)
Storage (EBS)                            -                         50
Batch upload                             0.10                      -
Index Rebuild                            5                         -
Data In/Out                              -                         -
Admin efforts (Person Hrs/Month @ 75)    10                        40
Administration cost                      750                       3000
Total                                    ~1320                     ~3477

·         Amazon EC2 US-East region; costs in USD; 1 month of compute = 744 hrs; instance types as listed below
·         Amazon CloudSearch costs:
o   1 XLarge search instance in Amazon CloudSearch during normal days
o   Assume 3 additional XLarge instances are spawned during the 1-week campaign period
o   Automated scale-out and scale-down; no manual effort needed.
o   From morning till night, when utilization is heavy, the 3 additional XLarge instances are launched. From night till morning, when there is not much load, these additional instances are removed.
·         Apache Solr on EC2 costs:
o   2 x m1.large EC2 instances for Solr on normal days
o   2 additional m1.large instances during the campaign period
o   Manual effort to scale out before the campaign week and scale down after it.
o   The additional EC2 instances run all 24 hrs during the campaign week
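The difference between the two scale-out models during a campaign week can be expressed in extra instance-hours. This is a minimal sketch using only the instance counts and hours stated above; hourly prices are deliberately left out because the two setups use different instance types, and the variable names are illustrative.

```python
DAYS_IN_CAMPAIGN = 7

# CloudSearch: 3 extra XLarge search instances, running only during the
# 12 heavily utilized hours of each campaign day.
cloudsearch_extra_hours = 3 * 12 * DAYS_IN_CAMPAIGN

# Solr on EC2: 2 extra m1.large instances, kept running 24 hours a day
# for the whole campaign week.
solr_extra_hours = 2 * 24 * DAYS_IN_CAMPAIGN

# Hours the extra Solr instances spend running through the 12
# under-utilized hours of each campaign day.
solr_underused_hours = 2 * 12 * DAYS_IN_CAMPAIGN

print(f"CloudSearch extra instance-hours: {cloudsearch_extra_hours}")
print(f"Solr extra instance-hours:        {solr_extra_hours}")
print(f"Solr hours in low-load periods:   {solr_underused_hours}")
```

Half of the extra Solr instance-hours fall in the under-utilized part of the day, which is the capacity that automatic scale-down avoids paying for.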







2 comments:

mmoody said...

Solr 4.0 has been out for almost 6 months, it has significant feature and economic advantages over Solr 3.6, consider updating this (and other) related pages.

Harish Ganesan said...

Hi mmoody,

Thanks for the comment and for taking the time to read my blog. SolrCloud 4.0 has been GA since October 2012, and we have started implementing it for some of our customers as well. Yes, it reduces some of the complexity of adding shards, replicas, etc. Very soon I will publish a detailed post sharing our experience with SolrCloud + AWS.
