Tuesday, January 15, 2013

Part 5: Comparison Analysis: Amazon CloudSearch vs Apache Solr


I have summarized all the features compared in the previous articles into a single table for easy reference.
Legend: * means positive, X means negative.
Weight: High/Medium/Low indicates the importance of a feature (from my perspective).


No.  Feature                     Weight   Amazon CloudSearch   Apache Solr on EC2
---  --------------------------  -------  -------------------  ------------------
1    Getting Started             High     *                    X
2    Scalability                 High     *                    X
3    Partitioning                High     *                    X
4    Index Replication           High     *                    X
5    High Availability           High     *                    X
6    Cost                        High     *                    X
7    Faceted Search              High     *                    *
8    Field Weighting/Boosting    High     *                    *
9    Rich Documents Support      High     *                    *
10   Stemming                    High     *                    *
11   Stop words                  High     *                    *
12   Synonyms                    High     *                    *
13   Protocols Support           High     *                    *
14   “Find Similar” Feature      High     X                    *
15   “Did you mean” Feature      High     X                    *
16   Breed                       Medium   *                    *
17   Feature Customization       Medium   X                    *
18   Auto Suggest                Medium   X                    *
19   Geo Spatial Search          Medium   X                    *
20   Algorithms                  Low      X                    *
21   Multilingual Support        Low      X                    *
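To make the totals easier to see, the marks in the table can be tallied per weight. The snippet below is only a rough sketch: the rows are copied from the table, but the numeric points per weight (High = 3, Medium = 2, Low = 1) are my own assumption for illustration and are not part of the original comparison.

```python
from collections import Counter

# Rows copied from the comparison table:
# (feature, weight, Amazon CloudSearch mark, Apache Solr on EC2 mark)
# "*" means positive, "X" means negative.
TABLE = [
    ("Getting Started", "High", "*", "X"),
    ("Scalability", "High", "*", "X"),
    ("Partitioning", "High", "*", "X"),
    ("Index Replication", "High", "*", "X"),
    ("High Availability", "High", "*", "X"),
    ("Cost", "High", "*", "X"),
    ("Faceted Search", "High", "*", "*"),
    ("Field Weighting/Boosting", "High", "*", "*"),
    ("Rich Documents Support", "High", "*", "*"),
    ("Stemming", "High", "*", "*"),
    ("Stop words", "High", "*", "*"),
    ("Synonyms", "High", "*", "*"),
    ("Protocols Support", "High", "*", "*"),
    ("Find Similar", "High", "X", "*"),
    ("Did you mean", "High", "X", "*"),
    ("Breed", "Medium", "*", "*"),
    ("Feature Customization", "Medium", "X", "*"),
    ("Auto Suggest", "Medium", "X", "*"),
    ("Geo Spatial Search", "Medium", "X", "*"),
    ("Algorithms", "Low", "X", "*"),
    ("Multilingual Support", "Low", "X", "*"),
]

# Hypothetical points per weight; an assumption made only for this sketch.
WEIGHT_POINTS = {"High": 3, "Medium": 2, "Low": 1}


def tally(rows):
    """Count the positive marks per weight for each product."""
    counts = {"Amazon CloudSearch": Counter(), "Apache Solr on EC2": Counter()}
    for _feature, weight, cloudsearch, solr in rows:
        if cloudsearch == "*":
            counts["Amazon CloudSearch"][weight] += 1
        if solr == "*":
            counts["Apache Solr on EC2"][weight] += 1
    return counts


def weighted_score(per_weight):
    """Sum the assumed points over all positive marks of one product."""
    return sum(WEIGHT_POINTS[w] * n for w, n in per_weight.items())


if __name__ == "__main__":
    for product, per_weight in tally(TABLE).items():
        print(product, dict(per_weight), "weighted:", weighted_score(per_weight))
```

With the table as it stands, Amazon CloudSearch gets 13 of the 15 High features and 1 Medium feature, while Apache Solr on EC2 gets 9 High, 4 Medium and 2 Low features. Under the assumed points that is 41 against 37, so the result is sensitive to how much you value the High rows.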


Observations:
  • Amazon CloudSearch scores well on most of the “High” priority features in comparison with Apache Solr, especially on infrastructure related features like scaling, partitioning etc. These infra features are essential for any online application which depends heavily on the search tier. Activities like scaling, partitioning and replication usually involve complex manual effort, planning and execution in the search tier. Amazon CloudSearch removes this complexity for us by automating these essentials.
  • The manual effort involved in the above mentioned search infra activities translates directly into the cost of training, managing and maintaining this tier with the help of experts, and these experts are usually costly. Amazon CloudSearch with its automation brings down this manual effort (and thereby the cost) significantly in comparison to expanding Apache Solr setups on EC2. This is an important aspect to consider when selecting a search tier for your online applications. If your online application is constantly growing in terms of index size and compute, then Amazon CloudSearch is the way to go compared to Apache Solr.
  • Amazon CloudSearch is a mature, robust and stable search service built on the A9 search platform. For most online use cases like ecommerce, job search, document search, content search etc. it is more than sufficient.
  • IT teams of startups and mid-sized companies which are usually short of technical staff (especially those who cannot afford dedicated expertise for the search tier) should first check whether Amazon CloudSearch fits their needs. On the whole it will be a better package for them.
  • Enterprises & software vendors who are refining their products for AWS should surely weigh the merits of Amazon CloudSearch against Apache Solr/MongoDB in their technical stack. In addition, if their deployments have unpredictable or elastic load, Amazon CloudSearch will surely be a top contender for cost savings.
  • Features like “Find similar” and “Did you mean” are generally used in the search modules of jobs and ecommerce applications. They are available in Apache Solr and would surely be good to have in Amazon CloudSearch. Though they are currently not available, I assume AWS might work on them if lots of customers request them. (+1 vote from me for these features)
  • If you are looking to build a specialized search module with customizations, geo spatial and multilingual intelligence, currently the best choice is to use Apache Solr on Amazon EC2. Location aware applications and localized applications can easily use the geo spatial and multilingual features of Apache Solr on EC2 (missing in Amazon CloudSearch). I have also noticed a pattern on AWS over the last few years where customers use MongoDB for searching documents / geo spatial indexes. Though these features are a little specific, Amazon CloudSearch should surely introduce them for wider use case adoption. (+1 vote from me for these features)
  • For open source developers who are looking to extend/customize the functionality of the search tier, Amazon CloudSearch is not recommended and Apache Solr is the best fit.
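For readers who have not used the Solr features mentioned above, here is a rough sketch of what “Find similar”, “Did you mean” and geo spatial requests can look like against Apache Solr. Everything specific in it is an assumption for illustration: the host, the core name "products", the field names "title", "description" and "location", and a solrconfig.xml in which the MoreLikeThis handler is mapped to /mlt and the spellcheck component is attached to /select. The code only builds the request URLs; it does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical Solr core; adjust host, port and core name to your setup.
SOLR_BASE = "http://localhost:8983/solr/products"


def find_similar_url(doc_id, fields=("title", "description"), rows=5):
    """'Find similar': MoreLikeThis request for documents like doc_id.

    Assumes a MoreLikeThis handler is registered at /mlt.
    """
    params = {
        "q": "id:%s" % doc_id,
        "mlt.fl": ",".join(fields),  # fields used to measure similarity
        "mlt.mintf": 1,              # minimum term frequency in the source doc
        "mlt.mindf": 1,              # minimum document frequency in the index
        "rows": rows,
        "wt": "json",
    }
    return SOLR_BASE + "/mlt?" + urlencode(params)


def did_you_mean_url(text):
    """'Did you mean': ask the spellcheck component for a corrected query.

    Assumes the spellcheck component is attached to the /select handler.
    """
    params = {
        "q": text,
        "spellcheck": "true",
        "spellcheck.collate": "true",  # return one rewritten query string
        "wt": "json",
    }
    return SOLR_BASE + "/select?" + urlencode(params)


def nearby_url(text, lat, lon, km, field="location"):
    """Geo spatial: keep only documents within km of the point (lat, lon)."""
    geofilt = "{!geofilt sfield=%s pt=%s,%s d=%s}" % (field, lat, lon, km)
    params = {"q": text, "fq": geofilt, "wt": "json"}
    return SOLR_BASE + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(find_similar_url(42))
    print(did_you_mean_url("ipone"))
    print(nearby_url("coffee", 45.15, -93.85, 5))
```

The exact handler paths and parameters that work depend on your Solr version and configuration, so treat these URLs as a starting point rather than a drop-in recipe.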



3 comments:

Lewis said...

Two questions
1) Cost - Apache Solr is free - why do you have an X for it?

2) This post is titled, "Part 5" - I cannot see a "Part 4" in the list - is there one?

Thanks

Anonymous said...

Getting Started, Partitioning and Index Replication are supported in Apache Solr. Please update your post, as it is misleading, or clarify.

Venkata Kolla said...

I checked the latest feature set of Amazon CloudSearch and some of the features which were not available when this blog was posted are added now.

Here is the list

Autocomplete suggestions
Customizable relevance ranking and query-time rank expressions
Field weighting
Geospatial search
Highlighting
Support for 34 languages
