Friday, March 15, 2013

Migration from Apache Solr to Amazon CloudSearch

Apache Solr is a popular, fast open source enterprise search platform from the Apache Lucene project. It is one of the most widely used search platforms on Amazon Web Services, as well as among many product companies and SaaS platforms. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document handling, and geospatial search. The recent Solr 4.x release introduced SolrCloud, which is designed to be highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Apache Solr already powers the search and navigation features of many of the world's largest internet sites, and with this new release the project has put its competitors on notice.

To know more about Apache SolrCloud deployment best practices on Amazon VPC, refer to this article:

Amazon Web Services introduced a search service called Amazon CloudSearch some time back. It is based on their robust A9 search platform. Like Solr, Amazon CloudSearch offers a rich set of search features. In addition, CloudSearch comes with automatic scaling, sharding and availability built in. These infrastructure features tempt many architects and product managers to port their applications from Solr to CloudSearch and reap the benefits.

To know more about how Apache Solr compares with Amazon CloudSearch, refer to this article:

In this article, let us explore the migration from Apache Solr to Amazon CloudSearch.

Know your show stoppers before migration:

Customers who are developing a product with both cloud and on-premise editions should evaluate this migration carefully. Amazon CloudSearch is a cloud-only service and cannot be moved on premise. If a VPN or Direct Connect link can be established between the customer data center and Amazon Web Services, this approach has some viability. If the customer's use case does not allow such connectivity, then Apache Solr on Amazon EC2 will be a better fit for their product.

Customers who are concerned about cloud portability or cloud lock-in should evaluate their approach carefully before taking up this migration. CloudSearch is an Amazon Web Services dependent service. (Note: When Java was launched in the 90s, customers were worried about the same lock-in issues when architecting applications on this new technology. But a decade later, Java is the market leader. I feel the same vibe while working on AWS.)

Customers who want full control of their search environment (logs, security, disk encryption, etc.) should understand that CloudSearch is a web service and does not provide that level of control at this point in time. For such deep security needs, Apache Solr on Amazon EC2 inside a VPC, combined with the Trend Micro/SafeNet suite or other security products available on the AWS Marketplace, will help you assemble a robust architecture on the Amazon cloud.

The bottom line is that, for your search module needs, Amazon IaaS provides the best possible flexibility, whether through CloudSearch, Apache Solr or AWS Marketplace products.

To know more about how to configure Apache Solr on Amazon VPC, refer to this article:

Compelling reasons why customers move to Amazon CloudSearch

"Amazon CloudSearch is a fully-managed service. When you migrate to Amazon CloudSearch, you no longer have to worry about provisioning and managing your own search fleet and scaling the fleet as your volume of data and traffic fluctuate. Amazon CloudSearch handles all of this for you behind the scenes. You don't need to make allowances for sharding in your application code; your index is automatically moved to a larger instance type or partitioned across multiple instances as needed for optimal performance." - from AWS documentation

Economics behind the Migration

Some time back I did a cost comparison analysis between Amazon CloudSearch and Apache Solr (v3.6) on AWS. I found Amazon CloudSearch to be cost effective in comparison to Apache Solr for the use cases detailed there.
Refer to this URL:

As mentioned in my other blog posts, Solr 4.x has solved some of the infrastructure problems, and I feel it will reduce the overall cost further compared to Solr v3.6. I have not yet done a detailed cost comparison between Solr 4.x and CloudSearch, but I still feel Amazon CloudSearch will edge out Solr on costs.

Migrating Your Application

Migrating your application from Solr to Amazon CloudSearch is a relatively straightforward process. You need to:

  • Create an Amazon CloudSearch domain. A search domain encapsulates your searchable data and the search instances that handle your search requests.
  • Map your Solr schema to Amazon CloudSearch index fields. Once you create your search domain, you define index fields to configure Amazon CloudSearch to handle your data the same way Solr did.
  • Implement your boosting and sorting preferences using rank expressions. Rank expressions are reusable JavaScript-style expressions that you can define and use to customize how your results are ranked.
  • Submit your data using the Amazon CloudSearch Search Data Format (SDF). You need to adapt your application to submit data to Amazon CloudSearch instead of Solr. Data can be submitted to Amazon CloudSearch in either JSON or XML.
  • Convert your Solr queries to the Amazon CloudSearch search syntax. Once you've uploaded your data, you're ready to start submitting search requests to your domain's search endpoint using the Amazon CloudSearch search syntax.
  • Display the results from Amazon CloudSearch. The response format is very similar to Solr, but note that by default Amazon CloudSearch returns responses in JSON. To get responses in XML, you need to specify the response format in each search request.
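To make the data submission step a little more concrete, here is a minimal Python sketch that reshapes Solr-style documents (plain dictionaries) into a JSON add batch in the spirit of the Search Data Format. The record layout used here ("type", "id", "version", "lang", "fields") is my assumption based on the SDF examples in the AWS documentation of this period, and lowercasing the document id is likewise an assumption about the id rules. The function name, the sample documents and their field names are purely hypothetical, so please verify everything against the official SDF reference before relying on it.

```python
import json


def solr_docs_to_sdf(docs, id_field="id", version=1, lang="en"):
    """Reshape Solr-style documents into an SDF-style list of "add" operations.

    Assumption: each SDF record carries type, id, version, lang and fields.
    Check the exact required keys against the AWS SDF documentation.
    """
    batch = []
    for doc in docs:
        # Everything except the unique key goes into the "fields" object.
        fields = {name: value for name, value in doc.items() if name != id_field}
        batch.append({
            "type": "add",
            # Assumption: document ids should be lowercase strings.
            "id": str(doc[id_field]).lower(),
            "version": version,
            "lang": lang,
            "fields": fields,
        })
    return batch


# Hypothetical documents, shaped the way they might be sent to Solr.
solr_docs = [
    {"id": "BOOK1", "title": "An Example Search Book", "genre": ["search", "cloud"]},
    {"id": "BOOK2", "title": "Another Example Title", "genre": ["search"]},
]

sdf_batch = solr_docs_to_sdf(solr_docs)
print(json.dumps(sdf_batch, indent=2))
```

The resulting JSON is what your application would post to the document endpoint of your search domain in place of the Solr update request. Multi-valued Solr fields are simply passed through as JSON arrays in this sketch.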

Amazon Web Services has published an excellent article detailing the migration process. Refer to this URL:

