Sunday, December 30, 2012

Part 2: Comparison Analysis: Amazon CloudSearch vs Apache Solr

This post continues the comparison of Apache Solr and Amazon CloudSearch from Part 1.

“Did you mean…” feature:
Sometimes when you search for a misspelled word, the search engine suggests the correct spelling. Search engines like Google automatically correct the spelling and can even present the results for the corrected term. This feature of presenting the user with spelling-corrected suggestions is called the “Did you mean” feature.
Apache Solr supports this feature with the Spellcheck search component. The recommended approach is to build the word corpus from the index itself, principally because your data will contain proper nouns and other words not present in a general-purpose dictionary.
Amazon CloudSearch currently has no support for the “Did you mean…” feature.
Advantage: Apache Solr
Feature weight: High
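
To make the idea of building the word corpus from the index concrete, here is a minimal Python sketch. It is not how Solr's Spellcheck component is implemented; it only illustrates the principle using the standard library's difflib. The function names (build_vocabulary, did_you_mean), the sample documents and the 0.75 similarity cutoff are all assumptions made for this illustration.

```python
import re
from collections import Counter
from difflib import get_close_matches


def build_vocabulary(documents):
    """Count every word that appears in the indexed documents."""
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def did_you_mean(query, vocabulary, cutoff=0.75):
    """Return a corrected query, or None when no word needed correcting."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Look for similar words among those actually present in the index.
        candidates = get_close_matches(word, list(vocabulary), n=3, cutoff=cutoff)
        if candidates:
            # Prefer the candidate that occurs most often in the index.
            corrected.append(max(candidates, key=lambda w: vocabulary[w]))
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


# Hypothetical indexed content; note the proper noun "CloudSearch",
# which a general-purpose dictionary would not contain.
documents = [
    "Apache Solr supports spellcheck through a search component",
    "Amazon CloudSearch is a managed search service from AWS",
]
vocabulary = build_vocabulary(documents)
print(did_you_mean("amazn cloudsearch", vocabulary))
```

Because the vocabulary comes from the indexed documents, "cloudsearch" is treated as a known word and only "amazn" is corrected.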

Rich Documents Support:

Rich document types like HTML, PDF, Word etc. can be uploaded into the search engine to make them searchable. The uploaded documents are parsed into a native format and indexed by the search engine, after which users and applications can search them using the usual search terms and patterns. Systems like document management and CMS products typically use this feature of a search engine/service to help their customers search through the uploaded documents. In an enterprise scenario you can expect a variety of document formats to flow into the search system from different applications.
Apache Solr has support for rich document parsing & indexing using Apache Tika.
Amazon CloudSearch expects data to be in its Search Data Format (SDF), expressed as JSON or XML. Rich documents can be uploaded via the Console or converted with the cs-generate-sdf command line tool. With cs-generate-sdf the text is extracted on the client and then sent to CloudSearch.
Advantage: Neutral
Feature Weight: High
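
As a rough illustration of the client-side extraction step, the following Python sketch pulls the text out of an HTML document and wraps it in an SDF-style JSON "add" operation. It is not the cs-generate-sdf tool. The envelope keys (type, id, version, lang, fields) follow my understanding of the SDF of that period and should be checked against the CloudSearch documentation, and the field names title and content, the document id and the helper names are hypothetical.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> text and all other text nodes of an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_title:
            self.title += text
        else:
            self.chunks.append(text)


def to_sdf_add(doc_id, html, version=1):
    """Build one SDF-style 'add' operation (assumed shape) from raw HTML."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "type": "add",
        "id": doc_id,
        "version": version,
        "lang": "en",
        # Hypothetical index fields; a real domain defines its own.
        "fields": {"title": parser.title, "content": " ".join(parser.chunks)},
    }


html = (
    "<html><head><title>Quarterly Report</title></head>"
    "<body><h1>Summary</h1><p>Revenue grew.</p></body></html>"
)
batch = [to_sdf_add("report_1", html)]
print(json.dumps(batch, indent=2))
```

This sketch keeps every text node, including any script or style content, so a real extractor would need to filter those out. Formats such as PDF or Word need a proper parser such as Apache Tika.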

Feature Customization:
Sometimes search software may not support a specific feature natively because there is not sufficient demand for it to be added to the core. In such cases, some search software lets you customize and extend the existing feature set through plugins and modules.
Amazon CloudSearch, being a proprietary creation, does not allow any customization, either through plugin integration or by extending its functionality. Features are rolled out only by the AWS team. In my experience, the AWS team is usually very proactive, accessible and receptive. You can speak to an AWS architect or product manager and explain your specific need. If your need turns out to be less specific than you think and is being asked for by a considerable number of customers around the world, they will include it in their road map.
Apache Solr, being open source, allows customization of analyzers, tokenizers, indexers and query analysis through plugins and by extending its code base.
Advantage: Apache Solr
Feature weight: Medium

Stemming, Stop Words and Synonyms:
Stemming: A stemming dictionary maps related words to a common stem. A stem is typically the root or base word from which variants are derived. For example, run is the stem of running and ran.
Stop words: Stop words are words that should typically be ignored both during indexing and at search time because they are either insignificant or so common that including them would result in a massive number of matches. Example: a, an, and, the, to etc. are some commonly used words which can be ignored during indexing.
Synonyms: You can configure synonyms for terms that appear in the data you are searching. That way, if a user searches for the synonym rather than the indexed term, the results will include documents that contain the indexed term. For example, you might want to configure synonyms so that a search for "Rocky Four" or "Rocky 4" will match the movie titled "Rocky IV". To do that, you would configure 4 and four as synonyms of the indexed term IV.
Both Apache Solr and Amazon CloudSearch support these features.
Advantage: Neutral
Feature Weight: High
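
The three options above can be sketched together as one small text analysis step applied to both the documents and the query. This Python example is a toy illustration under stated assumptions, not the analysis chain of Solr or CloudSearch: the dictionaries hold only the examples used in this section, and the function names analyze and matches are hypothetical.

```python
import re

# Tiny illustrative dictionaries built from the examples in this section.
STOP_WORDS = {"a", "an", "and", "the", "to"}
STEMS = {"running": "run", "ran": "run"}   # variant -> stem
SYNONYMS = {"4": "iv", "four": "iv"}       # synonym -> indexed term


def analyze(text):
    """Lowercase, tokenize, drop stop words, then apply stems and synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    result = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        token = STEMS.get(token, token)
        token = SYNONYMS.get(token, token)
        result.append(token)
    return result


def matches(query, document):
    """True when every analyzed query term occurs in the analyzed document."""
    return set(analyze(query)) <= set(analyze(document))


print(analyze("Rocky Four"))
print(matches("Rocky 4", "Rocky IV"))
print(matches("the running", "He ran to the store"))
```

Running the same analysis at index time and at search time is what makes "Rocky 4" find "Rocky IV" and "running" find "ran".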

Support for protocols:
Both Amazon CloudSearch and Apache Solr support the HTTP and HTTPS protocols. Amazon CloudSearch also includes web service interfaces to configure firewall settings that control network access to your domain.
Advantage: Neutral

All posts, comments and views expressed in this blog are my own and do not represent the positions or views of my past, present or future employers. The intention of this blog is to share my experience and views. Content is subject to change without notice. While I will do my best to credit the original author or copyright owner wherever I reference them, if you find any of the content or images violating copyright, please let me know and I will act upon it immediately. Lastly, I encourage you to share the content of this blog with other online communities for non-commercial and educational purposes.