Ebook568 pages2 hours

Apache Solr Search Patterns

Name: Apache Solr Search Patterns
Author: Jayant Kumar
ISBN: 9781783981854

By Jayant Kumar

Rating: 0 out of 5 stars

()

Read preview

About this ebook

About This Book

Learn the best use cases for using Solr in e-commerce, advertising, real-estate, and other sites
Explore Solr internals and customize the scoring algorithm in Solr
This is an easy-to-follow book with a step-by-step approach to help you get the best out of Solr search patterns

Who This Book Is For

This book is for developers who already know how to use Solr and are looking at procuring advanced strategies for improving their search using Solr. This book is also for people who work with analytics to generate graphs and reports using Solr. Moreover, if you are a search architect who is looking forward to scale your search using Solr, this is a must have book for you.

It would be helpful if you are familiar with the Java programming language.

Skip carousel

LanguageEnglish

PublisherPackt Publishing

Release dateApr 24, 2015

ISBN9781783981854

Author

Jayant Kumar

Related authors

Skip carousel

Related to Apache Solr Search Patterns

Related ebooks

Skip carousel

Neo4j High Performance
Ebook
Neo4j High Performance
bySonal Raj
Rating: 0 out of 5 stars
0 ratings
Administrating Solr
Ebook
Administrating Solr
bySurendra Mohan
Rating: 0 out of 5 stars
0 ratings
Scala Test-Driven Development
Ebook
Scala Test-Driven Development
byGaurav Sood
Rating: 0 out of 5 stars
0 ratings
Apache Solr for Indexing Data
Ebook
Apache Solr for Indexing Data
byHandiekar Sachin
Rating: 0 out of 5 stars
0 ratings
Lucene 4 Cookbook
Ebook
Lucene 4 Cookbook
byEdwood Ng
Rating: 0 out of 5 stars
0 ratings
Learning Apache Mahout Classification
Ebook
Learning Apache Mahout Classification
byGupta Ashish
Rating: 0 out of 5 stars
0 ratings
Learning Elasticsearch
Ebook
Learning Elasticsearch
byAbhishek Andhavarapu
Rating: 4 out of 5 stars
4/5
Distributed Computing in Java 9
Ebook
Distributed Computing in Java 9
byRaja Malleswara Rao Pattamsetti
Rating: 0 out of 5 stars
0 ratings
Python for Google App Engine
Ebook
Python for Google App Engine
byMassimiliano Pippi
Rating: 0 out of 5 stars
0 ratings
Apache Mahout Clustering Designs
Ebook
Apache Mahout Clustering Designs
byGupta Ashish
Rating: 0 out of 5 stars
0 ratings
Flask Blueprints
Ebook
Flask Blueprints
byPerras Joël
Rating: 0 out of 5 stars
0 ratings
Instant Redis Optimization How-to
Ebook
Instant Redis Optimization How-to
byArun Chinnachamy
Rating: 0 out of 5 stars
0 ratings
Machine Learning with Spark - Second Edition
Ebook
Machine Learning with Spark - Second Edition
byNick Pentreath
Rating: 0 out of 5 stars
0 ratings
Learn D3.js: Create interactive data-driven visualizations for the web with the D3.js library
Ebook
Learn D3.js: Create interactive data-driven visualizations for the web with the D3.js library
byHelder da Rocha
Rating: 0 out of 5 stars
0 ratings
Monitoring Hadoop
Ebook
Monitoring Hadoop
byGurmukh Singh
Rating: 0 out of 5 stars
0 ratings
Pig Design Patterns
Ebook
Pig Design Patterns
byPasupuleti Pradeep
Rating: 0 out of 5 stars
0 ratings
PostgreSQL 11 Administration Cookbook: Over 175 recipes for database administrators to manage enterprise databases
Ebook
PostgreSQL 11 Administration Cookbook: Over 175 recipes for database administrators to manage enterprise databases
bySimon Riggs
Rating: 0 out of 5 stars
0 ratings
Monitoring Elasticsearch
Ebook
Monitoring Elasticsearch
byDan Noble
Rating: 0 out of 5 stars
0 ratings
Learning Kibana 5.0
Ebook
Learning Kibana 5.0
byBahaaldine Azarmi
Rating: 0 out of 5 stars
0 ratings
Solr Cookbook - Third Edition
Ebook
Solr Cookbook - Third Edition
byRafał Kuć
Rating: 0 out of 5 stars
0 ratings
Solr in Action
Ebook
Solr in Action
byTimothy Potter
Rating: 3 out of 5 stars
3/5
Mastering Scala Machine Learning
Ebook
Mastering Scala Machine Learning
byAlex Kozlov
Rating: 0 out of 5 stars
0 ratings
Big Data Analytics
Ebook
Big Data Analytics
byVenkat Ankam
Rating: 0 out of 5 stars
0 ratings
Introduction to Deep Learning Business Applications for Developers: From Conversational Bots in Customer Service to Medical Image Processing
Ebook
Introduction to Deep Learning Business Applications for Developers: From Conversational Bots in Customer Service to Medical Image Processing
byArmando Vieira
Rating: 0 out of 5 stars
0 ratings
Akka Cookbook
Ebook
Akka Cookbook
byHéctor Veiga Ortiz
Rating: 2 out of 5 stars
2/5
Scalatra in Action
Ebook
Scalatra in Action
byRoss Baker
Rating: 0 out of 5 stars
0 ratings
Learning Azure DocumentDB
Ebook
Learning Azure DocumentDB
byBecker Riccardo
Rating: 0 out of 5 stars
0 ratings
Elasticsearch Server: Second Edition
Ebook
Elasticsearch Server: Second Edition
byRafał Kuć
Rating: 0 out of 5 stars
0 ratings
Hadoop in Practice
Ebook
Hadoop in Practice
byAlex Holmes
Rating: 0 out of 5 stars
0 ratings
Spark GraphX in Action
Ebook
Spark GraphX in Action
byMichael Malak
Rating: 0 out of 5 stars
0 ratings

Internet & Web For You

Skip carousel

Cybersecurity For Dummies
Ebook
Cybersecurity For Dummies
byJoseph Steinberg
Rating: 4 out of 5 stars
4/5
More Porn - Faster!: 50 Tips & Tools for Faster and More Efficient Porn Browsing
Ebook
More Porn - Faster!: 50 Tips & Tools for Faster and More Efficient Porn Browsing
bySue Denim
Rating: 3 out of 5 stars
3/5
The JavaScript Workshop: Learn to develop interactive web applications with clean and maintainable JavaScript code
Ebook
The JavaScript Workshop: Learn to develop interactive web applications with clean and maintainable JavaScript code
byJoseph Labrecque
Rating: 5 out of 5 stars
5/5
Coding For Dummies
Ebook
Coding For Dummies
byNikhil Abraham
Rating: 5 out of 5 stars
5/5
The Mega Box: The Ultimate Guide to the Best Free Resources on the Internet
Ebook
The Mega Box: The Ultimate Guide to the Best Free Resources on the Internet
byChris Mason
Rating: 4 out of 5 stars
4/5
The $1,000,000 Web Designer Guide: A Practical Guide for Wealth and Freedom as an Online Freelancer
Ebook
The $1,000,000 Web Designer Guide: A Practical Guide for Wealth and Freedom as an Online Freelancer
byRob Anthony O'Rourke
Rating: 5 out of 5 stars
5/5
Coding All-in-One For Dummies
Ebook
Coding All-in-One For Dummies
byNikhil Abraham
Rating: 4 out of 5 stars
4/5
The Digital Decluttering Workbook: How to Succeed with Digital Minimalism, Defeat Smartphone Addiction, Detox Social Media, and Organize Your Online Life: Declutter Workbook, #3
Ebook
The Digital Decluttering Workbook: How to Succeed with Digital Minimalism, Defeat Smartphone Addiction, Detox Social Media, and Organize Your Online Life: Declutter Workbook, #3
byAlex Wong
Rating: 3 out of 5 stars
3/5
Grokking Algorithms: An illustrated guide for programmers and other curious people
Ebook
Grokking Algorithms: An illustrated guide for programmers and other curious people
byAditya Bhargava
Rating: 4 out of 5 stars
4/5
Beginner's Guide To Starting An Etsy Print-On-Demand Shop
Ebook
Beginner's Guide To Starting An Etsy Print-On-Demand Shop
byAnn Eckhart
Rating: 0 out of 5 stars
0 ratings
Hacking With Kali Linux : A Comprehensive, Step-By-Step Beginner's Guide to Learn Ethical Hacking With Practical Examples to Computer Hacking, Wireless Network, Cybersecurity and Penetration Testing
Ebook
Hacking With Kali Linux : A Comprehensive, Step-By-Step Beginner's Guide to Learn Ethical Hacking With Practical Examples to Computer Hacking, Wireless Network, Cybersecurity and Penetration Testing
byPeter Bradley
Rating: 5 out of 5 stars
5/5
200+ Ways to Protect Your Privacy: Simple Ways to Prevent Hacks and Protect Your Privacy--On and Offline
Ebook
200+ Ways to Protect Your Privacy: Simple Ways to Prevent Hacks and Protect Your Privacy--On and Offline
byJeni Rogers
Rating: 0 out of 5 stars
0 ratings
The Logo Brainstorm Book: A Comprehensive Guide for Exploring Design Directions
Ebook
The Logo Brainstorm Book: A Comprehensive Guide for Exploring Design Directions
byJim Krause
Rating: 4 out of 5 stars
4/5
Hacking : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Ethical Hacking
Ebook
Hacking : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Ethical Hacking
byKevin Clark
Rating: 5 out of 5 stars
5/5
How To Start A Podcast
Ebook
How To Start A Podcast
byP Teague
Rating: 4 out of 5 stars
4/5
The Digital Marketing Handbook: A Step-By-Step Guide to Creating Websites That Sell
Ebook
The Digital Marketing Handbook: A Step-By-Step Guide to Creating Websites That Sell
byRobert W. Bly
Rating: 5 out of 5 stars
5/5
Learn JavaScript in 24 Hours
Ebook
Learn JavaScript in 24 Hours
byAlex Nordeen
Rating: 3 out of 5 stars
3/5
The Designer's Web Handbook: What You Need to Know to Create for the Web
Ebook
The Designer's Web Handbook: What You Need to Know to Create for the Web
byPatrick McNeil
Rating: 0 out of 5 stars
0 ratings
Remote/WebCam Notarization : Basic Understanding
Ebook
Remote/WebCam Notarization : Basic Understanding
byJeannie Eunice Franks
Rating: 3 out of 5 stars
3/5
YouTube: How to Build and Optimize Your First YouTube Channel, Marketing, SEO, Tips and Strategies for YouTube Channel Success
Ebook
YouTube: How to Build and Optimize Your First YouTube Channel, Marketing, SEO, Tips and Strategies for YouTube Channel Success
byTommy Swindali
Rating: 4 out of 5 stars
4/5
Social Engineering: The Science of Human Hacking
Ebook
Social Engineering: The Science of Human Hacking
byChristopher Hadnagy
Rating: 3 out of 5 stars
3/5
Six Figure Blogging Blueprint
Ebook
Six Figure Blogging Blueprint
byRaza Imam
Rating: 5 out of 5 stars
5/5
The Beginner's Affiliate Marketing Blueprint
Ebook
The Beginner's Affiliate Marketing Blueprint
byAlex M
Rating: 4 out of 5 stars
4/5
How to Disappear and Live Off the Grid: A CIA Insider's Guide
Ebook
How to Disappear and Live Off the Grid: A CIA Insider's Guide
byJohn Kiriakou
Rating: 0 out of 5 stars
0 ratings
Summary of Dotcom Secrets: by Russell Brunson - The Underground Playbook for Growing Your Company Online with Sales Funnels - A Comprehensive Summary
Ebook
Summary of Dotcom Secrets: by Russell Brunson - The Underground Playbook for Growing Your Company Online with Sales Funnels - A Comprehensive Summary
byAlexander Cooper
Rating: 5 out of 5 stars
5/5
The Internet Is Not What You Think It Is: A History, a Philosophy, a Warning
Ebook
The Internet Is Not What You Think It Is: A History, a Philosophy, a Warning
byJustin Smith-Ruiu
Rating: 4 out of 5 stars
4/5
Wireless Hacking 101
Ebook
Wireless Hacking 101
byKarina Astudillo
Rating: 4 out of 5 stars
4/5
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
Ebook
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
bySeth Stephens-Davidowitz
Rating: 4 out of 5 stars
4/5
How To Start A Profitable Authority Blog In Under One Hour
Ebook
How To Start A Profitable Authority Blog In Under One Hour
byPassive Marketing
Rating: 5 out of 5 stars
5/5
How To Make Money Blogging: How I Replaced My Day-Job With My Blog and How You Can Start A Blog Today
Ebook
How To Make Money Blogging: How I Replaced My Day-Job With My Blog and How You Can Start A Blog Today
byBob Lotich
Rating: 4 out of 5 stars
4/5

Related podcast episodes

Skip carousel

Distributing Geospatial Data: Distributing Geospatial Data - Every wondered why you might what to do this? Or maybe you understand the why but are unsure about the how? Perhaps you have heard people talk about partitioning data or sharding data, you might have heard some of thes...
Podcast episode
Distributing Geospatial Data: Distributing Geospatial Data - Every wondered why you might what to do this? Or maybe you understand the why but are unsure about the how? Perhaps you have heard people talk about partitioning data or sharding data, you might have heard some of thes...
byThe MapScaping Podcast - GIS, Geospatial, Remote Sensing, earth observation and digital geography
0 ratings
0% found this document useful
55: Go on The Web: Summary Andrew Gerrand (@enneff), Developer Advocate at Google & Go core contributor, talks about GoLang and how it is being used in Web Development today as well as the plans for the future of the Go as a platform for the web. Resources Go...
Podcast episode
55: Go on The Web: Summary Andrew Gerrand (@enneff), Developer Advocate at Google & Go core contributor, talks about GoLang and how it is being used in Web Development today as well as the plans for the future of the Go as a platform for the web. Resources Go...
byThe Web Platform Podcast
100%
100% found this document useful
Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks: A cross-over episode from The Machine Learning Podcast with the team from Deepchecks, exploring the challenges of testing and validating machine learning applications and their work to make it easier.
Podcast episode
Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks: A cross-over episode from The Machine Learning Podcast with the team from Deepchecks, exploring the challenges of testing and validating machine learning applications and their work to make it easier.
byThe Python Podcast.__init__
0 ratings
0% found this document useful
Level Up Your Data Platform With Active Metadata: A conversation with Atlan co-founder Prukalpa Sankar about the idea of active metadata and how it can reduce the toil involved in managing a data platform
Podcast episode
Level Up Your Data Platform With Active Metadata: A conversation with Atlan co-founder Prukalpa Sankar about the idea of active metadata and how it can reduce the toil involved in managing a data platform
byData Engineering Podcast
0 ratings
0% found this document useful
Can we predict the accuracy of a Neural Network? Yes, with the WeightWatcher tool by Charles Martin, Ph.D. - 002: In this episode we do a deep dive into deep neural networks. What conclusions can we make looking at the distribution of eigenvalues of each layer?
Podcast episode
Can we predict the accuracy of a Neural Network? Yes, with the WeightWatcher tool by Charles Martin, Ph.D. - 002: In this episode we do a deep dive into deep neural networks. What conclusions can we make looking at the distribution of eigenvalues of each layer?
byMachine Learning Cafe
0 ratings
0% found this document useful
Open Source TensorFlow with Yifei Feng: Yifei Feng, a TensorFlow software engineer, shares with Melanie and Mark about her work on the open source TensorFlow project and the tools she builds.
Podcast episode
Open Source TensorFlow with Yifei Feng: Yifei Feng, a TensorFlow software engineer, shares with Melanie and Mark about her work on the open source TensorFlow project and the tools she builds.
byGoogle Cloud Platform Podcast
100%
100% found this document useful
All Things Azure with Dwayne Monroe: Dwayne Monroe is a senior cloud architect at Cloudreach, an organization that helps enterprises maximize their cloud investments, who’s focused on Azure. Prior to joining Cloudreach, Dwayne worked as a senior Microsoft and cloud architect at High Availabi
Podcast episode
All Things Azure with Dwayne Monroe: Dwayne Monroe is a senior cloud architect at Cloudreach, an organization that helps enterprises maximize their cloud investments, who’s focused on Azure. Prior to joining Cloudreach, Dwayne worked as a senior Microsoft and cloud architect at High Availabi
byScreaming in the Cloud
0 ratings
0% found this document useful
Eureka moments with natural language processing: featuring Nicholas Mohnacky of bundleIQ
Podcast episode
Eureka moments with natural language processing: featuring Nicholas Mohnacky of bundleIQ
byPractical AI: Machine Learning, Data Science
0 ratings
0% found this document useful
DynamoDB The Database of Choice for Serverless Applications with Alex DeBrie: Alex DeBrie is the founder of DeBrie, LLC, a cloud-native training and AWS consulting company with a focus on DynamoDB and serverless technologies. He’s also the author of The DynamoDB Book, a 450-page tome that offers tips, strategies, and more about dat
Podcast episode
DynamoDB The Database of Choice for Serverless Applications with Alex DeBrie: Alex DeBrie is the founder of DeBrie, LLC, a cloud-native training and AWS consulting company with a focus on DynamoDB and serverless technologies. He’s also the author of The DynamoDB Book, a 450-page tome that offers tips, strategies, and more about dat
byScreaming in the Cloud
0 ratings
0% found this document useful
Beam and Spark with Holden Karau: This week our colleague, Holden Karau, joins us to talk about Spark and Beam.
Podcast episode
Beam and Spark with Holden Karau: This week our colleague, Holden Karau, joins us to talk about Spark and Beam.
byGoogle Cloud Platform Podcast
0 ratings
0% found this document useful
387: McKay Coppins - A Mormon Reporter on the Romney Bus: McKay Coppins - A Mormon Reporter on the Romney Bus
Podcast episode
387: McKay Coppins - A Mormon Reporter on the Romney Bus: McKay Coppins - A Mormon Reporter on the Romney Bus
byMormon Stories - LDS
0 ratings
0% found this document useful
Cloud Dataflow with Eric Anderson: Batch and stream processing systems have been evolving for the past decade. From MapReduce to Apache Storm to Dataflow, the best practices for large volume data processing have become more sophisticated as the industry and open source communities have ...
Podcast episode
Cloud Dataflow with Eric Anderson: Batch and stream processing systems have been evolving for the past decade. From MapReduce to Apache Storm to Dataflow, the best practices for large volume data processing have become more sophisticated as the industry and open source communities have ...
byCloud Engineering Archives - Software Engineering Daily
0 ratings
0% found this document useful
Reflections On Designing A Data Platform From Scratch: A monologue by Tobias Macey, the host of the show, about the design considerations involved in building a data platform and how the lessons learned from running the Data Engineering Podcast are influencing the choices made.
Podcast episode
Reflections On Designing A Data Platform From Scratch: A monologue by Tobias Macey, the host of the show, about the design considerations involved in building a data platform and how the lessons learned from running the Data Engineering Podcast are influencing the choices made.
byData Engineering Podcast
100%
100% found this document useful
Hasty Treat - Hireable Skills for 2021: In this Hasty Treat, Scott and Wes talk about hireable skills or 2021 — what you need to know to get a job and grow in your career this year! Freshbooks - Sponsor Get a 30 day free trial of Freshbooks at and put SYNTAX in the “How did...
Podcast episode
Hasty Treat - Hireable Skills for 2021: In this Hasty Treat, Scott and Wes talk about hireable skills or 2021 — what you need to know to get a job and grow in your career this year! Freshbooks - Sponsor Get a 30 day free trial of Freshbooks at and put SYNTAX in the “How did...
bySyntax - Tasty Web Development Treats
0 ratings
0% found this document useful
Jacob Aronoff - At Least One Person Who Cares To See It Through: Robby has a chat with Staff Software Engineer at Lightstep from ServiceNow, Jacob Aronoff, about the vital signs of a thriving open source software project, the importance of a passionate community behind such projects, why understanding an open source project's own dependencies is crucial before adopting it, the nuances of evaluating a project's health through performance metrics, the organizational dynamics of the OpenTelemetry community, and so much more.
Podcast episode
Jacob Aronoff - At Least One Person Who Cares To See It Through: Robby has a chat with Staff Software Engineer at Lightstep from ServiceNow, Jacob Aronoff, about the vital signs of a thriving open source software project, the importance of a passionate community behind such projects, why understanding an open source project's own dependencies is crucial before adopting it, the nuances of evaluating a project's health through performance metrics, the organizational dynamics of the OpenTelemetry community, and so much more.
byMaintainable
0 ratings
0% found this document useful
433: Falling for FastAPI: Mike's falling in love with FastAPI and gives us a hint at the next project he's building.
Podcast episode
433: Falling for FastAPI: Mike's falling in love with FastAPI and gives us a hint at the next project he's building.
byCoder Radio
0 ratings
0% found this document useful
Putting Airflow Into Production With James Meickle - Episode 43: Lessons Learned While Building A Data Science Platform With Airflow (Interview)
Podcast episode
Putting Airflow Into Production With James Meickle - Episode 43: Lessons Learned While Building A Data Science Platform With Airflow (Interview)
byData Engineering Podcast
0 ratings
0% found this document useful
69: Testing Front End Code: Summary Oren Rubin (@Shexman) goes through why it’s important to not only test the back-end code of our applications but also to test our Front End code, the integration points, and the full user experience. Oren also goes through...
Podcast episode
69: Testing Front End Code: Summary Oren Rubin (@Shexman) goes through why it’s important to not only test the back-end code of our applications but also to test our Front End code, the integration points, and the full user experience. Oren also goes through...
byThe Web Platform Podcast
0 ratings
0% found this document useful
Launchpad Studio with Malika Cantor and Peter Norvig: Malika Cantor and Peter Norvig tell us about Launchpad Studio, a program for Applied Machine Learning Startups.
Podcast episode
Launchpad Studio with Malika Cantor and Peter Norvig: Malika Cantor and Peter Norvig tell us about Launchpad Studio, a program for Applied Machine Learning Startups.
byGoogle Cloud Platform Podcast
0 ratings
0% found this document useful
WBSP467: Grow Your Business by Understanding OpenCart’s Capabilities, an Objective Panel Discussion
Podcast episode
WBSP467: Grow Your Business by Understanding OpenCart’s Capabilities, an Objective Panel Discussion
byWBSRocks: Business Growth with ERP and Digital Transformation
0 ratings
0% found this document useful
Platform Engineering at a FAANG Company
Podcast episode
Platform Engineering at a FAANG Company
byThe Cloudcast
0 ratings
0% found this document useful
418: Navigating the Changing Landscape of Technology in Food Blogging with Lauren Gray: Understanding plugins, the future of custom themes, and WordPress Site Editor with Lauren Gray from Once Coupled. ----- Welcome to episode 418 of The Food Blogger Pro Podcast! This week on the podcast, Bjork interviews Lauren Gray from Once Coupled....
Podcast episode
418: Navigating the Changing Landscape of Technology in Food Blogging with Lauren Gray: Understanding plugins, the future of custom themes, and WordPress Site Editor with Lauren Gray from Once Coupled. ----- Welcome to episode 418 of The Food Blogger Pro Podcast! This week on the podcast, Bjork interviews Lauren Gray from Once Coupled....
byThe Food Blogger Pro Podcast
0 ratings
0% found this document useful
Improving search with RAG architecture with Pinecone CEO Edo Liberty
Podcast episode
Improving search with RAG architecture with Pinecone CEO Edo Liberty
byNo Priors: Artificial Intelligence | Technology | Startups
0 ratings
0% found this document useful
How to Get Better Traffic, Free with Semantic Data: Everyone wants better ranking but that’s not really what they want. Stores want more traffic. They want that because it leads to sales. Sales is what really matters. All of SEO is just a marketing channel to get more visitors who turn into customers. The main FIX for SEO is better rankings What if there was a way to get more sales with the existing rankings you already have? How? By convincing more searchers to click on your search listings. More clicks == more traffic == more sales. What we’re talking about today is a search enhancement in Google called Rich Snippets, specifically Product Rich Snippets. They enhance your existing search results with more data and pixels. Eric Davis joins us today to walk us through it in easy to understand terms. Eric is founder of Little Stream Software, which helps Shopify entrepreneurs customize their Shopify stores using public and private Shopify Apps.
Podcast episode
How to Get Better Traffic, Free with Semantic Data: Everyone wants better ranking but that’s not really what they want. Stores want more traffic. They want that because it leads to sales. Sales is what really matters. All of SEO is just a marketing channel to get more visitors who turn into customers. The main FIX for SEO is better rankings What if there was a way to get more sales with the existing rankings you already have? How? By convincing more searchers to click on your search listings. More clicks == more traffic == more sales. What we’re talking about today is a search enhancement in Google called Rich Snippets, specifically Product Rich Snippets. They enhance your existing search results with more data and pixels. Eric Davis joins us today to walk us through it in easy to understand terms. Eric is founder of Little Stream Software, which helps Shopify entrepreneurs customize their Shopify stores using public and private Shopify Apps.
byThe Unofficial Shopify Podcast: Entrepreneur Tales
0 ratings
0% found this document useful
The Trick That Supercharged Superhuman's Growth: Rahul Vohra founder & CEO of Superhuman
Podcast episode
The Trick That Supercharged Superhuman's Growth: Rahul Vohra founder & CEO of Superhuman
byRocketship.fm
0 ratings
0% found this document useful
The Potential of Generative AI for L&D With Donald Clark
Podcast episode
The Potential of Generative AI for L&D With Donald Clark
byThe Learning & Development Podcast
0 ratings
0% found this document useful
Creators! Reach Your True Audience Using NFTs with Julien Genestoux
Podcast episode
Creators! Reach Your True Audience Using NFTs with Julien Genestoux
byBecome a Writer Today
0 ratings
0% found this document useful
14: Web Components Interop and Polymer: Today, Web Components have emerged from cutting edge technologies to technologies we can implement in our small scale production. It won’t be long before we are building large scale applications with Custom Elements, HTML Imports, Template Tags,...
Podcast episode
14: Web Components Interop and Polymer: Today, Web Components have emerged from cutting edge technologies to technologies we can implement in our small scale production. It won’t be long before we are building large scale applications with Custom Elements, HTML Imports, Template Tags,...
byThe Web Platform Podcast
0 ratings
0% found this document useful
Composable Data Analytics
Podcast episode
Composable Data Analytics
byThe Cloudcast
0 ratings
0% found this document useful
Developing Storage Solutions Before the Rest with AB Periasamay: Conversations about what the cloud is might be an infinitely convoluted one, but some are taking the conversation down paths less traveled. That is certainly the case for AB Periasamy, CEO and Co-Founder of MinIO, an open source provider of high performan
Podcast episode
Developing Storage Solutions Before the Rest with AB Periasamay: Conversations about what the cloud is might be an infinitely convoluted one, but some are taking the conversation down paths less traveled. That is certainly the case for AB Periasamy, CEO and Co-Founder of MinIO, an open source provider of high performan
byScreaming in the Cloud
0 ratings
0% found this document useful

Skip carousel

Installation
Linux Format
Article
Installation
Oct 19, 2021
1 min read
Google Answer Box Strategy
Techfastly
Article
Google Answer Box Strategy
Sep 21, 2020
Leveraging the Google PAA (People Also Ask) element on a Search Results Page for Targeted Content Creation with a Python Scraper All businesses that are online today are creating content at a furious pace. According to Technavio, a research firm, con
7 min read
Seven Ways To Future-proof Your SEO Strategy
Marketing
Article
Seven Ways To Future-proof Your SEO Strategy
Apr 8, 2018
Search engine optimisation (SEO) is always changing. To stay ahead of your competitors you need to be able to shift your SEO strategy. Expect to see mobile devices, artificial intelligence (AI) and voice search dominating the news. But what practical
3 min read
In Conversation with Surbhi Rathore
Techfastly
Article
In Conversation with Surbhi Rathore
Oct 1, 2021
4 min read
An Expert Speaks Up on What You Should Know About Programming Languages
Entrepreneur
Article
An Expert Speaks Up on What You Should Know About Programming Languages
Oct 1, 2015
1 min read
“If ‘Show Password’ Is Enabled, The Feature Sends Your Password To Their Third-party Servers”
PC Pro Magazine
Article
“If ‘Show Password’ Is Enabled, The Feature Sends Your Password To Their Third-party Servers”
Dec 8, 2022
Like most people who write for a living, I lean heavily on my spoil chicken to get me through the day. Sorry, I mean spell checker. It’s not just professional writers, either: spell checkers have become de rigueur for business users and consumers ali
7 min read
Contributing For Non - Coders
Linux Format
Article
Contributing For Non - Coders
Jan 10, 2023
9 min read
Optimise Your Website For Google
PC Pro Magazine
Article
Optimise Your Website For Google
Oct 5, 2023
8 min read
Scroll Media
NZ Marketing
Article
Scroll Media
Sep 16, 2018
You have been in the digital advertising industry since 2001, what changes have you seen and what’s your view on it today? It seems we have come a long way from faxing order forms across town and fixed weekly rates, so any automation is a good thing.
3 min read
FLASK Web Frameworks
Linux Format
Article
FLASK Web Frameworks
Jun 4, 2019
The main focus of Python has always been to get you cracking on with your coding – the language was never made for web programming. However, this has just made it more interesting to extend the language for the web, or to create an interface to web-b
9 min read
Search Engine Success
Home Business Magazine
Article
Search Engine Success
Dec 24, 2020
4 min read
Newsdesk
Linux Format
Article
Newsdesk
Dec 13, 2022
OPEN SOURCE FUNDING GitHub, Fastly and Mozilla are all looking for new projects to back, giving a boost to open source development. Small open source projects might be created solely by enthusiasts but most make use of outside developers, often paid
9 min read
Code An Admin Back-end In Django
Linux Format
Article
Code An Admin Back-end In Django
Dec 13, 2022
Credit: www.djangoproject.com OUR EXPERT Matt Holder has been a fan of the open source methodology for over two decades and uses Linux and other tools where possible. More featurepacked source code for this project can be downloaded from https://
6 min read
Sparkle Pro 4
MacFormat
Article
Sparkle Pro 4
Jun 28, 2022
3 min read
Sparkle Pro 4
MacFormat
Article
Sparkle Pro 4
Jun 28, 2022
3 min read
How To Setup A Killer Wensite In 2022
PC Pro Magazine
Article
How To Setup A Killer Wensite In 2022
Jan 6, 2022
8 min read
Smartproxy
Linux Format
Article
Smartproxy
Jan 9, 2024
2 min read
Make AI Work For You
Linux Format
Article
Make AI Work For You
Apr 2, 2024
8 min read
Smart Answers: GenAI Tool Makes It Easier To Find The Info You Need On PCWorld
PCWorld
Article
Smart Answers: GenAI Tool Makes It Easier To Find The Info You Need On PCWorld
Sep 5, 2023
4 min read
Code A Cataloguing Application In Python
Linux Format
Article
Code A Cataloguing Application In Python
Nov 15, 2022
Credit: www.djangoproject.com Matt Holder has been a fan of the open source methodology for over two decades and uses Linux and other tools where possible. More featurepacked source code for this project can be downloaded from https://github.com/mat
8 min read
How We Tested…
Linux Format
Article
How We Tested…
Aug 24, 2021
To explore the capabilities of these terminal-based browsers, we used them to access popular websites and tried to download data where applicable. We also attempted tasks that you would normally carry out on those sites. Once we had accessed a site,
1 min read
Hashtagsearches.com
TechLife
Article
Hashtagsearches.com
Jun 3, 2019
2 min read
A Place For Everything
Outdoor Photographer
Article
A Place For Everything
Aug 10, 2019
9 min read
Even If You Love Safari, Here Are 5 Reasons To Try A New Browser On Your Mac
MacWorld
Article
Even If You Love Safari, Here Are 5 Reasons To Try A New Browser On Your Mac
Oct 18, 2022
3 min read
Supercharge Chrome & Edge with these 10 BRILLIANT EXTENSIONS
PC Pro Magazine
Article
Supercharge Chrome & Edge with these 10 BRILLIANT EXTENSIONS
Oct 5, 2023
9 min read
Create A Webpage Or Blog
iCreate
Article
Create A Webpage Or Blog
Mar 26, 2020
12 min read
Hatch
Linux Format
Article
Hatch
Feb 6, 2024
2 min read
Thriving As An Ecosystem Partner
The European Business Review
Article
Thriving As An Ecosystem Partner
Sep 30, 2022
Researching ecosystems that span industries from e-commerce and publishing to semiconductors and healthcare over the past decade, we found companies that have been successful for years by contributing to an ecosystem. Sometimes, by contributing as pa
10 min read
“Brave Pushes The Boundaries Of How User Privacy Can Be Baked In Rather Than Having To Be Added”
PC Pro Magazine
Article
“Brave Pushes The Boundaries Of How User Privacy Can Be Baked In Rather Than Having To Be Added”
May 7, 2022
8 min read
2 The Use of Python in AI and ML
Techfastly
Article
2 The Use of Python in AI and ML
Nov 30, 2020
3 min read

Related categories

Skip carousel

Reviews for Apache Solr Search Patterns

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Apache Solr Search Patterns - Jayant Kumar

Apache Solr Search Patterns

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Solr Indexing Internals

The job site problem statement – Solr indexing fundamentals

Working of analyzers, tokenizers, and filters

Handling a multilingual search

Measuring the quality of search results

The e-commerce problem statement

The job site problem statement

Challenges of large-scale indexing

Using multiple threads for indexing on Solr

Using the Java binary format of data for indexing

Using the ConcurrentUpdateSolrServer class for indexing

Solr configuration changes that can improve indexing performance

Planning your commit strategy

Using better hardware

Distributed indexing

The SolrCloud solution

Summary

2. Customizing the Solr Scoring Algorithm

Relevance calculation

Building a custom scorer

Drawbacks of the TF-IDF model

The information gain model

Implementing the information gain model

Options to TF-IDF similarity

BM25 similarity

DFR similarity

Summary

3. Solr Internals and Custom Queries

Working of a scorer on an inverted index

Working of OR and AND clauses

The eDisMax query parser

Working of the eDisMax query parser

The minimum should match parameter

Working of filters

Using BRS queries instead of DisMax

Building a custom query parser

Proximity search using SWAN queries

Creating a parboiled parser

Building a Solr plugin for SWAN queries

Integrating the SWAN plugin in Solr

Summary

4. Solr for Big Data

Introduction to big data

Getting data points using facets

Field faceting

Query and range faceting

Radius faceting for location-based data

The geofilt filter

The bounding box filter

The rectangle filter

Distance function queries

Radius faceting

Data analysis using pivot faceting

Graphs for analytics

Getting started with Highcharts

Displaying Solr data using Highcharts

Summary

5. Solr in E-commerce

Designing an e-commerce search

Handling unclean data

Handling variations in the product

Sorting

Problems and solutions of flash sale searches

Faceting with the option of multi-select

Faceting with hierarchical taxonomy

Faceting with size

Implementing semantic search

Optimizations

Summary

6. Solr for Spatial Search

Features of spatial search

Java Topology Suite

Well-known Text

The Spatial4j library

Lucene 4 spatial module

SpatialRecursivePrefixTreeFieldType

BBoxField (to be introduced in Solr 4.10)

Indexing for spatial search

Searching and filtering on a spatial index

The bbox query

Distance sort and relevancy boost

Advanced concepts

Quadtree

Indexing data

Searching data

Geohash

Summary

7. Using Solr in an Advertising System

Ad system functionalities

Architecture of an ad distribution system

Requirements of an ad distribution system

Schema for a listing ad

Schema for targeted ads

Performance improvements

fieldCache

fieldValueCache

documentCache

filterCache

queryResultCache

Application cache

Garbage collection

Merging Solr with Redis

Summary

8. AJAX Solr

The purpose of AJAX Solr

The AJAX Solr architecture

The Manager controller

The ParameterStore model

Available parameters

Exposed parameters

Using the ParameterHashStore class

Extending the ParameterStore class

Widgets

Working with AJAX Solr

Talking to AJAX Solr

Displaying the result

Adding facets

Adding pagination

Adding a tag cloud

Performance tuning

Summary

9. SolrCloud

The SolrCloud architecture

Centralized configuration

Setting up SolrCloud

Test setup for SolrCloud

Setting up SolrCloud in production

Setting up the Zookeeper ensemble

Setting up Tomcat with Solr

Distributed indexing and search

Routing documents to a particular shard

Adding more nodes to the SolrCloud

Fault tolerance and high availability in SolrCloud

Advanced sharding with SolrCloud

Shard splitting

Deleting a shard

Moving the existing shard to a new node

Shard splitting based on split key

Asynchronous calls

Migrating documents to another collection

Sizing and monitoring of SolrCloud

Using SolrCloud as a NoSQL database

Summary

10. Text Tagging with Lucene FST

An overview of FST and text tagging

Implementation of FST in Lucene

Text tagging algorithms

Fuzzy string matching algorithm

The Levenshtein distance algorithm

Damerau–Levenshtein distance

Using Solr for text tagging

Implementing a text tagger using Solr

Summary

Index

Apache Solr Search Patterns

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: April 2015

Production reference: 1210415

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78398-184-7

www.packtpub.com

Credits

Author

Jayant Kumar

Reviewers

Ramzi Alqrainy

Damiano Braga

Omar Shaban

Commissioning Editor

Edward Bowkett

Acquisition Editor

Vinay Argekar

Content Development Editor

Sumeet Sawant

Technical Editor

Tanmayee Patil

Copy Editors

Janbal Dharmaraj

Pooja Iyer

Project Coordinator

Akash Poojary

Proofreaders

Ting Baker

Simran Bhogal

Safis Editing

Indexer

Monica Ajmera Mehta

Graphics

Sheetal Aute

Ronak Dhruv

Production Coordinator

Shantanu N. Zagade

Cover Work

Shantanu N. Zagade

About the Author

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications.

Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India.

Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful.

Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com.

Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.

I would like to thank the guys at Packt Publishing for giving me the opportunity to write this book. Special thanks to Vinay Argekar and Mohammed Fahad from Packt Publishing for keeping me engaged and dealing with my drafts and providing feedback at all stages.

I would like to thank my wife, Nidhi, and my parents for taking care of our kids while I was engaged in writing this book. And finally, I would like to thank my kids, Ashlesha and Banhishikha, for bearing with me while I was writing this book.

About the Reviewers

Ramzi Alqrainy is one of the most recognized experts within artificial intelligence and information retrieval fields in the Middle East. He is an active researcher and technology blogger with a focus on information retrieval.

He is a Solr engineer at Lucidworks and head of technology of www.openSooq.com, where he capitalizes on his solid experience in open source technologies in scaling up the search engine and supportive systems.

His experience in Solr, Elasticsearch, Spark, Mahout, and Hadoop stack contributed directly to the business growth through the implementations and projects that helped the key people to slice and dice information easily throughout the dashboards and data visualization solutions.

By developing more than eight full stack search engines, he was able to solve many complicated challenges about agglutination and stemming in the Arabic language.

He holds a master's degree in computer science, was among the first rank in his class, and was listed on the honor roll. His website address is http://ramzialqrainy.com. His LinkedIn profile is http://www.linkedin.com/in/ramzialqrainy. He can be contacted at .

I would like to thank my parents and sisters for always being there for me.

Damiano Braga is a senior software engineer at Trulia, where he helps the company to scale the search infrastructure and make search better. He's also an open source contributor and a conference speaker.

Prior to Trulia, he studied at and worked for the University of Ferrara (Italy), where he completed his master's degree in computer science engineering.

Omar Shaban is a software architect and software engineer with a passion for programming and open source. He began programming as a youngster in 2001. He is the lead maintainer of PHP's PECL Solr Extension and a proud PHP member.

Omar enjoys resolving complex problems, designing web applications architecture, and optimizing applications' performance. He is interested in Natural Language Processing (NLP) and machine learning.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

Apache Solr is the most widely used full text search solution. Almost all the websites today use Solr to provide the search function. Development of the search feature with a basic Solr setup is the starting point. At a later stage, most developers find it imperative to delve into Solr to provide solutions to certain problems or add specific features. This book will provide a developer working on Solr with a deeper insight into Solr. The book will also provide strategies and concepts that are employed in the development of different solutions using Solr. You will not only learn how to tweak Solr, but will also understand how to use it to handle big data and solve scalability problems.

What this book covers

Chapter 1, Solr Indexing Internals, delves into how indexing happens in Solr and how analyzers and tokenizers work during index creation.

Chapter 2, Customizing the Solr Scoring Algorithm, discusses different scoring algorithms in Solr and how to tweak these algorithms and implement them in Solr.

Chapter 3, Solr Internals and Custom Queries, discusses in-depth how relevance calculation happens and how scorers and filters work internally in Solr. This chapter will outline how to create custom plugins in Solr.

Chapter 4, Solr for Big Data, focuses on churning out big data for analysis purposes, including various faceting concepts and tools that can be used with Solr in order to plot graphs and charts.

Chapter 5, Solr in E-commerce, discusses the problems faced during the implementation of Solr in an e-commerce website and the related strategies and solutions.

Chapter 6, Solr for Spatial Search, focuses on spatial capabilities that the current and previous Solr versions possess. This chapter will also cover important concepts such as indexing and searching or filtering strategies together with varied query types that are available with a spatial search.

Chapter 7, Using Solr in an Advertising System, discusses the problems faced during the implementation of Solr to search in an advertising system and the related strategies and solutions.

Chapter 8, AJAX Solr, focuses on an AJAX Solr feature that helps reduce dependency on the application. This chapter will also cover an in-depth understanding of AJAX Solr as a framework and its implementation.

Chapter 9, SolrCloud, provides the complete procedure to implement SolrCloud and examines the benefits of using a distributed search with SolrCloud.

Chapter 10, Text Tagging with Lucene FST, focuses on the basic understanding of an FST and its implementation and guides us in designing an algorithm for text tagging, which can be implemented using FSTs and further integrated with Solr.

What you need for this book

You will need a Windows or Linux machine with Apache configured to run the web server. You will also need the Java Development Kit (JDK) installed together with an editor to write Java programs. You will need Solr 4.8 or higher to understand the procedures.

Who this book is for

Basic knowledge of working with Solr is required to understand the advanced topics discussed in this book. An understanding of Java programming concepts is required to study the programs discussed in this book.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meanings.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: The mergeFactor class controls how many segments a Lucene index is allowed to have before it is coalesced into one segment.

A block of code is set as follows:

//Create collection of documents to add to Solr server

SolrInputDocument doc1 = new SolrInputDocument();

document.addField(id,1);

document.addField(desc, description text for doc 1);

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

//Create collection of documents to add to Solr server

SolrInputDocument doc1 = new SolrInputDocument();

document.addField(id,1);

document.addField(desc, description text for doc 1);

Any command-line input or output is written as follows:

java -jar post.jar *.xml

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes, appear as follows: In the index, we can see that the token Harry appears in both documents.

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <feedback@packtpub.com>, and mention the book title on the subject line of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account from http://www.packtpub.com. If you purchase this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in any of our books—maybe a mistake in the text or the code—we would be grateful if you reported this to us. By doing so, you can save other readers from frustration and help us improve the subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <questions@packtpub.com> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Solr Indexing Internals

This chapter will walk us through the indexing process in Solr. We will discuss how input text is broken and how an index is created in Solr. Also, we will delve into the concept of analyzers and tokenizers and the part they play in the creation of an index. Second, we will look at multilingual search using Solr and discuss the concepts used for measuring the quality of an index. Third, we will look at the problems faced during indexing while working with large amounts of input data. Finally, we will discuss SolrCloud and the problems it solves. The following topics will be discussed throughout the chapter. We will discuss use cases for Solr in e-commerce and job sites. We will look at the problems faced while providing search in an e-commerce or job site:

Solr indexing fundamentals

Working of analyzers, tokenizers, and filters

Handling a multilingual search

Measuring the quality of search results

Challenges faced in large-scale indexing

Problems SolrCloud intends to solve

The e-commerce problem statement

The job site problem statement – Solr indexing fundamentals

The index created by Solr is known as an inverted index. An inverted index contains statistics and information on terms in a document. This makes a term-based search very efficient. The index created by Solr can be used to list the documents that contain the searched term. For an example of an inverted index, we can look at the index at the back of any book, as this index is the most accurate example of an inverted index. We can see meaningful terms associated with pages on which they occur within the book. Similarly, in the case of an inverted index, the terms serve to point or refer to documents in which they occur.

Inverted index example

Let us study the Solr index in depth. A Solr index consists of documents, fields, and terms, and a document consists of strings or phrases known as terms. Terms that refer to the context can be grouped together in a field. For example, consider a product on any e-commerce site. Product information can be broadly divided into multiple fields such as product name, product description, product category, and product price. Fields can be either stored or indexed or both. A stored field contains the unanalyzed, original text related to the field. The text in indexed fields can be broken down into terms. The process of breaking text into terms is known as tokenization. The terms created after tokenization are called tokens, which are then used for creating the inverted index. The tokenization process employs a list of token filters that handle various aspects of the tokenization process. For example, the tokenizer breaks a sentence into words, and the filters work on converting all of those words to lowercase. There is a huge list of analyzers and tokenizers that can be used as required.

Let us look at a working example of the indexing process with two documents having only a single field. The following are the documents:

Documents with Document Id and content (Text)

Suppose we tell Solr that the tokenization or breaking of terms should happen on whitespace. Whitespace is defined as one or more spaces or tabs. The tokens formed after the tokenization of the preceding documents are as follows:

Tokens in both documents

The inverted index thus formed will contain the following terms and associations:

Inverted index

In the index, we can see that the token Harry appears in both documents. If we search for Harry in the index we have created, the result will contain documents 1 and 2. On the other hand, the token Prince has only document 1 associated with it in the index. A search for Prince will return only document 1.

Let us look at how an index is stored in the filesystem. Refer to the following image:

Index files on disk

For the default installation of Solr, the index

Enjoying the preview?

Page 1 of 1

Apache Solr Search Patterns

About this ebook

Jayant Kumar

Related authors

Related to Apache Solr Search Patterns

Related ebooks

Internet & Web For You

Related podcast episodes

Related articles

Related categories

Reviews for Apache Solr Search Patterns

What did you think?

Book preview

Apache Solr Search Patterns - Jayant Kumar

Table of Contents

Apache Solr Search Patterns

Apache Solr Search Patterns

Credits

About the Author

About the Reviewers

Support files, eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Note

Tip

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Chapter 1. Solr Indexing Internals

The job site problem statement – Solr indexing fundamentals