Today, I want to share something about the search feature in Hiberante Search. As mentionned in the official guide Getting started with Hibernate Search :

Now we’ll finally execute a first search. The general approach is to create a Lucene query, …, and then wrap this query into a org.hibernate.Query in order to get all the functionality one is used to from the Hibernate API.

In this article, I will focus on the Apache Lucene queries and find out what features can we use on it.

The first step it to obtain a org.hibernate.search.query.dsl.QueryBuilder object, configured for seraching on a given entity:

// construct a DSL query builder                                                
QueryBuilder queryBuilder = fullTextEntityManager                               
    .getSearchFactory()                                                         
    .buildQueryBuilder()                                                        
    .forEntity(Address.class)                                                   
    .get();

Once done, we can go on to study different kinds of queries.

Keyword query

The most basic form of search. As the name suggests, this query type searches for one or more particular words.

Query luceneQuery = queryBuilder
    .keyword()
    .onFields("name", "type")
    .matching(searchString)
    .createQuery();

Example usage

I have built a webapp example based on US addresses. If I set searchString to san francisco rd, then here are the top 10 results returned :

id name type Left zip Right zip
6002503 San Francisco Rd 62901 62901
6002504 San Francisco Rd 62901 62901
6002505 San Francisco Rd 62901 62901
6002506 San Francisco Rd 62901 62901
6002507 San Francisco Rd 62901 62901
6002508 San Francisco Rd 62901 62901
6002509 San Francisco Rd 62901 62901
6002510 San Francisco Rd 62901 62901
8210917 Francisco Rd 49089 49089
8210919 Francisco Rd 49089 49089

We can see that the records 6002503, 6002504, 6002505, 6002506, 6002507, 6002508, 6002509, 6002510 match all the 3 keywords san, francisco and rd. By the way, in default configuration of Apache Lucene query, keywords are not case sensitive. That is why they have been returned first. Then the record 8210917 and 8210919 come, with 2 words matched: francisco and rd.

Fuzzy query

With a fuzzy search, keywords match against fields even when they are off by one or more characters. Check wikipedia to know more about the Edit Distance.

Query luceneQuery = queryBuilder
    .keyword()
    .fuzzy()
    .withEditDistanceUpTo(2)   // default 2 (can be 0, 1, 2)
    .onFields("name", "type")
    .matching(searchString)
    .createQuery();

Example usage

With searchString = "san francsco", we can still find a similar result thanks to fuzzy query :

id name type Left zip Right zip
6002503 San Francisco Rd 62901 62901
6002504 San Francisco Rd 62901 62901
6002505 San Francisco Rd 62901 62901
6002506 San Francisco Rd 62901 62901
6002507 San Francisco Rd 62901 62901
6002508 San Francisco Rd 62901 62901
6002509 San Francisco Rd 62901 62901
6002510 San Francisco Rd 62901 62901
8210917 Francisco Rd 49089 49089
8210919 Francisco Rd 49089 49089

Even if the character i is missing in the word francsco, the word is reachable because its edit distance is 1. That means we only need to do 1 edition to transform francsco to francisco.

Can we set up an edit distance greater than 2 ?

No, we can’t. According to the Hibernate search javadoc, this is not allowed. Maximum value of the edit distance. Roughly speaking, the number of changes between two terms to be considered close enough. Can be either 1 or 2 (0 would mean no fuzziness). Defaults to 2.

Lucene supports single and multiple character wildcard searches within single terms (not within phrase queries),

  • To perform a single character wildcard search, use the ? symbol
  • To perform a multiple character wildcard search, use the * symbol
Query luceneQuery = queryBuilder
    .keyword()
    .wildcard()
    .onFields("name", "type")
    .matching(searchString)
    .createQuery();

Example usage

Set searchString = "franci?co" and search with wildcard option:

id name type Left zip Right zip
3758060 Francisco Cir 6706 6706
6002503 San Francisco Rd 62901 62901
6002504 San Francisco Rd 62901 62901
6002505 San Francisco Rd 62901 62901
6002506 San Francisco Rd 62901 62901
6002507 San Francisco Rd 62901 62901
6002508 San Francisco Rd 62901 62901
6002509 San Francisco Rd 62901 62901
6002510 San Francisco Rd 62901 62901
15410367 Francisco St 46360 46360