Introducing JavaScript full-text search
Here at TheLadders.com, we're big fans of Lucene: it's fast, flexible, and easy to use. There's also plenty of documentation and lots of helpful folks on the mailing lists. We use it in our job search and resume search, as well as a few internal applications.
For our "MyJobs" project, we were faced with a requirement to display a list of jobs that a user had applied to or saved and, ajax-style (without reloading the page), narrow that list down as they typed a search term. Since we had all this data on the client side for display anyway, we thought, hey, can we do the searching there, too? Pretty quickly we were able to prototype a simple client-side-only text index and search. We ended up being able to reuse this code in a few different areas of the website.
API Use
creating an index:
// creating an index, adding documents
var index = new LADDERS.search.index();
var d = new LADDERS.search.document();
d.add("id", 1);
d.add("text", "one two three");
index.addDocument(d);
d = new LADDERS.search.document();
d.add("id", 2);
d.add("text", "four five six");
d.add("moretext", "ten");
index.addDocument(d);
// etc...
searching:
// searching
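var searchTerms =
["one", "one two", "two one", "four six", "f*", "t*"];
// loop by numeric index; for...in on an array would also visit inherited enumerable properties
// print() is the Rhino/SpiderMonkey shell function; use console.log in a browser
for (var i = 0; i < searchTerms.length; i++)
{
print(searchTerms[i] + " => " + index.search(searchTerms[i]));
}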
output:
one => 1
one two => 1
two one => 1
four six => 2
f* => 2
t* => 1,2
Indexing
You can create as many indices as you like in your application. Each document must have a field labeled "id". The value of this field will be returned to you if the document matches your search. Any other fields you add will be searchable once the document is added to the index. Multi-field search is not currently supported; all fields (except for id) are treated identically.
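As a rough illustration of the rules above, here is a minimal sketch of how documents could be folded into a term-to-ids map, with every field except "id" tokenized the same way. The helpers `tokenize` and `buildIndex` are hypothetical names made up for this example; they are not part of the LADDERS.search API, and the real tokenization rules may differ.

```javascript
// Hypothetical helper: split a field value into lowercase word tokens.
function tokenize(value) {
  return String(value).toLowerCase().split(/[^a-z0-9]+/).filter(function (t) {
    return t.length > 0;
  });
}

// Hypothetical helper: build a term -> { docId: true } map from plain objects.
function buildIndex(docs) {
  var postings = Object.create(null);
  docs.forEach(function (doc) {
    if (!Object.prototype.hasOwnProperty.call(doc, "id")) {
      throw new Error("every document needs an id field");
    }
    Object.keys(doc).forEach(function (field) {
      if (field === "id") { return; } // the id is returned, never searched
      tokenize(doc[field]).forEach(function (term) {
        if (!postings[term]) { postings[term] = Object.create(null); }
        postings[term][doc.id] = true;
      });
    });
  });
  return postings;
}

var postings = buildIndex([
  { id: 1, text: "one two three" },
  { id: 2, text: "four five six", moretext: "ten" }
]);
console.log(Object.keys(postings["ten"])); // ids are stored as string keys
```

Because "text" and "moretext" feed the same map, a match cannot be traced back to the field it came from, which is why field-specific search is not possible with this layout.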
Searching
Only "AND" search is supported. That is, only documents that match all the words of your search will be returned, but the order in which those words appear is not important. Simple wildcarding (*) is supported.
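The AND semantics can be sketched as an intersection of per-word id sets. This is an illustration only, not the library's code: `samplePostings`, `idsForWord`, and `andSearch` are hypothetical names, and the sketch assumes the wildcard is a trailing `*` that matches any term with the given prefix.

```javascript
// Hypothetical postings map: term -> list of document ids (same data as the API example).
var samplePostings = {
  one: [1], two: [1], three: [1],
  four: [2], five: [2], six: [2], ten: [2]
};

// Ids matching a single word; a trailing "*" matches any term with that prefix.
function idsForWord(postings, word) {
  var found = {};
  var isPrefix = word.charAt(word.length - 1) === "*";
  var stem = isPrefix ? word.slice(0, -1) : word;
  Object.keys(postings).forEach(function (term) {
    var hit = isPrefix ? term.indexOf(stem) === 0 : term === stem;
    if (hit) {
      postings[term].forEach(function (id) { found[id] = true; });
    }
  });
  return found;
}

// AND search: keep only the ids present for every word in the query.
function andSearch(postings, query) {
  var words = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) { return []; }
  var result = idsForWord(postings, words[0]);
  words.slice(1).forEach(function (word) {
    var next = idsForWord(postings, word);
    Object.keys(result).forEach(function (id) {
      if (!next[id]) { delete result[id]; }
    });
  });
  return Object.keys(result).map(Number).sort(function (a, b) { return a - b; });
}

console.log(andSearch(samplePostings, "two one")); // [ 1 ]
console.log(andSearch(samplePostings, "t*"));      // [ 1, 2 ]
```

Word order does not matter because set intersection is commutative. Note that the wildcard branch scans every term in the map, so its cost grows with vocabulary size.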
Implementation
Despite the limitations in the current implementation (detailed below), we've found this search functionality to be surprisingly useful. Additionally, moving this logic directly to the front end greatly simplifies the backend work and the client-server communication, and reduces the number of requests that are necessary. We're releasing this as open source under the Apache License, available here.
It's a really simple inverted index implementation, and nothing more. Internally we've experimented a bit with more complex structures (a naive JavaScript trie implementation), and while this dramatically sped up wildcard searching, as one would expect, it also dramatically reduced the indexing speed due to the large number of additional JavaScript objects created. In fact, adding any additional information to the index (e.g. offsets, for doing phrase matching) slows indexing greatly because of the large number of JavaScript objects required to hold that information.
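To make the trade-off concrete, here is a small trie sketch. It is not the experimental implementation mentioned above, just a hypothetical `Trie` written for this example, with a `nodeCount` field added so the number of allocated nodes can be observed.

```javascript
// Hypothetical trie: one node object per character, which is where the extra allocations come from.
function Trie() {
  this.root = { children: {}, ids: [] };
  this.nodeCount = 1;
}

Trie.prototype.insert = function (term, id) {
  var node = this.root;
  for (var i = 0; i < term.length; i++) {
    var ch = term.charAt(i);
    if (!node.children[ch]) {
      node.children[ch] = { children: {}, ids: [] };
      this.nodeCount++;
    }
    node = node.children[ch];
  }
  if (node.ids.indexOf(id) === -1) { node.ids.push(id); }
};

// Collect the ids of every term that starts with the prefix.
Trie.prototype.idsWithPrefix = function (prefix) {
  var node = this.root;
  for (var i = 0; i < prefix.length; i++) {
    node = node.children[prefix.charAt(i)];
    if (!node) { return []; }
  }
  var seen = {};
  var stack = [node];
  while (stack.length > 0) {
    var current = stack.pop();
    current.ids.forEach(function (id) { seen[id] = true; });
    Object.keys(current.children).forEach(function (ch) {
      stack.push(current.children[ch]);
    });
  }
  return Object.keys(seen).map(Number).sort(function (a, b) { return a - b; });
};

var trie = new Trie();
["one", "two", "three"].forEach(function (t) { trie.insert(t, 1); });
["four", "five", "six", "ten"].forEach(function (t) { trie.insert(t, 2); });
console.log(trie.idsWithPrefix("t")); // [ 1, 2 ]
console.log(trie.nodeCount);          // 23 nodes for 7 terms
```

A prefix lookup only walks the matching branch instead of the whole vocabulary, but even this tiny data set needs 23 nodes (each with its own children map and id list) where a flat map needs 7 entries.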
Missing Features / TODOs
- Field-specific searching.
- Field weighting.
- Boolean support (i.e. "NOT", "OR").
- Phrase queries.
- More complex query expressions, building on above ("a AND b NOT (c OR d)").
Files / Contact
Our JavaScript search source code is available here under the Apache License.
Please report any issues through the javascript search project page.
You can also send any comments, corrections, updates, suggestions to js-search [at] theladders [dot] com.