
Introducing Javascript full-text search

Here at TheLadders.com, we're big fans of Lucene -- it's fast, flexible, and easy to use. There's also plenty of documentation and lots of helpful folks on the mailing lists.  We use it in our job search and resume search, as well as a few internal applications.

For our "MyJobs" project, we needed to display a list of jobs that a user had applied to or saved and, Ajax-style (without reloading the page), narrow that list down as they typed a search term.  Since we already had all this data on the client side for display, we thought, hey, can we do the searching there, too?  Pretty quickly we were able to prototype a simple client-side-only text index and search.  We ended up being able to reuse this code in a few different areas of the website.

API Use

creating an index:
// creating an index, adding documents
var index = new LADDERS.search.index();
var d = new LADDERS.search.document();
d.add("id", 1);
d.add("text", "one two three");
index.addDocument(d);
d = new LADDERS.search.document();
d.add("id", 2);
d.add("text", "four five six");
d.add("moretext", "ten");
index.addDocument(d);
// etc...
searching:
// searching
var searchTerms =
["one", "one two", "two one", "four six", "f*", "t*"];

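// use an indexed loop: for...in on an array also visits inherited enumerable properties
for (var i = 0; i < searchTerms.length; i++)
{
  print(searchTerms[i] + " => " + index.search(searchTerms[i]));
}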
output:

one => 1
one two => 1
two one => 1
four six => 2
f* => 2
t* => 1,2
Indexing

You can create as many indices as you like in your application.  Each document must have a field labeled "id".  The value of this field will be returned to you if the document matches your search.  Any other fields you add will be searchable once the document is added to the index.  Multi-field search is not currently supported; all fields (except for id) are treated identically.
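
To make the "all fields are treated identically" point concrete, here is a minimal sketch of how such an index could be built. It is not the LADDERS.search source: the function name buildIndex, the plain-object document shape, the whitespace tokenizer and the lowercasing are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the released library.
// Builds a map of term -> set of document ids from plain objects.
function buildIndex(docs) {
  var postings = Object.create(null); // no prototype, so any term is a safe key
  for (var i = 0; i < docs.length; i++) {
    var doc = docs[i];
    for (var field in doc) {
      if (!Object.prototype.hasOwnProperty.call(doc, field)) continue;
      if (field === "id") continue; // "id" is returned by search, never indexed
      // Assumption: split on whitespace and lowercase; every non-id field
      // lands in the same term space, so the field name is discarded.
      var terms = String(doc[field]).toLowerCase().split(/\s+/);
      for (var t = 0; t < terms.length; t++) {
        var term = terms[t];
        if (term === "") continue;
        if (!postings[term]) postings[term] = Object.create(null);
        postings[term][doc.id] = true;
      }
    }
  }
  return postings;
}

var postings = buildIndex([
  { id: 1, text: "one two three" },
  { id: 2, text: "four five six", moretext: "ten" }
]);
// Object.keys(postings["ten"]) is ["2"]: the term came from "moretext",
// but nothing in the index records that.
```

Because object keys are strings, the ids come back as "1" and "2" in this sketch; a real implementation would need to decide how to preserve the original id type.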

Searching

Only "AND" search is supported.  That is, only documents that match all the words of your search will be returned, but the order in which those words appear is not important.  Simple wildcarding (*) is supported.
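
The matching rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: andSearch and idsForWord are hypothetical names, the postings layout (term mapped to an array of numeric ids) is assumed, and the wildcard is assumed to be a trailing * that means "any term starting with this prefix".

```javascript
// Hypothetical sketch of AND search with a trailing-* wildcard.
var hasOwn = Object.prototype.hasOwnProperty;

// Collect the ids of every document matching a single query word.
function idsForWord(postings, word) {
  var ids = {};
  function addAll(list) {
    for (var i = 0; i < list.length; i++) ids[list[i]] = true;
  }
  if (word.charAt(word.length - 1) === "*") {
    // Wildcard: scan every term in the index for the prefix.
    var prefix = word.slice(0, -1);
    for (var term in postings) {
      if (hasOwn.call(postings, term) && term.indexOf(prefix) === 0) {
        addAll(postings[term]);
      }
    }
  } else if (hasOwn.call(postings, word)) {
    addAll(postings[word]); // exact term: a single lookup
  }
  return ids;
}

// Keep only the documents that match every word, in any order.
function andSearch(postings, query) {
  var words = query.toLowerCase().split(/\s+/);
  var result = null; // null until the first word has been processed
  for (var w = 0; w < words.length; w++) {
    if (words[w] === "") continue;
    var ids = idsForWord(postings, words[w]);
    if (result === null) {
      result = ids;
    } else {
      for (var id in result) {
        if (!ids[id]) delete result[id]; // intersect with this word's matches
      }
    }
  }
  if (result === null) return [];
  // Assumption: ids are numeric, so convert the object keys back.
  return Object.keys(result).map(Number).sort(function (a, b) { return a - b; });
}

var postings = {
  one: [1], two: [1], three: [1],
  four: [2], five: [2], six: [2], ten: [2]
};
andSearch(postings, "two one");  // [1]
andSearch(postings, "four six"); // [2]
andSearch(postings, "t*");       // [1, 2]
andSearch(postings, "one six");  // []
```

Note that the wildcard branch walks every term in the index, which is why prefix queries are the slow case for a flat inverted index.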

Implementation

Despite the limitations of the current implementation (detailed below), we've found this search functionality surprisingly useful.  Moving this logic to the front end also simplifies the backend work and the client-server communication, and reduces the number of requests needed.
We're releasing this as open source under the Apache License, available here.

It's a really simple inverted index implementation, and nothing more.  Internally we've experimented a bit with more complex structures (a naive javascript trie implementation), and while this dramatically sped up wildcard searching as one would expect, it also dramatically reduced the indexing speed due to the large number of additional javascript objects created.  In fact, adding any additional information to the index (e.g. offset, for doing phrase matching) slows down indexing speed greatly because of the large number of javascript objects required to hold that information.
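
To illustrate the trade-off, here is a naive trie sketch. It is not the internal experiment described above, only an assumed shape for one: the Trie constructor, its insert and prefixSearch methods and the nodeCount field are hypothetical names, and ids are assumed to be numeric.

```javascript
// Hypothetical naive trie: one node object per character of every term.
function Trie() {
  this.root = { children: {}, ids: [] };
  this.nodeCount = 1; // counts nodes, to show the object overhead
}

Trie.prototype.insert = function (term, id) {
  var node = this.root;
  for (var i = 0; i < term.length; i++) {
    var ch = term.charAt(i);
    if (!Object.prototype.hasOwnProperty.call(node.children, ch)) {
      node.children[ch] = { children: {}, ids: [] };
      this.nodeCount++;
    }
    node = node.children[ch];
  }
  if (node.ids.indexOf(id) === -1) node.ids.push(id);
};

// Walk down to the prefix node, then gather ids from its whole subtree.
Trie.prototype.prefixSearch = function (prefix) {
  var node = this.root;
  for (var i = 0; i < prefix.length; i++) {
    var ch = prefix.charAt(i);
    if (!Object.prototype.hasOwnProperty.call(node.children, ch)) return [];
    node = node.children[ch];
  }
  var found = {};
  var stack = [node];
  while (stack.length > 0) {
    var current = stack.pop();
    for (var j = 0; j < current.ids.length; j++) found[current.ids[j]] = true;
    for (var c in current.children) {
      if (Object.prototype.hasOwnProperty.call(current.children, c)) {
        stack.push(current.children[c]);
      }
    }
  }
  return Object.keys(found).map(Number).sort(function (a, b) { return a - b; });
};

var trie = new Trie();
var terms = [["one", 1], ["two", 1], ["three", 1],
             ["four", 2], ["five", 2], ["six", 2], ["ten", 2]];
for (var k = 0; k < terms.length; k++) trie.insert(terms[k][0], terms[k][1]);

trie.prefixSearch("t"); // [1, 2]
trie.prefixSearch("f"); // [2]
trie.nodeCount;         // 23 nodes for only 7 terms
```

A prefix query only touches the subtree under the prefix rather than every term, but in this sketch each node also carries its own children object and ids array, so seven short terms already cost dozens of allocations at indexing time.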

Missing Features / TODOs

  • Field-specific searching.
  • Field weighting.
  • Boolean support (i.e. "NOT", "OR").
  • Phrase queries.
  • More complex query expressions, building on above ("a AND b NOT (c OR d)").

Files / Contact

Our javascript search source code is available here under the Apache License.

Please report any issues through the javascript search project page.

You can also send any comments, corrections, updates, suggestions to js-search [at] theladders [dot] com.
