Tuesday, February 10, 2015

Lucene.NET (part 1): Permission filtering

I'm working on adding search to www.int64.io. Here is what I'm looking for in the search capability:
  • Full-text search that looks at the Title, Code, and Notes fields of a snippet
  • Has to work on Azure
  • Has to be fairly easy to set up and tweak
And in the future:
  • Will need to be able to search tags
  • Has to support more granular parameters for handling Advanced Search
I was debating between Azure Search and Lucene.NET, and picked Lucene because I didn't want to be locked into Azure - who knows, maybe I will move INT64 to AWS in the future? Plus, Lucene is a much more mature platform with a lot more information available.

So I followed this tutorial to set up Lucene: http://chriskirby.net/getting-full-text-search-up-and-running-in-azure/ and everything went pretty smoothly.

Permission filtering preparation


So I set up field indexing like this:

private static Document MakeDocumentFromSnippet(
                            SearchIndexSnippetModel snippet) {
    var doc = new Document();
    doc.Add(new Field(FIELD_ID,
        snippet.Id.ToString(),
        Field.Store.YES,
        Field.Index.NOT_ANALYZED,
        Field.TermVector.NO));
    doc.Add(new Field(FIELD_USER_ID,
        snippet.UserId,
        Field.Store.YES,
        Field.Index.NOT_ANALYZED,
        Field.TermVector.NO));
    doc.Add(new Field(FIELD_IS_PUBLIC,
        snippet.IsPublic.ToString(),
        Field.Store.YES,
        Field.Index.NOT_ANALYZED,
        Field.TermVector.NO));
    // indexing the Title, Text, and Notes below is omitted
    ....

    return doc;
}


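Note the Field.Index.NOT_ANALYZED specified for FIELD_USER_ID and FIELD_IS_PUBLIC. NOT_ANALYZED indexes the whole value as a single term without running it through the analyzer, so it can be matched exactly later. We will use these fields to filter out the data the current user is not allowed to see.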
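To see why NOT_ANALYZED matters here, this is a minimal sketch that indexes the same value both ways in an in-memory index and runs an exact TermQuery against it. The class name, the "IsPublic" field name, and the RAMDirectory setup are my assumptions for illustration, not the INT64 code:

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class NotAnalyzedDemo
{
    // Indexes true.ToString() ("True") with the given option, then counts
    // how many documents an exact TermQuery for "True" finds.
    public static int CountExactMatches(Field.Index indexOption)
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        using (var writer = new IndexWriter(directory, analyzer, true,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            // "IsPublic" is a stand-in for the FIELD_IS_PUBLIC constant.
            doc.Add(new Field("IsPublic", true.ToString(),
                              Field.Store.YES, indexOption));
            writer.AddDocument(doc);
        }

        using (var searcher = new IndexSearcher(directory, true))
        {
            var query = new TermQuery(new Term("IsPublic", true.ToString()));
            return searcher.Search(query, 10).TotalHits;
        }
    }

    public static void Main()
    {
        Console.WriteLine("NOT_ANALYZED: " + CountExactMatches(Field.Index.NOT_ANALYZED));
        Console.WriteLine("ANALYZED:     " + CountExactMatches(Field.Index.ANALYZED));
    }
}
```

With NOT_ANALYZED the term is stored as "True" and the query finds it. With ANALYZED, StandardAnalyzer lowercases the value to "true", so the exact query for "True" finds nothing. A TermQuery is not passed through an analyzer, so the indexed term and the query term have to match character for character.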

Indexing and Updating


So I added some code that loops through all the existing snippets, and indexes each one like this:

public void AddSnippetToIndex(SearchIndexSnippetModel snippet) {
    using (var writer = MakeIndexWriter()) {
        var doc = MakeDocumentFromSnippet(snippet);
        writer.AddDocument(doc);
    }
}

And every time a snippet is updated, I update the index like this:

public void UpdateSnippetInIndex(SearchIndexSnippetModel snippet) {
    using (var writer = MakeIndexWriter()) {
        var doc = MakeDocumentFromSnippet(snippet);
        writer.UpdateDocument(new Term(FIELD_ID, snippet.Id.ToString()), doc);
    }
}

To make this work correctly, you have to tell the writer which document to update. Lucene doesn't support updating a document in place, so UpdateDocument deletes all documents that match the term you provide and then adds the new document doc. At first it wasn't finding the document to update because I was using Field.Index.NO for FIELD_ID during indexing, which leaves the field out of the index, so the term never matched anything. Switching to Field.Index.NOT_ANALYZED fixed the problem.
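The delete-then-add behavior can be reproduced with a small sketch. It adds a document, "updates" it, and counts what is left in the index, once with the id field indexed and once with Field.Index.NO. The class, the helper, and the "Id" and "Title" field names are assumptions for illustration:

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class UpdateDocumentDemo
{
    // Hypothetical helper: a two-field document whose id field is indexed
    // according to idIndexing.
    private static Document MakeDoc(string id, string title, Field.Index idIndexing)
    {
        var doc = new Document();
        doc.Add(new Field("Id", id, Field.Store.YES, idIndexing));
        doc.Add(new Field("Title", title, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }

    // Adds one document, updates it by id, and returns the number of
    // live documents in the index afterwards.
    public static int CountAfterUpdate(Field.Index idIndexing)
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        using (var writer = new IndexWriter(directory, analyzer, true,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.AddDocument(MakeDoc("42", "old title", idIndexing));
            // Deletes every document matching Id:42, then adds the new one.
            writer.UpdateDocument(new Term("Id", "42"),
                                  MakeDoc("42", "new title", idIndexing));
        }

        using (var searcher = new IndexSearcher(directory, true))
        {
            return searcher.Search(new MatchAllDocsQuery(), 10).TotalHits;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Id NOT_ANALYZED: " + CountAfterUpdate(Field.Index.NOT_ANALYZED));
        Console.WriteLine("Id NO:           " + CountAfterUpdate(Field.Index.NO));
    }
}
```

When the id is indexed, the old document is replaced and one document remains. With Field.Index.NO the delete step matches nothing, so the old and the new document both stay in the index, which is the duplicate problem described above.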

Searching and permission filtering


I have a single input for the user to type their search query. The results returned from the query need to be filtered to only show snippets the current user has access to. To achieve this, we need to add extra conditions to the query the user types. At INT64 a user has access to their own snippets and to any public snippets.

Here is an excerpt from the Search method where I prepare the queries:

// prepare the searcher and parser
var searcher = new IndexSearcher(m_azureDirectory);
var parser = new MultiFieldQueryParser(Version.LUCENE_30,
                                       textSearchFields,
                                       new StandardAnalyzer(Version.LUCENE_30));

// parse the user query
var userQuery = parser.Parse(query);

// keep only results that belong to the current user or
// that are public
var onlyThisUser = new TermQuery(new Term(FIELD_USER_ID,
                                          BizSession.CurrentState.UserId));
var onlyPublic = new TermQuery(new Term(FIELD_IS_PUBLIC, true.ToString()));
var onlyThisUserOrPublic = new BooleanQuery
                               {
                                   { onlyThisUser, Occur.SHOULD },
                                   { onlyPublic, Occur.SHOULD }
                               };

var finalQuery = new BooleanQuery
                     {
                         { onlyThisUserOrPublic, Occur.MUST },
                         { userQuery, Occur.MUST }
                     };

// do the search
var totalToRequest = (criteria.PageNumber + 1) * criteria.PageSize;
var results = searcher.Search(finalQuery, totalToRequest);

So I create additional term queries for matching the current user's id, and for matching public snippets. Then I add both of those term queries to the BooleanQuery onlyThisUserOrPublic with Occur.SHOULD. Because this query has no MUST clauses, a document only has to match at least one of the SHOULD clauses. This is like saying, "either the User Id matches the current user's id, or the snippet is public".

Then I add both my new permission query and the user query into another BooleanQuery, telling it this time that both conditions MUST occur. This gives us a final query of (matches user input AND (snippet's user id == current user id OR snippet is public)).
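Here is a self-contained sketch of the same query shape run against a tiny in-memory index, so the effect of the permission clauses is easy to check. The snippet data, the user ids, the field names, and the use of a plain TermQuery in place of the parsed user query are all assumptions made for the example:

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class PermissionFilterDemo
{
    // Hypothetical helper mirroring the fields used for permission checks.
    private static Document MakeDoc(string id, string userId, bool isPublic, string title)
    {
        var doc = new Document();
        doc.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("UserId", userId, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("IsPublic", isPublic.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("Title", title, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }

    // Returns the sorted ids of snippets containing `word` that
    // `currentUserId` is allowed to see.
    public static List<string> SearchAs(string currentUserId, string word)
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        using (var writer = new IndexWriter(directory, analyzer, true,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.AddDocument(MakeDoc("1", "alice", false, "Lucene notes"));
            writer.AddDocument(MakeDoc("2", "alice", true, "Lucene paging"));
            writer.AddDocument(MakeDoc("3", "bob", false, "Lucene secrets"));
            writer.AddDocument(MakeDoc("4", "bob", true, "Lucene basics"));
        }

        // Stand-in for the parsed user query (the analyzer lowercases titles).
        var userQuery = new TermQuery(new Term("Title", word));

        // At least one of the two SHOULD clauses has to match.
        var ownOrPublic = new BooleanQuery();
        ownOrPublic.Add(new TermQuery(new Term("UserId", currentUserId)), Occur.SHOULD);
        ownOrPublic.Add(new TermQuery(new Term("IsPublic", true.ToString())), Occur.SHOULD);

        // Both the permission clause and the user query have to match.
        var finalQuery = new BooleanQuery();
        finalQuery.Add(ownOrPublic, Occur.MUST);
        finalQuery.Add(userQuery, Occur.MUST);

        var ids = new List<string>();
        using (var searcher = new IndexSearcher(directory, true))
        {
            foreach (var hit in searcher.Search(finalQuery, 10).ScoreDocs)
            {
                ids.Add(searcher.Doc(hit.Doc).Get("Id"));
            }
        }
        ids.Sort(StringComparer.Ordinal);
        return ids;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SearchAs("alice", "lucene")));
    }
}
```

Searching as "alice" returns snippets 1, 2, and 4: both of her own snippets plus bob's public one. Bob's private snippet 3 matches the text but fails the permission clause.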

And then we loop through the results (there is some paging code there) and make the resulting snippet list, which I omitted for brevity.
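For reference, one way the paging loop could look is sketched below. This is not the omitted INT64 code: the GetPage helper, the zero-based page number (suggested by the PageNumber + 1 above), and the test index of 25 documents are assumptions:

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class PagingDemo
{
    // Requests enough hits to cover the wanted page, then reads only
    // the slice that belongs to that page. pageNumber is zero-based.
    public static List<string> GetPage(IndexSearcher searcher, Query query,
                                       int pageNumber, int pageSize)
    {
        var totalToRequest = (pageNumber + 1) * pageSize;
        var results = searcher.Search(query, totalToRequest);

        var first = pageNumber * pageSize;
        var last = Math.Min(results.ScoreDocs.Length, first + pageSize);

        var ids = new List<string>();
        for (var i = first; i < last; i++)
        {
            var doc = searcher.Doc(results.ScoreDocs[i].Doc);
            ids.Add(doc.Get("Id"));
        }
        return ids;
    }

    // Builds a throwaway index with 25 documents and fetches one page of it.
    public static List<string> Demo(int pageNumber, int pageSize)
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        using (var writer = new IndexWriter(directory, analyzer, true,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            for (var i = 0; i < 25; i++)
            {
                var doc = new Document();
                doc.Add(new Field("Id", i.ToString(), Field.Store.YES,
                                  Field.Index.NOT_ANALYZED));
                writer.AddDocument(doc);
            }
        }

        using (var searcher = new IndexSearcher(directory, true))
        {
            return GetPage(searcher, new MatchAllDocsQuery(), pageNumber, pageSize);
        }
    }

    public static void Main()
    {
        Console.WriteLine("Items on page 2: " + Demo(2, 10).Count);
    }
}
```

With 25 documents and a page size of 10, pages 0 and 1 are full, page 2 holds the remaining 5, and any later page comes back empty. The cost of this approach grows with the page number, since every request re-collects all the hits before the wanted page.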

To Be Continued...


The tags are not included in the search at this time. Once tag search is done, I will write part 2 of the post concentrating just on that.

Please leave thoughts and comments below.
