Monday, 7 January 2013

Simply search for multiple patterns using Linq

In this post I would like to describe a simple method for finding text patterns. The method must return the set of articles whose corresponding translation record contains the searched pattern in its description field. The most naive solution is to treat the token as a single word and check whether it is contained in the translation text:
// string token: searched text
// translations: translation records filtered
//     for the current language
IEnumerable<Article> articles =
            from t in translations.ToList()
            where t.Description.Contains(token)
            join a in db.Article
                on t.Code equals a.ArticleCode
            select a;
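The naive query can be tried without a database. The following is a minimal sketch, assuming tuples stand in for the translation records and for the `db.Article` table; the `Description` field name, the sample rows and the `NaiveSearch` helper are hypothetical, and the match is made case-insensitive for the demo. It assumes .NET 5 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the translations (already filtered
// for the current language) and for the db.Article table.
var translations = new List<(string Code, string Description)>
{
    ("A1", "Red cotton shirt"),
    ("A2", "Blue wool sweater"),
    ("A3", "Shirt, red, long sleeves"),
};
var articles = new List<(string ArticleCode, string Name)>
{
    ("A1", "Shirt 1"),
    ("A2", "Sweater"),
    ("A3", "Shirt 2"),
};

// Naive search: the whole token must appear verbatim (ignoring case)
// somewhere in the description.
List<string> NaiveSearch(string token) =>
    (from t in translations
     where t.Description.Contains(token, StringComparison.OrdinalIgnoreCase)
     join a in articles on t.Code equals a.ArticleCode
     select a.Name).ToList();

List<string> single = NaiveSearch("shirt");     // "Shirt 1", "Shirt 2"
List<string> multi = NaiveSearch("red shirt");  // empty: no description contains "red shirt"

Console.WriteLine(string.Join(", ", single));
Console.WriteLine(multi.Count);
```

Searching for "shirt" finds both shirts, but "red shirt" finds nothing, because neither description contains those two words in that exact order.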
This approach is very simple, but it is not good enough, because I would like users to be able to search with multiple words. I realized that I could treat both the translations and the search pattern as sets of strings and look for the elements they have in common. After reading about the LINQ Intersect method, I decided to treat each translation as a set of words rather than as a whole text, and to turn the search pattern into a set of words by splitting it on the space character. I wrote a method like this:
// translations: translation records filtered
//     for the current language
string[] tokens = token.Split(' ');
IEnumerable<Article> articles =
            from t in translations.ToList()
            where t.Description.Split(' ')
                    .Intersect(tokens, new TokensComparer())
                    .Count() >= tokens.Length
            join a in db.Article
                on t.Code equals a.ArticleCode
            select a;
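The heart of the Intersect query, splitting both strings and counting the shared words, can be checked in isolation. This sketch is an assumption, not the post's code: it uses exact, case-insensitive word matching through the built-in `StringComparer.OrdinalIgnoreCase` instead of `TokensComparer`, and `MatchesAll` is a hypothetical helper. Because `Intersect` yields distinct elements, it compares the count with the number of distinct tokens, so a query such as "red red" is not rejected.

```csharp
using System;
using System.Linq;

// True when every search word occurs as a whole word in the text
// (AND semantics). Words are compared exactly, ignoring case.
bool MatchesAll(string text, string query)
{
    StringComparer comparer = StringComparer.OrdinalIgnoreCase;
    // RemoveEmptyEntries drops the empty strings produced by repeated spaces.
    string[] tokens = query.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    string[] words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // Intersect yields the distinct words of `words` that also occur in
    // `tokens`, so reaching the number of distinct tokens means all were found.
    int found = words.Intersect(tokens, comparer).Count();
    return found >= tokens.Distinct(comparer).Count();
}

bool orderIgnored = MatchesAll("Red cotton shirt", "shirt red");  // true
bool missingWord = MatchesAll("Red cotton shirt", "red wool");    // false: no "wool"
bool repeatedWord = MatchesAll("Red cotton shirt", "red red");    // true
bool pluralWord = MatchesAll("Red cotton shirts", "shirt");       // false: "shirts" != "shirt"

Console.WriteLine($"{orderIgnored} {missingWord} {repeatedWord} {pluralWord}");
```

With exact matching, "shirts" does not match "shirt"; that is the case the substring-based `TokensComparer` is meant to cover.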
Three notes about the Intersect-based query:
  • translations.ToList() is needed because the database query provider cannot translate the Split method into SQL: ToList() loads the translations into memory, so the rest of the query runs as LINQ to Objects.
  • Count() >= tokens.Length enforces AND semantics: only the translations containing all the tokens are kept. Alternatively, you can replace that line with .Count() > 0 (or .Any()) to get OR semantics and return every translation containing at least one of the searched tokens.
  • TokensComparer simply performs a case-insensitive comparison using the Contains method:
    class TokensComparer : IEqualityComparer<string>
    {
        ...

        public bool Equals(string token, string text)
        {
            return text.ToUpper().Contains(token.ToUpper());
        }

        ...
    }
    
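Since the body of `TokensComparer` is elided above, the following is only a sketch of how the comparer and the two filters might look. It assumes .NET 8 or later for `EqualityComparer<string>.Create`, which builds the comparer from two delegates instead of a separate class; `CountMatches`, `MatchesAll` and `MatchesAny` are hypothetical helpers. Its `GetHashCode` returns a constant, because `Intersect` only calls `Equals` for elements whose hash codes are equal, and a word such as "T-shirts" must be able to match the token "shirt".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Substring-based, case-insensitive comparer shaped like
// TokensComparer.Equals(token, text). GetHashCode returns a constant, so the
// hash set inside Intersect never skips a pair before calling Equals.
IEqualityComparer<string> tokensComparer = EqualityComparer<string>.Create(
    (token, text) => token is not null && text is not null
                     && text.ToUpperInvariant().Contains(token.ToUpperInvariant()),
    _ => 0);

// Number of description words matched by the search tokens.
int CountMatches(string description, string[] tokens) =>
    description.Split(' ').Intersect(tokens, tokensComparer).Count();

// AND semantics (the filter used in the post) and OR semantics (the alternative).
bool MatchesAll(string description, string[] tokens) =>
    CountMatches(description, tokens) >= tokens.Length;
bool MatchesAny(string description, string[] tokens) =>
    CountMatches(description, tokens) > 0;

string[] query = "red shirt".Split(' ');
bool allInTShirts = MatchesAll("Red cotton T-shirts", query);  // true: "Red" and "T-shirts"
bool allInSweater = MatchesAll("Red wool sweater", query);     // false: no word contains "shirt"
bool anyInSweater = MatchesAny("Red wool sweater", query);     // true: "Red" is enough

Console.WriteLine($"{allInTShirts} {allInSweater} {anyInSweater}");
```

Note that a substring-based `Equals` is not symmetric. It works here because current .NET implementations of `Intersect` pass the element of the second sequence (the token) as the first argument, which matches the `Equals(string token, string text)` signature above; as far as I can tell, this argument order is an implementation detail rather than a documented guarantee.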
The LINQ Intersect method builds a hash table from the second sequence, trading extra memory for faster lookups. I know my implementation is neither the most powerful nor the most efficient, but I considered it a good compromise given the customer's needs. Feel free to share your opinions about it.
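The hash table also explains why the comparer's `GetHashCode` matters. `Intersect` compares hash codes first and calls `Equals` only when they are equal, so pairing the substring test with an ordinary string hash loses substring matches, while a constant hash keeps them at the price of a linear scan over the tokens for every word. A minimal sketch of both cases, assuming .NET 8 or later for `EqualityComparer<string>.Create`; `ContainsToken` and the comparer names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Is the token contained in the word? Shared by both comparers below.
static bool ContainsToken(string? token, string? text) =>
    token is not null && text is not null
    && text.Contains(token, StringComparison.OrdinalIgnoreCase);

// Hash derived from the string: Equals is only reached when the hashes are
// equal, i.e. (barring collisions) for case-insensitively identical strings.
var hashedComparer = EqualityComparer<string>.Create(
    ContainsToken, s => StringComparer.OrdinalIgnoreCase.GetHashCode(s));

// Constant hash: every stored token is tested with Equals (a linear scan).
var constantComparer = EqualityComparer<string>.Create(ContainsToken, _ => 0);

string[] words = "Red cotton T-shirts".Split(' ');
string[] tokens = { "red", "shirt" };

int hashedCount = words.Intersect(tokens, hashedComparer).Count();
int constantCount = words.Intersect(tokens, constantComparer).Count();

Console.WriteLine($"{hashedCount} {constantCount}");
```

With the string-derived hash only "Red" is matched (its hash equals that of "red"), while the constant hash also lets "T-shirts" match "shirt". Since search queries usually contain only a few tokens, the linear scan is probably a minor cost.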
