In this post I would like to describe a simple method for finding text patterns.
The method must return the set of articles whose corresponding translation record contains the searched pattern in its description field.
The most naive solution is to interpret the token as a single word and check whether the translation's text contains it:
// string token: searched text
// IEnumerable translations: translations filtered
// for the current language
IEnumerable<Article> articles =
    from t in translations.ToList()
    // t.Description: assumed name of the description field
    where t.Description.Contains(token)
    join a in db.Article
        on t.Code equals a.ArticleCode
    select a;
This is very simple, but it's not good enough, because I would like the user to be able to search with multiple words.
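To make the limitation concrete, here is a tiny sketch with a made-up description string. NaiveMatch is a hypothetical helper (not part of the original code) that only mirrors the Contains check used in the query:

```csharp
using System;

public static class NaiveSearchDemo
{
    // Naive match: the whole search text must appear
    // as one contiguous substring of the description.
    public static bool NaiveMatch(string description, string token)
    {
        return description.Contains(token);
    }

    public static void Main()
    {
        string description = "red leather shoes"; // made-up sample data

        Console.WriteLine(NaiveMatch(description, "leather"));   // True
        Console.WriteLine(NaiveMatch(description, "red shoes")); // False
    }
}
```

Both words of the second search appear in the description, but not next to each other, so the substring check misses it.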
It occurred to me that I could treat translations and search patterns as sets of strings and look for the elements they have in common.
Then I read about the LINQ Intersect method and decided to treat each translation as a set of words rather than as a single text, and to turn the search pattern into a set of words by splitting it on the space character.
I wrote a method like this:
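// IEnumerable translations: translations filtered
// for the current language
// Distinct() keeps a repeated search word from being counted twice
string[] tokens = token.Split(' ')
    .Distinct(new TokensComparer())
    .ToArray();
IEnumerable<Article> articles =
    from t in translations.ToList()
    // t.Description: assumed name of the description field
    where t.Description.Split(' ')
        .Intersect(tokens, new TokensComparer())
        .Count() >= tokens.Length
    join a in db.Article
        on t.Code equals a.ArticleCode
    select a;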
Two notes about this code:
The LINQ Intersect method builds a hash-based set internally, which speeds up lookups at the cost of some extra memory.
I think my implementation is neither the most powerful nor the most efficient, but I considered it a good compromise given the customer's needs.
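The query above relies on a TokensComparer class that is not shown in this post. The following is only a minimal sketch of what such a comparer might look like, assuming a case-insensitive comparison is wanted. The MatchesAll helper and the sample strings are hypothetical and exist only to make the example runnable without a database:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: this version simply assumes that two words
// are equal when they differ only by letter case.
public sealed class TokensComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
    }

    // Intersect hashes its items, so words that compare equal
    // must also produce the same hash code.
    public int GetHashCode(string s)
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(s);
    }
}

public static class TokenSearch
{
    // Returns true when every distinct word of the search text
    // occurs among the words of the description.
    public static bool MatchesAll(string description, string searchText)
    {
        var comparer = new TokensComparer();
        char[] separators = { ' ' };

        string[] tokens = searchText
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Distinct(comparer)
            .ToArray();

        string[] words = description
            .Split(separators, StringSplitOptions.RemoveEmptyEntries);

        return words.Intersect(tokens, comparer).Count() == tokens.Length;
    }

    public static void Main()
    {
        Console.WriteLine(MatchesAll("Red leather shoes", "shoes red")); // True
        Console.WriteLine(MatchesAll("Red leather shoes", "red boots")); // False
    }
}
```

Word order and letter case do not matter in this sketch, and an empty search text matches every description, which may or may not be what you want.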
Please feel free to express your opinions about this.