m-chele

Monday, 7 January 2013

Simply search for multiple patterns using Linq

In this post I would like to describe a simple method for finding text patterns. The method must return the set of articles whose corresponding translation record contains the searched pattern in its description field. The most naive solution is to treat the token as a single string and check whether the translation text contains it:

// string token: searched text
// IEnumerable translations: translations filtered
//     for the current language
// Assumption: each translation exposes its text as Description
IEnumerable<Article> articles =
            from t in translations.ToList()
            where t.Description.Contains(token)
            join a in db.Article
                on t.Code equals a.ArticleCode
            select a;

This is very simple, but it is not good enough because I want the user to be able to search with multiple words. I realized that I could treat both translations and search patterns as sets of strings and find the elements they have in common. I then read about the Linq Intersect method and decided to treat each translation as a set of strings rather than as a whole text, and to turn the search pattern into a set of strings by splitting it on the space character. I wrote a method like this:

// IEnumerable translations: translations filtered
//     for the current language
// Assumption: each translation exposes its text as Description
string[] tokens = token.Split(' ');
IEnumerable<Article> articles =
            from t in translations.ToList()
            where t.Description.Split(' ')
                    .Intersect(tokens, new TokensComparer())
                    .Count() >= tokens.Length
            join a in db.Article
                on t.Code equals a.ArticleCode
            select a;

Two notes about this code:

translations.ToList() is needed because Split cannot be called otherwise, presumably because the Linq provider cannot translate it into a database query, so the translations have to be loaded into memory first.
Count() >= tokens.Length forces logical AND behavior, keeping only the translations that contain all the tokens. As an alternative, you can replace that line with .Count() > 0 to catch every translation that contains at least one of the searched tokens.

TokensComparer simply performs a case-insensitive comparison using the Contains method.

 class TokensComparer : IEqualityComparer<string>
    {
        ...

        public bool Equals(string token, string text)
        {
            return text.ToUpper().Contains(token.ToUpper());
        }
       
        ...
    }

The Linq Intersect method builds a hash table, trading extra memory for faster lookups. Because the lookup goes through the hash table, Equals is only called for strings whose GetHashCode values match. I think my implementation is neither the most powerful nor the most efficient, but I considered it a good compromise given the customer's needs. Please feel free to share your opinions about this.

Sunday, 2 December 2012

My late night studies: Git and GitHub

You need a GitHub account and a working Git installation on your Linux PC. Open a shell and execute:

git config --list

to see the current Git configuration. You will probably see a default name and email, or nothing at all (I started my experiment on top of an existing configuration).

Configure Git

Configure Git with these commands:

git config --global user.name "aStrangeName"
git config --global user.email "myemail@somedomain.boh"

The name identifies the user in commits. If you have a GitHub account, you should specify the email associated with it so that the commits you push there are linked to your account. Note that this is a global configuration: you can override it for a specific project by executing the same commands (omitting the --global option) from within that project's working directory. For example, to avoid spam you can define a fake user email on the GitHub website and use it in your working directory.
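As a minimal sketch of the per-project override described above, the commands below set a local identity in a throwaway repo. The temporary directory and the email address are hypothetical placeholders, not values from my setup.

```shell
# Create a scratch repo so the global configuration is left untouched
repo_dir="$(mktemp -d)"
cd "$repo_dir" || exit 1
git init --quiet

# Without --global the values are written to this repo's .git/config
git config user.name "aStrangeName"
git config user.email "fake-address@example.invalid"

# --local reads only the repo-level file, so this shows the override
local_email="$(git config --local user.email)"
echo "email used in this repo: $local_email"
```

Running git config --list inside the repo afterwards should show both the global values and the local ones, with the local ones taking precedence.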

Create a repo

Creating a repo in the current directory is pretty simple:

git init

Commit

To test commit and push, you first need to add some files to the list of those you want to commit:

git add <fileToAdd>

You may want to execute

git status

to see what happened in your working directory. In this case you will see <fileToAdd> listed as a staged file.

git commit -m "comment for commit"

This executes the commit and attaches the specified message to it. After this command, git status should report that no pending files remain.
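The whole add, status, commit sequence can be tried safely in a scratch directory. This is only a sketch: the file name notes.txt and the temporary directory are made up for the example, and the identity is set locally so that the commit works even without a global configuration.

```shell
# Scratch repo for the experiment
work_dir="$(mktemp -d)"
cd "$work_dir" || exit 1
git init --quiet
git config user.name "aStrangeName"
git config user.email "myemail@somedomain.boh"

# Stage one file and capture the short, script-friendly status
echo "hello" > notes.txt
git add notes.txt
staged="$(git status --porcelain)"     # "A  notes.txt" means: added to the index

# Commit, then check that nothing is left pending
git commit --quiet -m "comment for commit"
pending="$(git status --porcelain)"    # empty when the working directory is clean

echo "before commit: $staged"
echo "after commit: ${pending:-nothing pending}"
```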

Push

Now it's time to interact with your GitHub repo. First create a new repo: the git command cannot do this, so a GitHub repo can only be created remotely, either through the API or from the website. You can call the API with a command-line HTTP client such as curl, or, more easily, log in to your account and create the repo from the web interface. Then define a remote repo with:

git remote add origin https://github.com/<mygithubuser>/<mygithubrepo>.git

(you will find the correct URL on the GitHub page for the repo). By Git convention, the name origin is used for the remote that points to the main repo. Now you are ready to send your files to the remote repo:

git push origin master
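If you want to rehearse the remote add and push steps without touching GitHub, a local bare repository can stand in for the remote. This is an assumption of the sketch, not how GitHub works internally; the paths and the file name are hypothetical.

```shell
# A local bare repo plays the role of the GitHub repo
remote_dir="$(mktemp -d)/mygithubrepo.git"
git init --quiet --bare "$remote_dir"

# Working repo with one commit on a branch named master
work_dir="$(mktemp -d)"
cd "$work_dir" || exit 1
git init --quiet
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this post
git config user.name "aStrangeName"
git config user.email "myemail@somedomain.boh"
echo "hello" > notes.txt
git add notes.txt
git commit --quiet -m "comment for commit"

# Same two commands as above, pointing at the stand-in remote
git remote add origin "$remote_dir"
git push --quiet origin master

# Both sides should now reference the same commit
local_head="$(git rev-parse master)"
remote_head="$(git --git-dir="$remote_dir" rev-parse master)"
echo "local:  $local_head"
echo "remote: $remote_head"
```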

Note that GitHub requires your Git version to be equal to or greater than 1.7.10. Attempting to push with an older version will give you an HTTP 403 error. These are only my notes about Git; you will find more interesting information on the GitHub bootcamp page.
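A quick way to check the installed version against that minimum is sketched below. The helper version_ge is a name I made up for this example; it compares only the first three numeric components of each version.

```shell
# version_ge A B: succeeds when dotted version A is >= B
# (sorts the two versions numerically, field by field, and checks
# that B comes out first)
version_ge() {
    lowest="$(printf '%s\n%s\n' "$1" "$2" \
        | sort -t . -k 1,1n -k 2,2n -k 3,3n \
        | head -n 1)"
    [ "$lowest" = "$2" ]
}

# "git --version" prints something like "git version 1.7.10.4"
installed="$(git --version 2>/dev/null | awk '{print $3}')"

if version_ge "$installed" "1.7.10"; then
    echo "git $installed meets the 1.7.10 minimum"
else
    echo "git ${installed:-(not found)} is older than 1.7.10"
fi
```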

Sunday, 25 November 2012

My first AgileDay

Last Saturday I attended an AgileDay for the first time. The event, now in its ninth edition if I am not mistaken, needs no introduction: suffice it to say that the 600 available seats, plus the 100 on the waiting list, were gone within a few hours. I will not go into the details of the individual talks (the event website links to more interesting information and comments) but will simply describe my own experience.

In choosing which talks to follow, I tried to pick topics within reach of a "non-practicing agile newcomer", and I must say I managed to follow the speakers quite well, struggling only with the more "management-oriented" topics or when a talk went too deep into lean or scrum methods that I have never studied. On that note, I wonder whether future editions could indicate each talk's "difficulty level", or whether it is aimed more at managers or at developers, although I realize that it would be difficult, and perhaps inappropriate, to categorize some talks this way.

In any case, what struck me most was realizing that there really are people (in the flesh, not just between the lines of some blog) who work on software with passion, and companies that do not think solely and exclusively about profit. The best example of what I mean is in the words of Emanuele DelBono who, asked "Why did you decide to hire on permanent contracts right away rather than through other forms of contract?", answered "because we are idealists". In short, I saw many passionate and competent people, and I met companies that truly believe, and not just in words, that the growth of their business depends on the quality of the work more than on the quantity, and that see their collaborators as an asset to grow rather than a cost to contain...

An experience to repeat, definitely.

Wednesday, 17 October 2012

A repository for custom open-source code

Over the last year my company has chosen to use open-source e-commerce and CMS systems in order to offer solutions that are functionally complete, keeping customizations of the application code to a minimum. The introduction of these new systems has been very interesting for me because, beyond the learning effort, it made me face a series of code management problems that I had never tackled in depth:

keeping my code up to date with the code released by the community
easily managing the customizations for each customer.

The (obvious) answer was to use a version control system, specifically Subversion managed through the Tortoise client. The choice was driven by the fact that I work in a Windows environment and that the company already had an svn server available, although I would have preferred Git, which is considered easier to learn. Given that I am not an expert and make no claim to offer a solution or a "best practice", I decided to organize my repository (repo from here on) into two main branches (instead of the "classic" trunk, tags and branches) called "versioni_base" (base versions) and "personalizzazioni" (customizations): in the first I create a new branch for each new version released by the community, in the second I create one for each customization (in practice, one per customer).

 repo
  |
  ----personalizzazioni
  |     |
  |     ----clienteA
  |     |
  |     ----clienteB
  |
  ----versioni_base
       |
       ----ver1
       |
       ----ver2

The main operations I perform day to day are these:

I create a new customization as a copy of a base version (typically the most recent one) and modify it with the changes needed only for that specific customer.
I implement a new feature that will apply to all future releases: in this case I find it convenient to do a merge, either to bring the change into the customization (conceptually a branch) if it was developed on a base version, or to bring it back into the base version if it was made for one customer but I realized it could be useful to others too.
I download a new version released by the community and import it into my repo by creating a new branch in "versioni_base", then merge it with the most recent release of the previous base version, which contains all my changes.
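For readers who prefer the command line to the Tortoise client, the three operations roughly correspond to svn copy, svn merge and svn import. The sketch below is an illustration only: the repository URL, the function names and the ./download folder are hypothetical, and by default the commands are just printed rather than executed.

```shell
# Hypothetical repository URL; replace it with the real server address
REPO_URL="https://svn.example.invalid/repo"

# Print each command instead of running it; set DRY_RUN=0 to execute
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# 1. New customization as a server-side copy of a base version
new_customization() {    # usage: new_customization ver2 clienteA
    run svn copy "$REPO_URL/versioni_base/$1" \
        "$REPO_URL/personalizzazioni/$2" \
        -m "Create customization $2 from $1"
}

# 2. Bring changes made on a base version into a customization
#    (run from inside a working copy of that customization)
merge_base_into_customization() {    # usage: merge_base_into_customization ver2
    run svn merge "$REPO_URL/versioni_base/$1" .
}

# 3. Import a freshly downloaded community release as a new base version
import_release() {    # usage: import_release ./download ver3
    run svn import "$1" "$REPO_URL/versioni_base/$2" \
        -m "Import community release $2"
}

new_customization ver2 clienteA
merge_base_into_customization ver2
import_release ./download ver3
```

The merge in the third operation, which carries my changes over from the previous base version, is left out of the sketch because its exact form depends on how the new release was imported.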

Assessing this scheme, I realize that although it is fairly clear (well, of course it is clear to me, what do you think?), one could use a single branch for the base version in which head (the current version of the repo) always points to the latest version with the additions applied, and identify earlier versions by creating tags. The same reasoning could be applied to the customizations branch, but there I am fairly convinced that keeping the customer branches separate makes things clearer. In conclusion, this solution works well for me, at least for now; I will check how it holds up as the number of releases grows and will consider any suggestions on the matter.