Microarray Quality Control
As a bioinformatician at DNAVision, I have managed customer projects and taken part in several national and European research projects (for instance the Exera FP6 project), in which I was responsible for Affymetrix GeneChip processing and performed bioinformatic and biostatistical analyses.
More importantly, I have also been working on the quality control of expression data, which led me to develop the yaqcaffy Bioconductor package. In collaboration with Jean-François Laes, head of the Microarray Unit at DNAVision, I have also performed an in-depth re-analysis of the MAQC-I Affymetrix datasets to objectively and efficiently assess the inter-laboratory quality of microarray data, as outlined below.
Microarray quality performance has recently attracted much attention, notably through the Microarray Quality Control (MAQC) initiative. Such efforts are the first steps towards the successful and reliable application of microarray technology in clinical practice and regulatory decision-making, which will in turn give rise to multi-site projects with stricter quality requirements. We have used the recently published MAQC reference data sets for the Affymetrix platform to assess quality metrics in terms of variability between and within laboratories.
After two different normalizations (RMA and GCRMA), we performed intra- and inter-site analyses of the four reference RNA data sets using two independent procedures (parametric and non-parametric) to identify probe sets that show significant differences in expression. The 24 intra-site comparisons were used as controls to define the probe sets that can reliably be considered differentially expressed. Our results show a good correlation for the intra-site analysis, in agreement with the results reported by the MAQC. We then used this reference list to evaluate how efficiently inter-site comparisons recover the correct results. Surprisingly, we observed notable discrepancies for some inter-site comparisons, even though nearly all the data passed the Affymetrix quality requirements. Finally, we correlated our results with metrics that are widely used to assess the quality of the raw data. This approach gives us reliable insights into how to objectively assess the importance of various quality metrics and how to establish reliable thresholds in large inter-site projects.
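The logic of using intra-site comparisons as controls can be sketched as set operations on lists of differentially expressed (DE) probe sets. In this minimal Python sketch, a probe set enters the reference list when it is called DE in at least a given fraction of the intra-site comparisons, and each inter-site comparison is then scored by how much of that list it recovers. The consensus rule, the function names and the two metrics are assumptions made for illustration, not the exact criteria used in the study.

```python
from collections import Counter


def reference_de_set(intra_site_calls, min_fraction=1.0):
    """Probe sets called DE in at least `min_fraction` of the intra-site
    comparisons (a hypothetical consensus rule)."""
    n = len(intra_site_calls)
    if n == 0:
        raise ValueError("need at least one intra-site comparison")
    # count each probe set at most once per comparison
    counts = Counter(p for calls in intra_site_calls for p in set(calls))
    return {p for p, c in counts.items() if c >= min_fraction * n}


def recovery_metrics(inter_site_calls, reference):
    """Score one inter-site comparison against the reference DE list."""
    called = set(inter_site_calls)
    reference = set(reference)
    hits = called & reference
    return {
        # fraction of the reference list recovered by this comparison
        "sensitivity": len(hits) / len(reference) if reference else float("nan"),
        # fraction of this comparison's calls that are in the reference list
        "precision": len(hits) / len(called) if called else float("nan"),
        "missed": sorted(reference - called),
        "extra": sorted(called - reference),
    }
```

In practice one would run this once per normalization (RMA, GCRMA) and per test (parametric, non-parametric), and the per-comparison sensitivities could then be rank-correlated with raw-data quality metrics.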
During my stay at the Free University of Brussels (ULB), I was involved in several projects in evolutionary biology. At the micro-evolutionary level, I worked on the phylogeography of the leaf beetle Gonioctena variabilis. During my PhD thesis, I also worked at higher taxonomic levels, comparing different phylogenetic markers and using cetaceans as a model taxonomic group. Finally, I have tackled some more theoretical issues in collaboration with Daniele Catanzaro.
Comparison of different phylogenetic markers
As part of my PhD work, Michel Milinkovitch and I examined the evolution of cetaceans based on the sequences of mitochondrial genomes, the insertion pattern of SINEs (short interspersed elements), and the DNA sequences of the newly isolated nuclear loci carrying these SINEs. Cetaceans were chosen because of their evolutionary history, which includes adaptive radiation events. Such radiations increase the probability of differential lineage sorting: if genes or alleles remain polymorphic between successive speciation events, it is possible, even likely, that the histories of different markers disagree with one another even though each marker's own history is correctly reconstructed. We tackled the study of differential lineage sorting using SINEs, because their random and irreversible insertion greatly reduces the risk of convergence.
Our multi-marker approach allowed us to reconstruct a robust tree, against which we analyzed the different markers in terms of their signal/noise ratio (the quality of the marker's information content) and their effort/signal ratio (the effort required to obtain phylogenetic signal). We also discussed the conflicting or incorrect phylogenetic relationships obtained with different markers, and described a test that uses SINEs to distinguish differential lineage sorting from convergence.
More details can be found here (in French only).
Applicability of the GTR Nucleotide Substitution Model
Nucleotide substitution models are the basis of many methods of phylogenetic inference. Among them, the General Time Reversible (GTR) model is one of the most general and most widely used. Waddell and Steel described a procedure to estimate distances and instantaneous substitution rates for sequences evolving under the assumptions of the GTR model. However, under some conditions this procedure, and hence the GTR model, cannot be applied.
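To make the applicability question concrete, the classic GTR distance between two aligned sequences can be written d = -tr(Π log(Π⁻¹F)), where F is the symmetrized matrix of joint base frequencies at aligned sites and Π is the diagonal matrix of base frequencies. The minimal Python/NumPy sketch below computes it and reports why it fails when it does: a missing base (Π singular), a non-positive eigenvalue of Π⁻¹F (no real matrix logarithm), or a negative off-diagonal rate estimate. It ignores the rate variation across sites that Waddell and Steel also treat, so it is a simplified illustration, not their exact procedure; the function names and the tolerance are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def divergence_matrix(seq1, seq2):
    """Symmetrized matrix F of joint base frequencies at aligned sites.

    Sites with gaps or ambiguity codes in either sequence are skipped.
    """
    idx = {b: i for i, b in enumerate(BASES)}
    counts = np.zeros((4, 4))
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in idx and b in idx:
            counts[idx[a], idx[b]] += 1
    if counts.sum() == 0:
        raise ValueError("no comparable sites")
    f = counts / counts.sum()
    return (f + f.T) / 2  # time reversibility: F is expected to be symmetric


def gtr_distance(seq1, seq2, tol=1e-12):
    """Return (distance, None) if the GTR distance is defined, else (None, reason)."""
    f = divergence_matrix(seq1, seq2)
    pi = f.sum(axis=1)
    if np.any(pi <= tol):
        return None, "a base is absent, so Pi is singular"
    # Pi^-1/2 F Pi^-1/2 is symmetric and similar to Pi^-1 F, so eigh applies
    inv_sqrt = 1 / np.sqrt(pi)
    sym = inv_sqrt[:, None] * f * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(sym)
    if np.any(eigvals <= tol):
        return None, "non-positive eigenvalue, so no real matrix logarithm"
    log_sym = eigvecs @ np.diag(np.log(eigvals)) @ eigvecs.T
    # log(Pi^-1 F) = Pi^-1/2 log(sym) Pi^1/2, an estimate of Q * t
    log_m = inv_sqrt[:, None] * log_sym * np.sqrt(pi)[None, :]
    off_diag = log_m[~np.eye(4, dtype=bool)]
    if np.any(off_diag < -tol):
        return None, "negative off-diagonal rate estimate"
    return -np.trace(np.diag(pi) @ log_m), None
```

For example, two sequences that swap A with C and G with T at every site give a Π⁻¹F with eigenvalue -1, so `gtr_distance` returns `None` even though such sequences are trivially alignable.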
We simulated the evolution of DNA sequences along 12 trees under a set of biologically plausible conditions (different branch lengths, homogeneous or non-homogeneous matrices of instantaneous substitution rates, and different sequence lengths). For each set of conditions, we evaluated (i) the applicability of the GTR model and (ii) the quality of the alignments obtained from the simulated data.
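The kind of simulation described above can be illustrated with a minimal Python/NumPy sketch that evolves a sequence along a single branch under a homogeneous GTR model, using the transition matrix P(t) = exp(Qt). The exchangeability ordering, the normalization to one expected substitution per site per time unit, and all function names are assumptions. It is not the simulation software used in the study: it has no indels (so it does not exercise the alignment step), a full tree would apply `evolve` recursively from the root, and non-homogeneous conditions would use a different Q on each branch.

```python
import numpy as np

BASES = "ACGT"


def gtr_rate_matrix(exchangeabilities, pi):
    """Normalized GTR rate matrix from six exchangeabilities (AC, AG, AT, CG, CT, GT)."""
    pi = np.asarray(pi, dtype=float)
    r = np.zeros((4, 4))
    r[np.triu_indices(4, k=1)] = exchangeabilities
    r = r + r.T
    q = r * pi[None, :]  # q_ij = r_ij * pi_j for i != j
    np.fill_diagonal(q, -q.sum(axis=1))
    # scale so that one time unit = one expected substitution per site
    return q / -np.dot(pi, np.diag(q))


def transition_matrix(q, pi, t):
    """P(t) = expm(Q t), computed via the symmetric matrix Pi^1/2 Q Pi^-1/2."""
    sqrt_pi = np.sqrt(np.asarray(pi, dtype=float))
    s = sqrt_pi[:, None] * q / sqrt_pi[None, :]
    w, v = np.linalg.eigh(s)
    exp_s = v @ np.diag(np.exp(w * t)) @ v.T
    return exp_s / sqrt_pi[:, None] * sqrt_pi[None, :]


def random_root(length, pi, rng):
    """Root sequence drawn from the stationary base frequencies."""
    return "".join(BASES[i] for i in rng.choice(4, size=length, p=pi))


def evolve(seq, q, pi, t, rng):
    """Evolve a sequence along one branch of length t (substitutions only)."""
    # clip tiny negative round-off values and renormalize each row
    p = np.clip(transition_matrix(q, pi, t), 0.0, None)
    p /= p.sum(axis=1, keepdims=True)
    idx = {b: i for i, b in enumerate(BASES)}
    return "".join(BASES[rng.choice(4, p=p[idx[b]])] for b in seq)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    pi = [0.3, 0.2, 0.2, 0.3]
    q = gtr_rate_matrix([1.0, 4.0, 1.0, 1.0, 4.0, 1.0], pi)
    root = random_root(1000, pi, rng)
    child = evolve(root, q, pi, 0.5, rng)
    print(sum(a != b for a, b in zip(root, child)) / len(root))
```

Computing P(t) through the symmetric form keeps the eigendecomposition real and numerically stable, which is possible only because the GTR model is time-reversible.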
Our results indicate that the inapplicability of the Waddell and Steel procedure is indeed a practical problem, as it arises before alignment becomes difficult (alignment being a necessary step prior to any phylogenetic inference). The probability of inapplicability depends on the substitution rate and on the data size.
More details can be found in the publication.
Phylogeography of Gonioctena variabilis
In collaboration with Jacques M. Pasteels and Patrick Mardulyn, I published a paper reporting a case in which a survey of morphological and mitochondrial DNA variation among populations of a leaf beetle species complex, the Gonioctena variabilis complex, led to the identification of a hybrid zone between two species of the complex in Southern Spain.
The complex comprises four species distributed around the western Mediterranean region: G. variabilis, Gonioctena aegrota, Gonioctena gobanzi and Gonioctena pseudogobanzi, which are traditionally distinguished by differences in the morphology of the male genitalia (aedeagus). To gain insight into the history of speciation within this complex, we sampled populations in Portugal, Spain, Southern France and Northern Italy, and sequenced a portion of the mitochondrial control region of each individual collected.
A haplotype network of these sequences was found to comprise four distinct groups of sequence types, separated by relatively large numbers of mutations. Moreover, in most of the samples for which both morphological and molecular data are available, there is a one-to-one correspondence between haplotype groups, defined by mitochondrial sequence variation, and morphological groups, defined on the basis of the aedeagus, providing evidence for four historically independent evolutionary units.
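Haplotype networks are normally built with dedicated methods (for example statistical parsimony), so the Python sketch below is only a simplified stand-in for the idea of groups separated by many mutations. It assumes aligned haplotypes of equal length, counts pairwise differences (a gap counts as a difference), and joins haplotypes into the same group by single linkage whenever they are at most `max_steps` mutations apart. The threshold, function names and toy haplotypes are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def haplotype_groups(haplotypes, max_steps):
    """Single-linkage groups of haplotypes at most `max_steps` mutations apart.

    `haplotypes` maps haplotype names to aligned sequences.
    """
    names = list(haplotypes)
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if hamming(haplotypes[a], haplotypes[b]) <= max_steps:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

For instance, with `max_steps=1`, the toy haplotypes `AAAAAA`, `AAAAAT`, `TTTTTA` and `TTTTTT` fall into two groups, because the two pairs differ by one mutation internally but by six between them.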
This supports the use of aedeagus morphology as a taxonomically informative trait in this species complex, as well as a recent taxonomic revision that raised four former subspecies, corresponding to the evolutionary units identified in the present study, to species status. However, some individuals from our samples in Southern Spain, morphologically identified as G. aegrota, were found to carry mitochondrial sequences typical of G. pseudogobanzi, and the reverse was also observed. This suggests the presence of a zone of contact and hybridization between G. aegrota and G. pseudogobanzi. The location of this hybrid zone appears to be unusual. We identify historical scenarios that may explain our observations.
More details can be found in Gatto et al., Biological Journal of the Linnean Society, 2008, 94, 105–114.