Bonjour a tous

I am associate professor at University of Angers, in applied mathematics (CNU 26). I am teaching statistics and applied maths at the faculty of Science, to students with a major in biology or chemistry, and also to students pursuing the Master Data Science.

Briefly speaking, in research, I am interested in stochastic processes arising in statistical genetics (e.g. Gaussian and Chi-square processes) and in evolutionary biology (e.g. birth and death process, coalescent process ...). I also study asymptotic statistics, mixture models, Bayesian statistics and high dimensional data analysis.

The applications are related to gene mapping, genomic prediction, genomic selection, phylogenetic network inference, reconciliation between gene trees and species tree, pangenomic and whole genome duplication detection.

Currently, in Angers, I am working with Kilian Raschel (LAREMA) on phylogenetic network, funded by the PanphyloNet project. I am also mentoring Naïa Perinelle, PhD student, with Claudine Landès (IRHS) and Fabien Panloup (LAREMA) on machine learning for polyploid genomes. I am also involved in the project "DevOCGen" supported by "region Occitanie".

My email address

ce.rabieratgmail.com charles-elie.rabieratumontpellier.fr charles-elie.rabieratuniv-angers.fr

Research interests

Asymptotic Statistics / Mixture models / Statistical Genetics / Gaussian Processes / Chi-Square Processes / Computational Biology / Bayesian statistics / Phylogenetic networks / Random Trees / Reconciliation between Gene Trees and Species Trees / High-dimensional data analysis

Member

My postdoc on Bayesian statistics in evolutionary biology / phylogenetic network inference (University of Montpellier)

I worked at IMAG and at ISEM, funded by the Key Initiative Muse Data Science (KIM) and by the Genome Harvest project. KIM supported my project "Statistics for pangenomic data", whereas Genome Harvest supported my project on methodological aspects of phylogenetic networks inference. The application field (collaboration with CIRAD) was the evolutionary history of rice and also the genome mosaic structure of citrus, banana, cocoa, coffee ... We also organized the workshop "Machine learning for life sciences" in Montpellier. During my postdocs, I worked in collaboration with Celine Scornavacca, Fabio Pardi, Vincent Berry (computer scientists), Gilles Didier (applied mathematician) and Jean-Christophe Glaszmann (biologist).

In phylogenetics, species tree inference has been studied extensively for many years, and the theory behind it is relatively well known. However, a species tree is unable to model complicated biological events such as horizontal gene transfer (e.g. procaryotes but also among eucaryotes), hybridization (plants and animals, e.g. yeast, wheat, fish), introgression (e.g. rice, citrus, sea bass) and recombination. In contrast, phylogenetic networks, that differ from species tree because of reticulate edges, are able to capture all those phenomenoms. Phylogenetic networks can help providing answers to important questions raised by biologists. For instance, we can infer the domestication process of cultivated rice by comparing different scenarios associated to different networks. Then, it becomes possible for geneticists to reintroduce some wild diversity into cultivars. According to the International Rice Research Institute, improving rice varieties may help in reducing poverty, in dealing with climate change, in protecting the environment, in making rice healthier, and in increasing food security. I am interested in elaborating a new Bayesian method for inferring phylogenetic network. This method relies on a complicated mathematical model, handling Incomplete lineage sorting (ILS) (an important source of discordance between gene trees), a mutation model (a markov process evolving along gene tree branches) ... Our goal is to be able to infer the rice domestication process, with the help of our new sophisticated method. See the webpages dedicated to our simulator SimSnappNet and to the new method SnappNet .

My postdoc on High dimensional data analysis / Genomic prediction and Genomic Selection (INRA Toulouse)

I was a member of the Applied Mathematics and Computer Science Unit at INRA in Toulouse (MIAT). I worked with Brigitte Mangin (statistician), and Simona Grusea (probabilist, INSA Toulouse), on high dimensional statistics and in particular on Ridge regression and its prediction when the number of SNPs (i.e. regressors) is larger than the number of individuals. The application was genomic selection. We collaborated with a few geneticists interested in different species : Philippe Barre (Raygrass, Lusignan), Gilles Charmet (Wheat, Clermont Ferrand), Jacques David and Muriel Tavaud (Wheat, SupAgro Montpellier). Genomic selection is focused on prediction of breeding values of selection candidates by means of high density of markers. It relies on the assumption that all the genes tend to be in strong linkage disequilibrium (LD) (ie highly correlated) with at least one marker. Our main goal was to find out the factors responsible for a good prediction, in a high dimensional setting. The R code for the Plos One paper is available here . See my articles either applied, or theoretical (perfect LD and imperfect LD). Many questions remain opened for statisticians in Genomic Selection.

My postdoc on stochastic processes in evolutionary biology / Phylogenetics
(university of Madison, Wisconsin, USA)

I was a member of the stat/math phylogenetic group at the university of Madison. This interdisciplinary group is composed of biologists, statisticians and mathematicians : David Baum, Cécile Ané, Bret Larget, Sebastien Roch and students. The main goal was to propose new statistical methods to reconstruct evolutionary trees. Since more and more data from the genome are available, it becomes more and more interesting to study evolution events responsible for the diversity of genomes. Most of the methods involved in phylogenetics are Bayesian, since we have to deal with a huge parameter space. We analyzed thousand of gene families and try to infer the phylogenetic tree for each of these families. Since this work requires a huge amount of computer resources, we work in collaboration with Bill Taylor in order to use the technology offered by HTCondor, the product of years of research by the Center for High Throughput Computing in the Department of Computer Sciences at UW Madison. HTCondor is used by Dreamworks in order to make movies and was also used in the recent discovery of the Higgs boson, a new particle in physics. We were particularly interested in evolutionary events called Whole Genome Duplications (WGD), i.e. an event which creates an organism with additional copies of the entire genome of a species. WGD can be found in bananas, tomatoes, flowering plants ... Download our softwares for reconstructing gene trees and detecting WGD !

My PhD on stochastic processes in statistical genetics / QTL mapping (university of Toulouse)

I completed my PhD at the Laboratory of Statistics and Probability (LSP) of Paul Sabatier's university, Toulouse. My supervisors were Jean-Marc Azais (LSP) and Jean-Michel Elsen (INRA). The research area of my PhD in mathematical statistics was on devising statistical methods for Quantitative Trait Locus detection. I worked on selective genotyping experiments and also on the empirical process defined by the famous Interval mapping of Lander and Botstein.

First, QTL means Quantitative Trait Locus. It is any locus responsible for the variation of a quantitative trait. On this picture, you can see a few plamtrees which come from Lanzarote (Canary Islands). We may wonder why the first palmtree is largely taller than the others. There is maybe a QTL responsible for this particularity. In order to locate the QTL, people scan the genome of the palmtrees and do a statistical test at each position. That is the principle of interval mapping (Lander and Botstein 89). Since a lot of tests are performed on the genome, we are confronting to multiplicity problem. The work of Benjamini and Hochberg (95) on the false discovery rate is a way to deal with this problems of multi testing. Another approach is to study the stochastic process defined by all the correlated tests performed along the genome. I study the asymptotic properties of the corresponding empirical process. The Matlab pictures on the top of this web page gives a summary on the asymptotic properties of the empirical process. On the left, is represented the correlation whereas on the right, is given the mean function of the empirical process, that is to say the signal which contains information about the QTL. On the other hand, genotyping was very expensive in the past. That's why Lebotwitz (1987) proposed to genotype only individuals with extreme phenotypes : it is selective genotyping. I study the asymptotic properties of such a design and study also the empirical process under selective genotyping. Note that selective genotyping is still very useful nowdays, even if thousands of genetic markers are now available. We can increase the signal by genotyping only extreme individuals. Download our softwares to compute the quantiles of the maximum of Gaussian processes and Chi Square processes!

Charles-Elie Rabier

welcome to my home page

Bonjour a tous

My email address

Research interests

Member

My postdoc on Bayesian statistics in evolutionary biology / phylogenetic network inference (University of Montpellier)

My postdoc on High dimensional data analysis / Genomic prediction and Genomic Selection (INRA Toulouse)

My postdoc on stochastic processes in evolutionary biology / Phylogenetics
(university of Madison, Wisconsin, USA)

My PhD on stochastic processes in statistical genetics / QTL mapping (university of Toulouse)

Charles-Elie Rabier

welcome to my home page

Bonjour a tous

My email address

Research interests

Member

My postdoc on Bayesian statistics in evolutionary biology / phylogenetic network inference (University of Montpellier)

My postdoc on High dimensional data analysis / Genomic prediction and Genomic Selection (INRA Toulouse)

My postdoc on stochastic processes in evolutionary biology / Phylogenetics (university of Madison, Wisconsin, USA)

My PhD on stochastic processes in statistical genetics / QTL mapping (university of Toulouse)

My postdoc on stochastic processes in evolutionary biology / Phylogenetics
(university of Madison, Wisconsin, USA)