Title: ThaliaDB, a tool for data management and genetic diversity data exploration

Steinbach Delphine

INRA, France


Mrs Delphine Steinbach studied Biology, Plant Physiology and Computer Science at the Nancy Sciences University in France. She graduated as a software engineer in 1992 (Master 2 degree), with dual expertise in computer science and physiology. She worked at Genethon, helping scientists find genes involved in complex genetic diseases. She joined the team of J. Weissenbach, contributed to the first genetic map of the Human Genome (Nature 2006), and moved to Genoscope, the national sequencing center. In 2000, she joined INRA, the national institute for agronomic research, where she was vice director of the URGI research unit and led its bioinformatics facility. Since 2015, she has led the ABI-SOFT INRA group at Genetique Quantitative and Evolution – The Moulon, on the campus of University Paris-Saclay.


Diversity and association genetics studies involve handling large numbers of individuals, lines, clones and/or populations. Moreover, the emergence of high-throughput technologies for both genotyping and phenotyping generates large amounts of data. These data need to be stored and managed so that they can be queried and organized into datasets for genetic diversity exploration and association genetics analysis. ThaliaDB, V3.4, is developed to help scientists manage and analyze their data. The database holds genetic resources data (germplasm/accessions), seed lots, samples, markers, and genotyping and phenotyping datasets (field environments, multiple traits under different conditions). It is well suited to data used in GWAS or genomic selection methods. It can manage high-throughput results coming from different projects and experiments, and it offers several views and options to explore these data and make them available for reuse. Since July 2018, ThaliaDB has had a new module to store results from population structure analysis and to display them as graphical charts. As a further new feature, it can also display germplasms on a world map. The Web tool offers users a Select (data view) mode and an Admin (data administration and loading) mode. Data confidentiality is maintained through user accounts, and specific levels of rights can be set on data. Data can be extracted in CSV format. Version 3 has been operational in our lab since 2017, with maize data produced over 20 years by projects of A. Charcosset's GQMS team and their partners. It currently contains data from 23 projects: more than 3000 inbred lines, 1000 populations, 400 hybrids, 6000 seed lots, 48 genotyping experiments (covering more than 1 million markers from recent technologies such as GBS) and phenotyping data from 32 experiments. The tool is being tested in another lab on tomato and melon data.
The next steps are to test it on wheat and poplar data. ThaliaDB is developed in Python with the Django framework, and runs on the PostgreSQL and MongoDB database management systems. It interoperates with external information systems such as the INRA URGI GnpIS plant information system (D. Steinbach, NAR Databases Journal, 2013, doi: 10.1093/database/bat058) and the GnpIS-GnpAsso tool, through germplasm DOI identifiers. Management of trait ontologies (CropOntology), intended to improve data quality, is currently in development for version 3.5.
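
Since extracted data are delivered as CSV, a first quality check before GWAS or genomic selection can be run directly on an export. The following is a minimal sketch only: the file layout (one row per accession, an "accession" column, one column per marker, missing calls written as an empty field or "NA") and the function name call_rate_per_accession are assumptions made for illustration, not the documented ThaliaDB export format.

```python
import csv
import io

# Hypothetical export layout (an assumption, not ThaliaDB's documented format):
# one row per accession, first column "accession", one column per marker,
# missing genotype calls written as an empty field or "NA".
MISSING = {"", "NA"}


def call_rate_per_accession(csv_text, id_column="accession"):
    """Return {accession: fraction of marker columns with a non-missing call}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rates = {}
    for row in reader:
        accession = row[id_column]
        # Collect every marker value, treating absent fields as missing.
        calls = [(value or "").strip()
                 for key, value in row.items() if key != id_column]
        if not calls:
            continue  # this export has no marker columns
        called = sum(1 for value in calls if value not in MISSING)
        rates[accession] = called / len(calls)
    return rates


# Tiny made-up example, not real ThaliaDB data.
example = (
    "accession,m1,m2,m3,m4\n"
    "B73,A/A,A/T,NA,T/T\n"
    "F2,A/A,,NA,T/T\n"
)
rates = call_rate_per_accession(example)
```

Accessions with a low call rate could then be filtered out before analysis; the threshold to apply depends on the study and is not prescribed here.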