Frequently Asked Questions

What is the purpose of Gephebase?
Advances in genome sequencing and editing are rapidly accelerating the pace of discovery, making it crucial to develop a single, universal resource that integrates all this knowledge. The first goal of Gephebase is therefore to gather all published data on the genes and mutations responsible for evolutionary changes in Eukaryotes (mostly animals, yeasts and plants) into a single database, so that biologists can easily browse the published data on their topic of interest (transposable elements, epigenetic mutations, snakes, carotenoid content, etc.).
The second goal of Gephebase is to perform meta-analyses on the compiled data in order to extract general trends in evolution and genetics.
The third goal is to help breeders, conservationists and others identify the most promising target genes for traits of interest, with potential applications in many fields (crop improvement, parasite and pest control, bioconservation, genetic diagnostics).

What has come out of Gephebase so far?
The following papers present meta-analyses of Gephebase or of earlier versions of the dataset (2008-2014).
Courtier-Orgogozo, V., & Martin, A. (2020). The coding loci of evolution and domestication: current knowledge and implications for bio-inspired genome editing. Journal of Experimental Biology, 223, jeb208934.
Courtier-Orgogozo, V., Arnoult, L., Prigent, S. R., Wiltgen, S., & Martin, A. (2020). Gephebase, a database of genotype–phenotype relationships for natural and domesticated variation in Eukaryotes. Nucleic Acids Research, 48(D1), D696-D703.
Courtier-Orgogozo, V., & Martin, A. (to be published). Predicting the genetic loci of past evolution. In D. Ceccarelli & G. Frezza (Eds.), The culture of predictability and the nature of the unpredictable. Life, evolution, behaviour.
Martin, A., & Courtier-Orgogozo, V. (2017). Morphological evolution repeatedly caused by mutations in signaling ligand genes. In Diversity and Evolution of Butterfly Wing Patterns (pp. 59-87). Springer, Singapore.
Arnoult, L. A. (2014). La marche génétique de l’évolution. Biologie Aujourd'hui, 208(3), 237-249.
Martin, A., & Orgogozo, V. (2013). The loci of repeated evolution: a catalog of genetic hotspots of phenotypic variation. Evolution, 67(5), 1235-1250.
Streisfeld, M. A., & Rausher, M. D. (2011). Population genetics, pleiotropy, and the preferential fixation of mutations during adaptive evolution. Evolution: International Journal of Organic Evolution, 65(3), 629-642.
Stern, D. L., & Orgogozo, V. (2009). Is genetic evolution predictable? Science, 323(5915), 746-751.
Stern, D. L., & Orgogozo, V. (2008). The loci of evolution: how predictable is genetic evolution? Evolution, 62(9), 2155-2177.

How are relevant papers identified for inclusion in Gephebase?
Searches for relevant papers are done manually by our team of curators. We screen major journals in evolutionary genetics, run keyword searches with online search tools, and pay particular attention to citations in primary research articles and review papers. To suggest papers for inclusion, please click “Suggest an Article” in the top bar menu.

Why is my gene or paper of interest not in Gephebase?
Studies are added to Gephebase manually by our team of curators, so we may well have missed the research you are looking for. Please click “Suggest an Article” in the top bar above to suggest papers for inclusion in Gephebase, or email us.

How is the data curated?
The data is curated manually by a small team of biologists. We regularly discuss the criteria for inclusion in Gephebase and how the data should be curated, in order to make the database as reliable as possible. If you would like to join our team of curators, please contact us: we definitely need more curators.

How comprehensive is Gephebase?
Since there is currently no method for detecting all the papers relevant for inclusion in Gephebase, we cannot be sure that we have not missed some. We believe that the database is fairly comprehensive, for all species, for papers published up to 2013.

Can I use Gephebase to do my own meta-analysis?
Yes, of course! Please feel free to contact us for technical details about the database; SQL searches are available on request. Please let us know about any publications and presentations of meta-analyses that use Gephebase, so that we can list them on the Gephebase website.

Gephebase is a great resource. How can I help?
The Gephebase project started a few years ago thanks to a fruitful collaboration between Arnaud Martin and Virginie Courtier-Orgogozo, and it is a huge project. We need help with many aspects, not just bioinformatics (communication, meta-analysis, reviews, etc.), so please contact us. We are always glad when other people want to join us in this big adventure.

What are the criteria for inclusion in Gephebase?
There are multiple types of experimental evidence that can support a relationship between a genetic mutation and a phenotypic change. For the sake of simplicity and efficiency, each gene-phenotype association is assigned only one type of Experimental Evidence among three possibilities: “Association Mapping”, “Linkage Mapping”, or “Candidate Gene”. Gephebase curators make this choice based on the best evidence available for a given genotype-phenotype relationship. Gene-phenotype associations identified by Linkage Mapping at a resolution below 500 kb have priority in the dataset. Association Mapping studies are included based on individual judgment, with a strong bias toward associations that have been confirmed by reverse genetics studies. In other words, Gephebase aims to be more stringent than a compilation of statistically significant SNPs, and attempts to select studies in which a given genotype-phenotype association is relatively well supported or understood.

The advanced search uses boolean operators, but how can I add parentheses to group arguments together?
An argument preceded by AND or ANDNOT is grouped with the previous argument. For example, the search OR A AND B OR C ANDNOT D is interpreted as (A AND B) OR (C AND NOT D). Consequently, for a search such as A AND (B OR C), which is equivalent to (A AND B) OR (A AND C), use this input structure: OR A AND B OR A AND C
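The grouping rule for the advanced search amounts to giving AND and ANDNOT a higher precedence than OR: each OR starts a new group, and every following AND/ANDNOT argument is attached to that group. The Python sketch below illustrates this rule under stated assumptions: each search row is an (operator, term) pair, and the entries matched by each term are known in advance as a set of entry IDs. The `index` dictionary and the function names are hypothetical and do not describe Gephebase's actual implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation of one search row: (operator, term),
# where operator is "OR", "AND" or "ANDNOT".
Row = Tuple[str, str]


def group_rows(rows: List[Row]) -> List[List[Row]]:
    """Split rows into OR-separated groups: each AND/ANDNOT row is attached
    to the group started by the nearest preceding OR row."""
    groups: List[List[Row]] = []
    for op, term in rows:
        if op == "OR" or not groups:
            groups.append([(op, term)])
        else:
            groups[-1].append((op, term))
    return groups


def to_expression(rows: List[Row]) -> str:
    """Render the implied parenthesization, keeping the operator names as typed."""
    parts = []
    for group in group_rows(rows):
        text = group[0][1]
        for op, term in group[1:]:
            text += f" {op} {term}"
        parts.append(f"({text})")
    return " OR ".join(parts)


def evaluate(rows: List[Row], index: Dict[str, Set[int]]) -> Set[int]:
    """Return the IDs of entries matching the search, given a hypothetical
    index mapping each term to the set of entry IDs it matches."""
    result: Set[int] = set()
    for group in group_rows(rows):
        hits = set(index.get(group[0][1], set()))
        for op, term in group[1:]:
            matched = index.get(term, set())
            if op == "AND":
                hits &= matched  # keep entries that also match this term
            elif op == "ANDNOT":
                hits -= matched  # drop entries that match this term
            else:
                raise ValueError(f"unknown operator: {op}")
        result |= hits  # OR combines the groups
    return result
```

For example, `to_expression([("OR", "A"), ("AND", "B"), ("OR", "C"), ("ANDNOT", "D")])` returns `"(A AND B) OR (C ANDNOT D)"`. Passing the rows `OR A AND B OR A AND C` to `evaluate` gives the same result as A AND (B OR C), which is why repeating A in each group stands in for parentheses.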

Certain mutations are grouped within one entry whereas others are split into several entries. What exactly is an entry in Gephebase?
Each entry corresponds to one allelic difference at a given gene, either between two closely related species or between two individuals, together with its associated phenotypic change. We chose to group certain genetic changes together. When several mutations are found within the same gene in a given individual, each affecting the trait of interest (intralineage hotspot, several causative mutations within a haplotype), they are all grouped into a single entry. In contrast, when independent mutations occur in the same gene in distinct individuals of the same species and lead to similar phenotypic changes (intraspecific parallel evolution, convergent evolution), a separate entry is recorded for each lineage-specific haplotype. You can nevertheless group all the entries for one gene in one species by selecting the “group mutations” option on the results page.
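To make the entry rule concrete, here is a hypothetical Python sketch of the data model. The `Mutation` and `Entry` classes, the `lineage` field and both functions are illustrative assumptions, not Gephebase's actual schema: mutations found together in one lineage-specific haplotype share an entry, independent lineages get separate entries, and a second function mimics the “group mutations” view by collecting entries per gene and species.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Mutation:
    gene: str
    species: str
    lineage: str  # hypothetical label for the individual / lineage-specific haplotype
    change: str   # description of the causative change


@dataclass
class Entry:
    gene: str
    species: str
    lineage: str
    mutations: List[str] = field(default_factory=list)


def build_entries(mutations: List[Mutation]) -> List[Entry]:
    """One entry per (gene, species, lineage): mutations in the same haplotype
    are grouped, while independent lineages yield separate entries."""
    entries: Dict[Tuple[str, str, str], Entry] = {}
    for m in mutations:
        key = (m.gene, m.species, m.lineage)
        if key not in entries:
            entries[key] = Entry(m.gene, m.species, m.lineage)
        entries[key].mutations.append(m.change)
    return list(entries.values())


def group_mutations(entries: List[Entry]) -> Dict[Tuple[str, str], List[Entry]]:
    """Collect all entries for one gene in one species, as in a grouped results view."""
    grouped: Dict[Tuple[str, str], List[Entry]] = defaultdict(list)
    for e in entries:
        grouped[(e.gene, e.species)].append(e)
    return dict(grouped)
```

With two mutations in `lineage1` and one independent mutation in `lineage2` of the same gene and species, `build_entries` yields two entries, and `group_mutations` places both under a single (gene, species) key.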

How should I pronounce Gephebase? What does Gephebase mean?
Gephebase is pronounced as in “Ge”notype-“Phe”notype Data“base”. Gephebase is a compilation of “gephes”. A gephe is an abstract entity composed of a variation at a genetic locus (two alleles), its associated phenotypic change (two distinct phenotypic states), and their relationships. See the paper “The differential view of genotype-phenotype relationships” for details.

Any other questions?
Please e-mail us (click the link in the banner below). We will be happy to answer your questions.