The most popular genes in the human genome

By Elie Dolgin

Peter Kerpedjiev needed a crash course in genetics. A software engineer with some training in bioinformatics, he was pursuing a PhD and thought it would really help to know some fundamentals of biology. “If I wanted to have an intelligent conversation with someone, what genes do I need to know about?” he wondered.

Kerpedjiev went straight to the data. For years, the US National Library of Medicine (NLM) has been systematically tagging almost every paper in its popular PubMed database that contains some information about what a gene does. Kerpedjiev extracted all the papers marked as describing the structure, function or location of a gene or the protein it encodes.

Sorting through the records, he compiled a list of the most studied genes of all time — a sort of ‘top hits’ of the human genome, and several other genomes besides.
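For readers who want to try a tally of this kind, here is a minimal sketch in Python. It assumes the gene-to-paper links have already been exported as pairs of gene symbol and PubMed ID. The sample records, the placeholder gene names and the function names are all hypothetical; this is not Kerpedjiev's actual data or code.

```python
from collections import Counter

# Hypothetical (gene symbol, PubMed ID) pairs. The IDs are made up and
# GENE_B / GENE_C are placeholder names, used for illustration only.
SAMPLE_LINKS = [
    ("TP53", 1001),
    ("TP53", 1002),
    ("TP53", 1002),  # duplicate link: the same paper tagged twice
    ("GENE_B", 1002),
    ("GENE_B", 1003),
    ("GENE_C", 1004),
    ("TP53", 1005),
]


def count_papers_per_gene(links):
    """Map each gene to the number of distinct papers linked to it."""
    unique_links = set(links)  # drop repeated (gene, paper) pairs
    return Counter(gene for gene, _paper in unique_links)


def top_genes(links, n=3):
    """Return the n most-studied genes as (gene, paper_count) tuples."""
    counts = count_papers_per_gene(links)
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


if __name__ == "__main__":
    for gene, papers in top_genes(SAMPLE_LINKS):
        print(f"{gene}\t{papers}")
```

The sketch counts each paper once per gene, so a paper tagged twice for the same gene does not inflate that gene's total. A paper that discusses several genes still counts towards each of them.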

Heading the list, he found, is a gene called TP53. Three years ago, when Kerpedjiev first did his analysis, researchers had scrutinized the gene or the protein it produces, p53, in some 6,600 papers. Today, that number is at about 8,500 and counting. On average, around two papers are published each day describing new details of the basic biology of TP53.

