
Study Finds Human Gene Linked to Larger Brains Arose from Non-Protein Coding (“Junk”) DNA

Researchers have discovered a key process by which new genes arising from non-protein coding DNA undergo mutations that enable export of their transcripts from the nucleus into the cytoplasm, where they can be translated into novel polypeptides. In the new study, the researchers show that, far from being accessories, new gene products are often integral to key phenotypic characteristics, such as the larger brains associated with human-specific de novo genes from non-protein coding DNA. But before such genes can yield novel protein products, they must change to escape the nuclear localization fated for long non-coding RNA sequences. The study elucidates the mutations that enable nuclear export, which gives the new gene's transcript access to the translational machinery of the ribosome, and demonstrates via knock-out and overexpression experiments the functional role of de novo genes from non-protein coding DNA in organism development, such as the enlargement of the cerebral cortex in humans.

By: William Brown, scientist at the Resonance Science Foundation

The de novo origin of novel protein-coding genes was once considered so improbable that it verged on the impossible. This was a major reason why gradual evolution was favored over punctuated speciation, as it was thought that a useful gene product must take millions of years to emerge via the blind and random processes favored by conventional theory. However, in less than a decade, and especially in the last five years, this view has been overturned by extensive evidence that gene duplication, alternative splicing mechanisms, and pre-adaptation of non-coding DNA generate novel functional gene products in diverse eukaryotic lineages. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto genes released into the testing ground of natural selection. Interestingly, it has been discovered that new gene products can arise from sections of DNA that code for seemingly useless long RNA transcripts (called long non-coding RNA, or lncRNA).

The phenomenon of de novo gene birth from non-protein coding DNA is surprising because the generic structural form of protein is amyloid, so random polypeptides are expected to be toxic, like the amyloid plaques that characterize Alzheimer’s and other neurodegenerative diseases. Yet it has become clear that useful, non-toxic gene products can indeed originate de novo from non-protein coding sequences [1]. These novel products often emerge as fully functional gene products with little-to-no intermediary acclimation phase, in an all-or-nothing type of emergence. This invalidates theories of gradual evolutionary exaptation and supports the pre-adaptation model, which proposes the existence of exaggerated gene-like characteristics in new genes and an all-or-nothing transition to functionality [2].

Although gene duplication has been reported as the predominant mechanism of the origin of new genes [3, 4], there is now an abundance of data showing that new proteins also evolve from non-protein coding DNA regions [5, 6, 7]. In gene duplication, many of the elements for transcription initiation, messenger RNA (mRNA) splicing and editing, and ribonucleoprotein nuclear export are already present. De novo gene birth from long non-coding RNA sequences is surprising by contrast, because many of the transcriptional and mRNA processing elements required to generate a functional gene transcript are absent. They must be generated from scratch (de novo synthesis), which requires key functional mutations to enable proper splicing and editing of the RNA sequences into valid mRNA transcripts.

To elucidate this process and further demonstrate how functional gene products are generated from non-protein coding DNA sequences, a study reported in Science and published in Nature Ecology & Evolution [8] has identified a human-specific gene that arose from long non-coding RNA and plays a key role in developing large, complex brains. The study’s authors were able to show how the gene originated from lncRNA by demonstrating homology between the gene and lncRNA-specific sequences, and by identifying key changes in RNA splice-related sequences that enabled RNA nuclear export. Through the introduction of new splicing elements, the lncRNA sequences become able to leave the nucleus and reach the cytoplasm, where there is access to the ribosome, thus becoming functional mRNA transcripts. Perhaps the most surprising result of the study is that these newly generated (de novo) genes from long non-coding RNA sequences have biological functionality.
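To make concrete what it means for a transcript to be translatable once it reaches the ribosome, here is a minimal Python sketch that scans an RNA string for its longest open reading frame (a start codon followed in-frame by a stop codon). It is only an illustration: the function name `longest_orf` and the toy sequence are hypothetical and not taken from the study, and the sketch ignores splicing, nuclear export signals, and everything else the authors actually analyzed.

```python
# Standard stop codons in RNA notation.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def longest_orf(rna: str) -> str:
    """Return the longest AUG...stop reading frame (stop codon included)
    found in the three forward frames, or "" if there is none."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    best = ""
    for frame in range(3):
        start = None  # index of the currently open AUG, if any
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = rna[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # close this frame and keep scanning
    return best


# Toy sequence, purely illustrative (not a real lncRNA).
toy_transcript = "GGCAUGGCUUGGAAAUAACC"
print(longest_orf(toy_transcript))
```

A scan like this only shows that a reading frame exists in the sequence; whether it is ever translated depends on the transcript leaving the nucleus, which is the step the study addresses.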

To test this functionality, the research team identified 74 human/hominoid-specific novel genes, half of which emerged after the ancestral split of the human and chimpanzee lineages. Selecting one of these 74 genes that is human-specific and expressed in brain development, the team demonstrated experimentally that knock-out or over-expression of the gene in human embryonic stem cells accelerated or delayed the neuronal maturation of cortical organoids, respectively. What’s more, when the gene was ectopically expressed in transgenic mice, the animals developed enlarged brains with more complex cortical structure, such as folding, a humanlike characteristic of cerebral cortex morphology.

The study therefore demonstrates how a pool of proto genes from non-protein coding DNA sequences can be rapidly exapted into functional mRNA and novel proteins that underlie key phenotypic changes in speciation, like the development of enlarged complex brains, which is significant in the human lineage. The pre-adaptation of the proto gene reservoir is highly intriguing and certainly raises further questions. It suggests that the exaptation of non-protein coding DNA sequences may not be entirely serendipitous, and that a natural mechanism may instead operate to generate proto genes that can be rapidly exapted in the all-or-nothing fashion of pre-adaptation.


[1] McLysaght, A. & Guerzoni, D. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation. Phil. Trans. R. Soc. B 370, 20140332 (2015), DOI: 10.1098/rstb.2014.0332

[2] Wilson, B. A., Foy, S. G., Neme, R. & Masel, J. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat. Ecol. Evol. 1, 0146–0146 (2017)

[3] Chen, S., Krinsky, B. H. & Long, M. New genes as drivers of phenotypic evolution. Nat. Rev. Genet. 14, 645–660 (2013)

[4] Long, M., Betran, E., Thornton, K. & Wang, W. The origin of new genes: glimpses from the young and old. Nat. Rev. Genet. 4, 865–875 (2003)

[5] Li, C. Y. et al. A human-specific de novo protein-coding gene associated with human brain functions. PLoS Comput. Biol. 6, e1000734 (2010)

[6] Xie, C. et al. Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs. PLoS Genet. 8, e1002942 (2012)

[7] Carvunis, A. R. et al. Proto-genes and de novo gene birth. Nature 487, 370–374 (2012)

[8] An, N. A. et al. De novo genes with a lncRNA origin encode unique human brain developmental functionality. Nat. Ecol. Evol. 1–15 (2023), DOI: 10.1038/s41559-022-01925-6

