Newswise — The black cottonwood tree, or Populus trichocarpa (poplar), serves as a model organism for scientists studying the structure, growth, development, and genetics of plants. Poplar was the first tree genome scientists sequenced, and now they use it to study topics such as bioenergy, drought tolerance, and wood formation.

But despite poplar’s broad use in plant biology, little is known about its centromeres, the regions in both animals and plants that join the two halves of a chromosome. Centromeres are unusual because they are inherited epigenetically: their location on the chromosome is passed down from generation to generation even though the underlying DNA sequence differs between generations. The centromere also plays an important role in distributing DNA into new cells during meiosis, the type of cell division that produces plant pollen.

A team at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL) recently mapped the locations of centromeres in poplar using pre-existing genome sequence data. Deborah Weighill, a former graduate student at the Bredesen Center for Interdisciplinary Research and Graduate Education at ORNL and the University of Tennessee, Knoxville, performed a subsequent genomic analysis on the Cray XK7 Titan supercomputer at the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility at ORNL. She found that genetic variants in the sequence of DNA at the centromere and the sequence of a particular protein structure that this DNA wraps around—called centromeric histone 3 (CENH3)—show similar occurrence patterns.

“There is a hypothesis that there is an evolutionary relationship between the sequence of the centromere and the sequence of the protein. This kind of relationship would evolve a stronger binding between these two components and result in a centromere that is more stable during cell division,” Weighill said. “If your centromere isn’t stable, then your chromosomes aren’t going to be able to behave properly during cell division.”

A particular copy of CENH3 and the centromeric region show a coevolutionary signature—meaning their genetic variation patterns suggest they evolved together. This copy of CENH3 may be the primary homolog, or ancestry-sharing gene, that binds to the centromeric DNA. The study opens new opportunities to study the genes related to centromeric functioning in plants.

The ever-popular poplar

Eukaryotes, organisms whose cells are packed with membrane-enclosed organelles, typically carry two sets of chromosomes. During cell division, these chromosomes organize themselves so that the correct number of chromosomes ends up in each of the resulting cells.

Like a choreographed dance, the chromosomes all line up and bind to their equivalent chromosome partners at the centromere. The centromere—although not always in the middle of the chromosome—is where fibers attach to pull the chromatids (the halves of a single copied chromosome) apart into the daughter cells. Until now, only approximate visual representations of poplar centromeres had been published in the scientific literature.

“There have been attempts to identify these previously, but the actual locations have never been determined numerically,” Weighill said.

Because the biology community studies poplar extensively—with much of this research performed at the Center for Bioenergy Innovation (CBI) at ORNL—the ORNL team had a large repository of data from which to draw. First, they had the full genome sequence of the plant. Second, they had data about poplar’s single nucleotide polymorphisms (SNPs—pronounced “snips”), genetic variations occurring in single structural units of DNA (nucleotides) that can act as biological markers and/or affect gene function. And third, they had data about poplar’s DNA methylation, a process that alters the accessibility of DNA and changes how genes are expressed.

The team studied the data, looking specifically for chromosomal areas that had three things: a low density of genes, a low density of SNPs, and a high density of methylation. The centromere is found in regions with these characteristics.

“Genes are likely sparse in the centromeric region because it’s a disruptive structure,” Weighill said. “But the exact reason why all these things are the way they are is not perfectly understood.”
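The three-criterion screen described above can be pictured as a filter over fixed-size genomic windows. The sketch below is only an illustration: the `Window` fields, the thresholds, and the toy values are all assumptions invented for this example, and the related publication's title indicates the team used wavelet-based signal processing rather than a simple cutoff like this one.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start: int                  # window start position in base pairs
    gene_density: float         # fraction of the window covered by genes
    snp_density: float          # SNPs per base, rescaled to 0..1 (toy scale)
    methylation_density: float  # fraction of methylated sites


def candidate_windows(windows, gene_max, snp_max, meth_min):
    """Keep windows with few genes, few SNPs, and heavy methylation."""
    return [
        w for w in windows
        if w.gene_density <= gene_max
        and w.snp_density <= snp_max
        and w.methylation_density >= meth_min
    ]


# Toy values, invented purely for illustration (not poplar data).
windows = [
    Window(0, 0.40, 0.30, 0.10),
    Window(100_000, 0.05, 0.04, 0.90),
    Window(200_000, 0.02, 0.03, 0.85),
    Window(300_000, 0.35, 0.25, 0.20),
]

hits = candidate_windows(windows, gene_max=0.10, snp_max=0.10, meth_min=0.80)
candidate_starts = [w.start for w in hits]
print(candidate_starts)
```

With these made-up numbers, only the two middle windows satisfy all three conditions, which is the kind of contiguous stretch a centromere search would flag for closer inspection.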

The team validated the centromere locations using data from other plants and a tool called BLAST, the Basic Local Alignment Search Tool. BLAST is an algorithm that finds regions of similarity between DNA sequences by rapidly comparing them. The regions predicted by the initial density analysis were confirmed by the BLAST results.

“We validated our approach for finding centromere repeat sequences by comparing our results to other plants where the centromeres have been identified,” Weighill said. “We BLAST-matched the sequences against the poplar genome, and we saw an increased density of centromere-like repeat sequences in the areas we predicted.”
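To make the idea of "increased density" of BLAST matches concrete, the sketch below counts hits per genomic window. It assumes the hits were exported in BLAST's default 12-column tabular format (`-outfmt 6`), where column 2 is the subject sequence and columns 9 and 10 are the subject start and end. The function name, window size, and sample lines are hypothetical, and this is not the team's actual pipeline.

```python
from collections import Counter


def hit_density(blast_lines, window_size):
    """Count BLAST hits per (subject sequence, window index)."""
    counts = Counter()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        sseqid = fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Reverse-strand hits report sstart > send, so use the smaller one.
        position = min(sstart, send)
        # BLAST coordinates are 1-based; shift to 0-based before binning.
        counts[(sseqid, (position - 1) // window_size)] += 1
    return counts


# Invented example lines; the repeat names and coordinates are not real data.
sample = [
    "repeatA\tChr01\t95.0\t150\t7\t0\t1\t150\t1200\t1349\t1e-50\t200",
    "repeatA\tChr01\t93.0\t150\t10\t0\t1\t150\t1900\t1751\t1e-45\t180",
    "repeatB\tChr01\t90.0\t150\t15\t0\t1\t150\t5200\t5349\t1e-40\t170",
]

density = hit_density(sample, window_size=1000)
for (chrom, index), n in sorted(density.items()):
    print(chrom, index * 1000 + 1, n)
```

Windows whose counts stand well above the chromosome-wide background would then be compared against the regions predicted from gene, SNP, and methylation density.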

A vast number of variants

Weighill then investigated the hypothesis that the genetic variants in the DNA in the centromeric region coevolved with genetic variants in centromeric histones, proteins that hold DNA together as it wraps around them. Using the Titan supercomputer and the Combinatorial Metrics (CoMet) code, a comparative genomics code developed by ORNL’s Wayne Joubert, Weighill calculated the correlations between gene variants across a population of about 1,000 different poplar trees. 

After running nearly 100 trillion comparisons to calculate the co-occurrence between all the pairs of SNPs, she found that concentrations of gene variants in the centromeric region were correlated with gene variants in a histone gene on chromosome 2.

“When two variants occur together very frequently, that’s a signal of some functional relationship between them,” Weighill said.
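A toy version of an all-pairs comparison helps show what "occurring together" means numerically. The sketch below scores each pair of SNPs with a plain Pearson correlation over presence/absence calls across trees. This is a stand-in chosen for simplicity, not the metric implemented in CoMet, and the SNP names and genotype calls are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a variant with no variation carries no signal
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(genotypes):
    """Score every unordered pair of SNPs once."""
    return {
        (a, b): pearson(genotypes[a], genotypes[b])
        for a, b in combinations(sorted(genotypes), 2)
    }


# 1 = variant present in a tree, 0 = absent; six hypothetical trees.
genotypes = {
    "cen_snp": [1, 1, 0, 0, 1, 0],
    "chr02_snp": [1, 1, 0, 0, 1, 0],
    "other_snp": [0, 1, 0, 1, 0, 1],
}

scores = pairwise_correlations(genotypes)
for pair, r in scores.items():
    print(pair, round(r, 3))
```

Because n SNPs produce n(n-1)/2 unordered pairs, the number of comparisons grows quadratically with the number of variants, which is why a genome-wide run calls for a machine on the scale of Titan.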

The result suggests that the CENH3 gene homolog on chromosome 2 in poplar is functioning as the main template for the centromeric histone CENH3.

“This is definitely evidence for and not against the evolutionary hypothesis we were exploring,” Weighill said. 

The most interesting part of the result, she said, is that although scientists believe two copies of the gene are involved in the expression of CENH3, only one copy of the gene—the one on chromosome 2—is showing a gene signature that corresponds to CENH3. The two genes—the other being a copy on chromosome 14—appear to be functionally divergent, meaning they may be on separate evolutionary trajectories.

Weighill said the initial findings showing the location of poplar’s centromeres can now be applied to other algorithms. Because numerous genetic algorithms contain information about how the chromosomes recombine—an action dependent on the centromere’s location—the finding could prove useful for further plant analyses at CBI. 

As for her own work, Weighill is taking on new challenges in human genetic variation at Harvard University, where she is a postdoctoral researcher in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. She earned her PhD in energy science and engineering from the Bredesen Center at the University of Tennessee, Knoxville, under ORNL’s Dan Jacobson.

“As long as I’m working on something that’s relevant to the planet and society, then I’m happy,” Weighill said. “I like the hope that my work can do some good.”

The research was funded by the BioEnergy Science Center, CBI, and the Plant-Microbe Interfaces Scientific Focus Area in the Genomic Science Program, all supported by the Office of Biological and Environmental Research in the DOE Office of Science.

Related Publication: D. Weighill, D. Macaya-Sanz, S. P. DiFazio, W. Joubert, M. Shah, J. Schmutz, A. Sreedasyam, G. Tuskan, and D. Jacobson, “Wavelet-Based Genomic Signal Processing for Centromere Identification and Hypothesis Generation.” Frontiers in Genetics 10 (2019): 487, doi:10.3389/fgene.2019.00487.

Journal Link: Frontiers in Genetics, May 2019