Genome-wide strand asymmetry in massively parallel reporter activity favors genic strands.

Roberts, Brian S; Partridge, E Christopher; Moyers, Bryan A; Agarwal, Vikram; Newberry, Kimberly M; Martin, Beth K; Shendure, Jay; Myers, Richard M; Cooper, Gregory M.

Genome Res ; 31(5): 866-876, 2021 05.

Article in English | MEDLINE | ID: mdl-33879525

ABSTRACT

Massively parallel reporter assays (MPRAs) are useful tools to characterize regulatory elements in human genomes. An aspect of MPRAs that is not typically the focus of analysis is their intrinsic ability to differentiate activity levels for a given sequence element when placed in both of its possible orientations relative to the reporter construct. Here, we describe pervasive strand asymmetry of MPRA signals in data sets from multiple reporter configurations in both published and newly reported data. These effects are reproducible across different cell types and in different treatments within a cell type and are observed both within and outside of annotated regulatory elements. From elements in gene bodies, MPRA strand asymmetry favors the sense strand, suggesting that function related to endogenous transcription is driving the phenomenon. Similarly, we find that within Alu mobile element insertions, strand asymmetry favors the transcribed strand of the ancestral retrotransposon. The effect is consistent across the multiplicity of Alu elements in human genomes and is more pronounced in less diverged Alu elements. We find sequence features driving MPRA strand asymmetry and show its prediction from sequence alone. We see some evidence for RNA stabilization and transcriptional activation mechanisms and hypothesize that the effect is driven by natural selection favoring efficient transcription. Our results indicate that strand asymmetry is a pervasive and reproducible feature in MPRA data. More importantly, the fact that MPRA asymmetry favors naturally transcribed strands suggests that it stems from preserved biological functions that have a substantial, global impact on gene and genome evolution.

Subject(s)

Genome, Human; Regulatory Sequences, Nucleic Acid; Gene Expression Regulation; Genes, Reporter; Humans

Genome sequencing for early-onset or atypical dementia: high diagnostic yield and frequent observation of multiple contributory alleles.

Cochran, J Nicholas; McKinley, Emily C; Cochran, Meagan; Amaral, Michelle D; Moyers, Bryan A; Lasseigne, Brittany N; Gray, David E; Lawlor, James M J; Prokop, Jeremy W; Geier, Ethan G; Holt, James M; Thompson, Michelle L; Newberry, J Scott; Yokoyama, Jennifer S; Worthey, Elizabeth A; Geldmacher, David S; Love, Marissa Natelson; Cooper, Gregory M; Myers, Richard M; Roberson, Erik D.

Cold Spring Harb Mol Case Stud ; 5(6), 2019 12.

Article in English | MEDLINE | ID: mdl-31836585

ABSTRACT

We assessed the results of genome sequencing for early-onset dementia. Participants were selected from a memory disorders clinic. Genome sequencing was performed along with C9orf72 repeat expansion testing. All returned sequencing results were Sanger-validated. Prior clinical diagnoses included Alzheimer's disease, frontotemporal dementia, and unspecified dementia. The mean age of onset was 54 (41-76). Fifty percent of patients had a strong family history, 37.5% had some, and 12.5% had no known family history. Nine of 32 patients (28%) had a variant defined as pathogenic or likely pathogenic (P/LP) by American College of Medical Genetics and Genomics standards, including variants in APP, C9orf72, CSF1R, and MAPT. Nine patients (including three with P/LP variants) harbored established risk alleles with moderate penetrance (odds ratios of ~2-5) in ABCA7, AKAP9, GBA, PLD3, SORL1, and TREM2. All six patients harboring these moderate penetrance variants but not P/LP variants also had one or two APOE ε4 alleles. One patient had two APOE ε4 alleles with no other established contributors. In total, 16 patients (50%) harbored one or more genetic variants likely to explain symptoms. We identified variants of uncertain significance (VUSs) in ABI3, ADAM10, ARSA, GRID2IP, MME, NOTCH3, PLCD1, PSEN1, TM2D3, TNK1, TTC3, and VPS13C, also often along with other variants. In summary, genome sequencing for early-onset dementia frequently identified multiple established or possible contributory alleles. These observations add support for an oligogenic model for early-onset dementia.

Subject(s)

Alzheimer Disease/genetics; Dementia/genetics; Aged; Alleles; Apolipoprotein E4/genetics; Base Sequence; C9orf72 Protein/genetics; Chromosome Mapping; Female; Genetic Association Studies; Genetic Predisposition to Disease; Genetic Variation; Genome-Wide Association Study; Humans; Male; Middle Aged; Odds Ratio; Penetrance; Risk Factors; Whole Genome Sequencing/methods

Toward Reducing Phylostratigraphic Errors and Biases.

Moyers, Bryan A; Zhang, Jianzhi.

Genome Biol Evol ; 10(8): 2037-2048, 2018 08 01.

Article in English | MEDLINE | ID: mdl-30060201

ABSTRACT

Phylostratigraphy is a method for estimating gene age, usually applied to large numbers of genes in order to detect nonrandom age-distributions of gene properties that could shed light on mechanisms of gene origination and evolution. However, phylostratigraphy underestimates gene age with a nonnegligible probability. The underestimation is severer for genes with certain properties, creating spurious age distributions of these properties and those correlated with these properties. Here we explore three strategies to reduce phylostratigraphic error/bias. First, we test several alternative homology detection methods (PSIBLAST, HMMER, PHMMER, OMA, and GLAM2Scan) in phylostratigraphy, but fail to find any that noticeably outperforms the commonly used BLASTP. Second, using machine learning, we look for predictors of error-prone genes to exclude from phylostratigraphy, but cannot identify reliable predictors. Finally, we remove from phylostratigraphic analysis genes exhibiting errors in simulation, which by definition minimizes error/bias if the simulation is sufficiently realistic. Using this last approach, we show that some previously reported phylostratigraphic trends (e.g., younger proteins tend to evolve more rapidly and be shorter) disappear or even reverse, reconfirming the necessity of controlling phylostratigraphic error/bias. Taken together, our analyses demonstrate that phylostratigraphic errors/biases are refractory to several potential solutions but can be controlled at least partially by the exclusion of error-prone genes identified via realistic simulations. These results are expected to stimulate the judicious use of error-aware phylostratigraphy and reevaluation of previous phylostratigraphic findings.

Subject(s)

Bias; Evolution, Molecular; Phylogeny; Algorithms; Animals; Computer Simulation; Humans; Machine Learning; Models, Genetic; Software; Statistics, Nonparametric

Further Simulations and Analyses Demonstrate Open Problems of Phylostratigraphy.

Moyers, Bryan A; Zhang, Jianzhi.

Genome Biol Evol ; 9(6): 1519-1527, 2017 06 01.

Article in English | MEDLINE | ID: mdl-28637261

ABSTRACT

Phylostratigraphy, originally designed for gene age estimation by BLAST-based protein homology searches of sequenced genomes, has been widely used for studying patterns and inferring mechanisms of gene origination and evolution. We previously showed by computer simulation that phylostratigraphy underestimates gene age for a nonnegligible fraction of genes and that the underestimation is severer for genes with certain properties such as fast evolution and short protein sequences. Consequently, many previously reported age distributions of gene properties may have been methodological artifacts rather than biological realities. Domazet-Loso and colleagues recently argued that our simulations were flawed and that phylostratigraphic bias does not impact inferences about gene emergence and evolution. Here we discuss conceptual difficulties of phylostratigraphy, identify numerous problems in Domazet-Loso et al.'s argument, reconfirm phylostratigraphic error using simulations suggested by Domazet-Loso and colleagues, and demonstrate that a phylostratigraphic trend claimed to be robust to error disappears when genes likely to be error-resistant are analyzed. We conclude that extreme caution is needed in interpreting phylostratigraphic results because of the inherent biases of the method and that reanalysis using genes exhibiting no error in realistic simulations may help reduce spurious findings.

Subject(s)

Evolution, Molecular; Genomics/methods; Phylogeny; Computer Simulation; Genomics/standards; Humans; Models, Genetic

Evaluating Phylostratigraphic Evidence for Widespread De Novo Gene Birth in Genome Evolution.

Moyers, Bryan A; Zhang, Jianzhi.

Mol Biol Evol ; 33(5): 1245-56, 2016 05.

Article in English | MEDLINE | ID: mdl-26758516

ABSTRACT

The source of genetic novelty is an area of wide interest and intense investigation. Although gene duplication is conventionally thought to dominate the production of new genes, this view was recently challenged by a proposal of widespread de novo gene origination in eukaryotic evolution. Specifically, distributions of various gene properties such as coding sequence length, expression level, codon usage, and probability of being subject to purifying selection among groups of genes with different estimated ages were reported to support a model in which new protein-coding proto-genes arise from noncoding DNA and gradually integrate into cellular networks. Here we show that the genomic patterns asserted to support widespread de novo gene origination are largely attributable to biases in gene age estimation by phylostratigraphy, because such patterns are also observed in phylostratigraphic analysis of simulated genes bearing identical ages. Furthermore, there is no evidence of purifying selection on very young de novo genes previously claimed to show such signals. Together, these findings are consistent with the prevailing view that de novo gene birth is a relatively minor contributor to new genes in genome evolution. They also illustrate the danger of using phylostratigraphy in the study of new gene origination without considering its inherent bias.

Subject(s)

Biological Evolution; Genomics/methods; Models, Genetic; Mutation; Animals; Codon; Computer Simulation; Databases, Nucleic Acid; Evolution, Molecular; Gene Duplication; Humans; Open Reading Frames; Phylogeny; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/genetics

Phylostratigraphic bias creates spurious patterns of genome evolution.

Moyers, Bryan A; Zhang, Jianzhi.

Mol Biol Evol ; 32(1): 258-67, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25312911

ABSTRACT

Phylostratigraphy is a method for dating the evolutionary emergence of a gene or gene family by identifying its homologs across the tree of life, typically by using BLAST searches. Applying this method to all genes in a species, or genomic phylostratigraphy, allows investigation of genome-wide patterns in new gene origination at different evolutionary times and thus has been extensively used. However, gene age estimation depends on the challenging task of detecting distant homologs via sequence similarity, which is expected to have differential accuracies for different genes. Here, we evaluate the accuracy of phylostratigraphy by realistic computer simulation with parameters estimated from genomic data, and investigate the impact of its error on findings of genome evolution. We show that 1) phylostratigraphy substantially underestimates gene age for a considerable fraction of genes, 2) the error is especially serious when the protein evolves rapidly, is short, and/or its most conserved block of sites is small, and 3) these errors create spurious nonuniform distributions of various gene properties among age groups, many of which cannot be predicted a priori. Given the high likelihood that conclusions about gene age are faulty, we advocate the use of realistic simulation to determine if observations from phylostratigraphy are explainable, at least qualitatively, by a null model of biased measurement, and in all cases, critical evaluation of results.

Subject(s)

Drosophila/genetics; Genome, Insect; Genomics/standards; Animals; Computer Simulation; Drosophila/classification; Evolution, Molecular; Genomics/methods; Phylogeny
