Results 1 - 14 of 14
1.
medRxiv ; 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38633780

ABSTRACT

Autism Spectrum Disorder (ASD) arises from complex genetic and environmental factors, with inherited genetic variation playing a substantial role. This study introduces a novel approach to uncover moderate effect size (MES) genes in ASD, which individually do not meet the ASD liability threshold but collectively contribute when paired with specific other MES genes. Analyzing 10,795 families from the SPARK dataset, we identified 97 MES genes forming 50 significant gene pairs, demonstrating a substantial association with ASD when considered in tandem, but not individually. Our method leverages familial inheritance patterns and statistical analyses, refined by comparisons against control cohorts, to elucidate these gene pairs' contribution to ASD liability. Furthermore, expression profile analyses of these genes in brain tissues underscore their relevance to ASD pathology. This study underscores the complexity of ASD's genetic landscape, suggesting that gene combinations, beyond high impact single-gene mutations, significantly contribute to the disorder's etiology and heterogeneity. Our findings pave the way for new avenues in understanding ASD's genetic underpinnings and developing targeted therapeutic strategies.

2.
Cell Genom ; 3(9): 100407, 2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37719148

ABSTRACT

[This corrects the article DOI: 10.1016/j.xgen.2023.100305.].

3.
Appl Plant Sci ; 11(4): e11533, 2023.
Article in English | MEDLINE | ID: mdl-37601314

ABSTRACT

Premise: Robust standards to evaluate quality and completeness are lacking in eukaryotic structural genome annotation, as genome annotation software is developed using model organisms and typically lacks benchmarking to comprehensively evaluate the quality and accuracy of the final predictions. The annotation of plant genomes is particularly challenging due to their large sizes, abundant transposable elements, and variable ploidies. This study investigates the impact of genome quality, complexity, sequence read input, and method on protein-coding gene predictions. Methods: The impact of repeat masking, long-read and short-read inputs, and de novo and genome-guided protein evidence was examined in the context of the popular BRAKER and MAKER workflows for five plant genomes. The annotations were benchmarked for structural traits and sequence similarity. Results: Benchmarks that reflect gene structures, reciprocal similarity search alignments, and mono-exonic/multi-exonic gene counts provide a more complete view of annotation accuracy. Transcripts derived from RNA-read alignments alone are not sufficient for genome annotation. Gene prediction workflows that combine evidence-based and ab initio approaches are recommended, and a combination of short and long reads can improve genome annotation. Adding protein evidence from de novo assemblies, genome-guided transcriptome assemblies, or full-length proteins from OrthoDB generates more putative false positives as implemented in the current workflows. Post-processing with functional and structural filters is highly recommended. Discussion: While the annotation of non-model plant genomes remains complex, this study provides recommendations for inputs and methodological approaches. We discuss a set of best practices to generate an optimal plant genome annotation and present a more robust set of metrics to evaluate the resulting predictions.

4.
Cell Genom ; 3(6): 100305, 2023 Jun 14.
Article in English | MEDLINE | ID: mdl-37388907

ABSTRACT

Somatic mutations have important biological ramifications while exerting substantial rate, type, and genomic location heterogeneity. Yet, their sporadic occurrence makes them difficult to study at scale and across individuals. Lymphoblastoid cell lines (LCLs), a model system for human population and functional genomics, harbor large numbers of somatic mutations and have been extensively genotyped. By comparing 1,662 LCLs, we report that the mutational landscape of the genome varies across individuals in terms of the number of mutations, their genomic locations, and their spectra; this variation may itself be modulated by somatic trans-acting mutations. Mutations attributed to the translesion DNA polymerase η follow two different modes of formation, with one mode accounting for the hypermutability of the inactive X chromosome. Nonetheless, the distribution of mutations along the inactive X chromosome appears to follow an epigenetic memory of the active form.

5.
Cell Genom ; 3(6): 100315, 2023 Jun 14.
Article in English | MEDLINE | ID: mdl-37388911

ABSTRACT

The patterns of genomic mutations are associated with various genomic features, most notably late replication timing, yet it remains contested which mutation types and signatures relate to DNA replication dynamics and to what extent. Here, we perform high-resolution comparisons of mutational landscapes between lymphoblastoid cell lines, chronic lymphocytic leukemia tumors, and three colon adenocarcinoma cell lines, including two with mismatch repair deficiency. Using cell-type-matched replication timing profiles, we demonstrate that mutation rates exhibit heterogeneous replication timing associations among cell types. This cell-type heterogeneity extends to the underlying mutational pathways, as mutational signatures show inconsistent replication timing bias between cell types. Moreover, replicative strand asymmetries exhibit similar cell-type specificity, albeit with different relationships to replication timing than mutation rates. Overall, we reveal an underappreciated complexity and cell-type specificity of mutational pathways and their relationship to replication timing.

6.
Proc Natl Acad Sci U S A ; 120(10): e2213896120, 2023 Mar 07.
Article in English | MEDLINE | ID: mdl-36848554

ABSTRACT

DNA is replicated according to a defined spatiotemporal program that is linked to both gene regulation and genome stability. The evolutionary forces that have shaped replication timing programs in eukaryotic species are largely unknown. Here, we studied the molecular causes and consequences of replication timing evolution across 94 humans, 95 chimpanzees, and 23 rhesus macaques. Replication timing differences recapitulated the species' phylogenetic tree, suggesting continuous evolution of the DNA replication timing program in primates. Hundreds of genomic regions had significant replication timing variation between humans and chimpanzees, of which 66 showed advances in replication origin firing in humans, while 57 were delayed. Genes overlapping these regions displayed correlated changes in expression levels and chromatin structure. Many human-chimpanzee variants also exhibited interindividual replication timing variation, pointing to ongoing evolution of replication timing at these loci. Association of replication timing variation with genetic variation revealed that DNA sequence evolution can explain replication timing variation between species. Taken together, DNA replication timing shows substantial and ongoing evolution in the human lineage that is driven by sequence alterations and could impact regulatory evolution at specific genomic sites.


Subjects
DNA Replication Timing, Pan troglodytes, Animals, Humans, Pan troglodytes/genetics, DNA Replication Timing/genetics, Macaca mulatta/genetics, Phylogeny, Eukaryota
7.
Hum Mol Genet ; 31(17): 2899-2917, 2022 Aug 25.
Article in English | MEDLINE | ID: mdl-35394024

ABSTRACT

Cellular proliferation depends on the accurate and timely replication of the genome. Several genetic diseases are caused by mutations in key DNA replication genes; however, it remains unclear whether these genes influence the normal program of DNA replication timing. Similarly, the factors that regulate DNA replication dynamics are poorly understood. To systematically identify trans-acting modulators of replication timing, we profiled replication in 184 cell lines from three cell types, encompassing 60 different gene knockouts or genetic diseases. Through a rigorous approach that considers the background variability of replication timing, we concluded that most samples displayed normal replication timing. However, mutations in two genes showed consistently abnormal replication timing. The first gene was RIF1, a known modulator of replication timing. The second was MCM10, a highly conserved member of the pre-replication complex. Cells from a single patient carrying MCM10 mutations demonstrated replication timing variability comprising 46% of the genome and at different locations than RIF1 knockouts. Replication timing alterations in the mutated MCM10 cells were predominantly comprised of replication delays and initiation site gains and losses. Taken together, this study demonstrates the remarkable robustness of the human replication timing program and reveals MCM10 as a novel candidate modulator of DNA replication timing.


Subjects
DNA Replication Timing, Minichromosome Maintenance Proteins, Cell Cycle Proteins/genetics, Cell Cycle Proteins/metabolism, Cell Line, DNA Replication/genetics, DNA Replication Timing/genetics, Humans, Minichromosome Maintenance Proteins/genetics, Replication Origin
8.
Mol Ecol Resour ; 22(2): 695-710, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34383377

ABSTRACT

We performed gene and genome targeted SNP discovery towards the development of a genome-wide, multispecies genotyping array for tropical pines. Pooled RNA-seq data from shoots of seedlings from five tropical pine species were used to identify transcript-based SNPs resulting in 1.3 million candidate Affymetrix SNP probe sets. In addition, we used a custom 40 K probe set to perform capture-seq in pooled DNA from 81 provenances representing the natural ranges of six tropical pine species in Mexico and Central America resulting in 563 K candidate SNP probe sets. Altogether, 300 K RNA-seq (72%) and 120 K capture-seq (28%) derived SNP probe sets were tiled on a 420 K screening array that was used to genotype 576 trees representing the 81 provenances and commercial breeding material. Based on the screening array results, 50 K SNPs were selected for commercial SNP array production including 20 K polymorphic SNPs for P. patula, P. tecunumanii, P. oocarpa and P. caribaea, 15 K for P. greggii and P. maximinoi, 13 K for P. elliottii and 8 K for P. pseudostrobus. We included 9.7 K ancestry informative SNPs that will be valuable for species and hybrid discrimination. Of the 50 K SNP markers, 25% are polymorphic in only one species, while 75% are shared by two or more species. The Pitro50K SNP chip will be useful for population genomics and molecular breeding in this group of pine species that, together with their hybrids, represent the majority of fast-growing tropical and subtropical pine plantations globally.


Subjects
Pinus, Trees, Genome, Genotype, Pinus/genetics, Plant Breeding, Single Nucleotide Polymorphism, Trees/genetics
9.
Genetics ; 219(3), 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34740250

ABSTRACT

Regulation of DNA replication and copy number is necessary to promote genome stability and maintain cell and tissue function. DNA replication is regulated temporally in a process known as replication timing (RT). Rap1-interacting factor 1 (Rif1) is a key regulator of RT and has a critical function in copy number control in polyploid cells. Previously, we demonstrated that Rif1 functions with SUUR to inhibit replication fork progression and promote underreplication (UR) of specific genomic regions. How Rif1-dependent control of RT factors into its ability to promote UR is unknown. By applying a computational approach to measure RT in Drosophila polyploid cells, we show that SUUR and Rif1 have differential roles in controlling UR and RT. Our findings reveal that Rif1 acts to promote late replication, which is necessary for SUUR-dependent underreplication. Our work provides new insight into the process of UR and its links to RT.


Subjects
Carrier Proteins/metabolism, DNA Replication Timing, Drosophila Proteins/metabolism, Drosophila melanogaster/genetics, Animals, Genetically Modified Animals, Carrier Proteins/genetics, Computational Biology, DNA Copy Number Variations, DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Drosophila Proteins/genetics, Female, Polyploidy, RNA-Seq
10.
Appl Plant Sci ; 9(6): e11439, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34268018

ABSTRACT

PREMISE: An informatics approach was used for the construction of an Axiom genotyping array from heterogeneous, high-throughput sequence data to assess the complex genome of loblolly pine (Pinus taeda). METHODS: High-throughput sequence data, sourced from exome capture and whole genome reduced-representation approaches from 2698 trees across five sequence populations, were analyzed with the improved genome assembly and annotation for the loblolly pine. A variant detection, filtering, and probe design pipeline was developed to detect true variants across and within populations. From 8.27 million variants, a total of 642,275 were evaluated and 423,695 of those were screened across a range-wide population. RESULTS: The final informatics and screening approach delivered an Axiom array representing 46,439 high-confidence variants to the forest tree breeding and genetics community. Based on the annotated reference genome, 34% were located in or directly upstream or downstream of genic regions. DISCUSSION: The Pita50K array represents a genome-wide resource developed from sequence data for an economically important conifer, loblolly pine. It uniquely integrates independent projects that assessed trees sampled across the native range. The challenges associated with the large and repetitive genome are addressed in the development of this resource.

11.
G3 (Bethesda) ; 10(11): 3907-3919, 2020 Nov 05.
Article in English | MEDLINE | ID: mdl-32948606

ABSTRACT

The giant sequoia (Sequoiadendron giganteum) of California are massive, long-lived trees that grow along the U.S. Sierra Nevada mountains. Genomic data are limited in giant sequoia and producing a reference genome sequence has been an important goal to allow marker development for restoration and management. Using deep-coverage Illumina and Oxford Nanopore sequencing, combined with Dovetail chromosome conformation capture libraries, the genome was assembled into eleven chromosome-scale scaffolds containing 8.125 Gbp of sequence. Iso-Seq transcripts, assembled from three distinct tissues, were used as evidence to annotate a total of 41,632 protein-coding genes. The genome was found to contain, distributed unevenly across all 11 chromosomes and in 63 orthogroups, over 900 complete or partial predicted NLR genes, of which 375 are supported by annotation derived from protein evidence and gene modeling. This giant sequoia reference genome sequence represents the first genome sequenced in the Cupressaceae family, and lays a foundation for using genomic tools to aid in giant sequoia conservation and management.


Subjects
Sequoiadendron, Chromosomes, Genome, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, Trees
12.
Plant J ; 102(2): 410-423, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31823432

ABSTRACT

Juglans (walnuts), the most speciose genus in the walnut family (Juglandaceae), represents most of the family's commercially valuable fruit and wood-producing trees. It includes several species used as rootstock for their resistance to various abiotic and biotic stressors. We present the full structural and functional genome annotations of six Juglans species and one outgroup within Juglandaceae (Juglans regia, J. cathayensis, J. hindsii, J. microcarpa, J. nigra, J. sigillata and Pterocarya stenoptera) produced using the BRAKER2 semi-unsupervised gene prediction pipeline and additional tools. For each annotation, gene predictors were trained using 19 tissue-specific J. regia transcriptomes aligned to the genomes. Additional functional evidence and filters were applied to multi-exonic and mono-exonic putative genes to yield between 27 000 and 44 000 high-confidence gene models per species. Comparison of gene models to the BUSCO embryophyta dataset suggested that, on average, genome annotation completeness was 85.6%. We utilized these high-quality annotations to assess gene family evolution within Juglans, and among Juglans and selected Eurosid species. We found notable contractions in several gene families in J. hindsii, including disease resistance-related wall-associated kinase (WAK), Catharanthus roseus receptor-like kinase (CrRLK1L) and others involved in abiotic stress response. Finally, we confirmed an ancient whole-genome duplication that took place in a common ancestor of Juglandaceae using site substitution comparative analysis.


Subjects
Plant Genome/genetics, Genomics, Juglans/genetics, Transcriptome, Disease Resistance/genetics, Juglans/physiology, Physiological Stress
13.
PLoS Genet ; 15(12): e1007979, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31860654

ABSTRACT

Simulations of close relatives and identical-by-descent (IBD) segments are common in genetic studies, yet most past efforts have utilized sex-averaged genetic maps and ignored crossover interference, thus omitting features known to affect the breakpoints of IBD segments. We developed Ped-sim, a method for simulating relatives that can utilize either sex-specific or sex-averaged genetic maps and also either a model of crossover interference or the traditional Poisson model for inter-crossover distances. To characterize the impact of previously ignored mechanisms, we simulated data for all four combinations of these factors. We found that modeling crossover interference decreases the standard deviation of pairwise IBD proportions by 10.4% on average in full siblings through second cousins. By contrast, sex-specific maps increase this standard deviation by 4.2% on average, and also impact the number of segments relatives share. Most notably, using sex-specific maps, the number of segments half-siblings share is bimodal; and when combined with interference modeling, the probability that sixth cousins have non-zero IBD sharing ranges from 9.0 to 13.1%, depending on the sexes of the individuals through which they are related. We present new analytical results for the distributions of IBD segments under these models and show they match results from simulations. Finally, we compared IBD sharing rates between simulated and real relatives and found that the combination of sex-specific maps and interference modeling most accurately captures IBD rates in real data. Ped-sim is open source and available from https://github.com/williamslab/ped-sim.


Subjects
Chromosome Mapping/methods, Computer Simulation, Sex Characteristics, Female, Genetic Variation, Population Genetics, Human Genome, Humans, Male, Genetic Models, Pedigree, Poisson Distribution
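
Illustrative note on record 13: the "traditional Poisson model for inter-crossover distances" named in the abstract can be sketched as follows. This is a minimal sketch and not Ped-sim's code. It assumes a single chromosome whose genetic length is given in Morgans and draws successive inter-crossover distances from an exponential distribution with rate 1 per Morgan, so the number of crossovers is Poisson-distributed with mean equal to the genetic length. The function name `simulate_crossovers`, the 1.5 Morgan length, and the seed are hypothetical choices made only for this example; crossover interference and sex-specific maps are not modeled.

```python
import random


def simulate_crossovers(length_morgans, rng):
    """Return sorted crossover positions (in Morgans) on one chromosome.

    Assumption for this sketch: no interference, so the distances between
    successive crossovers are independent exponential draws with mean 1 Morgan.
    """
    positions = []
    pos = rng.expovariate(1.0)  # distance from the chromosome start to the first crossover
    while pos < length_morgans:
        positions.append(pos)
        pos += rng.expovariate(1.0)  # distance to the next crossover
    return positions


# Hypothetical inputs: a 1.5 Morgan chromosome and a fixed seed for repeatability.
rng = random.Random(42)
length = 1.5
example = simulate_crossovers(length, rng)

# Under this model the average number of crossovers per meiosis should be
# close to the genetic length in Morgans.
counts = [len(simulate_crossovers(length, rng)) for _ in range(20000)]
mean_count = sum(counts) / len(counts)
print(f"mean crossovers per meiosis: {mean_count:.3f}")
```

An interference model would replace the exponential draw with a distribution that makes very short inter-crossover distances less likely; which distribution Ped-sim uses is not stated in the abstract.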
14.
Genomics Proteomics Bioinformatics ; 17(3): 305-310, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31437583

ABSTRACT

Published genomes frequently contain erroneous gene models that represent issues associated with identification of open reading frames, start sites, splice sites, and related structural features. The source of these inconsistencies is often traced back to integration across text file formats designed to describe long read alignments and predicted gene structures. In addition, the majority of gene prediction frameworks do not provide robust downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. These frameworks also lack consideration for functional attributes, such as the presence or absence of protein domains that can be used for gene model validation. To provide oversight to the increasing number of published genome annotations, we present a software package, the Gene Filtering, Analysis, and Conversion (gFACs), to filter, analyze, and convert predicted gene models and alignments. The software operates across a wide range of alignment, analysis, and gene prediction files with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space. gFACs is freely available and implemented in Perl with support from BioPerl libraries at https://gitlab.com/PlantGenomicsLab/gFACs.


Subjects
Gene Conversion, Genome, Genomics/methods, Molecular Sequence Annotation, Animals, Humans, Open Reading Frames, Software, Species Specificity