Results 1 - 20 of 31,490
1.
BMC Med Genomics ; 17(1): 85, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38622594

ABSTRACT

BACKGROUND: Multilocus pathogenic variants (MPVs) are genetic changes that affect multiple gene loci or regions of the genome, collectively leading to multiple molecular diagnoses. MPVs may also contribute to intrafamilial phenotypic variability between affected individuals within a nuclear family. In this study, we aim to gain further insights into the influence of MPVs on disease manifestation in individual research subjects and explore the complexities of the human genome within a familial context. METHODS: We conducted a systematic reanalysis of exome sequencing data and runs of homozygosity (ROH) regions of 47 sibling pairs previously diagnosed with various neurodevelopmental disorders (NDD). RESULTS: We found siblings with MPVs driven by long ROH regions in 8.5% of families (4/47). The patients with MPVs exhibited significantly higher FROH values (p-value = 1.4e-2) and larger total ROH length (p-value = 1.8e-2). Long ROH regions mainly contribute to this pattern; the siblings with MPVs have a larger total size of long ROH regions than their siblings in all families (p-value = 6.9e-3). By contrast, the short ROH regions in the siblings with MPVs are smaller in total size than those of their siblings with single-locus pathogenic variants (p-value = 0.029), and there are no statistically significant differences in medium ROH regions between sibling pairs (p-value = 0.52). CONCLUSION: This study sheds light on the significance of considering MPVs in families with affected sibling pairs and the role of ROH as an adjuvant tool in explaining clinical variability within families. Identifying individuals carrying MPVs may have implications for disease management, identification of possible disease risks to different family members, genetic counseling and exploring personalized treatment approaches.


Subjects
Human Genome , Siblings , Humans , Retrospective Studies , Homozygote , Single Nucleotide Polymorphism , Population Biological Variation , Genotype
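
The ROH quantities named in the abstract above (FROH, total ROH length, and totals per length class) can be illustrated with a minimal sketch. It assumes FROH is the summed ROH length divided by the analyzed genome length, which is the usual definition but is not spelled out in the abstract. The function names, the length thresholds, and the genome length in the usage lines are hypothetical placeholders, not values from the study.

```python
from typing import Dict, Iterable, Tuple

Segment = Tuple[int, int]  # (start, end) in base pairs, end-exclusive


def froh(segments: Iterable[Segment], genome_length_bp: int) -> float:
    """Fraction of the analyzed genome covered by ROH segments (assumed definition)."""
    total = sum(end - start for start, end in segments)
    return total / genome_length_bp


def total_by_class(
    segments: Iterable[Segment], short_max_bp: int, long_min_bp: int
) -> Dict[str, int]:
    """Sum ROH lengths per class; the thresholds are caller-supplied assumptions."""
    totals = {"short": 0, "medium": 0, "long": 0}
    for start, end in segments:
        length = end - start
        if length < short_max_bp:
            totals["short"] += length
        elif length >= long_min_bp:
            totals["long"] += length
        else:
            totals["medium"] += length
    return totals


# Toy data only: two ROH segments on a 100 Mb "genome".
toy_segments = [(0, 1_000_000), (5_000_000, 10_000_000)]
toy_froh = froh(toy_segments, 100_000_000)
toy_totals = total_by_class(toy_segments, short_max_bp=2_000_000, long_min_bp=4_000_000)
```

Comparing these per-sibling values within each family is the kind of contrast the abstract reports; the statistical test used there is not stated, so none is sketched here.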
2.
Int J Mol Sci ; 25(7), 2024 Mar 31.
Article in English | MEDLINE | ID: mdl-38612733

ABSTRACT

In the human genome, two short open reading frames (ORFs) separated by a transcriptional silencer and a small intervening sequence stem from the gene SMIM45. The two ORFs show different translational characteristics, and they also show divergent patterns of evolutionary development. The studies presented here describe the evolution of the components of SMIM45. One ORF consists of an ultra-conserved 68 amino acid (aa) sequence, whose origins can be traced beyond the evolutionary age of divergence of the elephant shark, ~462 MYA. The silencer also has ancient origins, but it has a complex and divergent pattern of evolutionary formation, as it overlaps both at the 68 aa ORF and the intervening sequence. The other ORF consists of 107 aa. It develops during primate evolution but is found to originate de novo from an ancestral non-coding genomic region with root origins within the Afrothere clade of placental mammals, whose evolutionary age of divergence is ~99 MYA. The formation of the complete 107 aa ORF during primate evolution is outlined, whereby sequence development is found to occur through biased mutations, with disruptive random mutations that also occur but lead to a dead-end. The 107 aa ORF is of particular significance, as there is evidence to suggest it is a protein that may function in human brain development. Its evolutionary formation presents a view of a human-specific ORF and its linked silencer that were predetermined in non-primate ancestral species. The genomic position of the silencer offers interesting possibilities for the regulation of transcription of the 107 aa ORF. A hypothesis is presented with respect to possible spatiotemporal expression of the 107 aa ORF in embryonic tissues.


Subjects
Human Genome , Placenta , Female , Pregnancy , Animals , Humans , Open Reading Frames/genetics , Amino Acid Sequence , Primates , Mammals
3.
Int J Mol Sci ; 25(7), 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38612882

ABSTRACT

Non-coding RNAs have been described as crucial regulators of gene expression and guards of cellular homeostasis. Some recent papers focused on vault RNAs, one of the classes of non-coding RNA, and their role in cell proliferation, tumorigenesis, apoptosis, cancer response to therapy, and autophagy, which makes them potential therapy targets in oncology. In the human genome, four vault RNA paralogues can be distinguished. They are associated with vault complexes, considered the largest ribonucleoprotein complexes. The protein part of these complexes consists of a major vault protein (MVP) and two minor vault proteins (vPARP and TEP1). The name of the complex, as well as vault RNA, comes from the hollow barrel-shaped structure that resembles a vault. Their sequence and structure are highly evolutionarily conserved and show many similarities in comparison with different species, but vault RNAs have various roles. Vaults were discovered in 1986, and their functions remained unclear for many years. Although not much is known about their contribution to cell metabolism, it has become clear that vault RNAs are involved in various processes and pathways associated with cancer progression and modulating cell functioning in normal and pathological stages. In this review, we discuss known functions of human vault RNAs in the context of cellular metabolism, emphasizing processes related to cancer and cancer therapy efficacy.


Subjects
Carcinogenesis , Human Genome , Humans , Neoplastic Cell Transformation , Apoptosis , RNA/genetics
4.
PLoS One ; 19(4): e0300545, 2024.
Article in English | MEDLINE | ID: mdl-38558075

ABSTRACT

Short tandem repeat (STR) variation is an often overlooked source of variation between genomes. STRs comprise about 3% of the human genome and are highly polymorphic. Some cause Mendelian disease, and others affect gene expression. Their contribution to common disease is not well-understood, but recent software tools designed to genotype STRs using short read sequencing data will help address this. Here, we compare software that genotypes common STRs and rarer STR expansions genome-wide, with the aim of applying them to population-scale genomes. By using the Genome-In-A-Bottle (GIAB) consortium and 1000 Genomes Project short-read sequencing data, we compare performance in terms of sequence length, depth, computing resources needed, genotyping accuracy and number of STRs genotyped. To ensure broad applicability of our findings, we also measure genotyping performance against a set of genomes from clinical samples with known STR expansions, and a set of STRs commonly used for forensic identification. We find that HipSTR, ExpansionHunter and GangSTR perform well in genotyping common STRs, including the CODIS 13 core STRs used for forensic analysis. GangSTR and ExpansionHunter outperform HipSTR for genotyping call rate and memory usage. ExpansionHunter denovo (EHdn), STRling and GangSTR outperformed STRetch for detecting expanded STRs, and EHdn and STRling used considerably less processor time compared to GangSTR. Analysis on shared genomic sequence data provided by the GIAB consortium allows future performance comparisons of new software approaches on a common set of data, facilitating comparisons and allowing researchers to choose the best software that fulfils their needs.


Subjects
Human Genome , Microsatellite Repeats , Humans , Microsatellite Repeats/genetics , Software , Genomics , Genotype , High-Throughput Nucleotide Sequencing
5.
Sci Adv ; 10(14): eadl6595, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38569022

ABSTRACT

Mutually beneficial partnerships between genomics researchers and North American Indigenous Nations are rare yet becoming more common. Here, we present one such partnership that provides insight into the peopling of the Americas and furnishes another line of evidence that can be used to further treaty and Indigenous rights. We show that the genomics of sampled individuals from the Blackfoot Confederacy belong to a previously undescribed ancient lineage that diverged from other genomic lineages in the Americas in Late Pleistocene times. Using multiple complementary forms of knowledge, we provide a scenario for Blackfoot population history that fits with oral tradition and provides a plausible model for the evolutionary process of the peopling of the Americas.


Subjects
Biological Evolution , Genomics , Humans , Americas , Human Genome
6.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38573185

ABSTRACT

BACKGROUND: Culture-free real-time sequencing of clinical metagenomic samples promises both rapid pathogen detection and antimicrobial resistance profiling. However, this approach introduces the risk of patient DNA leakage. To mitigate this risk, we need near-comprehensive removal of human DNA sequences at the point of sequencing, typically involving the use of resource-constrained devices. Existing benchmarks have largely focused on the use of standardized databases and largely ignored the computational requirements of depletion pipelines as well as the impact of human genome diversity. RESULTS: We benchmarked host removal pipelines on simulated and artificial real Illumina and Nanopore metagenomic samples. We found that construction of a custom kraken database containing diverse human genomes results in the best balance of accuracy and computational resource usage. In addition, we benchmarked pipelines using kraken and minimap2 for taxonomic classification of Mycobacterium reads using standard and custom databases. With a database representative of the Mycobacterium genus, both tools obtained improved specificity and sensitivity, compared to the standard databases for classification of Mycobacterium tuberculosis. Computational efficiency of these custom databases was superior to most standard approaches, allowing them to be executed on a laptop device. CONCLUSIONS: Customized pangenome databases provide the best balance of accuracy and computational efficiency when compared to standard databases for the task of human read removal and M. tuberculosis read classification from metagenomic samples. Such databases allow for execution on a laptop, without sacrificing accuracy, an especially important consideration in low-resource settings. We make all customized databases and pipelines freely available.


Subjects
Mycobacterium tuberculosis , Humans , Mycobacterium tuberculosis/genetics , Benchmarking , Factual Databases , Human Genome , Metagenome
7.
Sci Rep ; 14(1): 7988, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38580715

ABSTRACT

In the human genome, heterozygous sites refer to genomic positions with a different allele or nucleotide variant on the maternal and paternal chromosomes. Resolving these allelic differences by chromosomal copy, also known as phasing, is achievable on a short-read sequencer when using a library preparation method that captures long-range genomic information. TELL-Seq is a library preparation that captures long-range genomic information with the aid of molecular identifiers (barcodes). The same barcode is used to tag the reads derived from the same long DNA fragment within a range of up to 200 kilobases (kb), generating linked-reads. This strategy can be used to phase an entire genome. Here, we introduce a TELL-Seq protocol developed for targeted applications, enabling the phasing of enriched loci of varying sizes, purity levels, and heterozygosity. To validate this protocol, we phased 2-200 kb loci enriched with different methods: CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis for the longest fragments, CRISPR/Cas9-mediated protection from exonuclease digestion for mid-size fragments, and long PCR for the shortest fragments. All selected loci have known clinical relevance: BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA. Collectively, the analyses show that TELL-Seq can accurately phase 2-200 kb targets using a short-read sequencer.


Subjects
Genomics , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , DNA/genetics , Human Genome
8.
Yi Chuan ; 46(3): 209-218, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38632099

ABSTRACT

Long interspersed element-1 (LINE-1) is the only autonomous transposon in the human genome, and its retrotransposition alters cellular genome structure and function, leading to various severe diseases. As a key intermediate in the LINE-1 retrotransposition life cycle, LINE-1 mRNA is subject to host modifications that directly affect LINE-1 transposition. N6-methyladenosine (m6A), the most abundant epigenetic modification on eukaryotic RNA, is dynamically reversible. m6A modification is also found on LINE-1 mRNA, where it participates in regulating the whole LINE-1 replication cycle, affecting LINE-1 retrotransposition as well as the expression of adjacent genes and, in turn, genomic stability, cellular self-renewal, and differentiation potential, all of which play important roles in human development and disease. In this review, we summarize research progress on LINE-1 m6A modification, including its modification positions, patterns, and related mechanisms, hoping to provide new insight into mechanistic research on and treatment of related diseases.


Subjects
Adenosine/analogs & derivatives , Human Genome , RNA , Humans , Methylation , RNA/metabolism , Messenger RNA/genetics
9.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600664

ABSTRACT

Small open reading frames (smORFs) have been acknowledged to play various roles in essential biological pathways and to affect human health, from diabetes to tumorigenesis. Predicting smORFs in silico is a prerequisite for processing omics data. Here, we propose the smORF-coding-potential-predicting framework, sOCP, which provides functions to construct a model for predicting novel smORFs in a given species. The sOCP model constructed for human was based on in-frame features and the nucleotide bias around the start codon, and this small feature subset proved sufficient while avoiding the overfitting problems of more complicated models. It showed better prediction metrics than previous methods and correlated closely with experimental evidence in a heterogeneous dataset. The model was applied to Rattus norvegicus and exhibited satisfactory performance. We then scanned smORFs with ATG and non-ATG start codons from the human genome and generated a database containing about a million novel smORFs with coding potential. Around 72 000 smORFs are located in lncRNA regions of the genome. The smORF-encoded peptides may be involved in biological pathways rare for canonical proteins, including glucocorticoid catabolic process and the prokaryotic defense system. Our work provides a model and database for human smORF investigation and a convenient tool for further smORF prediction in other species.


Subjects
Human Genome , Peptides , Animals , Humans , Rats , Open Reading Frames , Peptides/genetics , Proteins/genetics
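
The scanning step mentioned in the abstract above (collecting smORFs with ATG and non-ATG start codons) might look roughly like the sketch below. This is not the sOCP implementation: it only enumerates forward-strand candidates and does no coding-potential scoring. The function name, the 100-codon length cap, and the choice of CTG as an example non-ATG start codon are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_smorfs(
    seq: str,
    start_codons: Sequence[str] = ("ATG",),
    max_codons: int = 100,  # assumed smORF size cap, not taken from the abstract
) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand candidate smORFs.

    `end` is exclusive and includes the stop codon; an ORF is kept only if it
    has at most `max_codons` codons before the stop (start codon included).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] not in start_codons:
            continue
        # Walk codon by codon in the same frame until the first stop codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                if (j - i) // 3 <= max_codons:
                    hits.append((i, j + 3, i % 3))
                break
    return hits


# Toy sequences only.
atg_hits = find_smorfs("CCATGAAATAGGG")
non_atg_hits = find_smorfs("CTGAAATAA", start_codons=("ATG", "CTG"))
```

A full pipeline would also scan the reverse complement and then score each candidate with a trained model; both steps are omitted here.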
10.
Database (Oxford) ; 2024, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38602506

ABSTRACT

Short Tandem Repeats (STRs) are genetic markers made up of repeating DNA sequences. Variations in STRs are widely studied in forensic analysis, population studies and genetic testing for a variety of neuromuscular disorders. Understanding polymorphic STR variation and its cause is crucial for deciphering genetic information and finding links to various disorders. In this paper, we present STRIDE-DB, a novel and unique platform to explore STR Instability and its Phenotypic Relevance, and a comprehensive database of STRs in the human genome. We utilized RepeatMasker to identify all the STRs in the human genome (hg19) and combined them with frequency data from the 1000 Genomes Project. STRIDE-DB, a user-friendly resource, plays a pivotal role in investigating the relationship between STR variation, instability and phenotype. By harnessing data from genome-wide association studies (GWAS), the ClinVar database, Alu loci, haploblocks in the genome and conservation of the STRs, it serves as an important tool for researchers exploring the variability of STRs in the human genome and its direct impact on phenotypes. STRIDE-DB has broad applicability and significance in research domains such as forensic science and repeat expansion disorders. Database URL: https://stridedb.igib.res.in.


Subjects
Human Genome , Genome-Wide Association Study , Humans , Human Genome/genetics , Phenotype , Microsatellite Repeats/genetics , Factual Databases
11.
Genome Biol Evol ; 16(3), 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38482769

ABSTRACT

Background selection describes the reduction in neutral diversity caused by selection against deleterious alleles at other loci. It is typically assumed that the purging of deleterious alleles affects linked neutral variants, and indeed simulations typically only treat a genomic window. However, background selection at unlinked loci also depresses neutral diversity. In agreement with previous analytical approximations, in our simulations of a human-like genome with a realistically high genome-wide deleterious mutation rate, the effects of unlinked background selection exceed those of linked background selection. Background selection reduces neutral genetic diversity by a factor that is independent of census population size. Outside of genic regions, the strength of background selection increases with the mean selection coefficient, contradicting the linked theory but in agreement with the unlinked theory. Neutral diversity within genic regions is fairly independent of the strength of selection. Deleterious genetic load among haploid individuals is underdispersed, indicating nonindependent evolution of deleterious mutations. Empirical evidence for underdispersion was previously interpreted as evidence for global epistasis, but we recover it from a non-epistatic model.


Subjects
Genetic Variation , Genetic Selection , Humans , Mutation , Human Genome , Alleles , Genetic Models
12.
BMC Genomics ; 25(1): 273, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38475709

ABSTRACT

BACKGROUND: There are two major genetic types of Epstein-Barr Virus (EBV): type 1 (EBV-1) and type 2 (EBV-2). EBV functions by manipulating gene expression in host B cells, using virus-encoded gene regulatory proteins including Epstein-Barr Nuclear Antigen 2 (EBNA2). While type 1 EBNA2 is known to interact with human transcription factors (hTFs) such as RBPJ, EBF1, and SPI1 (PU.1), type 2 EBNA2 shares only ~ 50% amino acid identity with type 1 and thus may have distinct binding partners, human genome binding locations, and functions. RESULTS: In this study, we examined genome-wide EBNA2 binding in EBV-1 and EBV-2 transformed human B cells to identify shared and unique EBNA2 interactions with the human genome, revealing thousands of type-specific EBNA2 ChIP-seq peaks. Computational predictions based on hTF motifs and subsequent ChIP-seq experiments revealed that both type 1 and 2 EBNA2 co-occupy the genome with SPI1 and AP-1 (BATF and JUNB) hTFs. However, type 1 EBNA2 showed preferential co-occupancy with EBF1, and type 2 EBNA2 preferred RBPJ. These differences in hTF co-occupancy revealed possible mechanisms underlying type-specific gene expression of known EBNA2 human target genes: MYC (shared), CXCR7 (type 1 specific), and CD21 (type 2 specific). Both type 1 and 2 EBNA2 binding events were enriched at systemic lupus erythematosus (SLE) and multiple sclerosis (MS) risk loci, while primary biliary cholangitis (PBC) risk loci were specifically enriched for type 2 peaks. CONCLUSIONS: This study reveals extensive type-specific EBNA2 interactions with the human genome, possible differences in EBNA2 interaction partners, and a possible new role for type 2 EBNA2 in autoimmune disorders. Our results highlight the importance of considering EBV type in the control of human gene expression and disease-related investigations.


Subjects
Epstein-Barr Virus Infections , Human Herpesvirus 4 , Humans , Human Herpesvirus 4/genetics , Human Herpesvirus 4/metabolism , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/metabolism , Human Genome , Epstein-Barr Virus Nuclear Antigens/genetics , Epstein-Barr Virus Nuclear Antigens/metabolism , Viral Proteins/genetics , Transcription Factors/metabolism
13.
BMC Genomics ; 25(1): 318, 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38549092

ABSTRACT

BACKGROUND: Detecting structural variations (SVs) at the population level using next-generation sequencing (NGS) requires substantial computational resources and processing time. Here, we compared the performances of 11 SV callers: Delly, Manta, GridSS, Wham, Sniffles, Lumpy, SvABA, Canvas, CNVnator, MELT, and INSurVeyor. These SV callers have been recently published and have been widely employed for processing massive whole-genome sequencing datasets. We evaluated the accuracy, sequence depth, running time, and memory usage of the SV callers. RESULTS: Notably, several callers exhibited better calling performance for deletions than for duplications, inversions, and insertions. Among the SV callers, Manta identified deletion SVs with better performance and efficient computing resources, and both Manta and MELT demonstrated relatively good precision regarding calling insertions. We confirmed that the copy number variation callers, Canvas and CNVnator, exhibited better performance in identifying long duplications as they employ the read-depth approach. Finally, we also verified the genotypes inferred from each SV caller using a phased long-read assembly dataset, and Manta showed the highest concordance in terms of the deletions and insertions. CONCLUSIONS: Our findings provide a comprehensive understanding of the accuracy and computational efficiency of SV callers, thereby facilitating integrative analysis of SV profiles in diverse large-scale genomic datasets.


Subjects
DNA Copy Number Variations , Genomics , Humans , Whole Genome Sequencing , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis , Human Genome , Genomic Structural Variation
14.
PLoS Genet ; 20(3): e1011144, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38507461

ABSTRACT

Across the human genome, there are large-scale fluctuations in genetic diversity caused by the indirect effects of selection. This "linked selection signal" reflects the impact of selection according to the physical placement of functional regions and recombination rates along chromosomes. Previous work has shown that purifying selection acting against the steady influx of new deleterious mutations at functional portions of the genome shapes patterns of genomic variation. To date, statistical efforts to estimate purifying selection parameters from linked selection models have relied on classic Background Selection theory, which is only applicable when new mutations are so deleterious that they cannot fix in the population. Here, we develop a statistical method based on a quantitative genetics view of linked selection, that models how polygenic additive fitness variance distributed along the genome increases the rate of stochastic allele frequency change. By jointly predicting the equilibrium fitness variance and substitution rate due to both strong and weakly deleterious mutations, we estimate the distribution of fitness effects (DFE) and mutation rate across three geographically distinct human samples. While our model can accommodate weaker selection, we find evidence of strong selection operating similarly across all human samples. Although our quantitative genetic model of linked selection fits better than previous models, substitution rates of the most constrained sites disagree with observed divergence levels. We find that a model incorporating selective interference better predicts observed divergence in conserved regions, but overall our results suggest uncertainty remains about the processes generating fitness variation in humans.


Subjects
Genetic Models , Genetic Selection , Humans , Molecular Evolution , Gene Frequency/genetics , Mutation , Human Genome/genetics , Genetic Variation , Genetic Fitness
15.
Cell Mol Life Sci ; 81(1): 157, 2024 Mar 31.
Article in English | MEDLINE | ID: mdl-38556602

ABSTRACT

Over half of human genomic DNA is composed of repetitive sequences generated throughout evolution by prolific mobile genetic parasites called transposable elements (TEs). Long disregarded as "junk" or "selfish" DNA, TEs are increasingly recognized as formative elements in genome evolution, wired intimately into the structure and function of the human genome. Advances in sequencing technologies and computational methods have ushered in an era of unprecedented insight into how TE activity impacts human biology in health and disease. Here we discuss the current views on how TEs have shaped the regulatory landscape of the human genome, how TE activity is implicated in human cancers, and how recent findings motivate novel strategies to leverage TE activity for improved cancer therapy. Given the crucial role of methodological advances in TE biology, we pair our conceptual discussions with an in-depth review of the inherent technical challenges in studying repeats, specifically related to structural variation, expression analyses, and chromatin regulation. Lastly, we provide a catalog of existing and emerging assays and bioinformatic software that altogether are enabling the most sophisticated and comprehensive investigations yet into the regulation and function of interspersed repeats in cancer genomes.


Subjects
DNA Transposable Elements , Neoplasms , Humans , DNA Transposable Elements/genetics , Computational Biology , Human Genome , Neoplasms/genetics , Molecular Evolution
16.
Mol Genet Genomics ; 299(1): 37, 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38494535

ABSTRACT

Identity by descent (IBD) segments, uninterrupted DNA segments derived from the same ancestral chromosomes, are widely used as indicators of relationships in genetics. A great deal of research focuses on IBD segments between related pairs, while statistical analyses of segments in unrelated individuals are rare. In this study, we investigated the basic informative features of IBD segments in unrelated pairs in Chinese populations from the 1000 Genomes Project. A total of 5922 IBD segments in Chinese interpopulation unrelated individual pairs were detected via IBIS, and the average IBD segment length was 3.71 Mb. It was found that 17.86% of unrelated pairs shared at least one IBD segment in the Chinese cohort. Furthermore, a total of 49 chromosomal regions where IBD segments clustered in high abundance were identified, which might be sharing hotspots in the human genome. Such regions could also be observed in other ancestry populations, which implies that similar IBD backgrounds also exist. Altogether, these results demonstrate the distribution of common background IBD segments, which helps improve the accuracy of pedigree studies based on IBD analysis.


Subjects
Asian People , Human Genome , Humans , Asian People/genetics , Human Genome/genetics , Pedigree , Research Design , China
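
The summary statistics reported in the abstract above (number of segments, mean segment length, and share of unrelated pairs with at least one segment) can be derived from a segment table along the lines of this sketch. The input layout, the function name, and the toy values are hypothetical; the sketch does not parse IBIS output and does not reproduce the study's figures.

```python
from typing import Dict, Iterable, Tuple


def summarize_ibd(
    segments: Iterable[Tuple[str, float]], n_pairs: int
) -> Dict[str, float]:
    """Summarize IBD segments given as (pair_id, length_mb) records.

    `n_pairs` is the total number of unrelated pairs examined, including
    pairs that share no segment at all.
    """
    records = list(segments)
    sharing_pairs = {pair_id for pair_id, _ in records}
    mean_len = sum(length for _, length in records) / len(records) if records else 0.0
    return {
        "n_segments": len(records),
        "fraction_pairs_sharing": len(sharing_pairs) / n_pairs,
        "mean_length_mb": mean_len,
    }


# Toy data only: three segments spread over two of four pairs.
toy_summary = summarize_ibd(
    [("A-B", 2.0), ("A-B", 4.0), ("C-D", 3.0)], n_pairs=4
)
```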
17.
Genome Biol ; 25(1): 69, 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38468278

ABSTRACT

BACKGROUND: Long-read sequencing can enable the detection of base modifications, such as CpG methylation, in single molecules of DNA. The most commonly used methods for long-read sequencing are nanopore sequencing developed by Oxford Nanopore Technologies (ONT) and single molecule real-time (SMRT) sequencing developed by Pacific Biosciences (PacBio). In this study, we systematically compare the performance of CpG methylation detection from long-read sequencing. RESULTS: We demonstrate that CpG methylation detection from 7179 nanopore-sequenced DNA samples is highly accurate and consistent with 132 oxidative bisulfite-sequenced (oxBS) samples, isolated from the same blood draws. We introduce quality filters for CpGs that further enhance the accuracy of CpG methylation detection from nanopore-sequenced DNA, while removing at most 30% of CpGs. We evaluate the per-site performance of CpG methylation detection across different genomic features and CpG methylation rates and demonstrate how the latest R10.4 flowcell chemistry and base-calling algorithms improve methylation detection from nanopore sequencing. Additionally, we show how the methylation detection of 50 SMRT-sequenced genomes compares to nanopore sequencing and oxBS. CONCLUSIONS: This study provides the first systematic comparison of CpG methylation detection tools for long-read sequencing methods. We compare two commonly used computational methods for the detection of CpG methylation in a large number of nanopore genomes, including samples sequenced using the latest R10.4 nanopore flowcell chemistry and 50 SMRT sequenced samples. We provide insights into the strengths and limitations of each sequencing method as well as recommendations for standardization and evaluation of tools designed for genome-scale modified base detection using long-read sequencing.


Subjects
DNA Methylation , Human Genome , Humans , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , DNA
18.
Comput Biol Med ; 171: 108230, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38442554

ABSTRACT

Interpreting single-cell chromatin accessibility data is crucial for understanding intercellular heterogeneity regulation. Despite the progress in computational methods for analyzing this data, there is still a lack of a comprehensive analytical framework and a user-friendly online analysis tool. To fill this gap, we developed a pre-trained deep learning-based framework, single-cell auto-correlation transformers (scAuto). Following DNABERT's methodology of pre-training and fine-tuning, scAuto learns a general understanding of DNA sequence grammar by being pre-trained on the unlabeled human genome via self-supervision; it is then transferred to the single-cell chromatin accessibility analysis task of scATAC-seq data for supervised fine-tuning. We extensively validated scAuto on the Buenrostro2018 dataset, demonstrating its superior performance on chromatin accessibility prediction, single-cell clustering, and data denoising. Based on scAuto, we further developed an interactive web server for single-cell chromatin accessibility data analysis. It integrates tutorial-style interfaces for those with limited programming skills. The platform is accessible at http://zhanglab.icaup.cn. We expect this work to help analyze single-cell chromatin accessibility data and facilitate the development of precision medicine.


Subjects
Chromatin , DNA , Humans , DNA Sequence Analysis , Human Genome , Data Analysis
19.
Science ; 383(6688): 1215-1222, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38484065

ABSTRACT

DNA replication is initiated at multiple loci to ensure timely duplication of eukaryotic genomes. Sister replication forks progress bidirectionally, and replication terminates when two convergent forks encounter one another. To investigate the coordination of replication forks, we developed a replication-associated in situ HiC method to capture chromatin interactions involving nascent DNA. We identify more than 2000 fountain-like structures of chromatin contacts in human and mouse genomes, indicative of coupling of DNA replication forks. Replication fork interaction not only occurs between sister forks but also involves forks from two distinct origins to predetermine replication termination. Termination-associated chromatin fountains are sensitive to replication stress and lead to coupled forks-associated genomic deletions in cancers. These findings reveal the spatial organization of DNA replication forks within the chromatin context.


Subjects
Chromatin , DNA Replication , DNA , Human Genome , Animals , Humans , Mice , Chromatin/chemistry , DNA/chemistry , DNA/genetics , Protein Conformation , High-Throughput Nucleotide Sequencing
20.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38444093

ABSTRACT

MOTIVATION: Structural variants (SVs) play a causal role in numerous diseases but can be difficult to detect and accurately genotype (determine zygosity) with short-read genome sequencing data (SRS). Improving SV genotyping accuracy in SRS data, particularly for the many SVs first detected with long-read sequencing, will improve our understanding of genetic variation. RESULTS: NPSV-deep is a deep learning-based approach for genotyping previously reported insertion and deletion SVs that recasts this task as an image similarity problem. NPSV-deep predicts the SV genotype based on the similarity between pileup images generated from the actual SRS data and matching SRS simulations. We show that NPSV-deep consistently matches or improves upon the state-of-the-art for SV genotyping accuracy across different SV call sets, samples and variant types, including a 25% reduction in genotyping errors for the Genome-in-a-Bottle (GIAB) high-confidence SVs. NPSV-deep is not limited to the SVs as described; it improves deletion genotyping concordance a further 1.5 percentage points for GIAB SVs (92%) by automatically correcting imprecise/incorrectly described SVs. AVAILABILITY AND IMPLEMENTATION: Python/C++ source code and pre-trained models freely available at https://github.com/mlinderm/npsv2.


Subjects
Deep Learning , Humans , Genotype , Human Genome , Software , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing , Genomic Structural Variation