1.
Cell ; 187(5): 1024-1037, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38290514

ABSTRACT

This perspective focuses on advances in genome technology over the last 25 years and their impact on germline variant discovery within the field of human genetics. The field has witnessed tremendous technological advances from microarrays to short-read sequencing and now long-read sequencing. Each technology has provided genome-wide access to different classes of human genetic variation. We are now on the verge of comprehensive variant detection of all forms of variation for the first time with a single assay. We predict that this transition will further transform our understanding of human health and biology and, more importantly, provide novel insights into the dynamic mutational processes shaping our genomes.


Subjects
Genomic Structural Variation, Genomics, Humans, Genomics/methods, Germ-Line Mutation, Mutation, Technology
2.
Cell ; 187(6): 1547-1562.e13, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38428424

ABSTRACT

Using multiple long-read sequencing technologies, we sequenced and assembled the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ∼27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.


Subjects
Genome, Primates, Animals, Humans, Base Sequence, Primates/classification, Primates/genetics, Biological Evolution, DNA Sequence Analysis, Genomic Structural Variation
3.
Cell ; 186(11): 2438-2455.e22, 2023 05 25.
Article in English | MEDLINE | ID: mdl-37178687

ABSTRACT

The generation of distinct messenger RNA isoforms through alternative RNA processing modulates the expression and function of genes, often in a cell-type-specific manner. Here, we assess the regulatory relationships between transcription initiation, alternative splicing, and 3' end site selection. Applying long-read sequencing to accurately represent even the longest transcripts from end to end, we quantify mRNA isoforms in Drosophila tissues, including the transcriptionally complex nervous system. We find that in Drosophila heads, as well as in human cerebral organoids, 3' end site choice is globally influenced by the site of transcription initiation (TSS). "Dominant promoters," characterized by specific epigenetic signatures including p300/CBP binding, impose a transcriptional constraint to define splice and polyadenylation variants. In vivo deletion or overexpression of dominant promoters as well as p300/CBP loss disrupted the 3' end expression landscape. Our study demonstrates the crucial impact of TSS choice on the regulation of transcript diversity and tissue identity.


Subjects
Alternative Splicing, RNA Isoforms, Transcription Initiation Site, Humans, Polyadenylation, Genetic Promoter Regions, RNA Isoforms/metabolism, Messenger RNA/metabolism
4.
Cell ; 182(1): 145-161.e23, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32553272

ABSTRACT

Structural variants (SVs) underlie important crop improvement and domestication traits. However, resolving the extent, diversity, and quantitative impact of SVs has been challenging. We used long-read nanopore sequencing to capture 238,490 SVs in 100 diverse tomato lines. This panSV genome, along with 14 new reference assemblies, revealed large-scale intermixing of diverse genotypes, as well as thousands of SVs intersecting genes and cis-regulatory regions. Hundreds of SV-gene pairs exhibit subtle and significant expression changes, which could broadly influence quantitative trait variation. By combining quantitative genetics with genome editing, we show how multiple SVs that changed gene dosage and expression levels modified fruit flavor, size, and production. In the last example, higher order epistasis among four SVs affecting three related transcription factors allowed introduction of an important harvesting trait in modern tomato. Our findings highlight the underexplored role of SVs in genotype-to-phenotype relationships and their widespread importance and utility in crop improvement.


Subjects
Agricultural Crops/genetics, Plant Gene Expression Regulation, Genomic Structural Variation, Solanum lycopersicum/genetics, Alleles, Cytochrome P-450 Enzyme System/genetics, Ecotype, Genetic Epistasis, Fruit/genetics, Gene Duplication, Plant Genome, Genotype, Inbreeding, Molecular Sequence Annotation, Phenotype, Plant Breeding, Quantitative Trait Loci/genetics
5.
Cell ; 176(6): 1310-1324.e10, 2019 03 07.
Article in English | MEDLINE | ID: mdl-30827684

ABSTRACT

DNA rearrangements resulting in human genome structural variants (SVs) are caused by diverse mutational mechanisms. We used long- and short-read sequencing technologies to investigate end products of de novo chromosome 17p11.2 rearrangements and query the molecular mechanisms underlying both recurrent and non-recurrent events. Evidence for an increased rate of clustered single-nucleotide variant (SNV) mutation in cis with non-recurrent rearrangements was found. Indel and SNV formation are associated with both copy-number gains and losses of 17p11.2, occur up to ∼1 Mb away from the breakpoint junctions, and favor C > G transversion substitutions; results suggest that single-stranded DNA is formed during the genesis of the SV and provide compelling support for a microhomology-mediated break-induced replication (MMBIR) mechanism for SV formation. Our data show an additional mutational burden of MMBIR consisting of hypermutation confined to the locus and manifesting as SNVs and indels predominantly within genes.


Subjects
Human Chromosome Pair 17, Mutation, Multiple Abnormalities/genetics, Chromosome Breakpoints, Chromosome Disorders/genetics, Chromosome Duplication/genetics, DNA Copy Number Variations, DNA Repair/genetics, DNA Replication, Gene Rearrangement, Human Genome, Genomic Structural Variation, Humans, INDEL Mutation, Genetic Models, Single Nucleotide Polymorphism, Genetic Recombination, DNA Sequence Analysis/methods, Smith-Magenis Syndrome/genetics
6.
Mol Cell ; 82(1): 209-217.e7, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34951964

ABSTRACT

Extrachromosomal circular DNA (eccDNA) is common in somatic tissue, but its existence and effects in the human germline are unexplored. We used microscopy, long-read DNA sequencing, and new analytic methods to document thousands of eccDNAs from human sperm. EccDNAs derived from all genomic regions and mostly contained a single DNA fragment, although some consisted of multiple fragments. The generation of eccDNA inversely correlates with the meiotic recombination rate, and chromosomes with high coding-gene density and Alu element abundance form the least eccDNA. Analysis of insertions in human genomes further indicates that eccDNA can persist in the human germline when the circular molecules reinsert themselves into the chromosomes. Our results suggest that eccDNA has transient and permanent effects on the germline. They explain how differences in the physical and genetic map might arise and offer an explanation of how Alu elements coevolved with genes to protect genome integrity against deleterious mutations producing eccDNA.


Subjects
Human Chromosomes, Circular DNA/metabolism, Meiosis, Genetic Recombination, Spermatozoa/metabolism, Alu Elements, Circular DNA/genetics, Molecular Evolution, Developmental Gene Expression Regulation, Humans, Male, Mutation
7.
Mol Cell ; 81(5): 998-1012.e7, 2021 03 04.
Article in English | MEDLINE | ID: mdl-33440169

ABSTRACT

Pre-mRNA processing steps are tightly coordinated with transcription in many organisms. To determine how co-transcriptional splicing is integrated with transcription elongation and 3' end formation in mammalian cells, we performed long-read sequencing of individual nascent RNAs and precision run-on sequencing (PRO-seq) during mouse erythropoiesis. Splicing was not accompanied by transcriptional pausing and was detected when RNA polymerase II (Pol II) was within 75-300 nucleotides of 3' splice sites (3'SSs), often during transcription of the downstream exon. Interestingly, several hundred introns displayed abundant splicing intermediates, suggesting that splicing delays can take place between the two catalytic steps. Overall, splicing efficiencies were correlated among introns within the same transcript, and intron retention was associated with inefficient 3' end cleavage. Remarkably, a thalassemia patient-derived mutation introducing a cryptic 3'SS improved both splicing and 3' end cleavage of individual β-globin transcripts, demonstrating functional coupling between the two co-transcriptional processes as a determinant of productive gene output.


Subjects
Erythroid Cells/metabolism, Erythropoiesis/genetics, RNA Polymerase II/genetics, RNA Splicing, Genetic Transcription Elongation, beta-Globins/genetics, Animals, Base Sequence, Cell Differentiation, Tumor Cell Line, Erythroid Cells/cytology, Exons, Humans, Introns, Leukocytes/cytology, Leukocytes/metabolism, Mice, Mutation, RNA Cleavage, RNA Polymerase II/metabolism, RNA Splice Sites, Spliceosomes/genetics, Spliceosomes/metabolism, beta-Globins/deficiency, beta-Thalassemia/genetics, beta-Thalassemia/metabolism, beta-Thalassemia/pathology
8.
Am J Hum Genet ; 111(3): 544-561, 2024 03 07.
Article in English | MEDLINE | ID: mdl-38307027

ABSTRACT

Cervical cancer is caused by human papillomavirus (HPV) infection, has few approved targeted therapeutics, and is the most common cause of cancer death in low-resource countries. We characterized 19 cervical and four head and neck cancer cell lines using long-read DNA and RNA sequencing and identified the HPV types, HPV integration sites, chromosomal alterations, and cancer driver mutations. Structural variation analysis revealed telomeric deletions associated with DNA inversions resulting from breakage-fusion-bridge (BFB) cycles. BFB is a common mechanism of chromosomal alterations in cancer, and our study applies long-read sequencing to this important chromosomal rearrangement type. Analysis of the inversion sites revealed staggered ends consistent with exonuclease digestion of the DNA after breakage. Some BFB events are complex, involving inter- or intra-chromosomal insertions or rearrangements. None of the BFB breakpoints had telomere sequences added to resolve the dicentric chromosomes, and only one BFB breakpoint showed chromothripsis. Five cell lines have a chromosomal region 11q BFB event, with YAP1-BIRC3-BIRC2 amplification. Indeed, YAP1 amplification is associated with a 10-year-earlier age of diagnosis of cervical cancer and is three times more common in African American women. This suggests that individuals with cervical cancer and YAP1-BIRC3-BIRC2 amplification, especially those of African ancestry, might benefit from targeted therapy. In summary, we uncovered valuable insights into the mechanisms and consequences of BFB cycles in cervical cancer using long-read sequencing.


Subjects
Papillomavirus Infections, Uterine Cervical Neoplasms, Female, Humans, Uterine Cervical Neoplasms/genetics, Chromosome Aberrations, Telomere/genetics, DNA
9.
Am J Hum Genet ; 111(9): 1914-1931, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39079539

ABSTRACT

A major fraction of loci identified by genome-wide association studies (GWASs) mediate alternative splicing, but mechanistic interpretation is hindered by the technical limitations of short-read RNA sequencing (RNA-seq), which cannot directly link splicing events to full-length protein isoforms. Long-read RNA-seq is a powerful tool to characterize transcript isoforms and, more recently, to infer protein isoform existence. Here, we present an approach that integrates information from GWASs, splicing quantitative trait loci (sQTLs), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the ultimate protein isoform products they encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes that colocalized with BMD associations (H4PP ≥ 0.75). We generated PacBio Iso-Seq data (N = ∼22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were unannotated. By casting the sQTLs onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Overall, we found 74 sQTLs that influenced isoforms likely impacted by nonsense-mediated decay and 190 that potentially resulted in the expression of unannotated protein isoforms. Finally, we functionally validated colocalizing sQTLs in TPM2, in which siRNA-mediated knockdown in osteoblasts showed two TPM2 isoforms with opposing effects on mineralization but exhibited no effect upon knockdown of the entire gene. Our approach should generalize across diverse clinical traits and provide insights into protein isoform activities modulated by GWAS loci.


Subjects
Alternative Splicing, Bone Density, Genome-Wide Association Study, Protein Isoforms, Proteogenomics, Quantitative Trait Loci, Humans, Protein Isoforms/genetics, Bone Density/genetics, Alternative Splicing/genetics, Proteogenomics/methods, Osteoblasts/metabolism, Single Nucleotide Polymorphism
10.
Genome Res ; 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39358016

ABSTRACT

DNA modifications in bacteria vary in type and distribution and play crucial functional roles. Current methods for detecting bacterial DNA modifications via nanopore sequencing typically involve comparing raw current signals to a methylation-free control. In this study, we found that bacterial DNA modification induces errors in nanopore reads and that these errors occur on only one strand, showing a strand-specific bias. Leveraging this discovery, we developed Hammerhead, a pipeline designed for de novo methylation discovery that circumvents the necessity of raw signal inference and a methylation-free control. The majority (14 out of 16) of the identified motifs can be validated by raw signal comparison methods or by identifying corresponding methyltransferases in bacteria. Additionally, we included a novel polishing strategy employing duplex reads to correct modification-induced errors in bacterial genome assemblies, achieving a reduction of over 85% in such errors. In summary, Hammerhead enables users to effectively locate bacterial DNA methylation sites from nanopore FASTQ/FASTA reads and thus holds promise as a routine pipeline for a wide range of nanopore sequencing applications, such as genome assembly, metagenomic binning, decontaminating eukaryotic genome assembly, and functional analysis for DNA modifications.

11.
Trends Genet ; 39(1): 31-33, 2023 01.
Article in English | MEDLINE | ID: mdl-36207147

ABSTRACT

Disturbance in the regulation of transcript structure plays a crucial role in human disease. In a recent study, Glinos et al. characterized allele-specific transcript alterations in long-read RNA sequencing (RNA-seq) data derived from multiple human tissues and provide a high-resolution view of how disease-associated genetic variants affect transcript structure.


Subjects
RNA, Transcriptome, Humans, Transcriptome/genetics, Alleles, RNA/genetics, RNA Sequence Analysis, Base Sequence, High-Throughput Nucleotide Sequencing, Gene Expression Profiling
12.
Trends Genet ; 39(9): 649-671, 2023 09.
Article in English | MEDLINE | ID: mdl-37230864

ABSTRACT

Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy and bioinformatics tools have strongly improved. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.


Subjects
Genomics, High-Throughput Nucleotide Sequencing, Humans, High-Throughput Nucleotide Sequencing/methods, Genomics/methods, DNA Sequence Analysis/methods, Computational Biology, Gene Expression Profiling/methods
13.
Annu Rev Genomics Hum Genet ; 24: 109-132, 2023 08 25.
Article in English | MEDLINE | ID: mdl-37075062

ABSTRACT

DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.


Subjects
DNA, Nucleic Acid Repetitive Sequences, Humans, DNA Sequence Analysis/methods, Base Sequence, Mutation, DNA/genetics
14.
Am J Hum Genet ; 110(8): 1343-1355, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37541188

ABSTRACT

Despite significant progress in unraveling the genetic causes of neurodevelopmental disorders (NDDs), a substantial proportion of individuals with NDDs remain without a genetic diagnosis after microarray and/or exome sequencing. Here, we aimed to assess the power of short-read genome sequencing (GS), complemented with long-read GS, to identify causal variants in participants with NDD from the National Institute for Health and Care Research (NIHR) BioResource project. Short-read GS was conducted on 692 individuals (489 affected and 203 unaffected relatives) from 465 families. Additionally, long-read GS was performed on five affected individuals who had structural variants (SVs) in technically challenging regions, had complex SVs, or required distal variant phasing. Causal variants were identified in 36% of affected individuals (177/489), and a further 23% (112/489) had a variant of uncertain significance after multiple rounds of re-analysis. Among all reported variants, 88% (333/380) were coding nuclear SNVs or insertions and deletions (indels), and the remainder were SVs, non-coding variants, and mitochondrial variants. Furthermore, long-read GS facilitated the resolution of challenging SVs and invalidated variants of difficult interpretation from short-read GS. This study demonstrates the value of short-read GS, complemented with long-read GS, in investigating the genetic causes of NDDs. GS provides a comprehensive and unbiased method of identifying all types of variants throughout the nuclear and mitochondrial genomes in individuals with NDD.


Subjects
Human Genome, Neurodevelopmental Disorders, Humans, Human Genome/genetics, Chromosome Mapping, Base Sequence, INDEL Mutation, Neurodevelopmental Disorders/genetics
15.
Am J Hum Genet ; 110(8): 1229-1248, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37541186

ABSTRACT

Despite advances in clinical genetic testing, including the introduction of exome sequencing (ES), more than 50% of individuals with a suspected Mendelian condition lack a precise molecular diagnosis. Clinical evaluation is increasingly undertaken by specialists outside of clinical genetics, often occurring in a tiered fashion and typically ending after ES. The current diagnostic rate reflects multiple factors, including technical limitations, incomplete understanding of variant pathogenicity, missing genotype-phenotype associations, complex gene-environment interactions, and reporting differences between clinical labs. Maintaining a clear understanding of the rapidly evolving landscape of diagnostic tests beyond ES, and their limitations, presents a challenge for non-genetics professionals. Newer tests, such as short-read genome or RNA sequencing, can be challenging to order, and emerging technologies, such as optical genome mapping and long-read DNA sequencing, are not available clinically. Furthermore, there is no clear guidance on the next best steps after inconclusive evaluation. Here, we review why a clinical genetic evaluation may be negative, discuss questions to be asked in this setting, and provide a framework for further investigation, including the advantages and disadvantages of new approaches that are nascent in the clinical sphere. We present a guide for the next best steps after inconclusive molecular testing based upon phenotype and prior evaluation, including when to consider referral to research consortia focused on elucidating the underlying cause of rare unsolved genetic disorders.


Subjects
Exome, Genetic Testing, Humans, Exome/genetics, DNA Sequence Analysis, Phenotype, Exome Sequencing, Rare Diseases
16.
Am J Hum Genet ; 110(2): 240-250, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36669496

ABSTRACT

Spinal muscular atrophy, a leading cause of early infant death, is caused by bi-allelic mutations of SMN1. Sequence analysis of SMN1 is challenging due to high sequence similarity with its paralog SMN2. Both genes have variable copy numbers across populations. Furthermore, without pedigree information, it is currently not possible to identify silent carriers (2+0) with two copies of SMN1 on one chromosome and zero copies on the other. We developed Paraphase, an informatics method that identifies full-length SMN1 and SMN2 haplotypes, determines the gene copy numbers, and calls phased variants using long-read PacBio HiFi data. The SMN1 and SMN2 copy-number calls by Paraphase are highly concordant with orthogonal methods (99.2% for SMN1 and 100% for SMN2). We applied Paraphase to 438 samples across 5 ethnic populations to conduct a population-wide haplotype analysis of these highly homologous genes. We identified major SMN1 and SMN2 haplogroups and characterized their co-segregation through pedigree-based analyses. We identified two SMN1 haplotypes that form a common two-copy SMN1 allele in African populations. Testing positive for these two haplotypes in an individual with two copies of SMN1 gives a silent carrier risk of 88.5%, which is significantly higher than the currently used marker (1.7%-3.0%). Extending beyond simple copy-number testing, Paraphase can detect pathogenic variants and enable potential haplotype-based screening of silent carriers through statistical phasing of haplotypes into alleles. Future analysis of larger population data will allow identification of more diverse haplotypes and genetic markers for silent carriers.


Subjects
Spinal Muscular Atrophy, Infant, Humans, Spinal Muscular Atrophy/genetics, Spinal Muscular Atrophy/diagnosis, Mutation, Gene Dosage, Pedigree, Sequence Analysis, Survival of Motor Neuron 1 Protein/genetics, Survival of Motor Neuron 2 Protein/genetics
17.
Am J Hum Genet ; 110(3): 427-441, 2023 03 02.
Article in English | MEDLINE | ID: mdl-36787739

ABSTRACT

Ewing sarcoma (EwS) is a rare bone and soft tissue malignancy driven by chromosomal translocations encoding chimeric transcription factors, such as EWSR1-FLI1, that bind GGAA motifs forming novel enhancers that alter nearby expression. We propose that germline microsatellite variation at the 6p25.1 EwS susceptibility locus could impact downstream gene expression and EwS biology. We performed targeted long-read sequencing of EwS blood DNA to characterize variation and genomic features important for EWSR1-FLI1 binding. We identified 50 microsatellite alleles at 6p25.1 and observed that EwS-affected individuals had longer alleles (>135 bp) with more GGAA repeats. The 6p25.1 GGAA microsatellite showed chromatin features of an EWSR1-FLI1 enhancer and regulated expression of RREB1, a transcription factor associated with RAS/MAPK signaling. RREB1 knockdown reduced proliferation and clonogenic potential and reduced expression of cell cycle and DNA replication genes. Our integrative analysis at 6p25.1 details increased binding of longer GGAA microsatellite alleles with acquired EWSR1-FLI1 to promote Ewing sarcomagenesis by RREB1-mediated proliferation.


Subjects
Bone Neoplasms, Ewing Sarcoma, Humans, Alleles, Bone Neoplasms/genetics, Bone Neoplasms/pathology, Tumor Cell Line, Neoplastic Gene Expression Regulation, Oncogene Fusion Proteins/genetics, Oncogene Fusion Proteins/metabolism, Proto-Oncogene Protein c-fli-1/genetics, Proto-Oncogene Protein c-fli-1/metabolism, RNA-Binding Protein EWS/genetics, RNA-Binding Protein EWS/metabolism, Ewing Sarcoma/genetics, Ewing Sarcoma/metabolism, Ewing Sarcoma/pathology
18.
RNA ; 30(8): 955-966, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-38777382

ABSTRACT

The long noncoding RNA TERRA is transcribed from telomeres in virtually all eukaryotes with linear chromosomes. In humans, TERRA transcription is driven in part by promoters comprising CpG dinucleotide-rich 29 bp repeats, believed to be present in half of the subtelomeres. Thus far, TERRA expression has been analyzed mainly using molecular biology-based approaches that only generate partial and somewhat biased results. Here, we present a novel experimental pipeline to study human TERRA based on long-read sequencing (TERRA ONTseq). By applying TERRA ONTseq to different cell lines, we show that the vast majority of human telomeres produce TERRA and that the cellular levels of TERRA transcripts vary according to their chromosomes of origin. Using TERRA ONTseq, we also identified regions containing TERRA transcription start sites (TSSs) in more than half of human subtelomeres. TERRA TSS regions are generally found immediately downstream from 29 bp repeat-related sequences, which appear to be more widespread than previously estimated. Finally, we isolated a novel TERRA promoter from the highly expressed subtelomere of the long arm of Chromosome 7. With the development of TERRA ONTseq, we provide a refined picture of human TERRA biogenesis and expression and we equip the scientific community with an invaluable tool for future studies.


Subjects
Genetic Promoter Regions, Long Noncoding RNA, Telomere, Transcription Initiation Site, Transcriptome, Humans, Telomere/genetics, Telomere/metabolism, Long Noncoding RNA/genetics, High-Throughput Nucleotide Sequencing/methods, RNA Sequence Analysis/methods
19.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39256200

ABSTRACT

Copy number variations (CNVs) play pivotal roles in disease susceptibility and have been intensively investigated in human disease studies. Long-read sequencing technologies offer opportunities for comprehensive structural variation (SV) detection, and numerous methodologies have been developed recently. Consequently, there is a pressing need to assess these methods and aid researchers in selecting appropriate techniques for CNV detection using long-read sequencing. Hence, we conducted an evaluation of eight CNV calling methods across 22 datasets from nine publicly available samples and 15 simulated datasets, covering multiple sequencing platforms. The overall performance of CNV callers varied substantially and was influenced by the input dataset type, sequencing depth, and CNV type, among others. Specifically, the PacBio CCS sequencing platform outperformed PacBio CLR and Nanopore platforms regarding CNV detection recall rates. A sequencing depth of 10x demonstrated the capability to identify 85% of the CNVs detected in a 50x dataset. Moreover, deletions were more generally detectable than duplications. Among the eight benchmarked methods, cuteSV, Delly, pbsv, and Sniffles2 demonstrated superior accuracy, while SVIM exhibited high recall rates.


Subjects
Algorithms, DNA Copy Number Variations, High-Throughput Nucleotide Sequencing, Humans, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Computational Biology/methods, Human Genome
20.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38385878

ABSTRACT

Structural variants (SVs) are a crucial type of genetic variant that can significantly impact phenotypes. Therefore, the identification of SVs is an essential part of modern genomic analysis. In this article, we present kled, an ultra-fast and sensitive SV caller for long-read sequencing data, built on a specially designed approach with a novel signature-merging algorithm, custom refinement strategies, and a high-performance program structure. The evaluation results demonstrate that kled can achieve optimal SV calling compared to several state-of-the-art methods on simulated and real long-read data for different platforms and sequencing depths. Furthermore, kled excels at rapid SV calling and can efficiently utilize multiple Central Processing Unit (CPU) cores while maintaining low memory usage. The source code for kled can be obtained from https://github.com/CoREse/kled.


Subjects
Algorithms, Genomics, Phenotype, Software