Results 1 - 20 of 315
1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subjects
Epigenome; Quantitative Trait Loci; Genome-Wide Association Study; Genomics; Phenotype; Polymorphism, Single Nucleotide
2.
Cell ; 185(16): 3025-3040.e6, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35882231

ABSTRACT

Non-allelic recombination between homologous repetitive elements contributes to evolution and human genetic disorders. Here, we combine short- and long-DNA read sequencing of repeat elements with a new bioinformatics pipeline to show that somatic recombination of Alu and L1 elements is widespread in the human genome. Our analysis uncovers tissue-specific non-allelic homologous recombination hallmarks; moreover, we find that centromeres and cancer-associated genes are enriched for retroelements that may act as recombination hotspots. We compare recombination profiles in human-induced pluripotent stem cells and differentiated neurons and find that the neuron-specific recombination of repeat elements accompanies chromatin changes during cell-fate determination. Finally, we report that somatic recombination profiles are altered in Parkinson's and Alzheimer's disease, suggesting a link between retroelement recombination and genomic instability in neurodegeneration. This work highlights a significant contribution of the somatic recombination of repeat elements to genomic diversity in health and disease.


Subjects
Genome, Human; Retroelements; Alu Elements/genetics; Homologous Recombination; Humans; Long Interspersed Nucleotide Elements; Repetitive Sequences, Nucleic Acid
3.
Cell ; 174(2): 433-447.e19, 2018 07 12.
Article in English | MEDLINE | ID: mdl-29909985

ABSTRACT

Nearly all prostate cancer deaths are from metastatic castration-resistant prostate cancer (mCRPC), but there have been few whole-genome sequencing (WGS) studies of this disease state. We performed linked-read WGS on 23 mCRPC biopsy specimens and analyzed cell-free DNA sequencing data from 86 patients with mCRPC. In addition to frequent rearrangements affecting known prostate cancer genes, we observed complex rearrangements of the AR locus in most cases. Unexpectedly, these rearrangements include highly recurrent tandem duplications involving an upstream enhancer of AR in 70%-87% of cases compared with <2% of primary prostate cancers. A subset of cases displayed AR or MYC enhancer duplication in the context of a genome-wide tandem duplicator phenotype associated with CDK12 inactivation. Our findings highlight the complex genomic structure of mCRPC, nominate alterations that may inform prostate cancer treatment, and suggest that additional recurrent events in the non-coding mCRPC genome remain to be discovered.


Subjects
Prostatic Neoplasms, Castration-Resistant/pathology; Receptors, Androgen/genetics; Whole Genome Sequencing; Aged; Anilides/therapeutic use; Cyclin-Dependent Kinases/genetics; Cyclin-Dependent Kinases/metabolism; Enhancer Elements, Genetic/genetics; Gene Duplication; Gene Rearrangement; Genes, myc; Genetic Loci; Haplotypes; Humans; Male; Middle Aged; Neoplasm Metastasis; PTEN Phosphohydrolase/genetics; Phenotype; Prostate-Specific Antigen/blood; Prostatic Neoplasms, Castration-Resistant/drug therapy; Prostatic Neoplasms, Castration-Resistant/genetics; Protein Kinase Inhibitors/therapeutic use; Pyridines/therapeutic use
4.
Am J Hum Genet ; 111(8): 1524-1543, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39053458

ABSTRACT

Gene misexpression is the aberrant transcription of a gene in a context where it is usually inactive. Despite its known pathological consequences in specific rare diseases, we have a limited understanding of its wider prevalence and mechanisms in humans. To address this, we analyzed gene misexpression in 4,568 whole-blood bulk RNA sequencing samples from INTERVAL study blood donors. We found that while individual misexpression events occur rarely, in aggregate they were found in almost all samples and a third of inactive protein-coding genes. Using 2,821 paired whole-genome and RNA sequencing samples, we identified that misexpression events are enriched in cis for rare structural variants. We established putative mechanisms through which a subset of SVs lead to gene misexpression, including transcriptional readthrough, transcript fusions, and gene inversion. Overall, we develop misexpression as a type of transcriptomic outlier analysis and extend our understanding of the variety of mechanisms by which genetic variants can influence gene expression.


Subjects
Gene Expression Regulation; Humans; Sequence Analysis, RNA; Genetic Variation; Genomic Structural Variation/genetics; Transcriptome/genetics; Blood Donors
5.
Annu Rev Genomics Hum Genet ; 24: 109-132, 2023 08 25.
Article in English | MEDLINE | ID: mdl-37075062

ABSTRACT

DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.


Subjects
DNA; Repetitive Sequences, Nucleic Acid; Humans; Sequence Analysis, DNA/methods; Base Sequence; Mutation; DNA/genetics
6.
Am J Hum Genet ; 110(8): 1343-1355, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37541188

ABSTRACT

Despite significant progress in unraveling the genetic causes of neurodevelopmental disorders (NDDs), a substantial proportion of individuals with NDDs remain without a genetic diagnosis after microarray and/or exome sequencing. Here, we aimed to assess the power of short-read genome sequencing (GS), complemented with long-read GS, to identify causal variants in participants with NDD from the National Institute for Health and Care Research (NIHR) BioResource project. Short-read GS was conducted on 692 individuals (489 affected and 203 unaffected relatives) from 465 families. Additionally, long-read GS was performed on five affected individuals who had structural variants (SVs) in technically challenging regions, had complex SVs, or required distal variant phasing. Causal variants were identified in 36% of affected individuals (177/489), and a further 23% (112/489) had a variant of uncertain significance after multiple rounds of re-analysis. Among all reported variants, 88% (333/380) were coding nuclear SNVs or insertions and deletions (indels), and the remainder were SVs, non-coding variants, and mitochondrial variants. Furthermore, long-read GS facilitated the resolution of challenging SVs and invalidated variants of difficult interpretation from short-read GS. This study demonstrates the value of short-read GS, complemented with long-read GS, in investigating the genetic causes of NDDs. GS provides a comprehensive and unbiased method of identifying all types of variants throughout the nuclear and mitochondrial genomes in individuals with NDD.


Subjects
Genome, Human; Neurodevelopmental Disorders; Humans; Genome, Human/genetics; Chromosome Mapping; Base Sequence; INDEL Mutation; Neurodevelopmental Disorders/genetics
7.
Plant J ; 117(2): 342-363, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37831618

ABSTRACT

Attenuated strains of the naturally occurring plant pathogen Agrobacterium tumefaciens can transfer virtually any DNA sequence of interest to model plants and crops. This has made Agrobacterium-mediated transformation (AMT) one of the most commonly used tools in agricultural biotechnology. Understanding AMT, and its functional consequences, is of fundamental importance given that it sits at the intersection of many fundamental fields of study, including plant-microbe interactions, DNA repair/genome stability, and epigenetic regulation of gene expression. Despite extensive research and use of AMT over the last 40 years, the extent of genomic disruption associated with integrating exogenous DNA into plant genomes using this method remains underappreciated. However, new technologies like long-read sequencing make this disruption more apparent, complementing previous findings from multiple research groups that have tackled this question in the past. In this review, we cover progress on the molecular mechanisms involved in Agrobacterium-mediated DNA integration into plant genomes. We also discuss localized mutations at the site of insertion and describe the structure of these DNA insertions, which can range from single copy insertions to large concatemers, consisting of complex DNA originating from different sources. Finally, we discuss the prevalence of large-scale genomic rearrangements associated with the integration of DNA during AMT with examples. Understanding the intended and unintended effects of AMT on genome stability is critical to all plant researchers who use this methodology to generate new genetic variants.


Subjects
Epigenesis, Genetic; Plants; Plants/genetics; Plants/microbiology; Agrobacterium tumefaciens/genetics; Genomics; DNA; Genomic Instability/genetics; Transformation, Genetic; DNA, Bacterial/genetics; Plants, Genetically Modified/genetics
8.
Am J Hum Genet ; 109(4): 647-668, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35240056

ABSTRACT

The impact of copy-number variations (CNVs) on complex human traits remains understudied. We called CNVs in 331,522 UK Biobank participants and performed genome-wide association studies (GWASs) between the copy number of CNV-proxy probes and 57 continuous traits, revealing 131 signals spanning 47 phenotypes. Our analysis recapitulated well-known associations (e.g., 1q21 and height), revealed the pleiotropy of recurrent CNVs (e.g., 26 and 16 traits for 16p11.2-BP4-BP5 and 22q11.21, respectively), and suggested gene functionalities (e.g., MARF1 in female reproduction). Forty-eight CNV signals (38%) overlapped with single-nucleotide polymorphism (SNP)-GWASs signals for the same trait. For instance, deletion of PDZK1, which encodes a urate transporter scaffold protein, decreased serum urate levels, while deletion of RHD, which encodes the Rhesus blood group D antigen, associated with hematological traits. Other signals overlapped Mendelian disorder regions, suggesting variable expressivity and broad impact of these loci, as illustrated by signals mapping to Rotor syndrome (SLCO1B1/3), renal cysts and diabetes syndrome (HNF1B), or Charcot-Marie-Tooth (PMP22) loci. Total CNV burden negatively impacted 35 traits, leading to increased adiposity, liver/kidney damage, and decreased intelligence and physical capacity. Thirty traits remained burden associated after correcting for CNV-GWAS signals, pointing to a polygenic CNV architecture. The burden negatively correlated with socio-economic indicators, parental lifespan, and age (survivorship proxy), suggesting a contribution to decreased longevity. Together, our results showcase how studying CNVs can expand biological insights, emphasizing the critical role of this mutational class in shaping human traits and arguing in favor of a continuum between Mendelian and complex diseases.


Subjects
DNA Copy Number Variations; Genome-Wide Association Study; DNA Copy Number Variations/genetics; Female; Humans; Liver-Specific Organic Anion Transporter 1; Multifactorial Inheritance; Phenotype; Polymorphism, Single Nucleotide/genetics
9.
Am J Hum Genet ; 109(8): 1353-1365, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35931048

ABSTRACT

Copy-number variants and structural variants (CNVs/SVs) drive many neurodevelopmental-related disorders. While many neurodevelopmental-related CNVs/SVs give rise to complex phenotypes, the overlap in phenotypic presentation between independent CNVs can be extensive and provides a motivation for shared approaches. This confluence at the level of clinical phenotype implies convergence in at least some aspects of the underlying genomic mechanisms. With this perspective, our Commission on Novel Technologies for Neurodevelopmental CNVs asserts that the time has arrived to approach neurodevelopmental-related CNVs/SVs as a class of disorders that can be identified, investigated, and treated on the basis of shared mechanisms and/or pathways (e.g., molecular, neurological, or developmental). To identify common etiologic mechanisms among uncommon neurodevelopmental-related disorders and to potentially identify common therapies, it is paramount for teams of scientists, clinicians, and patients to unite their efforts. We bring forward novel, collaborative, and integrative strategies to translational CNV/SV research that engages diverse stakeholders to help expedite therapeutic outcomes. We articulate a clear vision for piloted roadmap strategies to reduce patient/caregiver burden and redundancies, increase efficiency, avoid siloed data, and accelerate translational discovery across CNV/SV-based syndromes.


Subjects
Neurodevelopmental Disorders; Patient Advocacy; DNA Copy Number Variations/genetics; Genome; Humans; Neurodevelopmental Disorders/genetics; Neurodevelopmental Disorders/therapy; Phenotype
10.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36869850

ABSTRACT

Alignment is the cornerstone of many long-read pipelines and plays an essential role in resolving structural variants (SVs). However, forced alignments of SVs embedded in long reads, inflexibility in integrating novel SV models, and computational inefficiency remain problems. Here, we investigate the feasibility of resolving long-read SVs with alignment-free algorithms. We ask: (1) Is it possible to resolve long-read SVs with alignment-free approaches? and (2) Does it provide an advantage over existing approaches? To this end, we implemented a framework named Linear, which can flexibly integrate alignment-free algorithms such as the generative model for long-read SV detection. Furthermore, Linear addresses the problem of compatibility of alignment-free approaches with existing software. It takes long reads as input and outputs standardized results that existing software can process directly. We conducted large-scale assessments in this work, and the results show that the sensitivity and flexibility of Linear exceed those of alignment-based pipelines. Moreover, Linear is orders of magnitude faster computationally.


Subjects
Genome, Human; Software; Humans; Algorithms; Sequence Analysis; Models, Statistical; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing
11.
Mol Syst Biol ; 20(4): 362-373, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38355920

ABSTRACT

Unraveling the genetic sources of gene expression variation is essential to better understand the origins of phenotypic diversity in natural populations. Genome-wide association studies have identified thousands of variants involved in gene expression variation; however, the variants detected explain only part of the heritability. In fact, variants such as low-frequency and structural variants (SVs) are poorly captured in association studies. To assess the impact of these variants on gene expression variation, we explored a half-diallel panel composed of 323 hybrids originating from pairwise crosses of 26 natural Saccharomyces cerevisiae isolates. Using short- and long-read sequencing strategies, we established an exhaustive catalog of single nucleotide polymorphisms (SNPs) and SVs for this panel. Combining this dataset with the transcriptomes of all hybrids, we comprehensively mapped SNPs and SVs associated with gene expression variation. While SVs impact gene expression variation, SNPs exhibit a higher effect size with an overrepresentation of low-frequency variants compared to common ones. These results reinforce the importance of dissecting the heritability of complex traits with a comprehensive catalog of genetic variants at the population level.


Subjects
Genome-Wide Association Study; Saccharomyces cerevisiae; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Gene Expression; Polymorphism, Single Nucleotide/genetics; Genetic Variation
12.
Mol Ther ; 32(5): 1298-1310, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38459694

ABSTRACT

Undesired on- and off-target effects of CRISPR-Cas nucleases remain a challenge in genome editing. While the use of Cas9 nickases has been shown to minimize off-target mutagenesis, their use in therapeutic genome editing has been hampered by a lack of efficacy. To overcome this limitation, we and others have developed double-nickase-based strategies to generate staggered DNA double-strand breaks to mediate gene disruption or gene correction with high efficiency. However, the impact of paired single-strand nicks on genome integrity has remained largely unexplored. Here, we developed a novel CAST-seq pipeline, dual CAST, to characterize chromosomal aberrations induced by paired CRISPR-Cas9 nickases at three different loci in primary keratinocytes derived from patients with epidermolysis bullosa. While targeting COL7A1, COL17A1, or LAMA3 with Cas9 nucleases caused previously undescribed chromosomal rearrangements, no chromosomal translocations were detected following paired-nickase editing. While the double-nicking strategy induced large deletions/inversions within a 10 kb region surrounding the target sites at all three loci, similar to the nucleases, the chromosomal on-target aberrations were qualitatively different and included a high proportion of insertions. Taken together, our data indicate that double-nickase approaches combine efficient editing with greatly reduced off-target effects but still leave substantial chromosomal aberrations at on-target sites.


Subjects
CRISPR-Cas Systems; Deoxyribonuclease I; Gene Editing; Keratinocytes; Humans; Gene Editing/methods; Deoxyribonuclease I/metabolism; Deoxyribonuclease I/genetics; Keratinocytes/metabolism; DNA Breaks, Double-Stranded; Chromosome Aberrations; Collagen Type VII/genetics; Collagen Type VII/metabolism; Cells, Cultured
13.
Proc Natl Acad Sci U S A ; 119(23): e2121469119, 2022 06 07.
Article in English | MEDLINE | ID: mdl-35658077

ABSTRACT

Recent studies have revealed a surprising diversity of sex chromosomes in vertebrates. However, the detailed mechanism of their turnover is still elusive. To understand this process, it is necessary to compare closely related species in terms of sex-determining genes and the chromosomes harboring them. Here, we explored the genus Takifugu, in which one strong candidate sex-determining gene, Amhr2, has been identified. To trace the processes involved in transitions in the sex-determination system in this genus, we studied 12 species and found that while the Amhr2 locus likely determines sex in the majority of Takifugu species, three species have acquired sex-determining loci at different chromosomal locations. Nevertheless, the generation of genome assemblies for the three species revealed that they share a portion of the male-specific supergene that contains a candidate sex-determining gene, GsdfY, along with genes that potentially play a role in male fitness. The shared supergene spans ∼100 kb and is flanked by two duplicated regions characterized by CACTA transposable elements. These results suggest that the shared supergene has taken over the role of sex-determining locus from Amhr2 in lineages leading to the three species, and repeated translocations of the supergene underlie the turnover of sex chromosomes in these lineages. These findings highlight the underestimated role of a mobile supergene in the turnover of sex chromosomes in vertebrates.


Subjects
Sex Determination Processes; Takifugu; Animals; DNA Transposable Elements/genetics; Evolution, Molecular; Sex Chromosomes/genetics; Sex Determination Processes/genetics; Takifugu/genetics; Translocation, Genetic
14.
Genomics ; 116(3): 110854, 2024 05.
Article in English | MEDLINE | ID: mdl-38701989

ABSTRACT

Several studies have demonstrated that populations living on the Tibetan plateau are genetically and physiologically adapted to high-altitude conditions, showing genomic signatures ascribable to the action of natural selection. However, so far most of them have relied solely on inferences drawn from the analysis of coding variants and point mutations. To fill this gap, we focused on the possible role of polymorphic transposable elements in influencing the adaptation of Tibetan and Sherpa highlanders. To do so, we compared high-altitude and middle/low-lander individuals of East Asian ancestry by performing in silico analyses and differentiation tests on 118 modern and ancient samples. We detected several transposable elements associated with high altitude, which map to genes involved in cardiovascular, hematological, chem-dependent and respiratory conditions, suggesting that metabolic and signaling pathways taking part in these functions are disproportionately impacted by the effect of environmental stressors in high-altitude individuals. To our knowledge, our study is the first to hint at a possible role of transposable elements in the adaptation of Tibetan and Sherpa highlanders.


Subjects
Altitude; DNA Transposable Elements; Humans; Acclimatization/genetics; Adaptation, Physiological/genetics; Asian People/genetics; Polymorphism, Genetic; Tibet
15.
Genes Chromosomes Cancer ; 63(8): e23255, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39149945

ABSTRACT

Near-haploidization, that is, loss of one copy of most chromosomes, is a relatively rare phenomenon in most tumors, but is enriched among certain soft tissue sarcomas, including undifferentiated pleomorphic sarcoma (UPS). Presumably, near-haploidization can arise through many mechanisms. This study aimed to identify gene rearrangements that could cause near-haploidization. We here present two UPS in which near-haploidization was an early event, identified through single nucleotide polymorphism (SNP) array analysis. One of the cases was studied further using whole genome and transcriptome sequencing, as well as cytogenetic and molecular cytogenetic methods. Both tumors had chromosomal rearrangements in the form of copy number shifts/structural variants affecting the SMC1A gene. These findings suggest that cohesin defects could contribute to mitotic errors resulting in massive loss of chromosomes. SMC1A encodes one of the components of the cohesin multiprotein complex, which is critical for proper alignment of the sister chromatids during S-phase and separation to opposite spindle poles. Further studies should explore the role of cohesin defects in near-haploidization in other sarcomas and clarify its role in tumor development.


Subjects
Cell Cycle Proteins; Chromosomal Proteins, Non-Histone; Sarcoma; Humans; Chromosomal Proteins, Non-Histone/genetics; Cell Cycle Proteins/genetics; Sarcoma/genetics; Sarcoma/pathology; Haploidy; Polymorphism, Single Nucleotide; Male; Female; Cohesins; Adult; Middle Aged
16.
Semin Cell Dev Biol ; 121: 171-185, 2022 01.
Article in English | MEDLINE | ID: mdl-34429265

ABSTRACT

The three-dimensional structure of the human genome has been proven to have a significant functional impact on gene expression. The high-order spatial chromatin is organised first by looping mediated by multiple protein factors, and then it is further formed into larger structures of topologically associated domains (TADs) or chromatin contact domains (CCDs), followed by A/B compartments and finally the chromosomal territories (CTs). The genetic variation observed in the human population influences the multi-scale structures, posing a question regarding the functional impact of structural variants reflected by the variability of gene expression patterns. The current methods of evaluating the functional effect include eQTL analysis, which uses statistical testing of the influence of variants on spatially close genes. Rarely, non-coding DNA sequence changes are evaluated by their impact on the biomolecular interaction network (BIN) reflecting the cellular interactome that can be analysed by the classical graph-theoretic algorithms. Therefore, in the second part of the review, we introduce the concept of BIN, i.e. a meta-network model of the complete molecular interactome developed by integrating various biological networks. The BIN meta-network model includes DNA-protein binding by the plethora of protein factors as well as chromatin interactions, therefore allowing connection of genomics with the downstream biomolecular processes present in a cell. As an illustration, we scrutinise the chromatin interactions mediated by the CTCF protein detected in a ChIA-PET experiment in the human lymphoblastoid cell line GM12878. In the corresponding BIN meta-network the DNA spatial proximity is represented as a graph model, combined with the Proteins-Interaction Network (PIN) of human proteome using the Gene Association Network (GAN). Furthermore, we enriched the BIN with the signalling and metabolic pathways and Gene Ontology (GO) terms to assert its functional context.
Finally, we mapped the Single Nucleotide Polymorphisms (SNPs) from the GWAS studies and identified the chromatin mutational hot-spots associated with a significant enrichment of SNPs related to autoimmune diseases. Afterwards, we mapped Structural Variants (SVs) from healthy individuals of 1000 Genomes Project and identified an interesting example of the missing protein complex associated with protein Q6GYQ0 due to a deletion on chromosome 14. Such an analysis using the meta-network BIN model is therefore helpful in evaluating the influence of genetic variation on spatial organisation of the genome and its functional effect in a cell.


Subjects
Chromatin/metabolism; Genome, Human/genetics; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Protein Interaction Maps/genetics; Humans
17.
BMC Genomics ; 25(1): 903, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39350025

ABSTRACT

BACKGROUND: Structural variants (SVs) such as deletions, duplications, and insertions are known to contribute to phenotypic variation but remain challenging to identify and genotype. A more complete, accessible, and assessable collection of SVs will assist efforts to study SV function in cattle and to incorporate SV genotyping into animal evaluation. RESULTS: In this work we produced a large and deeply characterized collection of SVs in Holstein cattle using two popular SV callers (Manta and Smoove) and publicly available Illumina whole-genome sequence (WGS) read sets from 310 samples (290 male, 20 female, mean 20X coverage). Manta and Smoove identified 31 K and 68 K SVs, respectively. In total the SVs cover 5% (Manta) and 6% (Smoove) of the reference genome, in contrast to the 1% impacted by SNPs and indels. SV genotypes from each caller were confirmed to accurately recapitulate animal relationships estimated using WGS SNP genotypes from the same dataset, with Manta genotypes outperforming Smoove, and deletions outperforming duplications. To support efforts to link the SVs to phenotypic variation, overlapping and tag SNPs were identified for each SV, using genotype sets extracted from the WGS results corresponding to two bovine SNP chips (BovineSNP50 and BovineHD). 9% (Manta) and 11% (Smoove) of the SVs were found to have overlapping BovineHD panel SNPs, while 21% (Manta) and 9% (Smoove) have BovineHD panel tag SNPs. A custom interactive database ( https://svdb-dc.pslab.ca ) containing the identified sequence variants with extensive annotations, gene feature information, and BAM file content for all SVs was created to enable the evaluation and prioritization of SVs for further study. 
Illustrative examples involving the genes POPDC3, ORM1, G2E3, FANCI, TFB1M, FOXC2, N4BP2, GSTA3, and COPA show how this resource can be used to find well-supported genic SVs, determine SV breakpoints, design genotyping approaches, and identify processed pseudogenes masquerading as deletions. CONCLUSIONS: The resources developed through this study can be used to explore sequence variation in Holstein cattle and to develop strategies for studying SVs of interest. The lack of overlapping and tag SNPs from commonly used SNP chips for most of the SVs suggests that other genotyping approaches will be needed (for example direct genotyping) to understand their potential contributions to phenotype. The included SV genotype assessments point to challenges in characterizing SVs, especially duplications, using short-read data and support ongoing efforts to better characterize cattle genomes through long-read sequencing. Lastly, the identification of previously known functional SVs and additional CDS-overlapping SVs supports the phenotypic relevance of this dataset.


Subjects
Genotype; Polymorphism, Single Nucleotide; Animals; Cattle/genetics; Female; Whole Genome Sequencing; Male; Genomic Structural Variation; Databases, Genetic; Phenotype; Genome; Genomics/methods
18.
BMC Genomics ; 25(1): 980, 2024 Oct 19.
Article in English | MEDLINE | ID: mdl-39425080

ABSTRACT

BACKGROUND: Certain structural variants (SVs), including large-scale genetic copy number variants as well as copy number-neutral inversions and translocations, may not all be resolved by chromosome karyotype studies. The identification of genetic risk factors for Parkinson's disease (PD) has been primarily focused on the gene-disruptive single nucleotide variants. In contrast, larger SVs, which may significantly influence human phenotypes, have been largely underexplored. Optical genomic mapping (OGM) represents a novel approach that offers greater sensitivity and resolution for detecting SVs. In this study, we used induced pluripotent stem cell (iPSC) lines of patients with PD-linked SNCA and PRKN variants as a proof of concept to (i) show the detection of pathogenic SVs in PD with OGM and (ii) provide a comprehensive screening of genetic abnormalities in iPSCs. RESULTS: OGM detected SNCA gene triplication and duplication in patient-derived iPSC lines, which were not identified by long-read sequencing. Additionally, various exon deletions were confirmed by OGM in the PRKN gene of iPSCs, of which exon 3-5 and exon 2 deletions could not be phased with conventional multiplex-ligation-dependent probe amplification. In terms of chromosomal abnormalities in iPSCs, no gene fusions and no aneuploidy were detected, but two balanced inter-chromosomal translocations were detected in one line that were absent in the parental fibroblasts and not identified by routine single nucleotide variant karyotyping. CONCLUSIONS: In summary, OGM can detect pathogenic SVs in PD-linked genes as well as reveal genomic abnormalities in iPSCs that were not identified by other techniques, which supports OGM's future use in gene discovery and iPSC line screening.


Subjects
Induced Pluripotent Stem Cells , Parkinson Disease , alpha-Synuclein , Humans , Induced Pluripotent Stem Cells/metabolism , Parkinson Disease/genetics , Parkinson Disease/pathology , alpha-Synuclein/genetics , Chromosome Mapping , Genomic Structural Variation , DNA Copy Number Variations , Cell Line
19.
BMC Genomics ; 25(1): 54, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38212678

ABSTRACT

BACKGROUND: Feeding costs represent the largest expenditures in beef production. Therefore, the animal's efficiency in converting feed into high-quality protein for human consumption plays a major role in the environmental impact of the beef industry and in the beef producers' profitability. In this context, breeding animals for improved feed efficiency through genomic selection has been considered a strategic practice in modern breeding programs around the world. Copy number variation (CNV) is a less-studied source of genetic variation that can contribute to phenotypic variability in complex traits. In this context, this study aimed to: (1) identify CNVs and CNV regions (CNVRs) in the genome of Nellore cattle (Bos taurus indicus); (2) assess potential associations between the identified CNVRs and weaning weight (W210), body weight measured at the time of selection (WSel), average daily gain (ADG), dry matter intake (DMI), residual feed intake (RFI), time spent at the feed bunk (TF), and frequency of visits to the feed bunk (FF); and (3) perform functional enrichment analyses of the significant CNVRs identified for each of the traits evaluated. RESULTS: A total of 3,161 CNVs and 561 CNVRs ranging from 4,973 bp to 3,215,394 bp were identified. The CNVRs covered up to 99,221,894 bp (3.99%) of the Nellore autosomal genome. Seventeen CNVRs were significantly associated with dry matter intake and feeding frequency (number of daily visits to the feed bunk). The functional annotation of the associated CNVRs revealed important candidate genes related to metabolism that may be associated with the phenotypic expression of the evaluated traits. Furthermore, Gene Ontology (GO) analyses revealed 19 enriched processes associated with FF. CONCLUSIONS: A total of 3,161 CNVs and 561 CNVRs were identified and characterized in a Nellore cattle population. Various CNVRs were significantly associated with DMI and FF, indicating that CNVs play an important role in key biological pathways and in the phenotypic expression of feeding behavior and growth traits in Nellore cattle.


Subjects
DNA Copy Number Variations , Genome-Wide Association Study , Humans , Cattle/genetics , Animals , Phenotype , Eating/genetics , Feeding Behavior , Animal Feed/analysis
20.
BMC Genomics ; 25(1): 898, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39350042

ABSTRACT

BACKGROUND: Lung cancer is a heterogeneous disease and the primary cause of cancer-related mortality worldwide. Somatic mutations, including large structural variants, are important biomarkers in lung cancer for selecting targeted therapy. Genomic studies in lung cancer have been conducted using short-read sequencing. Emerging long-read sequencing technologies are a promising alternative for studying somatic structural variants; however, there is currently no consensus on how to process data and call somatic events. In this study, we performed whole genome sequencing of lung cancer and matched non-tumour samples using long- and short-read sequencing to comprehensively benchmark three sequence aligners and seven structural variant callers comprising generic callers (SVIM, Sniffles2, DELLY in generic mode and cuteSV) and somatic callers (Severus, SAVANA, nanomonsv and DELLY in somatic mode). RESULTS: Different combinations of aligners and variant callers influenced somatic structural variant detection. The choice of caller had a significant influence on somatic structural variant detection in terms of variant type, size, sensitivity, and accuracy. The performance of each variant caller was assessed by comparing to somatic structural variants identified by short-read sequencing. When compared to somatic structural variants detected with short-read sequencing, more events were detected with long-read sequencing. The mean recall of somatic variant events identified by long-read sequencing was higher for the somatic callers (72%) than for the generic callers (53%). Among the somatic callers when using the minimap2 aligner, SAVANA and Severus achieved the highest recall at 79.5% and 79.25% respectively, followed by nanomonsv with a recall of 72.5%. CONCLUSION: Long-read sequencing can identify somatic structural variants in clinical samples. The longer reads have the potential to improve our understanding of cancer development and inform personalized cancer treatment.


Subjects
Lung Neoplasms , Nanopore Sequencing , Lung Neoplasms/genetics , Humans , Nanopore Sequencing/methods , Mutation , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods