Results 1 - 20 of 44
1.
NAR Genom Bioinform ; 6(2): lqae031, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38666213

ABSTRACT

DNA variation analysis has become indispensable in many aspects of modern biomedicine, most prominently in the comparison of normal and tumor samples. Thousands of samples are collected in local sequencing efforts and public databases, requiring highly scalable, portable, and automated workflows for streamlined processing. Here, we present nf-core/sarek 3, a well-established, comprehensive variant calling and annotation pipeline for germline and somatic samples. It is suitable for any genome with a known reference. We present a full rewrite of the original pipeline that significantly reduces storage requirements by using the CRAM format and reduces runtime by increasing intra-sample parallelization. Together, these changes lead to a 70% cost reduction in commercial clouds, enabling users to perform large-scale and cross-platform data analysis while keeping costs and CO2 emissions low. The code is available at https://nf-co.re/sarek.

2.
N Biotechnol ; 77: 1-11, 2023 Nov 25.
Article in English | MEDLINE | ID: mdl-37329982

ABSTRACT

Deep learning has already revolutionised the way a wide range of data is processed in many areas of daily life. The ability to learn abstractions and relationships from heterogeneous data has provided impressively accurate prediction and classification tools to handle increasingly large datasets. This has a significant impact on the growing wealth of omics datasets, offering an unprecedented opportunity to better understand the complexity of living organisms. While this revolution is transforming the way these data are analyzed, explainable deep learning is emerging as an additional tool with the potential to change the way biological data is interpreted. Explainability addresses critical issues such as transparency, which is especially important when computational tools are introduced in clinical environments. Moreover, it empowers artificial intelligence with the capability to provide new insights into the input data, thus adding an element of discovery to these already powerful resources. In this review, we provide an overview of the transformative effects explainable deep learning is having on multiple sectors, ranging from genome engineering and genomics to radiomics, drug design and clinical trials. We offer a perspective to life scientists, to help them better understand the potential of these tools, and a motivation to implement them in their research, by suggesting learning resources they can use to take their first steps in this field.


Subjects
Artificial Intelligence; Deep Learning; Drug Design
3.
Int J Mol Sci ; 23(23)2022 Nov 22.
Article in English | MEDLINE | ID: mdl-36498841

ABSTRACT

Horizontal gene transfer (HGT) is well described in prokaryotes: it plays a crucial role in evolution, and has functional consequences in insects and plants. However, less is known about HGT in humans. Studies have reported bacterial integrations in cancer patients, and microbial sequences have been detected in data from well-known human sequencing projects. Few of the existing tools for investigating HGT are highly automated. Thanks to the adoption of Nextflow for life sciences workflows, and to the standards and best practices curated by communities such as nf-core, fully automated, portable, and scalable pipelines can now be developed. Here we present nf-core/hgtseq to facilitate the analysis of HGT from sequencing data in different organisms. We showcase its performance by analysing six exome datasets from five mammals. Hgtseq can be run seamlessly in any computing environment and accepts data generated by existing exome and whole-genome sequencing projects; this will enable researchers to expand their analyses into this area. Fundamental questions are still open about the mechanisms and the extent or role of horizontal gene transfer: by releasing hgtseq we provide a standardised tool which will enable a systematic investigation of this phenomenon, thus paving the way for a better understanding of HGT.


Subjects
Evolution, Molecular; Gene Transfer, Horizontal; Animals; Humans; Prokaryotic Cells; Bacteria/genetics; Base Sequence; Phylogeny; Mammals/genetics
4.
Mol Biol Evol ; 39(8)2022 08 03.
Article in English | MEDLINE | ID: mdl-35881460

ABSTRACT

Centromeres are epigenetically specified by the histone H3 variant CENP-A and are typically associated with highly repetitive satellite DNA. We previously discovered natural satellite-free neocentromeres in Equus caballus and Equus asinus. Here, through ChIP-seq with an anti-CENP-A antibody, we found an extraordinarily high number of centromeres lacking satellite DNA in the zebras Equus burchelli (15 of 22) and Equus grevyi (13 of 23). This demonstrates that the absence of satellite DNA at the majority of centromeres is compatible with genome stability and species survival, and it challenges the role of satellite DNA in centromere function. Nine satellite-free centromeres are shared between the two species, in agreement with their recent separation. We assembled all centromeric regions and improved the reference genome of E. burchelli. Sequence analysis of the CENP-A binding domains revealed that they are LINE-1 and AT-rich, with four of them showing DNA amplification. In the two zebras, satellite-free centromeres emerged from centromere repositioning or following Robertsonian fusion. On five chromosomes, the centromeric function arose near the fusion points, which are located within regions marked by traces of ancestral pericentromeric sequences. Therefore, besides centromere repositioning, Robertsonian fusions are an important source of satellite-free centromeres during evolution. Finally, in one case, a satellite-free centromere was seeded on an inversion breakpoint. On 11 chromosomes whose primary constrictions appeared to be associated with satellite repeats by cytogenetic analysis, satellite-free neocentromeres were instead located near the ancestral, inactivated satellite-based centromeres; therefore, the centromeric function has shifted away from a locus containing satellite repeats to a new, satellite-free position.


Subjects
Centromere; DNA, Satellite; Animals; Centromere/genetics; Centromere/metabolism; Centromere Protein A/genetics; DNA, Satellite/genetics; Histones/metabolism; Horses/genetics
5.
Nature ; 604(7906): 509-516, 2022 04.
Article in English | MEDLINE | ID: mdl-35396579

ABSTRACT

Rare coding variation has historically provided the most direct connections between gene function and disease pathogenesis. By meta-analysing the whole exomes of 24,248 schizophrenia cases and 97,322 controls, we implicate ultra-rare coding variants (URVs) in 10 genes as conferring substantial risk for schizophrenia (odds ratios of 3-50, P < 2.14 × 10⁻⁶) and 32 genes at a false discovery rate of <5%. These genes are most highly expressed in central nervous system neurons and have diverse molecular functions that include the formation, structure and function of the synapse. The associations of the NMDA (N-methyl-D-aspartate) receptor subunit GRIN2A and the AMPA (α-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid) receptor subunit GRIA3 support dysfunction of the glutamatergic system as a mechanistic hypothesis in the pathogenesis of schizophrenia. We observe an overlap of rare variant risk among schizophrenia, autism spectrum disorders¹, epilepsy and severe neurodevelopmental disorders², although different mutation types are implicated in some shared genes. Most genes described here, however, are not implicated in neurodevelopment. We demonstrate that genes prioritized from common variant analyses of schizophrenia are enriched in rare variant risk³, suggesting that common and rare genetic risk factors converge at least partially on the same underlying pathogenic biological processes. Even after excluding significantly associated genes, schizophrenia cases still carry a substantial excess of URVs, which indicates that more risk genes await discovery using this approach.
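The odds ratios quoted above compare how often cases and controls carry qualifying variants in a gene. As a minimal sketch, assuming a single 2 × 2 table of carrier counts (not the study's meta-analysis across cohorts), an odds ratio and a Woolf-type 95% confidence interval can be computed as below. The function name and the 0.5 pseudocount (Haldane-Anscombe correction) are illustrative choices, not taken from the paper, and the carrier counts in the example are invented.

```python
import math


def odds_ratio_ci(case_carriers: int, n_cases: int,
                  control_carriers: int, n_controls: int,
                  pseudocount: float = 0.5, z: float = 1.96):
    """Return (odds ratio, lower, upper) for a 2x2 carrier table.

    Illustrative sketch only: one table, no stratification by cohort.
    A pseudocount is added to every cell so that genes with zero
    control carriers (common for ultra-rare variants) stay finite.
    """
    a = case_carriers + pseudocount                    # cases carrying a variant
    b = n_cases - case_carriers + pseudocount          # cases without one
    c = control_carriers + pseudocount                 # controls carrying a variant
    d = n_controls - control_carriers + pseudocount    # controls without one
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell must be positive; use a pseudocount > 0")
    odds_ratio = (a * d) / (b * c)
    # Woolf's method: standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical carrier counts; only the cohort sizes come from the abstract.
print(odds_ratio_ci(case_carriers=12, n_cases=24_248,
                    control_carriers=3, n_controls=97_322))
```

Woolf's interval is unreliable for very sparse tables, so exact or meta-analytic methods are usually preferred when carrier counts are this small.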


Subjects
Mutation; Neurodevelopmental Disorders; Schizophrenia; Case-Control Studies; Exome; Genetic Predisposition to Disease/genetics; Humans; Neurodevelopmental Disorders/genetics; Receptors, N-Methyl-D-Aspartate/genetics; Schizophrenia/genetics
6.
Front Genet ; 12: 689824, 2021.
Article in English | MEDLINE | ID: mdl-34178042

ABSTRACT

BACKGROUND: Aging is a complex phenotype influenced by a combination of genetic and environmental factors. Although many studies have addressed the cellular and physiological changes associated with aging, its molecular causes remain undetermined. Considering the biological complexity and heterogeneity of the aging process, it is now clear that a full understanding of the mechanisms underlying aging can only be achieved by integrating different data types and sources, using new computational methods capable of such integration. RECENT ADVANCES: In this review, we show that an omics view of the changes that occur as an individual ages can provide researchers with new opportunities to understand the mechanisms of aging. Combining results from single-cell analysis with systems biology tools would allow researchers to build interaction networks and investigate how these networks are perturbed during aging and disease. The development of high-throughput technologies such as next-generation sequencing, proteomics and metabolomics, which can measure different biological markers simultaneously and monitor them throughout the aging process with high accuracy and specificity, represents a unique opportunity for biogerontologists today. CRITICAL ISSUES: Although the capacity to produce big data has increased drastically over the years, the integration, interpretation and sharing of high-throughput data remain major challenges. In this paper we present a survey of the emerging omics approaches in aging research and provide a large collection of datasets and databases as a resource for the scientific community to identify causes of aging. We discuss their peculiarities, emphasizing the need to develop methods focused on integrating different data types.
FUTURE DIRECTIONS: We critically review the contribution of bioinformatics to omics-based aging research, and we propose a few recommendations to boost collaborations and produce new insights. We believe that significant advances can be achieved by following major developments in bioinformatics and by investing in diversity, data sharing and community-driven portable bioinformatics methods. We also argue in favor of more engagement and participation, and we highlight the benefits of new collaborations along these lines. This review aims to be a useful resource for researchers in the field and a call for new partnerships in aging research.

7.
Nat Neurosci ; 22(12): 1961-1965, 2019 12.
Article in English | MEDLINE | ID: mdl-31768057

ABSTRACT

The exome sequences of approximately 8,000 children with autism spectrum disorder (ASD) and/or attention deficit hyperactivity disorder (ADHD) and 5,000 controls were analyzed, finding that individuals with ASD and individuals with ADHD had a similar burden of rare protein-truncating variants in evolutionarily constrained genes, both significantly higher than controls. This motivated a combined analysis across ASD and ADHD, identifying microtubule-associated protein 1A (MAP1A) as a new exome-wide significant gene conferring risk for childhood psychiatric disorders.


Subjects
Attention Deficit Disorder with Hyperactivity/genetics; Autism Spectrum Disorder/genetics; Genetic Predisposition to Disease/genetics; Genetic Variation/genetics; Microtubule-Associated Proteins/genetics; Attention Deficit Disorder with Hyperactivity/complications; Autism Spectrum Disorder/complications; Case-Control Studies; Exome/genetics; Female; Humans; Male
8.
Am J Hum Genet ; 102(6): 1204-1211, 2018 06 07.
Article in English | MEDLINE | ID: mdl-29861106

ABSTRACT

There is a limited understanding of the impact of rare protein-truncating variants across multiple phenotypes. We explore the impact of this class of variants on 13 quantitative traits and 10 diseases using whole-exome sequencing data from 100,296 individuals. Protein-truncating variants in genes intolerant to this class of mutations increased the risk of autism, schizophrenia, bipolar disorder, intellectual disability, and ADHD. In individuals without these disorders, there was an association with shorter height, lower education, increased hospitalization, and reduced age at enrollment. Gene sets implicated by GWASs did not show a significant burden of protein-truncating variants beyond what was captured by established Mendelian genes. In conclusion, we provide a thorough investigation of the impact of rare deleterious coding variants on complex traits, suggesting widespread pleiotropic risk.


Subjects
Mutation/genetics; Open Reading Frames/genetics; Databases, Genetic; Ethnicity/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Phenotype; Proteins/genetics
10.
Nature ; 548(7665): 87-91, 2017 08 03.
Article in English | MEDLINE | ID: mdl-28746312

ABSTRACT

Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against the discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to that obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants, including many novel insertions, and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which are now being launched in many countries, including Denmark.


Subjects
Genetic Variation/genetics; Genetics, Population/standards; Genome, Human/genetics; Genomics/standards; Sequence Analysis, DNA/standards; Adult; Alleles; Child; Chromosomes, Human, Y/genetics; Denmark; Female; Haplotypes/genetics; Humans; Major Histocompatibility Complex/genetics; Male; Maternal Age; Mutation Rate; Paternal Age; Point Mutation/genetics; Reference Standards
13.
Am J Med Genet B Neuropsychiatr Genet ; 171(8): 1013-1022, 2016 12.
Article in English | MEDLINE | ID: mdl-27255576

ABSTRACT

The demographic history of the isolated population of the Faroe Islands may have led to an enrichment of variants rarely seen in outbred European populations, including risk variants for panic disorder (PD). PD is a common mental disorder characterized by recurring and unprovoked panic attacks, and genetic factors have been estimated to explain around 40% of the risk. In this study, the potential enrichment of PD risk variants was explored based on whole-exome sequencing of 54 patients with PD and 211 control individuals from the Faroese population. No genome-wide significant associations were found; however, several single variants and genes showed strong association with PD, with DGKH being the most strongly PD-associated gene. Interestingly, DGKH has previously shown genome-wide significant association with bipolar disorder as well as evidence of association with other mental disorders. Additionally, we found an enrichment of PD risk variants in the Faroese population, that is, variants that otherwise occur at low frequency in more outbred European populations. © 2016 Wiley Periodicals, Inc.


Subjects
Diacylglycerol Kinase/genetics; Panic Disorder/genetics; Adult; Denmark; Diacylglycerol Kinase/metabolism; Ethnicity/genetics; Exome; Female; Gene Frequency/genetics; Genetic Predisposition to Disease/genetics; Genetic Variation/genetics; Genome-Wide Association Study/methods; Haplotypes/genetics; Humans; Male; Panic Disorder/psychology; Polymorphism, Single Nucleotide/genetics; Risk Factors; White People/genetics
15.
PLoS One ; 11(4): e0153253, 2016.
Article in English | MEDLINE | ID: mdl-27089011

ABSTRACT

Stored neonatal dried blood spot (DBS) samples from neonatal screening programmes are a valuable diagnostic and research resource. Combined with information from national health registries, they can be used in population-based studies of genetic diseases. DNA extracted from neonatal DBSs can be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA from neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject we analysed a neonatal DBS sample and a corresponding adult whole-blood (WB) reference sample. Different DNA sample types were prepared for each of the subjects. Pilot 1: wgaDNA of 2 × 3.2 mm neonatal DBSs (DBS_2x3.2) and raw DNA extract of the WB reference sample (WB_ref). Pilot 2: DBS_2x3.2, WB_ref and a WB_ref replica sharing DNA extract with the WB_ref sample. Pilot 3: DBS_2x3.2, WB_ref, wgaDNA of 2 × 1.6 mm neonatal DBSs and wgaDNA of the WB reference sample. Following sequencing and data analysis, we compared pairwise variant calls to obtain a measure of similarity, the concordance rate. Before filtering of the variant calls, concordance rates were slightly lower when comparing DBS vs WB sample types than when comparing any two WB sample types from the same subject. The overall concordance rates depended on the variant type, with SNPs performing best. After filtering, the comparisons of DBS vs WB and WB vs WB sample types yielded similar concordance rates, with values close to 100%. WgaDNA of neonatal DBS samples performs with great accuracy and efficiency in exome sequencing. Based on concordance rates calculated from variant calls, the wgaDNA performed similarly to the matched high-quality reference (whole-blood DNA). No differences were observed when substituting 2 × 3.2 mm with 2 × 1.6 mm discs, allowing for a further reduction of sample material in future projects.
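The pairwise concordance rate described above can be illustrated with a small sketch. The abstract does not give the exact formula, so this assumes that a call is identified by (chromosome, position, ref, alt, genotype) and that concordance is the number of calls shared by two samples divided by the number of distinct calls made in either. The helper names and the toy calls are hypothetical. Because the abstract reports that concordance depended on variant type, the sketch also splits the rate into single-base substitutions and everything else.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant call: (chrom, pos, ref, alt, genotype). Hypothetical representation.
Call = Tuple[str, int, str, str, str]


def variant_type(call: Call) -> str:
    """Single-base substitutions are 'SNP'; everything else is lumped as 'other'."""
    _, _, ref, alt, _ = call
    return "SNP" if len(ref) == 1 and len(alt) == 1 else "other"


def concordance(a: Set[Call], b: Set[Call]) -> float:
    """Shared calls divided by all distinct calls made in either sample."""
    union = a | b
    if not union:
        raise ValueError("no calls in either sample")
    return len(a & b) / len(union)


def concordance_by_type(calls_a: Iterable[Call],
                        calls_b: Iterable[Call]) -> Dict[str, float]:
    """Overall concordance plus one rate per variant type present."""
    set_a, set_b = set(calls_a), set(calls_b)
    result = {"all": concordance(set_a, set_b)}
    for vtype in ("SNP", "other"):
        sub_a = {c for c in set_a if variant_type(c) == vtype}
        sub_b = {c for c in set_b if variant_type(c) == vtype}
        if sub_a or sub_b:
            result[vtype] = concordance(sub_a, sub_b)
    return result


# Invented calls for one subject: the DBS and WB samples disagree on one genotype.
dbs = [("1", 100, "A", "G", "0/1"), ("1", 200, "C", "T", "1/1"), ("2", 50, "AT", "A", "0/1")]
wb = [("1", 100, "A", "G", "0/1"), ("1", 200, "C", "T", "0/1"), ("2", 50, "AT", "A", "0/1")]
print(concordance_by_type(dbs, wb))
```

Under this definition a genotype disagreement at a shared site counts as two distinct calls in the union, so it lowers the rate more than a call missing from one sample would; a real comparison might score sites and genotypes separately.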


Subjects
Dried Blood Spot Testing; Exome/genetics; Genome, Human; Nucleic Acid Amplification Techniques/methods; Adult; High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Infant, Newborn; Pilot Projects; Polymorphism, Single Nucleotide; Reproducibility of Results
16.
Eur J Hum Genet ; 24(1): 135-8, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26059840

ABSTRACT

Primary ovarian insufficiency (POI) is a distressing cause of infertility in young women. POI is heterogeneous, with only a few causative genes discovered so far. Our objective was to determine the genetic cause of POI in a consanguineous Lebanese family with two affected sisters presenting with primary amenorrhoea and an absence of any pubertal development. Multipoint parametric linkage analysis was performed, and whole-exome sequencing was done on the proband. Linkage analysis identified a locus on chromosome 7, in which exome sequencing identified a homozygous two-base-pair duplication (c.1947_48dupCT) in the STAG3 gene, leading to a truncated protein p.(Y650Sfs*22) and confirming STAG3 as the cause of POI in this family. Exome sequencing combined with linkage analysis offers a powerful tool to efficiently find novel genetic causes of rare, heterogeneous disorders, even in small single families. This is only the second report of a STAG3 variant; the first STAG3 variant was recently described in a phenotypically similar family with extreme POI. The identification of an additional family highlights the importance of STAG3 in POI pathogenesis and suggests that it should be evaluated in families affected by POI.


Subjects
Amenorrhea/genetics; Chromosomes, Human, Pair 7; Exome; Mutation; Nuclear Proteins/genetics; Primary Ovarian Insufficiency/genetics; Adolescent; Amenorrhea/diagnosis; Amenorrhea/pathology; Base Sequence; Cell Cycle Proteins; Child; Consanguinity; Female; Gene Expression; Genetic Linkage; Homozygote; Humans; Molecular Sequence Data; Pedigree; Primary Ovarian Insufficiency/diagnosis; Primary Ovarian Insufficiency/pathology; Sequence Analysis, DNA; Siblings
17.
Eur J Hum Genet ; 24(2): 298-301, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26059842

ABSTRACT

The success of whole-exome sequencing in identifying mutations that cause single-gene disorders has been well documented. In contrast, whole-exome sequencing has so far had limited success in identifying variants that cause more complex phenotypes, which seem unlikely to be due to the disruption of a single gene. We describe a family in which two male offspring of healthy first-cousin parents present a complex phenotype, consisting of peripheral neuropathy and bronchiectasis, that has not been described previously in the literature. Because both children had the same problems in the context of parental consanguinity, we hypothesised that the illness resulted from either X-linked or autosomal recessive inheritance. Using whole-exome sequencing, we were able to simplify this complex phenotype and identified a causative mutation (p.R1070*) in the gene periaxin (PRX), which has previously been shown to cause peripheral neuropathy (Dejerine-Sottas syndrome) when this mutation is present. For the bronchiectasis phenotype, we were unable to identify a single causal mutation or compound heterozygote, reflecting the heterogeneous nature of this phenotype. In conclusion, this study shows that whole-exome sequencing has the power to disentangle complex phenotypes by identifying causative genetic mutations for distinct clinical disorders that were previously masked.


Subjects
Exome/genetics; Hereditary Sensory and Motor Neuropathy/genetics; Membrane Proteins/genetics; Peripheral Nervous System Diseases/genetics; Female; Hereditary Sensory and Motor Neuropathy/pathology; Heterozygote; Humans; Male; Mutation/genetics; Pedigree; Peripheral Nervous System Diseases/pathology; Phenotype; Polymorphism, Single Nucleotide/genetics; Sequence Analysis, DNA
19.
BMC Genomics ; 16: 548, 2015 Jul 25.
Article in English | MEDLINE | ID: mdl-26208977

ABSTRACT

BACKGROUND: Massively parallel cDNA sequencing (RNA-seq) experiments are gradually superseding microarrays in quantitative gene expression profiling. However, many biologists are uncertain about the choice of differentially expressed gene (DEG) analysis methods and the validity of cost-saving sample pooling strategies for their RNA-seq experiments. Hence, we performed experimental validation of DEGs identified by Cuffdiff2, edgeR, DESeq2 and the Two-stage Poisson Model (TSPM) in an RNA-seq experiment involving mouse amygdala micro-punches, using high-throughput qPCR on independent biological replicate samples. Moreover, we sequenced RNA pools and compared the results with those from sequencing the corresponding individual RNA samples. RESULTS: The false-positivity rate of Cuffdiff2 and the false-negativity rates of DESeq2 and TSPM were high. Among the four investigated DEG analysis methods, the sensitivity and specificity of edgeR were relatively high. We documented pooling bias and found that DEGs identified in pooled samples had low positive predictive values. CONCLUSIONS: Our results highlight the need for combined use of more sensitive DEG analysis methods and high-throughput validation of identified DEGs in future RNA-seq experiments. They indicate limited utility of sample pooling strategies for RNA-seq in similar setups and support increasing the number of biological replicate samples.
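The results above are phrased as sensitivity, specificity, false-positive and false-negative rates, and positive predictive values of each DEG method against qPCR. As a minimal sketch, assuming each validated gene is classified by whether the method called it differentially expressed and whether qPCR confirmed a change, these metrics reduce to ratios over a 2 × 2 confusion table. The paper's exact definitions (for example of "false-positivity rate") may differ, and the class name, field names and counts below are hypothetical.

```python
import math
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else math.nan


@dataclass(frozen=True)
class ValidationCounts:
    tp: int  # called DE by the method and confirmed by qPCR
    fp: int  # called DE but not confirmed by qPCR
    fn: int  # not called DE, although qPCR shows a change
    tn: int  # not called DE and no change by qPCR

    @property
    def sensitivity(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def ppv(self) -> float:
        """Positive predictive value: fraction of DE calls that validate."""
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def false_positive_rate(self) -> float:
        return _ratio(self.fp, self.fp + self.tn)

    @property
    def false_negative_rate(self) -> float:
        return _ratio(self.fn, self.fn + self.tp)


# Invented counts for one DEG method, for illustration only.
counts = ValidationCounts(tp=8, fp=2, fn=4, tn=16)
print(f"sensitivity={counts.sensitivity:.2f} "
      f"specificity={counts.specificity:.2f} PPV={counts.ppv:.2f}")
```

PPV depends on how many true DEGs are in the tested set, which is one reason a method can look acceptable on sensitivity and specificity yet yield many unconfirmed calls, as reported here for pooled samples.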


Subjects
DNA, Complementary/genetics; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA; Animals; Mice; Software
20.
Nat Commun ; 6: 5969, 2015 Jan 19.
Article in English | MEDLINE | ID: mdl-25597990

ABSTRACT

Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained by sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches, and we develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts, showing the merits of high-coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27 × 10⁻⁸ and 1.5 × 10⁻⁹ per nucleotide per generation for SNVs and indels, respectively.
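The per-generation mutation rates above come from a probabilistic model that the abstract does not describe. For intuition only, a naive point estimate divides the number of de novo mutations by the number of transmitted haploid bases examined (two haploid genomes per child, times the callable genome length, times the number of children), optionally correcting the count for detection sensitivity and false discovery rate. The function, its parameters and the example inputs are hypothetical, and this is not the authors' method.

```python
def naive_mutation_rate(n_de_novo: int, n_children: int, callable_bases: float,
                        sensitivity: float = 1.0, fdr: float = 0.0) -> float:
    """Naive per-nucleotide, per-generation mutation rate from trio data.

    n_de_novo      total de novo mutations called across all trio children
    n_children     number of trio offspring examined
    callable_bases haploid genome positions where a de novo call was possible
    sensitivity    assumed fraction of true de novo mutations the caller detects
    fdr            assumed fraction of de novo calls that are false positives
    """
    if not (0 < sensitivity <= 1 and 0 <= fdr < 1):
        raise ValueError("sensitivity must be in (0, 1] and fdr in [0, 1)")
    # Remove expected false positives, then scale up for missed true mutations.
    true_mutations = n_de_novo * (1 - fdr) / sensitivity
    # Each child inherits two newly transmitted haploid genomes per generation.
    transmitted_bases = 2 * callable_bases * n_children
    return true_mutations / transmitted_bases


# Hypothetical inputs, chosen only to show the arithmetic.
rate = naive_mutation_rate(n_de_novo=500, n_children=10, callable_bases=2.0e9,
                           sensitivity=0.9, fdr=0.05)
print(f"{rate:.2e} per nucleotide per generation")
```

A model-based approach, such as the probabilistic method mentioned in the abstract, can instead weight each candidate site by its genotype likelihoods rather than relying on a hard callable/uncallable split.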


Subjects
Genome, Human/genetics; Algorithms; Humans; Mutation Rate; Polymorphism, Single Nucleotide/genetics; Sequence Analysis, DNA/methods