Results 1 - 20 of 28
1.
Mol Biol Evol ; 41(10), 2024 Oct 04.
Article in English | MEDLINE | ID: mdl-39361595

ABSTRACT

Ancient environmental DNA (aeDNA) is becoming a powerful tool to gain insights about past ecosystems, overcoming the limitations of conventional fossil records. However, several methodological challenges remain, particularly for classifying the DNA to species level and conducting phylogenetic analysis. Current methods, primarily tailored for modern datasets, fail to capture several idiosyncrasies of aeDNA, including species mixtures from closely related species and ancestral divergence. We introduce soibean, a novel tool that utilizes mitochondrial pangenomic graphs for identifying species from aeDNA reads. It outperforms existing methods in accurately identifying species from multiple closely related sources within a sample, enhancing phylogenetic analysis for aeDNA. soibean employs a damage-aware likelihood model for precise identification at low coverage with a high damage rate. Additionally, we reconstructed ancestral sequences for soibean's database to handle aeDNA that is highly diverged from modern references. soibean demonstrates effectiveness through simulated data tests and empirical validation. Notably, our method uncovered new empirical results in published datasets, including using porpoise whales as food in a Mesolithic community in Sweden, demonstrating its potential to reveal previously unrecognized findings in aeDNA studies.


Subjects
Ancient DNA , Mitochondrial Genome , Phylogeny , Ancient DNA/analysis , Animals , Environmental DNA/genetics , Mitochondrial DNA/genetics , Fossils
2.
Nucleic Acids Res ; 52(17): 10144-10160, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39175109

ABSTRACT

Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.


Subjects
Genetic Epistasis , Single Nucleotide Polymorphism , Humans , Quantum Theory , Multifactorial Inheritance/genetics , Disease/genetics , Computational Biology/methods , Algorithms , Genetic Predisposition to Disease
3.
PLoS Pathog ; 20(7): e1012039, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38950065

ABSTRACT

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) not only caused the COVID-19 pandemic but also had a major impact on farmed mink production in several European countries. In Denmark, the entire population of farmed mink (over 15 million animals) was culled in late 2020. During the period of June to November 2020, mink on 290 farms (out of about 1100 in the country) were shown to be infected with SARS-CoV-2. Genome sequencing identified changes in the virus within the mink and it is estimated that about 4000 people in Denmark became infected with these mink virus variants. However, the routes of transmission of the virus to, and from, the mink have been unclear. Phylogenetic analysis revealed the generation of multiple clusters of the virus within the mink. Detailed analysis of changes in the virus during replication in mink and, in parallel, in the human population in Denmark, during the same time period, has been performed here. The majority of cases in mink involved variants with the Y453F substitution and the H69/V70 deletion within the Spike (S) protein; these changes emerged early in the outbreak. However, further introductions of the virus, by variants lacking these changes, from the human population into mink also occurred. Based on phylogenetic analysis of viral genome data, we estimate, using a conservative approach, that about 17 separate examples of mink to human transmission occurred in Denmark but up to 59 such events (90% credible interval: (39-77)) were identified using parsimony to count cross-species jumps on transmission trees inferred using Bayesian methods. Using the latter approach, 136 jumps (90% credible interval: (117-164)) from humans to mink were found, which may underlie the farm-to-farm spread. Thus, transmission of SARS-CoV-2 from humans to mink, mink to mink, from mink to humans and between humans were all observed.


Subjects
COVID-19 , Mink , Phylogeny , SARS-CoV-2 , Mink/virology , COVID-19/transmission , COVID-19/virology , COVID-19/epidemiology , COVID-19/veterinary , SARS-CoV-2/genetics , Animals , Denmark/epidemiology , Humans , Pandemics , Farms , Betacoronavirus/genetics , Betacoronavirus/classification , Viral Genome , Coronavirus Infections/veterinary , Coronavirus Infections/epidemiology , Coronavirus Infections/virology , Coronavirus Infections/transmission , Coronavirus Spike Glycoprotein/genetics
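
The abstract in entry 3 mentions using parsimony to count cross-species jumps on transmission trees. As a rough illustration only, the sketch below applies the standard Fitch bottom-up pass to a single, made-up binary tree and returns the minimum number of host changes. The tree, the tip labels, and the function name `fitch_min_changes` are hypothetical; the study's actual procedure (counting over trees inferred with Bayesian methods, with direction of each jump) is not reproduced here.

```python
def fitch_min_changes(tree, tip_states):
    """Fitch bottom-up pass on a binary tree given as nested 2-tuples.

    Returns (possible_states_at_this_node, minimum_number_of_state_changes).
    Leaves are strings that index into tip_states.
    """
    if isinstance(tree, str):
        # A tip: its host is observed, no change needed below it.
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_min_changes(left, tip_states)
    right_set, right_changes = fitch_min_changes(right, tip_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one host: no extra change here.
        return common, left_changes + right_changes
    # Children disagree: one host change is required at this node.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical toy tree: h* sampled from humans, m* sampled from mink.
toy_tree = ((("h1", "m1"), "m2"), ("h2", "h3"))
toy_hosts = {"h1": "human", "m1": "mink", "m2": "mink",
             "h2": "human", "h3": "human"}

root_states, n_changes = fitch_min_changes(toy_tree, toy_hosts)
print(n_changes)
```

Assigning a direction to each change (human to mink versus mink to human) would need an additional top-down pass that fixes one state per internal node; that step is left out of this sketch.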
4.
medRxiv ; 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-38076997

ABSTRACT

Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL is the first application that demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.

6.
Nat Biotechnol ; 41(3): 399-408, 2023 03.
Article in English | MEDLINE | ID: mdl-36593394

ABSTRACT

The application of multiple omics technologies in biomedical cohorts has the potential to reveal patient-level disease characteristics and individualized response to treatment. However, the scale and heterogeneous nature of multi-modal data make integration and inference a non-trivial task. We developed a deep-learning-based framework, multi-omics variational autoencoders (MOVE), to integrate such data and applied it to a cohort of 789 people with newly diagnosed type 2 diabetes with deep multi-omics phenotyping from the DIRECT consortium. Using in silico perturbations, we identified drug-omics associations across the multi-modal datasets for the 20 most prevalent drugs given to people with type 2 diabetes with substantially higher sensitivity than univariate statistical tests. From these, we identified, among others, novel associations between metformin and the gut microbiota as well as opposite molecular responses for the two statins, simvastatin and atorvastatin. We used the associations to quantify drug-drug similarities, assess the degree of polypharmacy and conclude that drug effects are distributed across the multi-omics modalities.


Subjects
Deep Learning , Type 2 Diabetes Mellitus , Humans , Algorithms , Type 2 Diabetes Mellitus/drug therapy , Type 2 Diabetes Mellitus/genetics
7.
Viruses ; 13(6), 2021 06 02.
Article in English | MEDLINE | ID: mdl-34199456

ABSTRACT

Beginning in late 2017, highly pathogenic avian influenza (HPAI) H5N6 viruses caused outbreaks in wild birds and poultry in several European countries. H5N6 viruses were detected in 43 wild birds found dead throughout Denmark. Most of the Danish virus-positive dead birds were found in the period from February to April 2018. However, unlike the rest of Europe, sporadic HPAI H5N6-positive dead wild birds were detected in Denmark in July, August, September, and December 2018, with the last positive bird being found in January 2019. HPAI viruses were not detected in active surveillance of apparently healthy wild birds. In this study, we use full genome sequencing and phylogenetic analysis to investigate the wild bird HPAI H5N6 viruses found in Denmark. The Danish viruses were found to be closely related to those of contemporary HPAI H5N6 viruses detected in Europe. Their sequences formed two clusters indicating that at least two introductions of H5N6 into Denmark occurred. Notably, all viruses detected in the latter half of 2018 and in 2019 grouped into the same cluster. The H5N6 viruses appeared to have been maintained undetected in the autumn of 2018.


Subjects
Influenza A Virus/genetics , Avian Influenza/epidemiology , Avian Influenza/virology , Animals , Wild Animals , Bayes Theorem , Birds , Denmark/epidemiology , Disease Outbreaks/veterinary , Molecular Evolution , Medical Geography , 21st Century History , Influenza A Virus/classification , Influenza A Virus/isolation & purification , Avian Influenza/history , Avian Influenza/transmission , Phylogeny , Public Health Surveillance , Viral RNA
8.
Elife ; 10, 2021 07 27.
Article in English | MEDLINE | ID: mdl-34313225

ABSTRACT

Since the influenza pandemic in 2009, there has been an increased focus on swine influenza A virus (swIAV) surveillance. This paper describes the results of the surveillance of swIAV in Danish swine from 2011 to 2018. In total, 3800 submissions were received with a steady increase in swIAV-positive submissions, reaching 56% in 2018. Full-genome sequences were obtained from 129 swIAV-positive samples. Altogether, 17 different circulating genotypes were identified including six novel reassortants harboring human seasonal IAV gene segments. The phylogenetic analysis revealed substantial genetic drift and also evidence of positive selection occurring mainly in antigenic sites of the hemagglutinin protein and confirmed the presence of a swine divergent cluster among the H1pdm09Nx (clade 1A.3.3.2) viruses. The results provide essential data for the control of swIAV in pigs and emphasize the importance of contemporary surveillance for discovering novel swIAV strains posing a potential threat to the human population.


Subjects
Genetic Variation , Influenza A Virus/classification , Influenza A Virus/genetics , Orthomyxoviridae Infections/virology , Swine Diseases/virology , Animals , Denmark , Genetic Drift , Genotype , Hemagglutination Inhibition Tests , Humans , Influenza A Virus H1N1 Subtype/genetics , Influenza A Virus H1N2 Subtype/genetics , Influenza A Virus H3N2 Subtype/genetics , Influenza A Virus/isolation & purification , Mutation , Neuraminidase/genetics , Phylogeny , Viral RNA/genetics , Reassortant Viruses/genetics , Seasons , Swine
9.
NPJ Precis Oncol ; 5(1): 55, 2021 Jun 18.
Article in English | MEDLINE | ID: mdl-34145376

ABSTRACT

PARP inhibitors are approved for the treatment of solid tumor types that frequently harbor alterations in the key homologous recombination (HR) genes, BRCA1/2. Other tumor types, such as lung cancer, may also be HR deficient, but the frequency of such cases is less well characterized. Specific DNA aberration profiles (mutational signatures) are induced by homologous recombination deficiency (HRD) and their presence can be used to assess the presence or absence of HR deficiency in a given tumor biopsy even in the absence of an observed alteration of an HR gene. We derived various HRD-associated mutational signatures from whole-genome and whole-exome sequencing data in the lung adenocarcinoma and lung squamous carcinoma cases from TCGA, and in a patient of ours with stage IVA lung cancer with exceptionally good response to platinum-based therapy, and in lung cancer cell lines. We found that a subset of the investigated cases, both with and without biallelic loss of BRCA1 or BRCA2, showed robust signs of HR deficiency. The extreme platinum responder case also showed a robust HRD-associated genomic mutational profile. HRD-associated mutational signatures were also associated with PARP inhibitor sensitivity in lung cancer cell lines. Consequently, lung cancer cases with HRD, as identified by diagnostic mutational signatures, may benefit from PARP inhibitor therapy.

10.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34015811

ABSTRACT

Formalin-fixed paraffin-embedded tissue, the most common tissue specimen stored in clinical practice, presents challenges in the analysis due to formalin-induced artifacts. Here, we present Strand Orientation Bias Detector (SOBDetector), a flexible computational platform compatible with all the common somatic SNV-calling pipelines, designed to assess the probability that a given detected mutation is an artifact. The underlying predictor mechanism is based on the posterior distribution of a Bayesian logistic regression model trained on The Cancer Genome Atlas whole exomes. SOBDetector is a freely available cross-platform program, implemented in Java 1.8.


Subjects
Artifacts , Cytological Techniques/standards , High-Throughput Nucleotide Sequencing/standards , Statistical Models , DNA Sequence Analysis/standards , Genetic Templates , Algorithms , Neoplasm DNA , Genetic Databases , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation , Neoplasms/diagnosis , Neoplasms/genetics , Reproducibility of Results , DNA Sequence Analysis/methods
11.
Nat Commun ; 11(1): 5660, 2020 11 09.
Article in English | MEDLINE | ID: mdl-33168830

ABSTRACT

Human endogenous retroviruses (HERV) form a substantial part of the human genome, but mostly remain transcriptionally silent under strict epigenetic regulation, yet can potentially be reactivated by malignant transformation or epigenetic therapies. Here, we evaluate the potential for T cell recognition of HERV elements in myeloid malignancies by mapping transcribed HERV genes and generating a library of 1169 potential antigenic HERV-derived peptides predicted for presentation by 4 HLA class I molecules. Using DNA barcode-labeled MHC-I multimers, we find CD8+ T cell populations recognizing 29 HERV-derived peptides representing 18 different HERV loci, of which HERVH-5, HERVW-1, and HERVE-3 have more profound responses; such HERV-specific T cells are present in 17 of the 34 patients, but less frequently in healthy donors. Transcriptomic analyses reveal enhanced transcription of the HERVs in patients; meanwhile DNA-demethylating therapy causes a small and heterogeneous enhancement in HERV transcription without altering T cell recognition. Our study thus uncovers T cell recognition of HERVs in myeloid malignancies, thereby implicating HERVs as potential targets for immunotherapies.


Subjects
Endogenous Retroviruses/genetics , Hematologic Neoplasms/virology , T-Lymphocytes/metabolism , T-Lymphocytes/virology , CD8-Positive T-Lymphocytes , Genetic Epigenesis , T-Lymphocyte Epitopes , Gene Expression Profiling , Hematologic Neoplasms/genetics , Hematologic Neoplasms/therapy , Humans , Immunotherapy , Immunologic Monitoring , Myeloid Cells , Neoplasms
12.
Viruses ; 12(2), 2020 02 23.
Article in English | MEDLINE | ID: mdl-32102230

ABSTRACT

The degree of antigenic drift in swine influenza A viruses (swIAV) has historically been regarded as minimal compared to that of human influenza A virus strains. However, as surveillance activities on swIAV have increased, more isolates have been characterized, revealing a high level of genetic and antigenic differences even within the same swIAV lineage. The objective of this study was to investigate the level of genetic drift in one enzootically infected swine herd over one year. Nasal swabs were collected monthly from sows (n = 4) and piglets (n = 40) in the farrowing unit, and from weaners (n = 20) in the nursery. Viruses from 1-4 animals were sequenced per month. Analyses of the sequences revealed that the hemagglutinin (HA) gene was the main target for genetic drift with a substitution rate of 7.6 × 10⁻³ substitutions/site/year and evidence of positive selection. The majority of the mutations occurred in the globular head of the HA protein and in antigenic sites. The phylogenetic tree of the HA sequences displayed a pectinate topology, where only a single lineage persists and forms the ancestor for subsequent lineages. This was most likely caused by repeated selection of a single immune-escape variant, which subsequently became the founder of the next wave of infections.


Subjects
Viral Antigens/genetics , Genetic Drift , Influenza Virus Hemagglutinin Glycoproteins/genetics , Mutation , Phylogeny , Amino Acid Substitution , Animals , Newborn Animals/virology , Viral Antigens/immunology , Molecular Evolution , Female , Influenza Virus Hemagglutinin Glycoproteins/immunology , Nose/virology , Orthomyxoviridae Infections/virology , Swine/virology
13.
PLoS One ; 14(11): e0224854, 2019.
Article in English | MEDLINE | ID: mdl-31725751

ABSTRACT

Influenza A virus (IAV) is a highly contagious pathogen in pigs. Swine IAV (swIAV) infection causes respiratory disease and is thereby a challenge for animal health, animal welfare and the production economy. In Europe, the most widespread strategy for controlling swIAV is implementation of sow vaccination programs, to secure delivery of protective maternally derived antibodies (MDAs) to the newborn piglets. In this study we report a unique case, where a persistently swIAV (A/sw/Denmark/P5U4/2016(H1N1)) infected herd experienced an acute outbreak with a new swIAV subtype (A/sw/Denmark/HB4280U1/2017(H1N2)) and subsequently decided to implement a mass sow vaccination program. Clinical registrations, nasal swabs and blood samples were collected from four different batches of pigs before and after vaccination. Virus isolation, sequencing of the virus strain and hemagglutination inhibition (HI) tests were performed on samples collected before and during the outbreak and after implementation of mass sow vaccination. After implementation of the sow mass vaccination, the time of infection was delayed and the viral load significantly decreased. An increased number of pigs, however, tested positive at two consecutive sampling times, indicating prolonged shedding. In addition, a significantly smaller proportion of the 10-12-week-old pigs were seropositive by the end of the study, indicating an impaired induction of antibodies against swIAV in the presence of MDAs. Sequencing of the herd strains revealed major differences in the hemagglutinin gene of the strains isolated before and during the acute outbreak, even though the two strains belonged to the same HA lineage. The HI tests confirmed a limited degree of cross-reaction between the two strains. Furthermore, the sequencing results of the hemagglutinin gene obtained before and after implementation of mass sow vaccination revealed an increased substitution rate and an increase in positively selected sites in the globular head of the hemagglutinin after vaccination.


Subjects
Influenza A Virus/genetics , Influenza A Virus/immunology , Orthomyxoviridae Infections/veterinary , Swine Diseases/epidemiology , Swine Diseases/virology , Animals , Antiviral Antibodies/immunology , Denmark/epidemiology , Disease Outbreaks , Molecular Evolution , Influenza Virus Hemagglutinin Glycoproteins/genetics , Influenza Virus Hemagglutinin Glycoproteins/immunology , Immunization , Seroepidemiologic Studies , Swine , Swine Diseases/prevention & control , Viral Load , Virus Shedding
14.
Viruses ; 11(10), 2019 10 10.
Article in English | MEDLINE | ID: mdl-31658773

ABSTRACT

Vaccines against classical swine fever have proven very effective in protecting pigs from this deadly disease. However, little is known about how vaccination impacts the selective pressures acting on the classical swine fever virus (CSFV). Here we use high-throughput sequencing of viral genomes to investigate evolutionary changes in virus populations following the challenge of naïve and vaccinated pigs with the highly virulent CSFV strain "Koslov". The challenge inoculum contained an ensemble of closely related viral sequences, with three major haplotypes being present, termed A, B, and C. After the challenge, the viral haplotype A was preferentially located within the tonsils of naïve animals but was highly prevalent in the sera of all vaccinated animals. We find that the viral population structure in naïve pigs after infection is very similar to that in the original inoculum. In contrast, the viral population in vaccinated pigs, which only underwent transient low-level viremia, displayed several distinct changes including the emergence of 16 unique non-synonymous single nucleotide polymorphisms (SNPs) that were not detectable in the challenge inoculum. Further analysis showed a significant loss of heterogeneity and an increasing positive selection acting on the virus populations in the vaccinated pigs. We conclude that vaccination imposes a strong selective pressure on viruses that subsequently replicate within the vaccinated animal.


Subjects
Classical Swine Fever Virus , Swine Diseases/virology , Viral Interference , Viral Vaccines , Physiological Adaptation , Animals , Blood/virology , Classical Swine Fever/virology , Classical Swine Fever Virus/genetics , Classical Swine Fever Virus/immunology , High-Throughput Nucleotide Sequencing , Palatine Tonsil/virology , Single Nucleotide Polymorphism , Viral RNA , Swine , Vaccination/veterinary , Attenuated Vaccines , Viremia/blood , Virulence , Whole Genome Sequencing
16.
Nature ; 557(7705): 369-374, 2018 05.
Article in English | MEDLINE | ID: mdl-29743675

ABSTRACT

For thousands of years the Eurasian steppes have been a centre of human migrations and cultural change. Here we sequence the genomes of 137 ancient humans (about 1× average coverage), covering a period of 4,000 years, to understand the population history of the Eurasian steppes after the Bronze Age migrations. We find that the genetics of the Scythian groups that dominated the Eurasian steppes throughout the Iron Age were highly structured, with diverse origins comprising Late Bronze Age herders, European farmers and southern Siberian hunter-gatherers. Later, Scythians admixed with the eastern steppe nomads who formed the Xiongnu confederations, and moved westward in about the second or third century BC, forming the Hun traditions in the fourth-fifth century AD, and carrying with them plague that was basal to the Justinian plague. These nomads were further admixed with East Asian groups during several short-term khanates in the Medieval period. These historical events transformed the Eurasian steppes from being inhabited by Indo-European speakers of largely West Eurasian ancestry to the mostly Turkic-speaking groups of the present day, who are primarily of East Asian ancestry.


Subjects
Asian People/genetics , Human Genome/genetics , Grassland , Phylogeny , White People/genetics , Asia/ethnology , Europe/ethnology , Farmers/history , Ancient History , Human Migration/history , Humans
17.
Nat Commun ; 9(1): 1661, 2018 04 25.
Article in English | MEDLINE | ID: mdl-29695774

ABSTRACT

Inflammatory bowel disease (IBD) is a chronic intestinal disorder, with two main types: Crohn's disease (CD) and ulcerative colitis (UC), whose molecular pathology is not well understood. The majority of IBD-associated SNPs are located in non-coding regions and are hard to characterize since regulatory regions in IBD are not known. Here we profile transcription start sites (TSSs) and enhancers in the descending colon of 94 IBD patients and controls. IBD-upregulated promoters and enhancers are highly enriched for IBD-associated SNPs and are bound by the same transcription factors. IBD-specific TSSs are associated with genes with roles in both inflammatory cascades and gut epithelia while TSSs distinguishing UC and CD are associated with gut epithelia functions. We find that as few as 35 TSSs can distinguish active CD, UC, and controls with 85% accuracy in an independent cohort. Our data constitute a foundation for understanding the molecular pathology, gene regulation, and genetics of IBD.


Subjects
Ulcerative Colitis/genetics , Crohn Disease/genetics , Nucleic Acid Regulatory Sequences/genetics , Adult , Biopsy , Case-Control Studies , Cohort Studies , Ulcerative Colitis/diagnosis , Ulcerative Colitis/pathology , Colon/diagnostic imaging , Colon/pathology , Colonoscopy , Crohn Disease/diagnosis , Crohn Disease/pathology , Female , Humans , Intestinal Mucosa/diagnostic imaging , Intestinal Mucosa/pathology , Male , Middle Aged , Single Nucleotide Polymorphism , RNA Sequence Analysis , Up-Regulation
18.
BMC Genomics ; 18(1): 19, 2017 01 05.
Article in English | MEDLINE | ID: mdl-28056767

ABSTRACT

BACKGROUND: Whole genome sequencing (WGS) is increasingly used in diagnostics and surveillance of infectious diseases. A major application for WGS is to use the data for identifying outbreak clusters, and there is therefore a need for methods that can accurately and efficiently infer phylogenies from sequencing reads. In the present study we describe a new dataset that we have created for the purpose of benchmarking such WGS-based methods for epidemiological data, and also present an analysis where we use the data to compare the performance of some current methods. RESULTS: Our aim was to create a benchmark data set that mimics sequencing data of the sort that might be collected during an outbreak of an infectious disease. This was achieved by letting an E. coli hypermutator strain grow in the lab for 8 consecutive days, each day splitting the culture in two while also collecting samples for sequencing. The result is a data set consisting of 101 whole genome sequences with known phylogenetic relationship. Among the sequenced samples 51 correspond to internal nodes in the phylogeny because they are ancestral, while the remaining 50 correspond to leaves. We also used the newly created data set to compare three different online available methods that infer phylogenies from whole-genome sequencing reads: NDtree, CSI Phylogeny and REALPHY. One complication when comparing the output of these methods with the known phylogeny is that phylogenetic methods typically build trees where all observed sequences are placed as leaves, even though some of them are in fact ancestral. We therefore devised a method for post processing the inferred trees by collapsing short branches (thus relocating some leaves to internal nodes), and also present two new measures of tree similarity that take into account the identity of both internal and leaf nodes. CONCLUSIONS: Based on this analysis we find that, among the investigated methods, CSI Phylogeny had the best performance, correctly identifying 73% of all branches in the tree and 71% of all clades. We have made all data from this experiment (raw sequencing reads, consensus whole-genome sequences, as well as descriptions of the known phylogeny in a variety of formats) publicly available, with the hope that other groups may find this data useful for benchmarking and exploring the performance of epidemiological methods. All data is freely available at: https://cge.cbs.dtu.dk/services/evolution_data.php .


Subjects
Bacteria/classification , Bacteria/genetics , Bacterial Genome , Genomics , Phylogeny , Artifacts , Genetic Databases , Escherichia coli/genetics , Molecular Evolution , Genomics/methods , Genomics/standards , High-Throughput Nucleotide Sequencing , Mutation , Mutation Rate
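
The abstract in entry 18 describes post-processing inferred trees by collapsing short branches so that some leaves are relocated to internal nodes. The sketch below shows one way this idea could look, under stated assumptions: the dictionary tree encoding, the `threshold` parameter, and the function name `collapse_short_leaves` are all hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
def collapse_short_leaves(node, threshold):
    """Move leaves on near-zero branches onto their (unnamed) parent node.

    Each node is a dict with keys:
      "name"     - sample label, or None for an unlabeled internal node
      "length"   - length of the branch leading to this node
      "children" - list of child nodes (empty for a leaf)
    The tree is modified in place and also returned.
    """
    kept = []
    for child in node["children"]:
        collapse_short_leaves(child, threshold)
        is_leaf = not child["children"]
        if is_leaf and child["length"] < threshold and node["name"] is None:
            # The sample is treated as the ancestor at this internal node.
            node["name"] = child["name"]
        else:
            kept.append(child)
    node["children"] = kept
    return node


def leaf(name, length):
    """Helper for building the toy tree below."""
    return {"name": name, "length": length, "children": []}


# Hypothetical toy tree: samples A and B sit on zero-length branches.
toy = {
    "name": None, "length": 0.0,
    "children": [
        leaf("A", 0.0),
        {"name": None, "length": 5.0,
         "children": [leaf("B", 0.0), leaf("C", 3.0)]},
    ],
}

collapse_short_leaves(toy, threshold=0.5)
print(toy["name"], toy["children"][0]["name"])
```

In this sketch only one leaf can be promoted per internal node; how ties between several short-branch leaves should be resolved is a design choice the abstract does not specify.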
19.
BMC Bioinformatics ; 17: 176, 2016 Apr 22.
Article in English | MEDLINE | ID: mdl-27102804

ABSTRACT

BACKGROUND: Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence, and in the case of protein coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data. RESULTS: The new basecalling method described here, named Multipass, implements a probabilistic framework for working with the raw flowgrams obtained by pyrosequencing. For each sequence variant Multipass calculates the likelihood and nucleotide sequence of several most likely sequences given the flowgram data. This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood for observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria virulence gene family, where Multipass generates 20% more error-free sequences than current state-of-the-art methods, and provides sequence characteristics that allow generation of a set of high confidence error-free sequences. CONCLUSIONS: This novel method can be used to increase accuracy of existing and future amplicon sequencing data, particularly where extensive prior knowledge is available about the obtained sequences, for example in analysis of the immunoglobulin VDJ region where Multipass can be combined with a model for the known recombining germline genes. Multipass is available for Roche 454 data at http://www.cbs.dtu.dk/services/MultiPass-1.0 , and the concept can potentially be implemented for other sequencing technologies as well.


Subjects
Protozoan DNA/isolation & purification , DNA Sequence Analysis , Algorithms , Protozoan DNA/genetics , Molecular Models , Open Reading Frames , Plasmodium falciparum/genetics , Protozoan Proteins/genetics , Sequence Alignment
20.
Cell ; 163(3): 571-82, 2015 Oct 22.
Article in English | MEDLINE | ID: mdl-26496604

ABSTRACT

The bacterium Yersinia pestis is the etiological agent of plague and has caused human pandemics with millions of deaths in historic times. How and when it originated remains contentious. Here, we report the oldest direct evidence of Yersinia pestis identified by ancient DNA in human teeth from Asia and Europe dating from 2,800 to 5,000 years ago. By sequencing the genomes, we find that these ancient plague strains are basal to all known Yersinia pestis. We find the origins of the Yersinia pestis lineage to be at least two times older than previous estimates. We also identify a temporal sequence of genetic changes that lead to increased virulence and the emergence of the bubonic plague. Our results show that plague infection was endemic in the human populations of Eurasia at least 3,000 years before any historical recordings of pandemics.


Subjects
Plague/microbiology , Yersinia pestis/classification , Yersinia pestis/isolation & purification , Animals , Asia , Bacterial DNA/genetics , Europe , Ancient History , Medieval History , Humans , Plague/history , Plague/transmission , Siphonaptera/microbiology , Tooth/microbiology , Yersinia pestis/genetics