Results 1 - 20 of 67
1.
BMJ Open ; 9(10): e031092, 2019 10 07.
Article in English | MEDLINE | ID: mdl-31594892

ABSTRACT

INTRODUCTION: Genomic sequencing has rapidly transitioned into clinical practice, improving diagnosis and treatment options for patients with hereditary disorders. However, large-scale implementation of genomic sequencing faces challenges, especially with regard to the return of incidental results, which refer to genetic variants uncovered during testing that are unrelated to the primary disease under investigation, but of potential clinical significance. High-quality evidence evaluating health outcomes and costs of receiving incidental results is critical for the adoption of genomic sequencing into clinical care and to understand the unintended consequences of adoption of genomic sequencing. We aim to evaluate the health outcomes and costs of receiving incidental results for patients undergoing genomic sequencing. METHODS AND ANALYSIS: We will compare health outcomes and costs of receiving, versus not receiving, incidental results for adult patients with cancer undergoing genomic sequencing in a mixed-methods randomised controlled trial. Two hundred and sixty patients who have previously undergone first or second-tier genetic testing for cancer and received uninformative results will be recruited from familial cancer clinics in Toronto, Ontario. Participants in both arms will receive cancer-related results. Participants in the intervention arm have the option to receive incidental results. Our primary outcome is psychological distress at 2 weeks following return of results. Secondary outcomes include behavioural consequences, clinical and personal utility assessed over the 12 months after results are returned and health service use and costs at 12 months and 5 years. A subset of participants and providers will complete qualitative interviews about utility of incidental results. 
ETHICS AND DISSEMINATION: This study has been approved by Clinical Trials Ontario Streamlined Research Ethics Review System that provides ethical review and oversight for multiple sites participating in the same clinical trial in Ontario. Results from the trial will be shared through stakeholder workshops, national and international conferences, and peer-reviewed journals. TRIAL REGISTRATION NUMBER: NCT03597165.


Subjects
Incidental Findings; Practice Patterns, Physicians'; Sequence Analysis, DNA; Adult; Costs and Cost Analysis; Evaluation Studies as Topic; Female; Genetic Testing/methods; Genetic Variation; Humans; Male; Outcome Assessment, Health Care/economics; Outcome Assessment, Health Care/methods; Practice Patterns, Physicians'/economics; Practice Patterns, Physicians'/ethics; Practice Patterns, Physicians'/standards; Randomized Controlled Trials as Topic; Sequence Analysis, DNA/ethics; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data
2.
Mol Genet Genomic Med ; 7(2): e00606, 2019 02.
Article in English | MEDLINE | ID: mdl-30816028

ABSTRACT

Genetics and genomics have received growing attention in Iran in recent years, and some efforts have been made and implemented. However, these efforts are far from adequate, considering the advances in medical genetics and genomics around the world over the past two decades. Overall, given the lack of medical genetics residency programs in the Iranian health education system and the high demand arising from high rates of consanguinity and intraethnic marriage, genetic services lag behind, and an immediate response is needed to fill this large gap in Iran. As set out in the national constitution and re-emphasized in the 6th National Development Plan, the Iranian government is responsible for providing a standard level of health care, including genetic services, to all Iranians who need it.


Subjects
Facilities and Services Utilization; Genetic Diseases, Inborn/diagnosis; Genetic Testing/statistics & numerical data; Genetics, Medical/statistics & numerical data; Prenatal Diagnosis/statistics & numerical data; Sequence Analysis, DNA/statistics & numerical data; Databases, Genetic; Genetic Diseases, Inborn/epidemiology; Genetic Diseases, Inborn/genetics; Genetic Testing/economics; Genetic Testing/legislation & jurisprudence; Genetics, Medical/economics; Genetics, Medical/legislation & jurisprudence; Genetics, Medical/organization & administration; Humans; Iran; Prenatal Diagnosis/economics; Sequence Analysis, DNA/economics
3.
Nat Rev Genet ; 20(6): 356-370, 2019 06.
Article in English | MEDLINE | ID: mdl-30886350

ABSTRACT

Antimicrobial resistance extracts high morbidity, mortality and economic costs yearly by rendering bacteria immune to antibiotics. Identifying and understanding antimicrobial resistance are imperative for clinical practice to treat resistant infections and for public health efforts to limit the spread of resistance. Technologies such as next-generation sequencing are expanding our abilities to detect and study antimicrobial resistance. This Review provides a detailed overview of antimicrobial resistance identification and characterization methods, from traditional antimicrobial susceptibility testing to recent deep-learning methods. We focus on sequencing-based resistance discovery and discuss tools and databases used in antimicrobial resistance studies.


Subjects
Bacteria/drug effects; Drug Resistance, Multiple, Bacterial/genetics; Genome, Bacterial; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/statistics & numerical data; Whole Genome Sequencing/methods; Anti-Bacterial Agents/pharmacology; Bacteria/classification; Bacteria/genetics; Bacteria/isolation & purification; Bacterial Infections/drug therapy; Bacterial Infections/microbiology; Base Sequence; High-Throughput Nucleotide Sequencing/instrumentation; Humans; Machine Learning; Metagenomics; Sequence Analysis, DNA/methods; Whole Genome Sequencing/instrumentation
4.
Brief Bioinform ; 20(4): 1151-1159, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29028869

ABSTRACT

As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large-volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners that will enable double barcoding, stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, both to facilitate development and efficient high-performance implementation of the community's data analysis tasks.


Subjects
High-Throughput Nucleotide Sequencing/methods; Metagenome; Metagenomics/methods; Software; Algorithms; Budgets; Computational Biology/methods; High-Throughput Nucleotide Sequencing/economics; High-Throughput Nucleotide Sequencing/statistics & numerical data; Internet; Metagenomics/economics; Metagenomics/statistics & numerical data; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; User-Computer Interface; Workflow
5.
Brief Bioinform ; 20(4): 1222-1237, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29220512

ABSTRACT

MOTIVATION: Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. RESULTS: We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover's distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover's distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. AVAILABILITY: The source code of the benchmarking tool is available as Supplementary Materials.
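As a rough illustration of what an alignment-free k-mer statistic looks like, the sketch below builds two k-mer histograms and compares them. The function names, and the choice of Manhattan and Euclidean distance as example statistics, are assumptions made here for illustration; the abstract does not list which 33 statistics were evaluated, and this is not the authors' benchmarking tool.

```python
from collections import Counter
from math import sqrt


def kmer_histogram(seq, k):
    """Count every overlapping word of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def manhattan_distance(h1, h2):
    """Sum of absolute count differences over all observed k-mers."""
    return sum(abs(h1[w] - h2[w]) for w in set(h1) | set(h2))


def euclidean_distance(h1, h2):
    """Square root of the summed squared count differences."""
    return sqrt(sum((h1[w] - h2[w]) ** 2 for w in set(h1) | set(h2)))


def length_difference(a, b):
    """Absolute difference in sequence length, the simplest statistic."""
    return abs(len(a) - len(b))


if __name__ == "__main__":
    query, target = "ACGTACGTGA", "ACGTTCGTGA"
    hq, ht = kmer_histogram(query, 3), kmer_histogram(target, 3)
    print(manhattan_distance(hq, ht), euclidean_distance(hq, ht))
```

Each statistic needs only one pass over the two histograms, which is why such measures can serve as a fast pre-filter before a costly alignment.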


Subjects
Computational Biology/methods; Models, Statistical; Sequence Analysis, DNA/statistics & numerical data; Algorithms; Databases, Nucleic Acid/statistics & numerical data; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Markov Chains; Sequence Alignment/statistics & numerical data
6.
J Med Microbiol ; 67(11): 1589-1595, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30311873

ABSTRACT

PURPOSE: Bloodstream infections are major causes of morbidity and mortality that lead to prolonged hospital stays and higher medical costs. In this study, we aimed to evaluate the MinION nanopore sequencer for the identification of the most dominant pathogens in positive blood culture bottles. METHODOLOGY: 16S and ITS1-5.8S-ITS2 rRNA genes were amplified by PCR with barcoded primers using nine clinical isolates obtained from positive blood bottles and 11 type strains, including five types of Candida species. Barcoded amplicons were mixed, and multiplex sequencing with the MinION sequencer was performed. In addition, barcoded PCR amplicons were sequenced by Sanger sequencing to validate the performance of the MinION. RESULTS: The bacterial and Candida spp. identified by MinION sequencing, based on the highest homology of reference sequences from the NCBI gene databases, agreed with the matrix-assisted laser desorption ionization time of flight mass spectrometry results, except for the closely related species Streptococcus and Escherichia coli. The 'pass' reads obtained within about 10 min of sequencing were sufficient to identify the pathogens. The average values of sequence identities with 1D2 chemistry and the R9.5 flow cell were around 99%; thus, frequent sequence errors did not affect species identification based on amplicon sequencing. CONCLUSION: We have established a rapid, portable and economical technique for the identification of pathogens in positive blood culture bottles through a novel MinION nanopore sequencer amplicon sequencing scheme, which replaces traditional Sanger sequencing.


Subjects
Blood Culture/instrumentation; Blood Culture/methods; Nanopores; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods; Bacteremia/blood; Bacteremia/diagnosis; Bacteremia/microbiology; Bacteria/classification; Bacteria/genetics; Bacteria/isolation & purification; Bacteria/pathogenicity; Candida/genetics; Candida/isolation & purification; Candida/pathogenicity; DNA, Bacterial/genetics; DNA, Bacterial/isolation & purification; DNA, Fungal/genetics; DNA, Fungal/isolation & purification; Escherichia coli/genetics; Escherichia coli/isolation & purification; Fungemia/blood; Fungemia/diagnosis; Fungemia/microbiology; Fungi/genetics; Fungi/isolation & purification; Fungi/pathogenicity; High-Throughput Nucleotide Sequencing; Humans; Polymerase Chain Reaction/methods; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/statistics & numerical data
7.
Genome Biol ; 19(1): 90, 2018 07 13.
Article in English | MEDLINE | ID: mdl-30005597

ABSTRACT

Nanopore sequencing is a rapidly maturing technology delivering long reads in real time on a portable instrument at low cost. Not surprisingly, the community has rapidly taken up this new way of sequencing and has used it successfully for a variety of research applications. A major limitation of nanopore sequencing is its high error rate, which despite recent improvements to the nanopore chemistry and computational tools still ranges between 5% and 15%. Here, we review computational approaches determining the nanopore sequencing error rate. Furthermore, we outline strategies for translation of raw sequencing data into base calls for detection of base modifications and for obtaining consensus sequences.


Subjects
Artifacts; DNA/chemistry; Genome; High-Throughput Nucleotide Sequencing/methods; Nanopores; Sequence Analysis, DNA/statistics & numerical data; Base Pairing; DNA/genetics; Escherichia coli/genetics; Humans; Klebsiella pneumoniae/genetics; Markov Chains; Neural Networks, Computer; Sequence Analysis, DNA/methods
8.
Genetica ; 146(4-5): 361-368, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29948517

ABSTRACT

Genomic prediction is feasible for estimating genomic breeding values because of dense genome-wide markers and credible statistical methods, such as Genomic Best Linear Unbiased Prediction (GBLUP) and various Bayesian methods. Compared with GBLUP, Bayesian methods propose more flexible assumptions for the distributions of SNP effects. However, most Bayesian methods are performed based on Markov chain Monte Carlo (MCMC) algorithms, leading to computational efficiency challenges. Hence, some fast Bayesian approaches, such as fast BayesB (fBayesB), were proposed to speed up the calculation. This study proposed another fast Bayesian method termed fast BayesC (fBayesC). The prior distribution of fBayesC assumes that a SNP with probability γ has a non-zero effect which comes from a normal density with a common variance. The simulated data from QTLMAS XII workshop and actual data on large yellow croaker were used to compare the predictive results of fBayesB, fBayesC and (MCMC-based) BayesC. The results showed that when γ was set as a small value, such as 0.01 in the simulated data or 0.001 in the actual data, fBayesB and fBayesC yielded lower prediction accuracies (abilities) than BayesC. In the actual data, fBayesC could yield very similar predictive abilities as BayesC when γ ≥ 0.01. When γ = 0.01, fBayesB could also yield similar results as fBayesC and BayesC. However, fBayesB could not yield an explicit result when γ ≥ 0.1, but a similar situation was not observed for fBayesC. Moreover, the computational speed of fBayesC was significantly faster than that of BayesC, making fBayesC a promising method for genomic prediction.
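The prior described in words above (a SNP has a non-zero effect with probability γ, drawn from a normal density with a common variance) can be written as a two-component mixture. The notation below is an assumption made here for readability, not taken from the paper: β_j is the effect of SNP j, σ_β² is the common variance, and δ_0 is a point mass at zero.

```latex
\beta_j \mid \gamma, \sigma_\beta^2 \;\sim\;
  \gamma \, \mathcal{N}\!\left(0, \sigma_\beta^2\right)
  \;+\; (1 - \gamma)\, \delta_0 ,
\qquad j = 1, \dots, p
```

Under this reading, a small γ (for example 0.01) means that most of the p markers are assumed to have exactly zero effect.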


Subjects
Data Interpretation, Statistical; Genomics/methods; Sequence Analysis, DNA/methods; Algorithms; Animals; Bayes Theorem; Forecasting/methods; Genotype; Humans; Markov Chains; Models, Genetic; Monte Carlo Method; Perciformes/genetics; Phenotype; Polymorphism, Single Nucleotide/genetics; Sequence Analysis, DNA/statistics & numerical data; Software
9.
Mol Genet Genomic Med ; 6(1): 35-43, 2018 01.
Article in English | MEDLINE | ID: mdl-29471590

ABSTRACT

BACKGROUND: With the availability of raw DNA generated from direct-to-consumer (DTC) testing companies, there has been a proliferation of third-party online services that are available to interpret the raw data for both genealogy and/or health purposes. This study examines the current landscape and downstream clinical implications of consumer use of third-party services. METHODS: Study participants were recruited online from social media platforms. A total of 321 survey respondents reported using third-party services for raw DNA interpretation. RESULTS: Participants were highly motivated to explore raw DNA for ancestral information (67%), individual health implications (62%), or both (40%). Participants primarily used one of seven companies to interpret raw DNA; 73% used more than one. Company choice was driven by the type of results offered (51%), price (45%), and online reviews (31%). Approximately 30% of participants shared results with a medical provider and 21% shared with more than one. Outcomes of sharing ranged from disinterest/discounting of the information to diagnosis of genetic conditions. Participants were highly satisfied with their decision to analyze raw DNA (M = 4.54/5), yet challenges in understanding interpretation results were reported irrespective of satisfaction ratings. CONCLUSION: Consumers face challenges in understanding the results and may seek out clinical assistance in interpreting their raw DNA results.


Subjects
Direct-To-Consumer Screening and Testing/ethics; Direct-To-Consumer Screening and Testing/statistics & numerical data; Genetic Testing/methods; Adult; Aged; Aged, 80 and over; Choice Behavior; Direct-To-Consumer Screening and Testing/economics; Female; Humans; Internet; Male; Middle Aged; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; Social Media; Surveys and Questionnaires
10.
Medicine (Baltimore) ; 97(6): e9826, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29419684

ABSTRACT

The prevalence and incidence of human immunodeficiency virus type 1 (HIV-1) among men who have sex with men (MSM) are on the rise throughout China. With a large population of MSM, Jiangsu Province is facing an escalating HIV-1 epidemic. The aim of this study was to explore the phylogenetic and temporal dynamics of HIV-1 CRF01_AE and CRF07_BC among antiretroviral therapy (ART)-naïve MSM recently infected with HIV-1 in Jiangsu Province. We recruited MSM in Jiangsu Province (Suzhou, Wuxi, Nantong, Taizhou and Yancheng) from 2012 to 2015. We collected information on demographics and sexual behaviors and a blood sample for HIV genome RNA extraction, RT-PCR amplification, and DNA sequencing. Multiple alignments were made using Gene Cutter, with the selected reference sequences of various subtypes/recombinants from the Los Alamos HIV-1 database. Phylogenetic and Bayesian evolutionary analyses were performed with MEGA version 6.0, FastTree v2.1.7 and BEAST v1.6.2. Categorical variables were analyzed using the χ² test (or Fisher exact test where necessary). The χ² test for trend was used to assess the evolution of HIV-1 subtype distribution over time. All data were analyzed using the SPSS 20.0 software package (IBM Company, New York, NY). HIV-1 phylogenetic analysis revealed a broad viral diversity including CRF01_AE (60.06%), CRF07_BC (22.29%), subtype B (5.88%), CRF67_01B (5.26%), CRF68_01B (2.79%), CRF55_01B (1.55%), CRF59_01B (0.93%), and CRF08_BC (0.62%). Two unique recombinant forms (URFs) (0.62%) were also detected. Four epidemic clusters and 1 major cluster in CRF01_AE and CRF07_BC were identified. Based on the time of the most recent common ancestor, the CRF01_AE strain (2001) was introduced among MSM residing in Jiangsu earlier than the CRF07_BC strain (2004). Our study demonstrated HIV-1 subtype diversity among ART-naïve MSM recently infected with HIV-1 in Jiangsu.
We first depicted the spatiotemporal dynamics, traced the dates of origin for the HIV-1 CRF01_AE/07_BC strains and inferred the effective population size among newly infected ART-naïve MSM in Jiangsu from 2012 to 2015. Real-time surveillance of HIV-1 viral diversity and the phylodynamics of epidemic clusters would be of great value for monitoring the epidemic, controlling transmission, improving antiretroviral therapy strategies, and designing vaccines.


Subjects
HIV Infections; HIV-1; RNA, Viral/isolation & purification; Adult; Bayes Theorem; China/epidemiology; Communicable Disease Control/methods; Communicable Disease Control/organization & administration; HIV Infections/epidemiology; HIV Infections/virology; HIV-1/genetics; HIV-1/isolation & purification; Humans; Male; Phylogeny; Phylogeography; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; Spatio-Temporal Analysis
11.
Brief Bioinform ; 19(5): 737-753, 2018 09 28.
Article in English | MEDLINE | ID: mdl-28334228

ABSTRACT

DNA methylation is an important epigenetic mechanism that plays a crucial role in cellular regulatory systems. Recent advancements in sequencing technologies now enable us to generate high-throughput methylation data and to measure methylation up to single-base resolution. This wealth of data does not come without challenges, and one of the key challenges in DNA methylation studies is to identify the significant differences in the methylation levels of the base pairs across distinct biological conditions. Several computational methods have been developed to identify differential methylation using bisulfite sequencing data; however, there is no clear consensus among existing approaches. A comprehensive survey of these approaches would be of great benefit to potential users and researchers to get a complete picture of the available resources. In this article, we present a detailed survey of 22 such approaches focusing on their underlying statistical models, primary features, key advantages and major limitations. Importantly, the intrinsic drawbacks of the approaches pointed out in this survey could potentially be addressed by future research.


Subjects
DNA Methylation; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Computational Biology/methods; CpG Islands; Epigenesis, Genetic; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Logistic Models; Markov Chains; Sequence Analysis, DNA/statistics & numerical data; Sulfites
12.
Mol Biol Evol ; 35(1): 242-246, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29029199

ABSTRACT

Phylogenetics has seen a steady increase in data set size and substitution model complexity, which require increasing amounts of computational power to compute likelihoods. This motivates strategies to approximate the likelihood functions for branch length optimization and Bayesian sampling. In this article, we develop an approximation to the 1D likelihood function as parametrized by a single branch length. Our method uses a four-parameter surrogate function abstracted from the simplest phylogenetic likelihood function, the binary symmetric model. We show that it offers a surrogate that can be fit over a variety of branch lengths, that it is applicable to a wide variety of models and trees, and that it can be used effectively as a proposal mechanism for Bayesian sampling. The method is implemented as a stand-alone open-source C library for calling from phylogenetics algorithms; it has proven essential for good performance of our online phylogenetic algorithm sts.
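To make the idea of a surrogate built on the binary symmetric model concrete, here is a minimal sketch. Under a two-state symmetric model, the two ends of a branch of length t agree with probability (1 + e^(-2t))/2 and differ with probability (1 - e^(-2t))/2. The four-parameter form below, with hypothetical parameter names `c`, `m`, `r` and `b`, is one plausible generalization written for illustration; the abstract does not give the exact function, and this is not the authors' C library.

```python
from math import exp, log


def surrogate_loglik(t, c, m, r=2.0, b=0.0):
    """Log-likelihood shaped like the binary symmetric model.

    c and m weight the 'unchanged' and 'changed' terms; r rescales the
    branch length and b shifts it. With r=2 and b=0 this is the plain
    two-state model (up to an additive constant). Requires t + b > 0.
    """
    x = exp(-r * (t + b))
    return c * log((1.0 + x) / 2.0) + m * log((1.0 - x) / 2.0)


def bsm_mle_branch_length(c, m):
    """Closed-form maximizer for the r=2, b=0 case.

    Setting the derivative to zero gives exp(-2t) = (c - m) / (c + m),
    which only has a finite solution when c > m >= 0.
    """
    if not c > m >= 0:
        raise ValueError("need c > m >= 0 for a finite optimum")
    return -0.5 * log((c - m) / (c + m))


if __name__ == "__main__":
    t_hat = bsm_mle_branch_length(90, 10)
    print(t_hat, surrogate_loglik(t_hat, 90, 10))
```

Because the function is cheap to evaluate and has a closed-form optimum in the simplest case, a fitted surrogate of this kind can stand in for the full likelihood during branch length optimization.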


Subjects
Likelihood Functions; Phylogeny; Sequence Analysis, DNA/methods; Algorithms; Bayes Theorem; Evolution, Molecular; Markov Chains; Models, Genetic; Monte Carlo Method; Sequence Analysis, DNA/statistics & numerical data
14.
Expert Rev Mol Diagn ; 17(6): 549-555, 2017 06.
Article in English | MEDLINE | ID: mdl-28402162

ABSTRACT

INTRODUCTION: Comprehensive cancer genomic profiling provides the opportunity to expose the various molecular aberrations potentially driving tumor progression. Consequently, the identity of these genetic drivers can be utilized to match a patient to the most appropriate targeted therapy, thereby increasing the probability of improved clinical outcome. Despite its capability of informing patient care, the adoption of comprehensive cancer genomic profiling in the clinic has not been widespread. The barriers surrounding its universal acceptance are attributed to both physician and patient perspectives. Areas covered: The following report discusses the various obstacles in place, including those related to clinical utility, education, insurance coverage, and clinical trials, which can deter physicians and patients from utilizing genomic profiling for therapeutic decision-making. Expert commentary: The authors review the recent growth and potential of clinical utility studies over the last two years, provide a suggestive framework for educational support, and comment on the use of social media to enhance clinical trial recruitment.


Subjects
Biomarkers, Tumor/genetics; Genetic Testing/statistics & numerical data; Genome, Human; Health Knowledge, Attitudes, Practice; Neoplasms/genetics; Precision Medicine/statistics & numerical data; Biomarkers, Tumor/standards; Costs and Cost Analysis; Genetic Testing/economics; Humans; Neoplasms/diagnosis; Precision Medicine/economics; Precision Medicine/psychology; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/statistics & numerical data
15.
Mol Biol Evol ; 34(8): 2065-2084, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28402447

ABSTRACT

Genetic sequences from pathogens can provide information about infectious disease dynamics that may supplement or replace information from other epidemiological observations. Most currently available methods first estimate phylogenetic trees from sequence data, then estimate a transmission model conditional on these phylogenies. Outside limited classes of models, existing methods are unable to enforce logical consistency between the model of transmission and that underlying the phylogenetic reconstruction. Such conflicts in assumptions can lead to bias in the resulting inferences. Here, we develop a general, statistically efficient, plug-and-play method to jointly estimate both disease transmission and phylogeny using genetic data and, if desired, other epidemiological observations. This method explicitly connects the model of transmission and the model of phylogeny so as to avoid the aforementioned inconsistency. We demonstrate the feasibility of our approach through simulation and apply it to estimate stage-specific infectiousness in a subepidemic of human immunodeficiency virus in Detroit, Michigan. In a supplement, we prove that our approach is a valid sequential Monte Carlo algorithm. While we focus on how these methods may be applied to population-level models of infectious disease, their scope is more general. These methods may be applied in other biological systems where one seeks to infer population dynamics from genetic sequences, and they may also find application for evolutionary models with phenotypic rather than genotypic data.


Subjects
Disease Transmission, Infectious/classification; Sequence Analysis, DNA/methods; Algorithms; Biological Evolution; Disease Transmission, Infectious/statistics & numerical data; Evolution, Molecular; Humans; Monte Carlo Method; Phylogeny; Sequence Analysis, DNA/statistics & numerical data
17.
Genome Med ; 8(1): 82, 2016 08 08.
Article in English | MEDLINE | ID: mdl-27503473

ABSTRACT

BACKGROUND: Reproducibility is receiving increased attention across many domains of science and genomics is no exception. Efforts to identify copy number variations (CNVs) from exome sequence (ES) data have been increasing. Many algorithms have been published to discover CNVs from exomes and a major challenge is the reproducibility in other datasets. Here we test exome CNV calling reproducibility under three conditions: data generated by different sequencing centers; varying sample sizes; and varying capture methodology. METHODS: Four CNV tools were tested: eXome Hidden Markov Model (XHMM), Copy Number Inference From Exome Reads (CoNIFER), EXCAVATOR, and Copy Number Analysis for Targeted Resequencing (CONTRA). To examine the reproducibility, we ran the callers on four datasets, varying sample sizes of N = 10, 30, 75, 100, 300, and data with different capture methodology. We examined the false negative (FN) calls and false positive (FP) calls for potential limitations of the CNV callers. The positive predictive value (PPV) was measured by checking the CNV call concordance against single nucleotide polymorphism array. RESULTS: Using independently generated datasets, we examined the PPV for each dataset and observed a wide range of PPVs. The PPV values were highly data dependent (p < 0.001). For the sample sizes and capture method analyses, we tested the callers in triplicate. Both analyses resulted in wide ranges of PPVs, even for the same test. Interestingly, negative correlations between the PPV and the sample sizes were observed for CoNIFER (ρ = -0.80). Further examination of FN calls showed that 44% of these were missed by all callers and were attributed to the CNV size (46% spanned ≤3 exons). Overlap of the FP calls showed that FPs were unique to each caller, indicative of algorithm dependency.
CONCLUSIONS: Our results demonstrate that further improvements in CNV callers are necessary to improve reproducibility and to include a wider spectrum of CNVs (including the small CNVs). These CNV callers should be evaluated on multiple independent, heterogeneously generated datasets of varying size to increase robustness and utility. These approaches to the evaluation of exome CNV are essential to support wide utility and applicability of CNV discovery in exome studies.
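The positive predictive value used above is the fraction of calls that are confirmed by the reference, PPV = TP / (TP + FP). The sketch below shows one simple way to compute it. The interval format `(chromosome, start, end)` and the rule that any overlap with an array CNV counts as concordant are assumptions made for illustration; the abstract does not state the concordance criterion the authors used.

```python
def overlaps(call, ref):
    """True if two half-open intervals on the same chromosome overlap."""
    return call[0] == ref[0] and call[1] < ref[2] and ref[1] < call[2]


def positive_predictive_value(calls, reference):
    """Fraction of calls that overlap at least one reference CNV.

    calls and reference are lists of (chromosome, start, end) tuples.
    A call with any overlap is counted as a true positive (assumed rule).
    """
    if not calls:
        raise ValueError("PPV is undefined when there are no calls")
    true_positives = sum(
        1 for call in calls if any(overlaps(call, ref) for ref in reference)
    )
    return true_positives / len(calls)


if __name__ == "__main__":
    exome_calls = [("1", 100, 200), ("1", 500, 600), ("2", 100, 200)]
    array_cnvs = [("1", 150, 250), ("2", 300, 400)]
    print(positive_predictive_value(exome_calls, array_cnvs))
```

A stricter criterion, such as a minimum reciprocal overlap, would lower the PPV for the same calls, which is one reason PPVs are hard to compare across studies.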


Subjects
Algorithms; DNA Copy Number Variations; Exome; Sequence Analysis, DNA/statistics & numerical data; Datasets as Topic; High-Throughput Nucleotide Sequencing; Humans; Markov Chains; Polymorphism, Single Nucleotide; Reproducibility of Results; Sample Size
18.
J Bioinform Comput Biol ; 13(2): 1550004, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25491390

ABSTRACT

To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction using a Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
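For orientation, the sketch below computes a 3-periodicity SNR with the 4D binary representation that the abstract uses as its baseline, not with the GCC mapping itself. It assumes a common definition of SNR, the spectral power at frequency N/3 divided by the mean power over the spectrum; the abstract does not state the exact definition, and all function names here are hypothetical.

```python
import cmath


def binary_indicator(seq, base):
    """One channel of the 4D binary mapping: 1 where seq has base, else 0."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]


def power_at(signal, k):
    """Squared magnitude of the discrete Fourier transform at index k."""
    n = len(signal)
    total = sum(
        x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(signal)
    )
    return abs(total) ** 2


def power_spectrum(seq):
    """Power summed over the A, C, G and T channels for k = 1 .. N/2."""
    n = len(seq)
    channels = [binary_indicator(seq, base) for base in "ACGT"]
    return [sum(power_at(ch, k) for ch in channels) for k in range(1, n // 2 + 1)]


def period3_snr(seq):
    """Power at k = N/3 divided by the mean power of the spectrum."""
    n = len(seq)
    if n < 6 or n % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3, at least 6")
    spectrum = power_spectrum(seq)
    peak = spectrum[n // 3 - 1]  # the spectrum list starts at k = 1
    return peak / (sum(spectrum) / len(spectrum))


if __name__ == "__main__":
    print(period3_snr("ATG" * 20))   # strongly 3-periodic
    print(period3_snr("ACGT" * 15))  # period 4, no peak at N/3
```

A short-time variant would apply `period3_snr` to a sliding window along the sequence, so that windows inside coding regions stand out by their higher SNR.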


Subjects
Codon/genetics , DNA/genetics , Sequence Analysis, DNA/statistics & numerical data , Algorithms , Animals , Base Sequence , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Exons , Fourier Analysis , Genetic Code , Humans , Introns , Monte Carlo Method , Open Reading Frames , Signal Processing, Computer-Assisted , Signal-To-Noise Ratio
19.
Comput Biol Chem ; 53 Pt A: 15-25, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25257406

ABSTRACT

We examine in detail the relationship between exponential correlation functions and Markov models in a bacterial genome. Despite the well-known fact that Markov models generate sequences whose correlation functions decay exponentially, simply constructed Markov models based on nearest-neighbor dimers (first-order), trimers (second-order), and so on up to hexamers (fifth-order), which treat the DNA sequence as homogeneous, all fail to predict the value of the exponential decay rate. Even reading-frame-specific Markov models (both first- and fifth-order) could not explain why the exponential decay is very slow. Starting with the in-phase coding DNA sequence (CDS), we investigated correlation within a fixed-codon-position subsequence and in artificially constructed sequences made by packing CDSs with out-of-phase spacers, as well as by altering the CDS length distribution by imposing an upper limit. From these targeted analyses, we conclude that the correlation in the bacterial genomic sequence is mainly due to a mixing of heterogeneous statistics at different codon positions, and that the decay of correlation is due to neighboring CDSs possibly being out of phase. There are also small contributions to the correlation from bases at the same codon position, as well as from non-coding sequences. These results show that the seemingly simple exponential correlation functions in the bacterial genome hide a complexity in correlation structure that is not suited to modeling by a Markov chain on a homogeneous sequence. Other results include the use of the second-largest eigenvalue (in absolute value) to represent the 16 correlation functions and the prediction of a 10-11 base periodicity from the hexamer frequencies.
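To make the "16 correlation functions" concrete, here is a minimal sketch that estimates them from a sequence. It assumes the common definition C_ab(k) = P_ab(k) - P_a P_b, where P_ab(k) is the frequency of base a followed by base b at distance k. The abstract does not state the exact definition used, so both the formula and the function name are assumptions.

```python
from collections import Counter

BASES = "ACGT"

def correlation_functions(seq, max_lag):
    """Estimate C_ab(k) = P_ab(k) - P_a * P_b for all 16 base pairs.

    Returns a dict mapping (a, b) to a list whose entry k-1 holds C_ab(k)
    for k = 1..max_lag. Characters other than A, C, G, T never match a
    pair but still count toward the sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    if not 1 <= max_lag < n:
        raise ValueError("need 1 <= max_lag < len(seq)")
    base_counts = Counter(seq)
    freq = {b: base_counts[b] / n for b in BASES}
    result = {(a, b): [] for a in BASES for b in BASES}
    for k in range(1, max_lag + 1):
        # Count every pair (seq[i], seq[i + k]) once.
        pair_counts = Counter(zip(seq, seq[k:]))
        pairs = n - k
        for a in BASES:
            for b in BASES:
                p_ab = pair_counts[(a, b)] / pairs
                result[(a, b)].append(p_ab - freq[a] * freq[b])
    return result
```

Plotting the logarithm of |C_ab(k)| against k for a real genome would show whether the decay is roughly exponential. Restricting `seq` to every third base gives a fixed-codon-position subsequence of the kind the abstract mentions.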


Subjects
DNA, Bacterial/genetics , Enteropathogenic Escherichia coli/genetics , Genome, Bacterial , Mycobacterium tuberculosis/genetics , Sequence Analysis, DNA/statistics & numerical data , Codon , Markov Chains , Open Reading Frames
20.
Comput Biol Chem ; 53 Pt A: 32-42, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25205032

ABSTRACT

This study explores Markov chain order estimation from symbol sequences of systems exhibiting long memory or long-range correlations (LRC), such as DNA sequences. When the sequence length is limited, an LRC chain can be approximated by a high-order Markov chain. For the order estimation, the parametric significance test of the conditional mutual information IC(m) is applied, which an earlier work found to be suitable for high-order estimation. Here, it is computationally optimized by applying an iterative algorithm that calculates IC(m) at increasing order m, enabling the analysis of long symbol sequences with high Markov chain order or LRC. The simulation study shows that when the true order is reasonably small, the estimated order saturates at the true order as the symbol sequence length increases, whereas when the true order is very large or the chain has LRC, the estimated order increases logarithmically with the symbol sequence length. The order estimation shows a different dependence on DNA sequence length for bacteria, the plant Arabidopsis thaliana and the human chromosome, indicating a different long-memory structure in their DNA.
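The quantity IC(m) can be read as the mutual information between a symbol and the symbol m steps earlier, conditioned on the m-1 symbols in between. The sketch below is a plain plug-in estimate of that quantity from word counts. It is not the iterative algorithm or the parametric significance test described in the abstract: the fixed `threshold` in `estimate_order` is a hypothetical stand-in for that test, and all names are illustrative.

```python
import math
from collections import Counter

def conditional_mutual_information(seq, m):
    """Plug-in estimate (in nats) of I(X_t ; X_{t-m} | X_{t-1}, ..., X_{t-m+1})."""
    if m < 1 or len(seq) <= m:
        raise ValueError("need 1 <= m < len(seq)")
    # Every overlapping word of length m + 1: (x_{t-m}, ..., x_t).
    words = [tuple(seq[i:i + m + 1]) for i in range(len(seq) - m)]
    total = len(words)
    joint = Counter(words)
    prefix = Counter(w[:-1] for w in words)   # x_{t-m} plus the middle block
    suffix = Counter(w[1:] for w in words)    # the middle block plus x_t
    middle = Counter(w[1:-1] for w in words)  # conditioning block only
    info = 0.0
    for w, count in joint.items():
        # The common factor 1/total cancels, so the ratio uses raw counts.
        ratio = (count * middle[w[1:-1]]) / (prefix[w[:-1]] * suffix[w[1:]])
        info += (count / total) * math.log(ratio)
    return info

def estimate_order(seq, max_order, threshold=0.01):
    """Largest m <= max_order whose estimate exceeds a fixed threshold.

    The threshold is an assumption replacing the significance test.
    """
    order = 0
    for m in range(1, max_order + 1):
        if conditional_mutual_information(seq, m) > threshold:
            order = m
    return order
```

For the deterministic repeat `"ACGT" * 50`, the estimate at m = 1 is close to log 4 and vanishes for larger m, so `estimate_order` returns 1. On real data a plug-in estimate is biased upward at high m, which is one reason a significance test is needed rather than a fixed cutoff.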


Subjects
Arabidopsis/genetics , Bacillus subtilis/genetics , Genome , Haemophilus influenzae/genetics , Mycoplasma pneumoniae/genetics , Sequence Analysis, DNA/statistics & numerical data , Algorithms , Chromosome Mapping/statistics & numerical data , Computer Simulation , DNA/genetics , Humans , Markov Chains , Species Specificity