Search | BVS Enfermería Search Portal

1.

dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies.

Yavas, Gokhan; Hong, Huixiao; Xiao, Wenming.

BMC Genomics ; 20(1): 706, 2019 Sep 11.

Article in English | MEDLINE | ID: mdl-31510940

ABSTRACT

BACKGROUND: Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. RESULTS: To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. 
CONCLUSIONS: dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that the dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. dnAQET can help researchers identify the most suitable assembly tools and select high-quality assemblies.
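The abstract names overall reference coverage and redundancy as two inputs to the assembly-level score. As a minimal sketch, assuming contig alignments have already been reduced to half-open (start, end) intervals on a single reference sequence, both quantities can be computed by merging overlapping intervals. The function names and the redundancy definition used here (extra aligned bases per uniquely covered base) are illustrative assumptions, not dnAQET's formulas, and the sketch does not attempt its per-scaffold quality score.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])  # start a new block
    return merged


def coverage_and_redundancy(alignments, reference_length):
    """Hypothetical metrics for contig alignments on one reference sequence.

    coverage:   fraction of the reference covered by at least one alignment.
    redundancy: aligned bases beyond the first copy, per uniquely covered base.
    """
    covered = sum(end - start for start, end in merge_intervals(alignments))
    aligned = sum(end - start for start, end in alignments)
    coverage = covered / reference_length
    redundancy = (aligned - covered) / covered if covered else 0.0
    return coverage, redundancy


# Three contig alignments, two of which overlap by 50 bp (made-up coordinates).
print(coverage_and_redundancy([(0, 100), (50, 150), (300, 400)], 1000))  # (0.25, 0.2)
```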

Subject(s)

Genomics/methods , Benchmarking , Contig Mapping , Software

2.

DB2: a probabilistic approach for accurate detection of tandem duplication breakpoints using paired-end reads.

Yavas, Gökhan; Koyutürk, Mehmet; Gould, Meetha P; McMahon, Sarah; LaFramboise, Thomas.

BMC Genomics ; 15: 175, 2014 Mar 05.

Article in English | MEDLINE | ID: mdl-24597945

ABSTRACT

BACKGROUND: With the advent of paired-end high throughput sequencing, it is now possible to identify various types of structural variation on a genome-wide scale. Although many methods have been proposed for structural variation detection, most do not provide precise boundaries for identified variants. In this paper, we propose a new method, Distribution Based detection of Duplication Boundaries (DB2), for accurate detection of tandem duplication breakpoints, an important class of structural variation, with high precision and recall. RESULTS: Our computational experiments on simulated data show that DB2 outperforms state-of-the-art methods in terms of finding breakpoints of tandem duplications, with a higher positive predictive value (precision) in calling the duplications' presence. In particular, DB2's prediction of tandem duplications is correct 99% of the time even for very noisy data, while narrowing down the space of possible breakpoints to within a margin of 15 to 20 bp on average. Most of the existing methods provide boundaries in ranges that extend to hundreds of bases with lower precision values. Our method is also highly robust to varying properties of the sequencing library and to the sizes of the tandem duplications, as shown by its stable precision, recall and mean boundary mismatch performance. We demonstrate our method's efficacy using both simulated paired-end reads and those generated from a melanoma sample and two ovarian cancer samples. Newly discovered tandem duplications are validated using PCR and Sanger sequencing. CONCLUSIONS: Our method, DB2, uses discordantly aligned reads, taking into account the distribution of fragment length to predict tandem duplications along with their breakpoints on a donor genome. The proposed method fine-tunes the breakpoint calls by applying a novel probabilistic framework that incorporates the empirical fragment length distribution to score each feasible breakpoint.
DB2 is implemented in the Java programming language and is freely available at http://mendel.gene.cwru.edu/laframboiselab/software.php.
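The idea of scoring feasible breakpoints with an empirical fragment length distribution can be illustrated with a simplified likelihood. This is a sketch under stated assumptions, not DB2's model: each discordant pair is reduced to hypothetical coordinates (f, r), a fixed read length is assumed, fragment lengths never seen among concordant pairs receive a small floor probability, and candidate breakpoints are enumerated on a coarse grid. For a tandem duplication of [start, end), a junction-spanning fragment runs from f to the end of the segment and then from its start to r, so its implied length is (end - f) + (r - start).

```python
import math
from collections import Counter

NEG_INF = float("-inf")


def empirical_pmf(concordant_lengths):
    """Empirical fragment-length distribution estimated from concordant read pairs."""
    counts = Counter(concordant_lengths)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}


def score_candidate(pairs, start, end, pmf, read_len, floor=1e-9):
    """Log-likelihood of discordant pairs under a tandem duplication of [start, end).

    Each pair is (f, r): the forward read aligns to [f, f + read_len) near the end of
    the segment and the reverse read aligns to [r - read_len, r) near its start.
    """
    total = 0.0
    for f, r in pairs:
        # Both reads must lie inside the duplicated segment for the pair to span the junction.
        if f < start or f + read_len > end or r - read_len < start or r > end:
            return NEG_INF
        implied = (end - f) + (r - start)
        total += math.log(pmf.get(implied, floor))
    return total


def best_candidates(pairs, starts, ends, pmf, read_len):
    """Return every (start, end) on the grid that attains the maximal score."""
    scored = {(s, e): score_candidate(pairs, s, e, pmf, read_len)
              for s in starts for e in ends if s < e}
    best = max(scored.values())
    return sorted(c for c, v in scored.items() if v == best)


# Made-up data: concordant fragment lengths and three discordant (f, r) pairs.
pmf = empirical_pmf([290, 295, 300, 300, 305, 310])
pairs = [(1850, 1150), (1900, 1200), (1800, 1105)]
best = best_candidates(pairs, range(950, 1101, 10), range(1950, 2101, 10), pmf, read_len=50)
print(best[0], best[-1])  # (950, 1950) (1050, 2050)
```

In this toy run the likelihood fixes the duplication length (every top candidate has end - start = 1000), while the read positions bound where that segment can sit; pooling more pairs would narrow the window further.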

Subject(s)

Algorithms , DNA Breaks , Gene Duplication , Genome , Base Pair Mismatch , Female , Gene Library , High-Throughput Nucleotide Sequencing , Humans , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology , Sequence Analysis, DNA , Software , fms-Like Tyrosine Kinase 3/genetics

3.

Machine Learning Models for Predicting Liver Toxicity.

Liu, Jie; Guo, Wenjing; Sakkiah, Sugunadevi; Ji, Zuowei; Yavas, Gokhan; Zou, Wen; Chen, Minjun; Tong, Weida; Patterson, Tucker A; Hong, Huixiao.

Methods Mol Biol ; 2425: 393-415, 2022.

Article in English | MEDLINE | ID: mdl-35188640

ABSTRACT

Liver toxicity is a major adverse drug reaction that accounts for drug failure in clinical trials and withdrawal from the market. Therefore, predicting potential liver toxicity at an early stage in drug discovery is crucial to reduce costs and the potential for drug failure. However, current in vivo animal toxicity testing is very expensive and time consuming. As an alternative approach, various machine learning models have been developed to predict potential liver toxicity in humans. This chapter reviews current advances in the development and application of machine learning models for prediction of potential liver toxicity in humans and discusses possible improvements to liver toxicity prediction.

Subject(s)

Drug-Related Side Effects and Adverse Reactions , Hepatitis , Animals , Drug Discovery , Humans , Machine Learning

4.

Assessing reproducibility of inherited variants detected with short-read whole genome sequencing.

Pan, Bohu; Ren, Luyao; Onuchic, Vitor; Guan, Meijian; Kusko, Rebecca; Bruinsma, Steve; Trigg, Len; Scherer, Andreas; Ning, Baitang; Zhang, Chaoyang; Glidewell-Kenney, Christine; Xiao, Chunlin; Donaldson, Eric; Sedlazeck, Fritz J; Schroth, Gary; Yavas, Gokhan; Grunenwald, Haiying; Chen, Haodong; Meinholz, Heather; Meehan, Joe; Wang, Jing; Yang, Jingcheng; Foox, Jonathan; Shang, Jun; Miclaus, Kelci; Dong, Lianhua; Shi, Leming; Mohiyuddin, Marghoob; Pirooznia, Mehdi; Gong, Ping; Golshani, Rooz; Wolfinger, Russ; Lababidi, Samir; Sahraeian, Sayed Mohammad Ebrahim; Sherry, Steve; Han, Tao; Chen, Tao; Shi, Tieliu; Hou, Wanwan; Ge, Weigong; Zou, Wen; Guo, Wenjing; Bao, Wenjun; Xiao, Wenzhong; Fan, Xiaohui; Gondo, Yoichi; Yu, Ying; Zhao, Yongmei; Su, Zhenqiang; Liu, Zhichao.

Genome Biol ; 23(1): 2, 2022 01 03.

Article in English | MEDLINE | ID: mdl-34980216

ABSTRACT

BACKGROUND: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. RESULTS: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. CONCLUSIONS: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.
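The abstract does not state which reproducibility measure the study used. As a minimal sketch, assuming variants are keyed by (chromosome, position, ref, alt) and reproducibility is summarized as the mean pairwise Jaccard index across replicate call sets, SNV and indel reproducibility can be compared as follows. The helper names and the toy calls are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of the union of two call sets that both sets contain."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(replicates):
    """Mean Jaccard index over all pairs of replicate call sets (needs at least two)."""
    pairs = list(combinations(replicates, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def by_type(calls):
    """Split (chrom, pos, ref, alt) calls into SNVs (1 bp to 1 bp) and indels (length change)."""
    snvs = {v for v in calls if len(v[2]) == 1 and len(v[3]) == 1}
    indels = {v for v in calls if len(v[2]) != len(v[3])}
    return snvs, indels


# Three hypothetical replicate runs: identical SNVs, one discordant indel call.
r1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "AT", "A")}
r2 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 51, "T", "TG")}
r3 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "AT", "A")}
replicates = [r1, r2, r3]
snv_sets, indel_sets = zip(*(by_type(r) for r in replicates))
print(mean_pairwise_jaccard(snv_sets), mean_pairwise_jaccard(indel_sets))
```

Exact-key matching is a simplification: the same indel can be written at different positions or with different padding, so real comparisons usually normalize variant representation first.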

Subject(s)

Genome, Human , Polymorphism, Single Nucleotide , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Reproducibility of Results , Whole Genome Sequencing

5.

Informing selection of drugs for COVID-19 treatment through adverse events analysis.

Guo, Wenjing; Pan, Bohu; Sakkiah, Sugunadevi; Ji, Zuowei; Yavas, Gokhan; Lu, Yanhui; Komatsu, Takashi E; Lal-Nag, Madhu; Tong, Weida; Patterson, Tucker A; Hong, Huixiao.

Sci Rep ; 11(1): 14022, 2021 07 07.

Article in English | MEDLINE | ID: mdl-34234253

ABSTRACT

Coronavirus disease 2019 (COVID-19) is an ongoing pandemic and there is an urgent need for safe and effective drugs for COVID-19 treatment. Since developing a new drug is time-consuming, many approved or investigational drugs have been repurposed for COVID-19 treatment in clinical trials. Therefore, selection of safe drugs for COVID-19 patients is vital for combating this pandemic. Our goal was to evaluate the safety concerns of drugs by analyzing adverse events reported in post-market surveillance. We collected 296 drugs that have been evaluated in clinical trials for COVID-19 and identified 28,597,464 associated adverse events at the system organ class (SOC) level in the FDA Adverse Event Reporting System (FAERS). We calculated Z-scores of SOCs that statistically quantify the relative frequency of adverse events of drugs in FAERS to quantitatively measure safety concerns for the drugs. Analyzing the Z-scores revealed that these drugs are associated with different significantly frequent adverse events. Our results suggest that this safety concern metric may serve as a tool to inform selection of drugs with favorable safety profiles for COVID-19 patients in clinical practice. Caution is advised when administering drugs with high Z-scores to patients who are vulnerable to associated adverse events.
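The abstract does not give the Z-score formula. One common way to express over-representation as a Z-score, assumed here, is a one-sample binomial proportion test: compare the share of a drug's reports that fall in a given SOC with that SOC's share across the whole database. The function names and all counts below are hypothetical and only illustrate the arithmetic.

```python
import math


def soc_z_score(drug_soc, drug_total, all_soc, all_total):
    """Z-score for over-representation of one system organ class (SOC) among a drug's reports.

    drug_soc / drug_total: reports for the drug in this SOC / all reports for the drug.
    all_soc / all_total:   reports in this SOC / all reports in the database (background).
    Uses the normal approximation to the binomial, so counts should not be tiny.
    """
    p0 = all_soc / all_total                    # background proportion of the SOC
    expected = drug_total * p0                  # expected count at the background rate
    sd = math.sqrt(drug_total * p0 * (1 - p0))  # binomial standard deviation
    return (drug_soc - expected) / sd


def rank_socs(drug_counts, background_counts):
    """Return (SOC, z) pairs sorted from most to least over-represented."""
    drug_total = sum(drug_counts.values())
    all_total = sum(background_counts.values())
    scores = {soc: soc_z_score(drug_counts.get(soc, 0), drug_total, n, all_total)
              for soc, n in background_counts.items()
              if 0 < n < all_total}  # skip SOCs where the variance would be zero
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up counts for one drug against a made-up background.
background = {"Cardiac disorders": 1000, "Hepatobiliary disorders": 500,
              "Nervous system disorders": 3500}
drug = {"Cardiac disorders": 60, "Hepatobiliary disorders": 10,
        "Nervous system disorders": 130}
ranked = rank_socs(drug, background)
print(ranked[0][0])  # Cardiac disorders
```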

Subject(s)

Adverse Drug Reaction Reporting Systems , COVID-19 Drug Treatment , Clinical Trials as Topic , Databases, Factual , Humans , Product Surveillance, Postmarketing , Safety

6.

Elucidating Interactions Between SARS-CoV-2 Trimeric Spike Protein and ACE2 Using Homology Modeling and Molecular Dynamics Simulations.

Sakkiah, Sugunadevi; Guo, Wenjing; Pan, Bohu; Ji, Zuowei; Yavas, Gokhan; Azevedo, Marli; Hawes, Jessica; Patterson, Tucker A; Hong, Huixiao.

Front Chem ; 8: 622632, 2020.

Article in English | MEDLINE | ID: mdl-33469527

ABSTRACT

Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19). As of October 21, 2020, more than 41.4 million confirmed cases and 1.1 million deaths have been reported. Thus, it is immensely important to develop drugs and vaccines to combat COVID-19. The spike protein present on the outer surface of the virion plays a major role in viral infection by binding to receptor proteins present on the outer membrane of host cells, triggering membrane fusion and internalization, which enables release of viral ssRNA into the host cell. Understanding the interactions between the SARS-CoV-2 trimeric spike protein and its host cell receptor protein, angiotensin converting enzyme 2 (ACE2), is important for developing drugs and vaccines to prevent and treat COVID-19. Several crystal structures of partial and mutant SARS-CoV-2 spike proteins have been reported; however, an atomistic structure of the wild-type SARS-CoV-2 trimeric spike protein complexed with ACE2 is not yet available. Therefore, in our study, homology modeling was used to build the trimeric form of the spike protein complexed with human ACE2, followed by all-atom molecular dynamics simulations to elucidate interactions at the interface between the spike protein and ACE2. Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) and in silico alanine scanning were employed to characterize the interacting residues at the interface. Twenty interacting residues in the spike protein were identified that are likely to be responsible for tightly binding to ACE2, of which five residues (Val445, Thr478, Gly485, Phe490, and Ser494) were not reported in the crystal structure of the truncated spike protein receptor binding domain (RBD) complexed with ACE2. These data indicate that the interactions between ACE2 and the tertiary structure of the full-length spike protein trimer are different from those between ACE2 and the truncated monomer of the spike protein RBD. 
These findings could facilitate the development of drugs and vaccines to prevent SARS-CoV-2 infection and combat COVID-19.

7.

PathCase: pathways database system.

Elliott, Brendan; Kirac, Mustafa; Cakmak, Ali; Yavas, Gokhan; Mayes, Stephen; Cheng, En; Wang, Yuan; Gupta, Chirag; Ozsoyoglu, Gultekin; Meral Ozsoyoglu, Zehra.

Bioinformatics ; 24(21): 2526-33, 2008 Nov 01.

Article in English | MEDLINE | ID: mdl-18728044

ABSTRACT

MOTIVATION: As the blueprints of cellular actions, biological pathways characterize the roles of genomic entities in various cellular mechanisms, and as such, their availability, manipulation and queryability over the web are important to facilitate ongoing biological research. RESULTS: In this article, we present the new features of PathCase, a system to store, query, visualize and analyze metabolic pathways at different levels of genetic, molecular, biochemical and organismal detail. The new features include: (i) a web-based system with a new architecture, comprising a server side and a client side, that promotes scalability and flexible, easy adaptation of different pathway databases, (ii) an interactive client-side visualization tool for metabolic pathways, with powerful visualization capabilities and integrated gene and organism viewers, (iii) two distinct querying capabilities: an advanced querying interface for computer-savvy users, and built-in queries for ease of use that can be issued directly from pathway visualizations, and (iv) a pathway functionality analysis tool. PathCase is now available for three different datasets, namely, KEGG pathways data, sample pathways from the literature and BioCyc pathways for humans. AVAILABILITY: Available online at http://nashua.case.edu/pathways

Subject(s)

Databases, Factual , Metabolic Networks and Pathways , Software , Computer Simulation , User-Computer Interface

8.

Persistent Organic Pollutants in Food: Contamination Sources, Health Effects and Detection Methods.

Guo, Wenjing; Pan, Bohu; Sakkiah, Sugunadevi; Yavas, Gokhan; Ge, Weigong; Zou, Wen; Tong, Weida; Hong, Huixiao.

Int J Environ Res Public Health ; 16(22)2019 11 08.

Article in English | MEDLINE | ID: mdl-31717330

ABSTRACT

Persistent organic pollutants (POPs) present in foods have been a major concern for food safety due to their persistence and toxic effects. To ensure food safety and protect human health from POPs, it is critical to achieve a better understanding of POP pathways into food and develop strategies to reduce human exposure. POPs can be present in food at the raw stage, having been transferred from the environment, or can be artificially introduced during food preparation. Exposure to these pollutants may cause various health problems such as endocrine disruption, cardiovascular diseases, cancers, diabetes, birth defects, and dysfunctional immune and reproductive systems. This review describes potential sources of POP food contamination, analytical approaches to measure POP levels in food and efforts to control food contamination with POPs.

Subject(s)

Environmental Monitoring , Environmental Pollutants/chemistry , Food Contamination/analysis , Food Safety , Humans

9.

Direct comparison of performance of single nucleotide variant calling in human genome with alignment-based and assembly-based approaches.

Wu, Leihong; Yavas, Gokhan; Hong, Huixiao; Tong, Weida; Xiao, Wenming.

Sci Rep ; 7(1): 10963, 2017 09 08.

Article in English | MEDLINE | ID: mdl-28887485

ABSTRACT

Complementary to reference-based variant detection, recent studies revealed that many novel variants could be detected with de novo assembled genomes. To evaluate the effect of read coverage and the accuracy of assembly-based variant calling, we simulated short reads containing more than 3 million single nucleotide variants (SNVs) from the whole human genome and compared the efficiency of SNV calling between the assembly-based and alignment-based calling approaches. We assessed the quality of the assembled contigs and found that a minimum of 30X coverage of short reads was needed to ensure reliable SNV calling and to generate assembled contigs with good coverage of the genome and genes. In addition, we observed that the assembly-based approach had much lower recall and precision than the alignment-based approach, which recovered 99% of imputed SNVs. We observed similar results with experimental reads for NA24385, an individual whose germline variants are well characterized. Although the assembly-based approach adds value for SNV detection, it carries a high risk of false discovery of novel SNVs. Further improvement of de novo assembly algorithms is needed to ensure good genome completeness with resolved haplotypes and high fidelity of assembled sequences.
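Recall and precision here are the usual set-based measures against a truth set. A minimal sketch, assuming each SNV is keyed by (chromosome, position, ref, alt) and that a call counts as correct only on an exact key match; dedicated benchmarking tools also normalize variant representation, which this sketch skips. The calls below are made up.

```python
def precision_recall(calls, truth):
    """Set-based precision and recall of variant calls against a truth set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0  # fraction of calls that are real
    recall = true_positives / len(truth) if truth else 0.0     # fraction of real SNVs recovered
    return precision, recall


truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 300, "G", "A"), ("chr2", 400, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr2", 300, "G", "A"),
         ("chr3", 500, "A", "C")}  # the last call is a false positive
print(precision_recall(calls, truth))
```

A pipeline with low recall misses real variants, while low precision means extra calls, such as apparently novel SNVs, that are not in the truth set.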

Subject(s)

Contig Mapping/methods , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Contig Mapping/standards , Genome-Wide Association Study/standards , Humans , Sequence Alignment/standards , Sequence Analysis, DNA/standards

10.

Challenges, Solutions, and Quality Metrics of Personal Genome Assembly in Advancing Precision Medicine.

Xiao, Wenming; Wu, Leihong; Yavas, Gokhan; Simonyan, Vahan; Ning, Baitang; Hong, Huixiao.

Pharmaceutics ; 8(2)2016 Apr 22.

Article in English | MEDLINE | ID: mdl-27110816

ABSTRACT

Even though each of us shares more than 99% of the DNA sequence in our genome, there are millions of small regions that differ in sequence or structure between individuals, giving us different characteristics of appearance or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived from the DNA of multiple individuals. As a result, the reference genome is incomplete and may misrepresent the sequence variants of the general population. A more reliable solution is to compare the sequences of diseased tissue with the individual's own genome sequence derived from normal tissue. As the price of sequencing a human genome has dropped dramatically to around $1000, documenting the personal genome of every individual has become a promising prospect. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, to date, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging "third generation sequencing" technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructures. We recommend hybrid assembly combining long and short reads as a promising way to generate good-quality human genome assemblies, and we specify parameters for the quality assessment of assembly outcomes. We provide a perspective on the benefits of using personal genomes as references, along with suggestions for obtaining a high-quality personal genome.
Finally, we discuss the use of personal genomes in aiding vaccine design and development, monitoring host immune responses, tailoring drug therapy and detecting tumors. We believe precision medicine would benefit greatly from bioinformatics solutions, particularly for personal genome assembly.

11.

Corrigendum for Elliott,B. et al., 'PathCase pathways database system', Bioinformatics, Nov. 2008, 24(21), pp. 2526-2533.

Elliott, Brendan; Kirac, Mustafa; Cakmak, Ali; Yavas, Gokhan; Mayes, Stephen; Cheng, En; Wang, Yuan; Gupta, Chirag; Ozsoyoglu, Gultekin; Ozsoyoglu, Zehra Meral.

Bioinformatics ; 25(20): 2773, 2009 Oct 15.

Article in English | MEDLINE | ID: mdl-19713414

Subject(s)

Computational Biology/methods , Databases, Factual , Metabolic Networks and Pathways , Software

12.

COKGEN: a software for the identification of rare copy number variation from SNP microarrays.

Yavas, Gökhan; Koyutürk, Mehmet; Ozsoyoglu, Meral; Gould, Meetha P; Laframboise, Thomas.

Pac Symp Biocomput ; : 371-82, 2010.

Article in English | MEDLINE | ID: mdl-19908389

ABSTRACT

Until fairly recently, it was believed that essentially all human cells harbor two copies of each locus in the autosomal genome. However, studies have now shown that there are segments of the genome that are polymorphic with regard to genomic copy number. These copy number variations (CNVs) have a role in various diseases such as Alzheimer disease, Crohn's disease, autism and schizophrenia. In the effort to scan the entire genome for these gains and losses of DNA, single nucleotide polymorphism (SNP) arrays have emerged as an important tool. As such, CNV identification from SNP array data is attracting considerable attention as an algorithmic problem, and many methods have been published over the last few years. However, many of the existing model-based methods train their models based on common variations and are therefore less successful in the identification of rare CNVs, detection of which may be very important in personalized genomics applications. In this paper, we formulate CNV identification explicitly as an optimization problem with an objective function that is characterized by several adjustable parameters. These parameters can be configured based on the characteristics of the experimental platform and target application, so that the solution to the optimization problem is the most accurate set of CNV calls. Our method, termed COKGEN, efficiently solves this problem using a variant of the well-known heuristic simulated annealing. We apply COKGEN to data from hundreds of samples, and demonstrate its ability to detect known CNVs at a high level of sensitivity without sacrificing specificity, not only for common but also rare CNVs. Furthermore, we show that it performs better than other publicly-available methods. The configurability of COKGEN, its computational efficiency, and its accuracy in calling rare CNVs make it particularly useful for personalized genomics applications. 
COKGEN is implemented as an R package and is freely available at http://mendel.gene.cwru.edu/laframboiselab/software.php.
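The abstract frames CNV calling as optimizing an objective with adjustable parameters, solved by a variant of simulated annealing. The sketch below shows plain simulated annealing on a toy version of that idea and is not COKGEN's objective or its annealing variant: each probe gets a copy-number state, a hypothetical objective adds the squared deviation of each probe from its state's assumed signal level to a penalty for every state change, and `switch_penalty`, `t0` and `cooling` stand in for tunable parameters. The levels and signal in the usage lines are invented.

```python
import math
import random


def objective(states, signal, levels, switch_penalty):
    """Hypothetical objective: squared fit error plus a penalty per adjacent state change."""
    fit = sum((x - levels[s]) ** 2 for x, s in zip(signal, states))
    switches = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return fit + switch_penalty * switches


def anneal(signal, levels, switch_penalty=1.0, t0=0.5, cooling=0.99, steps=5000, seed=0):
    """Simulated annealing over per-probe states using single-site moves."""
    rng = random.Random(seed)
    n_states = len(levels)
    # Start from the nearest level for each probe, ignoring the switch penalty.
    states = [min(range(n_states), key=lambda s: abs(x - levels[s])) for x in signal]
    current = objective(states, signal, levels, switch_penalty)
    best, best_cost = states[:], current
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(signal))
        proposal = rng.randrange(n_states)
        if proposal != states[i]:
            old = states[i]
            states[i] = proposal
            cost = objective(states, signal, levels, switch_penalty)
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if cost <= current or rng.random() < math.exp((current - cost) / t):
                current = cost
                if cost < best_cost:
                    best, best_cost = states[:], cost
            else:
                states[i] = old  # reject: undo the move
        t = max(t * cooling, 1e-9)  # geometric cooling, floored to avoid division by zero
    return best, best_cost


# Hypothetical levels for states 0 (loss), 1 (normal) and 2 (gain).
levels = [-1.0, 0.0, 1.0]
# A six-probe gain plus one isolated noisy probe at 0.6.
signal = [0.0] * 10 + [1.0] * 6 + [0.0] * 5 + [0.6] + [0.0] * 5
states, cost = anneal(signal, levels, switch_penalty=1.0)
print(states, round(cost, 2))
```

With `switch_penalty=1.0`, calling the isolated probe a gain costs two extra state changes, so the search keeps it normal while retaining the six-probe gain; lowering the penalty makes short, isolated calls cheaper, trading specificity for sensitivity to small events.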

Subject(s)

DNA Copy Number Variations , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Polymorphism, Single Nucleotide , Software , Algorithms , Computational Biology , Genetic Markers , Genome, Human , Genomics , Humans

13.

An optimization framework for unsupervised identification of rare copy number variation from SNP array data.

Yavas, Gökhan; Koyutürk, Mehmet; Ozsoyoglu, Meral; Gould, Meetha P; LaFramboise, Thomas.

Genome Biol ; 10(10): R119, 2009.

Article in English | MEDLINE | ID: mdl-19849861

ABSTRACT

Copy number variants (CNVs) have roles in human disease, and DNA microarrays are important tools for identifying them. In this paper, we frame CNV identification as an objective function optimization problem. We apply our method to data from hundreds of samples, and demonstrate its ability to detect CNVs at a high level of sensitivity without sacrificing specificity. Its performance compares favorably with currently available methods and it reveals previously unreported gains and losses.

Subject(s)

DNA Copy Number Variations/genetics , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide/genetics , Algorithms , Chromosomes, Human, Pair 12/genetics , Ethnicity/genetics , Humans , Reproducibility of Results , Software , Time Factors
