2.
Nat Commun ; 11(1): 4469, 2020 09 08.
Article in English | MEDLINE | ID: mdl-32901013

ABSTRACT

Dissecting tumor heterogeneity is key to understanding the complex mechanisms underlying drug resistance in cancers. The rich literature of pioneering studies on tumor heterogeneity analysis spurred a recent community-wide benchmark study that compares diverse modeling algorithms. Here we present FastClone, one of the top-performing algorithms for accuracy in this benchmark. FastClone improves over existing methods by allowing the deconvolution of subclones that have independent copy number variation events within the same chromosome regions. We characterize the behavior of FastClone in identifying subclones using stage III colon cancer primary tumor samples as well as simulated data. It achieves approximately 100-fold acceleration in computation for both simulated and patient data. The efficiency of FastClone should allow its application to large-scale and clinical data, and facilitate personalized medicine in cancers.


Subjects
Algorithms , DNA Copy Number Variations , Neoplasms/genetics , Colonic Neoplasms/genetics , Colonic Neoplasms/pathology , Computational Biology/methods , Computer Simulation , DNA, Neoplasm/genetics , Drug Resistance, Neoplasm/genetics , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Neoplasms/drug therapy , Neoplasms/pathology , Phylogeny , Precision Medicine , Sequence Analysis, DNA
3.
BMC Bioinformatics ; 21(1): 401, 2020 Sep 10.
Article in English | MEDLINE | ID: mdl-32912137

ABSTRACT

BACKGROUND: As an important class of non-coding RNA, microRNAs (miRNAs) play a significant role in a series of life processes and are closely associated with a variety of human diseases. Hence, identifying potential miRNA-disease associations can make great contributions to the research and treatment of human diseases. However, to our knowledge, many existing computational methods use only a single type of known association information between miRNAs and diseases to predict their potential associations, without considering their interactions or associations with other types of molecules. RESULTS: In this paper, we propose a network embedding-based method for predicting miRNA-disease associations by preserving behavior and attribute information. First, a heterogeneous network is constructed by integrating known associations among miRNAs, proteins and diseases, and the network representation method Learning Graph Representations with Global Structural Information (GraRep) is applied to learn the behavior information of miRNAs and diseases in the network. Then, the behavior information of miRNAs and diseases is combined with their attribute information to represent miRNA-disease association pairs. Finally, the prediction model is built with the Random Forest algorithm. Under five-fold cross-validation, the proposed NEMPD model obtained an average prediction accuracy of 85.41%, with 80.96% sensitivity and an AUC of 91.58%. The performance of NEMPD was further validated by case studies: among the top 50 predicted miRNAs related to breast, colon and lung neoplasms, 48, 47 and 47, respectively, were confirmed by two other databases. CONCLUSIONS: The proposed NEMPD model performs well in predicting potential associations between miRNAs and diseases and shows great potential for future miRNA-disease association prediction.
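The evaluation protocol described above (five-fold cross-validation reporting accuracy, sensitivity and AUC) can be sketched as follows. This is a minimal, illustrative Python sketch only: the function names are hypothetical, it does not reproduce the NEMPD features or its Random Forest model, and it assumes binary labels where 1 marks a known miRNA-disease association.

```python
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and split them into k (train, test) folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        # Every index not in the current test fold goes to training.
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits


def accuracy_sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy = correct / total; sensitivity = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true), tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, accuracy_sensitivity([1, 1, 0, 0], [1, 0, 0, 0]) returns (0.75, 0.5). The pairwise AUC definition used here is equivalent to the normalised Mann-Whitney U statistic; per-fold metrics would then be averaged across the five folds.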


Subjects
Breast Neoplasms/diagnosis , Colonic Neoplasms/diagnosis , Computational Biology/methods , Lung Neoplasms/diagnosis , MicroRNAs/metabolism , Algorithms , Area Under Curve , Breast Neoplasms/genetics , Colonic Neoplasms/genetics , Female , Humans , Lung Neoplasms/genetics , MicroRNAs/genetics , ROC Curve
4.
BMC Med Res Methodol ; 20(1): 235, 2020 09 21.
Article in English | MEDLINE | ID: mdl-32958001

ABSTRACT

BACKGROUND: Data analysis and visualization are essential tools for exploring and communicating findings in medical research, especially in epidemiological surveillance. RESULTS: Data on diagnosed COVID-19 cases and mortality from 1 January 2020 onwards are collected automatically from the European Centre for Disease Prevention and Control (ECDC). We have developed a Shiny application for visualizing and analyzing several indicators to follow the SARS-CoV-2 epidemic using ECDC data. It is a country-specific tool for basic epidemiological surveillance, delivered in an interactive and user-friendly manner. The available analyses cover time trends and projections, attack rate, population fatality rate, case fatality rate, and basic reproduction number. CONCLUSIONS: The COVID19-World online web application systematically produces daily updated, country-specific visualizations and analyses of the SARS-CoV-2 epidemic worldwide. The application may help improve understanding of the SARS-CoV-2 epidemic worldwide.
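The crude indicators listed above reduce to simple ratios of cumulative counts. A minimal Python sketch follows, assuming that daily new cases and deaths are accumulated from the start of the series and that population rates are expressed per 100,000 inhabitants. These are common conventions and may differ in detail from the definitions used in COVID19-World; the basic reproduction number, which needs a transmission model, is not shown, and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Sequence


@dataclass
class CountrySnapshot:
    """Cumulative counts for one country on one date (hypothetical container)."""
    population: int
    cumulative_cases: int
    cumulative_deaths: int


def cumulative(daily_counts: Sequence[int]) -> List[int]:
    """Turn daily new counts into running totals."""
    return list(accumulate(daily_counts))


def attack_rate(s: CountrySnapshot, per: int = 100_000) -> float:
    """Cumulative confirmed cases per `per` inhabitants."""
    return s.cumulative_cases / s.population * per


def population_fatality_rate(s: CountrySnapshot, per: int = 100_000) -> float:
    """Cumulative deaths per `per` inhabitants."""
    return s.cumulative_deaths / s.population * per


def case_fatality_rate(s: CountrySnapshot) -> float:
    """Crude CFR: cumulative deaths / cumulative confirmed cases, as a percentage."""
    if s.cumulative_cases == 0:
        return 0.0
    return 100.0 * s.cumulative_deaths / s.cumulative_cases
```

For a hypothetical country of 1,000,000 people with 5,000 cumulative cases and 100 deaths, this gives an attack rate of 500 per 100,000, a population fatality rate of 10 per 100,000 and a crude CFR of 2%.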


Subjects
Betacoronavirus/isolation & purification , Computational Biology/statistics & numerical data , Coronavirus Infections/epidemiology , Data Visualization , Pandemics , Pneumonia, Viral/epidemiology , Algorithms , Betacoronavirus/physiology , Computational Biology/methods , Coronavirus Infections/transmission , Coronavirus Infections/virology , Europe/epidemiology , Global Health/statistics & numerical data , Humans , Incidence , Internet , Pneumonia, Viral/transmission , Pneumonia, Viral/virology , Population Surveillance/methods
5.
PLoS Comput Biol ; 16(9): e1007836, 2020 09.
Article in English | MEDLINE | ID: mdl-32960900

ABSTRACT

Early warning signals (EWS) identify systems approaching a critical transition, where the system undergoes a sudden change in state. For example, monitoring changes in variance or autocorrelation offers a computationally inexpensive method that can be used in real time to assess when an infectious disease transitions to elimination. EWS have promising potential not only for monitoring infectious diseases but also for informing control policies to aid disease elimination. Previously, potential EWS have been identified for prevalence data; however, the prevalence of a disease is often not known directly. In this work we identify EWS for incidence data, the standard data type collected by the Centers for Disease Control and Prevention (CDC) or World Health Organization (WHO). We show, through several examples, that EWS calculated on simulated incidence time series data exhibit vastly different behaviours from those previously studied on prevalence data. In particular, the variance displays a decreasing trend on the approach to disease elimination, contrary to that expected from critical slowing down theory; this could lead to unreliable indicators of elimination when calculated on real-world data. We derive analytical predictions which can be generalised for many epidemiological systems, and we support our theory with simulated studies of disease incidence. Additionally, we explore EWS calculated on the rate of incidence over time, a property which can be extracted directly from incidence data. We find that although incidence might not exhibit typical critical slowing down properties before a critical transition, the rate of incidence does, presenting a promising new data type for the application of statistical indicators.
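The variance and lag-1 autocorrelation indicators mentioned above are typically computed over a rolling window of the time series. The Python sketch below is a minimal illustration, assuming a plain list of counts; it omits the detrending and trend statistics (such as Kendall's tau) that are commonly applied in practice, and the function names are hypothetical rather than taken from the paper.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def lag1_autocorrelation(xs: Sequence[float]) -> float:
    """Lag-1 autocorrelation of a window (0.0 if the window is constant)."""
    m = mean(xs)
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(len(xs) - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den if den > 0 else 0.0


def rolling_ews(series: Sequence[float], window: int) -> List[Tuple[float, float]]:
    """(variance, lag-1 autocorrelation) for each trailing window of the series."""
    out = []
    for end in range(window, len(series) + 1):
        w = series[end - window:end]
        out.append((pvariance(w), lag1_autocorrelation(w)))
    return out
```

Under critical slowing down one would expect both indicators to rise on the approach to a transition; the abstract's point is that, for incidence data, the variance can instead decrease.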


Subjects
Communicable Diseases/epidemiology , Computational Biology/methods , Models, Statistical , Public Health Surveillance/methods , Communicable Disease Control , Humans , Incidence , Prevalence
6.
PLoS Comput Biol ; 16(9): e1008146, 2020 09.
Article in English | MEDLINE | ID: mdl-32970679

ABSTRACT

According to the efficient coding hypothesis, sensory systems are adapted to maximize their ability to encode information about the environment. Sensory neurons play a key role in encoding by selectively modulating their firing rate for a subset of all possible stimuli. This pattern of modulation is often summarized via a tuning curve. The optimally efficient distribution of tuning curves has been calculated in a variety of ways for one-dimensional (1-D) stimuli. However, many sensory neurons encode multiple stimulus dimensions simultaneously. It remains unclear how applicable existing models of 1-D tuning curves are for neurons tuned across multiple dimensions. We describe a mathematical generalization that builds on prior work in 1-D to predict optimally efficient multidimensional tuning curves. Our results have implications for interpreting observed properties of neuronal populations. For example, our results suggest that not all tuning curve attributes (such as gain and bandwidth) are equally useful for evaluating the encoding efficiency of a population.


Subjects
Computational Biology/methods , Models, Neurological , Sensory Receptor Cells/physiology , Brain/physiology , Humans
7.
Sci Rep ; 10(1): 15917, 2020 09 28.
Article in English | MEDLINE | ID: mdl-32985513

ABSTRACT

SARS-CoV-2 is the novel coronavirus responsible for the outbreak of COVID-19, a disease that has spread to over 100 countries and, as of 26 July 2020, has infected over 16 million people. Despite the urgent need to find effective therapeutics, research on SARS-CoV-2 has been hampered by a lack of suitable animal models. To facilitate the development of medical approaches and novel treatments, we compared the usage of the ACE2 receptor and of the TMPRSS2 and Furin proteases by the SARS-CoV-2 Spike glycoprotein in humans and in a panel of animal models, i.e. guinea pig, dog, cat, rat, rabbit, ferret, mouse, hamster and macaque. We show that ACE2, but not TMPRSS2 or Furin, has a higher level of sequence variability in the Spike protein interaction surface, which greatly influences the Spike protein binding mode. Using molecular docking simulations, we compared the SARS-CoV and SARS-CoV-2 Spike proteins in complex with the ACE2 receptor and showed that the SARS-CoV-2 Spike glycoprotein is compatible with binding human ACE2 with high specificity. In contrast, TMPRSS2 and Furin are sufficiently similar in the considered hosts not to drive susceptibility differences. Computational analysis of binding modes and protein contacts indicates that macaques, ferrets and hamsters are the most suitable models for the study of inhibitory antibodies and small molecules targeting the interaction of the SARS-CoV-2 Spike protein with ACE2. Since TMPRSS2 and Furin are similar across species, our data also suggest that transgenic animal models expressing human ACE2, such as the hACE2 transgenic mouse, are likely to be useful models for studies investigating viral entry.


Subjects
Betacoronavirus/genetics , Coronavirus Infections/veterinary , Pandemics/veterinary , Peptidyl-Dipeptidase A/metabolism , Pneumonia, Viral/veterinary , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/metabolism , Amino Acid Sequence/genetics , Animals , Cats , Computational Biology/methods , Coronavirus Infections/pathology , Cricetinae , Disease Models, Animal , Dogs , Ferrets , Furin/genetics , Furin/metabolism , Guinea Pigs , Humans , Macaca fascicularis , Mice , Molecular Docking Simulation , Peptidyl-Dipeptidase A/genetics , Pneumonia, Viral/pathology , Rabbits , Rats , Serine Endopeptidases/genetics , Serine Endopeptidases/metabolism
8.
Nat Commun ; 11(1): 4575, 2020 09 11.
Article in English | MEDLINE | ID: mdl-32917868

ABSTRACT

A central issue in drug risk-benefit assessment is identifying frequencies of side effects in humans. Currently, frequencies are experimentally determined in randomised controlled clinical trials. We present a machine learning framework for computationally predicting frequencies of drug side effects. Our matrix decomposition algorithm learns latent signatures of drugs and side effects that are both reproducible and biologically interpretable. We show the usefulness of our approach on 759 structurally and therapeutically diverse drugs and 994 side effects from all human physiological systems. Our approach can be applied to any drug for which a small number of side effect frequencies have been identified, in order to predict the frequencies of further, yet unidentified, side effects. We show that our model is informative of the biology underlying drug activity: individual components of the drug signatures are related to the distinct anatomical categories of the drugs and to the specific drug routes of administration.


Subjects
Drug-Related Side Effects and Adverse Reactions , Machine Learning , Algorithms , Computational Biology/methods , Databases, Pharmaceutical , Humans , Pharmaceutical Preparations/administration & dosage , Probability
9.
Medicine (Baltimore) ; 99(37): e22199, 2020 Sep 11.
Article in English | MEDLINE | ID: mdl-32925795

ABSTRACT

Colorectal cancer (CRC) is the most common malignant gastrointestinal tumor worldwide. Serum exosomal microRNAs (miRNAs) play a critical role in tumor progression and metastasis; however, the underlying molecular mechanisms are poorly understood. The miRNA expression profile (GSE39833) was downloaded from the Gene Expression Omnibus (GEO) database. GEO2R was applied to screen differentially expressed miRNAs (DEmiRNAs) between healthy and CRC serum exosome samples. The target genes of DEmiRNAs were predicted with the starBase v3.0 online tool. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) online tool. The protein-protein interaction (PPI) network was established with the Search Tool for the Retrieval of Interacting Genes (STRING) and visualized using Cytoscape software. The Molecular Complex Detection (MCODE) and cytohubba plug-ins were used to screen hub genes and gene modules. In total, 102 DEmiRNAs were identified, including 67 upregulated and 35 downregulated DEmiRNAs, and 1437 target genes were predicted. GO analysis showed that target genes of upregulated DEmiRNAs were significantly enriched in transcription regulation, protein binding, and ubiquitin protein ligase activity, while target genes of downregulated DEmiRNAs were mainly involved in transcription from RNA polymerase II promoter, SMAD binding, and DNA binding. KEGG pathway enrichment analyses showed that target genes of upregulated DEmiRNAs were significantly enriched in proteoglycans in cancer, microRNAs in cancer, and the phosphatidylinositol-3 kinase/Akt (PI3K-Akt) signaling pathway, while target genes of downregulated DEmiRNAs were mainly enriched in the transforming growth factor-beta (TGF-beta) signaling pathway and proteoglycans in cancer. The genes of the top 3 modules were mainly enriched in ubiquitin-mediated proteolysis, spliceosome, and mRNA surveillance pathway.
According to the cytohubba plug-in, 37 hub genes were selected, and 4 hub genes, phosphoinositide-3-kinase regulatory subunit 1 (PIK3R1), SRC, cell division cycle 42 (CDC42) and E1A binding protein p300 (EP300), were identified by combining 8 ranking methods of cytohubba. The study provides a comprehensive analysis of the regulatory network of exosomal DEmiRNAs and their target genes in CRC, which may help clarify the roles of exosomal miRNAs in the development of CRC. However, these findings require further experimental validation in future studies.
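The hub-gene step described above can be illustrated with a degree-based ranking and a consensus across several rankings. This is a minimal Python sketch under stated assumptions: the network is an undirected list of protein pairs (for example, exported from STRING), degree is only one of the cytohubba scoring methods, and merging methods by intersecting their top-k lists is an assumption about how the eight rankings were combined. Function and gene names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def degree_ranking(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank proteins by number of distinct interaction partners (ties broken by name)."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    return sorted(((n, len(nb)) for n, nb in neighbours.items()),
                  key=lambda item: (-item[1], item[0]))


def consensus_hubs(rankings: List[List[str]], top_k: int) -> Set[str]:
    """Genes that appear in the top_k of every ranking (one ranking per scoring method)."""
    sets = [set(r[:top_k]) for r in rankings]
    return set.intersection(*sets) if sets else set()
```

Duplicate edges are counted once because neighbours are stored as sets, which mirrors treating the PPI network as a simple undirected graph.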


Subjects
Colorectal Neoplasms/genetics , Computational Biology/methods , MicroRNAs/genetics , Databases, Genetic , Gene Expression Profiling , Genes, Neoplasm , Humans , Protein Array Analysis , Protein Interaction Maps
10.
Anticancer Res ; 40(9): 5097-5106, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32878798

ABSTRACT

BACKGROUND/AIM: Accumulating evidence has shown therapeutic effects of herbal medicines on breast cancer, a commonly diagnosed malignancy in women worldwide. However, their underlying mechanisms remain unclear. We aimed to explore the mode of action of a recently developed herbal combination at the system level. MATERIALS AND METHODS: We employed network pharmacology approaches to study the mechanism of a combination of three herbs, Astragalus membranaceus, Angelica gigas and Trichosanthes kirilowii, by investigating active compounds and performing functional enrichment analysis of the interacting targets. RESULTS: Following in silico pharmacokinetic evaluation, ten active ingredients were found to interact with fifty-six breast cancer-associated therapeutic targets. Functional enrichment analysis revealed that the TNF, estrogen, PI3K-Akt and MAPK signaling pathways were involved in the tumorigenesis and development of breast cancer. The pharmacological mechanisms might be associated with effects on cell proliferation, cell cycle processes and apoptosis. CONCLUSION: The present study provides novel insights into the system-level pharmacological mechanisms underlying a herbal combination used in breast cancer therapy.


Subjects
Antineoplastic Agents, Phytogenic/pharmacology , Drugs, Chinese Herbal/pharmacology , Neural Networks, Computer , Systems Biology/methods , Technology, Pharmaceutical/methods , Antineoplastic Agents, Phytogenic/chemistry , Astragalus propinquus , Breast Neoplasms , Cell Line, Tumor , Computational Biology/methods , Drug Screening Assays, Antitumor , Drugs, Chinese Herbal/chemistry , Female , Humans , Medicine, Chinese Traditional , Workflow
11.
PLoS Comput Biol ; 16(9): e1007758, 2020 09.
Article in English | MEDLINE | ID: mdl-32881897

ABSTRACT

With the ever-increasing quality and quantity of imaging data in biomedical research comes the demand for computational methodologies that enable efficient and reliable automated extraction of the quantitative information contained within these images. One of the challenges in providing such methodology is the need to tailor algorithms to the specifics of the data, which limits their areas of application. Here we present a broadly applicable approach to quantification and classification of complex shapes and patterns in biological or other multi-component formations. This approach integrates the mapping of all shape boundaries within an image onto a global information-rich graph and machine learning on the multidimensional measures of the graph. We demonstrated the power of this method by (1) extracting subtle structural differences from visually indistinguishable images in our phenotype rescue experiments using the endothelial tube formation assay, (2) training the algorithm to identify biophysical parameters underlying the formation of different multicellular networks in our simulation model of collective cell behavior, and (3) analyzing the response of U2OS cell cultures to a broad array of small molecule perturbations.


Subjects
Computational Biology/methods , Image Processing, Computer-Assisted/methods , Machine Learning , Pattern Recognition, Automated/methods , Algorithms , Cell Line, Tumor , Cytological Techniques , Decision Trees , Gene Knockdown Techniques , Human Umbilical Vein Endothelial Cells , Humans
12.
PLoS Comput Biol ; 16(9): e1008108, 2020 09.
Article in English | MEDLINE | ID: mdl-32898133

ABSTRACT

Existing models for assessing microbiome sequencing data, such as operational taxonomic units (OTUs), can only test predictors' effects on OTUs. There is limited work on how to estimate the correlations between multiple OTUs and incorporate such relationships into models to evaluate longitudinal OTU measures. We propose a novel approach to estimate OTU correlations based on their taxonomic structure, and apply this correlation structure in Generalized Estimating Equations (GEE) models to estimate both predictors' effects and OTU correlations. We develop a two-part Microbiome Taxonomic Longitudinal Correlation (MTLC) model for multivariate zero-inflated OTU outcomes based on the GEE framework. In addition, longitudinal and other types of repeated OTU measures are integrated into the MTLC model. Extensive simulations have been conducted to evaluate the performance of the MTLC method. Compared with existing methods, the MTLC method shows robust and consistent estimation and improved statistical power for testing predictors' effects. Lastly, we demonstrate our proposed method by applying it to a real human microbiome study evaluating obesity in twins.


Subjects
Computational Biology/methods , DNA, Bacterial , Gastrointestinal Microbiome/genetics , Models, Statistical , Sequence Analysis, DNA/methods , DNA, Bacterial/classification , DNA, Bacterial/genetics , Databases, Genetic , Humans
13.
PLoS Comput Biol ; 16(9): e1008205, 2020 09.
Article in English | MEDLINE | ID: mdl-32903255

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) can map cell types, states and transitions during dynamic biological processes such as tissue development and regeneration. Many trajectory inference methods have been developed to order cells by their progression through a dynamic process. However, when time-series data are available, most of these methods do not consider the available time information when ordering cells and are instead designed to work only on a single scRNA-seq data snapshot. We present Tempora, a novel cell trajectory inference method that orders cells using time information from time-series scRNA-seq data. In performance comparison tests, Tempora inferred known developmental lineages from three diverse tissue development time-series data sets, outperforming state-of-the-art methods in accuracy and speed. Tempora works at the level of cell clusters (types) and uses biological pathway information to help identify cell type relationships. This approach increases the gene expression signal from single cells, processing speed, and interpretability of the inferred trajectory. Our results demonstrate the utility of combining time and pathway information to supervise trajectory inference for scRNA-seq based analysis.


Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Algorithms , Animals , Cells, Cultured , Humans , Mice , Myoblasts/metabolism , RNA/genetics , RNA/metabolism , Reproducibility of Results
14.
BMC Bioinformatics ; 21(1): 402, 2020 Sep 14.
Article in English | MEDLINE | ID: mdl-32928110

ABSTRACT

BACKGROUND: De Bruijn graphs are key data structures for the analysis of next-generation sequencing data. They efficiently represent the overlap between reads and hence also the underlying genome sequence. However, sequencing errors and repeated subsequences make the identification of the true underlying sequence difficult. A key step in this process is the inference of the multiplicities of nodes and arcs in the graph. These multiplicities correspond to the number of times each k-mer implied by a node (or each (k+1)-mer implied by an arc) is present in the genomic sequence. Determining multiplicities thus reveals the repeat structure and the presence of sequencing errors. Multiplicities of nodes/arcs in the de Bruijn graph are reflected in their coverage; however, coverage variability and coverage biases render their determination ambiguous. Current methods to determine node/arc multiplicities base their decisions solely on the information in nodes and arcs individually, under-utilising the information present in the sequencing data. RESULTS: To improve the accuracy with which node and arc multiplicities in a de Bruijn graph are inferred, we developed a conditional random field (CRF) model to efficiently combine the coverage information within each node/arc individually with the information of surrounding nodes and arcs. Multiplicities are thus collectively assigned in a more consistent manner. CONCLUSIONS: We demonstrate that the CRF model yields significant improvements in accuracy and a more robust expectation-maximisation parameter estimation. True k-mers can be distinguished from erroneous k-mers with a higher F1 score than existing methods. A C++11 implementation is available at https://github.com/biointec/detox under the GNU AGPL v3.0 license.
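To make the terminology concrete, the sketch below derives node coverage from k-mer counts and arc coverage from (k+1)-mer counts, then applies the kind of per-node baseline the abstract contrasts with the CRF: rounding each coverage to the nearest multiple of an assumed per-copy coverage. It is a minimal Python illustration, not the authors' C++ detox implementation: it ignores reverse complements, does not compact nodes into unitigs, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads (no reverse-complement handling)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_coverage(reads: Iterable[str], k: int) -> Tuple[Counter, Counter]:
    """Node coverage = k-mer counts; arc coverage = (k+1)-mer counts."""
    read_list: List[str] = list(reads)
    return count_kmers(read_list, k), count_kmers(read_list, k + 1)


def naive_multiplicity(coverage: Dict[str, int], per_copy_coverage: float) -> Dict[str, int]:
    """Per-node estimate: round(coverage / coverage expected for one genomic copy).

    A multiplicity of 0 suggests the k-mer is a sequencing error; values above 1
    suggest a repeat. The CRF model instead uses neighbouring nodes and arcs too.
    """
    return {kmer: round(c / per_copy_coverage) for kmer, c in coverage.items()}
```

Because each node is judged in isolation here, a repeat with unusually low coverage and an error k-mer with unusually high coverage can be misclassified; combining evidence from neighbouring nodes and arcs is what the CRF adds.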


Subjects
Computational Biology/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Algorithms , Humans
15.
PLoS Comput Biol ; 16(9): e1008229, 2020 09.
Article in English | MEDLINE | ID: mdl-32936825

ABSTRACT

Accurately predicting essential genes using computational methods can greatly reduce the time and resources needed to identify them through wet-lab experiments, and further accelerate drug discovery. Several computational methods have been proposed for predicting essential genes in model organisms by integrating multiple biological data sources, either via centrality measures or machine learning-based methods. However, methods aimed at predicting human essential genes are still limited, and their performance still needs improvement. In addition, most machine learning-based essential gene prediction methods lack mechanisms to handle the imbalanced learning issue inherent in the essential gene prediction problem, which might be one factor affecting their performance. We propose a deep learning-based method, DeepHE, to predict human essential genes by integrating features derived from sequence data and the protein-protein interaction (PPI) network. A deep learning-based network embedding method is used to automatically learn features from the PPI network. In addition, 89 sequence features were derived from the DNA and protein sequences of each gene. These two types of features are integrated to train a multilayer neural network, and a cost-sensitive technique is used to address the imbalanced learning problem during training. The experimental results show that DeepHE can accurately predict human gene essentiality, with an average AUC above 94%, area under the precision-recall curve (AP) above 90%, and accuracy above 90%. We also compare DeepHE with several widely used traditional machine learning models (SVM, Naïve Bayes, Random Forest, and AdaBoost) using the same features and the same cost-sensitive technique to counter the imbalanced learning issue.
The experimental results show that DeepHE significantly outperforms the compared machine learning models. We have demonstrated that human essential genes can be accurately predicted by designing an effective machine learning algorithm and integrating representative features captured from available biological data. The proposed deep learning framework is effective for this task.
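One common form of the cost-sensitive technique mentioned above weights each class inversely to its frequency and scales the training loss accordingly. The Python sketch below assumes binary labels (1 = essential) and uses the same weighting formula as scikit-learn's "balanced" class-weight option; whether DeepHE uses exactly this weighting is an assumption, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """n_samples / (n_classes * n_samples_in_class): rarer classes get larger weights."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * sum(1 for y in labels if y == c)) for c in classes}


def weighted_bce(labels: Sequence[int], probs: Sequence[float],
                 weights: Dict[int, float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy with each sample's loss scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += weights[y] * -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

With one essential gene among four, the essential class receives weight 2.0 and the non-essential class about 0.667, so misclassifying the rare class costs three times as much per sample.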


Subjects
Computational Biology/methods , Deep Learning , Genes, Essential/genetics , Sequence Analysis, DNA/methods , DNA/genetics , Humans , Neural Networks, Computer , Protein Interaction Maps/genetics
16.
PLoS Comput Biol ; 16(9): e1008173, 2020 09.
Article in English | MEDLINE | ID: mdl-32946435

ABSTRACT

Single-cell Hi-C (scHi-C) interrogates genome-wide chromatin interactions in individual cells, allowing us to gain insights into 3D genome organization. However, the extremely sparse nature of scHi-C data poses a significant barrier to analysis, limiting our ability to tease out hidden biological information. In this work, we approach this problem by applying topic modeling to scHi-C data. Topic modeling is well-suited for discovering latent topics in a collection of discrete data. For our analysis, we generate nine different single-cell combinatorial indexed Hi-C (sci-Hi-C) libraries from five human cell lines (GM12878, H1Esc, HFF, IMR90, and HAP1), consisting of over 19,000 cells. We demonstrate that topic modeling is able to successfully capture cell type differences from sci-Hi-C data in the form of "chromatin topics." We further show enrichment of particular compartment structures associated with locus pairs in these topics.


Subjects
Chromatin , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Single-Cell Analysis/methods , Cell Line , Chromatin/chemistry , Chromatin/genetics , Cluster Analysis , Gene Library , Humans , Natural Language Processing
17.
BMC Bioinformatics ; 21(1): 406, 2020 Sep 15.
Article in English | MEDLINE | ID: mdl-32933482

ABSTRACT

BACKGROUND: Bioinformatic workflows frequently make use of automated genome assembly and protein clustering tools. At the core of most of these tools, a significant portion of execution time is spent determining the optimal local alignment between two sequences. This task is performed with the Smith-Waterman algorithm, a dynamic programming-based method. With the advent of modern sequencing technologies and the increasing size of both genome and protein databases, a need for faster Smith-Waterman implementations has emerged. Multiple SIMD strategies for the Smith-Waterman algorithm are available for CPUs. However, as HPC facilities move towards accelerator-based architectures, an efficient GPU-accelerated strategy is needed. Existing GPU-based strategies have been optimized either for a specific character type (nucleotides or amino acids) or for only a handful of application use cases. RESULTS: In this paper, we present ADEPT, a new sequence alignment strategy for GPU architectures that is domain independent, supporting alignment of both genome and protein sequences. Our proposed strategy uses GPU-specific optimizations that do not rely on the nature of the sequence. We demonstrate the feasibility of this strategy by implementing the Smith-Waterman algorithm and comparing it to similar CPU strategies as well as the fastest known GPU methods for each domain. ADEPT's driver enables it to scale across multiple GPUs and allows easy integration into software pipelines that use large-scale computational systems. We have shown that the ADEPT-based Smith-Waterman algorithm reaches a peak performance of 360 GCUPS and 497 GCUPS for protein-based and DNA-based datasets, respectively, on a single GPU node (8 GPUs) of the Cori supercomputer. Overall, ADEPT is 10x faster in a node-to-node comparison against a corresponding SIMD CPU implementation.
CONCLUSIONS: ADEPT demonstrates performance that is comparable to or better than existing GPU strategies. We demonstrated the efficacy of ADEPT in supporting existing bioinformatics software pipelines by integrating it into MetaHipMer, a high-performance de novo metagenome assembler, and PASTIS, a high-performance protein similarity graph construction pipeline. Our results show performance boosts of 10% and 30% in MetaHipMer and PASTIS, respectively.
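For reference, the Smith-Waterman recurrence that ADEPT accelerates, and the GCUPS throughput metric quoted above, can be written compactly. This single-threaded Python sketch assumes a linear gap penalty and illustrative scoring values (match = 2, mismatch = -1, gap = -2) for simplicity; it is not ADEPT's GPU kernel or scoring scheme, and the function names are hypothetical.

```python
from typing import List


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty, keeping one DP row at a time."""
    prev: List[int] = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(len_a: int, len_b: int, seconds: float) -> float:
    """Giga cell updates per second: DP cells filled, divided by runtime, in units of 1e9."""
    return len_a * len_b / seconds / 1e9
```

Each of the len(a) x len(b) matrix cells counts as one cell update, so aligning two 1,000-residue sequences in one millisecond corresponds to 1 GCUPS.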


Subjects
Computational Biology/methods , Sequence Alignment/methods , Algorithms , Humans
18.
BMC Bioinformatics ; 21(1): 410, 2020 Sep 16.
Article in English | MEDLINE | ID: mdl-32938397

ABSTRACT

BACKGROUND: Motif enrichment analysis (MEA) identifies over-represented transcription factor (TF) binding motifs in the DNA sequence of regulatory regions, enabling researchers to infer which transcription factors can regulate the transcriptional response to a stimulus, or to identify sequence features found near a target protein in a ChIP-seq experiment. Score-based MEA determines motifs enriched in regions exhibiting extreme differences in regulatory activity, but existing methods do not control for biases in GC content or dinucleotide composition. This lack of control for sequence bias, such as that often found in CpG islands, can obscure the enrichment of biologically relevant motifs. RESULTS: We developed Motif Enrichment In Ranked Lists of Peaks (MEIRLOP), a novel MEA method that determines enrichment of TF binding motifs in a list of scored regulatory regions while controlling for sequence bias. In this study, we compare MEIRLOP against other MEA methods in identifying binding motifs enriched in differentially active regulatory regions after interferon-beta stimulus, finding that using logistic regression and covariates improves the ability to call enrichment of ISGF3 binding motifs from differential acetylation ChIP-seq data compared to other methods. Our method achieves similar or better performance compared to other methods when quantifying the enrichment of TF binding motifs from ENCODE TF ChIP-seq datasets. We also demonstrate how MEIRLOP is broadly applicable to the analysis of numerous types of NGS assays and experimental designs. CONCLUSIONS: Our results demonstrate the importance of controlling for sequence bias when accurately identifying enriched DNA sequence motifs using score-based MEA. MEIRLOP is available for download from https://github.com/npdeloss/meirlop under the MIT license.
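Controlling for sequence bias requires per-region covariates such as the GC content and dinucleotide composition named above. The Python sketch below computes GC content, dinucleotide frequencies and the CpG observed/expected ratio for a region; treating these as the covariates, and the function names, are illustrative assumptions, and the logistic-regression step in which such covariates would be used is not shown.

```python
from collections import Counter
from itertools import product
from typing import Dict

# All 16 ordered dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def gc_content(seq: str) -> float:
    """Fraction of G or C bases (case-insensitive); 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def dinucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of each ACGT dinucleotide among overlapping pairs (pairs with N are skipped)."""
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs[d] for d in DINUCLEOTIDES)
    return {d: (pairs[d] / total if total else 0.0) for d in DINUCLEOTIDES}


def cpg_observed_expected(seq: str) -> float:
    """CpG observed/expected ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    # "CG" occurrences cannot overlap, so str.count gives the exact number of CpGs.
    return seq.count("CG") * len(seq) / (c * g) if c and g else 0.0
```

High GC content together with a CpG observed/expected ratio near or above 1 is typical of CpG islands, which is the kind of bias the abstract says can obscure motif enrichment.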


Subjects
Computational Biology/methods , Nucleotide Motifs/genetics , Sequence Analysis, DNA/methods , Bias , Humans
19.
BMC Bioinformatics ; 21(1): 411, 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32942983

ABSTRACT

BACKGROUND: Protein microarrays are a well-established approach for characterizing the activity levels of thousands of proteins in parallel. Analysis of protein microarray data is complex and time-consuming, and existing solutions are either outdated or challenging to use without programming skills. The typical data analysis pipeline consists of a data preprocessing step, followed by differential expression analysis, whose results are then put into context via functional enrichment. Normally, biologists would need to assemble their own workflow by combining a set of unrelated tools to analyze experimental data. Because most of these tools are developed independently by different bioinformatics groups, making them work together can be a real challenge. RESULTS: Here we present PAWER, an online web tool dedicated solely to protein microarray analysis. PAWER enables biologists to carry out all the necessary analysis steps in one go. PAWER provides access to state-of-the-art computational methods through a user-friendly interface and produces publication-ready illustrations. We also provide an R package for more advanced use cases, such as bespoke analysis workflows. CONCLUSIONS: PAWER is freely available at https://biit.cs.ut.ee/pawer .


Subjects
Computational Biology/methods , Protein Array Analysis/methods , Humans
20.
Commun Biol ; 3(1): 538, 2020 09 29.
Article in English | MEDLINE | ID: mdl-32994472

ABSTRACT

The advent of portable nanopore sequencing devices has enabled DNA and RNA sequencing to be performed in the field or the clinic. However, advances in in situ genomics require the parallel development of portable, offline solutions for the computational analysis of sequencing data. Here we introduce Genopo, a mobile toolkit for nanopore sequencing analysis. Genopo packages popular bioinformatics tools into an Android application, enabling fully portable computation. To demonstrate its utility for in situ genome analysis, we use Genopo to determine the complete genome sequence of the human coronavirus SARS-CoV-2 in nine patient isolates sequenced on a nanopore device, with Genopo executing this workflow in less than 30 min per sample on a range of popular smartphones. We further show how Genopo can be used to profile DNA methylation in a human genome sample, illustrating a flexible, efficient architecture that is suitable for running many popular bioinformatics tools and can accommodate small or large genomes. As the first smartphone application for nanopore sequencing analysis, Genopo enables the genomics community to harness this cheap, ubiquitous computational resource.


Subjects
Betacoronavirus/genetics , Computational Biology/methods , Genome, Human , Genome, Viral , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Betacoronavirus/pathogenicity , Cell Phone/instrumentation , Computational Biology/instrumentation , Coronavirus Infections/diagnosis , Coronavirus Infections/virology , DNA Methylation , High-Throughput Nucleotide Sequencing/instrumentation , Humans , Nanopores , Pandemics , Pneumonia, Viral/diagnosis , Pneumonia, Viral/virology , Whole Genome Sequencing/instrumentation