Results 1 - 20 of 36,882
1.
BMC Bioinformatics ; 21(1): 506, 2020 Nov 07.
Article in English | MEDLINE | ID: mdl-33160308

ABSTRACT

BACKGROUND: Hi-C and its variant techniques have been developed to capture the spatial organization of chromatin. Normalization of Hi-C contact map is essential for accurate modeling and interpretation of high-throughput chromatin conformation capture (3C) experiments. Hi-C correction tools were originally developed to normalize systematic biases of karyotypically normal cell lines. However, a vast majority of available Hi-C datasets are derived from cancer cell lines that carry multi-level DNA copy number variations (CNVs). CNV regions display over- or under-representation of interaction frequencies compared to CN-neutral regions. Therefore, it is necessary to remove CNV-driven bias from chromatin interaction data of cancer cell lines to generate a euploid-equivalent contact map. RESULTS: We developed the HiCNAtra framework to compute high-resolution CNV profiles from Hi-C or 3C-seq data of cancer cell lines and to correct chromatin contact maps from systematic biases including CNV-associated bias. First, we introduce a novel 'entire-fragment' counting method for better estimation of the read depth (RD) signal from Hi-C reads that recapitulates the whole-genome sequencing (WGS)-derived coverage signal. Second, HiCNAtra employs a multimodal-based hierarchical CNV calling approach, which outperformed OneD and HiNT tools, to accurately identify CNVs of cancer cell lines. Third, incorporating CNV information with other systematic biases, HiCNAtra simultaneously estimates the contribution of each bias and explicitly corrects the interaction matrix using Poisson regression. HiCNAtra normalization abolishes CNV-induced artifacts from the contact map generating a heatmap with homogeneous signal. When benchmarked against OneD, CAIC, and ICE methods using MCF7 cancer cell line, HiCNAtra-corrected heatmap achieves the least 1D signal variation without deforming the inherent chromatin interaction signal. 
Additionally, HiCNAtra-corrected contact frequencies have minimum correlations with each of the systematic bias sources compared to OneD's explicit method. Visual inspection of CNV profiles and contact maps of cancer cell lines reveals that HiCNAtra is the most robust Hi-C correction tool for ameliorating CNV-induced bias. CONCLUSIONS: HiCNAtra is a Hi-C-based computational tool that provides an analytical and visualization framework for DNA copy number profiling and chromatin contact map correction of karyotypically abnormal cell lines. HiCNAtra is an open-source software implemented in MATLAB and is available at https://github.com/AISKhalil/HiCNAtra .


Subject(s)
Computational Biology/methods , DNA Copy Number Variations , Neoplasms/pathology , Chromatin/metabolism , High-Throughput Nucleotide Sequencing , Humans , MCF-7 Cells , Neoplasms/genetics , User-Computer Interface
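Note on record 1: the Poisson-regression correction named in the abstract can be illustrated with a minimal sketch. This is not HiCNAtra's MATLAB implementation. It assumes a single hypothetical bias covariate (log copy number), fits a log-linear model by Newton's method, and divides each observed count by the fitted bias factor. The function names and toy values are illustrative assumptions.

```python
import math

def fit_poisson(x, y, iters=50):
    """Fit log(mu) = a + b*x by Newton's method on the Poisson log-likelihood."""
    a, b = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H d = g in closed form
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def correct(x, y, b):
    """Remove the modeled bias by dividing each count by exp(b*x)."""
    return [yi / math.exp(b * xi) for xi, yi in zip(x, y)]

# Hypothetical toy data: counts that grow with log copy number
log_cn = [0.0, 0.5, 1.0, 1.5, 2.0]
counts = [math.exp(1.0 + 0.5 * xi) for xi in log_cn]
a_hat, b_hat = fit_poisson(log_cn, counts)
corrected = correct(log_cn, counts, b_hat)
```

In this toy case the corrected counts are flat across copy-number levels, which is the sense in which a regression-based correction yields a homogeneous signal. A real correction would include several bias covariates at once.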
2.
BMC Bioinformatics ; 21(1): 511, 2020 Nov 10.
Article in English | MEDLINE | ID: mdl-33167851

ABSTRACT

BACKGROUND: The nonrandom radial organization of eukaryotic chromosome territories (CTs) inside the nucleus plays an important role in nuclear functional compartmentalization. Increasingly, chromosome conformation capture (Hi-C) based approaches are being used to characterize the genome structure of many cell types and conditions. Computational methods to extract 3D arrangements of CTs from this type of pairwise contact data will thus increase our ability to analyze CT organization in a wider variety of biological situations. RESULTS: A number of full-scale polymer models have successfully reconstructed the 3D structure of chromosome territories from Hi-C. To supplement such methods, we explore alternative, direct, and less computationally intensive approaches to capture radial CT organization from Hi-C data. We show that we can infer relative chromosome ordering using PCA on a thresholded inter-chromosomal contact matrix. We simulate an ensemble of possible CT arrangements using a force-directed network layout algorithm and propose an approach to integrate additional chromosome properties into our predictions. Our CT radial organization predictions have a high correlation with microscopy imaging data for various cell nucleus geometries (lymphoblastoid, skin fibroblast, and breast epithelial cells), and we can capture previously documented changes in senescent and progeria cells. CONCLUSIONS: Our analysis approaches provide rapid and modular approaches to screen for alterations in CT organization across widely available Hi-C data. We demonstrate which stages of the approach can extract meaningful information, and also describe limitations of pairwise contacts alone to predict absolute 3D positions.


Subject(s)
Chromosomes/chemistry , Computational Biology/methods , Cell Line, Tumor , Cell Nucleus/genetics , Chromosomes/metabolism , Epithelial Cells/cytology , Epithelial Cells/metabolism , Humans , Principal Component Analysis
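Note on record 2: the abstract mentions PCA on a thresholded inter-chromosomal contact matrix. The sketch below shows only that step, under stated assumptions: a hypothetical 6-chromosome toy matrix, a hypothetical threshold, and power iteration in place of a full eigendecomposition. It is not the authors' code, and the sign of the resulting scores is arbitrary.

```python
def first_pc_scores(contacts, threshold, iters=200):
    """Project rows of a thresholded contact matrix onto the first principal component."""
    n = len(contacts)
    # Binarize: 1 where the contact count exceeds the threshold
    b = [[1.0 if v > threshold else 0.0 for v in row] for row in contacts]
    # Center each column
    means = [sum(b[i][j] for i in range(n)) / n for j in range(n)]
    x = [[b[i][j] - means[j] for j in range(n)] for i in range(n)]
    # Scatter matrix X^T X (proportional to the covariance matrix)
    s = [[sum(x[i][j] * x[i][k] for i in range(n)) for k in range(n)]
         for j in range(n)]
    # Power iteration for the leading eigenvector
    v = [float(j + 1) for j in range(n)]
    for _ in range(iters):
        w = [sum(s[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    # Score of each chromosome = projection of its centered row onto v
    return [sum(x[i][j] * v[j] for j in range(n)) for i in range(n)]

# Hypothetical toy data: chromosomes 0-2 contact each other strongly, as do 3-5
strong, weak = 9, 1
contacts = [[0 if i == j else (strong if (i < 3) == (j < 3) else weak)
             for j in range(6)] for i in range(6)]
scores = first_pc_scores(contacts, threshold=5)
```

Chromosomes from the same toy group receive scores of the same sign, so sorting by score separates the two groups. Interpreting such an ordering as a radial ordering is the paper's claim, not something this sketch establishes.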
3.
BMC Bioinformatics ; 21(1): 512, 2020 Nov 10.
Article in English | MEDLINE | ID: mdl-33167861

ABSTRACT

BACKGROUND: Enzyme activity is influenced by the external environment, and it is important for an enzyme to retain high activity under a specific condition. The usual approach is first to determine the optimal condition of an enzyme, either by a gradient test or from its tertiary structure, and then to use protein engineering to mutate the wild-type enzyme for higher activity under the desired condition. RESULTS: In this paper, we investigate the optimal condition of an enzyme by directly analyzing its sequence. We propose an embedding method that represents the amino acids and the structural information as vectors in a latent space. These vectors contain information about the correlations between amino acids and sites in the aligned amino acid sequences, as well as the correlation with the optimal condition. We crawled and processed the amino acid sequences in the glycoside hydrolase GH11 family and obtained 125 amino acid sequences with a known optimal pH condition. We used a probabilistic approximation method to implement the embedding learning method on these samples. Based on these embedding vectors, we designed a computational score that determines which of two given amino acid sequences has the better optimal condition; it achieves an accuracy of 80% on the test proteins in the same family. We also give mutation suggestions intended to yield higher activity in a desired environment, which are consistent with previous wet-lab experiments and analysis. CONCLUSION: A new computational method is proposed for sequence-based analysis of enzyme optimal conditions. Compared with the traditional process, which involves many wet-lab experiments and requires multiple mutations, this method can efficiently recommend the direction and location of amino acid substitutions for a desired condition.


Subject(s)
Computational Biology/methods , Glycoside Hydrolases/metabolism , Amino Acid Sequence , Glycoside Hydrolases/chemistry , Glycoside Hydrolases/genetics , Hydrogen-Ion Concentration , Neural Networks, Computer
4.
BMC Bioinformatics ; 21(1): 492, 2020 Oct 31.
Article in English | MEDLINE | ID: mdl-33129268

ABSTRACT

BACKGROUND: The ability to compare samples or studies easily using metabarcoding so as to better interpret microbial ecology results is an upcoming challenge. A growing number of metabarcoding pipelines are available, each with its own benefits and limitations. However, very few have been developed to offer the opportunity to characterize various microbial communities (e.g., archaea, bacteria, fungi, photosynthetic microeukaryotes) with the same tool. RESULTS: BIOCOM-PIPE is a flexible and independent suite of tools for processing data from high-throughput sequencing technologies, Roche 454 and Illumina platforms, and focused on the diversity of archaeal, bacterial, fungal, and photosynthetic microeukaryote amplicons. Various original methods were implemented in BIOCOM-PIPE to (1) remove chimeras based on read abundance, (2) align sequences with structure-based alignments of RNA homologs using covariance models, and (3) a post-clustering tool (ReClustOR) to improve OTUs consistency based on a reference OTU database. The comparison with two other pipelines (FROGS and mothur) and Amplicon Sequence Variant definition highlighted that BIOCOM-PIPE was better at discriminating land use groups. CONCLUSIONS: The BIOCOM-PIPE pipeline makes it possible to analyze 16S, 18S and 23S rRNA genes in the same packaged tool. The new post-clustering approach defines a biological database from previously analyzed samples and performs post-clustering of reads with this reference database by using open-reference clustering. This makes it easier to compare projects from various sequencing runs, and increased the congruence among results. For all users, the pipeline was developed to allow for adding or modifying the components, the databases and the bioinformatics tools easily, giving high modularity for each analysis.


Subject(s)
Archaea/genetics , Bacteria/genetics , Biodiversity , Computational Biology/methods , DNA Barcoding, Taxonomic , Fungi/genetics , Genes, rRNA , Software , Cluster Analysis , Computer Simulation , Databases, Genetic , Microbiota/genetics , RNA, Ribosomal, 16S/genetics , RNA, Ribosomal, 23S/genetics , Soil Microbiology
5.
PLoS One ; 15(10): e0227659, 2020.
Article in English | MEDLINE | ID: mdl-33091000

ABSTRACT

Mass spectrometry is a fundamental tool for modern proteomics. The increasing availability of mass spectrometry data paired with the increasing sensitivity and fidelity of the instruments necessitates new and more potent analytical methods. To that end, we have created and present XFlow, a feature detection algorithm for extracting ion chromatograms from MS1 LC-MS data. XFlow is a parameter-free procedurally agnostic feature detection algorithm that utilizes the latent properties of ion chromatograms to resolve them from the surrounding noise present in MS1 data. XFlow is designed to function on either profile or centroided data across different resolutions and instruments. This broad applicability lends XFlow strong utility as a one-size-fits-all method for MS1 analysis or target acquisition for MS2. XFlow is written in Java and packaged with JS-MS, an open-source mass spectrometry analysis toolkit.


Subject(s)
Computational Biology/methods , Proteomics/methods , Algorithms , Chromatography, Liquid , Ions/analysis , Tandem Mass Spectrometry
6.
PLoS One ; 15(10): e0239287, 2020.
Article in English | MEDLINE | ID: mdl-33002005

ABSTRACT

RNAs adopt specific structures to perform their functions, which are critical to fundamental cellular processes. For decades, these structures have been determined and modeled with strong support from computational methods. Still, the accuracy of the latter ones depends on the availability of experimental data, for example, chemical probing information that can define pseudo-energy constraints for RNA folding algorithms. At the same time, diverse computational tools have been developed to facilitate analysis and visualization of data from RNA structure probing experiments followed by capillary electrophoresis or next-generation sequencing. RNAthor, a new software tool for the fully automated normalization of SHAPE and DMS probing data resolved by capillary electrophoresis, has recently joined this collection. RNAthor automatically identifies unreliable probing data. It normalizes the reactivity information to a uniform scale and uses it in the RNA secondary structure prediction. Our web server also provides tools for fast and easy RNA probing data visualization and statistical analysis that facilitates the comparison of multiple data sets. RNAthor is freely available at http://rnathor.cs.put.poznan.pl/.


Subject(s)
Computational Biology/methods , Electrophoresis, Capillary , RNA Folding , RNA/chemistry , Statistics as Topic/methods , Internet , Time Factors
7.
PLoS One ; 15(10): e0239700, 2020.
Article in English | MEDLINE | ID: mdl-33017414

ABSTRACT

In the past two decades, research into the biochemical, biophysical and structural properties of the ribosome has revealed many different steps of protein translation. Nevertheless, a complete understanding of how they lead to rapid and accurate protein synthesis still remains a challenge. Here we consider a coarse network analysis in the bacterial ribosome formed by the connectivity between ribosomal (r) proteins and RNAs at different stages in the elongation cycle. The ribosomal networks are found to be dis-assortative and small world, implying that the structure allows for an efficient exchange of information between distant locations. An analysis of centrality shows that the second and fifth domains of 23S rRNA are the most important elements in all of the networks. Ribosomal protein hubs connect to far fewer nodes but are shown to provide important connectivity within the network (high closeness centrality). A modularity analysis reveals some of the different functional communities, indicating some known and some new possible communication pathways. Our mathematical results confirm important communication pathways that have been discussed in previous research, thus verifying the use of this technique for representing the ribosome, and also reveal new insights into the collective function of ribosomal elements.


Subject(s)
Bacteria/genetics , Gene Regulatory Networks/genetics , Ribosomes/genetics , Bacteria/metabolism , Computational Biology/methods , Protein Biosynthesis/genetics , Protein Biosynthesis/physiology , RNA, Ribosomal, 23S/metabolism , Ribosomal Proteins/metabolism , Ribosomes/metabolism , Transcription Elongation, Genetic/physiology
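Note on record 7: the abstract refers to closeness centrality. A minimal sketch of that measure on an unweighted, connected graph follows. The toy network and its node names are hypothetical and are not taken from the paper's ribosome networks.

```python
from collections import deque

def closeness(graph, source):
    """Closeness centrality: (n - 1) / sum of shortest-path distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in unweighted graphs
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    total = sum(dist.values())
    # Assumes a connected graph, so len(dist) is the total number of nodes
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical toy network: one rRNA domain in contact with three r-proteins
toy = {
    "rRNA_domain": ["protein_A", "protein_B", "protein_C"],
    "protein_A": ["rRNA_domain"],
    "protein_B": ["rRNA_domain"],
    "protein_C": ["rRNA_domain"],
}
hub_closeness = closeness(toy, "rRNA_domain")
leaf_closeness = closeness(toy, "protein_A")
```

In this toy star graph the central node has the highest closeness because every other node is one step away. The abstract's point is that a node with few connections can still score highly on this measure if it sits close to the rest of the network.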
8.
Zool Res ; 41(6): 705-708, 2020 Nov 18.
Article in English | MEDLINE | ID: mdl-33045776

ABSTRACT

Since the first reported severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in December 2019, coronavirus disease 2019 (COVID-19) has become a global pandemic, spreading to more than 200 countries and regions worldwide. With continued research progress and virus detection, SARS-CoV-2 genomes and sequencing data have been reported and accumulated at an unprecedented rate. To meet the need for fast analysis of these genome sequences, the National Genomics Data Center (NGDC) of the China National Center for Bioinformation (CNCB) has established an online coronavirus analysis platform, which includes de novo assembly, BLAST alignment, genome annotation, variant identification, and variant annotation modules. The online analysis platform can be freely accessed at the 2019 Novel Coronavirus Resource (2019nCoVR) (https://bigd.big.ac.cn/ncov/online/tools).


Subject(s)
Betacoronavirus/genetics , Computational Biology/methods , Coronavirus Infections/diagnosis , Genome, Viral/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Pneumonia, Viral/diagnosis , Animals , Betacoronavirus/classification , Betacoronavirus/physiology , China , Computational Biology/organization & administration , Coronavirus Infections/virology , Genetic Variation , Humans , Internet , Molecular Sequence Annotation , Pandemics , Pneumonia, Viral/virology
9.
Nat Commun ; 11(1): 5026, 2020 Oct 06.
Article in English | MEDLINE | ID: mdl-33024104

ABSTRACT

How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli populations exposed to biocide and antibiotic combinations led to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.


Subject(s)
Computational Biology/methods , Escherichia coli/drug effects , Escherichia coli/genetics , Models, Biological , Anti-Bacterial Agents/pharmacology , Bacterial Proteins/genetics , Disinfectants/pharmacology , Gene Expression Regulation, Bacterial/drug effects , Machine Learning , Membrane Proteins/genetics , Molecular Chaperones/genetics , Research Design , Stress, Physiological/drug effects , Stress, Physiological/genetics
10.
Nat Commun ; 11(1): 5011, 2020 Oct 06.
Article in English | MEDLINE | ID: mdl-33024107

ABSTRACT

Development of high throughput single-cell sequencing technologies has made it cost-effective to profile thousands of cells from diverse samples containing multiple cell types. To study how these different cell types work together, here we develop NATMI (Network Analysis Toolkit for Multicellular Interactions). NATMI uses connectomeDB2020 (a database of 2293 manually curated ligand-receptor pairs with literature support) to predict and visualise cell-to-cell communication networks from single-cell (or bulk) expression data. Using multiple published single-cell datasets we demonstrate how NATMI can be used to identify (i) the cell-type pairs that are communicating the most (or most specifically) within a network, (ii) the most active (or specific) ligand-receptor pairs active within a network, (iii) putative highly-communicating cellular communities and (iv) differences in intercellular communication when profiling given cell types under different conditions. Furthermore, analysis of the Tabula Muris (organism-wide) atlas confirms our previous prediction that autocrine signalling is a major feature of cell-to-cell communication networks, while also revealing that hundreds of ligands and their cognate receptors are co-expressed in individual cells suggesting a substantial potential for self-signalling.


Subject(s)
Cell Communication , Computational Biology/methods , Software , Age Factors , Animals , Autocrine Communication , Data Visualization , Databases, Factual , Female , Ligands , Mammary Glands, Animal , Mice , Proteins/metabolism , Single-Cell Analysis , User-Computer Interface
11.
BMC Bioinformatics ; 21(1): 470, 2020 Oct 21.
Article in English | MEDLINE | ID: mdl-33087064

ABSTRACT

BACKGROUND: Many studies prove that miRNAs have significant roles in diagnosing and treating complex human diseases. However, conventional biological experiments are too costly and time-consuming to identify unconfirmed miRNA-disease associations. Thus, computational models predicting unidentified miRNA-disease pairs in an efficient way are becoming promising research topics. Although existing methods have performed well to reveal unidentified miRNA-disease associations, more work is still needed to improve prediction performance. RESULTS: In this work, we present a novel multiple meta-paths fusion graph embedding model to predict unidentified miRNA-disease associations (M2GMDA). Our method takes full advantage of the complex structure and rich semantic information of miRNA-disease interactions in a self-learning way. First, a miRNA-disease heterogeneous network was derived from verified miRNA-disease pairs, miRNA similarity and disease similarity. All meta-path instances connecting miRNAs with diseases were extracted to describe intrinsic information about miRNA-disease interactions. Then, we developed a graph embedding model to predict miRNA-disease associations. The model is composed of linear transformations of miRNAs and diseases, the means encoder of a single meta-path instance, the attention-aware encoder of meta-path type and attention-aware multiple meta-path fusion. We innovatively integrated meta-path instances, meta-path based neighbours, intermediate nodes in meta-paths and more information to strengthen the prediction in our model. In particular, distinct contributions of different meta-path instances and meta-path types were combined with attention mechanisms. The data sets and source code that support the findings of this study are available at https://github.com/dangdangzhang/M2GMDA . CONCLUSIONS: M2GMDA achieved AUCs of 0.9323 and 0.9182 in global leave-one-out cross validation and fivefold cross validation with HMDD v2.0. The results showed that our method outperforms other prediction methods. Case studies of lung neoplasms, breast neoplasms, prostate neoplasms, pancreatic neoplasms, lymphoma and colorectal neoplasms demonstrated that 47, 50, 49, 48, 50 and 50, respectively, of the top 50 candidate miRNAs predicted by M2GMDA were validated by biological experiments. This further confirms the prediction performance of our method.


Subject(s)
Computational Biology/methods , Computer Graphics , MicroRNAs/genetics , Neoplasms/genetics , Algorithms , Area Under Curve , Genetic Predisposition to Disease/genetics , Humans , Male , Risk Factors
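Note on record 11: the abstract reports performance as AUC under cross validation. A minimal sketch of how an AUC value can be computed from predicted scores and known labels follows. It uses the pairwise-ranking definition and hypothetical toy scores, and it does not reproduce the paper's evaluation pipeline.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted association scores; label 1 = verified association
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)  # 3 of 4 pairs ranked correctly
```

In leave-one-out cross validation, each known association is held out in turn and scored against the unverified candidates, and the pooled scores are then summarized with a measure such as this one.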
12.
Medicine (Baltimore) ; 99(43): e22974, 2020 Oct 23.
Article in English | MEDLINE | ID: mdl-33120861

ABSTRACT

The current study aimed to elucidate the molecular mechanisms and identify the potential key genes and pathways for metastatic uveal melanoma (UM) using bioinformatics analysis. Gene expression microarray data from GSE39717 included 39 primary UM tissue samples and 2 metastatic UM tissue samples. Differentially expressed genes (DEGs) were generated using Gene Expression Omnibus 2R. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the online Database for Annotation, Visualization and Integrated Discovery (DAVID) tool. The web-based STRING tool was adopted to construct a protein-protein interaction (PPI) network. The MCODE tool in Cytoscape was used to generate significant modules of the PPI network. A total of 213 DEGs were identified. GO and KEGG analyses revealed that the upregulated genes were mainly enriched in extracellular matrix organization and blood coagulation cascades, while the downregulated DEGs were mainly related to protein binding, negative regulation of ERK cascade, nucleus and chromatin modification, and lung and renal cell carcinoma. The most significant module was extracted from the PPI network. GO and KEGG enrichment analyses of the module revealed that the genes were mainly enriched in the extracellular region and space organization, blood coagulation process, and PI3K-Akt signaling pathway. Hub genes, including FN1, APOB, F2, SERPINC1, SERPINA1, APOA1, FGG, PROC, ITIH2, VCAN, TFPI, CXCL8, CDH2, and HP, were identified from DEGs. Survival analysis and hierarchical clustering results revealed that most of the hub genes were associated with prognosis and clinical progression. Results of this bioinformatics analysis may provide predictive biomarkers and potential candidate therapeutic targets for individuals with metastatic UM.


Subject(s)
Computational Biology/methods , Melanoma/genetics , Melanoma/secondary , Uveal Neoplasms/pathology , Biomarkers, Tumor/metabolism , Cluster Analysis , Disease Progression , Down-Regulation , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Gene Ontology , Humans , Mass Screening/methods , Microarray Analysis/methods , Prognosis , Protein Interaction Maps/genetics , Signal Transduction/genetics , Up-Regulation , Uveal Neoplasms/genetics , Uveal Neoplasms/secondary
14.
BMC Bioinformatics ; 21(1): 466, 2020 Oct 19.
Article in English | MEDLINE | ID: mdl-33076816

ABSTRACT

BACKGROUND: Homology based methods are one of the most important and widely used approaches for functional annotation of high-throughput microbial genome data. A major limitation of these methods is the absence of well-characterized sequences for certain functions. The non-homology methods based on the context and the interactions of a protein are very useful for identifying missing metabolic activities and functional annotation in the absence of significant sequence similarity. In the current work, we employ both homology and context-based methods, incrementally, to identify local holes and chokepoints, whose presence in the Mycobacterium tuberculosis genome is indicated based on its interaction with known proteins in a metabolic network context, but have not been annotated. We have developed two computational procedures using network theory to identify orphan enzymes ('Hole finding protocol') coupled with the identification of candidate proteins for the predicted orphan enzyme ('Hole filling protocol'). We propose an integrated interaction score based on scores from the STRING database to identify candidate protein sequences for the orphan enzymes from M. tuberculosis, as a case study, which are most likely to perform the missing function. RESULTS: The application of an automated homology-based enzyme identification protocol, ModEnzA, on M. tuberculosis genome yielded 56 novel enzyme predictions. We further predicted 74 putative local holes, 6 choke points, and 3 high confidence local holes in the genome using 'Hole finding protocol'. The 'Hole-filling protocol' was validated on the E. coli genome using artificial in-silico enzyme knockouts where our method showed 25% increased accuracy, compared to other methods, in assigning the correct sequence for the knocked-out enzyme amongst the top 10 ranks. The method was further validated on 8 additional genomes. 
CONCLUSIONS: We have developed methods that can be generalized to augment homology-based annotation to identify missing enzyme coding genes and to predict a candidate protein for them. For pathogens such as M. tuberculosis, this work holds significance in terms of increasing the protein repertoire and thereby, the potential for identifying novel drug targets.


Subject(s)
Bacterial Proteins/genetics , Computational Biology/methods , Enzymes/genetics , Mycobacterium tuberculosis/enzymology , Sequence Homology, Amino Acid , Amino Acid Sequence , Databases, Factual , Escherichia coli/enzymology , Genome, Bacterial , Molecular Sequence Annotation
15.
BMC Bioinformatics ; 21(1): 475, 2020 Oct 22.
Article in English | MEDLINE | ID: mdl-33092523

ABSTRACT

BACKGROUND: The single individual haplotype problem refers to reconstructing the haplotypes of an individual from several input fragments sequenced from a specified chromosome. Solving this problem is an important task in computational biology and has many applications in the pharmaceutical industry, clinical decision-making, and genetic diseases. The problem is known to be NP-hard. Although several methods have been proposed to solve it, most of them perform poorly on noisy input fragments. Therefore, proposing a method that is both accurate and scalable is a challenging task. RESULTS: In this paper, we introduce a method, named NCMHap, which utilizes the Neutrosophic c-means (NCM) clustering algorithm. The NCM algorithm can effectively detect noise and outliers in the input data and reduce their effects on the clustering process. The proposed method has been evaluated on several benchmark datasets. Comparison with existing methods indicates that the results are encouraging when NCM is tuned with suitable parameters. In particular, as the amount of noise increases, it outperforms the compared methods. CONCLUSION: The proposed method is validated using simulated and real datasets. The results support applying NCMHap to datasets whose fragments contain many gaps and a large amount of noise.


Subject(s)
Algorithms , Computational Biology/methods , Haplotypes/genetics , Base Sequence , Cluster Analysis , Computer Simulation , Databases, Genetic , Humans , Polymorphism, Single Nucleotide/genetics
16.
BMC Bioinformatics ; 21(1): 478, 2020 Oct 24.
Article in English | MEDLINE | ID: mdl-33099301

ABSTRACT

BACKGROUND: Introns have been shown to be spliced in a defined order, and this order influences both alternative splicing regulation and splicing fidelity, but previous studies have only considered neighbouring introns. The detailed intron splicing order remains unknown. RESULTS: In this work, a method was developed that can calculate the intron splicing orders of all introns in each transcript. A simulation study showed that this method can accurately calculate intron splicing orders. I further applied this method to real S. pombe, fruit fly, Arabidopsis thaliana, and human sequencing datasets and found that intron splicing orders change from gene to gene and that humans contain more transcripts that are not spliced in order than S. pombe, fruit fly and Arabidopsis thaliana. In addition, I reconfirmed that, genome-wide, the first introns in humans are spliced more slowly than those in S. pombe, fruit fly, and Arabidopsis thaliana. Both the calculated most likely orders and the method developed here are available on the web. CONCLUSIONS: A novel computational method was developed to calculate intron splicing orders and was applied to real sequencing datasets. I obtained intron splicing orders for hundreds or thousands of genes in four organisms and found that humans contain a larger number of transcripts that are not spliced in order.


Subject(s)
Arabidopsis/genetics , Computational Biology/methods , Drosophila melanogaster/genetics , Introns/genetics , RNA Splicing/genetics , Schizosaccharomyces/genetics , Alternative Splicing , Animals , Base Sequence , Humans
17.
Sci Rep ; 10(1): 16862, 2020 Oct 08.
Article in English | MEDLINE | ID: mdl-33033344

ABSTRACT

The prevalence of a novel β-coronavirus (SARS-CoV-2) was declared as a public health emergency of international concern on 30 January 2020 and a global pandemic on 11 March 2020 by WHO. The spike glycoprotein of SARS-CoV-2 is regarded as a key target for the development of vaccines and therapeutic antibodies. In order to develop anti-viral therapeutics for SARS-CoV-2, it is crucial to find amino acid pairs that strongly attract each other at the interface of the spike glycoprotein and the human angiotensin-converting enzyme 2 (hACE2) complex. In order to find hot spot residues, the strongly attracting amino acid pairs at the protein-protein interaction (PPI) interface, we introduce a reliable inter-residue interaction energy calculation method, FMO-DFTB3/D/PCM/3D-SPIEs. In addition to the SARS-CoV-2 spike glycoprotein/hACE2 complex, the hot spot residues of SARS-CoV-1 spike glycoprotein/hACE2 complex, SARS-CoV-1 spike glycoprotein/antibody complex, and HCoV-NL63 spike glycoprotein/hACE2 complex were obtained using the same FMO method. Following this, a 3D-SPIEs-based interaction map was constructed with hot spot residues for the hACE2/SARS-CoV-1 spike glycoprotein, hACE2/HCoV-NL63 spike glycoprotein, and hACE2/SARS-CoV-2 spike glycoprotein complexes. Finally, the three 3D-SPIEs-based interaction maps were combined and analyzed to find the consensus hot spots among the three complexes. As a result of the analysis, two hot spots were identified between hACE2 and the three spike proteins. In particular, E37, K353, G354, and D355 of the hACE2 receptor strongly interact with the spike proteins of coronaviruses. The 3D-SPIEs-based map would provide valuable information to develop anti-viral therapeutics that inhibit PPIs between the spike protein of SARS-CoV-2 and hACE2.


Subject(s)
Betacoronavirus/metabolism , Computational Biology/methods , Coronavirus Infections/epidemiology , Peptidyl-Dipeptidase A/metabolism , Pneumonia, Viral/epidemiology , Protein Interaction Maps , Spike Glycoprotein, Coronavirus/metabolism , Antibodies, Viral/metabolism , Binding Sites , Coronavirus Infections/virology , Coronavirus NL63, Human/metabolism , Humans , Pandemics , Pneumonia, Viral/virology , Prevalence , Protein Domains , Receptors, Virus/metabolism , SARS Virus/metabolism , Severe Acute Respiratory Syndrome/virology
19.
BMC Bioinformatics ; 21(1): 472, 2020 Oct 21.
Article in English | MEDLINE | ID: mdl-33087041

ABSTRACT

BACKGROUND: Optimality principles have been used to explain the structure and behavior of living matter at different levels of organization, from basic phenomena at the molecular level, up to complex dynamics in whole populations. Most of these studies have assumed a single-criteria approach. Such optimality principles have been justified from an evolutionary perspective. In the context of the cell, previous studies have shown how dynamics of gene expression in small metabolic models can be explained assuming that cells have developed optimal adaptation strategies. Most of these works have considered rather simplified representations, such as small linear pathways, or reduced networks with a single branching point, and a single objective for the optimality criteria. RESULTS: Here we consider the extension of this approach to more realistic scenarios, i.e. biochemical pathways of arbitrary size and structure. We first show that exploiting optimality principles for these networks poses great challenges due to the complexity of the associated optimal control problems. Second, in order to surmount such challenges, we present a computational framework which has been designed with scalability and efficiency in mind, including mechanisms to avoid the most common pitfalls. Third, we illustrate its performance with several case studies considering the central carbon metabolism of S. cerevisiae and B. subtilis. In particular, we consider metabolic dynamics during nutrient shift experiments. CONCLUSIONS: We show how multi-objective optimal control can be used to predict temporal profiles of enzyme activation and metabolite concentrations in complex metabolic pathways. Further, we also show how to consider general cost/benefit trade-offs. In this study we have considered metabolic pathways, but this computational framework can also be applied to analyze the dynamics of other complex pathways, such as signal transduction or gene regulatory networks.


Subject(s)
Computational Biology/methods , Metabolic Networks and Pathways , Gene Regulatory Networks , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/metabolism , Signal Transduction , Transcriptome
20.
BMC Bioinformatics ; 21(1): 473, 2020 Oct 21.
Article in English | MEDLINE | ID: mdl-33087046

ABSTRACT

BACKGROUND: Phenotypes such as height and intelligence are thought to be a product of the collective effects of multiple phenotype-associated genes and interactions among their protein products. High/low degree of interactions is suggestive of coherent/random molecular mechanisms, respectively. Comparing the degree of interactions may help to better understand the coherence of phenotype-specific molecular mechanisms and the potential for therapeutic intervention. However, direct comparison of the degree of interactions is difficult due to different sizes and configurations of phenotype-associated gene networks. METHODS: We introduce a metric for measuring coherence of molecular-interaction networks as a slope of internal versus external distributions of the degree of interactions. The internal degree distribution is defined by interaction counts within a phenotype-specific gene network, while the external degree distribution counts interactions with other genes in the whole protein-protein interaction (PPI) network. We present a novel method for normalizing the coherence estimates, making them directly comparable. RESULTS: Using STRING and BioGrid PPI databases, we compared the coherence of 116 phenotype-associated gene sets from GWAScatalog against size-matched KEGG pathways (the reference for high coherence) and random networks (the lower limit of coherence). We observed a range of coherence estimates for each category of phenotypes. Metabolic traits and diseases were the most coherent, while psychiatric disorders and intelligence-related traits were the least coherent. We demonstrate that coherence and modularity measures capture distinct network properties. CONCLUSIONS: We present a general-purpose method for estimating and comparing the coherence of molecular-interaction gene networks that accounts for the network size and shape differences. Our results highlight gaps in our current knowledge of genetics and molecular mechanisms of complex phenotypes and suggest priorities for future GWASs.
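
Illustrative note: the internal/external degree idea in the METHODS of this abstract can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the slope is fitted or how the estimates are normalized, so the sketch assumes an ordinary least-squares fit of per-gene internal degree (y) against external degree (x) and omits normalization entirely. The function names degree_split, ols_slope and coherence_slope are hypothetical.

```python
def degree_split(edges, gene_set):
    """Return (internal, external) degree dicts for the genes in gene_set.

    edges: iterable of (gene_a, gene_b) pairs from an undirected PPI network.
    Internal degree counts partners inside gene_set; external degree counts
    partners elsewhere in the network. Self-loops and duplicate edges are
    ignored.
    """
    genes = set(gene_set)
    internal = {g: 0 for g in genes}
    external = {g: 0 for g in genes}
    seen = set()
    for a, b in edges:
        if a == b:
            continue
        key = frozenset((a, b))
        if key in seen:
            continue
        seen.add(key)
        for gene, partner in ((a, b), (b, a)):
            if gene in genes:
                if partner in genes:
                    internal[gene] += 1
                else:
                    external[gene] += 1
    return internal, external


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (assumed fitting method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("external degrees are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def coherence_slope(edges, gene_set):
    """Unnormalized slope of internal versus external degree for gene_set."""
    internal, external = degree_split(edges, gene_set)
    genes = sorted(internal)
    return ols_slope([external[g] for g in genes],
                     [internal[g] for g in genes])
```

Calling coherence_slope(edges, gene_set) with a list of interaction pairs and a phenotype-associated gene set returns a single number. Comparing such numbers across gene sets of different sizes would still require the normalization step that the abstract mentions but does not describe.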


Subject(s)
Computational Biology/methods , Disease , Gene Regulatory Networks , Humans , Phenotype , Protein Interaction Maps