Results 1 - 20 of 50
1.
BMC Bioinformatics ; 24(1): 60, 2023 Feb 23.
Article in English | MEDLINE | ID: mdl-36823571

ABSTRACT

BACKGROUND: Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto encyclopedia of genes and genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Measurements of gene expression (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in the disease. However, genes acting in multiple pathways and other inherent issues complicate such analyses. Many current approaches may only employ gene expression data and need to pay more attention to some of the existing knowledge stored in KEGG pathways for detecting dysregulated pathways. New methods that consider more precompiled information are required for a more holistic association between gene expression and diseases. RESULTS: PriPath is a novel approach that transfers the generic process of grouping and scoring, followed by modeling to analyze gene expression with KEGG pathways. In PriPath, KEGG pathways are utilized as the grouping function as part of a machine learning algorithm for selecting the most significant KEGG pathways. A machine learning model is trained to differentiate between diseases and controls using those groups. We have tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our proposed approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We have comparatively evaluated the performance of PriPath against other tools, which are similar in their merit. For each dataset, we manually confirmed the top results of PriPath in the literature and found that most predictions can be supported by previous experimental research. 
CONCLUSIONS: PriPath can thus aid in determining dysregulated pathways, which applies to medical diagnostics. In the future, we aim to advance this approach so that it can perform patient stratification based on gene expression and identify druggable targets. Thereby, we cover two aspects of precision medicine.


Subject(s)
Computational Biology , Neoplasms , Humans , Computational Biology/methods , Neoplasms/genetics , Genome , Algorithms , Gene Expression , Gene Expression Profiling
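The grouping, scoring, and modeling idea described in the abstract above can be illustrated with a minimal Python sketch. It is not PriPath's implementation: the pathway names, gene sets, expression values, and the leave-one-out nearest-centroid scorer are all assumptions chosen to keep the example free of dependencies.

```python
from statistics import mean

# Hypothetical pathway -> gene membership (this plays the role of the grouping function).
PATHWAYS = {
    "pathway_A": ["g1", "g2"],
    "pathway_B": ["g3", "g4"],
}

# Invented expression profiles: (gene -> value, class label).
SAMPLES = [
    ({"g1": 9.0, "g2": 8.5, "g3": 2.0, "g4": 5.1}, "disease"),
    ({"g1": 8.7, "g2": 9.1, "g3": 5.0, "g4": 2.2}, "disease"),
    ({"g1": 8.9, "g2": 8.8, "g3": 3.9, "g4": 3.0}, "disease"),
    ({"g1": 2.1, "g2": 1.9, "g3": 4.8, "g4": 2.4}, "control"),
    ({"g1": 1.8, "g2": 2.4, "g3": 2.1, "g4": 4.9}, "control"),
    ({"g1": 2.3, "g2": 2.0, "g3": 3.1, "g4": 4.0}, "control"),
]


def centroid(rows):
    """Column-wise mean of a list of equally long vectors."""
    return [mean(column) for column in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score_group(genes, samples):
    """Leave-one-out accuracy of a nearest-centroid classifier that only sees `genes`."""
    correct = 0
    for i, (profile, label) in enumerate(samples):
        training = [s for j, s in enumerate(samples) if j != i]
        centroids = {}
        for lab in sorted({l for _, l in training}):
            rows = [[p[g] for g in genes] for p, l in training if l == lab]
            centroids[lab] = centroid(rows)
        vector = [profile[g] for g in genes]
        predicted = min(centroids, key=lambda lab: squared_distance(vector, centroids[lab]))
        correct += predicted == label
    return correct / len(samples)


def rank_pathways(pathways, samples):
    """Score every pathway group and return (name, score) pairs, best first."""
    scores = {name: score_group(genes, samples) for name, genes in pathways.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_pathways(PATHWAYS, SAMPLES)` puts the group whose genes separate the two classes at the top of the list. A final model would then be trained on the top-ranked groups, a step this sketch leaves out.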
2.
Arch Toxicol ; 95(11): 3475-3495, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34510227

ABSTRACT

microRNAs (miRNAs or miRs) are short non-coding RNA molecules which have been shown to be dysregulated and released into the extracellular milieu as a result of many drug and non-drug-induced pathologies in different organ systems. Consequently, circulating miRs have been proposed as useful biomarkers of many disease states, including drug-induced tissue injury. miRs have shown potential to support or even replace the existing traditional biomarkers of drug-induced toxicity in terms of sensitivity and specificity, and there is some evidence for their improved diagnostic and prognostic value. However, several pre-analytical and analytical challenges, mainly associated with assay standardization, require solutions before circulating miRs can be successfully translated into the clinic. This review will consider the value and potential for the use of circulating miRs in drug-safety assessment and describe a systems approach to the analysis of the miRNAome in the discovery setting, as well as highlighting standardization issues that at this stage prevent their clinical use as biomarkers. Highlighting these challenges will hopefully drive future research into finding appropriate solutions, and eventually circulating miRs may be translated to the clinic where their undoubted biomarker potential can be used to benefit patients in rapid, easy to use, point-of-care test systems.


Subject(s)
Biomarkers, Pharmacological , MicroRNAs/blood , Drug-Related Side Effects and Adverse Reactions/diagnosis , Humans , MicroRNAs/analysis , Sensitivity and Specificity
3.
Bioinformatics ; 35(20): 4020-4028, 2019 Oct 15.
Article in English | MEDLINE | ID: mdl-30895309

ABSTRACT

MOTIVATION: Disease is often manifested via changes in transcript and protein abundance. MicroRNAs (miRNAs) are instrumental in regulating protein abundance and may measurably influence transcript levels. miRNAs often target more than one mRNA (for humans, the average is three), and mRNAs are often targeted by more than one miRNA (for the genes considered in this study, the average is also three). Therefore, it is difficult to determine the miRNAs that may cause the observed differential gene expression. We present a novel approach, maTE, which is based on machine learning, that integrates information about miRNA target genes with gene expression data. maTE depends on the availability of a sufficient amount of patient and control samples. The samples are used to train classifiers to accurately classify the samples on a per miRNA basis. Multiple high scoring miRNAs are used to build a final classifier to improve separation. RESULTS: The aim of the study is to find a set of miRNAs causing the regulation of their target genes that best explains the difference between groups (e.g. cancer versus control). maTE provides a list of significant groups of genes where each group is targeted by a specific miRNA. For the datasets used in this study, maTE generally achieves an accuracy well above 80%. Also, the results show that when the accuracy is much lower (e.g. ∼50%), the set of miRNAs provided is likely not causative of the difference in expression. This new approach of integrating miRNA regulation with expression data yields powerful results and is independent of external labels and training data. Thereby, this approach allows new avenues for exploring miRNA regulation and may enable the development of miRNA-based biomarkers and drugs. AVAILABILITY AND IMPLEMENTATION: The KNIME workflow, implementing maTE, is available at Bioinformatics online. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
MicroRNAs/genetics , Gene Expression Profiling , Humans , Machine Learning , Neoplasms , RNA, Messenger
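The many-to-many relation between miRNAs and their target mRNAs mentioned in the abstract above can be sketched as a pair of mappings. The identifiers below are invented for illustration, and the code only shows how the two averages (targets per miRNA, miRNAs per target) could be computed; it is not part of maTE.

```python
from collections import defaultdict

# Hypothetical miRNA -> target gene table; all identifiers are made up.
MIRNA_TARGETS = {
    "miR-a": {"GENE1", "GENE2", "GENE3"},
    "miR-b": {"GENE2", "GENE3"},
    "miR-c": {"GENE3"},
}


def invert(mirna_targets):
    """Turn a miRNA -> genes mapping into a gene -> miRNAs mapping."""
    gene_to_mirnas = defaultdict(set)
    for mirna, genes in mirna_targets.items():
        for gene in genes:
            gene_to_mirnas[gene].add(mirna)
    return dict(gene_to_mirnas)


def mean_degree(mapping):
    """Average number of partners per key, e.g. targets per miRNA."""
    return sum(len(partners) for partners in mapping.values()) / len(mapping)
```

Each value of `MIRNA_TARGETS` is one candidate gene group. In an approach of the kind described, every such group would be scored by how well its expression values separate the sample classes.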
4.
Bioinformatics ; 33(6): 923-925, 2017 Mar 15.
Article in English | MEDLINE | ID: mdl-28039164

ABSTRACT

Motivation: Protein synthesis is not a straightforward process, and one gene locus can produce many isoforms, for example, by starting mRNA translation from alternative start sites. altORF evaluator (altORFev) predicts alternative open reading frames within eukaryotic mRNA translated by a linear scanning mechanism and its modifications (leaky scanning and reinitiation). The program reveals the efficiently translated altORFs recognized by the majority of 40S ribosomal subunits landing on the 5'-end of an mRNA. This information helps to reveal the functions of eukaryotic genes connected to synthesis of either unknown isoforms of annotated proteins or new unrelated polypeptides. Availability and Implementation: altORFev is available at http://www.bionet.nsc.ru/AUGWeb/ and has been developed in Java 1.8 using the BioJava library and the Vaadin framework to produce the web service. Contact: ak@bionet.nsc.ru.


Subject(s)
Genomics/methods , Open Reading Frames , RNA, Messenger/metabolism , Software , Eukaryota/genetics , Protein Biosynthesis , Ribosome Subunits, Small, Eukaryotic/metabolism , Sequence Analysis, RNA/methods
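To make the scanning idea in the abstract above concrete, the following Python sketch lists every AUG-initiated open reading frame in an mRNA from the 5' end onward and labels each start codon by its sequence context. It is a simplified illustration, not the altORFev algorithm: the three-level context rule (purine at position -3, G at position +4) is a common textbook simplification, and the function names are hypothetical.

```python
import re

STOP_CODONS = {"UAA", "UAG", "UGA"}


def start_context(mrna, start):
    """Classify an AUG at index `start` as 'strong', 'adequate' or 'weak'.

    Simplified rule: count a purine three bases upstream (-3) and a G
    directly after the AUG (+4); two hits are strong, one adequate, none weak.
    """
    minus3 = mrna[start - 3] if start >= 3 else None
    plus4 = mrna[start + 3] if start + 3 < len(mrna) else None
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[hits]


def find_orfs(mrna):
    """Return (start, end, context) for every AUG followed by an in-frame stop codon.

    `end` is exclusive and includes the stop codon. AUGs are reported in 5'-to-3'
    order, the order in which a scanning ribosomal subunit would meet them.
    """
    orfs = []
    for match in re.finditer(r"(?=AUG)", mrna):
        start = match.start()
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3, start_context(mrna, start)))
                break
    return orfs
```

Under leaky scanning, a start codon in a weak context is more likely to be bypassed, so downstream ORFs in the returned list become candidates for translation. Estimating how many ribosomal subunits reach each start site would need a quantitative model that this sketch does not attempt.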
5.
BMC Bioinformatics ; 18(1): 170, 2017 Mar 14.
Article in English | MEDLINE | ID: mdl-28292266

ABSTRACT

BACKGROUND: Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed describing pre-miRNAs, and we have previously introduced the use of sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next generation sequencing. However, they may be contaminations and to aid that important decision-making process, we aimed to establish a means to differentiate pre-miRNAs from different species. RESULTS: To achieve distinction into species, we used one species' pre-miRNAs as the positive and another species' pre-miRNAs as the negative training and test data for the establishment of machine learned models based on sequence motifs and k-mers as features. This approach resulted in higher accuracy values between distantly related species while species with closer relation produced lower accuracy values. CONCLUSIONS: We were able to differentiate among species with increasing success when the evolutionary distance increases. This conclusion is supported by previous reports of fast evolutionary changes in miRNAs since even in relatively closely related species a fairly good discrimination was possible.


Subject(s)
MicroRNAs/metabolism , Animals , Base Sequence , Fabaceae/classification , Fabaceae/genetics , High-Throughput Nucleotide Sequencing , Humans , MicroRNAs/chemistry , MicroRNAs/genetics , Phylogeny , RNA Precursors/genetics , RNA Precursors/metabolism
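The k-mer features mentioned in the abstract above can be computed in a few lines. This is a minimal sketch under assumptions: the function name, the default `k`, and the normalisation by the number of windows are illustrative choices, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGU"):
    """Return the relative frequency of every possible k-mer over `alphabet`.

    DNA input is accepted by mapping T to U. Windows containing characters
    outside the alphabet count toward the total but get no feature of their own.
    """
    seq = sequence.upper().replace("T", "U")
    windows = max(len(seq) - k + 1, 0)
    counts = Counter(seq[i:i + k] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero for very short input
    kmers = ("".join(letters) for letters in product(alphabet, repeat=k))
    return {kmer: counts[kmer] / total for kmer in kmers}
```

Each pre-miRNA sequence becomes a fixed-length vector (16 values for `k=2`), so sequences from one species can serve as positive examples and those from another as negative examples for any standard two-class classifier.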
6.
Mol Genet Genomics ; 292(4): 847-855, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28386640

ABSTRACT

Spinach is a popular leafy green vegetable due to its nutritional composition. It contains high concentrations of vitamins A, E, C, and K, and folic acid. Development of genetic markers for spinach is important for diversity and breeding studies. In this work, Next Generation Sequencing (NGS) technology was used to develop genomic simple sequence repeat (SSR) markers. After cleaning and contig assembly, the sequence encompassed 2.5% of the 980 Mb spinach genome. The contigs were mined for SSRs. A total of 3852 SSRs were detected. Of these, 100 primer pairs were tested and 85% were found to yield clear, reproducible amplicons. These 85 markers were then applied to 48 spinach accessions from worldwide origins, resulting in 389 alleles with 89% polymorphism. The average gene diversity (GD) value of the markers (based on a GD calculation that ranges from 0 to 0.5) was 0.25. Our results demonstrated that the newly developed SSR markers are suitable for assessing genetic diversity and population structure of spinach germplasm. The markers also revealed clustering of the accessions based on geographical origin with clear separation of Far Eastern accessions which had the overall highest genetic diversity when compared with accessions from Persia, Turkey, Europe, and the USA. Thus, the SSR markers have good potential to provide valuable information for spinach breeding and germplasm management. Also they will be helpful for genome mapping and core collection establishment.


Subject(s)
DNA, Plant/genetics , Genome, Plant/genetics , Microsatellite Repeats/genetics , Spinacia oleracea/genetics , Base Sequence , Chromosome Mapping , Genetic Markers/genetics , Genetic Variation , Geography , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA
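Mining contigs for simple sequence repeats, as described in the abstract above, can be approximated with a regular expression. The sketch below is an assumption-laden illustration: the unit lengths (2 to 6 bases) and the minimum of four repeats are arbitrary thresholds, and the abstract does not say which tool or settings the authors used.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for each perfect tandem repeat found.

    The lazy quantifier makes the regex prefer the shortest repeating unit.
    Runs of a single base (e.g. 'AAAAAAAA') are skipped because they would
    otherwise be reported as repeats of the unit 'AA'.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits
```

Primer pairs would then be designed in the sequence flanking each reported repeat, a step outside the scope of this example.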
7.
Genes (Basel) ; 15(5)2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38790203

ABSTRACT

MicroRNAs (miRNAs), a class of small, non-coding RNAs, play a pivotal role in regulating gene expression at the post-transcriptional level. These regulatory molecules are integral to many biological processes and have been implicated in the pathogenesis of various diseases, including Human Immunodeficiency Virus (HIV) infection. This review aims to cover the current understanding of the multifaceted roles miRNAs assume in the context of HIV infection and pathogenesis. The discourse is structured around three primary focal points: (i) elucidation of the mechanisms through which miRNAs regulate HIV replication, encompassing both direct targeting of viral transcripts and indirect modulation of host factors critical for viral replication; (ii) examination of the modulation of miRNA expression by HIV, mediated through either viral proteins or the activation of cellular pathways consequent to viral infection; and (iii) assessment of the impact of miRNAs on the immune response and the progression of disease in HIV-infected individuals. Further, this review delves into the potential utility of miRNAs as biomarkers and therapeutic agents in HIV infection, underscoring the challenges and prospects inherent to this line of inquiry. The synthesis of current evidence positions miRNAs as significant modulators of the host-virus interplay, offering promising avenues for enhancing the diagnosis, treatment, and prevention of HIV infection.


Subject(s)
HIV Infections , MicroRNAs , Virus Replication , Humans , MicroRNAs/genetics , HIV Infections/genetics , HIV Infections/virology , Virus Replication/genetics , HIV-1/genetics , Host-Pathogen Interactions/genetics , Biomarkers , Gene Expression Regulation
8.
Front Mol Biosci ; 11: 1336336, 2024.
Article in English | MEDLINE | ID: mdl-38380430

ABSTRACT

Alternative polyadenylation (APA) increases transcript diversity through the generation of isoforms with varying 3' untranslated region (3' UTR) lengths. As the 3' UTR harbors regulatory element target sites, such as miRNAs or RNA-binding proteins, changes in this region can impact post-transcriptional regulation and translation. Moreover, the APA landscape can change based on the cell type, cell state, or condition. Given that APA events can impact protein expression, investigating translational control is crucial for comprehending the overall cellular regulation process. Revisiting data from polysome profiling followed by RNA sequencing, we investigated the cardiomyogenic differentiation of pluripotent stem cells by identifying the transcripts that show dynamic 3' UTR lengthening or shortening, which are being actively recruited to ribosome complexes. Our findings indicate that dynamic 3' UTR lengthening is not exclusively associated with differential expression during cardiomyogenesis but rather with recruitment to polysomes. We confirm that the differentiated state of cardiomyocytes shows a preference for shorter 3' UTR in comparison to the pluripotent stage although preferences vary during the days of the differentiation process. The most distinct regulatory changes are seen in day 4 of differentiation, which is the mesoderm commitment time point of cardiomyogenesis. After identifying the miRNAs that would target specifically the alternative 3' UTR region of the isoforms, we constructed a gene regulatory network for the cardiomyogenesis process, in which genes related to the cell cycle were identified. Altogether, our work sheds light on the regulation and dynamic 3' UTR changes of polysome-recruited transcripts that take place during the cardiomyogenic differentiation of pluripotent stem cells.

9.
Curr Pharm Biotechnol ; 24(7): 825-831, 2023.
Article in English | MEDLINE | ID: mdl-35619299

ABSTRACT

Diseases such as cancer are often defined by dysregulation of gene expression. Noncoding RNAs (ncRNA) such as microRNAs are involved in gene expression and cell-cell communication. Many other ncRNAs exist, such as circular RNAs and small nucleolar RNAs. A wealth of knowledge is available for many ncRNAs, but the information is federated in many databases. A small number of highly complementary ncRNA databases are discussed in this work. Their relevance for cancer research is highlighted, and some of the current problems and limitations are revealed. A central or shared database enforcing community reporting and quality standards is needed in the future. Keywords: RNA-seq; Noncoding RNAs; Databases; Data repositories.


Subject(s)
MicroRNAs , Neoplasms , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasms/genetics
10.
J Integr Bioinform ; 20(1)2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36812104

ABSTRACT

Science has become a highly competitive undertaking concerning, for example, resources, positions, students, and publications. At the same time, the number of journals presenting scientific findings skyrockets while the knowledge increase per manuscript seems to be diminishing. Science has also become ever more dependent on computational analyses. For example, virtually all biomedical applications involve computational data analysis. The science community develops many computational tools, and there are numerous alternatives for many computational tasks. The same is true for workflow management systems, leading to a tremendous duplication of efforts. Software quality is often of low concern, and typically, a small dataset is used as a proof of principle to support rapid publication. Installation and usage of such tools are complicated, so virtual machine images, containers, and package managers are employed more frequently. These simplify installation and ease of use but do not solve the software quality issue and duplication of effort. We believe that a community-wide collaboration is needed to (a) ensure software quality, (b) increase reuse of code, (c) force proper software review, (d) increase testing, and (e) make interoperability more seamless. Such a science software ecosystem will overcome current issues and increase trust in current data analyses.


Subject(s)
Ecosystem , Trust , Humans , Computational Biology/methods , Software , Workflow
11.
Turk J Biol ; 47(6): 366-382, 2023.
Article in English | MEDLINE | ID: mdl-38681776

ABSTRACT

Deep learning is a powerful machine learning technique that can learn from large amounts of data using multiple layers of artificial neural networks. This paper reviews some applications of deep learning in bioinformatics, a field that deals with analyzing and interpreting biological data. We first introduce the basic concepts of deep learning and then survey the recent advances and challenges of applying deep learning to various bioinformatics problems, such as genome sequencing, gene expression analysis, protein structure prediction, drug discovery, and disease diagnosis. We also discuss future directions and opportunities for deep learning in bioinformatics. We aim to provide an overview of deep learning so that bioinformaticians applying deep learning models can consider all critical technical and ethical aspects. Thus, our target audience is biomedical informatics researchers who use deep learning models for inference. This review will inspire more bioinformatics researchers to adopt deep-learning methods for their research questions while considering fairness, potential biases, explainability, and accountability.

12.
Amino Acids ; 42(1): 129-38, 2012 Jan.
Article in English | MEDLINE | ID: mdl-20473535

ABSTRACT

Mass spectrometry (MS)-based proteomics, by itself, is a vast and complex area encompassing various mass spectrometers, different spectra, and search result representations. When the aim is quantitation performed in different scanning modes at different MS levels, matters become additionally complex. Quantitation of post-translational modifications (PTMs) represents the greatest challenge among these endeavors. Many different approaches to quantitation have been described and some of these can be directly applied to the quantitation of PTMs. The amount of data produced via MS, however, makes manual data interpretation impractical. Therefore, specialized software tools meet this challenge. Any software currently able to quantitate differentially labeled samples may theoretically be adapted to quantitate differential PTM expression among samples as well. Due to the heterogeneity of mass spectrometry-based proteomics, this review will focus on quantitation of PTMs using liquid chromatography followed by one or more stages of mass spectrometry. Currently available free software, which either allows analysis of PTMs or is easily adaptable for this purpose, is briefly reviewed in this paper. Selected studies, especially those related to phosphoproteomics, shall be used to highlight the current ability to quantitate PTMs.


Subject(s)
Computational Biology , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/metabolism , Chromatography, Liquid , Mass Spectrometry
13.
Methods Mol Biol ; 2257: 235-254, 2022.
Article in English | MEDLINE | ID: mdl-34432282

ABSTRACT

Gene regulation is of utmost importance to cell homeostasis; thus, any dysregulation in it often leads to disease. MicroRNAs (miRNAs) are involved in posttranscriptional gene regulation and, consequently, their dysregulation has been associated with many diseases. miRBase version 21 contains microRNAs from about 200 species organized into about 70 clades. It has been shown that not all miRNAs collected in the database are likely to be real and, therefore, novel routes to delineate between correct and false miRNAs should be explored. We introduce a novel approach based on k-mer frequencies and machine learning that assigns an unknown/unlabeled miRNA to its most likely clade/species of origin. A simple way to filter new data would be to ensure that the novel miRNA categorizes closely to the species it is said to originate from. For that, an ensemble classifier of multiple two-class random forest classifiers was designed, where each random forest was trained on one species-clade pair. The approach was tested with different sampling methods on a dataset that was taken from miRBase version 21 and it was evaluated using a hierarchical F-measure. The approach predicted 81% to 94% of the test data correctly, depending on the sampling method. This is the first classifier that can classify miRNAs to their species of origin. This method will aid in the evaluation of miRNA database integrity and analysis of noisy miRNA samples.


Subject(s)
MicroRNAs/genetics , Gene Expression Regulation , Machine Learning
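The abstract above reports evaluation with a hierarchical F-measure. One widely used formulation extends each true and predicted label with its ancestors in the taxonomy and computes precision and recall on the extended sets, so that a prediction in the right clade but wrong species earns partial credit. The Python sketch below follows that formulation as an assumption: the abstract does not state which variant the authors used, the tiny taxonomy is hypothetical, and the root node is counted here although some variants exclude it.

```python
# Hypothetical child -> parent taxonomy, for illustration only.
PARENT = {
    "Homo sapiens": "Primates",
    "Pan troglodytes": "Primates",
    "Mus musculus": "Rodentia",
    "Primates": "Mammalia",
    "Rodentia": "Mammalia",
}


def with_ancestors(label, parent=PARENT):
    """Return the label together with all of its ancestors."""
    nodes = {label}
    while label in parent:
        label = parent[label]
        nodes.add(label)
    return nodes


def hierarchical_f(true_labels, predicted_labels, parent=PARENT):
    """Hierarchical F1 over paired lists of true and predicted labels."""
    overlap = predicted_total = true_total = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        true_set = with_ancestors(truth, parent)
        predicted_set = with_ancestors(prediction, parent)
        overlap += len(true_set & predicted_set)
        predicted_total += len(predicted_set)
        true_total += len(true_set)
    if overlap == 0:
        return 0.0
    precision = overlap / predicted_total
    recall = overlap / true_total
    return 2 * precision * recall / (precision + recall)
```

With this taxonomy, predicting "Pan troglodytes" for a human miRNA scores 2/3 rather than 0, because clade and class are still correct.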
14.
Methods Mol Biol ; 2257: 423-438, 2022.
Article in English | MEDLINE | ID: mdl-34432289

ABSTRACT

Mature microRNAs (miRNAs) are short RNA sequences about 18-24 nucleotide long, which provide the recognition key within RISC for the posttranscriptional regulation of target RNAs. Considering the canonical pathway, mature miRNAs are produced via a multistep process. Their transcription (pri-miRNAs) and first processing step via the microprocessor complex (pre-miRNAs) occur in the nucleus. Then they are exported into the cytosol, processed again by Dicer (dsRNA) and finally a single strand (mature miRNA) is incorporated into RISC (miRISC). The sequence of the incorporated miRNA provides the function of RNA target recognition via hybridization. Following binding of the target, the mRNA is either degraded or translation is inhibited, which ultimately leads to less protein production. Conversely, it has been shown that binding within the 5' UTR of the mRNA can lead to an increase in protein product. Regulation of homeostasis is very important for a cell; therefore, all steps in the miRNA-based regulation pathway, from transcription to the incorporation of the mature miRNA into RISC, are under tight control. While much research effort has been exerted in this area, the knowledgebase is not sufficient for accurately modelling miRNA regulation computationally. The computational prediction of miRNAs is, however, necessary because it is not feasible to investigate all possible pairs of a miRNA and its target, let alone miRNAs and their targets. We here point out open challenges important for computational modelling or for our general understanding of miRNA-based regulation and show how their investigation is beneficial. It is our hope that this collection of challenges will lead to their resolution in the near future.


Subject(s)
MicroRNAs/genetics , Gene Expression Regulation , Genomics , RNA, Messenger
15.
Expert Rev Proteomics ; 8(5): 645-57, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21999834

ABSTRACT

Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are not often available or, despite their availability, some sequences are not readily found therein. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing and a selection of them are detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of the de novo sequencing is assessed thereafter and, finally, examples are used to construct possible future perspectives of the field.


Subject(s)
Peptides/chemistry , Sequence Analysis, Protein/methods , Tandem Mass Spectrometry/methods , Algorithms , Amino Acid Sequence , Databases, Protein , Humans , Molecular Sequence Data , Proteins/chemistry , Proteomics/methods , Sequence Analysis, Protein/trends , Software
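The core principle behind de novo sequencing, as referred to in the abstract above, is that the mass difference between neighbouring fragment peaks of one ion series corresponds to an amino acid residue. The Python sketch below shows only that principle, under strong assumptions: an idealised, noise-free peak ladder from a single ion series, a subset of residues, and a hypothetical function name. Published algorithms additionally handle missing peaks, noise, and multiple ion types.

```python
# Monoisotopic residue masses in daltons for a subset of amino acids.
# Leucine and isoleucine share the same mass and are both shown as "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def read_ladder(peaks, tolerance=0.02):
    """Translate gaps between consecutive peaks into residues.

    A gap that matches no residue mass within `tolerance` daltons is
    reported as '?'.
    """
    ordered = sorted(peaks)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        gap = high - low
        best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap))
        matched = abs(RESIDUE_MASS[best] - gap) <= tolerance
        residues.append(best if matched else "?")
    return "".join(residues)
```

Because no database lookup is involved, the approach can propose sequences that are absent from any protein database, which is the situation the abstract describes.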
16.
J Integr Bioinform ; 18(1): 19-26, 2021 Mar 16.
Article in English | MEDLINE | ID: mdl-33721918

ABSTRACT

SARS-CoV-2 has spread worldwide and caused social, economic, and health turmoil. The first genome assembly of SARS-CoV-2 was produced in Wuhan, and it is widely used as a reference. Subsequently, more than a hundred additional SARS-CoV-2 genomes have been sequenced. While the genomes appear to be mostly identical, there are variations. Therefore, an alignment of all available genomes and the derived consensus sequence could be used as a reference, better serving the science community. Variations are significant, but representing them in a genome browser can become difficult, especially if their sequences are largely identical. Here we summarize the variation in one track. Other information not currently found in genome browsers for SARS-CoV-2, such as predicted miRNAs and predicted TRS as well as secondary structure information, was also added as tracks to the consensus genome. We believe that a genome browser based on the consensus sequence is better suited when considering worldwide effects and can become a valuable resource in combating COVID-19. The genome browser is available at http://cov.iaba.online.


Subject(s)
COVID-19 , Genome, Viral/genetics , SARS-CoV-2/genetics , Base Sequence , Humans , Software
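Deriving a consensus sequence from aligned genomes, as proposed in the abstract above, amounts to a column-wise majority vote. The following Python sketch assumes the alignment is already available as equally long strings; the function name and the alphabetical tie-breaking rule are illustrative choices, not details from the paper.

```python
from collections import Counter


def consensus(aligned):
    """Return (consensus_sequence, variable_columns) for aligned sequences.

    Each column contributes its most frequent character; ties are broken
    alphabetically. Columns containing more than one distinct character are
    reported by index so they can be summarised as a single variation track.
    """
    if len({len(sequence) for sequence in aligned}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    result = []
    variable_columns = []
    for index, column in enumerate(zip(*aligned)):
        counts = Counter(column)
        if len(counts) > 1:
            variable_columns.append(index)
        base, _ = max(sorted(counts.items()), key=lambda item: item[1])
        result.append(base)
    return "".join(result), variable_columns
```

The list of variable columns is one simple way to condense the differences among many nearly identical genomes into a single annotation track.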
17.
Amino Acids ; 38(4): 1075-87, 2010 Apr.
Article in English | MEDLINE | ID: mdl-19575279

ABSTRACT

Determining the differential expression of proteins under different conditions is of major importance in proteomics. Since mass spectrometry-based proteomics is often used to quantify proteins, several labelling strategies have been developed. While these are generally more precise than label-free quantitation approaches, they imply specifically designed experiments which also require knowledge about peptides that are expected to be measured and need to be modified. We recently designed the 2DB database which aids storage, analysis, and publication of data from mass spectrometric experiments to identify proteins. This database can aid identifying peptides which can be used for quantitation. Here an extension to the database application, named MSMAG, is presented which allows for more detailed analysis of the distribution of peptides and their associated proteins over the fractions of an experiment. Furthermore, given several biological samples in the database, label-free quantitation can be performed. Thus, interesting proteins, which may warrant further investigation, can be identified en passant while performing high-throughput proteomics studies.


Subject(s)
Databases, Protein , Proteins/analysis , Proteomics/methods , Computer Simulation , Data Mining , Electronic Data Processing/methods , Gene Expression Profiling/methods , Gene Expression Regulation , High-Throughput Screening Assays , Peptide Fragments/analysis , Peptides/analysis , Software , Tandem Mass Spectrometry
18.
PeerJ ; 8: e10216, 2020.
Article in English | MEDLINE | ID: mdl-33150092

ABSTRACT

For the identification and sequencing of proteins, mass spectrometry (MS) has become the tool of choice and, as such, drives proteomics. MS/MS spectra need to be assigned a peptide sequence, for which two strategies exist. Either database search or de novo sequencing can be employed to establish peptide spectrum matches. For database search, mzIdentML is the current community standard for data representation. There is no community standard for representing de novo sequencing results, but we previously proposed the de novo markup language (DNML). At the moment, each de novo sequencing solution uses a different data representation, complicating downstream data integration, which is crucial since ensemble predictions may be more useful than predictions of a single tool. We here propose the de novo MS Ontology (DNMSO), which can, for example, provide many-to-many mappings between spectra and peptide predictions. Additionally, an application programming interface (API) that supports any file operation necessary for de novo sequencing, from spectra input to reading, writing, and creating the DNMSO format, as well as conversion from many other file formats, has been implemented. This API removes all overhead from the production of de novo sequencing tools and allows developers to concentrate completely on algorithm development. We make the API and formal descriptions of the format freely available at https://github.com/savastakan/dnmso.

19.
J Integr Bioinform ; 16(3)2019 May 30.
Article in English | MEDLINE | ID: mdl-31145694

ABSTRACT

Big data and complex analysis workflows (pipelines) are common issues in data-driven science such as bioinformatics. A large number of computational tools are available for data analysis. Additionally, many workflow management systems to piece together such tools into data analysis pipelines have been developed. For example, more than 50 computational tools for read mapping are available, representing a large amount of duplicated effort. Furthermore, it is unclear whether these tools are correct, and only a few have a user base large enough to have encountered and reported most of the potential problems. Bringing together many largely untested tools in a computational pipeline must lead to unpredictable results. Yet, this is the current state. While presently data analysis is performed on personal computers/workstations/clusters, the future will see development and analysis shift to the cloud. None of the workflow management systems is ready for this transition. This presents the opportunity to build a new system, which will overcome current duplications of effort, introduce proper testing, allow for development and analysis in public and private clouds, and include reporting features leading to interactive documents.


Subject(s)
Computational Biology , Internet , Software , User-Computer Interface , Workflow
20.
Methods Mol Biol ; 1912: 175-196, 2019.
Article in English | MEDLINE | ID: mdl-30635894

ABSTRACT

Proteins have a strong influence on the phenotype and their aberrant expression leads to diseases. MicroRNAs (miRNAs) are short RNA sequences which posttranscriptionally regulate protein expression. This regulation is driven by miRNAs acting as recognition sequences for their target mRNAs within a larger regulatory machinery. A miRNA can have many target mRNAs and an mRNA can be targeted by many miRNAs which makes it difficult to experimentally discover all miRNA-mRNA interactions. Therefore, computational methods have been developed for miRNA detection and miRNA target prediction. An abundance of available computational tools makes selection difficult. Additionally, interactions are not currently the focus of investigation although they more accurately define the regulation than pre-miRNA detection or target prediction could perform alone. We define an interaction including the miRNA source and the mRNA target. We present computational methods allowing the investigation of these interactions as well as how they can be used to extend regulatory pathways. Finally, we present a list of points that should be taken into account when investigating miRNA-mRNA interactions. In the future, this may lead to better understanding of functional interactions which may pave the way for disease marker discovery and design of miRNA-based drugs.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , MicroRNAs/metabolism , RNA, Messenger/metabolism , Animals , Computational Biology/instrumentation , Databases, Genetic , Gene Expression Profiling/instrumentation , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/instrumentation , High-Throughput Nucleotide Sequencing/methods , Humans , Machine Learning , MicroRNAs/isolation & purification , RNA, Messenger/isolation & purification , Sequence Analysis, RNA/instrumentation , Sequence Analysis, RNA/methods , Software