1.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37756593

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) allows for obtaining genomic and transcriptomic profiles of individual cells. These data make it possible to characterize tissues at the cell level. In this context, one of the main analyses exploiting scRNA-seq data is identifying the cell types within a tissue to estimate the quantitative composition of cell populations. Due to the massive amount of available scRNA-seq data, automatic classification approaches for cell typing, based on the most recent deep learning technology, are needed. Here, we present the gene ontology-driven wide and deep learning (GOWDL) model for classifying cell types in several tissues. GOWDL implements a hybrid architecture that considers the functional annotations found in Gene Ontology and the marker genes typical of specific cell types. We performed cross-validation and independent external testing, comparing our algorithm with 12 other state-of-the-art predictors. Classification scores demonstrated that GOWDL reached the best results over five different tissues, except for recall, where it reached about 92% versus 97% for the best tool. Finally, we presented a case study on classifying immune cell populations in breast cancer using a hierarchical approach based on GOWDL.


Subjects
Deep Learning; Gene Ontology; Single-Cell Gene Expression Analysis; Algorithms; Genomics
2.
Sensors (Basel) ; 23(3)2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36772592

ABSTRACT

Breast Cancer (BC) is the most common cancer among women worldwide and is characterized by intra- and inter-tumor heterogeneity that strongly contributes towards its poor prognosis. The Estrogen Receptor (ER), Progesterone Receptor (PR), Human Epidermal Growth Factor Receptor 2 (HER2), and Ki67 antigen are the most examined markers depicting BC heterogeneity and have been shown to have a strong impact on BC prognosis. Radiomics can noninvasively predict BC heterogeneity through the quantitative evaluation of medical images, such as Magnetic Resonance Imaging (MRI), which has become increasingly important in the detection and characterization of BC. However, the lack of comprehensive BC datasets in terms of molecular outcomes and MRI modalities, and the absence of a general methodology to build and compare feature selection approaches and predictive models, limit the routine use of radiomics in BC clinical practice. In this work, a new radiomic approach based on a two-step feature selection process was proposed to build predictors for ER, PR, HER2, and Ki67 markers. An in-house dataset was used, containing 92 multiparametric MRIs of patients with histologically proven BC and all four relevant biomarkers available. Thousands of radiomic features were extracted from post-contrast and subtracted Dynamic Contrast-Enhanced (DCE) MRI images, Apparent Diffusion Coefficient (ADC) maps, and T2-weighted (T2) images. The two-step feature selection approach was used to identify significant radiomic features and then to build the final prediction models. They showed remarkable results in terms of F1-score for all the biomarkers: 84%, 63%, 90%, and 72% for ER, HER2, Ki67, and PR, respectively. When possible, the models were validated on the TCGA/TCIA Breast Cancer dataset, returning promising results (F1-score = 88% for the ER+/ER- classification task). The developed approach efficiently characterized BC heterogeneity according to the examined molecular biomarkers.


Subjects
Breast Neoplasms; Humans; Female; Breast Neoplasms/diagnostic imaging; Breast Neoplasms/pathology; Ki-67 Antigen; Magnetic Resonance Imaging/methods; Diffusion Magnetic Resonance Imaging/methods; Prognosis; Receptors, Estrogen
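
The abstract above reports its results as F1-scores. As a reminder of what that metric measures, here is a minimal sketch in plain Python. The function name `f1_score` and the label vectors are illustrative assumptions, not data or code from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1-score (harmonic mean of precision and recall)
    for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical receptor-status labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(round(f1_score(y_true, y_pred), 3))
```

Because F1 ignores true negatives, it is often preferred over accuracy when the two classes are unbalanced, which is common for biomarker status in small cohorts.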
3.
Int J Mol Sci ; 23(22)2022 Nov 17.
Article in English | MEDLINE | ID: mdl-36430688

ABSTRACT

Many biological systems are characterised by biological entities, as well as their relationships. These interaction networks can be modelled as graphs, with nodes representing bio-entities, such as molecules, and edges representing relations among them, such as interactions. Due to the current availability of a huge amount of biological data, it is very important to consider in silico analysis methods based on, for example, machine learning, that could take advantage of the inner graph structure of the data in order to improve the quality of the results. In this scenario, graph neural networks (GNNs) are recent computational approaches that directly deal with graph-structured data. In this paper, we present a GNN model for the analysis of siRNA-mRNA interaction networks. siRNAs are small RNA molecules that are able to bind to target genes and silence them. These events make siRNAs key molecules as RNA interference agents in many biological interaction networks related to severe diseases such as cancer. In particular, our GNN approach allows for the prediction of siRNA efficacy, which measures the siRNA's ability to bind and silence a gene target. Tested on benchmark datasets, our proposed method outperforms other machine learning algorithms, including the state-of-the-art predictor based on a convolutional neural network, reaching a Pearson correlation coefficient of approximately 73.6%. Finally, we proposed a case study where the efficacy of a set of siRNAs is predicted for a gene of interest. To the best of our knowledge, GNNs were used for the first time in this scenario.


Subjects
Machine Learning; Neural Networks, Computer; RNA, Small Interfering/genetics; Algorithms; Base Sequence
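
The abstract above scores its predictor with a Pearson correlation coefficient between predicted and measured siRNA efficacy. The sketch below shows how such a coefficient can be computed with the standard library only. The function name `pearson` and the efficacy values are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance numerator and the two standard-deviation numerators;
    # the common 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical measured vs. predicted efficacy values for five siRNAs.
measured = [0.91, 0.40, 0.75, 0.22, 0.63]
predicted = [0.85, 0.52, 0.70, 0.35, 0.50]
print(round(pearson(measured, predicted), 3))
```

A coefficient reported as 73.6% corresponds to r = 0.736 on the usual scale from -1 to 1.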
4.
Life (Basel) ; 12(1)2022 Jan 04.
Article in English | MEDLINE | ID: mdl-35054462

ABSTRACT

In consideration of the increasing prevalence of COVID-19 cases in several countries and the resulting demand for unbiased sequencing approaches, we performed a direct RNA sequencing (direct RNA seq.) experiment using critical oropharyngeal swab samples collected from Italian patients infected with SARS-CoV-2 from the Palermo region in Sicily. Here, we identified SARS-CoV-2 sequences directly in RNA extracted from critical samples using the Oxford Nanopore MinION technology without prior cDNA retrotranscription. Using an appropriate bioinformatics pipeline, we could identify mutations in the nucleocapsid (N) gene, which have been reported previously in studies conducted in other countries. In conclusion, to the best of our knowledge, the technique used in this study has not been used for SARS-CoV-2 detection previously owing to the difficulties in the extraction of RNA of sufficient quantity and quality from routine oropharyngeal swabs. Despite these limitations, this approach provides the advantages of true native RNA sequencing and does not include amplification steps that could introduce systematic errors. This study can provide novel information relevant to the current strategies adopted in SARS-CoV-2 next-generation sequencing.

5.
Int J Mol Sci ; 22(20)2021 Oct 15.
Article in English | MEDLINE | ID: mdl-34681801

ABSTRACT

Cytochromes P450 (CYP) are enzymes responsible for the biotransformation of most endogenous and exogenous agents. The expression of each CYP is influenced by a unique combination of mechanisms and factors including genetic polymorphisms, induction by xenobiotics, and regulation by cytokines and hormones. In recent years, Ciona robusta, one of the closest living relatives of vertebrates, has become a model in various fields of biology, in particular for studying inflammatory response. Using an in vivo LPS exposure strategy, next-generation sequencing (NGS) and qRT-PCR combined with bioinformatics and in silico analyses, we compared whole pharynx transcripts from naïve and LPS-exposed C. robusta, and we provide the first view of cytochrome gene expression and miRNA regulation in the inflammatory response induced by LPS in a hematopoietic organ. In C. robusta, cytochromes belonging to the 2B, 2C, 2J, 2U, 4B and 4F subfamilies were deregulated. miRNA network interactions suggest that different conserved and species-specific miRNAs are involved in post-transcriptional regulation of cytochrome genes and that there could be an interplay between specific miRNAs regulating both inflammation and cytochrome molecules in the inflammatory response in C. robusta.


Subjects
Ciona intestinalis; Cytochrome P-450 Enzyme System; Inflammation/genetics; Animals; Ciona intestinalis/drug effects; Ciona intestinalis/genetics; Cytochrome P-450 Enzyme System/drug effects; Cytochrome P-450 Enzyme System/genetics; Gene Expression Profiling; Gene Expression Regulation, Enzymologic/drug effects; High-Throughput Nucleotide Sequencing; Inflammation/chemically induced; Inflammation/metabolism; Inflammation/pathology; Lipopolysaccharides; Multigene Family/drug effects; Multigene Family/genetics; Pharynx/drug effects; Pharynx/metabolism; Pharynx/pathology; Phylogeny; Transcriptome/drug effects
6.
Cancer Immunol Res ; 9(7): 825-837, 2021 07.
Article in English | MEDLINE | ID: mdl-33941587

ABSTRACT

Tumors undergo dynamic immunoediting as part of a process that balances immunologic sensing of emerging neoantigens and evasion from immune responses. Tumor-infiltrating lymphocytes (TIL) comprise heterogeneous subsets of peripheral T cells characterized by diverse functional differentiation states and dependence on T-cell receptor (TCR) specificity gained through recombination events during their development. We hypothesized that within the tumor microenvironment (TME), an antigenic milieu and immunologic interface, tumor-infiltrating peripheral T cells could reexpress key elements of the TCR recombination machinery, namely, Rag1 and Rag2 recombinases and Tdt polymerase, as a potential mechanism involved in the revision of TCR specificity. Using two syngeneic invasive breast cancer transplantable models, 4T1 and TS/A, we observed that Rag1, Rag2, and Dntt in situ mRNA expression characterized rare tumor-infiltrating T cells. In situ expression of the transcripts was increased in coisogenic Mlh1-deficient tumors, characterized by genomic overinstability, and was also modulated by PD-1 immune-checkpoint blockade. Through immunolocalization and mRNA hybridization analyses, we detected the presence of rare TDT+RAG1/2+ cells populating primary tumors and draining lymph nodes in human invasive breast cancer. Analysis of harmonized single-cell RNA-sequencing data sets of human cancers identified a very small fraction of tumor-associated T cells, characterized by the expression of recombination/revision machinery transcripts, which on pseudotemporal ordering corresponded to differentiated effector T cells. We offer thought-provoking evidence of a TIL microniche marked by rare transcripts involved in TCR shaping.


Subjects
Breast Neoplasms/immunology; CD8-Positive T-Lymphocytes/immunology; Lymphocytes, Tumor-Infiltrating/immunology; Recombination, Genetic/immunology; T-Cell Antigen Receptor Specificity/genetics; Adult; Aged; Aged, 80 and over; Animals; Breast/immunology; Breast/pathology; Breast Neoplasms/genetics; Breast Neoplasms/pathology; CD8-Positive T-Lymphocytes/metabolism; DNA Damage/immunology; DNA Nucleotidylexotransferase/genetics; DNA Nucleotidylexotransferase/metabolism; DNA-Binding Proteins/metabolism; Datasets as Topic; Disease Models, Animal; Female; Homeodomain Proteins/metabolism; Humans; Lymphocytes, Tumor-Infiltrating/metabolism; Mice; Mice, Knockout; Middle Aged; MutL Protein Homolog 1/genetics; MutL Protein Homolog 1/metabolism; Nuclear Proteins/metabolism; RNA-Seq; Receptors, Antigen, T-Cell; Single-Cell Analysis; Tumor Microenvironment/genetics; Tumor Microenvironment/immunology
7.
Front Immunol ; 12: 664534, 2021.
Article in English | MEDLINE | ID: mdl-34025666

ABSTRACT

The 2,2',4,4'-tetrabromodiphenyl ether (PBDE-47) is one of the most prominent PBDE congeners detected in the environment and in animal and human tissues. Animal model experiments suggested the occurrence of PBDE-induced immunotoxicity leading to different outcomes, and recently we demonstrated that this substance can impair macrophage and basophil activities. In this manuscript, we further examine the effects induced by PBDE-47 treatment on the innate immune response by looking at the intracellular expression profile of miRNAs as well as the biogenesis, cargo content and activity of human M(LPS) macrophage cell-derived small extracellular vesicles (sEVs). Microarray and in silico analysis demonstrated that PBDE-47 can induce some epigenetic effects in M(LPS) THP-1 cells, modulating the expression of a set of intracellular miRNAs involved in biological pathways regulating the expression of estrogen-mediated signaling and immune responses, with particular reference to M1/M2 differentiation. In addition to the cell-intrinsic modulation of intracellular miRNAs, we demonstrated that PBDE-47 could also interfere with the biogenesis of sEVs, increasing their number and selecting a de novo population of sEVs. Moreover, PBDE-47 induced the overload of specific immune-related miRNAs in PBDE-47-derived sEVs. Finally, culture experiments with naïve M(LPS) macrophages demonstrated that purified PBDE-47-derived sEVs can modulate the macrophage immune response, exacerbating the LPS-induced pro-inflammatory response and inducing the overexpression of the IL-6 and MMP9 genes. Data from this study demonstrated that PBDE-47 can perturb the innate immune response at different levels, modulating the intracellular expression of miRNAs but also interfering with the biogenesis, cargo content and functional activity of M(LPS) macrophage cell-derived sEVs.


Subjects
Extracellular Vesicles/metabolism; Gene Expression Regulation/drug effects; Halogenated Diphenyl Ethers/pharmacology; Lipopolysaccharides/immunology; MicroRNAs/genetics; Transcriptome; Biomarkers; Computational Biology/methods; Cytokines/metabolism; Gene Expression Profiling; Humans; Lipopolysaccharides/adverse effects; Macrophages/drug effects; Macrophages/immunology; Macrophages/metabolism; THP-1 Cells
8.
Int J Mol Sci ; 22(7)2021 Mar 28.
Article in English | MEDLINE | ID: mdl-33800649

ABSTRACT

The transforming growth factor-β (TGF-β) family of cytokines performs multifunctional signaling, which is integrated and coordinated in a signaling network that involves other pathways, such as Wingless, Forkhead box-O (FOXO) and Hedgehog, and regulates pivotal functions related to cell fate in all tissues. In the hematopoietic system, TGF-β signaling controls a wide spectrum of biological processes, from immune system homeostasis to the quiescence and self-renewal of hematopoietic stem cells (HSCs). Recently, an important role in post-transcriptional regulation has been attributed to two types of ncRNAs: microRNAs and pseudogenes. Ciona robusta, due to its phylogenetic position close to vertebrates, is an excellent model to investigate mechanisms of post-transcriptional regulation that are evolutionarily highly conserved in immune homeostasis. The combined use of NGS and bioinformatic analyses suggests that in the pharynx, the hematopoietic organ of Ciona robusta, the Tgf-β, Wnt, Hedgehog and FoxO pathways are involved in tissue homeostasis, as they are in humans. Furthermore, ceRNA network interactions and 3'UTR element analyses of Tgf-β, Wnt, Hedgehog and FoxO pathway genes suggest that different conserved miRNAs (cin-let-7d, cin-mir-92c, cin-mir-153), species-specific miRNAs (cin-mir-4187, cin-mir-4011a, cin-mir-4056, cin-mir-4150, cin-mir-4189, cin-mir-4053, cin-mir-4016, cin-mir-4075), pseudogenes (ENSCING00000011392, ENSCING00000018651, ENSCING00000007698) and mRNA 3'UTR elements are involved in post-transcriptional regulation in an integrated way in C. robusta.


Subjects
Ciona/metabolism; Forkhead Box Protein O1/metabolism; Gene Expression Regulation; Transforming Growth Factor beta/metabolism; Wnt Proteins/metabolism; 3' Untranslated Regions; Animals; Cell Lineage; Computational Biology; Hedgehog Proteins/metabolism; Hematopoiesis; High-Throughput Nucleotide Sequencing; Homeostasis; Immune System; MicroRNAs/metabolism; Pharynx/metabolism; Protein Interaction Mapping; RNA-Seq
9.
BMC Bioinformatics ; 21(Suppl 8): 363, 2020 Sep 16.
Article in English | MEDLINE | ID: mdl-32938383

ABSTRACT

The 16th Annual Meeting of the Bioinformatics Italian Society was held in Palermo, Italy, on June 26-28, 2019. More than 80 scientific contributions were presented, including 4 keynote lectures, 31 oral communications and 49 posters. Also, three workshops were organised before and during the meeting. Full papers from some of the works presented in Palermo were submitted for this Supplement of BMC Bioinformatics. Here, we provide an overview of the meeting's aims and scope. We also briefly introduce the selected papers that have been accepted for publication in this Supplement, for a complete presentation of the outcomes of the meeting.


Subjects
Computational Biology; Humans; Italy
10.
BMC Bioinformatics ; 20(Suppl 9): 344, 2019 Nov 22.
Article in English | MEDLINE | ID: mdl-31757209

ABSTRACT

BACKGROUND: In silico experiments, with the aid of computer simulation, speed up the process of in vitro or in vivo experiments. Cancer therapy design is often based on signalling pathways. MicroRNAs (miRNAs) are small non-coding RNA molecules. In several kinds of diseases, including cancer, hepatitis and cardiovascular diseases, they are often deregulated, acting as oncogenes or tumor suppressors. miRNA therapeutics is based on the injection of two main kinds of molecules: miRNA mimics, which mimic the targeted miRNA, and antagomiRs, which inhibit the targeted miRNA. Current research is focused on miRNA therapeutics. This paper addresses cancer-related signalling pathways to investigate miRNA therapeutics. RESULTS: In order to prove our approach, we present two different case studies: non-small cell lung cancer and melanoma. KEGG signalling pathways are modelled by a digital circuit. A logic value of 1 is linked to the expression of the corresponding gene. A logic value of 0 is linked to its absence (the gene is not expressed). All possible relationships provided by a signalling pathway are modelled by logic gates. Mutations, derived according to the literature, are introduced and modelled as well. The modelling approach and analysis are widely discussed within the paper. miRNA therapeutics is investigated by the digital circuit analysis. The most effective miRNA and combination of miRNAs, in terms of reduction of pathogenic conditions, are obtained. A discussion of the obtained results in comparison with literature data is provided. Results are confirmed by existing data. CONCLUSIONS: The proposed study is based on drug discovery and miRNA therapeutics and uses a digital circuit simulation of a cancer pathway. Using this simulation, the most effective combination of drugs and miRNAs for mutated cancer therapy design is obtained, and these results were validated by the literature. The proposed modelling and analysis approach can be applied to any human disease, starting from the corresponding signalling pathway.


Subjects
Logic; MicroRNAs/genetics; Signal Transduction/genetics; Carcinoma, Non-Small-Cell Lung/genetics; Computer Simulation; Gene Expression Regulation, Neoplastic; Humans; Lung Neoplasms/genetics; MicroRNAs/metabolism; Mutation/genetics
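
The abstract above models a signalling pathway as a digital circuit: 1 means a gene is expressed, 0 means it is not, and mutations and miRNA treatments change the inputs. The following toy sketch illustrates that idea under stated assumptions. The node names (`receptor`, `kinase_a`, `kinase_b`), the wiring, and the function `simulate` are all hypothetical and do not reproduce any KEGG pathway or the authors' circuits.

```python
from itertools import product


def simulate(growth_factor, mutated_kinase_a=False, mirna_on_kinase_b=False):
    """Evaluate a toy three-gate cascade and return the proliferation output.

    All nodes are hypothetical placeholders for pathway genes.
    """
    # Buffer gate: the receptor is active only when the ligand is present.
    receptor = growth_factor
    # OR gate: an activating mutation forces kinase A on regardless of input.
    kinase_a = receptor or mutated_kinase_a
    # AND-NOT gate: a miRNA that silences kinase B blocks the signal.
    kinase_b = kinase_a and not mirna_on_kinase_b
    return kinase_b


# Enumerate every input combination, as a circuit truth table.
print("GF  mut  miRNA -> proliferation")
for gf, mut, mirna in product([False, True], repeat=3):
    out = simulate(gf, mut, mirna)
    print(int(gf), "  ", int(mut), "  ", int(mirna), "    ->", int(out))
```

In this toy circuit the mutation switches proliferation on even without the growth factor, and the simulated miRNA switches it off again. Ranking candidate miRNAs by how many pathogenic outputs they reset to 0 is one plausible way to read the "most effective miRNA" analysis described in the abstract.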
11.
BMC Bioinformatics ; 20(Suppl 4): 125, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999855

ABSTRACT

The 17th International NETTAB workshop was held in Palermo, Italy, on October 16-18, 2017. The special topic for the meeting was "Methods, tools and platforms for Personalised Medicine in the Big Data Era", but the traditional topics of the meeting series were also included in the event. About 40 scientific contributions were presented, including four keynote lectures, five guest lectures, and many oral communications and posters. Also, three tutorials were organised before and after the workshop. Full papers from some of the best works presented in Palermo were submitted for this Supplement of BMC Bioinformatics. Here, we provide an overview of the meeting's aims and scope. We also briefly introduce the selected papers that have been accepted for publication in this Supplement, for a complete presentation of the outcomes of the meeting.


Subjects
Computational Biology/methods; Delivery of Health Care; Genomics; Humans; Italy; Neoplasms/genetics; Precision Medicine
12.
BMC Bioinformatics ; 19(Suppl 15): 434, 2018 Nov 30.
Article in English | MEDLINE | ID: mdl-30497361

ABSTRACT

BACKGROUND: microRNAs act as regulators of gene expression by interacting with their gene targets. Current bioinformatics services, such as databases of validated miRNA-target interactions and prediction tools, usually provide interactions without any information about the tissue in which an interaction is most likely to occur, or about the type of interaction, that is, whether it causes mRNA degradation or translation inhibition. RESULTS: In this work, we introduce miRTissue, a web application that combines validated miRNA-target interactions with statistical correlation among expression profiles of miRNAs, genes and proteins in 15 different human tissues. Validated interactions are taken from the miRTarBase database, while expression profiles are downloaded from The Cancer Genome Atlas repository. As a result, the service provides a tissue-specific characterisation of each miRNA-gene pair together with its statistical significance (p-value). The inclusion of protein data also makes it possible to report the type of interaction. Moreover, miRTissue offers several views for analysing interactions, focusing for example on the comparison between different cancer types or different tissue conditions. All the results are freely downloadable in the most common formats. CONCLUSIONS: miRTissue fills a gap in current bioinformatics services related to miRNA-target interactions because it provides a tissue-specific context for each validated interaction and the type of interaction itself. miRTissue is easily browsable, allowing the user to select miRNAs, genes, cancer types and tissue conditions. The results can be sorted according to p-values to immediately identify those interactions that are more likely to occur in a given tissue. miRTissue is available at http://tblab.pa.icar.cnr.it/mirtissue.html.


Subjects
Computational Biology/methods; Internet; MicroRNAs/genetics; Organ Specificity/genetics; Software; Biomarkers, Tumor/metabolism; Gene Expression Regulation, Neoplastic; Humans; MicroRNAs/metabolism; Neoplasms/genetics; Protein Interaction Maps/genetics
13.
BMC Syst Biol ; 12(Suppl 5): 98, 2018 11 20.
Article in English | MEDLINE | ID: mdl-30458802

ABSTRACT

BACKGROUND: Several online databases provide a large amount of biomedical data on different biological entities. These resources are typically stored in systems implementing their own data model, user interface and query language. On the other hand, in many bioinformatics scenarios there is often the need to use more than one resource. The availability of a single bioinformatics platform that integrates many biological resources and services is, for those reasons, a fundamental issue. DESCRIPTION: Here, we present BioGraph, a web application that allows users to query, visualize and analyze biological data belonging to several sources available online. BioGraph is built upon our previously developed graph database called BioGraphDB, which integrates and stores heterogeneous biological resources and makes them available by means of a common structure and a unique query language. BioGraph implements state-of-the-art technologies and provides pre-compiled bioinformatics scenarios, as well as the possibility to perform custom queries and obtain an interactive and dynamic visualization of results. CONCLUSION: We present a case study about functional analysis of microRNA in breast cancer in order to demonstrate the functionalities of the system. BioGraph is freely available at http://biograph.pa.icar.cnr.it. Source files are available on GitHub at https://github.com/IcarPA-TBlab/BioGraph.


Subjects
Breast Neoplasms/genetics; Computational Biology/methods; MicroRNAs/physiology; Software; Breast Neoplasms/pathology; Databases, Genetic; Female; Gene Expression Regulation, Neoplastic; Humans; Internet; User-Computer Interface
14.
BMC Bioinformatics ; 19(Suppl 7): 198, 2018 07 09.
Article in English | MEDLINE | ID: mdl-30066629

ABSTRACT

BACKGROUND: An open challenge in translational bioinformatics is the analysis of sequenced metagenomes from various environmental samples. Several studies demonstrated that the 16S ribosomal RNA can be considered as a barcode for bacteria classification at the genus level, but it is still hard to identify the correct composition of metagenomic data from RNA-seq short-read data. 16S short-read data are generated using two next generation sequencing technologies, i.e. whole genome shotgun (WGS) and amplicon (AMP); typically, the former is filtered to obtain short-reads belonging to a 16S shotgun (SG), whereas the latter takes into account only some specific 16S hypervariable regions. The two sequencing technologies, SG and AMP, are used alternatively; for this reason, in this work we propose a deep learning approach for taxonomic classification of metagenomic data that can be employed for both of them. RESULTS: To test the proposed pipeline, we simulated both SG and AMP short-reads from 1000 16S full-length sequences. Then, we adopted a k-mer representation to map sequences as vectors into a numerical space. Finally, we trained two different deep learning architectures, i.e., a convolutional neural network (CNN) and a deep belief network (DBN), obtaining a trained model for each taxon. We tested our proposed methodology to find the best parameter configuration, and we compared our results against the classification performances provided by a reference classifier for bacteria identification, known as the RDP classifier. We outperformed the RDP classifier at each taxonomic level with both architectures. For instance, at the genus level, both CNN and DBN reached 91.3% accuracy with AMP short-reads, whereas the RDP classifier obtained 83.8% with the same data. CONCLUSIONS: In this work, we proposed a 16S short-read sequence classification technique based on k-mer representation and deep learning architectures, in which each taxon (from phylum to genus) generates a classification model. Experimental results confirm the proposed pipeline as a valid approach for classifying bacteria sequences; for this reason, our approach could be integrated into the most common tools for metagenomic analysis. According to the obtained results, it can be successfully used for classifying both SG and AMP data.


Subjects
Bacteria/classification; Bacteria/genetics; Deep Learning; Metagenome; Metagenomics/methods; Models, Genetic; Algorithms; Databases, Genetic; Neural Networks, Computer; RNA, Ribosomal, 16S/genetics; Reproducibility of Results; Time Factors
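
The abstract above maps each read to a numeric vector through a k-mer representation before feeding it to a neural network. A minimal sketch of one common form of that mapping (fixed-order k-mer counts) is given below. The function name `kmer_vector`, the choice of k, and the rule of skipping windows with ambiguous bases are assumptions for illustration, not details confirmed by the paper.

```python
from itertools import product


def kmer_vector(seq, k=3, alphabet="ACGT"):
    """Count overlapping k-mers in `seq` and return them as a fixed-length
    vector with one position per possible k-mer (4**k for DNA)."""
    # Enumerate all k-mers in lexicographic order so that every sequence
    # is mapped into the same coordinate system.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        # Windows containing symbols outside the alphabet (e.g. N) are skipped.
        if position is not None:
            counts[position] += 1
    return counts


# Toy read, not real 16S data.
vector = kmer_vector("ACGTACGT", k=2)
print(len(vector), sum(vector))
```

The vector length depends only on k, so short reads and full-length sequences end up in the same feature space, which is what lets a single classifier handle both shotgun and amplicon reads.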
15.
BioData Min ; 10: 27, 2017.
Article in English | MEDLINE | ID: mdl-28785313

ABSTRACT

MOTIVATION: Non-coding RNAs (ncRNAs) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required bioinformatics tools that give biologists and clinicians a deeper comprehension of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on feature extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks. RESULTS: We tested our approach for the classification of 13 different ncRNA classes. We obtained classification scores using the most common statistical measures. In particular, we reach an accuracy and sensitivity score of about 74%. CONCLUSION: The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. The nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code of the nRC tool is also available at https://github.com/IcarPA-TBlab/nrc.

16.
Artif Intell Med ; 64(3): 173-84, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26170017

ABSTRACT

OBJECTIVES: In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. METHODS: In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". RESULTS: The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches an accuracy of 20.9%. CONCLUSIONS: Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments.


Subjects
DNA Barcoding, Taxonomic/methods; DNA/genetics; Neural Networks, Computer; Supervised Machine Learning; Algorithms; Animals; Base Sequence; Cluster Analysis; Computational Biology; DNA/classification; Databases, Genetic; Decision Trees; Reproducibility of Results; Species Specificity; Support Vector Machine
17.
BMC Bioinformatics ; 16 Suppl 6: S2, 2015.
Article in English | MEDLINE | ID: mdl-25916734

ABSTRACT

BACKGROUND: Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, representing a well-defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper, a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on k-mer representation and text mining techniques. METHODS: The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied to DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. RESULTS AND CONCLUSIONS: We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions, the proposed method outperforms RDP and SVM with ultra-short sequences and exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased.


Subjects
Algorithms , Bacteria/classification , Bacteria/genetics , Genome, Bacterial , Genomics/methods , Models, Statistical , Sequence Alignment , Support Vector Machine
18.
BMC Bioinformatics ; 16 Suppl 4: S7, 2015.
Article in English | MEDLINE | ID: mdl-25734576

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are key regulators of multiple cellular functions, owing to their crucial role in different physiological processes. MiRNAs are differentially expressed in specific tissues, in specific cell states, or in different diseases such as tumours. RNA sequencing (RNA-seq) is a Next Generation Sequencing (NGS) method for the analysis of differential gene expression. Using machine learning algorithms, it is possible to improve the functional interpretation of miRNAs in the analysis of RNA-seq data. Furthermore, we tried to identify patterns of deregulated miRNAs in human breast cancer (BC), in order to contribute to the understanding of this type of cancer at the molecular level. RESULTS: We adopted a biclustering approach, using the Iterative Signature Algorithm (ISA), in order to evaluate miRNA deregulation in the context of miRNA abundance and tissue heterogeneity. These are important elements for identifying miRNAs that would be useful as prognostic and diagnostic markers. On a real-world breast cancer dataset, the evaluation of miRNA differential expression in tumours versus healthy tissues revealed 12 different miRNA clusters, associated with specific groups of patients. The identified miRNAs were deregulated in breast tumours compared with healthy controls. Our approach showed an association among specific sub-classes of tumour samples sharing the same immunohistochemical and/or histological features. Biclusters were validated by means of two online repositories, the MetaMirClust database and the UCSC Genome Browser, and by using another biclustering algorithm. CONCLUSIONS: The results obtained with the biclustering algorithm aim, first, to contribute to the differential expression analysis in a cohort of BC patients and, second, to support the potential role that these non-coding RNA molecules could play in clinical practice, in terms of prognosis, tumour evolution and treatment response.
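To make the biclustering idea concrete, the following is a minimal sketch of an alternating-threshold iteration in the spirit of ISA. It is a simplified, unweighted variant written for illustration, not the authors' pipeline: the row and column normalisations and the weighted scores of the full algorithm are omitted, and the function names, thresholds, and toy matrix are all assumptions.

```python
from statistics import mean, pstdev


def select_above(scores: list[float], threshold: float) -> set[int]:
    """Indices whose score exceeds mean + threshold * std of all scores."""
    cutoff = mean(scores) + threshold * pstdev(scores)
    return {i for i, s in enumerate(scores) if s > cutoff}


def isa_like_bicluster(matrix, seed_rows, t_rows=0.5, t_cols=0.5, max_iter=50):
    """Alternate between selecting columns (samples) and rows (miRNAs).

    matrix    : list of rows, e.g. miRNAs x samples expression values
    seed_rows : initial set of row indices
    Returns (rows, cols) of the bicluster, or two empty sets if the
    iteration collapses.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows, cols = set(seed_rows), set()
    for _ in range(max_iter):
        # Score every column by its average value over the current rows.
        col_scores = [mean([matrix[r][c] for r in rows]) for c in range(n_cols)]
        new_cols = select_above(col_scores, t_cols)
        if not new_cols:
            return set(), set()
        # Score every row by its average value over the selected columns.
        row_scores = [mean([matrix[r][c] for c in new_cols]) for r in range(n_rows)]
        new_rows = select_above(row_scores, t_rows)
        if not new_rows:
            return set(), set()
        if new_rows == rows and new_cols == cols:
            break  # fixed point reached
        rows, cols = new_rows, new_cols
    return rows, cols


if __name__ == "__main__":
    # Toy matrix with a planted 3x3 block of high values (hypothetical data).
    toy = [[5 if r < 3 and c < 3 else 0 for c in range(6)] for r in range(6)]
    print(isa_like_bicluster(toy, seed_rows={0}))
```

Starting from a single seed row, the iteration grows to the whole planted block and then stops changing, which is the fixed-point behaviour that defines a bicluster in this family of methods.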


Subjects
Algorithms , Breast Neoplasms/classification , Breast Neoplasms/genetics , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , High-Throughput Nucleotide Sequencing/methods , MicroRNAs/genetics , Female , Genome, Human , Humans
19.
BMC Bioinformatics ; 14 Suppl 7: S4, 2013.
Article in English | MEDLINE | ID: mdl-23815444

ABSTRACT

BACKGROUND: The key idea of the DNA barcode initiative is to identify, for each group of species belonging to different kingdoms of life, a short DNA sequence that can act as a true taxon barcode. A DNA barcode is a valuable type of information that can be integrated with ecological, genetic, and morphological data in order to obtain a more consistent taxonomy. Recent studies have shown that, for the animal kingdom, the mitochondrial gene cytochrome c oxidase I (COI), about 650 bp long, can be used as a barcode sequence for the identification and taxonomy of animals. In the present work we aim to introduce an alignment-free approach to the taxonomic analysis of barcode sequences. Our approach is based on two compression-based versions of the non-computable Universal Similarity Metric (USM) class of distances. Our purpose is to justify the use of USM for the analysis of short DNA barcode sequences as well, showing that USM can correctly extract taxonomic information from this kind of sequence. RESULTS: We downloaded from the Barcode of Life Data System (BOLD) database 30 datasets of barcode sequences belonging to different animal species. We built phylogenetic trees for every dataset, using both compression-based and classic evolutionary methods, and compared them in terms of topology preservation. In the experimental tests, the similarity between evolutionary and compression-based trees was between 80% and 100% for most datasets (94%). Moreover, we carried out experimental tests using simulated barcode datasets composed of 100, 150, 200 and 500 sequences, with each simulation replicated 25-fold. In this case, mean similarity scores between evolutionary and compression-based trees ranged between 83% and 99% for all simulated datasets. CONCLUSIONS: In the present work we aim to introduce an alignment-free approach to the taxonomic analysis of barcode sequences. Our approach is based on two compression-based versions of the non-computable Universal Similarity Metric (USM) class of distances. In this way we demonstrate the reliability of compression-based methods even for the analysis of short barcode sequences. Compression-based methods, with their strong theoretical assumptions, may therefore represent a valid alignment-free and parameter-free approach for barcode studies.
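As an illustration of a compression-based distance, the sketch below computes the Normalized Compression Distance (NCD), a well-known computable approximation of the USM. The abstract does not name the two variants the authors used, so treating NCD as representative is an assumption, as are the choice of `zlib` as the compressor and the `random_dna` helper, which only generates toy input.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size and xy is the concatenation.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(sequences: list[str]) -> list[list[float]]:
    """Pairwise NCD values, the usual input of a tree-building method."""
    return [[ncd(a, b) for b in sequences] for a in sequences]


def random_dna(length: int, seed: int) -> str:
    """Hypothetical helper: reproducible random DNA string for the demo."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    base = random_dna(650, seed=0)       # roughly the length of a COI barcode
    near = base[:300] + ("A" if base[300] != "A" else "C") + base[301:]
    far = random_dna(650, seed=1)
    print(ncd(base, near), ncd(base, far))
```

No alignment and no tunable parameter is involved: two sequences are close when compressing them together costs little more than compressing one alone, which is why a single-base variant scores much closer to its source than an unrelated sequence does.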


Subjects
DNA Barcoding, Taxonomic , Phylogeny , Animals , Computer Simulation , Electron Transport Complex IV/genetics , Genes, Mitochondrial , Humans
20.
BMC Bioinformatics ; 14 Suppl 1: S5, 2013.
Article in English | MEDLINE | ID: mdl-23368995

ABSTRACT

BACKGROUND: We introduce a Knowledge-based Decision Support System (KDSS) to address the Protein Complex Extraction problem. Using a Knowledge Base (KB) that encodes the expertise about the proposed scenario, our KDSS is able to suggest both strategies and tools, according to the features of the input dataset. Our system provides a navigable workflow for the current experiment and, furthermore, supports the configuration and running of every processing component of that workflow. This last feature makes our system a crossover between classical DSS and Workflow Management Systems. RESULTS: We briefly present the KDSS architecture and the basic concepts used in the design of the knowledge base and the reasoning component. The system is then tested using a subset of the Saccharomyces cerevisiae Protein-Protein interaction dataset. We used this subset because it has been well studied in the literature by several research groups in the field of complex extraction; in this way we could easily compare the results obtained through our KDSS with theirs. Our system suggests both a preprocessing and a clustering strategy, and for each of them it proposes and, where appropriate, runs suitable algorithms. Our system's final results are then composed of a workflow of tasks, which can be reused for other experiments, and the specific numerical results for that particular trial. CONCLUSIONS: The proposed approach, using the KDSS knowledge base, provides a novel workflow that gives the best results compared with the other workflows produced by the system. This workflow and its numerical results have been compared with other approaches to PPI network analysis found in the literature, offering similar results.


Subjects
Knowledge Bases , Protein Interaction Mapping , Algorithms , Computational Biology/methods , Decision Support Techniques , Multiprotein Complexes/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Software , Workflow