1.
BMC Bioinformatics ; 24(1): 395, 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37864168

ABSTRACT

BACKGROUND: Transcription factors (TFs) play a crucial role in the regulation of gene transcription; alterations of their activity and binding to DNA areas are strongly involved in the onset and development of cancer and other diseases. For proper biomedical investigation, it is hence essential to correctly trace TF-dense DNA areas, which have multiple bindings of distinct factors, and to select DNA high occupancy target (HOT) zones, which show the highest accumulation of such bindings. Indeed, systematic and replicable analysis of HOT zones in a large variety of cells and tissues would allow further understanding of their characteristics and could clarify their functional role. RESULTS: Here, we propose, thoroughly explain and discuss a full computational procedure to study in depth DNA dense areas of transcription factor accumulation and identify HOT zones. This methodology, developed as a computationally efficient parametric algorithm implemented in an R/Bioconductor package, uses a systematic approach with two alternative methods to examine transcription factor bindings and provide comparative and fully reproducible assessments. It offers different resolutions by introducing three distinct types of accumulation, which can analyze DNA from single-base to region-oriented levels, and a moving window, which can estimate the influence of the neighborhood for each DNA base under examination. CONCLUSIONS: We quantitatively assessed the full procedure by using our implemented software package, named TFHAZ, in two example applications of biological interest, proving its full reliability and relevance.
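
The base-level accumulation and moving-window ideas in this abstract can be illustrated with a minimal Python sketch. It does not use or reproduce the TFHAZ package (which is written in R); the function names `base_accumulation` and `windowed_accumulation`, the half-open interval convention, and the choice of summing per-base counts inside the window are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a binding region is a half-open interval
# [start, end) on a single chromosome, with 0 <= start < end <= length.
Region = Tuple[int, int]


def base_accumulation(regions: List[Region], length: int) -> List[int]:
    """Count, for each base, how many binding regions cover it."""
    diff = [0] * (length + 1)
    for start, end in regions:
        diff[start] += 1   # a region begins covering bases here
        diff[end] -= 1     # and stops covering them here
    counts, running = [], 0
    for i in range(length):
        running += diff[i]
        counts.append(running)
    return counts


def windowed_accumulation(counts: List[int], half_width: int) -> List[int]:
    """Sum the per-base counts over a window of +/- half_width bases
    (one simple way to account for the neighborhood; an assumption)."""
    n = len(counts)
    return [
        sum(counts[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]
```

Under these assumptions, bases with the highest accumulation values would be the candidates for dense zones; how a threshold is then chosen is not covered by this sketch.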


Subjects
Gene Expression Regulation, Transcription Factors, Transcription Factors/metabolism, Reproducibility of Results, DNA/genetics, Protein Binding, Binding Sites/genetics
2.
J Biomed Inform ; 144: 104457, 2023 08.
Article in English | MEDLINE | ID: mdl-37488024

ABSTRACT

BACKGROUND AND OBJECTIVE: Many classification tasks in translational bioinformatics and genomics are characterized by the high dimensionality of potential features and unbalanced sample distribution among classes. This can affect classifier robustness and increase the risk of overfitting, curse of dimensionality and generalization leaks; furthermore, and most importantly, it can prevent the adequate patient stratification required for precision medicine in facing complex diseases such as cancer. Setting up a feature selection strategy able to extract only proper predictive features by removing irrelevant, redundant, and noisy ones is crucial to achieving valuable results on the desired task. METHODS: We propose a new feature selection approach, called ReRa, based on supervised Relevance-Redundancy assessments. ReRa consists of a customized step of relevance-based filtering, to identify a reduced subset of meaningful features, followed by a supervised similarity-based procedure to minimize redundancy. This latter step innovatively uses a combination of global and class-specific similarity assessments to remove redundant features while preserving those differentiated across classes, even when these classes are strongly unbalanced. RESULTS: We compared ReRa with several existing feature selection methods to obtain feature spaces on which to perform breast cancer patient subtyping using several classifiers; we considered two use cases based on gene or transcript isoform expression. In the vast majority of the assessed scenarios, when using ReRa-selected feature spaces, performance increased significantly compared to simple feature filtering, LASSO regularization, or even MRmr, another Relevance-Redundancy method. The two use cases represent an insightful example of translational application, taking advantage of ReRa capabilities to investigate and enhance a clinically relevant patient stratification task, which could easily be applied to other cancer types and diseases as well. CONCLUSIONS: The ReRa approach has the potential to improve the performance of machine learning models used in unbalanced classification scenarios. Compared to another Relevance-Redundancy approach such as MRmr, ReRa does not require tuning the number of preserved features, ensures efficiency and scalability over huge initial dimensionalities, and allows re-evaluation of all previously selected features at each iteration of the redundancy assessment, to ultimately preserve only the most relevant and class-differentiated features.
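
The general relevance-then-redundancy pattern described in this abstract can be sketched as follows. This is a generic two-step filter, not the ReRa algorithm: it uses absolute Pearson correlation for both relevance and similarity, and it omits ReRa's class-specific similarity assessment and its re-evaluation of previously selected features. The function names and both thresholds are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def select_features(features: Dict[str, List[float]], labels: List[int],
                    min_relevance: float, max_similarity: float) -> List[str]:
    # Step 1 (relevance): keep features sufficiently correlated with the label.
    y = [float(v) for v in labels]
    relevance = {name: abs(pearson(vals, y)) for name, vals in features.items()}
    candidates = sorted(
        (name for name in features if relevance[name] >= min_relevance),
        key=lambda name: relevance[name],
        reverse=True,
    )
    # Step 2 (redundancy): greedily drop any feature that is too similar
    # to a more relevant feature that has already been kept.
    kept: List[str] = []
    for name in candidates:
        if all(abs(pearson(features[name], features[k])) <= max_similarity
               for k in kept):
            kept.append(name)
    return kept
```

In this simplified form the number of kept features follows from the two thresholds rather than being fixed in advance.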


Subjects
Algorithms, Breast Neoplasms, Humans, Female, Computational Biology/methods, Genomics, Proteomics, Breast Neoplasms/diagnosis, Breast Neoplasms/genetics
3.
BMC Bioinformatics ; 23(1): 123, 2022 Apr 07.
Article in English | MEDLINE | ID: mdl-35392801

ABSTRACT

BACKGROUND: Heterogeneous omics data, increasingly collected through high-throughput technologies, can contain hidden answers to very important and still unsolved biomedical questions. Their integration and processing are crucial mostly for tertiary analysis of Next Generation Sequencing data, although suitable big data strategies still address mainly primary and secondary analysis. Hence, there is a pressing need for algorithms specifically designed to explore big omics datasets, capable of ensuring scalability and interoperability, possibly relying on high-performance computing infrastructures. RESULTS: We propose RGMQL, an R/Bioconductor package conceived to provide a set of specialized functions to extract, combine, process and compare omics datasets and their metadata from different and differently localized sources. RGMQL is built over the GenoMetric Query Language (GMQL) data management and computational engine, and can leverage its open curated repository as well as its cloud-based resources, with the possibility of outsourcing computational tasks to GMQL remote services. Furthermore, it overcomes the limits of the GMQL declarative syntax by guaranteeing a procedural approach in dealing with omics data within the R/Bioconductor environment. Above all, it provides full interoperability with other packages of the R/Bioconductor framework and extensibility over the most used genomic data structures and processing functions. CONCLUSIONS: RGMQL is able to combine the query expressiveness and computational efficiency of GMQL with a complete processing flow in the R environment, being a fully integrated extension of the R/Bioconductor framework. Here we provide three fully reproducible example use cases of biological relevance that are particularly explanatory of its flexibility of use and interoperability with other R/Bioconductor packages. They show how RGMQL can easily scale up from local to parallel and cloud computing while it combines and analyzes heterogeneous omics data from local or remote datasets, both public and private, in a way that is completely transparent to the user.


Subjects
Metadata, Software, Big Data, Cloud Computing, Genomics
4.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 539-559, 2023 01.
Article in English | MEDLINE | ID: mdl-35130142

ABSTRACT

Connecting Vision and Language plays an essential role in Generative Intelligence. For this reason, large research efforts have been devoted to image captioning, i.e. describing images with syntactically and semantically meaningful sentences. Starting from 2015 the task has generally been addressed with pipelines composed of a visual encoder and a language model for text generation. During these years, both components have evolved considerably through the exploitation of object regions, attributes, the introduction of multi-modal connections, fully-attentive approaches, and BERT-like early-fusion strategies. However, regardless of the impressive results, research in image captioning has not reached a conclusive answer yet. This work aims at providing a comprehensive overview of image captioning approaches, from visual encoding and text generation to training strategies, datasets, and evaluation metrics. In this respect, we quantitatively compare many relevant state-of-the-art approaches to identify the most impactful technical innovations in architectures and training strategies. Moreover, many variants of the problem and its open challenges are discussed. The final goal of this work is to serve as a tool for understanding the existing literature and highlighting the future directions for a research area where Computer Vision and Natural Language Processing can find an optimal synergy.


Subjects
Deep Learning, Algorithms, Benchmarking, Language, Natural Language Processing
5.
Genome Med ; 15(1): 37, 2023 05 15.
Article in English | MEDLINE | ID: mdl-37189167

ABSTRACT

BACKGROUND: Transcriptional classification has been used to stratify colorectal cancer (CRC) into molecular subtypes with distinct biological and clinical features. However, it is not clear whether such subtypes represent discrete, mutually exclusive entities or molecular/phenotypic states with potential overlap. Therefore, we focused on the CRC Intrinsic Subtype (CRIS) classifier and evaluated whether assigning multiple CRIS subtypes to the same sample provides additional clinically and biologically relevant information. METHODS: A multi-label version of the CRIS classifier (multiCRIS) was applied to newly generated RNA-seq profiles from 606 CRC patient-derived xenografts (PDXs), together with human CRC bulk and single-cell RNA-seq datasets. Biological and clinical associations of single- and multi-label CRIS were compared. Finally, a machine learning-based multi-label CRIS predictor (ML2CRIS) was developed for single-sample classification. RESULTS: Surprisingly, about half of the CRC cases could be significantly assigned to more than one CRIS subtype. Single-cell RNA-seq analysis revealed that multiple CRIS membership can be a consequence of the concomitant presence of cells of different CRIS class or, less frequently, of cells with hybrid phenotype. Multi-label assignments were found to improve prediction of CRC prognosis and response to treatment. Finally, the ML2CRIS classifier was validated for retaining the same biological and clinical associations also in the context of single-sample classification. CONCLUSIONS: These results show that CRIS subtypes retain their biological and clinical features even when concomitantly assigned to the same CRC sample. This approach could be potentially extended to other cancer types and classification systems.
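
The difference between single-label and multi-label subtype assignment can be shown with a small Python sketch. It assumes that a per-subtype significance value is already available for each sample; the subtype names, the example values, and the 0.05 threshold are hypothetical, and the code does not reproduce the multiCRIS or ML2CRIS methods.

```python
from typing import Dict, List


def single_label(p_values: Dict[str, float]) -> str:
    """Classic single-label call: the subtype with the smallest p-value."""
    return min(p_values, key=p_values.get)


def multi_label(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Multi-label call: every subtype whose p-value is below alpha
    (alpha is a hypothetical threshold chosen for illustration)."""
    return sorted(name for name, p in p_values.items() if p < alpha)


# Hypothetical per-subtype significance values for one sample.
sample = {"subtype_1": 0.01, "subtype_2": 0.03, "subtype_3": 0.40}
```

With these made-up values the single-label call returns only `subtype_1`, while the multi-label call also retains `subtype_2`, which is the kind of additional information the abstract refers to.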


Subjects
Colorectal Neoplasms, Animals, Humans, Colorectal Neoplasms/pathology, Prognosis, Animal Disease Models, Tumor Biomarkers/genetics
6.
Methods Mol Biol ; 2401: 195-215, 2022.
Article in English | MEDLINE | ID: mdl-34902130

ABSTRACT

The COVID-19 pandemic has heavily affected many aspects of our lives. At this time, genomic research is concerned with exploiting available datasets and knowledge to fuel discovery on this novel disease. Studies that can precisely characterize the gene expression profiles of human hosts infected by SARS-CoV-2 are of significant relevance. However, not many such experiments have been produced to date, nor made publicly available online. Thus, it is of paramount importance that data analysts explore all possibilities to integrate information coming from similar viruses and related diseases; interestingly, microarray gene profile experiments become extremely valuable for this purpose. This chapter reviews the aspects that should be considered when integrating transcriptomics data, considering mainly samples infected by different viruses and combining various data types as well as the extracted knowledge. It describes a series of scenarios from studies performed in the literature and suggests other possible directions of noteworthy integration.


Subjects
COVID-19, Gene Expression Profiling, COVID-19/genetics, Genomics, Humans, Pandemics, Transcriptome
7.
Article in English | MEDLINE | ID: mdl-33270566

ABSTRACT

Breast cancer comprises multiple subtypes implicated in prognosis. Existing stratification methods rely on the expression quantification of small gene sets. Next Generation Sequencing promises large amounts of omic data in the coming years. In this scenario, we explore the potential of machine learning and, particularly, deep learning for breast cancer subtyping. Due to the paucity of publicly available data, we leverage pan-cancer and non-cancer data to design semi-supervised settings. We make use of multi-omic data, including microRNA expressions and copy number alterations, and we provide an in-depth investigation of several supervised and semi-supervised architectures. The accuracy results obtained show that simpler models perform at least as well as the deep semi-supervised approaches on our task over gene expression data. When multi-omic data types are combined, the performance of deep models shows little (if any) improvement in accuracy, indicating the need for further analysis on larger datasets of multi-omic data as and when they become available. From a biological perspective, our linear model mostly confirms known gene-subtype annotations. Conversely, deep approaches model non-linear relationships, which is reflected in a more varied and still unexplored set of representative omic features that may prove useful for breast cancer subtyping.


Subjects
Breast Neoplasms, Deep Learning, Breast Neoplasms/genetics, DNA Copy Number Variations, Female, Humans, Machine Learning, Supervised Machine Learning
8.
Diagnostics (Basel) ; 12(10)2022 Oct 07.
Article in English | MEDLINE | ID: mdl-36292114

ABSTRACT

PURPOSE: We evaluate the ability of Artificial Intelligence with automatic classification methods applied to semi-quantitative data from brain 18F-FDG PET/CT to improve the differential diagnosis between Alzheimer Disease (AD) and Mild Cognitive Impairment (MCI). PROCEDURES: We retrospectively analyzed a total of 150 consecutive patients who underwent diagnostic evaluation for suspected AD (n = 67) or MCI (n = 83). All patients received brain 18F-FDG PET/CT according to the international guidelines, and images were analyzed both Qualitatively (QL) and Quantitatively (QN), the latter by a fully automated post-processing software that produced a z score metabolic map of 25 anatomically different cortical regions. A subset of n = 122 cases with a diagnosis of AD (n = 53) or MCI (n = 69) confirmed by 18-24-month clinical follow-up was finally included in the study. Univariate analysis and three automated classification models (classification tree, ClT; ridge classifier, RC; and linear Support Vector Machine, lSVM) were considered to estimate the ability of the z scores to discriminate between AD and MCI cases. RESULTS: The univariate analysis returned 14 areas where the z scores were significantly different between AD and MCI groups, and the classification accuracy ranged between 74.59% and 76.23%, with ClT and RC providing the best results. The best classification strategy consisted of one single split with a cut-off value of ≈ -2.0 on the z score from the temporal lateral left area: cases below this threshold were classified as AD and those above the threshold as MCI. CONCLUSIONS: Our findings confirm the usefulness of brain 18F-FDG PET/CT QL and QN analyses in differentiating AD from MCI. Moreover, the combined use of automated classification models can improve the diagnostic process, since it allows identification of a specific hypometabolic area involved in AD cases with respect to MCI. These data improve the traditional 18F-FDG PET/CT image interpretation and the diagnostic assessment of cognitive disorders.
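
The single-split rule reported in this abstract amounts to a one-line decision function, sketched here in Python. The cut-off of -2.0 is the approximate value quoted in the abstract; the function name and the handling of a z score exactly equal to the cut-off are assumptions, and the sketch is illustrative only.

```python
CUTOFF = -2.0  # approximate cut-off on the z score, as quoted in the abstract


def classify_case(z_temporal_lateral_left: float) -> str:
    """Single-split rule: z scores below the cut-off are labeled AD,
    all others MCI (treatment of the exact cut-off is an assumption)."""
    return "AD" if z_temporal_lateral_left < CUTOFF else "MCI"
```

A rule of this kind is what a classification tree with a single split encodes: one feature, one threshold, two leaves.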

9.
Sci Rep ; 10(1): 14071, 2020 08 21.
Article in English | MEDLINE | ID: mdl-32826944

ABSTRACT

Stratification of breast cancer (BC) into molecular subtypes by multigene expression assays is of demonstrated clinical utility. In principle, global RNA-sequencing (RNA-seq) should enable reconstructing existing transcriptional classifications of BC samples. Yet, it is not clear whether adaptation to RNA-seq of classifiers originally developed using PCR or microarrays, or reconstruction through machine learning (ML), is preferable. Hence, we focused on the robustness and portability of PAM50, a nearest-centroid classifier developed on microarray data to identify five BC "intrinsic subtypes". We found that standard PAM50 is profoundly affected by the composition of the sample cohort used for reference construction, and we propose a strategy, named AWCA, to mitigate this issue, improving classification robustness, with over 90% concordance, and prognostic ability; we also show that AWCA-based PAM50 can even be applied as a single-sample method. Furthermore, we explored five supervised learners to build robust, single-sample intrinsic subtype callers via RNA-seq. From our ML-based survey, regularized multiclass logistic regression (mLR) displayed the best performance, further increased by ad-hoc gene selection on the global transcriptome. On external test sets, mLR classifications reached 90% concordance with PAM50-based calls, without the need for a reference sample; the proven robustness and prognostic ability of mLR make it an equally valuable single-sample method to strengthen BC subtyping.
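
For readers unfamiliar with the term, a nearest-centroid classifier assigns a sample to the class whose centroid (a reference expression profile) it most resembles. The Python sketch below shows the bare idea with Euclidean distance; the distance measure, the function name, and the toy centroids are assumptions, and the sketch is not the PAM50 or AWCA implementation.

```python
from math import dist
from typing import Dict, List


def nearest_centroid(sample: List[float],
                     centroids: Dict[str, List[float]]) -> str:
    """Return the label of the centroid closest to the sample
    (Euclidean distance is an illustrative choice)."""
    return min(centroids, key=lambda label: dist(sample, centroids[label]))


# Toy two-gene centroids for two hypothetical classes.
toy_centroids = {"class_x": [0.0, 0.0], "class_y": [10.0, 10.0]}
```

Because the sample profile is normally centered against a reference cohort before being compared with the centroids, the composition of that cohort influences the call, which is the sensitivity the abstract reports.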


Subjects
Breast Neoplasms/classification, Carcinoma/classification, Machine Learning, RNA Sequence Analysis, Tumor Biomarkers, Breast Neoplasms/chemistry, Breast Neoplasms/genetics, Carcinoma/chemistry, Carcinoma/genetics, Datasets as Topic, Estrogens, Female, Humans, Logistic Models, Hormone-Dependent Neoplasms/chemistry, Hormone-Dependent Neoplasms/genetics, Prognosis, Estrogen Receptors/analysis, Recurrence
10.
Mol Imaging Biol ; 22(3): 703-710, 2020 06.
Article in English | MEDLINE | ID: mdl-31309370

ABSTRACT

PURPOSE: To provide reliable and reproducible heart/mediastinum (H/M) ratio cut-off values for parkinsonian disorders using two machine learning techniques, Support Vector Machines (SVM) and Random Forest (RF) classifier, applied to [123I]MIBG cardiac scintigraphy. PROCEDURES: We studied 85 subjects: 50 with idiopathic Parkinson's disease (PD), 26 with atypical Parkinsonian syndromes (P), and 9 with essential tremor (ET). All patients underwent planar early and delayed cardiac scintigraphy after [123I]MIBG (111 MBq) intravenous injection. Images were evaluated both qualitatively and quantitatively, the latter by the early and delayed H/M ratio obtained from regions of interest (ROIt1 and ROIt2) drawn on planar images. SVM and RF classifiers were finally used to obtain the correct cut-off value. RESULTS: SVM and RF produced excellent classification performances: the SVM classifier achieved perfect classification and RF also attained very good accuracy. The best cut-off value for the H/M ratio was 1.55, since it remained the same for both ROIt1 and ROIt2. This value allowed correct classification of PD versus P and ET: patients with an H/M ratio less than 1.55 were classified as PD, while those with values higher than 1.55 were considered to be affected by parkinsonism and/or ET. No difference was found when early or late H/M ratios were considered separately, thus suggesting that a single early evaluation could be sufficient to obtain the final diagnosis. CONCLUSIONS: Our results showed that the use of SVM and RF made it possible to define the best cut-off value for H/M ratios in both the early and the delayed phase, thus underlining the role of [123I]MIBG cardiac scintigraphy and the effectiveness of the H/M ratio in differentiating PD from other parkinsonism or ET. Moreover, early scans alone could be used for a reliable diagnosis, since no difference was found between early and late. A larger series of cases is certainly needed to confirm these data.
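
The reported cut-off can be expressed as a short Python sketch. The value 1.55 and the direction of the rule come from the abstract; computing the H/M ratio as the mean count in the heart region divided by the mean count in the mediastinum region, the function names, the "P/ET" label, and the treatment of a ratio exactly equal to the cut-off are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence

HM_CUTOFF = 1.55  # cut-off reported in the abstract


def hm_ratio(heart_counts: Sequence[float],
             mediastinum_counts: Sequence[float]) -> float:
    """Mean counts in the heart ROI over mean counts in the mediastinum ROI
    (assumed definition of the H/M ratio)."""
    return mean(heart_counts) / mean(mediastinum_counts)


def classify_ratio(ratio: float) -> str:
    """Ratios below the cut-off are labeled PD; all others are labeled
    as atypical parkinsonism and/or essential tremor ("P/ET")."""
    return "PD" if ratio < HM_CUTOFF else "P/ET"
```

Since the abstract reports the same cut-off for early and delayed acquisitions, the same function would apply to either ratio.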


Subjects
3-Iodobenzylguanidine, Heart/diagnostic imaging, Iodine Radioisotopes, Mediastinum/diagnostic imaging, Parkinsonian Disorders/classification, Parkinsonian Disorders/diagnostic imaging, Radionuclide Imaging/methods, 3-Iodobenzylguanidine/chemistry, 3-Iodobenzylguanidine/pharmacokinetics, Adult, Aged, Aged 80 and over, Female, Humans, Iodine Radioisotopes/chemistry, Iodine Radioisotopes/pharmacokinetics, Male, Middle Aged, Parkinsonian Disorders/pathology, Radiopharmaceuticals/chemistry, Radiopharmaceuticals/metabolism, Retrospective Studies, Support Vector Machine
11.
Anticancer Res ; 40(6): 3355-3360, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32487631

ABSTRACT

BACKGROUND/AIM: Proliferation biomarkers such as MIB-1 are strong predictors of clinical outcome and response to therapy in patients with non-small-cell lung cancer (NSCLC), but they require histological examination. In this work, we present a classification model to predict MIB-1 expression based on clinical parameters from positron emission tomography. PATIENTS AND METHODS: We retrospectively evaluated 78 patients with histology-proven NSCLC who underwent 18F-FDG-PET/CT for clinical examination. We stratified the population into a low- and a high-proliferation group using MIB-1 = 25% as the cut-off value. We built a predictive model based on binary classification trees to estimate the group label from the maximum standardized uptake value (SUVmax) and lesion diameter. RESULTS: The proposed model showed the ability to predict the correct proliferation group with an overall accuracy >82% (78% and 86% for the low- and high-proliferation group, respectively). CONCLUSION: Our results indicate that radiotracer activity evaluated via SUVmax and lesion diameter are correlated with the tumour proliferation index MIB-1.


Subjects
Non-Small-Cell Lung Carcinoma/classification, Non-Small-Cell Lung Carcinoma/diagnostic imaging, Fluorodeoxyglucose F18, Ki-67 Antigen/biosynthesis, Lung Neoplasms/classification, Lung Neoplasms/diagnostic imaging, Non-Small-Cell Lung Carcinoma/metabolism, Non-Small-Cell Lung Carcinoma/pathology, Cell Proliferation/physiology, Female, Humans, Immunohistochemistry, Lung Neoplasms/metabolism, Lung Neoplasms/pathology, Male, Positron Emission Tomography Computed Tomography/methods, Radiopharmaceuticals, Retrospective Studies
12.
Curr Alzheimer Res ; 14(2): 198-207, 2017.
Article in English | MEDLINE | ID: mdl-27334942

ABSTRACT

Artificial Intelligence (AI) is a very active Computer Science research field that aims to develop systems mimicking human intelligence and is helpful in many human activities, including Medicine. In this review we present some examples of the use of AI techniques, in particular automatic classifiers such as Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification Tree (ClT) and ensemble methods like Random Forest (RF), which are able to analyze findings obtained by positron emission tomography (PET) or single-photon emission tomography (SPECT) scans of patients with Neurodegenerative Diseases, in particular Alzheimer's Disease. We also focus on techniques applied to preprocess data and reduce their dimensionality via feature selection or projection into a more representative domain (Principal Component Analysis, PCA, and Partial Least Squares, PLS, are examples of such methods); this is a crucial step when dealing with medical data, since it is necessary to compress patient information and retain only the most useful part in order to discriminate subjects into normal and pathological classes. The main literature papers on the application of these techniques to classify patients with neurodegenerative disease by extracting data from molecular imaging modalities are reported, showing that the increasing development of computer-aided diagnosis systems is a very promising contribution to the diagnostic process.


Subjects
Brain/diagnostic imaging, Machine Learning, Neurodegenerative Diseases/diagnostic imaging, Automated Pattern Recognition, Positron-Emission Tomography, Single-Photon Emission Computed Tomography, Humans, Positron-Emission Tomography/methods, Single-Photon Emission Computed Tomography/methods