Results 1 - 11 of 11
1.
J Chem Inf Model ; 64(8): 3093-3104, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38523265

ABSTRACT

The majority of chemicals detected via nontarget liquid chromatography high-resolution mass spectrometry (HRMS) in environmental samples remain unidentified, challenging the capability of existing machine learning models to pinpoint potential endocrine disruptors (EDs). Here, we predict the activity of unidentified chemicals across 12 bioassays related to EDs within the Tox21 10K dataset. Single- and multi-output models, utilizing various machine learning algorithms and molecular fingerprint features as an input, were trained for this purpose. To evaluate the models under near real-world conditions, Monte Carlo sampling was implemented for the first time. This technique enables the use of probabilistic fingerprint features derived from the experimental HRMS data with SIRIUS+CSI:FingerID as an input for models trained on true binary fingerprint features. Depending on the bioassay, the lowest false-positive rate at 90% recall ranged from 0.251 (sr.mmp, mitochondrial membrane potential) to 0.824 (nr.ar, androgen receptor), which is consistent with the trends observed in the performance of the models submitted to the Tox21 Data Challenge. These findings underscore the informativeness of fingerprint features that can be compiled from HRMS in predicting endocrine-disrupting activity. Moreover, an in-depth SHapley Additive exPlanations analysis unveiled the models' ability to pinpoint structural patterns linked to the modes of action of active chemicals. Despite the superior performance of the single-output models compared to that of the multi-output models, the latter's potential cannot be disregarded for similar tasks in the field of in silico toxicology. This study presents a significant advancement in identifying potentially toxic chemicals within complex mixtures without unambiguous identification and effectively reducing the workload for postprocessing by up to 75% in nontarget HRMS.
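
The Monte Carlo step described in this abstract can be illustrated with a minimal sketch. It assumes each fingerprint bit is drawn independently from its predicted probability and that the model's outputs are averaged over the draws; the abstract does not state these details. The names `monte_carlo_predict` and `toy_model` are hypothetical, and `toy_model` is only a stand-in for a classifier trained on binary fingerprints.

```python
import random

def monte_carlo_predict(prob_fingerprint, predict, n_samples=5000, seed=0):
    """Average a binary-input model's output over sampled binary fingerprints."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Assumption: each bit is an independent Bernoulli(p_i) draw.
        bits = [1 if rng.random() < p else 0 for p in prob_fingerprint]
        total += predict(bits)
    return total / n_samples

def toy_model(bits):
    # Hypothetical classifier: "active" only if bits 0 and 2 are both set.
    return 1.0 if bits[0] and bits[2] else 0.0

# Made-up bit probabilities, not values from the study.
probs = [0.9, 0.1, 0.8]
score = monte_carlo_predict(probs, toy_model)
print(round(score, 2))  # close to 0.9 * 0.8 = 0.72, up to sampling noise
```

Averaging over samples lets a model that only accepts 0/1 inputs be used unchanged, while the spread of the sampled predictions reflects how uncertain the fingerprint is.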


Subject(s)
Biological Assay , Endocrine Disruptors , Endocrine Disruptors/chemistry , Endocrine Disruptors/pharmacology , Mass Spectrometry , Machine Learning , Humans , Monte Carlo Method
2.
Sci Data ; 10(1): 162, 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36959280

ABSTRACT

SPHERE is a large multidisciplinary project to research and develop a sensor network to facilitate home healthcare by activity monitoring, specifically towards activities of daily living. It aims to use the latest technologies in low-power sensors, internet of things, machine learning and automated decision making to provide benefits to patients and clinicians. This dataset comprises data collected from a SPHERE sensor network deployment during a set of experiments conducted in the 'SPHERE House' in Bristol, UK, during 2016, including video tracking, accelerometer and environmental sensor data obtained from volunteers undertaking both scripted and non-scripted activities of daily living in a domestic residence. Trained annotators provided ground-truth labels annotating posture, ambulation, activity and location. This dataset is a valuable resource both within and outside the machine learning community, particularly in developing and evaluating algorithms for identifying activities of daily living from multi-modal sensor data in real-world environments. A subset of this dataset was released as a machine learning competition in association with the European Conference on Machine Learning (ECML-PKDD 2016).


Subject(s)
Activities of Daily Living , Monitoring, Ambulatory , Humans , Algorithms , Machine Learning
3.
Anal Chim Acta ; 1204: 339402, 2022 Apr 29.
Article in English | MEDLINE | ID: mdl-35397906

ABSTRACT

Non-targeted screening with LC/ESI/HRMS aims to identify the structure of the detected compounds using their retention time, exact mass, and fragmentation pattern. Challenges remain in differentiating between isomeric compounds. One untapped possibility to facilitate identification of isomers relies on different ionic species formed in electrospray. In positive ESI mode, both protonated molecules and adducts can be formed; however, not all isomeric structures form the same ionic species. The complicated mechanism of adduct formation has hindered the use of this molecular characteristic in structural elucidation in non-targeted screening. Here, we have studied the adduct formation for 94 small molecules with ion mobility spectra and compared collision cross-sections of the respective ions. Based on the results we developed a fast support vector machine classifier with polynomial kernels for accurately predicting the sodium adduct formation in ESI/HRMS. The model is trained on five independent data sets from different laboratories and uses the graph-based connectivity of functional groups and PubChem fingerprints as input features. The validation of the model showed an accuracy of 74.7% (balanced accuracy 70.0%) on a dataset from an independent laboratory, which was not used in the training of the model. Lastly, we applied the classification algorithm to the SusDat database of the NORMAN network to evaluate the proportion of isomeric compounds that could be distinguished based on predicted sodium adduct formation. It was observed that sodium adduct formation probability can provide additional selectivity for about one quarter of the exact masses and, therefore, shows practical utility for structural assignment in non-targeted screening.
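
The abstract reports both accuracy and balanced accuracy, which diverge when the two classes are unevenly represented. The sketch below shows how the two metrics are computed for binary labels; the label vectors are invented for illustration and are not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (1 = forms adduct, 0 = does not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

# Made-up example: 8 adduct formers and 2 non-formers.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.6875
```

In this toy case the minority class is predicted correctly only half the time, which plain accuracy hides and balanced accuracy exposes.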


Subject(s)
Sodium , Spectrometry, Mass, Electrospray Ionization , Ions/chemistry , Isomerism , Machine Learning , Sodium/chemistry , Spectrometry, Mass, Electrospray Ionization/methods
4.
Genes Cells ; 15(3): 209-28, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20184659

ABSTRACT

A brachyury(+) mesodermal cell population with purity over 79% was obtained from differentiating brachyury embryonic stem cells (ESCs) generated with brachyury promoter-driven enhanced green fluorescent protein and puromycin-N-acetyltransferase. A comprehensive transcriptomic analysis of brachyury(+) cells enriched with puromycin application from 6-day-old embryoid bodies (EBs), 6-day-old control EBs and undifferentiated ESCs led to identification of 1573 uniquely up-regulated and 1549 uniquely down-regulated transcripts in brachyury(+) cells. Furthermore, Gene Ontology annotations (cell differentiation, blood vessel morphogenesis, striated muscle development, placenta development and cell motility) and Kyoto Encyclopedia of Genes and Genomes pathway annotations (mitogen-activated protein kinase signaling and transforming growth factor beta signaling) were overrepresented among the transcripts up-regulated in brachyury(+) cells. Transcripts representing Larp2 and Ankrd34b were notably up-regulated in brachyury(+) cells. Knockdown of Larp2 resulted in a significant down-regulation of BMP-2 expression, and knockdown of Ankrd34b resulted in altered NF-H, PPARγ and PECAM1 expression. The elucidation of transcriptomic signatures of ESC-derived brachyury(+) cells will contribute toward defining the genetic and cellular identities of presumptive mesodermal cells. Furthermore, there is a possible involvement of Larp2 in the regulation of the late mesodermal marker BMP-2. Ankrd34b might be a positive regulator of neurogenesis and a negative regulator of adipogenesis.


Subject(s)
Embryonic Stem Cells/metabolism , Fetal Proteins/metabolism , T-Box Domain Proteins/metabolism , T-Lymphocytes/metabolism , Transcriptome , Acetyltransferases/metabolism , Animals , Autoantigens/genetics , Autoantigens/metabolism , Bone Morphogenetic Protein 2/metabolism , Cells, Cultured , Embryoid Bodies/cytology , Embryoid Bodies/metabolism , Mice , Neurofilament Proteins/metabolism , PPAR gamma/metabolism , Platelet Endothelial Cell Adhesion Molecule-1/metabolism , Promoter Regions, Genetic , RNA, Small Interfering/genetics , Ribonucleoproteins/genetics , Ribonucleoproteins/metabolism , SS-B Antigen
5.
Nucleic Acids Res ; 37(Web Server issue): W587-92, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19483095

ABSTRACT

Measuring gene expression levels with microarrays is one of the key technologies of modern genomics. Clustering of microarray data is an important application, as genes with similar expression profiles may be regulated by common pathways and involved in related functions. Gene Ontology (GO) analysis and visualization allow researchers to study the biological context of discovered clusters and characterize genes with previously unknown functions. We present VisHiC (Visualization of Hierarchical Clustering), a web server for clustering and compact visualization of gene expression data combined with automated function enrichment analysis. The main output of the analysis is a dendrogram and visual heatmap of the expression matrix that highlights biologically relevant clusters based on enriched GO terms, pathways and regulatory motifs. Clusters with the most significant enrichments are contracted in the final visualization, while less relevant parts are hidden altogether. Such a dense representation of microarray data gives a quick global overview of thousands of transcripts in many conditions and provides a good starting point for further analysis. VisHiC is freely available at http://biit.cs.ut.ee/vishic.


Subject(s)
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Algorithms , Cardiovascular Diseases/genetics , Cardiovascular Diseases/metabolism , Cluster Analysis , Computer Graphics , Extracellular Matrix/metabolism , Humans , Mitochondria/metabolism , Myocardium/metabolism
6.
Genomics ; 93(3): 213-20, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19059335

ABSTRACT

The Alternative Splicing and Transcript Diversity database (ASTD) gives access to a vast collection of alternative transcripts that integrate transcription initiation, polyadenylation and splicing variant data. Alternative transcripts are derived from the mapping of transcribed sequences to the complete human, mouse and rat genomes using an extension of the computational pipeline developed for the ASD (Alternative Splicing Database) and ATD (Alternative Transcript Diversity) databases, which are now superseded by ASTD. For the human genome, ASTD identifies splicing variants, transcription initiation variants and polyadenylation variants in 68%, 68% and 62% of the gene set, respectively, consistent with current estimates for transcription variation. Users can access ASTD through a variety of browsing and query tools, including expression state-based queries for the identification of tissue-specific isoforms. Participating laboratories have experimentally validated a subset of ASTD-predicted alternative splice forms and alternative polyadenylation forms that were not previously reported. The ASTD database can be accessed at http://www.ebi.ac.uk/astd.


Subject(s)
Alternative Splicing/genetics , Databases, Genetic , Animals , Database Management Systems , Humans , Information Storage and Retrieval/methods , Mice , Rats , Reproducibility of Results , Software , User-Computer Interface
7.
Nucleic Acids Res ; 35(Web Server issue): W193-200, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17478515

ABSTRACT

g:Profiler (http://biit.cs.ut.ee/gprofiler/) is a public web server for characterising and manipulating gene lists resulting from mining high-throughput genomic data. g:Profiler has a simple, user-friendly web interface with powerful visualisation for capturing Gene Ontology (GO), pathway, or transcription factor binding site enrichments down to individual gene levels. Besides standard multiple testing corrections, a new improved method for estimating the true effect of multiple testing over complex structures like GO has been introduced. Interpreting ranked gene lists is supported from the same interface with very efficient algorithms. Such ordered lists may arise when studying the most significantly affected genes from high-throughput data or genes co-expressed with the query gene. Other important aspects of practical data analysis are supported by modules tightly integrated with g:Profiler. These are: g:Convert for converting between different database identifiers; g:Orth for finding orthologous genes from other species; and g:Sorter for searching a large body of public gene expression data for co-expression. g:Profiler supports 31 different species, and underlying data is updated regularly from sources like the Ensembl database. Bioinformatics communities wishing to integrate with g:Profiler can use alternative simple textual outputs.
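
Enrichment of an annotation in a gene list is commonly scored with a one-sided hypergeometric test followed by a multiple testing correction. The sketch below uses that standard test with a Bonferroni correction as an illustration. It is not g:Profiler's own code, and the improved correction method mentioned in the abstract is not reproduced here. The function names and all counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k annotated genes in a list of n,
    when K of the N background genes carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def bonferroni(p, n_tests):
    """Standard Bonferroni correction, capped at 1."""
    return min(1.0, p * n_tests)

# Made-up counts: 3 of 4 query genes carry a term that annotates
# 5 of 20 background genes; 10 terms were tested in total.
p = hypergeom_pvalue(k=3, n=4, K=5, N=20)
print(round(p, 4))                  # 0.032
print(round(bonferroni(p, 10), 4))  # 0.3199
```

Bonferroni treats the tested terms as independent, which is conservative for hierarchies such as GO where terms overlap heavily.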


Subject(s)
Computational Biology/methods , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Software , Algorithms , Animals , DNA/metabolism , Data Interpretation, Statistical , Genomics , Humans , Internet , RNA, Messenger/metabolism , User-Computer Interface
8.
Nucleic Acids Res ; 32(Web Server issue): W465-70, 2004 Jul 01.
Article in English | MEDLINE | ID: mdl-15215431

ABSTRACT

Expression Profiler (EP, http://www.ebi.ac.uk/expressionprofiler) is a web-based platform for microarray gene expression and other functional genomics-related data analysis. The new architecture, Expression Profiler: next generation (EP:NG), modularizes the original design and allows individual analysis-task-related components to be developed by different groups, yet still work together seamlessly and share the same user-interface look and feel. Data analysis components for gene expression data preprocessing, missing value imputation, filtering, clustering methods, visualization, significant gene finding, between group analysis and other statistical components are available from the EBI (European Bioinformatics Institute) web site. The web-based design of Expression Profiler supports data sharing and collaborative analysis in a secure environment. Developed tools are integrated with the microarray gene expression database ArrayExpress and form the exploratory analytical front-end to those data. EP:NG is an open-source project, encouraging broad distribution and further extensions from the scientific community.


Subject(s)
Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Software , Genomics , Internet , User-Computer Interface
9.
Genome Biol ; 11(8): R80, 2010.
Article in English | MEDLINE | ID: mdl-20678241

ABSTRACT

BACKGROUND: The current epidemic of obesity has caused a surge of interest in the study of adipose tissue formation. While major progress has been made in defining the molecular networks that control adipocyte terminal differentiation, the early steps of adipocyte development and the embryonic origin of this lineage remain largely unknown. RESULTS: Here we performed genome-wide analysis of gene expression during adipogenesis of mouse embryonic stem cells (ESCs). We then pursued comprehensive bioinformatic analyses, including de novo functional annotation and curation of the generated data within the context of biological pathways, to uncover novel biological functions associated with the early steps of adipocyte development. By combining in-depth gene regulation studies and in silico analysis of transcription factor binding site enrichment, we also provide insights into the transcriptional networks that might govern these early steps. CONCLUSIONS: This study supports several biological findings: firstly, adipocyte development in mouse ESCs is coupled to blood vessel morphogenesis and neural development, just as it is during mouse development. Secondly, the early steps of adipocyte formation involve major changes in signaling and transcriptional networks. A large proportion of the transcription factors that we uncovered in mouse ESCs are also expressed in the mouse embryonic mesenchyme and in adipose tissues, demonstrating the power of our approach to probe for genes associated with early developmental processes on a genome-wide scale. Finally, we reveal a plethora of novel candidate genes for adipocyte development and present a unique resource that can be further explored in functional assays.


Subject(s)
Adipocytes/cytology , Adipogenesis/genetics , Embryonic Stem Cells/cytology , Gene Expression Profiling , Animals , Binding Sites , Computational Biology/methods , Gene Expression Regulation , Gene Regulatory Networks , Genetic Association Studies , Genome , Mice , Transcription Factors
10.
Genome Biol ; 10(12): R139, 2009.
Article in English | MEDLINE | ID: mdl-19961599

ABSTRACT

We present MEM (Multi-Experiment Matrix), a web resource for gene expression similarity searches across many datasets. MEM features large collections of microarray datasets and utilizes rank aggregation to merge information from different datasets into a single global ordering with simultaneous statistical significance estimation. Unique features of MEM include automatic detection, characterization and visualization of datasets that include the strongest coexpression patterns. MEM is freely available at http://biit.cs.ut.ee/mem/.
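
Rank aggregation merges per-dataset orderings into one global ordering. The sketch below uses the mean normalized rank as a deliberately simple stand-in. It does not reproduce MEM's own aggregation or its significance estimation, which the abstract does not detail. The function names and similarity scores are hypothetical.

```python
def rank_positions(scores):
    """Map gene -> normalized rank in (0, 1]; the highest score gets 1/n."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}

def aggregate_ranks(datasets):
    """Order genes by mean normalized rank over the datasets containing them all."""
    genes = set.intersection(*(set(d) for d in datasets))
    per_dataset = [rank_positions({g: d[g] for g in genes}) for d in datasets]
    mean_rank = {g: sum(r[g] for r in per_dataset) / len(per_dataset) for g in genes}
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(genes, key=lambda g: (mean_rank[g], g))

# Made-up similarity of genes A, B, C to a query gene in three datasets.
datasets = [
    {"A": 0.9, "B": 0.5, "C": 0.1},
    {"A": 0.8, "B": 0.2, "C": 0.6},
    {"A": 0.3, "B": 0.7, "C": 0.4},
]
print(aggregate_ranks(datasets))  # ['A', 'B', 'C']
```

Working on ranks rather than raw scores keeps datasets with different scales comparable, which is the main reason for aggregating orderings instead of averaging similarity values.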


Subject(s)
Computational Biology/methods , Databases, Genetic , Internet , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Software
11.
BioData Min ; 1(1): 9, 2008 Sep 22.
Article in English | MEDLINE | ID: mdl-18822115

ABSTRACT

BACKGROUND: Agglomerative hierarchical clustering (AHC) is a common unsupervised data analysis technique used in several biological applications. Standard AHC methods require that all pairwise distances between data objects be known. With ever-increasing data sizes this quadratic complexity poses problems that cannot be overcome by simply waiting for faster computers. RESULTS: We propose an approximate AHC algorithm, HappieClust, which can output a biologically meaningful clustering of a large dataset more than an order of magnitude faster than full AHC algorithms. The key to the algorithm is to limit the number of calculated pairwise distances to a carefully chosen subset of all possible distances. We choose distances using a similarity heuristic based on a small set of pivot objects. The heuristic efficiently finds pairs of similar objects and these help to mimic the greedy choices of full AHC. Quality of approximate AHC as compared to full AHC is studied with three measures. The first measure evaluates the global quality of the achieved clustering, while the second compares biological relevance using enrichment of biological functions in every subtree of the clusterings. The third measure studies how well the contents of subtrees are conserved between the clusterings. CONCLUSION: The HappieClust algorithm is well suited for large-scale gene expression visualization and analysis both on personal computers and in public online web applications. The software is available from the URL http://www.quretec.com/HappieClust.
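
One common way to build a pivot-based similarity heuristic relies on the triangle inequality: for any pivot p, |d(x,p) - d(y,p)| is a lower bound on d(x,y), so pairs with a large bound can be discarded without computing their exact distance. The sketch below illustrates that general idea under the assumption of Euclidean distance. It is not the HappieClust implementation, and `candidate_pairs` is a hypothetical name.

```python
from itertools import combinations
from math import dist

def candidate_pairs(points, pivots, epsilon):
    """Return index pairs whose pivot-based lower bound on distance is <= epsilon."""
    # Exact distances are computed only between points and pivots.
    to_pivots = [[dist(x, p) for p in pivots] for x in points]
    pairs = []
    for i, j in combinations(range(len(points)), 2):
        # Triangle inequality: |d(x,p) - d(y,p)| <= d(x,y) for every pivot p.
        lower_bound = max(abs(a - b) for a, b in zip(to_pivots[i], to_pivots[j]))
        if lower_bound <= epsilon:
            pairs.append((i, j))
    return pairs

# Made-up 2-D points forming two tight groups, with two pivots.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
pivots = [(0.0, 0.0), (10.0, 0.0)]
print(candidate_pairs(points, pivots, epsilon=0.5))  # [(0, 1), (2, 3)]
```

This sketch still loops over every pair, although only with cheap comparisons in pivot space. An implementation aiming for sub-quadratic time would need an index over the pivot-distance vectors. Because the bound never exceeds the true distance, no pair closer than `epsilon` is lost, while some distant pairs may pass the filter.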
