1.
Proc Natl Acad Sci U S A ; 119(16): e2118210119, 2022 04 19.
Article in English | MEDLINE | ID: mdl-35412913

ABSTRACT

The improving access to increasing amounts of biomedical data provides completely new chances for advanced patient stratification and disease subtyping strategies. This requires computational tools that produce uniformly robust results across highly heterogeneous molecular data. Unsupervised machine learning methodologies are able to discover de novo patterns in such data. Biclustering is especially suited by simultaneously identifying sample groups and corresponding feature sets across heterogeneous omics data. The performance of available biclustering algorithms heavily depends on individual parameterization and varies with their application. Here, we developed MoSBi (molecular signature identification using biclustering), an automated multialgorithm ensemble approach that integrates results utilizing an error model-supported similarity network. We systematically evaluated the performance of 11 available and established biclustering algorithms together with MoSBi. For this, we used transcriptomics, proteomics, and metabolomics data, as well as synthetic datasets covering various data properties. Profiting from multialgorithm integration, MoSBi identified robust group and disease-specific signatures across all scenarios, overcoming single algorithm specificities. Furthermore, we developed a scalable network-based visualization of bicluster communities that supports biological hypothesis generation. MoSBi is available as an R package and web service to make automated biclustering analysis accessible for application in molecular sample stratification.


Subject(s)
Disease , Gene Expression Profiling , Metabolomics , Patients , Proteomics , Software , Algorithms , Cluster Analysis , Disease/classification , Humans , Patients/classification
2.
Clin Chem ; 70(4): 653-659, 2024 04 03.
Article in English | MEDLINE | ID: mdl-38416710

ABSTRACT

BACKGROUND: Artificial intelligence models constitute specific uses of analysis results and, therefore, necessitate evaluation of analytical performance specifications (APS) for this context specifically. The Model of End-stage Liver Disease (MELD) is a clinical prediction model based on measurements of bilirubin, creatinine, and the international normalized ratio (INR). This study evaluates the propagation of error through the MELD, to inform choice of APS for the MELD input variables. METHODS: A total of 6093 consecutive MELD scores and underlying analysis results were retrospectively collected. "Desirable analytical variation" based on biological variation as well as current local analytical variation was simulated onto the data set as well as onto a constructed data set, representing a worst-case scenario. Resulting changes in MELD score and risk classification were calculated. RESULTS: Biological variation-based APS in the worst-case scenario resulted in 3.26% of scores changing by ≥1 MELD point. In the patient-derived data set, the same variation resulted in 0.92% of samples changing by ≥1 MELD point, and 5.5% of samples changing risk category. Local analytical performance resulted in lower reclassification rates. CONCLUSIONS: Error propagation through MELD is complex and includes population-dependent mechanisms. Biological variation-derived APS were acceptable for all uses of the MELD score. Other combinations of APS can yield equally acceptable results. This analysis exemplifies how error propagation through artificial intelligence models can become highly complex. This complexity will necessitate that both model suppliers and clinical laboratories address analytical performance specifications for the specific use case, as these may differ from performance specifications for traditional use of the analyses.
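
The error-propagation idea in this abstract can be illustrated with a minimal sketch. It assumes the classic MELD formula with inputs in mg/dL and the conventional bounds (values below 1.0 set to 1.0, creatinine capped at 4.0 mg/dL); the study may have used a different MELD variant or units. The function names, the Gaussian noise model, and the coefficients of variation passed in as `cv` are illustrative assumptions, not values from the paper.

```python
import math
import random


def meld(bilirubin_mg_dl, creatinine_mg_dl, inr):
    """Classic MELD score (assumed variant), rounded to the nearest integer."""
    # Conventional bounds: inputs below 1.0 are set to 1.0,
    # and creatinine is capped at 4.0 mg/dL.
    bili = max(bilirubin_mg_dl, 1.0)
    crea = min(max(creatinine_mg_dl, 1.0), 4.0)
    inr = max(inr, 1.0)
    score = (3.78 * math.log(bili)
             + 11.2 * math.log(inr)
             + 9.57 * math.log(crea)
             + 6.43)
    return round(score)


def fraction_changed(patients, cv, n_rep=200, seed=0):
    """Fraction of simulated re-measurements whose MELD differs from baseline.

    patients: list of (bilirubin, creatinine, INR) tuples.
    cv: hypothetical analytical coefficients of variation, one per analyte.
    """
    rng = random.Random(seed)
    changed = 0
    total = 0
    for values in patients:
        base = meld(*values)
        for _ in range(n_rep):
            # Multiplicative Gaussian noise on each input (an assumed error model).
            noisy = [x * (1 + rng.gauss(0, c)) for x, c in zip(values, cv)]
            changed += meld(*noisy) != base
            total += 1
    return changed / total
```

For example, `fraction_changed([(2.0, 1.5, 1.8)], cv=(0.05, 0.03, 0.04))` estimates how often a single patient's score would move under those hypothetical imprecision levels. The population dependence noted in the abstract shows up here because patients whose unrounded score lies close to a rounding boundary change far more often than others.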


Subject(s)
End Stage Liver Disease , Humans , Retrospective Studies , Artificial Intelligence , Models, Statistical , Prognosis , Severity of Illness Index , Creatinine
3.
Bioinformatics ; 38(3): 875-877, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34636883

ABSTRACT

MOTIVATION: Liquid-chromatography mass-spectrometry (LC-MS) is the established standard for analyzing the proteome in biological samples by identification and quantification of thousands of proteins. Machine learning (ML) promises to considerably improve the analysis of the resulting data, however, there is yet to be any tool that mediates the path from raw data to modern ML applications. More specifically, ML applications are currently hampered by three major limitations: (i) absence of balanced training data with large sample size; (ii) unclear definition of sufficiently information-rich data representations for e.g. peptide identification; (iii) lack of benchmarking of ML methods on specific LC-MS problems. RESULTS: We created the MS2AI pipeline that automates the process of gathering vast quantities of MS data for large-scale ML applications. The software retrieves raw data from either in-house sources or from the proteomics identifications database, PRIDE. Subsequently, the raw data are stored in a standardized format amenable for ML, encompassing MS1/MS2 spectra and peptide identifications. This tool bridges the gap between MS and AI, and to this effect we also present an ML application in the form of a convolutional neural network for the identification of oxidized peptides. AVAILABILITY AND IMPLEMENTATION: An open-source implementation of the software can be found at https://gitlab.com/roettgerlab/ms2ai. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Peptides , Tandem Mass Spectrometry , Chromatography, Liquid/methods , Tandem Mass Spectrometry/methods , Peptides/analysis , Software , Proteome/chemistry
4.
Clin Exp Rheumatol ; 41(9): 1801-1807, 2023 Sep.
Article in English | MEDLINE | ID: mdl-36995323

ABSTRACT

OBJECTIVES: To compare plasma levels of 92 cardiovascular- and inflammation-related proteins (CIRPs) and to analyse for associations with anti-cyclic citrullinated peptide (anti-CCP) status and disease activity in early and treatment-naive rheumatoid arthritis (RA). METHODS: Olink CVD-III-panel was used to measure 92 CIRP plasma levels in 180 early, treatment-naive, and highly inflamed RA patients from the OPERA trial. CIRP plasma levels as well as correlation between CIRP plasma levels and RA disease activity were compared between anti-CCP groups. CIRP level-based hierarchical cluster analysis was performed in each anti-CCP group separately. RESULTS: The study included 117 anti-CCP-positive and 63 anti-CCP-negative RA patients. Among the 92 CIRPs measured, the levels of chitotriosidase-1 (CHIT1) and tyrosine-protein-phosphatase non-receptor-type substrate-1 (SHPS-1) were increased and those of metalloproteinase inhibitor-4 (TIMP-4) decreased in the anti-CCP-negative group compared to the anti-CCP-positive group. The strongest associations with RA disease activity were found for interleukin-2 receptor-subunit-alpha (IL2-RA) and E-selectin levels in the anti-CCP-negative group and for C-C-motif chemokine-16 levels (CCL16) in the anti-CCP-positive group. None of the differences passed the Hochberg sequential multiplicity test; however, the CIRPs were interacting and thus the prerequisites of the Hochberg procedure were not fulfilled. CIRP level-based cluster analysis identified two patient clusters in both anti-CCP groups. Demographic and clinical characteristics were similar in the two clusters for each anti-CCP group. CONCLUSIONS: In active and early RA, the findings regarding CHIT1, SHPS-1, TIMP-4, IL2-RA, E-selectin, and CCL16 differed between the two anti-CCP groups. In addition, we identified two patient clusters that were independent of the anti-CCP status.
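
For readers unfamiliar with the Hochberg procedure mentioned above, the following is a minimal sketch of the standard step-up rule: with m sorted P values, find the largest rank k such that p(k) <= alpha / (m - k + 1) and reject the k hypotheses with the smallest P values. The function name and the default `alpha` are illustrative; the abstract does not describe the authors' implementation.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Implements Hochberg's step-up rule at family-wise error level alpha.
    """
    m = len(p_values)
    # Indices of the P values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    # Step up from the largest P value (rank m) to the smallest (rank 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        if p_values[idx] <= alpha / (m - rank + 1):
            # Reject this hypothesis and all with smaller P values.
            for j in order[:rank]:
                reject[j] = True
            break
    return reject
```

The procedure controls the family-wise error rate when the test statistics are independent or positively dependent in a specific sense, which is the prerequisite the abstract says was not met for interacting proteins.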


Subject(s)
Arthritis, Rheumatoid , E-Selectin , Humans , Anti-Citrullinated Protein Antibodies , Interleukin-2 , Autoantibodies , Arthritis, Rheumatoid/diagnosis , Inflammation , Peptides, Cyclic
5.
J Med Internet Res ; 25: e42621, 2023 07 12.
Article in English | MEDLINE | ID: mdl-37436815

ABSTRACT

BACKGROUND: Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. However, these data are often distributed across different institutions and cannot be easily shared owing to strict privacy regulations. Federated learning (FL) allows the training of distributed machine learning models without sharing sensitive data. However, the implementation is time-consuming and requires advanced programming skills and complex technical infrastructures. OBJECTIVE: Various tools and frameworks have been developed to simplify the development of FL algorithms and provide the necessary technical infrastructure. Although there are many high-quality frameworks, most focus only on a single application case or method. To our knowledge, there are no generic frameworks, meaning that the existing solutions are restricted to a particular type of algorithm or application field. Furthermore, most of these frameworks provide an application programming interface that requires programming knowledge. There is no collection of ready-to-use FL algorithms that are extendable and allow users (eg, researchers) without programming knowledge to apply FL. A central FL platform for both FL algorithm developers and users does not exist. This study aimed to address this gap and make FL available to everyone by developing FeatureCloud, an all-in-one platform for FL in biomedicine and beyond. METHODS: The FeatureCloud platform consists of 3 main components: a global frontend, a global backend, and a local controller. Our platform uses Docker to separate the local acting components of the platform from the sensitive data systems. We evaluated our platform using 4 different algorithms on 5 data sets for both accuracy and runtime.
RESULTS: FeatureCloud removes the complexity of distributed systems for developers and end users by providing a comprehensive platform for executing multi-institutional FL analyses and implementing FL algorithms. Through its integrated artificial intelligence store, federated algorithms can easily be published and reused by the community. To secure sensitive raw data, FeatureCloud supports privacy-enhancing technologies to secure the shared local models and assures high standards in data privacy to comply with the strict General Data Protection Regulation. Our evaluation shows that applications developed in FeatureCloud can produce highly similar results compared with centralized approaches and scale well for an increasing number of participating sites. CONCLUSIONS: FeatureCloud provides a ready-to-use platform that integrates the development and execution of FL algorithms while reducing the complexity to a minimum and removing the hurdles of federated infrastructure. Thus, we believe that it has the potential to greatly increase the accessibility of privacy-preserving and distributed data analyses in biomedicine and beyond.
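
The core federated idea, that sites share only aggregate parameters and never raw records, can be sketched with a federated mean. This is a generic illustration, not the FeatureCloud API; `local_update` and `aggregate` are hypothetical names, and no privacy-enhancing technology is applied here.

```python
def local_update(values):
    """Run at each site: summarize local data without exposing raw records."""
    return {"sum": sum(values), "n": len(values)}


def aggregate(updates):
    """Run at the coordinator: combine site summaries into a global mean."""
    total = sum(u["sum"] for u in updates)
    n = sum(u["n"] for u in updates)
    return total / n


# Three hypothetical sites with private measurements.
sites = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
global_mean = aggregate([local_update(v) for v in sites])
```

The result equals the mean computed on the pooled data, which mirrors the abstract's claim that federated results can closely match centralized ones. Real deployments typically protect even these summaries, for instance with secure aggregation.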


Subject(s)
Algorithms , Artificial Intelligence , Humans , Health Occupations , Software , Computer Communication Networks , Privacy
6.
Mult Scler ; 27(12): 1829-1837, 2021 10.
Article in English | MEDLINE | ID: mdl-33464158

ABSTRACT

BACKGROUND: Human endogenous retrovirus (HERV) expression in multiple sclerosis (MS) brain lesions may contribute to chronic inflammation, but expression of genome-wide HERVs in different MS lesions is unknown. OBJECTIVE: We examined the HERV expression landscape in different MS lesions compared to control brains. METHODS: Transcripts from 71 MS brain samples and 25 control WM were obtained by next-generation RNA sequencing and mapped against HERV transcripts across the human genome. Differential expression of mapped HERV-W and HERV-H reads between MS lesion types and controls was analysed. RESULTS: Out of 6.38 billion high-quality paired end reads, 174 million reads (2.73%) mapped to HERV transcripts. There was no difference in HERVs expression level between MS and control brains, but HERV-W transcripts were significantly reduced in chronic active lesions. Of the four HERV-W transcripts exclusively present in MS, ERV3633503 located on chromosome 7q21.13 close to the MS genetic risk locus had the highest number of reads. In the HERV-H family, 75% of transcripts located to nearby 7q21-22 were overrepresented in MS, and ERV3643914 was expressed more than 16 times in MS compared to control brains. CONCLUSION: Novel HERV-W and HERV-H transcripts located at chromosome 7 regions were uniquely expressed in MS lesions, indicating their potential role in brain lesion evolution.


Subject(s)
Endogenous Retroviruses , Multiple Sclerosis , Brain , Endogenous Retroviruses/genetics , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Multiple Sclerosis/genetics
7.
Nucleic Acids Res ; 47(1): 85-92, 2019 01 10.
Article in English | MEDLINE | ID: mdl-30462289

ABSTRACT

Gene regulatory networks (GRNs) and gene expression data form a core element of systems biology-based phenotyping. Changes in the expression of transcription factors are commonly believed to have a causal effect on the expression of their targets. Here we evaluated in the best researched model organism, Escherichia coli, the consistency between a GRN and a large gene expression compendium. Surprisingly, a modest correlation was observed between the expression of transcription factors and their targets and, most noteworthy, both activating and repressing interactions were associated with positive correlation. When evaluated using a sign consistency model, we found the regulatory network was not more consistent with measured expression than random network models. We conclude that, at least in E. coli, one cannot expect a causal relationship between the expression of transcription factors and their targets, and that the current static GRN does not adequately explain transcriptional regulation. The implications of this are profound as they question what we consider established knowledge of the systemic biology of cells and point to methodological limitations with respect to single omics analysis, static networks and temporality.
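
As a rough illustration of what checking a signed network against expression data can look like, the sketch below counts how many regulatory edges have a correlation sign that matches their annotated effect (activation positive, repression negative). This is a simplified correlation-sign check written for illustration, not the sign consistency model used in the paper; all names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def consistent_fraction(expression, edges):
    """Fraction of edges whose correlation sign matches the annotated sign.

    expression: dict mapping gene name to expression values across samples.
    edges: list of (regulator, target, sign) with sign +1 or -1.
    """
    hits = 0
    for regulator, target, sign in edges:
        r = pearson(expression[regulator], expression[target])
        hits += (r > 0) == (sign > 0)
    return hits / len(edges)


# Toy data: one regulator and three targets measured in four samples.
expression = {
    "tfA": [1.0, 2.0, 3.0, 4.0],
    "g1": [2.0, 4.0, 6.0, 8.0],
    "g2": [8.0, 6.0, 4.0, 2.0],
    "g3": [1.0, 3.0, 2.0, 5.0],
}
edges = [("tfA", "g1", +1), ("tfA", "g2", -1), ("tfA", "g3", -1)]
fraction = consistent_fraction(expression, edges)
```

In the toy data the third edge is annotated as repression but correlates positively, which is the kind of mismatch the abstract reports for repressing interactions. A comparison against randomized networks, as in the paper, would repeat this count on shuffled edge sets.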


Subject(s)
Escherichia coli/genetics , Gene Regulatory Networks/genetics , Models, Theoretical , Algorithms , Gene Expression Regulation, Bacterial/genetics , Systems Biology/trends
9.
J Med Internet Res ; 23(6): e28253, 2021 06 02.
Article in English | MEDLINE | ID: mdl-33900934

ABSTRACT

BACKGROUND: Before the advent of an effective vaccine, nonpharmaceutical interventions, such as mask-wearing, social distancing, and lockdowns, have been the primary measures to combat the COVID-19 pandemic. Such measures are highly effective when there is high population-wide adherence, which requires information on current risks posed by the pandemic alongside a clear exposition of the rules and guidelines in place. OBJECTIVE: Here we analyzed online news media coverage of COVID-19. We quantified the total volume of COVID-19 articles, their sentiment polarization, and leading subtopics to act as a reference to inform future communication strategies. METHODS: We collected 26 million news articles from the front pages of 172 major online news sources in 11 countries (available online at SciRide). Using topic detection, we identified COVID-19-related content to quantify the proportion of total coverage the pandemic received in 2020. The sentiment analysis tool Vader was employed to stratify the emotional polarity of COVID-19 reporting. Further topic detection and sentiment analysis were performed on COVID-19 coverage to reveal the leading themes in pandemic reporting and their respective emotional polarizations. RESULTS: We found that COVID-19 coverage accounted for approximately 25.3% of all front-page online news articles between January and October 2020. Sentiment analysis of English-language sources revealed that overall COVID-19 coverage was not exclusively negatively polarized, suggesting wide heterogeneous reporting of the pandemic. Within this heterogeneous coverage, 16% of COVID-19 news articles (or 4% of all English-language articles) can be classified as highly negatively polarized, citing issues such as death, fear, or crisis. CONCLUSIONS: The goal of COVID-19 public health communication is to increase understanding of distancing rules and to maximize the impact of governmental policy.
The extent to which the quantity and quality of information from different communication channels (eg, social media, government pages, and news) influence public understanding of public health measures remains to be established. Here we conclude that a quarter of all reporting in 2020 covered COVID-19, which is indicative of information overload. In this capacity, our data and analysis form a quantitative basis for informing health communication strategies along traditional news media channels to minimize the risks of COVID-19 while vaccination is rolled out.


Subject(s)
COVID-19/epidemiology , Data Mining/methods , Mass Media/statistics & numerical data , Public Health/methods , Social Media/statistics & numerical data , Health Resources , Humans , Pandemics , SARS-CoV-2/isolation & purification
11.
Nat Methods ; 12(11): 1033-8, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26389570

ABSTRACT

Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.


Subject(s)
Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , Pattern Recognition, Automated/methods , Algorithms , Animals , Automation , Gene Expression Regulation , Humans , Oligonucleotide Array Sequence Analysis/methods , Protein Structure, Tertiary , Quality Control , Reproducibility of Results , Software
12.
Bioinformatics ; 33(4): 549-551, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27794558

ABSTRACT

Motivation: Epigenome-wide association studies (EWAS) generate big epidemiological datasets. They aim to detect differentially methylated DNA regions that are likely to influence transcriptional gene activity and, thus, the regulation of metabolic processes. By far the most widely used technology is the Illumina Methylation BeadChip, which measures the methylation levels of 450 (850) thousand cytosines in the CpG dinucleotide context in a set of patients compared to a control group. Many bioinformatics tools exist for raw data analysis. However, most of them require some knowledge of the programming language R, have no user interface, and do not offer all necessary steps to guide users from raw data all the way down to statistically significant differentially methylated regions (DMRs) and the associated genes. Results: Here, we present DiMmeR (Discovery of Multiple Differentially Methylated Regions), the first free standalone software that interactively guides scientists through the whole EWAS data analysis with a user-friendly graphical user interface (GUI). It offers parallelized statistical methods for efficiently identifying DMRs in both Illumina 450K and 850K EPIC chip data. DiMmeR computes empirical P-values through randomization tests, even for big datasets of hundreds of patients and thousands of permutations, within a few minutes on a standard desktop PC. It is independent of any third-party libraries, computes regression coefficients, P-values and empirical P-values, and corrects for multiple testing. Availability and Implementation: DiMmeR is publicly available at http://dimmer.compbio.sdu.dk. Contact: diogoma@bmb.sdu.dk. Supplementary information: Supplementary data are available at Bioinformatics online.
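
The empirical P value from a randomization test, as mentioned in the abstract, can be sketched for a single CpG site as follows. The test statistic here (absolute difference in group means), the function name, and the default number of permutations are assumptions made for illustration; DiMmeR's own statistics and implementation are not described in enough detail in the abstract to reproduce.

```python
import random
from statistics import mean


def empirical_p_value(cases, controls, n_perm=1000, seed=0):
    """Permutation-based P value for a difference in mean methylation.

    cases, controls: methylation levels (for example beta values) at one site.
    """
    rng = random.Random(seed)
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        hits += diff >= observed
    # The +1 correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Because each site is tested independently, the loop over sites parallelizes naturally, which is consistent with the abstract's emphasis on parallelized methods. The resulting P values would still need correction for multiple testing across all sites.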


Subject(s)
CpG Islands , DNA Methylation , Epigenomics/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Humans
14.
Ann Hum Genet ; 80(2): 81-7, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26831219

ABSTRACT

Poor nutrition during critical growth phases may alter the structural and physiologic development of vital organs thus "programming" the susceptibility to adult-onset diseases and disease-related health conditions. Epigenome-wide association studies have been performed in birth-weight discordant twin pairs to find evidence for such "programming" effects, but no significant results emerged. We further investigated this issue using a new computational approach: Instead of probing single genomic sites for significant alterations in epigenetic marks, we scan for differentially methylated genomic regions. Whole genome DNA methylation levels were measured in whole blood from 150 pairs of adult identical twins discordant for birth-weight. Intrapair differential DNA methylation was associated with qualitative (large or small) and quantitative (percentage) birth-weight discordance at each genomic site using regression models adjusting for age and sex. Based on the regression results, genomic regions with consistent alteration patterns of DNA methylation were located and tested for significant robustness using computational permutation tests. This yielded an interesting genomic region on chromosome 1, which is significantly differentially methylated for quantitative birth-weight discordance. The region covers two genes (TYW3 and CRYZ) both reportedly associated with metabolism. We conclude that prenatal conditions for birth-weight discordance may result in persistent epigenetic modifications potentially affecting even adult health.


Subject(s)
Birth Weight , DNA Methylation , Epigenesis, Genetic , Adult , Aged , Female , Genome, Human , Genomics , Humans , Linear Models , Male , Middle Aged , Twins, Monozygotic
15.
Nucleic Acids Res ; 42(9): e78, 2014 May.
Article in English | MEDLINE | ID: mdl-24682815

ABSTRACT

The explosion of biological data has dramatically reformed today's biological research. The need to integrate and analyze high-dimensional biological data on a large scale is driving the development of novel bioinformatics approaches. Biclustering, also known as 'simultaneous clustering' or 'co-clustering', has been successfully utilized to discover local patterns in gene expression data and similar biomedical data types. Here, we contribute a new heuristic: 'Bi-Force'. It is based on the weighted bicluster editing model, to perform biclustering on arbitrary sets of biological entities, given any kind of pairwise similarities. We first evaluated the power of Bi-Force to solve dedicated bicluster editing problems by comparing Bi-Force with two existing algorithms in the BiCluE software package. We then followed a biclustering evaluation protocol in a recent review paper from Eren et al. (2013) (A comparative analysis of biclustering algorithms for gene expression data. Brief. Bioinform., 14:279-292.) and compared Bi-Force against eight existing tools: FABIA, QUBIC, Cheng and Church, Plaid, BiMax, Spectral, xMOTIFs and ISA. To this end, a suite of synthetic datasets as well as nine large gene expression datasets from Gene Expression Omnibus were analyzed. All resulting biclusters were subsequently investigated by Gene Ontology enrichment analysis to evaluate their biological relevance. The distinct theoretical foundation of Bi-Force (bicluster editing) is more powerful than strict biclustering. We thus outperformed existing tools with Bi-Force at least when following the evaluation protocols from Eren et al. Bi-Force is implemented in Java and integrated into the open source software package of BiCluE. The software as well as all used datasets are publicly available at http://biclue.mpi-inf.mpg.de.


Subject(s)
Gene Expression Profiling , Algorithms , Animals , Cluster Analysis , Computer Simulation , Databases, Genetic , Gene Ontology , Humans , Models, Genetic , Oligonucleotide Array Sequence Analysis , Principal Component Analysis , Software
16.
BMC Genomics ; 16: 452, 2015 Jun 11.
Article in English | MEDLINE | ID: mdl-26062809

ABSTRACT

BACKGROUND: Organisms utilize a multitude of mechanisms for responding to changing environmental conditions, maintaining their functional homeostasis, and overcoming stress situations. One of the most important mechanisms is transcriptional gene regulation. In-depth study of the transcriptional gene regulatory network can lead to various practical applications, creating a greater understanding of how organisms control their cellular behavior. DESCRIPTION: In this work, we present a new database, CMRegNet, for the gene regulatory networks of Corynebacterium glutamicum ATCC 13032 and Mycobacterium tuberculosis H37Rv. We furthermore transferred the known networks of these model organisms to 18 other non-model but phylogenetically close species (target organisms) of the CMNR group. In comparison to other network transfers, for the first time we utilized two model organisms, resulting in a more diverse and complete network of the target organisms. CONCLUSION: CMRegNet provides easy access to a total of 3,103 known regulations in C. glutamicum ATCC 13032 and M. tuberculosis H37Rv and to 38,940 evolutionarily conserved interactions for 18 non-model species of the CMNR group. This makes CMRegNet to date the most comprehensive database of regulatory interactions of CMNR bacteria. The content of CMRegNet is publicly available online via a web interface found at http://lgcm.icb.ufmg.br/cmregnet.


Subject(s)
Corynebacterium glutamicum/genetics , Databases, Genetic , Gene Regulatory Networks , Mycobacterium tuberculosis/genetics , Computational Biology , Corynebacterium glutamicum/classification , Gene Expression Regulation, Bacterial , Genes, Bacterial , Internet , Mycobacterium tuberculosis/classification , Phylogeny
17.
Bioinformatics ; 29(2): 215-22, 2013 Jan 15.
Article in English | MEDLINE | ID: mdl-23142964

ABSTRACT

MOTIVATION: Homology detection is a long-standing challenge in computational biology. To tackle this problem, typically all-versus-all BLAST results are coupled with data partitioning approaches resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglected: all clustering tools need a density parameter that adjusts the number and size of the clusters. This parameter is crucial but hard to estimate without gold standard data at hand. Developing a gold standard, however, is a difficult and time consuming task. Having a reliable method for detecting clusters of homologous proteins between a huge set of species would open opportunities for better understanding the genetic repertoire of bacteria with different lifestyles. RESULTS: Our main contribution is a method for identifying a suitable and robust density parameter for protein homology detection without a given gold standard. Therefore, we study the core genome of 89 actinobacteria. This allows us to incorporate background knowledge, i.e. the assumption that a set of evolutionarily closely related species should share a comparably high number of evolutionarily conserved proteins (emerging from phylum-specific housekeeping genes). We apply our strategy to find genes/proteins that are specific for certain actinobacterial lifestyles, i.e. different types of pathogenicity. The whole study was performed with transitivity clustering, as it only requires a single intuitive density parameter and has been shown to be well applicable for the task of protein sequence clustering. Note, however, that the presented strategy generally does not depend on our clustering method but can easily be adapted to other clustering approaches. AVAILABILITY: All results are publicly available at http://transclust.mmci.uni-saarland.de/actino_core/ or as Supplementary Material of this article. 
CONTACT: roettger@mpi-inf.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Actinobacteria/classification , Bacterial Proteins/chemistry , Sequence Homology, Amino Acid , Actinobacteria/genetics , Actinobacteria/pathogenicity , Algorithms , Bacterial Proteins/genetics , Cluster Analysis , Genome, Bacterial , Models, Genetic , Phylogeny , Sequence Alignment
18.
Nucleic Acids Res ; 40(Database issue): D610-4, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22080556

ABSTRACT

Post-genomic analysis techniques such as next-generation sequencing have produced vast amounts of data about microorganisms including genetic sequences, their functional annotations and gene regulatory interactions. The latter are genetic mechanisms that control a cell's characteristics, for instance, pathogenicity as well as survival and reproduction strategies. CoryneRegNet is the reference database and analysis platform for corynebacterial gene regulatory networks. In this article we introduce the updated version 6.0 of CoryneRegNet and describe the updated database content, which includes 6352 corynebacterial regulatory interactions compared with 4928 interactions in release 5.0 and 3235 regulations in release 4.0. We also demonstrate how we support the community by integrating analysis and visualization features for transiently imported custom data, such as gene regulatory interactions. Furthermore, with release 6.0, we provide easy-to-use functions that allow the user to submit data for persistent storage with the CoryneRegNet database. Thus, it offers important options to its users in terms of community demands. CoryneRegNet is publicly available at http://www.coryneregnet.de.


Subject(s)
Corynebacterium/genetics , Databases, Genetic , Gene Regulatory Networks , Computer Graphics , Gene Expression Regulation, Bacterial , Molecular Sequence Annotation
19.
JBI Evid Synth ; 22(3): 453-460, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38328955

ABSTRACT

OBJECTIVE: The objective of this scoping review is to describe the scope and nature of research on the monitoring of clinical artificial intelligence (AI) systems. The review will identify the various methodologies used to monitor clinical AI, while also mapping the factors that influence the selection of monitoring approaches. INTRODUCTION: AI is being used in clinical decision-making at an increasing rate. While much attention has been directed toward the development and validation of AI for clinical applications, the practical implementation aspects, notably the establishment of rational monitoring/quality assurance systems, have received comparatively limited scientific interest. Given the scarcity of evidence and the heterogeneity of methodologies used in this domain, there is a compelling rationale for conducting a scoping review on this subject. INCLUSION CRITERIA: This scoping review will include any publications that describe systematic, continuous, or repeated initiatives that evaluate or predict clinical performance of AI models with direct implications for the management of patients in any segment of the health care system. METHODS: Publications will be identified through searches of the MEDLINE (Ovid), Embase (Ovid), and Scopus databases. Additionally, backward and forward citation searches, as well as a thorough investigation of gray literature, will be conducted. Title and abstract screening, full-text evaluation, and data extraction will be performed by 2 or more independent reviewers. Data will be extracted using a tool developed by the authors. The results will be presented graphically and narratively. REVIEW REGISTRATION: Open Science Framework https://osf.io/afkrn.


Subject(s)
Artificial Intelligence , Review Literature as Topic , Humans
20.
Bioinform Adv ; 4(1): vbae033, 2024.
Article in English | MEDLINE | ID: mdl-38560554

ABSTRACT

Motivation: Nanobodies are a subclass of immunoglobulins whose binding site consists of only one peptide chain, bestowing favorable biophysical properties. Recently, the first nanobody therapy was approved, paving the way for further clinical applications of this antibody format. Further development of nanobody-based therapeutics could be streamlined by computational methods. One such method is infilling, the positional prediction of biologically feasible mutations in nanobodies. Being able to identify possible positional substitutions based on sequence context facilitates functional design of such molecules. Results: Here we present nanoBERT, a nanobody-specific transformer to predict amino acids in a given position in a query sequence. We demonstrate the need to develop such a machine-learning based protocol as opposed to gene-specific positional statistics, since an appropriate genetic reference is not available. We benchmark nanoBERT with respect to human-based language models and ESM-2, demonstrating the benefit of domain-specific language models. We also demonstrate the benefit of employing nanobody-specific predictions for fine-tuning on an experimentally measured thermostability dataset. We hope that nanoBERT will help engineers in a range of predictive tasks for designing therapeutic nanobodies. Availability and implementation: https://huggingface.co/NaturalAntibody/.
