Results 1 - 20 of 75
1.
Int J Mol Sci ; 21(3)2020 Feb 05.
Article in English | MEDLINE | ID: mdl-32033398

ABSTRACT

Osteosarcoma is the most common subtype of primary bone cancer, affecting mostly adolescents. In recent years, several studies have focused on elucidating the molecular mechanisms of this sarcoma; however, its molecular etiology has still not been determined with precision. Therefore, we applied a consensus strategy with the use of several bioinformatics tools to prioritize genes involved in its pathogenesis. Subsequently, we assessed the physical interactions of the previously selected genes and applied a communality analysis to this protein-protein interaction network. The consensus strategy prioritized a total list of 553 genes. Our enrichment analysis validates several studies that describe the signaling pathways PI3K/AKT and MAPK/ERK as pathogenic. The gene ontology described TP53 as a principal signal transducer that chiefly mediates processes associated with cell cycle and DNA damage response. It is interesting to note that the communality analysis clusters several members involved in metastasis events, such as MMP2 and MMP9, and genes associated with DNA repair complexes, like ATM, ATR, CHEK1, and RAD51. In this study, we have identified well-known pathogenic genes for osteosarcoma and prioritized genes that need to be further explored.


Subject(s)
Bone Neoplasms/genetics , Bone Neoplasms/pathology , Osteosarcoma/genetics , Osteosarcoma/pathology , Computational Biology/methods , Consensus , DNA Repair/genetics , Gene Expression Regulation, Neoplastic/genetics , Gene Ontology , Gene Regulatory Networks/genetics , Humans , Protein Interaction Maps/genetics , Signal Transduction/genetics
2.
Int J Mol Sci ; 20(18)2019 Sep 05.
Article in English | MEDLINE | ID: mdl-31491969

ABSTRACT

In this work, we improved a previous model used for the prediction of proteomes as new B-cell epitopes in vaccine design. The predicted epitope activity of a queried peptide is based on its sequence and on a known reference epitope sequence under specific experimental conditions. The peptide sequences were transformed into molecular descriptors of sequence recurrence networks and were mixed under experimental conditions. The new models were generated using 709,100 instances of pair descriptors for query and reference peptide sequences. Using perturbations of the initial descriptors under sequence or assay conditions, 10 transformed features were used as inputs for seven Machine Learning methods. The best model was obtained with random forest classifiers, with an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.981 ± 0.0005 for the external validation series (five-fold cross-validation). The database included information about 83,683 peptide sequences, 1448 epitope organisms, 323 host organisms, 15 types of in vivo processes, 28 experimental techniques, and 505 adjuvant additives. The current model could improve the in silico predictions of epitopes for vaccine design. The script and results are available as a free repository.


Subject(s)
Epitope Mapping , Machine Learning , Peptides/immunology , Amino Acid Sequence , Humans , Peptides/chemistry , ROC Curve , Structure-Activity Relationship
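The AUROC figure reported above as a mean ± deviation over cross-validation folds can be illustrated with a minimal sketch. It is not the authors' script: the function names `auroc` and `summarize_folds` are hypothetical, and the AUROC is computed from its rank interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half).

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    # Compare every positive score with every negative score; ties count 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_results):
    """Mean and sample standard deviation of the per-fold AUROC values.

    fold_results is a list of (labels, scores) pairs, one per validation fold.
    """
    values = [auroc(labels, scores) for labels, scores in fold_results]
    return mean(values), stdev(values)
```

Calling `summarize_folds` with one `(labels, scores)` pair per held-out fold yields the two numbers that a "mean ± deviation" report combines. The pairwise comparison is quadratic in the number of samples, so a sort-based formulation would be preferable for data sets of the size quoted in the abstract.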
4.
Int J Mol Sci ; 17(8)2016 Aug 11.
Article in English | MEDLINE | ID: mdl-27529225

ABSTRACT

Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by advances in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphics processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. This work presents an overview of the main DNN architectures and their usefulness in Pharmacology and Bioinformatics. The featured applications are: drug design, virtual screening (VS), Quantitative Structure-Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need for neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron-Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods.


Subject(s)
Computational Biology/methods , Neural Networks, Computer , Algorithms , Animals , Humans , Quantitative Structure-Activity Relationship
5.
J Theor Biol ; 349: 12-21, 2014 May 21.
Article in English | MEDLINE | ID: mdl-24491256

ABSTRACT

Cell death (CD) is a dynamic biological function involved in physiological and pathological processes. Due to the complexity of CD, there is a demand for fast theoretical methods that can help to find new CD molecular targets. The current work presents the first classification model to predict CD-related proteins based on Markov Mean Properties. These protein descriptors were calculated with the MInD-Prot tool using the topological information of the amino acid contact networks of 2423 protein chains, five atom physicochemical properties and the protein 3D regions. Machine Learning algorithms from Weka were used to find the best classification model for CD-related protein chains using all 20 attributes. The most accurate algorithm for this problem was K*. After several feature subset selection methods were applied, the best model found is based on only 11 variables and is characterized by an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.992 and a true positive rate (TP Rate) of 88.2% (validation set). A total of 7409 protein chains labeled with "unknown function" in the Protein Data Bank (PDB) were analyzed with the best model in order to predict CD-related biological activity. Several proteins were thus predicted to have a CD-related function in Homo sapiens: 3DRX, involved in virus-host interaction biological process, protein homooligomerization; 4DWF, involved in cell differentiation, chromatin modification, DNA damage response, protein stabilization; 1IUR, involved in ATP binding, chaperone binding; 1J7D, involved in DNA double-strand break processing, histone ubiquitination, nucleotide-binding oligomerization; 1UTU, linked with DNA repair, regulation of transcription; 3EEC, participating in cellular membrane organization, egress of virus within host cell, class mediator resulting in cell cycle arrest, negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle and apoptotic process. Other proteins from bacteria predicted as CD-related are 2G3V, a CAG pathogenicity island protein 13 from Helicobacter pylori; 4G5A, a hypothetical protein in Bacteroides thetaiotaomicron; 1YLK, involved in the nitrogen metabolism of Mycobacterium tuberculosis; and 1XSV, with possible DNA/RNA binding domains. The results demonstrate that CD-related proteins, and thus new molecular targets involved in cell-death processes, can be predicted using molecular information encoded in the protein 3D structure.


Subject(s)
Markov Chains , Proteins/classification , Algorithms , Cell Death , Databases, Protein , Reference Standards
6.
J Chem Inf Model ; 54(1): 16-29, 2014 Jan 27.
Article in English | MEDLINE | ID: mdl-24320872

ABSTRACT

The use of numerical parameters in Complex Network analysis is expanding to new fields of application. At a molecular level, we can use them to describe the molecular structure of chemical entities, protein interactions, or metabolic networks. However, the applications are not restricted to the world of molecules and can be extended to the study of macroscopic nonliving systems, organisms, or even legal or social networks. On the other hand, the development of the field of Artificial Intelligence has led to the formulation of computational algorithms whose design is based on the structure and functioning of networks of biological neurons. These algorithms, called Artificial Neural Networks (ANNs), can be useful for the study of complex networks, since the numerical parameters that encode information about the network (for example, centralities or node descriptors) can be used as inputs for the ANNs. The Wiener index (W) is a graph invariant widely used in chemoinformatics to quantify the molecular structure of drugs and to study complex networks. In this work, we explore for the first time the possibility of using Markov chains to calculate analogues of node distance numbers/W to describe complex networks from the point of view of their nodes. These parameters are called Markov-Wiener node descriptors of k-th order (W_k). Note that these descriptors are not related to Markov-Wiener stochastic processes. Here, we calculated the W_k(i) values for a very high number of nodes (>100,000) in more than 100 different complex networks using the software MI-NODES. These networks were grouped according to the field of application. Molecular networks include the Metabolic Reaction Networks (MRNs) of 40 different organisms. In addition, we analyzed other biological, legal, and social networks. These include the Interaction Web Database Biological Networks (IWDBNs), with 75 food webs or ecological systems, and the Spanish Financial Law Network (SFLN). The calculated W_k(i) values were used as inputs for different ANNs in order to discriminate correct node connectivity patterns from incorrect random patterns. The MIANN models obtained present good values of Sensitivity/Specificity (%): MRNs (78/78), IWDBNs (90/88), and SFLN (86/84). These preliminary results are promising for a first exploratory study and suggest that the use of these models could be extended to the high-throughput re-evaluation of connectivity in known complex networks (collation).


Subject(s)
Models, Biological , Neural Networks, Computer , Algorithms , Computational Biology , Databases, Factual , Ecosystem , Jurisprudence , Markov Chains , Metabolic Networks and Pathways , Models, Econometric , Models, Theoretical , Social Support , Software
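For readers unfamiliar with the quantities this abstract builds on, the sketch below computes the classical node distance numbers and the Wiener index W of an unweighted, undirected, connected graph using breadth-first search. It does not reproduce the Markov-Wiener descriptors or the MI-NODES software; the function names and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_distance_numbers(adj):
    """For each node, the sum of its distances to all other nodes."""
    return {u: sum(bfs_distances(adj, u).values()) for u in adj}


def wiener_index(adj):
    """Classical Wiener index: sum of distances over all unordered node pairs."""
    # Each pair is counted twice in the per-node sums, hence the division by 2.
    return sum(node_distance_numbers(adj).values()) // 2


# Path graph 0-1-2-3 (the carbon skeleton of n-butane), as an adjacency dict.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

For `path4`, the node distance numbers are 6, 4, 4 and 6, giving W = 10. The per-node values are the kind of node-level descriptor that could be fed to a classifier, which is the role the abstract assigns to its Markov-based analogues.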
7.
J Chem Inf Model ; 54(3): 744-55, 2014 Mar 24.
Article in English | MEDLINE | ID: mdl-24521170

ABSTRACT

This work is aimed at describing the workflow for a methodology that combines chemoinformatics and pharmacoepidemiology methods and at reporting the first predictive model developed with this methodology. The new model is able to predict complex networks of AIDS prevalence in US counties, taking into consideration the social determinants and the activity/structure of anti-HIV drugs in preclinical assays. We trained different Artificial Neural Networks (ANNs) using as input information indices of social networks and molecular graphs. We used a Shannon information index based on the Gini coefficient to quantify the effect of income inequality in the social network. We obtained the data on AIDS prevalence and the Gini coefficient from the AIDSVu database of Emory University. We also used the Balaban information indices to quantify changes in the chemical structure of anti-HIV drugs. We obtained the data on anti-HIV drug activity and structure (SMILES codes) from the ChEMBL database. Last, we used Box-Jenkins moving average operators to quantify information about the deviations of drugs with respect to data subsets of reference (targets, organisms, experimental parameters, protocols). The best model found was a Linear Neural Network (LNN) with values of Accuracy, Specificity, and Sensitivity above 0.76 and AUROC > 0.80 in training and external validation series. This model generates a complex network of AIDS prevalence in the US at county level with respect to the activity of anti-HIV drugs in preclinical assays. To train/validate the model and predict the complex network we needed to analyze 43,249 data points including values of AIDS prevalence in 2,310 counties in the US vs ChEMBL results for 21,582 unique drugs, 9 viral or human protein targets, 4,856 protocols, and 10 possible experimental measures.


Subject(s)
Acquired Immunodeficiency Syndrome/drug therapy , Acquired Immunodeficiency Syndrome/epidemiology , Anti-HIV Agents/therapeutic use , Algorithms , Animals , Anti-HIV Agents/chemistry , Databases, Factual , Drug Evaluation, Preclinical , HIV/drug effects , HIV/isolation & purification , Humans , Models, Statistical , Neural Networks, Computer , Prevalence , Social Support , United States/epidemiology
8.
Assist Technol ; 26(1): 33-44, 2014.
Article in English | MEDLINE | ID: mdl-24800452

ABSTRACT

The purpose of this study is to describe the process of assessment of three assistive devices to meet the needs of a woman with cerebral palsy (CP) in order to provide her with computer access and use. The user has quadriplegic CP, with anarthria, and uses a syllabic keyboard. Devices were evaluated through a three-step approach: (a) use of a questionnaire to preselect potential assistive technologies, (b) use of an eTAO tool to determine the effectiveness of each device, and (c) conducting a semi-structured interview to obtain qualitative data. Touch screen, joystick, and trackball were the preselected devices. The device that best met the user's needs and priorities was the joystick. The finding was corroborated by both the eTAO tool and the semi-structured interview. Computers are a basic form of social participation. It is important to consider the special needs and priorities of users and to try different devices when undertaking a device-selection process. Environmental and personal factors have to be considered, as well. This leads to a need to evaluate new tools in order to provide the appropriate support. The eTAO could be a suitable instrument for this purpose. Additional research is also needed to understand how to better match devices with different user populations and how to comprehensively evaluate emerging technologies relative to users with disabilities.


Subject(s)
Cerebral Palsy , Computer Peripherals , Equipment Design , Self-Help Devices/standards , User-Computer Interface , Adult , Ergonomics , Female , Humans , Qualitative Research , Surveys and Questionnaires
9.
Heliyon ; 10(7): e28560, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38590890

ABSTRACT

Single Sign-On (SSO) methods are the primary solution to authenticate users across multiple web systems. These mechanisms streamline the authentication procedure by avoiding duplicate developments of authentication modules for each application. Besides, these mechanisms also provide convenience to the end-user by keeping the user authenticated when switching between different contexts. To ensure this cross-application authentication, SSO relies on an Identity Provider (IdP), which is commonly set up and managed by each institution that needs to enforce SSO internally. However, the solution is not so straightforward when several institutions need to cooperate in a unique ecosystem. This could be tackled by centralizing the authentication mechanisms in one of the involved entities, a solution raising responsibilities that may be difficult for peers to accept. Moreover, this solution is not appropriate for dynamic groups, where peers may join or leave frequently. In this paper, we propose an architecture that uses a trusted third-party service to authenticate multiple entities, ensuring the isolation of the user's attributes between this service and the institutional SSO systems. This architecture was validated in the EHDEN Portal, which includes web tools and services of this European health project, to establish a Federated Authentication schema.

10.
J Cheminform ; 16(1): 27, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38449058

ABSTRACT

For understanding a chemical compound's mechanism of action and its side effects, as well as for drug discovery, it is crucial to predict its possible protein targets. This study examines 15 developed target-centric models (TCM) employing different molecular descriptions and machine learning algorithms. They were contrasted with 17 third-party models implemented as web tools (WTCM). In both sets of models, consensus strategies were implemented as a potential improvement over individual predictions. The findings indicate that TCM reach F1-score values greater than 0.8. Comparing both approaches, the best TCM achieves values of 0.75, 0.61, 0.25 and 0.38 for true positive/negative rates (TPR, TNR) and false negative/positive rates (FNR, FPR), outperforming the best WTCM. Moreover, the consensus strategy proves to have the most relevant results in the top 20% of target profiles. The TCM consensus reaches TPR and FNR values of 0.98 and 0, while the WTCM consensus reaches values of 0.75 and 0.24. The implemented computational tool with the TCM and their consensus strategy is available at: https://bioquimio.udla.edu.ec/tidentification01/ . Scientific Contribution: We compare and discuss the performances of 17 public compound-target interaction prediction models and 15 new constructions. We also explore a compound-target interaction prioritization strategy using a consensus approach, and we analyze the challenges involved in interaction modeling.
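The four rates and the F1 score quoted in this abstract are all derived from the same confusion-matrix counts. The sketch below shows the standard definitions only; the function name `classification_rates` and the example counts are hypothetical and are not taken from the study.

```python
def classification_rates(tp, fn, tn, fp):
    """Standard binary-classification rates from confusion-matrix counts."""
    positives = tp + fn  # all truly positive cases
    negatives = tn + fp  # all truly negative cases
    return {
        "TPR": tp / positives,  # sensitivity / recall
        "FNR": fn / positives,  # missed positives
        "TNR": tn / negatives,  # specificity
        "FPR": fp / negatives,  # false alarms
        "F1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
rates = classification_rates(tp=75, fn=25, tn=60, fp=40)
```

With these illustrative counts the function returns TPR = 0.75, FNR = 0.25, TNR = 0.60 and FPR = 0.40. By construction TPR + FNR = 1 and TNR + FPR = 1 when all four are computed from one confusion matrix, so small departures in published values usually reflect rounding or averaging over several targets.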

11.
An Pediatr (Engl Ed) ; 100(3): 195-201, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38461129

ABSTRACT

This article examines the use of artificial intelligence (AI) in the field of paediatric care within the framework of the 7P medicine model (Predictive, Preventive, Personalized, Precise, Participatory, Peripheral and Polyprofessional). It highlights various applications of AI in the diagnosis, treatment and management of paediatric diseases as well as the role of AI in prevention and in the efficient management of health care resources and the resulting impact on the sustainability of public health systems. Successful cases of the application of AI in the paediatric care setting are presented, placing emphasis on the need to move towards a 7P health care model. Artificial intelligence is revolutionizing society at large and has a great potential for significantly improving paediatric care.


Subject(s)
Artificial Intelligence , Humans , Child
12.
Sci Rep ; 14(1): 19359, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39169044

ABSTRACT

The druggable proteome refers to proteins that can bind to small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 machine learning linear and non-linear classifiers. The optimal classifier was achieved with the support vector machine method, utilizing 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristics (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify drugs with the highest affinity capable of targeting the best predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, and 23 of them demonstrated unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth studies in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins .


Subject(s)
Artificial Intelligence , Neoplasms , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/metabolism , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Antineoplastic Agents/chemistry , Machine Learning , Neoplasm Proteins/metabolism , Neoplasm Proteins/genetics , Neoplasm Proteins/chemistry , Support Vector Machine , Drug Repositioning/methods , Computational Biology/methods , Multiomics
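The tri-amino acid composition descriptors mentioned in this abstract can be pictured with a small sketch. It assumes the usual definition (the frequency of each of the 20^3 = 8000 overlapping residue triplets in a sequence) and is not the code from the linked repository; the names `tripeptide_composition` and `TRIPEPTIDES`, and the choice to skip windows containing non-standard residues, are illustrative assumptions. The study reports using 200 such descriptors, so some feature selection step, not shown here, would follow.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]  # 8000 triplets


def tripeptide_composition(sequence):
    """Frequency of each overlapping residue triplet in a protein sequence.

    Windows containing non-standard symbols are skipped (an assumption),
    and frequencies are normalized by the number of windows counted.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(TRIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [counts[t] / total for t in TRIPEPTIDES]


vector = tripeptide_composition("ACDACD")  # windows: ACD, CDA, DAC, ACD
```

The resulting fixed-length vector can be passed to any classifier regardless of protein length, which is what makes composition descriptors convenient for sequence-based models such as the support vector machine named in the abstract.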
13.
Stud Health Technol Inform ; 294: 585-586, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612156

ABSTRACT

Many clinical studies depend heavily on the efficient identification of relevant datasets. This selection can be performed in existing health data catalogues by searching the available metadata. The search process can be optimised through question-answering interfaces that help researchers explore the available data. However, when searching distinct catalogues, the lack of metadata harmonisation imposes a few bottlenecks. This paper presents a methodology to allow semantic search over several biomedical database catalogues by extracting the information using shared domain knowledge. The resulting pipeline allows the converted data to be published as FAIR endpoints, and it provides an end-user interface that accepts natural language questions.


Subject(s)
Metadata , Semantics , Databases, Factual , Language , Natural Language Processing
14.
J Proteome Res ; 10(4): 1698-718, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21184613

ABSTRACT

Many drugs with very different affinities for a large number of receptors have been described. Thus, in this work, we selected drug-target pairs (DTPs/nDTPs) of drugs with high affinity/nonaffinity for different targets. Quantitative structure-activity relationship (QSAR) models become a very useful tool in this context because they substantially reduce time- and resource-consuming experiments. Unfortunately, most QSAR models predict activity against only one protein target and/or have not yet been implemented on a public Web server freely available online to the scientific community. To solve this problem, we developed a multitarget QSAR (mt-QSAR) classifier combining the MARCH-INSIDE software for the calculation of the structural parameters of drug and target with the linear discriminant analysis (LDA) method in order to seek the best model. The accuracy of the best LDA model was 94.4% (3,859/4,086 cases) for training and 94.9% (1,909/2,012 cases) for the external validation series. In addition, we implemented the model into the Web portal Bio-AIMS as an online server entitled MARCH-INSIDE Nested Drug-Bank Exploration & Screening Tool (MIND-BEST), located at http://miaja.tic.udc.es/Bio-AIMS/MIND-BEST.php . This online tool is based on PHP/HTML/Python and MARCH-INSIDE routines. Finally, we illustrated two practical uses of this server with two different experiments. In experiment 1, we report for the first time a MIND-BEST prediction, synthesis, characterization, and MAO-A and MAO-B pharmacological assay of eight rasagiline derivatives, promising for anti-Parkinson drug design. In experiment 2, we report sampling, parasite culture, sample preparation, 2-DE, MALDI-TOF and -TOF/TOF MS, MASCOT search, 3D structure modeling with LOMETS, and MIND-BEST prediction for different peptides as new proteins found in the proteome of the bird parasite Trichomonas gallinae, which is promising for antiparasitic drug target discovery.


Subject(s)
Drug Design , Drug Evaluation, Preclinical/methods , Glucosephosphate Dehydrogenase/metabolism , Internet , Monoamine Oxidase Inhibitors/chemistry , Monoamine Oxidase/metabolism , Protozoan Proteins/metabolism , Trichomonas , Animals , Antiparasitic Agents/chemistry , Antiparasitic Agents/pharmacology , Columbidae/microbiology , Drug Discovery , Glucosephosphate Dehydrogenase/chemistry , Indans/chemical synthesis , Indans/chemistry , Models, Molecular , Models, Theoretical , Molecular Sequence Data , Molecular Structure , Monoamine Oxidase/chemistry , Monoamine Oxidase Inhibitors/chemical synthesis , Peptides/chemistry , Protein Conformation , Protozoan Proteins/chemistry , Quantitative Structure-Activity Relationship , Trichomonas/chemistry , Trichomonas/drug effects , Trichomonas/enzymology
15.
J Theor Biol ; 271(1): 136-44, 2011 Feb 21.
Article in English | MEDLINE | ID: mdl-21130100

ABSTRACT

A statistical approach has been applied to analyse primary structure patterns at inner positions of α-helices in proteins. A systematic survey was carried out in a recent sample of non-redundant proteins selected from the Protein Data Bank, which were used to analyse α-helix structures for amino acid pairing patterns. Only residues more than three positions apart from both termini of the α-helix were considered as inner. Amino acid pairings i, i+k (k=1, 2, 3, 4, 5) were analysed and the corresponding 20×20 matrices of relative global propensities were constructed. An analysis of (i, i+4, i+8) and (i, i+3, i+4) triplet patterns was also performed. These analyses yielded information on a series of amino acid patterns (pairings and triplets) showing either high or low preference for α-helical motifs and suggested a novel approach to protein alphabet reduction. In addition, it has been shown that the individual amino acid propensities are not enough to define the statistical distribution of these patterns. Global pair propensities also depend on the type of pattern, its composition and orientation in the protein sequence. The data presented should help to obtain and refine predictive rules that can further the development and fine-tuning of protein structure prediction algorithms and tools.


Subject(s)
Amino Acids/chemistry , Protein Structure, Secondary , Proteins/chemistry , Algorithms , Databases, Protein , Protein Folding
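The (i, i+k) pairing analysis described in this abstract can be sketched as a counting exercise. The abstract does not define "relative global propensity", so the sketch assumes one common form: the observed pair count divided by the count expected if the residues at positions i and i+k were independent. The function names are hypothetical, and the input is assumed to be helix segments already trimmed to their inner positions.

```python
from collections import Counter


def pair_counts(helices, k):
    """Count ordered residue pairs (seq[i], seq[i + k]) over all helix segments."""
    counts = Counter()
    for seq in helices:
        for i in range(len(seq) - k):
            counts[(seq[i], seq[i + k])] += 1
    return counts


def pair_propensities(helices, k):
    """Observed/expected ratio for each observed pair (assumed definition).

    The expected count uses the marginal frequencies of the first and second
    residue in the pairs, so a ratio above 1 marks an over-represented pairing.
    """
    counts = pair_counts(helices, k)
    total = sum(counts.values())
    first, second = Counter(), Counter()
    for (a, b), n in counts.items():
        first[a] += n
        second[b] += n
    return {
        (a, b): n * total / (first[a] * second[b])
        for (a, b), n in counts.items()
    }


# Toy input: a single short segment, for illustration only.
toy_counts = pair_counts(["AEAE"], k=1)
toy_propensities = pair_propensities(["AEAE"], k=1)
```

Filling a 20×20 table from the returned dictionary, once for each k from 1 to 5, would give matrices of the shape the abstract describes. Because the pairs are ordered, (A, E) and (E, A) are kept separate, which matches the abstract's remark that propensities depend on orientation in the sequence.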
16.
PeerJ Comput Sci ; 7: e584, 2021.
Article in English | MEDLINE | ID: mdl-34322589

ABSTRACT

In recent years, machine learning (ML) researchers have changed their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have allowed the use of omic data for the training of these algorithms. In order to study the state of the art, this review covers the main works that have used ML with TCGA data. Firstly, the principal discoveries made by the TCGA consortium are presented. Once these bases have been established, we turn to the main objective of this study, the identification and discussion of those works that have used the TCGA data for the training of different ML approaches. After a review of more than 100 different papers, it has been possible to make a classification according to the following three pillars: the type of tumour, the type of algorithm and the predicted biological problem. One of the conclusions drawn in this work shows a high density of studies based on two major algorithms: Random Forest and Support Vector Machines. We also observe the rise in the use of deep artificial neural networks. It is worth emphasizing the increase of integrative models of multi-omic data analysis. The different biological conditions are a consequence of molecular homeostasis, driven by protein coding regions, regulatory elements and the surrounding environment. It is notable that a large number of works make use of gene expression data, which has been found to be the data type preferred by researchers when training the different models. The biological problems addressed have been classified into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in the prediction of these conditions according to the type of tumour. For this reason, a greater number of works have focused on the BRCA cohort, while specific works on survival, for example, were centred on the GBM cohort, due to its large number of events. Throughout this review, it will be possible to go in depth into the works and the methodologies used to study TCGA cancer data. Finally, it is intended that this work will serve as a basis for future research in this field of study.

17.
Stud Health Technol Inform ; 281: 327-331, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042759

ABSTRACT

The process of refining the research question in a medical study depends greatly on the current background of the investigated subject. The information found in prior works can directly impact several stages of the study, notably the cohort definition stage. Besides previously published methods, researchers could also leverage other materials, such as the output of cohort selection tools, to enrich and accelerate their own work. However, this kind of information is not always captured by search engines. In this paper, we present a methodology, based on a combination of content-based retrieval and text annotation techniques, to identify relevant scientific publications related to a research question and to the selected data sources.


Subject(s)
Information Storage and Retrieval , Search Engine , Cohort Studies
18.
JMIR Med Inform ; 9(2): e22976, 2021 Feb 25.
Article in English | MEDLINE | ID: mdl-33629960

ABSTRACT

BACKGROUND: Currently, existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases. OBJECTIVE: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. BiDI provides an index of data resources and a path to access them seamlessly. METHODS: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. Such a data set was used along with transfer learning techniques to train an ensemble of deep learning natural language processing models targeted at database publication detection. RESULTS: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. When applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to "omics" and the other related to the COVID-19 pandemic. CONCLUSIONS: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types (ie, biomedical and others).

19.
Comput Struct Biotechnol J ; 19: 4538-4558, 2021.
Article in English | MEDLINE | ID: mdl-34471498

ABSTRACT

Drug discovery aims to find new compounds with specific chemical properties for the treatment of diseases. In recent years, this search has acquired an important computer science component, driven by the rapid growth and democratization of machine learning techniques. Given the objectives set by the Precision Medicine initiative and the new challenges they generate, it is necessary to establish robust, standard and reproducible computational methodologies to achieve them. Currently, predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies. This stage drastically reduces the cost and duration of research in the discovery of new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field gives an idea of where cheminformatics will develop in the short term, the limitations it presents and the positive results it has achieved. The review focuses mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years.

20.
J Proteome Res ; 9(2): 1182-90, 2010 Feb 05.
Article in English | MEDLINE | ID: mdl-19947655

ABSTRACT

Trypanosoma brucei causes African trypanosomiasis in humans (HAT or African sleeping sickness) and Nagana in cattle. The disease threatens over 60 million people and uncounted numbers of cattle in 36 countries of sub-Saharan Africa and has a devastating impact on human health and the economy. Trypanosoma cruzi, in turn, is responsible in South America for Chagas disease, which can cause acute illness and death, especially in young children. In this context, the discovery of novel drug targets in the Trypanosome proteome is a major focus for the scientific community. Recently, many researchers have devoted considerable effort to the study of protein-protein interactions (PPIs) in pathogenic Trypanosome species, concluding that the low sequence identities between some parasite proteins and those of their human host make these PPIs highly promising drug targets. To the best of our knowledge, there are no general models to predict unique PPIs in Trypanosome (TPPIs). At the same time, the 3D structures of an increasing number of Trypanosome proteins are being reported in databases. In this regard, the introduction of a new model to predict TPPIs from the 3D structure of the proteins involved in a PPI is very important. For this purpose, we introduced new protein-protein complex invariants based on the Markov average electrostatic potential $\xi_k(R_i)$ for amino acids located in different regions ($R_i$) of the i-th protein and placed at a distance k from each other. We calculated more than 30 different types of parameters for 7866 pairs of proteins (1023 TPPIs and 6823 non-TPPIs) from more than 20 organisms, including parasites and human or cattle hosts. We found a very simple linear model that predicts above 90% of TPPIs and non-TPPIs both in training and independent test subsets using only two parameters. The parameters were ${}^{d}\xi_k(s) = |\xi_k(s_1) - \xi_k(s_2)|$, the absolute difference between the $\xi_k(s_i)$ values on the surface of the two proteins of each pair. We also tested nonlinear ANN models for comparison purposes, but the linear model gave the best results. We implemented this predictor in the web server named TrypanoPPI, freely available to the public at http://miaja.tic.udc.es/Bio-AIMS/TrypanoPPI.php. This is the first model that predicts how unique a protein-protein complex in the Trypanosome proteome is with respect to other parasites and hosts, opening new opportunities for antitrypanosome drug target discovery.
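
The descriptor described above, the absolute difference between the surface potential values of the two proteins in a pair, and its use in a linear classifier can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the abstract does not give the fitted coefficients, the two values of k, or the decision threshold, so the weights, bias and input values below are hypothetical placeholders, as are all function names.

```python
def delta_xi(xi_s1: float, xi_s2: float) -> float:
    """Absolute difference between the surface potential values
    of the two proteins in a pair, for one value of k."""
    return abs(xi_s1 - xi_s2)


def linear_score(features: list[float], weights: list[float], bias: float) -> float:
    # Weighted sum of the descriptors plus an intercept.
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_unique_ppi(
    surface_potentials: list[tuple[float, float]],
    weights: list[float],
    bias: float,
) -> bool:
    """Classify a protein pair from (xi(s1), xi(s2)) tuples, one per k.
    A positive score is read as 'unique PPI' (assumed sign convention)."""
    features = [delta_xi(a, b) for a, b in surface_potentials]
    return linear_score(features, weights, bias) > 0


if __name__ == "__main__":
    # Hypothetical potentials, weights and bias, for illustration only.
    pair = [(0.5, 2.0), (1.0, 1.0)]
    print(predict_unique_ppi(pair, weights=[1.0, 1.0], bias=-1.0))
```

The real predictor is the TrypanoPPI web server cited above; this sketch only shows the shape of a two-parameter linear model.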


Subject(s)
Internet , Proteins/chemistry , Protozoan Proteins/chemistry , Trypanosoma/chemistry , Markov Chains , Models, Molecular , Neural Networks, Computer , Protein Binding , Static Electricity