Results 1 - 8 of 8
1.
PLoS Biol ; 20(12): e3001901, 2022 12.
Article in English | MEDLINE | ID: mdl-36508416

ABSTRACT

Does reductionism, in the era of machine learning and now interpretable AI, facilitate or hinder scientific insight? The protein ribbon diagram, as a means of visual reductionism, is a case in point.


Subjects
Machine Learning, Synapses
2.
BMC Bioinformatics ; 25(1): 11, 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38177985

ABSTRACT

BACKGROUND: Machine learning (ML) has a rich history in structural bioinformatics, and modern approaches, such as deep learning, are revolutionizing our knowledge of the subtle relationships between biomolecular sequence, structure, function, dynamics and evolution. As with any advance that rests upon statistical learning approaches, the recent progress in biomolecular sciences is enabled by the availability of vast volumes of sufficiently-variable data. To be useful, such data must be well-structured, machine-readable, intelligible and manipulable. These and related requirements pose challenges that become especially acute at the computational scales typical in ML. Furthermore, in structural bioinformatics such data generally relate to protein three-dimensional (3D) structures, which are inherently more complex than sequence-based data. A significant and recurring challenge concerns the creation of large, high-quality, openly-accessible datasets that can be used for specific training and benchmarking tasks in ML pipelines for predictive modeling projects, along with reproducible splits for training and testing.
RESULTS: Here, we report 'Prop3D', a platform that allows for the creation, sharing and extensible reuse of libraries of protein domains, featurized with biophysical and evolutionary properties that can range from detailed, atomically-resolved physicochemical quantities (e.g., electrostatics) to coarser, residue-level features (e.g., phylogenetic conservation). As a community resource, we also supply a 'Prop3D-20sf' protein dataset, obtained by applying our approach to CATH. We have developed and deployed the Prop3D framework, both in the cloud and on local HPC resources, to systematically and reproducibly create comprehensive datasets via the Highly Scalable Data Service (HSDS). Our datasets are freely accessible via a public HSDS instance, or they can be used with accompanying Python wrappers for popular ML frameworks.
CONCLUSION: Prop3D and its associated Prop3D-20sf dataset can be of broad utility in at least three ways. Firstly, the Prop3D workflow code can be customized and deployed on various cloud-based compute platforms, with scalability achieved largely by saving the results to distributed HDF5 files via HSDS. Secondly, the linked Prop3D-20sf dataset provides a hand-crafted, already-featurized dataset of protein domains for 20 highly-populated CATH families; importantly, provision of this pre-computed resource can aid the more efficient development (and reproducible deployment) of ML pipelines. Thirdly, Prop3D-20sf's construction explicitly takes into account (in creating datasets and data-splits) the enigma of 'data leakage', stemming from the evolutionary relationships between proteins.
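The 'data leakage' concern can be illustrated with a group-aware split, in which related proteins are never divided between training and test sets. The sketch below is only an illustration of that general idea, not the Prop3D procedure (the abstract does not describe how its splits are built). The function name `split_by_group`, the domain identifiers, and the family labels are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_group(domains, test_fraction=0.2, seed=0):
    """Split (domain_id, group_id) pairs so that no group spans both sets."""
    by_group = defaultdict(list)
    for domain_id, group_id in domains:
        by_group[group_id].append(domain_id)

    # Sort before shuffling so the same seed always yields the same split.
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)

    # Hold out whole groups, never individual domains.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])

    train = [d for g in groups if g not in test_groups for d in by_group[g]]
    test = [d for g in groups if g in test_groups for d in by_group[g]]
    return train, test, test_groups


# Hypothetical domain and family identifiers, for illustration only.
domains = [
    ("dom_a1", "family_A"), ("dom_a2", "family_A"),
    ("dom_b1", "family_B"),
    ("dom_c1", "family_C"), ("dom_c2", "family_C"),
    ("dom_d1", "family_D"),
    ("dom_e1", "family_E"),
]
train, test, test_groups = split_by_group(domains)
```

Because whole groups are held out, a model evaluated on `test` never sees a close evolutionary relative of a training example, at least at the granularity of the chosen group label. Fixing `seed` keeps the split reproducible.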


Subjects
Computational Biology, Proteins, Humans, Phylogeny, Computational Biology/methods, Workflow, Machine Learning
3.
Annu Rev Pharmacol Toxicol ; 57: 245-262, 2017 01 06.
Article in English | MEDLINE | ID: mdl-27814027

ABSTRACT

Systems pharmacology aims to holistically understand mechanisms of drug actions to support drug discovery and clinical practice. Systems pharmacology modeling (SPM) is data driven. It integrates an exponentially growing amount of data at multiple scales (genetic, molecular, cellular, organismal, and environmental). The goal of SPM is to develop mechanistic or predictive multiscale models that are interpretable and actionable. The current explosions in genomics and other omics data, as well as the tremendous advances in big data technologies, have already enabled biologists to generate novel hypotheses and gain new knowledge through computational models of genome-wide, heterogeneous, and dynamic data sets. More work is needed to interpret and predict a drug response phenotype, which is dependent on many known and unknown factors. To gain a comprehensive understanding of drug actions, SPM requires close collaborations between domain experts from diverse fields and integration of heterogeneous models from biophysics, mathematics, statistics, machine learning, and semantic webs. This creates challenges in model management, model integration, model translation, and knowledge integration. In this review, we discuss several emergent issues in SPM and potential solutions using big data technology and analytics. The concurrent development of high-throughput techniques, cloud computing, data science, and the semantic web will likely allow SPM to be findable, accessible, interoperable, reusable, reliable, interpretable, and actionable.


Subjects
Statistical Data Interpretation, Factual Databases/statistics & numerical data, Clinical Pharmacology/methods, Systems Biology/methods, Animals, High-Throughput Screening Assays/methods, High-Throughput Screening Assays/trends, Humans, Clinical Pharmacology/trends, Systems Biology/trends
4.
Bioinformatics ; 35(9): 1582-1584, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30304492

ABSTRACT

SUMMARY: Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments, calculation and evaluation of evolutionary couplings (ECs), and de novo prediction of structure and mutation effects. The combination of an easy to use, flexible command line interface and an underlying modular Python package makes the full power of coevolutionary analyses available to entry-level and advanced users. AVAILABILITY AND IMPLEMENTATION: https://github.com/debbiemarkslab/evcouplings.
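To make the idea of coevolution between alignment positions concrete, the sketch below computes mutual information between two columns of a toy alignment. This is a deliberately simple stand-in statistic, not the EVcouplings method, and it does not use the EVcouplings API. The function name `mutual_information` and the toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts to avoid extra divisions.
        mi += p_ab * math.log2(p_ab * n * n / (col_i[a] * col_j[b]))
    return mi


# Toy alignment (hypothetical sequences): columns 0 and 1 vary together,
# while columns 0 and 3 vary independently.
toy_alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]

coupled = mutual_information(toy_alignment, 0, 1)      # 1.0 bit
independent = mutual_information(toy_alignment, 0, 3)  # 0.0 bits
```

Pairwise statistics like this cannot separate direct from indirect correlations and ignore phylogenetic bias, which is one reason dedicated coevolution frameworks rely on global statistical models instead.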


Subjects
Sequence Analysis, Software, Proteins, RNA, Sequence Alignment
5.
F1000Res ; 8, 2019.
Article in English | MEDLINE | ID: mdl-30647915

ABSTRACT

The Student Council of the International Society for Computational Biology (ISCB-SC) is a student-focused organization for researchers at all early-career stages of training (undergraduates, master's and PhD students, and postdocs) that organizes bioinformatics and computational biology activities across the globe. Among its activities, the ISCB-SC organizes several symposia on different continents, often with the help of the Regional Student Groups (RSGs) based in each region. In this editorial we highlight key moments and lessons learned from the 14th Student Council Symposium (SCS, Chicago, USA), the 5th European Student Council Symposium (ESCS, Athens, Greece) and the 3rd Latin American Student Council Symposium (LA-SCS, Viña del Mar, Chile).


Subjects
Computational Biology, Leadership, Students, Chile, Humans, Research Personnel
6.
Curr Opin Struct Biol ; 52: 95-102, 2018 10.
Article in English | MEDLINE | ID: mdl-30267935

ABSTRACT

Data science has emerged from the proliferation of digital data, coupled with advances in algorithms, software and hardware (e.g., GPU computing). Innovations in structural biology have been driven by similar factors, spurring us to ask: can these two fields impact one another in deep and hitherto unforeseen ways? We posit that the answer is yes. New biological knowledge lies in the relationships between sequence, structure, function and disease, all of which play out on the stage of evolution, and data science enables us to elucidate these relationships at scale. Here, we consider the above question from the five key pillars of data science: acquisition, engineering, analytics, visualization and policy, with an emphasis on machine learning as the premier analytics approach.


Subjects
Biology/methods, Computational Biology/methods, Molecular Structure, Software, Algorithms, Data Science, Humans, Machine Learning, Reproducibility of Results
7.
Acta Crystallogr D Struct Biol ; 73(Pt 2): 123-130, 2017 02 01.
Article in English | MEDLINE | ID: mdl-28177308

ABSTRACT

Chemical restraints for use in macromolecular structure refinement are produced by a variety of methods, including a number of programs that use chemical information to generate the required bond, angle, dihedral, chiral and planar restraints. These programs help to automate the process and therefore minimize the errors that could otherwise occur if it were performed manually. Furthermore, restraint-dictionary generation programs can incorporate chemical and other prior knowledge to provide reasonable choices of types and values. However, the use of restraints to define the geometry of a molecule is an approximation introduced with efficiency in mind. The representation of a bond as a parabolic function is a convenience and does not reflect the true variability in even the simplest of molecules. Another complicating factor is the interplay of the molecule with other parts of the macromolecular model. Finally, difficult situations arise from molecules with rare or unusual moieties that may not have their conformational space fully explored. These factors give rise to the need for an interactive editor for WYSIWYG interactions with the restraints and molecule. Restraints Editor, Especially Ligands (REEL) is a graphical user interface for simple and error-free editing along with additional features to provide greater control of the restraint dictionaries in macromolecular refinement.
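The "parabolic function" mentioned above can be sketched as a harmonic penalty on the deviation of an observed bond length from its ideal value, scaled by an estimated standard deviation. The code below is a generic illustration of that approximation, not code from REEL or from any refinement program. The function name and the numeric values (an ideal length of 1.53 Å with a sigma of 0.02 Å) are illustrative assumptions.

```python
def harmonic_restraint_penalty(observed, ideal, sigma):
    """Parabolic penalty ((observed - ideal) / sigma) ** 2 for one restraint."""
    return ((observed - ideal) / sigma) ** 2


# Illustrative values only: ideal bond length and sigma in angstroms.
IDEAL_LENGTH = 1.53
SIGMA = 0.02

# The parabola penalizes stretching and compression by the same amount,
# which is part of why it is only an approximation of real bond behavior.
stretched = harmonic_restraint_penalty(1.57, IDEAL_LENGTH, SIGMA)
compressed = harmonic_restraint_penalty(1.49, IDEAL_LENGTH, SIGMA)
at_ideal = harmonic_restraint_penalty(1.53, IDEAL_LENGTH, SIGMA)
```

Editing a restraint dictionary amounts to changing values such as `ideal` and `sigma` for individual bonds, angles, or other geometric terms.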


Subjects
Crystallography/methods, Software, Carbohydrate Conformation, Protein Databases, Molecular Models, Polysaccharides/chemistry, Protein Conformation, Proteins/chemistry
8.
Article in English | MEDLINE | ID: mdl-26989147

ABSTRACT

Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher order chromatin aggregates. Moreover, histones are intricately involved in chromatin functioning and provide a means for genome dynamic regulation through specific histone variants and histone post-translational modifications. 'HistoneDB 2.0--with variants' is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant. HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0.


Subjects
Protein Databases, Genetic Variation, Histones/genetics, Amino Acid Sequence, Animals, Data Mining, Histones/chemistry, Humans, Internet, Molecular Sequence Annotation, Molecular Sequence Data, Nucleosomes/chemistry, Sequence Alignment