Results 1 - 7 of 7

1.
Protein Sci ; 33(9): e5140, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39145441

ABSTRACT

Proteins, fundamental to cellular activities, reveal their function and evolution through their structure and sequence. CATH functional families (FunFams) are coherent clusters of protein domain sequences in which function is conserved across members. The increasing volume and complexity of protein data enabled by large-scale repositories such as MGnify and the AlphaFold Database require more powerful approaches that can scale to the size of these new resources. In this work, we introduce MARC and FRAN, two algorithms developed to build upon and address limitations of GeMMA/FunFHMMER, our original methods for classifying proteins with related functions using a hierarchical approach. We also present CATH-eMMA, which uses embeddings or Foldseek distances to build relationship trees from distance matrices, reducing computational demands and handling various data types effectively. CATH-eMMA offers a highly robust and much faster tool for clustering protein functions on a large scale, providing a new tool for future studies of protein function and evolution.
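The abstract does not say how CATH-eMMA turns a distance matrix into a tree. As a minimal sketch only, assuming Euclidean distances between per-domain embedding vectors and average linkage (UPGMA-style agglomerative clustering), building a hierarchy from pairwise distances and cutting it into flat clusters could look like this. The function names, the `threshold` cut-off, and the toy vectors are hypothetical and not part of CATH-eMMA.

```python
import math
from itertools import combinations


def distance_matrix(embeddings):
    """Pairwise Euclidean distances between embedding vectors (assumed metric)."""
    return [[math.dist(a, b) for b in embeddings] for a in embeddings]


def average_linkage(dist, threshold=math.inf):
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two clusters whose mean pairwise member distance is
    smallest, stopping early if that distance exceeds `threshold` (hypothetical
    cut-off). Returns the flat clusters (sorted tuples of indices) and the merge
    history as (cluster_a, cluster_b, linkage_distance) tuples.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            # Average linkage: mean distance over all cross-cluster member pairs.
            d = sum(dist[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are too far apart to join
        merges.append((clusters[a], clusters[b], d))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters), merges


if __name__ == "__main__":
    # Toy 2-D "embeddings": two tight groups far apart.
    toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (0.0, 0.2)]
    clusters, merges = average_linkage(distance_matrix(toy), threshold=1.0)
    print(clusters)  # [(0, 1, 4), (2, 3)]
```

Leaving `threshold` at its default merges everything into a single root, and the returned merge history then encodes the full tree. This naive loop rescans all cluster pairs on every merge, so it only illustrates the idea and would not scale to repository-sized inputs.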


Subjects
Algorithms , Databases, Protein , Proteins , Proteins/chemistry , Proteins/metabolism , Cluster Analysis , Computational Biology/methods , Protein Domains
2.
Neural Netw ; 165: 463-471, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37336031

ABSTRACT

Universal domain adaptation (UniDA) is a form of unsupervised domain adaptation that selectively transfers knowledge between domains with different label sets. However, existing methods do not predict the labels common to the domains and instead set a threshold manually to discriminate private samples; they therefore rely on the target domain to tune the threshold carefully and ignore the problem of negative transfer. In this paper, to address these problems, we propose a novel classification model named Prediction of Common Labels (PCL) for UniDA, in which the common labels are predicted by Category Separation via Clustering (CSC). In addition, we devise a new evaluation metric, category separation accuracy, to measure the performance of category separation. To mitigate negative transfer, we select source samples by the predicted common labels and use them to fine-tune the model for better domain alignment. At test time, target samples are discriminated using the predicted common labels and the clustering results. Experimental results on three widely used benchmark datasets indicate the effectiveness of the proposed method.


Subjects
Benchmarking , Knowledge , Cluster Analysis
3.
Proc Natl Acad Sci U S A ; 120(12): e2214069120, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36917664

ABSTRACT

Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned by our automatic pipelines and therefore require manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. Nearly a quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AlphaFold (AF) models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).


Subjects
Amino Acids , Proteome , Humans , Proteome/genetics , Sequence Alignment , Databases, Protein
4.
ISPRS J Photogramm Remote Sens ; 195: 192-203, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36726963

ABSTRACT

Remote sensing (RS) image scene classification has attracted increasing attention for its broad application prospects. Conventional fully supervised approaches usually require a large amount of manually labeled data. As more and more RS images become available, how to make full use of these unlabeled data has become an urgent question. Semi-supervised learning, which uses a few labeled samples to guide self-training on numerous unlabeled samples, is an intuitive strategy. However, it is hard to apply to cross-dataset (i.e., cross-domain) scene classification because of the significant domain shift among datasets. Semi-supervised domain adaptation (SSDA), which can reduce the domain shift and transfer knowledge from a fully labeled RS scene dataset (source domain) to an RS scene dataset with limited labels (target domain), is therefore a feasible solution. In this paper, we propose an SSDA method termed bidirectional sample-class alignment (BSCA) for RS cross-domain scene classification. BSCA consists of two alignment strategies, unsupervised alignment (UA) and supervised alignment (SA), both of which contribute to decreasing domain shift. UA concentrates on reducing the maximum mean discrepancy across domains and requires no class labels. In contrast, SA is class-aware and aligns distributions in both directions: from source samples to the associated target class centers and from target samples to the associated source class centers. To validate the effectiveness of the proposed method, extensive ablation, comparison, and visualization experiments are conducted on an RS-SSDA benchmark built upon four widely used RS scene classification datasets. Experimental results indicate that, compared with several state-of-the-art methods, BSCA achieves superior cross-domain classification performance with compact feature representations and low-entropy classification boundaries.
Our code will be available at https://github.com/hw2hwei/BSCA.
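The abstract names maximum mean discrepancy as the quantity UA reduces but gives no kernel or estimator. As a minimal sketch, assuming a single Gaussian kernel with a hypothetical bandwidth `sigma` and the standard biased empirical estimate of squared MMD, MMD² = mean k(s, s') + mean k(t, t') - 2 mean k(s, t), the following shows what such a term measures on plain feature vectors. In BSCA it would be computed on learned features and minimized during training; the function names and toy vectors here are illustrative only.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); sigma is an assumed bandwidth."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def mean_kernel(xs, ys, sigma: float) -> float:
    """Average kernel value over all pairs, including x == y pairs (biased estimate)."""
    return sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2(source, target, sigma: float = 1.0) -> float:
    """Biased empirical estimate of squared MMD between two samples of vectors."""
    return (mean_kernel(source, source, sigma)
            + mean_kernel(target, target, sigma)
            - 2 * mean_kernel(source, target, sigma))


if __name__ == "__main__":
    source = [(0.0, 0.0), (1.0, 0.0)]
    near = [(0.1, 0.0), (1.1, 0.0)]   # slightly shifted "target domain"
    far = [(5.0, 0.0), (6.0, 0.0)]    # strongly shifted "target domain"
    print(mmd2(source, near) < mmd2(source, far))  # True
```

This biased estimate equals the squared distance between the two samples' mean embeddings in the kernel's feature space, so it is never negative and is zero when the samples coincide. The choice of `sigma` controls which scale of discrepancy the term is sensitive to.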

5.
Protein Sci ; 32(2): e4548, 2023 02.
Article in English | MEDLINE | ID: mdl-36539305

ABSTRACT

The recent breakthroughs in structure prediction, where methods such as AlphaFold demonstrated near-atomic accuracy, herald a paradigm shift in structural biology. The 200 million high-accuracy models released in the AlphaFold Database are expected to guide protein science in the coming decades. Partitioning these AlphaFold models into domains and assigning them to an evolutionary hierarchy provides an efficient way to gain functional insights into proteins. However, classifying such a large number of predicted structures challenges the infrastructure of current structure classifications, including our Evolutionary Classification of protein Domains (ECOD). Better computational tools are urgently needed to parse and classify domains from AlphaFold models automatically. Here we present a Domain Parser for AlphaFold Models (DPAM) that can automatically recognize globular domains from these models based on inter-residue distances in 3D structures, predicted aligned errors, and ECOD domains found by sequence (HHsuite) and structural (Dali) similarity searches. Based on a benchmark of 18,759 AlphaFold models, we demonstrate that DPAM can recognize 98.8% of domains and assign correct boundaries for 87.5%, significantly outperforming structure-based domain parsers and homology-based domain assignment using ECOD domains found by HHsuite or Dali. Application of DPAM to the massive set of AlphaFold models will enable efficient classification of domains, providing evolutionary context and facilitating functional studies.


Subjects
Proteins , Software , Databases, Protein , Proteins/chemistry , Protein Domains , Evolution, Molecular
6.
Curr Protoc Bioinformatics ; 61(1): e45, 2018 03.
Article in English | MEDLINE | ID: mdl-30040199

ABSTRACT

ECOD is a database of evolutionary domains from structures deposited in the PDB. Domains in ECOD are classified by a mixed manual/automatic method wherein the bulk of newly deposited structures are classified automatically by protein-protein BLAST. Those structures that cannot be classified automatically are referred to manual curators who use a combination of alignment results, functional analysis, and close reading of the literature to generate novel assignments. ECOD differs from other structural domain resources in that it is continually updated, classifying thousands of proteins per week. ECOD recognizes homology as its key organizing concept, rather than structural or sequence similarity alone. Such a classification scheme provides functional information about proteins of interest by placing them in the correct evolutionary context among all proteins of known structure. This unit demonstrates how to access ECOD via the Web and how to search the database by sequence or structure. It also details the distributable data files available for large-scale bioinformatics users. © 2018 by John Wiley & Sons, Inc.


Subjects
Computational Biology/methods , Databases, Protein , Protein Domains , Proteins/chemistry , Sequence Homology, Amino Acid , Structural Homology, Protein , Amino Acid Sequence , Sequence Alignment
7.
J Affect Disord ; 190: 867-879, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26071797

ABSTRACT

Psychopathology, the investigation and classification of experience, behavior, and symptoms in psychiatric patients, is an old discipline that dates back to the end of the 19th century. Since then, different approaches to psychopathology have been suggested. Recent investigations showing abnormalities in the brain at different levels raise the question of how the gap between brain and psyche, between neural abnormalities and alterations in experience and behavior, can be bridged. Historical approaches such as descriptive (Jaspers) and structural (Minkowski) psychopathology, as well as the more current phenomenological psychopathology (Parnas, Fuchs, Sass, Stanghellini), remain on the side of the psyche, giving detailed descriptions of the phenomenal level of experience while leaving open the link to the brain. In contrast, the recently introduced Research Domain Criteria (RDoC) framework aims to link brain and psyche explicitly by starting from so-called 'neuro-behavioral constructs'. How does Spatiotemporal Psychopathology, as demonstrated in the first paper on depression, stand in relation to these approaches? In a nutshell, Spatiotemporal Psychopathology aims to bridge the gap between brain and psyche. Specifically, as demonstrated for depression in the first paper, the focus is on the spatiotemporal features of the brain's intrinsic activity and how they are transformed into corresponding spatiotemporal features of experience on the phenomenal level and into behavioral changes, which can account well for the symptoms in these patients. This second paper focuses on some of the theoretical background assumptions of Spatiotemporal Psychopathology by directly comparing it with descriptive, structural, and phenomenological psychopathology as well as with RDoC.


Subjects
Brain/physiopathology , Depression/physiopathology , Depression/psychology , Rest/physiology , Humans , Psychopathology , Spatio-Temporal Analysis