On knowing a gene: A distributional hypothesis of gene function.

Kwon, Jason J; Pan, Joshua; Gonzalez, Guadalupe; Hahn, William C; Zitnik, Marinka

Affiliations

Kwon JJ; Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
Pan J; Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
Gonzalez G; Department of Computing, Faculty of Engineering, Imperial College, London SW7 2AZ, UK.
Hahn WC; Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA. Electronic address: william_hahn@dfci.harvard.edu.
Zitnik M; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Department of Biomedical Informatics, Boston, MA 02115, USA; Harvard Data Science Initiative, Harvard University, Cambridge, MA 02138, USA; Kempner Institute for the Study of Natural and Artificial Intelligence, Ha

Cell Syst; 15(6): 488-496, 2024 Jun 19.

Article in English | MEDLINE | ID: mdl-38810640

ABSTRACT

As words can have multiple meanings that depend on sentence context, genes can have various functions that depend on the surrounding biological system. This pleiotropic nature of gene function is limited by ontologies, which annotate gene functions without considering biological contexts. We contend that the gene function problem in genetics may be informed by recent technological leaps in natural language processing, in which representations of word semantics can be automatically learned from diverse language contexts. In contrast to efforts to model semantics as "is-a" relationships in the 1990s, modern distributional semantics represents words as vectors in a learned semantic space and fuels current advances in transformer-based models such as large language models and generative pre-trained transformers. A similar shift in thinking of gene functions as distributions over cellular contexts may enable a similar breakthrough in data-driven learning from large biological datasets to inform gene function.

Subjects

Natural Language Processing; Semantics; Humans; Genes/genetics; Gene Ontology; Computational Biology/methods; Animals

Keywords

artificial intelligence; distributed representations; gene function; large language models; lexical semantics; machine learning; transformers; word embeddings

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Semantics / Natural Language Processing | Limits: Animals / Humans | Language: English | Journal: Cell Syst | Year of publication: 2024 | Document type: Article | Country of affiliation: United States
