Results 1 - 20 of 3,561
1.
Cell ; 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39326416

ABSTRACT

Interpretation of disease-causing genetic variants remains a challenge in human genetics. Current costs and complexity of deep mutational scanning methods are obstacles for achieving genome-wide resolution of variants in disease-related genes. Our framework, saturation mutagenesis-reinforced functional assays (SMuRF), offers simple and cost-effective saturation mutagenesis paired with streamlined functional assays to enhance the interpretation of unresolved variants. Applying SMuRF to neuromuscular disease genes FKRP and LARGE1, we generated functional scores for all possible coding single-nucleotide variants, which aid in resolving clinically reported variants of uncertain significance. SMuRF also demonstrates utility in predicting disease severity, resolving critical structural regions, and providing training datasets for the development of computational predictors. Overall, our approach enables variant-to-function insights for disease genes in a cost-effective manner that can be broadly implemented by standard research laboratories.
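The phrase "all possible coding single-nucleotide variants" has a simple combinatorial meaning: each base can change to three others, so a coding sequence of n bases has 3n such variants. A minimal Python sketch of that enumeration follows. The function name `all_snvs` and the six-base toy sequence are illustrative assumptions and are not taken from the SMuRF work.

```python
BASES = "ACGT"

def all_snvs(cds):
    """Yield (position, ref, alt) for every single-nucleotide substitution.

    Positions are 1-based, matching the usual coding-sequence convention.
    """
    for position, ref in enumerate(cds, start=1):
        for alt in BASES:
            if alt != ref:
                yield position, ref, alt

# Hypothetical toy coding sequence, used only to show the count.
toy_cds = "ATGGCC"
variants = list(all_snvs(toy_cds))
print(len(variants), "variants for", len(toy_cds), "bases")
```

A real saturation mutagenesis design would start from the full reference coding sequence of the gene rather than a toy string.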

2.
Cell ; 172(3): 478-490.e15, 2018 01 25.
Article in English | MEDLINE | ID: mdl-29373829

ABSTRACT

Understanding the sequence determinants that give rise to diversity among individuals and species is the central challenge of genetics. However, despite ever greater numbers of sequenced genomes, most genome-wide association studies cannot distinguish causal variants from linked passenger mutations spanning many genes. We report that this inherent challenge can be overcome in model organisms. By pushing the advantages of inbred crossing to its practical limit in Saccharomyces cerevisiae, we improved the statistical resolution of linkage analysis to single nucleotides. This "super-resolution" approach allowed us to map 370 causal variants across 26 quantitative traits. Missense, synonymous, and cis-regulatory mutations collectively gave rise to phenotypic diversity, providing mechanistic insight into the basis of evolutionary divergence. Our data also systematically unmasked complex genetic architectures, revealing that multiple closely linked driver mutations frequently act on the same quantitative trait. Single-nucleotide mapping thus complements traditional deletion and overexpression screening paradigms and opens new frontiers in quantitative genetics.


Subjects
Genetic Linkage, Mutation, Phenotype, Genetic Polymorphism, Chromosome Mapping/methods, Genome-Wide Association Study/methods, Heritable Quantitative Trait, Saccharomyces cerevisiae/genetics
3.
Trends Genet ; 40(7): 587-600, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38658256

ABSTRACT

Population-scale sequencing efforts have catalogued substantial genetic variation in humans such that variant discovery dramatically outpaces interpretation. We discuss how single-cell sequencing is poised to reveal genetic mechanisms at a rate that may soon approach that of variant discovery. The functional genomics toolkit is sufficiently modular to systematically profile almost any type of variation within increasingly diverse contexts and with molecularly comprehensive and unbiased readouts. As a result, we can construct deep phenotypic atlases of variant effects that span the entire regulatory cascade. The same conceptual approach to interpreting genetic variation should be applied to engineering therapeutic cell states. In this way, variant mechanism discovery and cell state engineering will become reciprocating and iterative processes towards genomic medicine.


Subjects
Genetic Variation, Single-Cell Analysis, Humans, Single-Cell Analysis/methods, Genomics/methods, Human Genome/genetics, Phenotype
4.
Am J Hum Genet ; 111(9): 2031-2043, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39173626

ABSTRACT

In silico variant effect predictions are available for nearly all missense variants but played a minimal role in clinical variant classification because they were deemed to provide only supporting evidence. Recently, the ClinGen Sequence Variant Interpretation (SVI) Working Group updated recommendations for variant effect prediction use. By analyzing control pathogenic and benign variants across all genes, they were able to compute evidence strength for predictor score intervals with some intervals generating moderate, strong, or even very strong evidence. However, this genome-wide approach could obscure heterogeneous predictor performance in different genes. We quantified the gene-by-gene performance of two top predictors, REVEL and BayesDel, by analyzing control variants in each predictor score interval in 3,668 disease-relevant genes. Approximately 10% of intervals had sufficient control variants for analysis, and ∼70% of these intervals exceeded the maximum number of incorrect predictions implied by the SVI recommendations. These trending discordant intervals arose owing to the divergence of the gene-specific distribution of predictions from the genome-wide distribution, suggesting that gene-specific calibration is needed in many cases. Approximately 22% of ClinVar missense variants of uncertain significance in genes we analyzed (REVEL = 100,629, BayesDel = 71,928) had predictions in trending discordant intervals. Thus, genome-wide calibrations could result in many variants receiving inappropriate evidence strength. To facilitate a review of the SVI's calibrations, we developed a web application enabling visualization of gene-specific predictions and trending concordant and discordant intervals.


Subjects
Genome-Wide Association Study, Humans, Genome-Wide Association Study/methods, Human Genome, Missense Mutation, Genetic Variation, Calibration, Software, Genetic Databases
5.
Am J Hum Genet ; 111(2): 350-363, 2024 02 01.
Article in English | MEDLINE | ID: mdl-38237594

ABSTRACT

Our ability to determine the clinical impact of variants in 3' untranslated regions (UTRs) of genes remains poor. We provide a thorough analysis of 3' UTR variants from several datasets. Variants in putative regulatory elements, including RNA-binding protein motifs, eCLIP peaks, and microRNA sites, are up to 16 times more likely than variants not in these elements to have gene expression and phenotype associations. Variants in regulatory motifs result in allele-specific protein binding in cell lines and allele-specific gene expression differences in population studies. In addition, variants in shared regions of alternatively polyadenylated isoforms and those proximal to polyA sites are more likely to affect gene expression and phenotype. Finally, pathogenic 3' UTR variants in ClinVar are up to 20 times more likely than benign variants to fall in a regulatory site. We incorporated these findings into RegVar, a software tool that interprets regulatory elements and annotations for any 3' UTR variant and predicts whether the variant is likely to affect gene expression or phenotype. This tool will help prioritize variants for experimental studies and identify pathogenic variants in individuals.


Subjects
MicroRNAs, Humans, 3' Untranslated Regions/genetics, MicroRNAs/genetics, Nucleic Acid Regulatory Sequences/genetics, Cell Line, Protein Binding
6.
Am J Hum Genet ; 111(10): 2164-2175, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39226898

ABSTRACT

Variants that alter gene splicing are estimated to comprise up to a third of all disease-causing variants, yet they are hard to predict from DNA sequencing data alone. To overcome this, many groups are incorporating RNA-based analyses, which are resource intensive, particularly for diagnostic laboratories. There are thousands of functionally validated variants that induce mis-splicing; however, this information is not consolidated, and they are under-represented in ClinVar, which presents a barrier to variant interpretation and can result in duplication of validation efforts. To address this issue, we developed SpliceVarDB, an online database consolidating over 50,000 variants assayed for their effects on splicing in over 8,000 human genes. We evaluated over 500 published data sources and established a spliceogenicity scale to standardize, harmonize, and consolidate variant validation data generated by a range of experimental protocols. According to the strength of their supporting evidence, variants were classified as "splice-altering" (∼25%), "not splice-altering" (∼25%), and "low-frequency splice-altering" (∼50%), which correspond to weak or indeterminate evidence of spliceogenicity. Importantly, 55% of the splice-altering variants in SpliceVarDB are outside the canonical splice sites (5.6% are deep intronic). These variants can support the variant curation diagnostic pathway and can be used to provide the high-quality data necessary to develop more accurate in silico splicing predictors. The variants are accessible through an online platform, SpliceVarDB, with additional features for visualization, variant information, in silico predictions, and validation metrics. SpliceVarDB is a very large collection of splice-altering variants and is available at https://splicevardb.org.


Subjects
Genetic Databases, RNA Splicing, Humans, RNA Splicing/genetics, Genetic Variation, Alternative Splicing/genetics, Software
7.
Am J Hum Genet ; 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39317201

ABSTRACT

The ClinGen Hereditary Breast, Ovarian, and Pancreatic Cancer (HBOP) Variant Curation Expert Panel (VCEP) is composed of internationally recognized experts in clinical genetics, molecular biology, and variant interpretation. This VCEP made specifications for the American College of Medical Genetics and Association for Molecular Pathology (ACMG/AMP) guidelines for the ataxia telangiectasia mutated (ATM) gene according to the ClinGen protocol. These gene-specific rules for ATM were modified from the ACMG/AMP guidelines and were tested against 33 ATM variants of various types and classifications in a pilot curation phase. The pilot revealed a majority agreement between the HBOP VCEP classifications and the ClinVar-deposited classifications. Six pilot variants had conflicting interpretations in ClinVar, and re-evaluation with the VCEP's ATM-specific rules resulted in four that were classified as benign, one as likely pathogenic, and one as a variant of uncertain significance (VUS) by the VCEP, improving the certainty of interpretations in the public domain. Overall, 28 of the 33 pilot variants were not VUS, leading to an 85% classification rate. The ClinGen-approved, modified rules demonstrated value for improved interpretation of variants in ATM.

8.
Hum Mol Genet ; 33(5): 426-434, 2024 Feb 18.
Article in English | MEDLINE | ID: mdl-37956408

ABSTRACT

BACKGROUND: Pathogenic germline variants in BRCA1-Associated Protein 1 (BAP1) cause BAP1 tumor predisposition syndrome (BAP1-TPDS). Carriers are at particular risk of uveal melanoma (UM), cutaneous melanoma, malignant mesothelioma, and clear cell renal carcinoma. Approximately half of the increasingly reported BAP1 variants lack accurate classification. Correct interpretation of pathogenicity, together with a better understanding of BAP1-TPDS, can improve patient prognosis through tumor screening. METHODS: Using CRISPR-Cas9, we edited into HAP1 cells five rare BAP1 variants with differing functional characteristics that had been identified in patients with UM, and assayed their effect on cell adhesion/spreading (at 4 h) and proliferation (at 48 h), measured as cell index (CI), using the xCELLigence real-time analysis system. RESULTS: In BAP1 knockout HAP1 cultures, cell number was half that of wild type (WT) cultures at 48 h (p = 0.00021), confluence was reached later, and CI was reduced by 78% (p < 0.0001). The BAP1-TPDS-associated null variants c.67+1G>T and c.1780_1781insT, and the likely pathogenic missense variant c.281A>G, reduced adhesion (all p ≤ 0.015) and reduced proliferation by 74%-83% (all p ≤ 0.032). Another likely pathogenic missense variant, c.680G>A, reduced both by at least 50% (all p ≤ 0.032), whereas cells edited with the likely benign variant c.1526C>T grew similarly to WT. CONCLUSIONS: BAP1 is essential for optimal fitness of HAP1 cells. Pathogenic and likely pathogenic BAP1 variants reduced cell fitness, reflected in adhesion/spreading and proliferation properties. Further, moderate effects were quantifiable. Variant modelling in HAP1 with CRISPR-Cas9 enabled functional analysis of coding and non-coding region variants in an endogenous expression system.


Subjects
Kidney Neoplasms, Melanoma, Skin Neoplasms, Uveal Neoplasms, Humans, Melanoma/pathology, Virulence, Genetic Predisposition to Disease, Germ-Line Mutation/genetics, Ubiquitin Thiolesterase/genetics, Ubiquitin Thiolesterase/metabolism, Tumor Suppressor Proteins/genetics
9.
Trends Genet ; 39(6): 442-450, 2023 06.
Article in English | MEDLINE | ID: mdl-36858880

ABSTRACT

Genomic studies of human disorders are often performed by distinct research communities (i.e., focused on rare diseases, common diseases, or cancer). Despite underlying differences in the mechanistic origin of different disease categories, these studies share the goal of identifying causal genomic events that are critical for the clinical manifestation of the disease phenotype. Moreover, these studies face common challenges, including understanding the complex genetic architecture of the disease, deciphering the impact of variants on multiple scales, and interpreting noncoding mutations. Here, we highlight these challenges in depth and argue that properly addressing them will require a more unified vocabulary and approach across disease communities. Toward this goal, we present a unified perspective on relating variant impact to various genomic disorders.


Subjects
Genome, Genomics, Humans, Mutation, Phenotype
10.
Annu Rev Genomics Hum Genet ; 24: 151-176, 2023 08 25.
Article in English | MEDLINE | ID: mdl-37285546

ABSTRACT

DECIPHER (Database of Genomic Variation and Phenotype in Humans Using Ensembl Resources) shares candidate diagnostic variants and phenotypic data from patients with genetic disorders to facilitate research and improve the diagnosis, management, and therapy of rare diseases. The platform sits at the boundary between genomic research and the clinical community. DECIPHER aims to ensure that the most up-to-date data are made rapidly available within its interpretation interfaces to improve clinical care. Newly integrated cardiac case-control data that provide evidence of gene-disease associations and inform variant interpretation exemplify this mission. New research resources are presented in a format optimized for use by a broad range of professionals supporting the delivery of genomic medicine. The interfaces within DECIPHER integrate and contextualize variant and phenotypic data, helping to determine a robust clinico-molecular diagnosis for rare-disease patients, which combines both variant classification and clinical fit. DECIPHER supports discovery research, connecting individuals within the rare-disease community to pursue hypothesis-driven research.


Subjects
Genomics, Genomics/methods, Humans, Rare Diseases/genetics, Alleles, Practice Guidelines as Topic, DNA Copy Number Variations, Genetic Databases
11.
Am J Hum Genet ; 110(9): 1496-1508, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37633279

ABSTRACT

Predicted loss of function (pLoF) variants are often highly deleterious and play an important role in disease biology, but many pLoF variants may not result in loss of function (LoF). Here we present a framework that advances interpretation of pLoF variants in research and clinical settings by considering three categories of LoF evasion: (1) predicted rescue by secondary sequence properties, (2) uncertain biological relevance, and (3) potential technical artifacts. We also provide recommendations on adjustments to ACMG/AMP guidelines' PVS1 criterion. Applying this framework to all high-confidence pLoF variants in 22 genes associated with autosomal-recessive disease from the Genome Aggregation Database (gnomAD v.2.1.1) revealed predicted LoF evasion or potential artifacts in 27.3% (304/1,113) of variants. The major reasons were location in the last exon, in a homopolymer repeat, in a low proportion expressed across transcripts (pext) scored region, or the presence of cryptic in-frame splice rescues. Variants predicted to evade LoF or to be potential artifacts were enriched for ClinVar benign variants. PVS1 was downgraded in 99.4% (162/163) of pLoF variants predicted as likely not LoF/not LoF, with 17.2% (28/163) downgraded as a result of our framework, adding to previous guidelines. Variant pathogenicity was affected (mostly from likely pathogenic to VUS) in 20 (71.4%) of these 28 variants. This framework guides assessment of pLoF variants beyond standard annotation pipelines and substantially reduces false positive rates, which is key to ensure accurate LoF variant prediction in both a research and clinical setting.


Subjects
Inheritance Patterns, Humans, Exons, Uncertainty
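
The percentages quoted in the abstract of entry 11 follow directly from the counts it reports. The short Python check below recomputes them; the dictionary labels are paraphrases chosen for illustration, while the counts are the ones stated in the abstract.

```python
# Counts as reported in the abstract: (numerator, denominator).
reported = {
    "predicted LoF evasion or potential artifact": (304, 1113),
    "PVS1 downgraded among likely not LoF / not LoF": (162, 163),
    "downgraded as a result of the new framework": (28, 163),
    "pathogenicity classification affected": (20, 28),
}

# Convert each pair to a percentage rounded to one decimal place.
percentages = {
    label: round(100 * numerator / denominator, 1)
    for label, (numerator, denominator) in reported.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The recomputed values agree with the 27.3%, 99.4%, 17.2%, and 71.4% given in the abstract.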
12.
Am J Hum Genet ; 110(10): 1769-1786, 2023 10 05.
Article in English | MEDLINE | ID: mdl-37729906

ABSTRACT

Defects in hydroxymethylbilane synthase (HMBS) can cause acute intermittent porphyria (AIP), an acute neurological disease. Although sequencing-based diagnosis can be definitive, ∼⅓ of clinical HMBS variants are missense variants, and most clinically reported HMBS missense variants are designated as "variants of uncertain significance" (VUSs). Using saturation mutagenesis, en masse selection, and sequencing, we applied a multiplexed validated assay to both the erythroid-specific and ubiquitous isoforms of HMBS, obtaining confident functional impact scores for >84% of all possible amino acid substitutions. The resulting variant effect maps generally agreed with biochemical expectations and provide further evidence that HMBS can function as a monomer. Additionally, the maps implicated specific residues as having roles in active site dynamics, which was further supported by molecular dynamics simulations. Most importantly, these maps can help discriminate pathogenic from benign HMBS variants, proactively providing evidence even for yet-to-be-observed clinical missense variants.


Subjects
Hydroxymethylbilane Synthase, Acute Intermittent Porphyria, Humans, Hydroxymethylbilane Synthase/chemistry, Hydroxymethylbilane Synthase/genetics, Hydroxymethylbilane Synthase/metabolism, Missense Mutation/genetics, Acute Intermittent Porphyria/diagnosis, Acute Intermittent Porphyria/genetics, Amino Acid Substitution, Molecular Dynamics Simulation
13.
Am J Hum Genet ; 110(6): 940-949, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37236177

ABSTRACT

While pathogenic variants can significantly increase disease risk, it is still challenging to estimate the clinical impact of rare missense variants more generally. Even in genes such as BRCA2 or PALB2, large cohort studies find no significant association between breast cancer and rare missense variants collectively. Here, we introduce REGatta, a method to estimate clinical risk from variants in smaller segments of individual genes. We first define these regions by using the density of pathogenic diagnostic reports and then calculate the relative risk in each region by using over 200,000 exome sequences in the UK Biobank. We apply this method in 13 genes with established roles across several monogenic disorders. In genes with no significant difference at the gene level, this approach significantly separates disease risk for individuals with rare missense variants at higher or lower risk (BRCA2 regional model OR = 1.46 [1.12, 1.79], p = 0.0036 vs. BRCA2 gene model OR = 0.96 [0.85, 1.07] p = 0.4171). We find high concordance between these regional risk estimates and high-throughput functional assays of variant impact. We compare our method with existing methods and the use of protein domains (Pfam) as regions and find REGatta better identifies individuals at elevated or reduced risk. These regions provide useful priors and are potentially useful for improving risk assessment for genes associated with monogenic diseases.


Subjects
Breast Neoplasms, Genetic Predisposition to Disease, Humans, Female, BRCA2 Protein/genetics, Missense Mutation, DNA Sequence Analysis, Breast Neoplasms/genetics, Breast Neoplasms/pathology, Cohort Studies
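
The abstract of entry 13 reports risk as odds ratios (OR) with intervals. As a reminder of what such a figure summarizes, the Python sketch below computes an odds ratio and a Wald 95% confidence interval from a 2x2 table of carriers and non-carriers. This is a generic textbook calculation, not the REGatta method, and the abstract does not say how its intervals were derived. The function name `odds_ratio_wald` and all counts are hypothetical.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: carriers with disease      b: carriers without disease
    c: non-carriers with disease  d: non-carriers without disease
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to exercise the function.
estimate, low, high = odds_ratio_wald(20, 80, 10, 90)
print(f"OR = {estimate:.2f} [{low:.2f}, {high:.2f}]")
```

An interval that includes 1, as in the gene-level BRCA2 model quoted in the abstract, indicates no detectable difference in odds between the two groups.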
14.
Am J Hum Genet ; 110(1): 92-104, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36563679

ABSTRACT

Variant interpretation remains a major challenge in medical genetics. We developed Meta-Domain HotSpot (MDHS) to identify mutational hotspots across homologous protein domains. We applied MDHS to a dataset of 45,221 de novo mutations (DNMs) from 31,058 individuals with neurodevelopmental disorders (NDDs) and identified three significantly enriched missense DNM hotspots in the ion transport protein domain family (PF00520). The 37 unique missense DNMs that drive enrichment affect 25 genes, 19 of which were previously associated with NDDs. 3D protein structure modeling supports the hypothesis of function-altering effects of these mutations. Hotspot genes have a unique expression pattern in tissue, and we used this pattern alongside in silico predictors and population constraint information to identify candidate NDD-associated genes. We also propose a lenient version of our method, which identifies 32 hotspot positions across 16 different protein domains. These positions are enriched for likely pathogenic variation in clinical databases and DNMs in other genetic disorders.


Subjects
Neurodevelopmental Disorders, Humans, Protein Domains/genetics, Mutation/genetics, Neurodevelopmental Disorders/genetics
15.
Am J Hum Genet ; 110(5): 863-879, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37146589

ABSTRACT

Deleterious mutations in the X-linked gene encoding ornithine transcarbamylase (OTC) cause the most common urea cycle disorder, OTC deficiency. This rare but highly actionable disease can present with severe neonatal onset in males or with later onset in either sex. Individuals with neonatal onset appear normal at birth but rapidly develop hyperammonemia, which can progress to cerebral edema, coma, and death, outcomes ameliorated by rapid diagnosis and treatment. Here, we develop a high-throughput functional assay for human OTC and individually measure the impact of 1,570 variants, 84% of all SNV-accessible missense mutations. Comparison to existing clinical significance calls demonstrated that our assay distinguishes known benign from pathogenic variants and variants with neonatal onset from late-onset disease presentation. This functional stratification allowed us to identify score ranges corresponding to clinically relevant levels of impairment of OTC activity. Examining the results of our assay in the context of protein structure further allowed us to identify a 13 amino acid domain, the SMG loop, whose function appears to be required in human cells but not in yeast. Finally, inclusion of our data as PS3 evidence under the current ACMG guidelines, in a pilot reclassification of 34 variants with complete loss of activity, would change the classification of 22 from variants of unknown significance to clinically actionable likely pathogenic variants. These results illustrate how large-scale functional assays are especially powerful when applied to rare genetic diseases.


Subjects
Hyperammonemia, Ornithine Carbamoyltransferase Deficiency Disease, Ornithine Carbamoyltransferase, Humans, Amino Acid Substitution, Hyperammonemia/etiology, Hyperammonemia/genetics, Missense Mutation/genetics, Ornithine Carbamoyltransferase/genetics, Ornithine Carbamoyltransferase Deficiency Disease/genetics, Ornithine Carbamoyltransferase Deficiency Disease/diagnosis, Ornithine Carbamoyltransferase Deficiency Disease/therapy
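
The abstract of entry 15 refers to "SNV-accessible missense mutations", meaning amino acid changes reachable by altering a single nucleotide of a codon. The Python sketch below shows how that set can be enumerated for one codon using the standard genetic code. The function name `snv_accessible_missense` is a hypothetical helper for illustration and is not part of the published assay.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def snv_accessible_missense(codon):
    """Return the amino acids reachable from `codon` by one base change.

    Synonymous changes and stop codons ('*') are excluded, leaving only
    missense substitutions.
    """
    ref_aa = CODON_TABLE[codon]
    reachable = set()
    for index, ref_base in enumerate(codon):
        for alt_base in BASES:
            if alt_base == ref_base:
                continue
            alt_codon = codon[:index] + alt_base + codon[index + 1:]
            alt_aa = CODON_TABLE[alt_codon]
            if alt_aa != ref_aa and alt_aa != "*":
                reachable.add(alt_aa)
    return reachable

# Example: the start codon ATG (methionine).
met_neighbors = snv_accessible_missense("ATG")
print(sorted(met_neighbors))
```

Summing the sizes of these sets over every codon of a coding sequence gives the denominator against which a coverage figure such as 84% would be measured.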
16.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39234953

ABSTRACT

The internal ribosome entry site (IRES) is a cis-regulatory element that can initiate translation in a cap-independent manner. It is often related to cellular processes and many diseases. Identifying IRES elements is therefore important for understanding their mechanism and for finding potential therapeutic strategies for relevant diseases, but identifying them experimentally is time-consuming and laborious. Many bioinformatics tools have been developed to predict IRES elements, but all of these tools are based on structure similarity or machine learning algorithms. Here, we introduce a deep learning model named DeepIRES for precisely identifying IRES elements in messenger RNA (mRNA) sequences. DeepIRES is a hybrid model incorporating dilated 1D convolutional neural network blocks, bidirectional gated recurrent units, and a self-attention module. Tenfold cross-validation results suggest that DeepIRES can capture deeper relationships between sequence features and prediction results than other baseline models. Further comparison on independent test sets illustrates that DeepIRES has superior and more robust prediction capability than other existing methods. Moreover, DeepIRES achieves high accuracy in predicting experimentally validated IRESs collected in recent studies. By applying a deep learning interpretability analysis, we discover some potential consensus motifs that are related to IRES activities. In summary, DeepIRES is a reliable tool for IRES prediction and gives insights into the mechanism of IRES elements.


Subjects
Deep Learning, Internal Ribosome Entry Sites, Messenger RNA, Messenger RNA/genetics, Messenger RNA/metabolism, Computational Biology/methods, Viral RNA/genetics, Viral RNA/metabolism, Humans, Computer Neural Networks, Algorithms
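
The abstract of entry 16 names dilated 1D convolution as one building block of DeepIRES. The dependency-free Python sketch below illustrates only what "dilated" means: kernel taps are spaced `dilation` positions apart, which widens the receptive field without adding weights. It is not the DeepIRES implementation; the function name, the fixed kernel, and the toy input are assumptions, and a real model would learn its kernels in a deep learning framework.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1D cross-correlation with dilated kernel taps.

    Output position i combines signal[i], signal[i + dilation],
    signal[i + 2 * dilation], ... weighted by the kernel entries.
    """
    span = (len(kernel) - 1) * dilation  # distance from first to last tap
    return [
        sum(weight * signal[i + k * dilation] for k, weight in enumerate(kernel))
        for i in range(len(signal) - span)
    ]

# Toy input and a two-tap kernel, for illustration only.
signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 1]

dense = dilated_conv1d(signal, kernel, dilation=1)    # adjacent positions
dilated = dilated_conv1d(signal, kernel, dilation=2)  # positions two apart
print(dense)
print(dilated)
```

With the same two weights, the dilated version relates positions that are further apart in the sequence, which is the usual motivation for stacking dilated blocks over long inputs such as mRNA.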
17.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38388680

ABSTRACT

CRISPR-Cas9 is a groundbreaking genome-editing tool that harnesses bacterial defense systems to alter DNA sequences accurately. This innovative technology holds vast promise in multiple domains such as biotechnology, agriculture, and medicine. However, such power does not come without its own peril, and one such issue is the potential for unintended modifications (off-target effects), which highlights the need for accurate prediction and mitigation strategies. Though previous studies have demonstrated improvement in off-target prediction capability with the application of deep learning, they often struggle with the precision-recall trade-off, which limits their effectiveness, and they do not provide proper interpretation of the complex decision-making process of their models. To address these limitations, we have thoroughly explored deep learning networks, particularly recurrent neural network based models, leveraging their established success in handling sequence data. Furthermore, we have employed a genetic algorithm for hyperparameter tuning to optimize these models' performance. The results from our experiments demonstrate significant performance improvement compared with the current state of the art in off-target prediction, highlighting the efficacy of our approach. Furthermore, leveraging the integrated gradients method, we make an effort to interpret our models, resulting in a detailed analysis and understanding of the underlying factors that contribute to off-target predictions, in particular the presence of two sub-regions in the seed region of the single guide RNA, which extends the established biological hypothesis of off-target effects. To the best of our knowledge, our model can be considered the first to combine high efficacy, interpretability, and a desirable balance between precision and recall.


Subjects
CRISPR-Cas Systems, Deep Learning, Gene Editing/methods, CRISPR-Cas Systems Guide RNA, Computer Neural Networks
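
The abstract of entry 17 relies on integrated gradients for interpretation. The Python sketch below shows the general idea on a toy function: gradients are averaged along a straight path from a baseline to the input, then scaled by the input difference, so that the attributions sum to the change in the function value. The toy function `f`, its analytic gradient, the all-zero baseline, and the step count are assumptions for illustration; this is not the authors' model or code.

```python
def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_fn: returns the gradient of the model output at a point
    x, baseline: equal-length lists of floats
    """
    totals = [0.0] * len(x)
    for step in range(steps):
        alpha = (step + 0.5) / steps  # midpoint of each sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            totals[i] += g
    # Average gradient along the path, scaled by the input difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]

# Toy differentiable "model" and its analytic gradient.
def f(v):
    return v[0] * v[1] + v[2] ** 2

def grad_f(v):
    return [v[1], v[0], 2 * v[2]]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(grad_f, x, baseline)
print(attributions)
print(sum(attributions), f(x) - f(baseline))
```

The final line checks the completeness property: the attributions add up to the difference between the output at the input and at the baseline. For a sequence model, each attribution would correspond to one position or feature of the guide RNA and target encoding.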
18.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279650

ABSTRACT

As the application of large language models (LLMs) has broadened into the realm of biological predictions, leveraging their capacity for self-supervised learning to create feature representations of amino acid sequences, these models have set a new benchmark in tackling downstream challenges, such as subcellular localization. However, previous studies have primarily focused on either the structural design of models or differing strategies for fine-tuning, largely overlooking investigations into the nature of the features derived from LLMs. In this research, we propose different ESM2 representation extraction strategies, considering both the character type and position within the ESM2 input sequence. Using model dimensionality reduction, predictive analysis, and interpretability techniques, we have illuminated potential associations between diverse feature types and specific subcellular localizations. In particular, prediction of mitochondrion and Golgi apparatus localization favors segment features closer to the N-terminus, and phosphorylation site-based features could mirror phosphorylation properties. We also evaluate the prediction performance and interpretability robustness of Random Forest and Deep Neural Networks with varied feature inputs. This work offers novel insights into maximizing LLMs' utility, understanding their mechanisms, and extracting biological domain knowledge. Furthermore, we have made the code, feature extraction API, and all relevant materials available at https://github.com/yujuan-zhang/feature-representation-for-LLMs.


Subjects
Computational Biology, Computer Neural Networks, Computational Biology/methods, Amino Acid Sequence, Protein Transport
19.
Bioessays ; 46(9): e2400026, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38991978

ABSTRACT

Receptor tyrosine kinases exhibit ligand-induced activity and uptake into cells via endocytosis. In the case of epidermal growth factor (EGF) receptor (EGFR), the resulting endosomes are trafficked to the perinuclear region, where dephosphorylation of receptors occurs, which are subsequently directed to degradation. Traveling endosomes bearing phosphorylated EGFRs are subjected to the activity of cytoplasmic phosphatases as well as interactions with the endoplasmic reticulum (ER). The peri-nuclear region harbors ER-embedded phosphatases, a component of the EGFR-bearing endosome-ER contact site. The ER is also emerging as a central player in spatiotemporal control of endosomal motility, positioning, tubulation, and fission. Past studies strongly suggest that the physical interaction between the ER and endosomes forms a reaction "unit" for EGFR dephosphorylation. Independently, endosomes have been implicated to enable quantization of EGFR signals by modulation of the phosphorylation levels. Here, we review the distinct mechanisms by which endosomes form the logistical means for signal quantization and speculate on the role of the ER.


Subjects
Endoplasmic Reticulum, Endosomes, ErbB Receptors, Signal Transduction, Animals, Humans, Endocytosis, Endoplasmic Reticulum/metabolism, Endosomes/metabolism, ErbB Receptors/metabolism, Phosphorylation
20.
Proc Natl Acad Sci U S A ; 120(15): e2216698120, 2023 04 11.
Article in English | MEDLINE | ID: mdl-37023129

ABSTRACT

Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to the problem of multifaceted neurons which respond to multiple types of sequence patterns. Since existing interpretation methods were mainly designed to visualize the class of sequences that can activate the neuron, the resulting visualization will correspond to a mixture of patterns. Such a mixture is usually difficult to interpret without resolving the mixed patterns. We propose the NeuronMotif algorithm to interpret such neurons. Given any convolutional neuron (CN) in the network, NeuronMotif first generates a large sample of sequences capable of activating the CN, which typically consists of a mixture of patterns. Then, the sequences are "demixed" in a layer-wise manner by backward clustering of the feature maps of the involved convolutional layers. NeuronMotif can output the sequence motifs, and the syntax rules governing their combinations are depicted by position weight matrices organized in tree structures. Compared to existing methods, the motifs found by NeuronMotif have more matches to known motifs in the JASPAR database. The higher-order patterns uncovered for deep CNs are supported by the literature and ATAC-seq footprinting. Overall, NeuronMotif enables the deciphering of cis-regulatory codes from deep CNs and enhances the utility of CNN in genome interpretation.


Subjects
Algorithms, Computer Neural Networks, Nucleotide Motifs/genetics, Nucleic Acid Regulatory Sequences/genetics, Factual Databases
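
The abstract of entry 20 says that NeuronMotif reports motifs as position weight matrices (PWMs). The Python sketch below shows how a log-odds PWM is commonly built from a set of aligned sequences. It is not the NeuronMotif algorithm, which first generates and demixes activating sequences; the toy sequences, the pseudocount of 1, and the uniform background of 0.25 are assumptions for illustration.

```python
import math

BASES = "ACGT"

def position_weight_matrix(sequences, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM from equal-length aligned DNA sequences.

    Returns one dict per alignment column, mapping base -> log2 score.
    """
    length = len(sequences[0])
    total = len(sequences) + len(BASES) * pseudocount
    pwm = []
    for position in range(length):
        column = [seq[position] for seq in sequences]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm

# Hypothetical aligned sequences resembling a TATA-like motif.
aligned = ["TATA", "TATA", "TACA", "TATA"]
pwm = position_weight_matrix(aligned)

# The highest-scoring base at each position gives a consensus string.
consensus = "".join(max(column, key=column.get) for column in pwm)
print(consensus)
```

Positive scores mark bases that are enriched relative to the background, and negative scores mark depleted ones. Matrices of this form are what motif databases such as JASPAR are typically compared against.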