Results 1 - 15 of 15
1.
Bioinformatics ; 29(11): 1382-9, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-23559640

ABSTRACT

MOTIVATION: Genomic studies have revealed a substantial heritable component of the transcriptional state of the cell. To fully understand the genetic regulation of gene expression variability, it is important to study the effect of genotype in the context of external factors such as alternative environmental conditions. In model systems, explicit environmental perturbations have been considered for this purpose, allowing direct tests for environment-specific genetic effects. However, such experiments are limited to species that can be profiled in controlled environments, hampering their use in important systems such as humans. Moreover, even in seemingly tightly regulated experimental conditions, subtle environmental perturbations cannot be ruled out, and hence unknown environmental influences are frequent. Here, we propose a model-based approach to simultaneously infer unmeasured environmental factors from gene expression profiles and use them in genetic analyses, identifying environment-specific associations between polymorphic loci and individual gene expression traits. RESULTS: In extensive simulation studies, we show that our method is able to accurately reconstruct environmental factors and their interactions with genotype in a variety of settings. We further illustrate the use of our model in a real-world dataset in which one environmental factor has been explicitly experimentally controlled. Our method is able to accurately reconstruct the true underlying environmental factor even if it is not given as an input, allowing detection of genuine genotype-environment interactions. In addition to the known environmental factor, we find unmeasured factors involved in novel genotype-environment interactions. Our results suggest that interactions with both known and unknown environmental factors significantly contribute to gene expression variability. AVAILABILITY AND IMPLEMENTATION: Software available at http://pmbio.github.io/envGPLVM/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation , Gene-Environment Interaction , Gene Expression Regulation, Fungal , Genotype , Humans , Linear Models , Models, Genetic , Quantitative Trait Loci
2.
Brain ; 136(Pt 11): 3305-32, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24065725

ABSTRACT

Amyotrophic lateral sclerosis is heterogeneous with high variability in the speed of progression even in cases with a defined genetic cause such as superoxide dismutase 1 (SOD1) mutations. We reported that SOD1(G93A) mice on distinct genetic backgrounds (C57 and 129Sv) show consistent phenotypic differences in speed of disease progression and life-span that are not explained by differences in human SOD1 transgene copy number or the burden of mutant SOD1 protein within the nervous system. We aimed to compare the gene expression profiles of motor neurons from these two SOD1(G93A) mouse strains to discover the molecular mechanisms contributing to the distinct phenotypes and to identify factors underlying fast and slow disease progression. Lumbar spinal motor neurons from the two SOD1(G93A) mouse strains were isolated by laser capture microdissection and transcriptome analysis was conducted at four stages of disease. We identified marked differences in the motor neuron transcriptome between the two mouse strains at disease onset, with a dramatic reduction of gene expression in the rapidly progressive (129Sv-SOD1(G93A)) compared with the slowly progressing mutant SOD1 mice (C57-SOD1(G93A)) (1276 versus 346; Q-value ≤ 0.01). Gene ontology pathway analysis of the transcriptional profile from 129Sv-SOD1(G93A) mice showed marked downregulation of specific pathways involved in mitochondrial function, as well as predicted deficiencies in protein degradation and axonal transport mechanisms. In contrast, the transcriptional profile from C57-SOD1(G93A) mice with the more benign disease course revealed strong gene enrichment relating to immune system processes compared with 129Sv-SOD1(G93A) mice. Motor neurons from the more benign mutant strain demonstrated striking complement activation, over-expressing genes normally involved in immune cell function. We validated through immunohistochemistry increased expression of the C3 complement subunit and major histocompatibility complex I within motor neurons. In addition, we demonstrated that motor neurons from the slowly progressing mice activate a series of genes with neuroprotective properties such as angiogenin and the nuclear factor (erythroid-derived 2)-like 2 transcriptional regulator. In contrast, the faster progressing mice show dramatically reduced expression at disease onset of cell pathways involved in neuroprotection. This study highlights a set of key gene and molecular pathway indices of fast or slow disease progression which may prove useful in identifying potential disease modifiers responsible for the heterogeneity of human amyotrophic lateral sclerosis and which may represent valid therapeutic targets for ameliorating the disease course in humans.


Subject(s)
Amyotrophic Lateral Sclerosis/genetics , Disease Progression , Motor Neurons/pathology , Superoxide Dismutase/genetics , Transcriptome/genetics , Amyotrophic Lateral Sclerosis/pathology , Animals , Disease Models, Animal , Female , Mice , Mice, 129 Strain , Mice, Inbred C57BL , Mice, Transgenic , Motor Neurons/metabolism , Mutation/genetics , Phenotype , Superoxide Dismutase-1 , Time Factors
3.
Cell Syst ; 15(3): 286-294.e2, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38428432

ABSTRACT

Pretrained protein sequence language models have been shown to improve the performance of many prediction tasks and are now routinely integrated into bioinformatics tools. However, these models largely rely on the transformer architecture, which scales quadratically with sequence length in both run-time and memory. Therefore, state-of-the-art models have limitations on sequence length. To address this limitation, we investigated whether convolutional neural network (CNN) architectures, which scale linearly with sequence length, could be as effective as transformers in protein language models. With masked language model pretraining, CNNs are competitive with, and occasionally superior to, transformers across downstream applications while maintaining strong performance on sequences longer than those allowed in the current state-of-the-art transformer models. Our work suggests that computational efficiency can be improved without sacrificing performance, simply by using a CNN architecture instead of a transformer, and emphasizes the importance of disentangling pretraining task and model architecture. A record of this paper's transparent peer review process is included in the supplemental information.


Subject(s)
Computational Biology , Neural Networks, Computer , Amino Acid Sequence , Peer Review
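
As a rough illustration of the scaling argument in the abstract above: full self-attention scores every position against every other position, so its cost grows with the square of the sequence length, whereas a 1-D convolution with a fixed kernel visits each position a constant number of times. The sketch below only counts those elementary operations. The function names and the kernel width of 5 are assumptions made for this illustration and are not taken from the paper.

```python
def attention_pair_count(seq_len: int) -> int:
    """Number of position pairs scored by full self-attention (L x L)."""
    return seq_len * seq_len


def conv_pair_count(seq_len: int, kernel_width: int = 5) -> int:
    """Number of (position, kernel tap) products in one 1-D convolution
    with 'same' padding: each position is visited kernel_width times.
    The default kernel width is an arbitrary illustrative choice."""
    return seq_len * kernel_width


if __name__ == "__main__":
    # Doubling the length quadruples the attention count
    # but only doubles the convolution count.
    for length in (512, 1024, 2048):
        print(length, attention_pair_count(length), conv_pair_count(length))
```

The count ignores constant factors, hidden dimensions, and the number of layers, so it says nothing about which architecture is faster at a given length; it only shows how the two costs grow.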
4.
bioRxiv ; 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38405697

ABSTRACT

Clustering is commonly used in single-cell RNA-sequencing (scRNA-seq) pipelines to characterize cellular heterogeneity. However, current methods face two main limitations. First, they require user-specified heuristics which add time and complexity to bioinformatic workflows; second, they rely on post-selective differential expression analyses to identify marker genes driving cluster differences, which has been shown to be subject to inflated false discovery rates. We address these challenges by introducing nonparametric clustering of single-cell populations (NCLUSION): an infinite mixture model that leverages Bayesian sparse priors to identify marker genes while simultaneously performing clustering on single-cell expression data. NCLUSION uses a scalable variational inference algorithm to perform these analyses on datasets with up to millions of cells. By analyzing publicly available scRNA-seq studies, we demonstrate that NCLUSION (i) matches the performance of other state-of-the-art clustering techniques with significantly reduced runtime and (ii) provides statistically robust and biologically relevant transcriptomic signatures for each of the clusters it identifies. Overall, NCLUSION represents a reliable hypothesis-generating tool for understanding patterns of expression variation present in single-cell populations.

5.
Nat Med ; 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039250

ABSTRACT

The analysis of histopathology images with artificial intelligence aims to enable clinical decision support systems and precision medicine. The success of such applications depends on the ability to model the diverse patterns observed in pathology images. To this end, we present Virchow, the largest foundation model for computational pathology to date. In addition to the evaluation of biomarker prediction and cell identification, we demonstrate that a large foundation model enables pan-cancer detection, achieving 0.95 specimen-level area under the (receiver operating characteristic) curve across nine common and seven rare cancers. Furthermore, we show that with less training data, the pan-cancer detector built on Virchow can achieve similar performance to tissue-specific clinical-grade models in production and outperform them on some rare variants of cancer. Virchow's performance gains highlight the value of a foundation model and open possibilities for many high-impact applications with limited amounts of labeled training data.
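
For readers unfamiliar with the metric quoted in the abstract above, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen positive specimen receives a higher score than a randomly chosen negative one. The following minimal sketch computes it by direct pairwise comparison. The function name and the toy labels and scores are assumptions for illustration only; they are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = cancer, 0 = benign.
    toy_labels = [1, 1, 1, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
    print(roc_auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of specimens, which is fine for a small example; rank-based formulas give the same value more efficiently on large sets.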

6.
Acta Neuropathol ; 125(1): 95-109, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23143228

ABSTRACT

A consistent clinical feature of amyotrophic lateral sclerosis (ALS) is the sparing of eye movements and the function of external sphincters, with corresponding preservation of motor neurons in the brainstem oculomotor nuclei, and of Onuf's nucleus in the sacral spinal cord. Studying the differences in properties of neurons that are vulnerable and resistant to the disease process in ALS may provide insights into the mechanisms of neuronal degeneration, and identify targets for therapeutic manipulation. We used microarray analysis to determine the differences in gene expression between oculomotor and spinal motor neurons, isolated by laser capture microdissection from the midbrain and spinal cord of neurologically normal human controls. We compared these to transcriptional profiles of oculomotor nuclei and spinal cord from rat and mouse, obtained from the GEO omnibus database. We show that oculomotor neurons have a distinct transcriptional profile, with significant differential expression of 1,757 named genes (q < 0.001). Differentially expressed genes are enriched for the functional categories of synaptic transmission, ubiquitin-dependent proteolysis, mitochondrial function, transcriptional regulation, immune system functions, and the extracellular matrix. Marked differences are seen, across the three species, in genes with a function in synaptic transmission, including several glutamate and GABA receptor subunits. Using patch clamp recording in acute spinal and brainstem slices, we show that resistant oculomotor neurons show a reduced AMPA-mediated inward calcium current, and a higher GABA-mediated chloride current, than vulnerable spinal motor neurons. The findings suggest that reduced susceptibility to excitotoxicity, mediated in part through enhanced GABAergic transmission, is an important determinant of the relative resistance of oculomotor neurons to degeneration in ALS.


Subject(s)
Amyotrophic Lateral Sclerosis/genetics , Gene Expression Regulation/genetics , Spinal Cord/metabolism , Synaptic Transmission/genetics , Aged , Amyotrophic Lateral Sclerosis/metabolism , Amyotrophic Lateral Sclerosis/physiopathology , Female , Genetic Predisposition to Disease , Humans , Male , Middle Aged , Motor Neurons/metabolism , Motor Neurons/pathology , Nerve Degeneration/genetics , Nerve Degeneration/prevention & control , Receptors, AMPA/genetics , Receptors, AMPA/metabolism , Spinal Cord/pathology , gamma-Aminobutyric Acid/genetics , gamma-Aminobutyric Acid/metabolism
7.
PLoS Comput Biol ; 8(1): e1002330, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22241974

ABSTRACT

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods, and finds in particular substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at http://ml.sheffield.ac.uk/qtl/.


Subject(s)
Algorithms , Chromosome Mapping/methods , Gene Expression Regulation/genetics , Genetic Variation/genetics , Models, Genetic , Models, Statistical , Quantitative Trait Loci/genetics , Animals , Computer Simulation , Confounding Factors, Epidemiologic , Data Interpretation, Statistical , Humans , Sensitivity and Specificity
8.
Nat Biomed Eng ; 2(1): 38-47, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29998038

ABSTRACT

The CRISPR-Cas9 system provides unprecedented genome editing capabilities. However, off-target effects lead to sub-optimal usage and additionally are a bottleneck in the development of therapeutic uses. Herein, we introduce the first machine learning-based approach to off-target prediction, yielding a state-of-the-art model for CRISPR-Cas9 that outperforms all other guide design services. Our approach, Elevation, consists of two interdependent machine learning models: one for scoring individual guide-target pairs, and another that aggregates these guide-target scores into a single, overall summary guide score. Through systematic investigation, we demonstrate that Elevation performs substantially better than competing approaches on both tasks. Additionally, we are the first to systematically evaluate approaches on the guide summary score problem; we show that the most widely-used method performs no better than random at times, whereas Elevation consistently outperformed it, sometimes by an order of magnitude. We also introduce an evaluation method that balances errors between active and inactive guides, thereby encapsulating a range of practical use cases; Elevation is consistently superior to other methods across the entire range. Finally, because of the large scale and computational demands of off-target prediction, we have developed a cloud-based service for quick retrieval. This service provides end-to-end guide design by also incorporating our previously reported on-target model, Azimuth (https://crispr.ml).

9.
Nat Biotechnol ; 36(2): 179-189, 2018 02.
Article in English | MEDLINE | ID: mdl-29251726

ABSTRACT

Combinatorial genetic screening using CRISPR-Cas9 is a useful approach to uncover redundant genes and to explore complex gene networks. However, current methods suffer from interference between the single-guide RNAs (sgRNAs) and from limited gene targeting activity. To increase the efficiency of combinatorial screening, we employ orthogonal Cas9 enzymes from Staphylococcus aureus and Streptococcus pyogenes. We used machine learning to establish S. aureus Cas9 sgRNA design rules and paired S. aureus Cas9 with S. pyogenes Cas9 to achieve dual targeting in a high fraction of cells. We also developed a lentiviral vector and cloning strategy to generate high-complexity pooled dual-knockout libraries to identify synthetic lethal and buffering gene pairs across multiple cell types, including MAPK pathway genes and apoptotic genes. Our orthologous approach also enabled a screen combining gene knockouts with transcriptional activation, which revealed genetic interactions with TP53. The "Big Papi" (paired aureus and pyogenes for interactions) approach described here will be widely applicable for the study of combinatorial phenotypes.


Subject(s)
CRISPR-Cas Systems/genetics , Epistasis, Genetic/genetics , Genetic Testing , RNA, Guide, Kinetoplastida/genetics , Apoptosis/genetics , Gene Knockout Techniques , Gene Targeting , Humans , Machine Learning , Mitogen-Activated Protein Kinase Kinases/genetics , Signal Transduction/genetics , Staphylococcus aureus/genetics , Streptococcus pyogenes/genetics , Tumor Suppressor Protein p53/genetics
10.
J Comput Biol ; 24(6): 524-535, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28056190

ABSTRACT

Genome-wide association studies commonly examine one trait at a time. Occasionally they examine several related traits with the hope of increasing power; in such a setting, the traits are not generally smoothly varying in any way such as time or space. However, for function-valued traits, the trait is often smoothly varying along the axis of interest, such as space or time. For instance, in the case of longitudinal traits such as growth curves, the axis of interest is time; for spatially varying traits such as chromatin accessibility, it would be position along the genome. Although there have been efforts to perform genome-wide association studies with such function-valued traits, the statistical approaches developed for this purpose often have limitations such as requiring the trait to behave linearly in time or space, or constraining the genetic effect itself to be constant or linear in time. Herein, we present a flexible model for this problem, the Partitioned Gaussian Process, which removes many such limitations and is especially effective as the number of time points increases. The theoretical basis of this model provides machinery for handling missing and unaligned function values such as would occur when not all individuals are measured at the same time points. Furthermore, we make use of algebraic refactorizations to substantially reduce the time complexity of our model beyond the naive implementation. Finally, we apply our approach and several others to synthetic data before closing with some directions for improved modeling and statistical testing.


Subject(s)
Genome-Wide Association Study/methods , Models, Genetic , Models, Statistical , Quantitative Trait, Heritable , Sequence Analysis, DNA/methods , Computer Simulation , Humans , Normal Distribution , Statistics, Nonparametric
11.
Nat Biotechnol ; 34(2): 184-191, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26780180

ABSTRACT

CRISPR-Cas9-based genetic screens are a powerful new tool in biology. By simply altering the sequence of the single-guide RNA (sgRNA), one can reprogram Cas9 to target different sites in the genome with relative ease, but the on-target activity and off-target effects of individual sgRNAs can vary widely. Here, we use recently devised sgRNA design rules to create human and mouse genome-wide libraries, perform positive and negative selection screens and observe that the use of these rules produced improved results. Additionally, we profile the off-target activity of thousands of sgRNAs and develop a metric to predict off-target sites. We incorporate these findings from large-scale, empirical data to improve our computational design rules and create optimized sgRNA libraries that maximize on-target activity and minimize off-target effects to enable more effective and efficient genetic screens and genome engineering.


Subject(s)
CRISPR-Cas Systems/genetics , Genetic Engineering/methods , Genomics/methods , RNA, Guide, Kinetoplastida/genetics , Animals , Cell Line, Tumor , Drug Resistance/genetics , Gene Library , Genome/genetics , Humans , Mice
12.
Nat Med ; 22(6): 606-13, 2016 06.
Article in English | MEDLINE | ID: mdl-27183217

ABSTRACT

Human leukocyte antigen class I (HLA)-restricted CD8⁺ T lymphocyte (CTL) responses are crucial to HIV-1 control. Although HIV can evade these responses, the longer-term impact of viral escape mutants remains unclear, as these variants can also reduce intrinsic viral fitness. To address this, we here developed a metric to determine the degree of HIV adaptation to an HLA profile. We demonstrate that transmission of viruses that are pre-adapted to the HLA molecules expressed in the recipient is associated with impaired immunogenicity, elevated viral load and accelerated CD4⁺ T cell decline. Furthermore, the extent of pre-adaptation among circulating viruses explains much of the variation in outcomes attributed to the expression of certain HLA alleles. Thus, viral pre-adaptation exploits 'holes' in the immune response. Accounting for these holes may be key for vaccine strategies seeking to elicit functional responses from viral variants, and to HIV cure strategies that require broad CTL responses to achieve successful eradication of HIV reservoirs.


Subject(s)
Adaptation, Physiological/immunology , CD8-Positive T-Lymphocytes/immunology , HIV Infections/transmission , HIV-1/immunology , Histocompatibility Antigens Class I/immunology , Immune Evasion/immunology , AIDS Vaccines/immunology , Africa, Southern , British Columbia , CD4 Lymphocyte Count , Cohort Studies , Evolution, Molecular , HIV Infections/immunology , HIV-1/genetics , Humans , Immune Evasion/genetics , Immunity, Cellular/immunology , Linear Models , Models, Immunological , Proportional Hazards Models , Receptors, Antigen, T-Cell/immunology , Viral Load , Virus Replication/genetics
13.
Nat Commun ; 5: 4890, 2014 Sep 19.
Article in English | MEDLINE | ID: mdl-25234577

ABSTRACT

Linear mixed models (LMMs) are a powerful and established tool for studying genotype-phenotype relationships. A limitation of the LMM is that the model assumes Gaussian distributed residuals, a requirement that rarely holds in practice. Violations of this assumption can lead to false conclusions and loss in power. To mitigate this problem, it is common practice to pre-process the phenotypic values to make them as Gaussian as possible, for instance by applying logarithmic or other nonlinear transformations. Unfortunately, different phenotypes require different transformations, and choosing an appropriate transformation is challenging and subjective. Here we present an extension of the LMM that estimates an optimal transformation from the observed data. In simulations and applications to real data from human, mouse and yeast, we show that using transformations inferred by our model increases power in genome-wide association studies and increases the accuracy of heritability estimation and phenotype prediction.


Subject(s)
Linear Models , Models, Genetic , Animals , Computer Simulation , Databases, Factual , Fungi/genetics , Fungi/metabolism , Genetic Association Studies , Genome-Wide Association Study , Humans , Mice , Normal Distribution , Phenotype , Polymorphism, Single Nucleotide , Yeasts
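
The abstract above mentions the common practice of pre-processing phenotypes with a logarithmic or other nonlinear transformation to make them closer to Gaussian. The sketch below illustrates only that manual pre-processing step, not the model proposed in the paper, which estimates the transformation from the data. The phenotype values are a synthetic, deliberately right-skewed toy series, and the function name is an assumption for this example.

```python
import math


def skewness(values):
    """Skewness as the third standardized moment (population form).
    A symmetric distribution such as the Gaussian has skewness 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic toy phenotype (hypothetical): exponentials of evenly spaced
# values, which gives a long right tail.
raw_phenotype = [math.exp(x / 4.0) for x in range(-8, 9)]
log_phenotype = [math.log(v) for v in raw_phenotype]

if __name__ == "__main__":
    print("skewness before log transform:", skewness(raw_phenotype))
    print("skewness after log transform: ", skewness(log_phenotype))
```

The logarithm works here only because the toy data were built to be log-symmetric; as the abstract notes, a different phenotype may need a different transformation, which is the motivation for learning it from the observed data.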
14.
Sci Rep ; 4: 6874, 2014 Nov 12.
Article in English | MEDLINE | ID: mdl-25387525

ABSTRACT

We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.


Subject(s)
Genome-Wide Association Study/statistics & numerical data , Linear Models , Polymorphism, Single Nucleotide , Software , Algorithms , Animals , Genotype , Humans , Mice , Models, Genetic , Phenotype
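
To make the notion of a genetic similarity matrix (GSM) in the abstract above concrete, the sketch below builds one from all available SNPs, which the abstract describes as the traditional approach. It assumes genotypes coded as minor-allele counts (0, 1, 2) and uses one common convention: standardize each SNP across individuals, then average the products over SNPs. That convention, the function name, and the toy genotypes are assumptions for illustration. The paper's modifications, such as selecting SNPs that predict the phenotype or mixing two GSMs, are not implemented here.

```python
def genetic_similarity_matrix(genotypes):
    """Build an n x n similarity matrix from an n x m genotype table.

    genotypes: list of individuals, each a list of allele counts (0/1/2).
    Each SNP is standardized to mean 0 and variance 1 across individuals;
    entry (i, k) is the mean over SNPs of the product of the standardized
    genotypes of individuals i and k. SNPs with no variation are skipped.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    columns = []
    for j in range(n_snps):
        column = [row[j] for row in genotypes]
        mean = sum(column) / n
        variance = sum((g - mean) ** 2 for g in column) / n
        if variance == 0.0:
            continue  # a monomorphic SNP carries no similarity information
        sd = variance ** 0.5
        columns.append([(g - mean) / sd for g in column])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    return [
        [sum(col[i] * col[k] for col in columns) / len(columns) for k in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy cohort (hypothetical): individuals 0 and 1 have identical genotypes.
    toy = [[0, 1, 2, 0], [0, 1, 2, 0], [2, 1, 0, 1], [1, 0, 1, 2]]
    for row in genetic_similarity_matrix(toy):
        print([round(v, 3) for v in row])
```

With this standardization the diagonal entries average to 1, and two individuals with identical genotypes get the same similarity to each other as each has to itself.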
15.
Elife ; 2: e01123, 2013 Oct 29.
Article in English | MEDLINE | ID: mdl-24171102

ABSTRACT

HIV-1 sequence diversity is affected by selection pressures arising from host genomic factors. Using paired human and viral data from 1071 individuals, we ran >3000 genome-wide scans, testing for associations between host DNA polymorphisms, HIV-1 sequence variation and plasma viral load (VL), while considering human and viral population structure. We observed significant human SNP associations to a total of 48 HIV-1 amino acid variants (p < 2.4 × 10⁻¹²). All associated SNPs mapped to the HLA class I region. Clinical relevance of host and pathogen variation was assessed using VL results. We identified two critical advantages to the use of viral variation for identifying host factors: (1) association signals are much stronger for HIV-1 sequence variants than VL, reflecting the 'intermediate phenotype' nature of viral variation; (2) association testing can be run without any clinical data. The proposed genome-to-genome approach highlights sites of genomic conflict and is a strategy generally applicable to studies of host-pathogen interaction. DOI: http://dx.doi.org/10.7554/eLife.01123.001.


Subject(s)
Genome, Human , Genome, Viral , HIV Infections/genetics , HIV-1/genetics , Polymorphism, Single Nucleotide , Viral Load/genetics , Alleles , Genome-Wide Association Study , HIV Infections/immunology , HIV Infections/virology , HIV-1/immunology , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class I/immunology , Host-Pathogen Interactions/genetics , Host-Pathogen Interactions/immunology , Humans , Viral Load/immunology