ABSTRACT
Type 1 regulatory T cells (Tr1 cells) are induced by interleukin-27 (IL-27) and have critical roles in the control of autoimmunity and resolution of inflammation. We found that the transcription factors IRF1 and BATF were induced early on after treatment with IL-27 and were required for the differentiation and function of Tr1 cells in vitro and in vivo. Epigenetic and transcriptional analyses revealed that both transcription factors influenced chromatin accessibility and expression of the genes required for Tr1 cell function. IRF1 and BATF deficiencies uniquely altered the chromatin landscape, suggesting that these factors serve a pioneering function during Tr1 cell differentiation.
Subject(s)
Basic-Leucine Zipper Transcription Factors/metabolism , Cell Differentiation/immunology , Chromatin/metabolism , Interferon Regulatory Factor-1/metabolism , T-Lymphocytes, Regulatory/immunology , T-Lymphocytes, Regulatory/metabolism , Animals , Autoimmune Diseases/genetics , Autoimmune Diseases/immunology , Autoimmune Diseases/metabolism , Autoimmunity , Basic-Leucine Zipper Transcription Factors/genetics , Cell Differentiation/genetics , Chromatin/genetics , Cluster Analysis , Cytokines/metabolism , Cytokines/pharmacology , Gene Expression Profiling , Gene Expression Regulation , Gene Regulatory Networks , Interferon Regulatory Factor-1/genetics , Mice , Mice, Knockout , Promoter Regions, Genetic , T-Lymphocyte Subsets/drug effects , T-Lymphocyte Subsets/immunology , T-Lymphocyte Subsets/metabolism , T-Lymphocytes, Regulatory/cytology , T-Lymphocytes, Regulatory/drug effects , Transcription Factors/metabolism , Transcriptome
ABSTRACT
Innate lymphoid cells (ILCs) promote tissue homeostasis and immune defense but also contribute to inflammatory diseases. ILCs exhibit phenotypic and functional plasticity in response to environmental stimuli, yet the transcriptional regulatory networks (TRNs) that control ILC function are largely unknown. Here, we integrate gene expression and chromatin accessibility data to infer regulatory interactions between transcription factors (TFs) and genes within intestinal type 1, 2, and 3 ILC subsets. We predicted the "core" TFs driving ILC identities, organized TFs into cooperative modules controlling distinct gene programs, and validated roles for c-MAF and BCL6 as regulators affecting type 1 and type 3 ILC lineages. The ILC network revealed alternative-lineage-gene repression, a mechanism that may contribute to reported plasticity between ILC subsets. By connecting TFs to genes, the TRNs suggest means to selectively regulate ILC effector functions, while our network approach is broadly applicable to identifying regulators in other in vivo cell populations.
Subject(s)
Intestines/physiology , Lymphocyte Subsets/physiology , Lymphocytes/physiology , Animals , Cell Differentiation , Cell Lineage , Cell Plasticity , Chromatin Assembly and Disassembly , Epigenetic Repression , Gene Regulatory Networks , Immunity, Innate , Immunomodulation , Mice , Mice, Inbred C57BL , Mice, Transgenic , Proto-Oncogene Proteins c-bcl-6/genetics , Proto-Oncogene Proteins c-maf/genetics , Transcriptome
ABSTRACT
Understanding the changes in diverse molecular pathways underlying the development of breast tumors is critical for improving diagnosis, treatment, and drug development. Here, we used RNA-profiling of canine mammary tumors (CMTs) coupled with a robust analysis framework to model molecular changes in human breast cancer. Our study leveraged a key advantage of the canine model, the frequent presence of multiple naturally occurring tumors at diagnosis, thus providing samples spanning normal tissue and benign and malignant tumors from each patient. We showed that human breast cancer signals, at both the expression and mutation level, are evident in CMTs. Profiling multiple tumors per patient, enabled by the CMT model, allowed us to resolve statistically robust transcription patterns and biological pathways specific to malignant tumors versus those arising in benign tumors or shared with normal tissues. We showed that multiple histological samples per patient are necessary to effectively capture these progression-related signatures, and that carcinoma-specific signatures are predictive of survival for human breast cancer patients. To catalyze and support similar analyses and use of the CMT model by other biomedical researchers, we provide FREYA, a robust data processing pipeline and statistical analysis framework.
ABSTRACT
MOTIVATION: Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above. RESULTS: In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data. AVAILABILITY AND IMPLEMENTATION: The inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as python packages with associated documentation (https://inferelator.readthedocs.io/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Gene Regulatory Networks , Software , Animals , Mice , Genomics , Genome , Chromatin
ABSTRACT
Congenital heart disease (CHD) is the most frequent birth defect, affecting 0.8% of live births. Many cases occur sporadically and impair reproductive fitness, suggesting a role for de novo mutations. Here we compare the incidence of de novo mutations in 362 severe CHD cases and 264 controls by analysing exome sequencing of parent-offspring trios. CHD cases show a significant excess of protein-altering de novo mutations in genes expressed in the developing heart, with an odds ratio of 7.5 for damaging (premature termination, frameshift, splice site) mutations. Similar odds ratios are seen across the main classes of severe CHD. We find a marked excess of de novo mutations in genes involved in the production, removal or reading of histone 3 lysine 4 (H3K4) methylation, or ubiquitination of H2BK120, which is required for H3K4 methylation. There are also two de novo mutations in SMAD2, which regulates H3K27 methylation in the embryonic left-right organizer. The combination of both activating (H3K4 methylation) and inactivating (H3K27 methylation) chromatin marks characterizes 'poised' promoters and enhancers, which regulate expression of key developmental genes. These findings implicate de novo point mutations in several hundreds of genes that collectively contribute to approximately 10% of severe CHD.
Subject(s)
Heart Diseases/congenital , Heart Diseases/genetics , Histones/metabolism , Adult , Case-Control Studies , Child , Chromatin/chemistry , Chromatin/metabolism , DNA Mutational Analysis , Enhancer Elements, Genetic/genetics , Exome/genetics , Female , Genes, Developmental/genetics , Heart Diseases/metabolism , Histones/chemistry , Humans , Lysine/chemistry , Lysine/metabolism , Male , Methylation , Mutation , Odds Ratio , Promoter Regions, Genetic/genetics
ABSTRACT
Differential binding of transcription factors (TFs) at cis-regulatory loci drives the differentiation and function of diverse cellular lineages. Understanding the regulatory interactions that underlie cell fate decisions requires characterizing TF binding sites (TFBS) across multiple cell types and conditions. Techniques such as ChIP-seq can reveal genome-wide patterns of TF binding, but typically require laborious and costly experiments for each TF-cell-type (TFCT) condition of interest. Chromosomal accessibility assays can connect accessible chromatin in one cell type to many TFs through sequence motif mapping. Such methods, however, rarely take into account that the genomic context preferred by each factor differs from TF to TF, and from cell type to cell type. To address the differences in TF behaviors, we developed Mocap, a method that integrates chromatin accessibility, motif scores, TF footprints, CpG/GC content, evolutionary conservation and other factors in an ensemble of TFCT-specific classifiers. We show that integration of genomic features, such as CpG islands, improves TFBS prediction in some TFCT. Further, we describe a method for mapping new TFCT, for which no ChIP-seq data exists, onto our ensemble of classifiers and show that our cross-sample TFBS prediction method outperforms several previously described methods.
Subject(s)
Chromatin/metabolism , Ectoderm/metabolism , Endoderm/metabolism , Mesoderm/metabolism , Software , Transcription Factors/metabolism , Transcription, Genetic , Base Composition , Binding Sites , Cell Line , Chromatin/chemistry , Chromatin Immunoprecipitation , CpG Islands , Databases, Genetic , Ectoderm/cytology , Endoderm/cytology , Humans , Mesoderm/cytology , Nucleotide Motifs , Organ Specificity , Protein Binding , Transcription Factors/genetics
ABSTRACT
Multiple studies have confirmed the contribution of rare de novo copy number variations to the risk for autism spectrum disorders. But whereas de novo single nucleotide variants have been identified in affected individuals, their contribution to risk has yet to be clarified. Specifically, the frequency and distribution of these mutations have not been well characterized in matched unaffected controls, and such data are vital to the interpretation of de novo coding mutations observed in probands. Here we show, using whole-exome sequencing of 928 individuals, including 200 phenotypically discordant sibling pairs, that highly disruptive (nonsense and splice-site) de novo mutations in brain-expressed genes are associated with autism spectrum disorders and carry large effects. On the basis of mutation rates in unaffected individuals, we demonstrate that multiple independent de novo single nucleotide variants in the same gene among unrelated probands reliably identifies risk alleles, providing a clear path forward for gene discovery. Among a total of 279 identified de novo coding mutations, there is a single instance in probands, and none in siblings, in which two independent nonsense variants disrupt the same gene, SCN2A (sodium channel, voltage-gated, type II, α subunit), a result that is highly unlikely by chance.
Subject(s)
Autistic Disorder/genetics , Exome/genetics , Exons/genetics , Genetic Predisposition to Disease/genetics , Mutation/genetics , Nerve Tissue Proteins/genetics , Sodium Channels/genetics , Alleles , Codon, Nonsense/genetics , Genetic Heterogeneity , Humans , NAV1.2 Voltage-Gated Sodium Channel , RNA Splice Sites/genetics , Siblings
ABSTRACT
Parasitic protozoa of the flagellate order Kinetoplastida represent one of the deepest branches of the eukaryotic tree. Among this group of organisms, the mechanism of RNA interference (RNAi) has been investigated in Trypanosoma brucei and to a lesser degree in Leishmania (Viannia) spp. The pathway is triggered by long double-stranded RNA (dsRNA) and in T. brucei requires a set of five core genes, including a single Argonaute (AGO) protein, T. brucei AGO1 (TbAGO1). The five genes are conserved in Leishmania (Viannia) spp. but are absent in other major kinetoplastid species, such as Trypanosoma cruzi and Leishmania major. In T. brucei small interfering RNAs (siRNAs) are methylated at the 3' end, whereas Leishmania (Viannia) sp. siRNAs are not. Here we report that T. brucei HEN1, an ortholog of the metazoan HEN1 2'-O-methyltransferases, is required for methylation of siRNAs. Loss of TbHEN1 causes a reduction in the length of siRNAs. The shorter siRNAs in hen1(-/-) parasites are single stranded and associated with TbAGO1, and a subset carry a nontemplated uridine at the 3' end. These findings support a model wherein TbHEN1 methylates siRNA 3' ends after they are loaded into TbAGO1 and this methylation protects siRNAs from uridylation and 3' trimming. Moreover, expression of TbHEN1 in Leishmania (Viannia) panamensis did not result in siRNA 3' end methylation, further emphasizing mechanistic differences in the trypanosome and Leishmania RNAi mechanisms.
Subject(s)
Methyltransferases/metabolism , Protozoan Proteins/metabolism , RNA Processing, Post-Transcriptional , RNA, Protozoan/metabolism , RNA, Small Interfering/metabolism , Trypanosoma brucei brucei/metabolism , Amino Acid Sequence , Leishmania/genetics , Leishmania/metabolism , Methyltransferases/chemistry , Methyltransferases/genetics , Molecular Sequence Data , Mutation , Protozoan Proteins/chemistry , Protozoan Proteins/genetics , Trypanosoma brucei brucei/enzymology
ABSTRACT
Among trypanosomatid protozoa, the mechanism of RNA interference (RNAi) has been investigated in Trypanosoma brucei and to a lesser extent in Leishmania braziliensis. Although these two parasitic organisms belong to the same family, they are evolutionarily distantly related, raising questions about the conservation of the RNAi pathway. Here we carried out an in-depth analysis of small interfering RNAs (siRNAs) associated with L. braziliensis Argonaute1 (LbrAGO1). In contrast to T. brucei, Leishmania siRNAs are sensitive to 3' end oxidation, indicating the absence of blocking groups, and the Leishmania genome does not code for a HEN1 RNA 2'-O-methyltransferase, which modifies small RNA 3' ends. Consistent with this observation, ~20% of siRNA 3' ends carry non-templated uridines. Thus siRNA biogenesis, and most likely siRNA metabolism, is different in these organisms. Similarly to T. brucei, putative mobile elements and repeats constitute the major Leishmania siRNA-producing loci and AGO1 ablation leads to accumulation of long transcripts derived from putative mobile elements. However, contrary to T. brucei, no siRNAs were detected from other genomic regions with the potential to form double-stranded RNA, namely sites of convergent transcription and inverted repeats. Thus, our results indicate that organism-specific diversification has occurred in the RNAi pathway during evolution of the trypanosomatid lineage.
Subject(s)
Genetic Variation , Leishmania braziliensis/genetics , RNA, Small Interfering/genetics , Argonaute Proteins/genetics , Gene Expression Regulation , RNA, Small Interfering/chemistry , Trypanosoma brucei brucei/genetics
ABSTRACT
Exploiting sequence-structure-function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec and DeepBLAST. TM-Vec allows searching for structure-structure similarities in large sequence databases. It is trained to accurately predict TM-scores as a metric of structural similarity directly from sequence pairs without the need for intermediate computation or solution of structures. Once structurally similar proteins have been identified, DeepBLAST can structurally align proteins using only sequence information by identifying structurally homologous regions between proteins. It outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on a variety of datasets, including better identification of remotely homologous proteins compared with state-of-the-art sequence alignment and structure prediction methods.
ABSTRACT
Duplicated pseudogenes in the human genome are disabled copies of functioning parent genes. They result from block duplication events occurring throughout evolutionary history. Relatively recent duplications (with sequence similarity ≥ 90% and length ≥ 1 kb) are termed segmental duplications (SDs); here, we analyze the interrelationship of SDs and pseudogenes. We present a decision-tree approach to classify pseudogenes based on their (and their parents') characteristics in relation to SDs. The classification identifies 140 novel pseudogenes and makes possible improved annotation for the 3172 pseudogenes located in SDs. In particular, it reveals that many pseudogenes in SDs likely did not arise directly from parent genes, but are the result of a multi-step process. In these cases, the initial duplication or retrotransposition of a parent gene gives rise to a 'parent pseudogene', followed by further duplication creating duplicated-duplicated or duplicated-processed pseudogenes, respectively. Moreover, we can precisely identify these parent pseudogenes by overlap with ancestral SD loci. Finally, a comparison of nucleotide substitutions per site in a pseudogene with its surrounding SD region allows us to estimate the time difference between duplication and disablement events, and this suggests that most duplicated pseudogenes in SDs were likely disabled around the time of the original duplication.
Subject(s)
Genome, Human , Pseudogenes , Segmental Duplications, Genomic , Evolution, Molecular , Gene Duplication , Genetic Loci , Humans
ABSTRACT
BACKGROUND: Genetic association studies, thus far, have focused on the analysis of individual main effects of SNP markers. Nonetheless, there is a clear need for modeling epistasis or gene-gene interactions to better understand the biologic basis of existing associations. Tree-based methods have been widely studied as tools for building prediction models based on complex variable interactions. An understanding of the power of such methods for the discovery of genetic associations in the presence of complex interactions is of great importance. Here, we systematically evaluate the power of three leading algorithms: random forests (RF), Monte Carlo logic regression (MCLR), and multifactor dimensionality reduction (MDR). METHODS: We use the algorithm-specific variable importance measures (VIMs) as statistics and employ permutation-based resampling to generate the null distribution and associated p values. The power of the three is assessed via simulation studies. Additionally, in a data analysis, we evaluate the associations between individual SNPs in pro-inflammatory and immunoregulatory genes and the risk of non-Hodgkin lymphoma. RESULTS: The power of RF is highest in all simulation models, that of MCLR is similar to RF in half, and that of MDR is consistently the lowest. CONCLUSIONS: Our study indicates that the power of RF VIMs is most reliable. However, in addition to tuning parameters, the power of RF is notably influenced by the type of variable (continuous vs. categorical) and the chosen VIM.
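The permutation step described in METHODS can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and `mean_difference` is a simple stand-in statistic used only so the example runs without external libraries. It is not one of the RF, MCLR, or MDR variable importance measures evaluated in the study; any of those could be passed in as the `importance` callable instead.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(
    genotypes: Sequence[float],
    labels: Sequence[int],
    importance: Callable[[Sequence[float], Sequence[int]], float],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate a p value for an importance statistic by permuting labels.

    Shuffling the case/control labels breaks any genotype-phenotype
    association, so the statistics computed on shuffled labels
    approximate the null distribution.
    """
    rng = random.Random(seed)
    observed = importance(genotypes, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if importance(genotypes, shuffled) >= observed:
            exceed += 1
    # The add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


def mean_difference(genotypes: Sequence[float], labels: Sequence[int]) -> float:
    """Stand-in statistic (an assumption): absolute difference in mean
    genotype between cases (label 1) and controls (label 0)."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))
```

Because only the labels are shuffled, the numbers of cases and controls stay fixed across permutations, which keeps the null distribution comparable to the observed statistic.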
Subject(s)
Data Mining/methods , Epistasis, Genetic , Genetic Association Studies , Algorithms , Computer Simulation , Genetic Loci , Genome, Human , Haplotypes , Humans , Lymphoma, Non-Hodgkin/genetics , Monte Carlo Method , Polymorphism, Single Nucleotide
ABSTRACT
Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families from the Pfam database. It provides resources for analyzing the family structure of pseudogenes including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125,000 pseudogenes identified from 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of the total families in PfamA). Pseudofam uses a large-scale parallelized homology search algorithm (implemented as an extension of the PseudoPipe pipeline) to identify pseudogenes. Each identified pseudogene is assigned to its parent protein family and subsequently aligned to each other by transferring the parent domain alignments from the Pfam family. Pseudogenes are also given additional annotation based on an ontology, reflecting their mode of creation and subsequent history. In particular, our annotation highlights the association of pseudogene families with genomic features, such as segmental duplications. In addition, pseudogene families are associated with key statistics, which identify outlier families with an unusual degree of pseudogenization. The statistics also show how the number of genes and pseudogenes in families correlates across different species. Overall, they highlight the fact that housekeeping families tend to be enriched with a large number of pseudogenes.
Subject(s)
Databases, Genetic , Pseudogenes , Animals , Data Interpretation, Statistical , Genomics , Humans , Internet , Proteins/classification , Proteins/genetics , Sequence Alignment
ABSTRACT
ATAC-seq has become a leading technology for probing the chromatin landscape of single and aggregated cells. Distilling functional regions from ATAC-seq presents diverse analysis challenges. Methods commonly used to analyze chromatin accessibility datasets are adapted from algorithms designed to process different experimental technologies, disregarding the statistical and biological differences intrinsic to the ATAC-seq technology. Here, we present a Bayesian statistical approach that uses latent space models to better model accessible regions, termed ChromA. ChromA annotates chromatin landscape by integrating information from replicates, producing a consensus de-noised annotation of chromatin accessibility. ChromA can analyze single cell ATAC-seq data, correcting many biases generated by the sparse sampling inherent in single cell technologies. We validate ChromA on multiple technologies and biological systems, including mouse and human immune cells, establishing ChromA as a top performing general platform for mapping the chromatin landscape in different cellular populations from diverse experimental designs.
Subject(s)
Chromatin/genetics , Genomics/methods , Models, Genetic , Algorithms , Animals , Bayes Theorem , Chromatin Immunoprecipitation Sequencing , Gene Library , Humans , Markov Chains , Mice , Molecular Sequence Annotation , Single-Cell Analysis
ABSTRACT
BACKGROUND: Pseudogenes provide a record of the molecular evolution of genes. As glycolysis is such a highly conserved and fundamental metabolic pathway, the pseudogenes of glycolytic enzymes comprise a standardized genomic measuring stick and an ideal platform for studying molecular evolution. One of the glycolytic enzymes, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), has already been noted to have one of the largest numbers of associated pseudogenes, among all proteins. RESULTS: We assembled the first comprehensive catalog of the processed and duplicated pseudogenes of glycolytic enzymes in many vertebrate model-organism genomes, including human, chimpanzee, mouse, rat, chicken, zebrafish, pufferfish, fruitfly, and worm (available at http://pseudogene.org/glycolysis/). We found that glycolytic pseudogenes are predominantly processed, i.e. retrotransposed from the mRNA of their parent genes. Although each glycolytic enzyme plays a unique role, GAPDH has by far the most pseudogenes, perhaps reflecting its large number of non-glycolytic functions or its possession of a particularly retrotranspositionally active sub-sequence. Furthermore, the number of GAPDH pseudogenes varies significantly among the genomes we studied: none in zebrafish, pufferfish, fruitfly, and worm, 1 in chicken, 50 in chimpanzee, 62 in human, 331 in mouse, and 364 in rat. Next, we developed a simple method of identifying conserved syntenic blocks (consistently applicable to the wide range of organisms in the study) by using orthologous genes as anchors delimiting a conserved block between a pair of genomes. This approach showed that few glycolytic pseudogenes are shared between primate and rodent lineages. Finally, by estimating pseudogene ages using Kimura's two-parameter model of nucleotide substitution, we found evidence for bursts of retrotranspositional activity approximately 42, 36, and 26 million years ago in the human, mouse, and rat lineages, respectively. 
CONCLUSION: Overall, we performed a consistent analysis of one group of pseudogenes across multiple genomes, finding evidence that most of them were created within the last 50 million years, subsequent to the divergence of rodent and primate lineages.
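For readers unfamiliar with Kimura's two-parameter model mentioned above, the distance it assigns to a pair of aligned sequences can be sketched as below. This is only an illustration of the standard formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences; the function name and the handling of gaps are assumptions, not details from the study. Converting a distance to an age additionally needs a substitution rate, which the abstract does not state.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Positions where either sequence has a gap or an ambiguous base are
    skipped (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the model (arguments of the logarithms become non-positive).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

Separating transitions from transversions is what distinguishes this model from a single-rate correction, since transitions typically accumulate faster.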
Subject(s)
Evolution, Molecular , Glyceraldehyde-3-Phosphate Dehydrogenases/genetics , Pseudogenes , Retroelements , Vertebrates/genetics , Animals , Comparative Genomic Hybridization , DNA Mutational Analysis , Genome , Synteny
ABSTRACT
The Pseudogene.org knowledgebase serves as a comprehensive repository for pseudogene annotation. The definition of a pseudogene varies within the literature, resulting in significantly different approaches to the problem of identification. Consequently, it is difficult to maintain a consistent collection of pseudogenes in the detail necessary for their effective use. Our database is designed to address this issue. It integrates a variety of heterogeneous resources and supports a subset structure that highlights specific groups of pseudogenes that are of interest to the research community. Tools are provided for the comparison of sets and the creation of layered set unions, enabling researchers to derive a current 'consensus' set of pseudogenes. Additional features include versatile search, the capacity for robust interaction with other databases, the ability to reconstruct older versions of the database (accounting for changing genome builds) and an underlying object-oriented interface designed for researchers with a minimal knowledge of programming. At the present time, the database contains more than 100,000 pseudogenes spanning 64 prokaryote and 11 eukaryote genomes, including a collection of human annotations compiled from 16 sources.
Subject(s)
Databases, Genetic , Pseudogenes , Humans , Internet , Software , User-Computer Interface
ABSTRACT
Purpose: Decisions to continue or suspend therapy with immune checkpoint inhibitors are commonly guided by tumor dynamics seen on serial imaging. However, immunotherapy responses are uniquely challenging to interpret because tumors often shrink slowly or can appear transiently enlarged due to inflammation. We hypothesized that monitoring tumor cell death in real time by quantifying changes in circulating tumor DNA (ctDNA) levels could enable early assessment of immunotherapy efficacy. Experimental Design: We compared longitudinal changes in ctDNA levels with changes in radiographic tumor size and with survival outcomes in 28 patients with metastatic non-small cell lung cancer (NSCLC) receiving immune checkpoint inhibitor therapy. CtDNA was quantified by determining the allele fraction of cancer-associated somatic mutations in plasma using a multigene next-generation sequencing assay. We defined a ctDNA response as a >50% decrease in mutant allele fraction from baseline, with a second confirmatory measurement. Results: Strong agreement was observed between ctDNA response and radiographic response (Cohen's kappa, 0.753). Median time to initial response among patients who achieved responses in both categories was 24.5 days by ctDNA versus 72.5 days by imaging. Time on treatment was significantly longer for ctDNA responders versus nonresponders (median, 205.5 vs. 69 days; P < 0.001). A ctDNA response was associated with superior progression-free survival [hazard ratio (HR), 0.29; 95% CI, 0.09-0.89; P = 0.03], and superior overall survival (HR, 0.17; 95% CI, 0.05-0.62; P = 0.007). Conclusions: A drop in ctDNA level is an early marker of therapeutic efficacy and predicts prolonged survival in patients treated with immune checkpoint inhibitors for NSCLC. Clin Cancer Res; 24(8); 1872-80. ©2018 AACR.
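The response definition and the agreement statistic reported above can be illustrated with a short sketch. It rests on stated assumptions: the function names are hypothetical, and "a second confirmatory measurement" is read here as two consecutive follow-up measurements that are both more than 50% below baseline, which the abstract does not spell out. Cohen's kappa is computed with its standard formula, (observed agreement - chance agreement) / (1 - chance agreement).

```python
from typing import Hashable, Sequence


def is_ctdna_response(
    baseline_af: float, followup_afs: Sequence[float], decrease: float = 0.5
) -> bool:
    """Return True if two consecutive follow-up allele fractions are each
    more than `decrease` (default 50%) below the baseline allele fraction."""
    if baseline_af <= 0:
        raise ValueError("baseline allele fraction must be positive")
    cutoff = baseline_af * (1 - decrease)
    previous_below = False
    for af in followup_afs:
        below = af < cutoff  # strictly more than a 50% drop
        if below and previous_below:
            return True
        previous_below = below
    return False


def cohens_kappa(ratings_a: Sequence[Hashable], ratings_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two paired categorical ratings of the same items."""
    a, b = list(ratings_a), list(ratings_b)
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("ratings must be non-empty and of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the marginal frequency of each category.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

In this setting the two rating lists would hold, per patient, whether a ctDNA response and a radiographic response were called.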
Subject(s)
Biomarkers, Tumor , Circulating Tumor DNA , Lung Neoplasms/genetics , Lung Neoplasms/therapy , Antineoplastic Agents, Immunological/therapeutic use , B7-H1 Antigen/antagonists & inhibitors , Disease Progression , Humans , Immunotherapy , Lung Neoplasms/diagnosis , Lung Neoplasms/immunology , Mutation , Prognosis , Programmed Cell Death 1 Receptor/antagonists & inhibitors , Survival Analysis , Time Factors , Tomography, X-Ray Computed , Treatment Outcome
ABSTRACT
BACKGROUND: Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at increasingly finer resolutions as the microarray technology enjoys increasingly greater feature densities. The increased densities naturally lead to increased data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis involves smoothing observed signals by computing pseudomedians within sliding windows, an O(n^2 log n) calculation in each window. This poor time complexity is an issue for tiling array analysis and could prove to be a real bottleneck as tiling microarray experiments become grander in scope and finer in resolution. RESULTS: We therefore implemented Monahan's HLQEST algorithm that reduces the runtime complexity for computing the pseudomedian of n numbers to O(n log n) from O(n^2 log n). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then leveraged the fact that elements within sliding windows remain largely unchanged in overlapping windows (as one slides across genomic space) to further reduce computation by an additional 43%. This was achieved by the application of skip lists to maintaining a sorted list of values from window to window. This sorted list could be maintained with simple O(log n) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets. CONCLUSION: Tiling microarray analyses that rely upon a sliding window pseudomedian calculation can require many hours of computation.
We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds the current standard analyses, but also makes possible ones where many iterations of the filter may be required, such as might be required in a bootstrap or parameter estimation setting. Source code and executables are available at http://tiling.gersteinlab.org/pseudomedian/.
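To make the cost discussed above concrete, the baseline calculation can be sketched as follows. This shows only the naive pseudomedian (the median of all pairwise averages, also called Walsh averages), which is the O(n^2 log n) approach the authors set out to replace. It is not Monahan's HLQEST algorithm and does not use skip lists. The function names are hypothetical, and the window is assumed to be a fixed number of consecutive probes rather than a genomic distance.

```python
from itertools import combinations_with_replacement
from statistics import median
from typing import List, Sequence


def pseudomedian(values: Sequence[float]) -> float:
    """Naive pseudomedian (Hodges-Lehmann estimator): the median of all
    pairwise averages (x_i + x_j) / 2 with i <= j.

    For n values this builds n(n+1)/2 averages and sorts them, hence
    the O(n^2 log n) cost.
    """
    if len(values) == 0:
        raise ValueError("pseudomedian of an empty sequence is undefined")
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)


def sliding_pseudomedian(signal: Sequence[float], window: int) -> List[float]:
    """Smooth a signal by taking the pseudomedian of each full window of
    `window` consecutive values; every window is recomputed from scratch."""
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    return [
        pseudomedian(signal[i : i + window])
        for i in range(len(signal) - window + 1)
    ]
```

Recomputing each window independently ignores the overlap between neighboring windows, which is the redundancy the abstract describes exploiting with an incrementally maintained sorted list.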
Subject(s)
Algorithms , Chromosome Mapping/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Signal Processing, Computer-Assisted
ABSTRACT
BACKGROUND: Relating features of protein sequences to structural hinges is important for identifying domain boundaries, understanding structure-function relationships, and designing flexibility into proteins. Efforts in this field have been hampered by the lack of a proper dataset for studying characteristics of hinges. RESULTS: Using the Molecular Motions Database we have created a Hinge Atlas of manually annotated hinges and a statistical formalism for calculating the enrichment of various types of residues in these hinges. CONCLUSION: We found various correlations between hinges and sequence features. Some of these are expected; for instance, we found that hinges tend to occur on the surface and in coils and turns and to be enriched with small and hydrophilic residues. Others are less obvious and intuitive. In particular, we found that hinges tend to coincide with active sites, but unlike the latter they are not at all conserved in evolution. We evaluate the potential for hinge prediction based on sequence. Motions play an important role in catalysis and protein-ligand interactions. Hinge bending motions comprise the largest class of known motions. Therefore it is important to relate the hinge location to sequence features such as residue type, physicochemical class, secondary structure, solvent exposure, evolutionary conservation, and proximity to active sites. To do this, we first generated the Hinge Atlas, a set of protein motions with the hinge locations manually annotated, and then studied the coincidence of these features with the hinge location. We found that all of the features have bearing on the hinge location. Most interestingly, we found that hinges tend to occur at or near active sites and yet unlike the latter are not conserved. Less surprisingly, we found that hinge residues tend to be small, not hydrophobic or aliphatic, and occur in turns and random coils on the surface. 
A functional sequence-based hinge predictor was built using some of the data generated in this study. The Hinge Atlas is made available to the community for further flexibility studies.
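The abstract does not spell out the statistical formalism used to score residue enrichment in hinges. As one plausible sketch, and only under that assumption, enrichment of a residue type can be expressed as the ratio of its frequency in hinges to its frequency overall, with a one-sided hypergeometric tail probability as the significance measure. The function names and any counts passed to them are hypothetical and are not taken from the Hinge Atlas.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when `draws` items are sampled without replacement from a
    population containing `successes` items of the type of interest."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / total


def enrichment(hinge_count, hinge_total, all_count, all_total):
    """Return (fold enrichment, one-sided p-value) for one residue type.

    hinge_count / hinge_total: residues of this type among hinge residues.
    all_count / all_total: residues of this type among all residues.
    """
    fold = (hinge_count / hinge_total) / (all_count / all_total)
    p_value = hypergeom_sf(hinge_count, all_total, all_count, hinge_total)
    return fold, p_value
```

A fold value above 1 with a small p-value would indicate that the residue type occurs in hinges more often than expected by chance; the same function applies to secondary-structure classes or any other categorical feature.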
Subject(s)
Databases, Protein; Information Storage and Retrieval/methods; Models, Chemical; Models, Molecular; Proteins/chemistry; Proteins/ultrastructure; Sequence Analysis, Protein/methods; Algorithms; Computer Simulation; Protein Conformation; Sequence Alignment/methods; Structure-Activity Relationship
ABSTRACT
Pseudogenes, in the case of protein-coding genes, are gene copies that have lost the ability to code for a protein; they are typically identified through annotation of disabled, decayed or incomplete protein-coding sequences. Processed pseudogenes (PPsigs) are made through mRNA retrotransposition. There is overwhelming genomic evidence for thousands of human PPsigs and also dozens of human processed genes that comprise complete retrotransposed copies of other genes. Here, we survey for an intermediate entity, the transcribed processed pseudogene (TPPsig), which is disabled but nonetheless transcribed. TPPsigs may affect expression of paralogous genes, as observed in the case of the mouse makorin1-p1 TPPsig. To elucidate their role, we identified human TPPsigs by mapping expressed sequences onto PPsigs and, reciprocally, extracting TPPsigs from known mRNAs. We consider only those PPsigs that are homologous to either non-mammalian eukaryotic proteins or protein domains of known structure, and require detection of identical coding-sequence disablements in both the expressed and genomic sequences. Oligonucleotide microarray data provide further expression verification. Overall, we find 166-233 TPPsigs (approximately 4-6% of PPsigs). Proteins/transcripts with the highest numbers of homologous TPPsigs generally have many homologous PPsigs and are abundantly expressed. TPPsigs are significantly over-represented near both the 5' and 3' ends of genes; this suggests that TPPsigs can be formed through gene-promoter co-option, or intrusion into untranslated regions. However, roughly half of the TPPsigs are located away from genes in the intergenic DNA and thus may be co-opting cryptic promoters of undesignated origin. Furthermore, TPPsigs are unlike other PPsigs and processed genes in the following ways: (i) they do not show a significant tendency to either deposit on or originate from the X chromosome; (ii) only 5% of human TPPsigs have potential orthologs in mouse.
This latter finding indicates that the vast majority of TPPsigs are lineage-specific, which is likely linked to the well-documented, extensive lineage-specific SINE/LINE activity. The list of TPPsigs is available at http://www.biology.mcgill.ca/faculty/harrison/tppg/bppg.tov or http://pseudogene.org.
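The requirement that the same coding-sequence disablement appear in both the genomic and the expressed sequence can be illustrated with a deliberately simplified Python sketch. It is not the authors' pipeline: it considers only premature in-frame stop codons (frameshifts are ignored), and it assumes the two sequences have already been aligned so that they share a reading frame starting at position 0. The function names `premature_stops` and `shared_disablements` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(seq):
    """Map codon index -> stop codon for every in-frame stop that occurs
    before the final complete codon of `seq` (frame assumed to start at 0)."""
    seq = seq.upper()
    n_codons = len(seq) // 3
    stops = {}
    for idx in range(n_codons - 1):  # the last codon may be a normal stop
        codon = seq[3 * idx:3 * idx + 3]
        if codon in STOP_CODONS:
            stops[idx] = codon
    return stops


def shared_disablements(genomic, expressed):
    """Return the premature stops found at the same codon position, with the
    same codon, in both the genomic and the expressed sequence."""
    genomic_stops = premature_stops(genomic)
    expressed_stops = premature_stops(expressed)
    return {idx: codon for idx, codon in genomic_stops.items()
            if expressed_stops.get(idx) == codon}
```

Under these assumptions, a candidate would count as transcribed only if `shared_disablements` returns at least one entry, meaning the expressed sequence carries the pseudogene's disablement and was not derived from the intact parent gene.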