ABSTRACT
Colorectal cancers (CRCs) form a heterogeneous group classified into epigenetic and transcriptional subtypes. The basis for the epigenetic subtypes, exemplified by varying degrees of promoter DNA hypermethylation, and its relation to the transcriptional subtypes are not well understood. We link cancer-specific transcription factor (TF) expression alterations to methylation alterations near TF-binding sites at promoter and enhancer regions in CRCs and their premalignant precursor lesions to provide mechanistic insights into the origins and evolution of the CRC molecular subtypes. A gradient of TF expression changes forms a basis for the subtypes of abnormal DNA methylation, termed CpG-island promoter DNA methylation phenotypes (CIMPs), in CRCs and other cancers. CIMP is tightly correlated with cancer-specific hypermethylation at enhancers, which we term CpG-enhancer methylation phenotype (CEMP). Coordinated promoter and enhancer methylation appears to be driven by downregulation of TFs with common binding sites at the hypermethylated enhancers and promoters. The altered expression of TFs related to hypermethylator subtypes occurs early during CRC development and is detectable in premalignant adenomas. TF-based profiling further identifies patients with worse overall survival. Importantly, altered expression of these TFs discriminates the transcriptome-based consensus molecular subtypes (CMS), thus providing a common basis for CIMP and CMS subtypes.
Subjects
Colorectal Neoplasms; Precancerous Conditions; Humans; Transcription Factors; Gene Expression Regulation; DNA Methylation; Epigenesis, Genetic
ABSTRACT
The search for chemical hit material is a lengthy and increasingly expensive stage of the drug discovery process. To improve it, ligand-based quantitative structure-activity relationship (QSAR) models have been broadly applied to optimize primary and secondary compound properties. Although these models can be deployed as early as the molecule design stage, they have a limited applicability domain: if the structures of interest differ substantially from the chemical space on which the model was trained, a reliable prediction is not possible. Image-informed ligand-based models partly overcome this shortcoming by focusing on the cellular phenotype induced by small molecules rather than on their structure. While this enables expansion into more diverse chemistry, it limits application to compounds that are physically available and have been imaged. Here, we employ an active learning approach that capitalizes on the strengths of both methods to boost the performance of a model for a mitochondrial toxicity assay (Glu/Gal). Specifically, we used a phenotypic Cell Painting screen to build a chemistry-independent model and adopted its results as the main factor in selecting compounds for experimental testing. With the additional Glu/Gal annotations for the selected compounds, we were able to dramatically improve the chemistry-informed ligand-based model, increasing its recognition of compounds from a 10% broader chemical space.
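The compound-selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: compounds are represented as hypothetical fingerprint bit sets, `image_score` stands for a toxicity probability from a Cell Painting model, and ranking the flagged compounds by their novelty relative to the structure-based model's training set (lowest nearest-neighbor Tanimoto similarity) is our assumption about how the two models could be combined.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_for_testing(candidates, training_fps, budget, min_image_score=0.5):
    """Choose up to `budget` compounds for Glu/Gal testing (hypothetical helper).

    candidates: list of (compound_id, fingerprint, image_score) tuples.
    training_fps: fingerprints of compounds already annotated in the assay.
    Compounds the image-based model flags as likely toxic are ranked so that the
    ones least similar to the training set come first; ties go to the higher score.
    """
    ranked = []
    for cid, fp, image_score in candidates:
        if image_score < min_image_score:
            continue  # the image-based model is the main selection factor
        nearest = max((tanimoto(fp, t) for t in training_fps), default=0.0)
        ranked.append((nearest, -image_score, cid))
    ranked.sort()
    return [cid for _, _, cid in ranked[:budget]]


training = [{1, 2, 3, 4}, {5, 6, 7}]
candidates = [
    ("A", {1, 2, 3, 4}, 0.9),  # already well covered by the training set
    ("B", {10, 11}, 0.8),      # novel chemistry, flagged by the image model
    ("C", {10, 12}, 0.2),      # novel but not flagged
    ("D", {1, 2, 9}, 0.7),     # partially novel
]
print(select_for_testing(candidates, training, budget=2))  # ['B', 'D']
```

Prioritizing novelty among image-flagged compounds is one way to push annotations into chemical space that the structure-based model has not yet seen.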
Subjects
Deep Learning; Quantitative Structure-Activity Relationship; Ligands; Drug Discovery/methods
ABSTRACT
BACKGROUND: Alternative gene splicing is a common phenomenon in which a single gene gives rise to multiple transcript isoforms. The process is tightly regulated and involves a multitude of proteins and regulatory complexes. Unfortunately, aberrant splicing events do occur, and these have been linked to genetic disorders such as several types of cancer and neurodegenerative diseases (Fan et al., Theor Biol Med Model 3:19, 2006). Therefore, understanding the mechanism of alternative splicing and identifying differences in splicing events between diseased and healthy tissue is crucial in biomedical research, with potential applications in personalized medicine as well as in drug development. RESULTS: We propose a linear mixed model, Random Effects for the Identification of Differential Splicing (REIDS), for the identification of alternative splicing events. A decision regarding alternative splicing can be made based on two scores, an exon score and an array score. The model makes it possible to distinguish a differentially expressed gene from a differentially spliced exon. The proposed model was applied to three case studies involving both exon arrays and HTA arrays. CONCLUSION: The REIDS model provides a workflow for the identification of alternative splicing events relying on the established linear mixed model. The model can be applied to different types of arrays.
Subjects
Alternative Splicing; Models, Genetic; Oligonucleotide Array Sequence Analysis/methods; Transcriptome; Area Under Curve; Colonic Neoplasms/genetics; Colonic Neoplasms/metabolism; Colonic Neoplasms/pathology; Exons; Humans; LIM Domain Proteins/genetics; Microfilament Proteins/genetics; Protein Isoforms/genetics; ROC Curve
ABSTRACT
MOTIVATION: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. RESULTS: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50× coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading.
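Parallel efficiency, as referred to above, is conventionally defined as the speedup over a single-core run divided by the number of cores used. The sketch below computes it and also shows one simple way to partition a genome into fixed-size regions for parallel processing. The chunking helper and all numbers are hypothetical illustrations, not Halvade's actual partitioning scheme or measured timings.

```python
def parallel_efficiency(t_serial: float, t_parallel: float, n_cores: int) -> float:
    """Speedup (t_serial / t_parallel) divided by the number of cores used."""
    return (t_serial / t_parallel) / n_cores


def split_regions(chrom_lengths: dict[str, int], chunk_size: int) -> list[tuple[str, int, int]]:
    """Cut each chromosome into half-open [start, end) regions of at most chunk_size bp."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, chunk_size):
            regions.append((chrom, start, min(start + chunk_size, length)))
    return regions


# Hypothetical timings: 360 h on one core versus 1.25 h on 360 cores.
print(parallel_efficiency(360.0, 1.25, 360))  # 0.8
print(split_regions({"chr1": 250, "chr2": 100}, 100))
```

An efficiency close to 1 means that adding cores shortens the run almost proportionally.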
Subjects
Sequence Analysis, DNA/methods; Software; Genome, Human; Humans
ABSTRACT
MOTIVATION: In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single-base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from real low-frequency mutations. RESULTS: A variant calling tool, Q-cpileup, is proposed that exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is embedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information between single-nucleotide polymorphisms is preserved because variants are called at the codon level. This gives virologists an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in outstanding sensitivity while maintaining good specificity for variants with frequencies down to 0.5%. AVAILABILITY: VirVarSeq is available, together with a user's guide and test data, at sourceforge: http://sourceforge.net/projects/virtools/?source=directory.
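The core idea of quality-based filtering at the codon level can be sketched as follows. A Phred score Q corresponds to an error probability of 10^(-Q/10), and a codon observation is counted only when all three of its base calls pass a quality threshold, so the linkage between the three positions is preserved. This is a simplified illustration, not Q-cpileup itself. In particular, the fixed `q_min` below stands in for the adaptive, sample-specific threshold that the tool optimizes, and the function names are hypothetical.

```python
from collections import Counter


def phred_to_error(q: int) -> float:
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def codon_frequencies(observations, q_min: int) -> dict[str, float]:
    """Frequencies of codons at one codon position after quality filtering.

    observations: iterable of (codon, (q1, q2, q3)) tuples, one per read that
    covers all three nucleotides of the codon.
    """
    counts = Counter(
        codon for codon, quals in observations
        if len(codon) == 3 and min(quals) >= q_min
    )
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


reads = [
    ("ATG", (30, 30, 30)),
    ("ATG", (35, 32, 31)),
    ("ACG", (30, 12, 30)),  # middle base below threshold: discarded
    ("ACG", (33, 31, 30)),
]
print(phred_to_error(20))
print(codon_frequencies(reads, q_min=20))
```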
Subjects
Algorithms; Genetic Variation/genetics; Genomics/methods; Hepacivirus/genetics; Hepatitis C/genetics; High-Throughput Nucleotide Sequencing/methods; Software; Genome, Viral; Hepatitis C/virology; Humans
ABSTRACT
BACKGROUND: Next-generation sequencing enables the study of heterogeneous populations of viral infections. When sequencing is done at high coverage depth ("deep sequencing"), low-frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Our method was tested in comparison with the existing methods LoFreq, ShoRAH, and V-Phaser 2 on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. RESULTS: For the default application of QQ-SNV, variants were called using an SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve sensitivity, we used an SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, called SNVs were overruled when their frequency was below the 80th percentile of the distribution of error frequencies (QQ-SNV(HS-P80)). When QQ-SNV was compared with the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but produced more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets, balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study with a lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) achieved a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets.
Finally, when tested on a clinical sample, QQ-SNV(HS-P80) consistently detected four putative true variants with frequencies below 0.5% in data from different generations of Illumina sequencers. CONCLUSIONS: We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
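The three calling modes described above differ only in the probability cutoff and the optional percentile filter. Below is a minimal sketch, assuming a logistic model whose features are quality-score quantiles. The coefficients, quantile values and frequencies are invented for illustration, and the helper names are hypothetical rather than part of QQ-SNV.

```python
import math
import statistics


def snv_probability(quality_quantiles, coef, intercept):
    """Logistic-regression probability that a candidate is a true SNV."""
    z = intercept + sum(b * q for b, q in zip(coef, quality_quantiles))
    return 1.0 / (1.0 + math.exp(-z))


def call_snvs(candidates, coef, intercept, cutoff, error_freqs=None):
    """Return ids of called SNVs.

    candidates: list of (snv_id, quality_quantiles, frequency).
    cutoff: 0.5 mimics the default mode, 0.0001 the high-sensitivity mode.
    error_freqs: if given, calls below the 80th percentile of these error
    frequencies are overruled (the P80 filter).
    """
    p80 = 0.0
    if error_freqs is not None:
        p80 = statistics.quantiles(error_freqs, n=5, method="inclusive")[-1]
    return [
        snv_id for snv_id, quants, freq in candidates
        if snv_probability(quants, coef, intercept) >= cutoff and freq >= p80
    ]


coef, intercept = [0.5, 0.5], -30.0  # hypothetical fitted parameters
candidates = [
    ("v1", [30, 30], 0.010),
    ("v2", [25, 25], 0.003),
    ("v3", [20, 20], 0.050),
    ("v4", [25, 25], 0.020),
]
errors = [0.001, 0.002, 0.003, 0.004, 0.005]
print(call_snvs(candidates, coef, intercept, cutoff=0.5))                       # ['v1']
print(call_snvs(candidates, coef, intercept, cutoff=1e-4))                      # ['v1', 'v2', 'v4']
print(call_snvs(candidates, coef, intercept, cutoff=1e-4, error_freqs=errors))  # ['v1', 'v4']
```

Lowering the cutoff admits more candidates (higher sensitivity). The percentile filter then removes low-frequency calls that fall within the range typical of sequencing errors.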
Subjects
HIV Infections/genetics; HIV-1/genetics; Hepacivirus/genetics; Hepatitis C/genetics; High-Throughput Nucleotide Sequencing/methods; Polymorphism, Single Nucleotide/genetics; Software; Algorithms; Cluster Analysis; Computer Simulation; Genome, Viral; HIV Infections/virology; Hepatitis C/virology; Humans; Plasmids/genetics; Regression Analysis
ABSTRACT
BACKGROUND: Deep sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology-associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores, which for Illumina sequencing are derived from a quadruplet of intensities, one channel for each nucleotide type. The highest intensity of the four channels determines the base that is called. Mismatched bases can often be corrected by the second-best base, i.e. the base with the second-highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that exploits quality scores and second-best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets), which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. RESULTS: Using mixtures of HCV plasmids, we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when an average coverage of 25,000 is reached. A comparison with the SNP callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has superb sensitivity and specificity for variants with frequencies above 0.4%. Compared with these competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4%, which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. CONCLUSIONS: ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second-best base calls appeared very promising in our data exploration phase, their utility was limited.
They provided a slight increase in sensitivity, which, however, does not warrant the additional computational cost of running the offline base caller. Apparently, much of the information is already contained in the quality scores, enabling the model-based clustering procedure to adjust for the majority of the sequencing errors. Overall, the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low-frequency variant detection.
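The notion of a second-best base call can be made concrete with a small sketch. For each sequencing cycle, the called base is the channel with the highest intensity, and the second-best base is the channel with the next-highest intensity. The rule in `flag_probable_error`, which treats a mismatch as a likely error when the second-best base matches the reference, is a hypothetical illustration of how that information might be used. It is not ViVaMBC's model-based clustering.

```python
def best_and_second_best(intensities: dict[str, float]) -> tuple[str, str]:
    """Return (called base, second-best base) from one cycle's four channel intensities."""
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    return ranked[0], ranked[1]


def flag_probable_error(intensities: dict[str, float], reference_base: str) -> bool:
    """Hypothetical rule: a mismatch whose runner-up channel is the reference base."""
    called, second = best_and_second_best(intensities)
    return called != reference_base and second == reference_base


cycle = {"A": 0.10, "C": 0.90, "G": 0.55, "T": 0.20}
print(best_and_second_best(cycle))      # ('C', 'G')
print(flag_probable_error(cycle, "G"))  # True
print(flag_probable_error(cycle, "A"))  # False
```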
Subjects
Algorithms; Genetic Variation/genetics; Hepacivirus/genetics; Hepatitis C/genetics; High-Throughput Nucleotide Sequencing/methods; Mutation/genetics; Software; Cluster Analysis; Genome, Viral; Genomics/methods; Hepatitis C/virology; Humans; Sensitivity and Specificity; Sequence Analysis, DNA/methods
ABSTRACT
Motor neuron degeneration in amyotrophic lateral sclerosis (ALS) has a familial cause in 10% of patients. Despite significant advances in the genetics of the disease, many families remain unexplained. We performed whole-genome sequencing in five family members from a pedigree with autosomal-dominant classical ALS. A family-based elimination approach was used to identify novel coding variants segregating with the disease. This list of variants was effectively shortened by genotyping these variants in 2 additional unaffected family members and 1500 unrelated population-specific controls. A novel rare coding variant in SPAG8 on chromosome 9p13.3 segregated with the disease and was not observed in controls. Mutations in SPAG8 were not found in 34 other unexplained ALS pedigrees, including 1 with linkage to chromosome 9p13.2-23.3. The shared haplotype containing the SPAG8 variant in this small pedigree spanned 22.7 Mb and overlapped with the core 9p21 linkage locus for ALS and frontotemporal dementia. Based on differences in coverage depth of known variable tandem repeat regions between affected and unaffected family members, the shared haplotype was found to contain an expanded hexanucleotide (GGGGCC)n repeat in C9orf72 in the affected members. Our results demonstrate that rare coding variants identified by whole-genome sequencing can tag a shared haplotype containing a non-coding pathogenic mutation and that changes in coverage depth can be used to reveal tandem repeat expansions. The study also confirms (GGGGCC)n repeat expansions in C9orf72 as a cause of familial ALS.
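The coverage-depth comparison used above to reveal the repeat expansion can be sketched as a ratio of normalized depths. This is a simplified illustration that rests on several assumptions. Depth over a tandem-repeat region is divided by each individual's genome-wide mean depth, affected and unaffected relatives are averaged separately, and a hypothetical fold-change cutoff flags the region. The numbers are invented, and the function names do not come from the study.

```python
from statistics import mean


def depth_ratio(affected, unaffected):
    """Mean normalized depth in affected over unaffected individuals.

    affected/unaffected: lists of (region_depth, genome_mean_depth) per individual.
    """
    norm_aff = mean(region / genome for region, genome in affected)
    norm_unaff = mean(region / genome for region, genome in unaffected)
    return norm_aff / norm_unaff


def flag_region(affected, unaffected, fold=1.5):
    """Flag a region whose depth differs by at least `fold` in either direction."""
    ratio = depth_ratio(affected, unaffected)
    return ratio >= fold or ratio <= 1 / fold


affected = [(15, 30), (16, 32)]
unaffected = [(30, 30), (33, 33)]
print(depth_ratio(affected, unaffected))  # 0.5
print(flag_region(affected, unaffected))  # True
```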
Subjects
Amyotrophic Lateral Sclerosis/genetics; DNA Repeat Expansion; Genome, Human; Proteins/genetics; Adult; Age of Onset; Aged; Amyotrophic Lateral Sclerosis/pathology; C9orf72 Protein; Chromosome Mapping; Female; Genetic Variation; Haplotypes; Humans; Male; Middle Aged; Pedigree; Proteins/metabolism
ABSTRACT
SUMMARY: Pipit is a gene-centric interactive visualization tool designed to study structural genomic variations. By focusing on individual genes as the functional unit, researchers can study and generate hypotheses on the biological impact of different structural variations, for instance, the deletion of dosage-sensitive genes or the formation of fusion genes. Pipit is a cross-platform Java application that visualizes structural variation data from Genome Variation Format files. AVAILABILITY: Executables, source code, sample data, documentation and screencast are available at https://bitbucket.org/biovizleuven/pipit.
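A gene-centric view of structural variants comes down to intersecting variant coordinates with gene coordinates. The sketch below is not Pipit's code: Pipit is a Java application, and parsing of Genome Variation Format files is omitted here. It lists the genes that a deletion overlaps and flags a rearrangement as a fusion-gene candidate when its two breakpoints fall in different genes. The gene names and coordinates are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def genes_overlapping(genes, chrom, start, end):
    """Names of genes overlapping the half-open interval [start, end)."""
    return [g.name for g in genes if g.chrom == chrom and g.start < end and start < g.end]


def gene_at(genes, chrom, pos):
    """Name of the first gene containing position pos, or None."""
    hits = genes_overlapping(genes, chrom, pos, pos + 1)
    return hits[0] if hits else None


def is_fusion_candidate(genes, breakpoint1, breakpoint2):
    """True if the two (chrom, pos) breakpoints lie in two different genes."""
    g1, g2 = gene_at(genes, *breakpoint1), gene_at(genes, *breakpoint2)
    return g1 is not None and g2 is not None and g1 != g2


genes = [Gene("GENE1", "chr1", 100, 200), Gene("GENE2", "chr1", 300, 400), Gene("GENE3", "chr2", 50, 150)]
print(genes_overlapping(genes, "chr1", 150, 350))               # ['GENE1', 'GENE2']
print(is_fusion_candidate(genes, ("chr1", 150), ("chr2", 60)))  # True
```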
Subjects
Genomic Structural Variation; Software; Animals; Computer Graphics; Genes; Genome; Mice
ABSTRACT
Single nucleotide variants (SNVs) are, together with copy number variation, the primary source of variation in the human genome and are associated with phenotypic variation such as altered response to drug treatment and susceptibility to disease. Linking structural effects of non-synonymous SNVs to functional outcomes is a major issue in structural bioinformatics. The SNPeffect database (http://snpeffect.switchlab.org) uses sequence- and structure-based bioinformatics tools to predict the effect of protein-coding SNVs on the structural phenotype of proteins. It integrates aggregation prediction (TANGO), amyloid prediction (WALTZ), chaperone-binding prediction (LIMBO) and protein stability analysis (FoldX) for structural phenotyping. Additionally, SNPeffect holds information on affected catalytic sites and a number of post-translational modifications. The database contains all known human protein variants from UniProt, but users can now also submit custom protein variants for a SNPeffect analysis, including automated structure modeling. The new meta-analysis application allows plotting correlations between phenotypic features for a user-selected set of variants.
Subjects
Databases, Protein; Polymorphism, Single Nucleotide; Protein Conformation; Proteins/genetics; Humans; Internet; Meta-Analysis as Topic; Phenotype
ABSTRACT
Despite substantial progress in cancer microbiome research, recognized confounders and advances in absolute microbiome quantification remain underused; this raises concerns regarding potential spurious associations. Here we study the fecal microbiota of 589 patients at different colorectal cancer (CRC) stages and compare observations with up to 15 published studies (4,439 patients and controls total). Using quantitative microbiome profiling based on 16S ribosomal RNA amplicon sequencing, combined with rigorous confounder control, we identified transit time, fecal calprotectin (intestinal inflammation) and body mass index as primary microbial covariates, superseding the variance explained by CRC diagnostic groups. Well-established microbiome CRC targets, such as Fusobacterium nucleatum, did not significantly associate with CRC diagnostic groups (healthy, adenoma and carcinoma) when controlling for these covariates. In contrast, the associations of Anaerococcus vaginalis, Dialister pneumosintes, Parvimonas micra, Peptostreptococcus anaerobius, Porphyromonas asaccharolytica and Prevotella intermedia remained robust, highlighting their future target potential. Finally, control individuals (age 22-80 years, mean 57.7 years, standard deviation 11.3) meeting criteria for colonoscopy (for example, through a positive fecal immunochemical test) but without colonic lesions are enriched for the dysbiotic Bacteroides2 enterotype, emphasizing uncertainties in defining healthy controls in cancer microbiome research. Together, these results indicate the importance of quantitative microbiome profiling and covariate control for biomarker identification in CRC microbiome studies.
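The difference between relative and quantitative profiling can be illustrated with a simplified conversion, in which relative read proportions are scaled by the measured microbial load of the sample (for example, cells per gram of feces measured by flow cytometry). This sketch shows only the principle. The published quantitative microbiome profiling protocol also corrects for 16S rRNA gene copy number and rarefies samples to an even sampling depth, steps omitted here, and the counts below are invented.

```python
def absolute_abundances(read_counts: dict[str, int], cells_per_gram: float) -> dict[str, float]:
    """Scale each taxon's read proportion by the sample's microbial load."""
    total = sum(read_counts.values())
    return {taxon: cells_per_gram * n / total for taxon, n in read_counts.items()}


# Two samples with identical relative profiles but a fourfold difference in load.
reads = {"Bacteroides": 60, "Prevotella": 40}
low_load = absolute_abundances(reads, 5e10)
high_load = absolute_abundances(reads, 2e11)
print(low_load["Bacteroides"], high_load["Bacteroides"])
```

Relative profiles cannot distinguish these two samples. This is one reason why, as argued above, quantitative profiling matters when comparing diagnostic groups.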
Subjects
Colorectal Neoplasms; Feces; Gastrointestinal Microbiome; RNA, Ribosomal, 16S; Humans; Colorectal Neoplasms/microbiology; Middle Aged; Feces/microbiology; Female; Aged; Male; RNA, Ribosomal, 16S/genetics; Adult; Gastrointestinal Microbiome/genetics; Aged, 80 and over; Young Adult; Microbiota/genetics; Leukocyte L1 Antigen Complex/metabolism
ABSTRACT
Fabry disease is a lysosomal storage disorder caused by loss of α-galactosidase function. More than 500 Fabry disease mutants have been identified, the majority of which are structurally destabilized. A therapeutic strategy under development for lysosomal storage diseases consists of using pharmacological chaperones to stabilize the structure of the mutant protein, thereby promoting lysosomal delivery over retrograde degradation. The substrate analog 1-deoxygalactonojirimycin (DGJ) has been shown to restore the activity of mutant α-galactosidase and is currently in clinical trials for the treatment of Fabry disease. However, only ~65% of tested mutants respond to treatment in cultured patient fibroblasts, and the structural underpinnings of the DGJ response remain poorly understood. Using computational modeling and cell culture experiments, we show that the DGJ response is negatively affected by protein aggregation of α-galactosidase mutants, revealing a qualitative difference between misfolding-associated and aggregation-associated loss of function. A scoring function combining predicted thermodynamic stability and intrinsic aggregation propensity of mutants captures their aggregation behavior under overexpression in HeLa cells well. Interestingly, the same classifier performs well on DGJ response data of patient-derived cultured lymphoblasts, showing that protein aggregation is an important determinant of chemical chaperone efficiency at endogenous expression levels as well. Our observations reinforce the idea that treatment of the aggregation-associated loss of function observed for the more severe α-galactosidase mutants could be enhanced by combining pharmacological chaperone treatment with the suppression of mutant aggregation, e.g. via proteostatic regulator compounds that increase cellular chaperone expression.
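The abstract does not specify the form of the scoring function, so the following is only a hypothetical sketch of how predicted stability and aggregation propensity could be combined into one classifier. It takes a weighted sum of the predicted change in folding free energy (ΔΔG in kcal/mol, where positive values mean destabilizing, as FoldX reports it) and the change in an aggregation-propensity score such as TANGO's, and compares the sum against a threshold. The weights, threshold, and inputs are invented for illustration.

```python
def aggregation_score(ddg: float, delta_agg: float, w_stab: float = 1.0, w_agg: float = 0.01) -> float:
    """Hypothetical linear combination of destabilization and aggregation propensity."""
    return w_stab * ddg + w_agg * delta_agg


def predicted_aggregating(ddg: float, delta_agg: float, threshold: float = 2.0) -> bool:
    """Classify a mutant as aggregation-prone when its combined score exceeds the threshold."""
    return aggregation_score(ddg, delta_agg) > threshold


# A destabilizing mutation that also raises aggregation propensity...
print(predicted_aggregating(1.5, 100.0))  # True
# ...versus an equally destabilizing one that does not.
print(predicted_aggregating(1.5, 0.0))    # False
```

The example shows why the two terms are combined: stability alone cannot separate mutants that are equally destabilized but differ in aggregation propensity.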
Subjects
1-Deoxynojirimycin/analogs & derivatives; Fabry Disease/metabolism; Gene Expression Regulation/drug effects; Molecular Chaperones/biosynthesis; Mutation, Missense; alpha-Galactosidase/metabolism; 1-Deoxynojirimycin/pharmacology; Enzyme Activation/drug effects; Fabry Disease/drug therapy; Fabry Disease/genetics; Fibroblasts/metabolism; Fibroblasts/pathology; Gene Expression Regulation/genetics; HeLa Cells; Humans; Molecular Chaperones/genetics; alpha-Galactosidase/genetics
ABSTRACT
Protein aggregation results in beta-sheet-like assemblies that adopt either a variety of amorphous morphologies or ordered amyloid-like structures. These differences in structure also reflect biological differences; amyloid and amorphous beta-sheet aggregates have different chaperone affinities, accumulate in different cellular locations and are degraded by different mechanisms. Further, amyloid function depends entirely on a high intrinsic degree of order. Here we experimentally explored the sequence space of amyloid hexapeptides and used the derived data to build Waltz, a web-based tool that uses a position-specific scoring matrix to determine amyloid-forming sequences. Waltz allows users to identify and better distinguish between amyloid sequences and amorphous beta-sheet aggregates and allowed us to identify amyloid-forming regions in functional amyloids.
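Scanning a sequence with a position-specific scoring matrix, as Waltz does for hexapeptides, amounts to sliding a six-residue window along the sequence and summing the per-position scores of the residues in each window. In the sketch below the matrix values are invented (residues not listed score 0), whereas the real Waltz matrix was derived from the experimentally explored hexapeptide space. The threshold is also hypothetical.

```python
# Hypothetical 6-position matrix: only a few residues get non-zero scores.
TOY_PSSM = [
    {"V": 1.0, "I": 1.0, "F": 0.8},
    {"V": 0.9, "Y": 0.7},
    {"N": 0.6, "Q": 0.6, "V": 0.5},
    {"F": 1.0, "Y": 0.8},
    {"I": 0.9, "L": 0.7},
    {"V": 0.8, "K": -0.5},
]


def window_score(hexapeptide: str, pssm=TOY_PSSM) -> float:
    """Sum of position-specific scores; unlisted residues contribute 0."""
    return sum(position.get(aa, 0.0) for position, aa in zip(pssm, hexapeptide))


def scan(sequence: str, threshold: float, pssm=TOY_PSSM):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = window_score(window, pssm)
        if score >= threshold:
            hits.append((i, window, score))
    return hits


print(scan("GGVVNFIVGG", threshold=4.0))
```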
Subjects
Amyloid/chemistry; Algorithms; Amino Acid Motifs; Amino Acid Sequence; Benchmarking; Protein Structure, Secondary; X-Ray Diffraction
ABSTRACT
Many p53 missense mutations possess dominant-negative activity and oncogenic gain of function. We report that for structurally destabilized p53 mutants, these effects result from mutant-induced coaggregation of wild-type p53 and its paralogs p63 and p73, thereby also inducing a heat-shock response. Aggregation of mutant p53 resulted from self-assembly of a conserved aggregation-nucleating sequence within the hydrophobic core of the DNA-binding domain, which becomes exposed after mutation. Suppressing the aggregation propensity of this sequence by mutagenesis abrogated gain of function and restored the activity of wild-type p53 and its paralogs. In the p53 germline mutation database, tumors carrying aggregation-prone p53 mutations have a significantly lower frequency of wild-type allele loss compared with tumors harboring nonaggregating mutations, suggesting a difference in clonal selection of aggregating mutants. Overall, our study reveals a novel disease mechanism for mutant p53 gain of function and suggests that, at least in some respects, cancer could be considered an aggregation-associated disease.
Subjects
Tumor Suppressor Protein p53/genetics; Tumor Suppressor Protein p53/metabolism; Animals; DNA-Binding Proteins/chemistry; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Genes, p53; Humans; Hydrophobic and Hydrophilic Interactions; Mice; Mutation; Neoplasms/genetics; Neoplasms/metabolism; Neoplasms/pathology; Nuclear Proteins/chemistry; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; Protein Conformation; Spectrometry, Mass, Electrospray Ionization; Spectroscopy, Fourier Transform Infrared; Trans-Activators/chemistry; Trans-Activators/genetics; Trans-Activators/metabolism; Transcription Factors; Tumor Cells, Cultured; Tumor Protein p73; Tumor Suppressor Protein p53/chemistry; Tumor Suppressor Proteins/chemistry; Tumor Suppressor Proteins/genetics; Tumor Suppressor Proteins/metabolism
ABSTRACT
BACKGROUND: No biomarkers that could guide patient selection for treatment with the anti-VEGF monoclonal antibody bevacizumab have been identified. We assessed whether genetic variants in the VEGF pathway could act as biomarkers for bevacizumab treatment outcome. METHODS: We investigated DNA from white patients from two phase 3 randomised studies. In AViTA, patients with metastatic pancreatic adenocarcinoma were randomly assigned to receive gemcitabine and erlotinib plus either bevacizumab or placebo. In AVOREN, patients with metastatic renal-cell carcinoma were randomly assigned to receive interferon alfa-2a plus either bevacizumab or placebo. We assessed the correlation of 138 SNPs in the VEGF pathway with progression-free survival and overall survival in a subpopulation of patients from AViTA. Significant findings were confirmed in a subpopulation of patients from AVOREN and functionally studied at the molecular level. FINDINGS: We investigated DNA of 154 patients from AViTA, of whom 77 received bevacizumab, and 110 patients from AVOREN, of whom 59 received bevacizumab. Only rs9582036, a SNP in VEGF receptor 1 (VEGFR1 or FLT1), was significantly associated with overall survival in the bevacizumab group of AViTA after correction for multiplicity (per-allele hazard ratio [HR] 2·1, 95% CI 1·45-3·06, p=0·00014). This SNP was also associated with progression-free survival (per-allele HR 1·89, 1·31-2·71, p=0·00081) in bevacizumab-treated patients from AViTA. AC and CC carriers of this SNP exhibited HRs for overall survival of 2·0 (1·19-3·36; p=0·0091) and 4·72 (2·08-10·68; p=0·0002) relative to AA carriers. No effects were seen in placebo-treated patients and a significant genotype by treatment interaction (p=0·041) was recorded, indicating that the VEGFR1 locus containing this SNP serves as a predictive marker for bevacizumab treatment outcome in AViTA. 
Fine-mapping experiments of this locus identified rs7993418, a synonymous SNP affecting tyrosine 1213 in the VEGFR1 tyrosine-kinase domain, as the functional variant underlying the association. This SNP causes a shift in codon usage, leading to increased VEGFR1 expression and downstream VEGFR1 signalling. This VEGFR1 locus correlated significantly with progression-free survival (HR 1·81, 1·08-3·05; p=0·033) but not overall survival (HR 0·91, 0·45-1·82, p=0·78) in the bevacizumab group in AVOREN. INTERPRETATION: A locus in VEGFR1 correlates with increased VEGFR1 expression and poor outcome of bevacizumab treatment. Prospective assessment is underway to validate the predictive value of this novel biomarker. FUNDING: F Hoffmann-La Roche.
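A per-allele hazard ratio implies a log-additive model in which each copy of the risk allele (here C, with AA as reference) multiplies the hazard by the same factor. The short sketch below takes the per-allele HR of 2·1 reported for overall survival in AViTA and computes the genotype hazard ratios this model implies: about 2·1 for AC and 2·1² ≈ 4·41 for CC. The separately reported genotype estimates of 2·0 and 4·72 are of similar magnitude. The function names are hypothetical.

```python
def risk_allele_count(genotype: str, risk_allele: str) -> int:
    """Additive coding: number of risk-allele copies (0, 1 or 2)."""
    return genotype.count(risk_allele)


def implied_hazard_ratio(per_allele_hr: float, genotype: str, risk_allele: str = "C") -> float:
    """Hazard ratio relative to the homozygous reference genotype under a log-additive model."""
    return per_allele_hr ** risk_allele_count(genotype, risk_allele)


for genotype in ("AA", "AC", "CC"):
    print(genotype, round(implied_hazard_ratio(2.1, genotype), 2))
# AA 1.0
# AC 2.1
# CC 4.41
```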
Subjects
Angiogenesis Inhibitors/therapeutic use; Antibodies, Monoclonal, Humanized/therapeutic use; Carcinoma, Renal Cell/drug therapy; Kidney Neoplasms/drug therapy; Pancreatic Neoplasms/drug therapy; Polymorphism, Single Nucleotide; Vascular Endothelial Growth Factor A/antagonists & inhibitors; Vascular Endothelial Growth Factor Receptor-1/genetics; Adult; Aged; Base Sequence; Bevacizumab; Biomarkers; Clinical Trials, Phase III as Topic; Female; Humans; Male; Middle Aged; Molecular Sequence Data; Proportional Hazards Models; Randomized Controlled Trials as Topic; Treatment Outcome; Vascular Endothelial Growth Factor A/physiology
ABSTRACT
OBJECTIVE: CD4+ T cells are implicated in rheumatoid arthritis (RA) pathology because of the strong association between RA and certain HLA class II gene variants. This study was undertaken to examine the synovial T cell receptor (TCR) repertoire, T cell phenotypes, and T cell specificities in small joints of RA patients at the time of diagnosis, before therapeutic intervention. METHODS: Sixteen patients, of whom 11 were anti-citrullinated protein antibody (ACPA)-positive and 5 were ACPA-negative, underwent ultrasound-guided synovial biopsy of a small joint (n = 13) or arthroscopic synovial biopsy of a large joint (n = 3), followed by direct sorting of single T cells for paired sequencing of the αβ TCR together with flow cytometry analysis. TCRs from expanded CD4+ T cell clones of 4 patients carrying an HLA-DRB1*04:01 allele were artificially re-expressed to study antigen specificity. RESULTS: T cell analysis demonstrated CD4+ dominance and the presence of peripheral helper T-like cells in both patient groups. We identified >4,000 unique TCR sequences, as well as 225 clonal expansions. Additionally, T cells with double α-chains were a recurring feature. We identified biased usage of the Vβ segment TRBV20-1 in CD4+ cells from ACPA+ patients. In vitro stimulation of T cell lines expressing selected TCRs with an extensive panel of citrullinated and viral peptides identified several different virus-specific TCRs (e.g., human cytomegalovirus and human herpesvirus 2). Still, the majority of clones remained orphans with unknown specificity. CONCLUSION: Minimally invasive biopsies of the RA synovium allow for single-cell TCR sequencing and phenotyping. Clonally expanded, viral-reactive T cells account for part of the diverse CD4+ T cell repertoire. TRBV20-1 bias in ACPA+ patients suggests recognition of common antigens.
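The clonal-expansion and gene-usage counts reported above reduce to simple tallies over single-cell TCR calls. The sketch below assumes that each cell is summarized by one paired (α CDR3, β CDR3) amino-acid sequence. It treats identical pairs as one clonotype and ignores cells with double α-chains. The sequences are placeholders, and in practice clonotype definitions may also include V and J gene assignments.

```python
from collections import Counter


def expanded_clonotypes(cells: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Clonotypes (paired alpha/beta CDR3) observed in more than one cell."""
    counts = Counter(cells)
    return {clone: n for clone, n in counts.items() if n > 1}


def v_gene_fraction(v_genes: list[str], gene: str) -> float:
    """Fraction of cells using a given V segment, e.g. TRBV20-1."""
    return v_genes.count(gene) / len(v_genes) if v_genes else 0.0


cells = [("CAVSDNYQLIW", "CASSLGQETQYF")] * 3 + [("CAASGGSYIPTF", "CSARDRGNTEAFF")]
print(expanded_clonotypes(cells))
print(v_gene_fraction(["TRBV20-1", "TRBV5-1", "TRBV20-1", "TRBV7-9"], "TRBV20-1"))  # 0.5
```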
Subjects
Arthritis, Rheumatoid; Humans; Synovial Membrane/pathology; CD4-Positive T-Lymphocytes; Receptors, Antigen, T-Cell/genetics; HLA-DRB1 Chains/genetics
ABSTRACT
We previously showed the existence of selective pressure against protein aggregation through the enrichment of aggregation-opposing 'gatekeeper' residues at strategic places along the sequence of proteins. Here we analyzed the relationship between protein lifetime and protein aggregation by combining experimentally determined turnover rates, expression data, structural data and chaperone interaction data for a set of more than 500 proteins. We find that selective pressure on protein sequences against aggregation is not homogeneous: short-living proteins on average have a higher aggregation propensity and fewer chaperone interactions than long-living proteins. We also find that short-living proteins are more often associated with deposition diseases. These findings suggest that the efficient degradation of high-turnover proteins is sufficient to preclude aggregation, but also that factors that inhibit proteasomal activity, such as physiological ageing, will primarily affect the aggregation of short-living proteins.
Subjects
Computational Biology/methods; Proteins/chemistry; Proteins/metabolism; Databases, Protein; Disease Susceptibility; Evolution, Molecular; Gene Expression Profiling; Humans; Membrane Proteins; Protein Array Analysis; Protein Stability; Proteins/genetics; Statistics, Nonparametric; Thermodynamics; Time Factors
ABSTRACT
Although protein-peptide interactions are estimated to constitute up to 40% of all protein interactions, relatively little is known about their structural details. Peptide-mediated interactions are a prime target for drug design because they are predominantly present in signaling and regulatory networks. A reliable data set of nonredundant protein-peptide complexes is indispensable as a basis for modeling and design, but current data sets of protein-peptide interactions are often biased towards specific types of interactions or are limited to interactions with small ligands. In PepX (http://pepx.switchlab.org), we have designed an unbiased and exhaustive data set of all protein-peptide complexes available in the Protein Data Bank with peptide lengths of up to 35 residues. In addition, these complexes have been clustered based on their binding interfaces rather than on sequence homology, providing a set of structurally diverse protein-peptide interactions. The final data set contains 505 unique protein-peptide interface clusters from 1431 complexes. Thorough annotation of each complex with both biological and structural information facilitates searching for and browsing through individual complexes and clusters. Moreover, we provide an additional source of data for peptide design by annotating peptides with naturally occurring backbone variations using fragment clusters from the BriX database.
Subjects
Computational Biology/methods , Protein Databases , Protein Interaction Mapping/methods , Animals , Computational Biology/trends , Humans , Information Storage and Retrieval/methods , Internet , Ligands , Peptides/chemistry , Tertiary Protein Structure , Proteins/chemistry , Signal Transduction , Software
ABSTRACT
BACKGROUND: Multiplexing samples in single-cell RNA-seq studies significantly reduces experimental costs, enables straightforward identification of doublets, increases cell throughput, and reduces sample-specific batch effects. Recently published multiplexing techniques use oligo-conjugated antibodies or lipids to barcode the cells of each sample, a process called "hashing." RESULTS: Here, we compare the hashing performance of TotalSeq-A and -C antibodies, custom-synthesized lipids and MULTI-seq lipid hashes in four cell lines, for both single-cell RNA-seq and single-nucleus RNA-seq. We also compare TotalSeq-B antibodies with CellPlex reagents (10x Genomics) on human PBMCs, and TotalSeq-B with different lipids on primary mouse tissues. Hashing efficiency was evaluated using the intrinsic genetic variation of the cell lines and mouse strains. Antibody hashing was further evaluated on clinical samples using PBMCs from healthy donors and SARS-CoV-2-infected patients, where we demonstrate a more affordable approach for large clinical single-cell sequencing studies that simultaneously reduces batch effects. CONCLUSIONS: Benchmarking of different hashing strategies and computational pipelines indicates that correct demultiplexing can be achieved with both lipid- and antibody-hashed human cells and nuclei, with MULTISeqDemux as the preferred demultiplexing function and antibody-based hashing as the most efficient protocol on cells. On nuclei datasets, lipid hashing delivers the best results. Lipid hashing also outperforms antibodies on cells isolated from mouse brain, whereas antibodies perform better on tissues such as spleen or lung.
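Demultiplexing turns each cell's hashtag counts into a sample call, and a cell carrying the hashtags of two samples is flagged as a doublet. The Python sketch below uses a deliberately simple, hypothetical rule (a fixed minimum read total and a fixed minimum per-hashtag share, both arbitrary defaults); it is not the MULTISeqDemux function named above, which instead chooses thresholds from the data.

```python
def demultiplex(
    counts: dict[str, list[int]],
    hashtags: list[str],
    min_total: int = 10,
    min_share: float = 0.25,
) -> dict[str, str]:
    """Assign each cell to a sample hashtag, 'Doublet' or 'Negative'.

    Hypothetical fixed-threshold rule for illustration only.
    """
    calls: dict[str, str] = {}
    for cell, row in counts.items():
        if len(row) != len(hashtags):
            raise ValueError(f"{cell}: expected {len(hashtags)} counts, got {len(row)}")
        total = sum(row)
        if total < min_total:
            # Too few hashtag reads to call a sample.
            calls[cell] = "Negative"
            continue
        # A hashtag is "positive" if it holds at least min_share of the cell's hashtag reads.
        positive = [tag for tag, n in zip(hashtags, row) if n / total >= min_share]
        if len(positive) == 1:
            calls[cell] = positive[0]
        elif len(positive) > 1:
            # Barcodes from two or more samples in one droplet.
            calls[cell] = "Doublet"
        else:
            calls[cell] = "Negative"
    return calls


# Toy hashtag counts per cell (made up).
example = {"c1": [100, 2, 1], "c2": [50, 45, 1], "c3": [0, 3, 2]}
print(demultiplex(example, ["HTO1", "HTO2", "HTO3"]))
```

A fixed share threshold is easy to reason about but ignores differences in background staining between hashtags, which is one reason data-driven thresholding functions are usually preferred in practice.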
Subjects
COVID-19/blood , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Animals , Antibodies/chemistry , Case-Control Studies , Tumor Cell Line , Cell Nucleus/chemistry , Humans , Lipids/chemistry , Inbred BALB C Mice , Inbred C57BL Mice , Neutrophils/chemistry , Neutrophils/immunology , Neutrophils/virology
ABSTRACT
INTRODUCTION: Several common genetic variants conferring breast cancer susceptibility have recently been identified. We aimed to determine how these variants combine with a subset of other known risk factors to influence breast cancer risk in white women of European ancestry, using case-control studies participating in the Breast Cancer Association Consortium. METHODS: We evaluated two-way interactions between each of five risk factors (age at menarche, ever having had a live birth, number of live births, age at first birth and body mass index (BMI)) and each of 12 single nucleotide polymorphisms (SNPs) (10q26-rs2981582 (FGFR2), 8q24-rs13281615, 11p15-rs3817198 (LSP1), 5q11-rs889312 (MAP3K1), 16q12-rs3803662 (TOX3), 2q35-rs13387042, 5p12-rs10941679 (MRPS30), 17q23-rs6504950 (COX11), 3p24-rs4973768 (SLC4A7), CASP8-rs17468277, TGFB1-rs1982073 and ESR1-rs3020314). Interactions were tested by fitting logistic regression models that included per-allele main effects for SNPs and linear trend main effects for risk factors, together with single-parameter interaction terms for linear departure from independent multiplicative effects. RESULTS: These analyses were applied to data for up to 26,349 invasive breast cancer cases and up to 32,208 controls from 21 case-control studies. No statistical evidence of interaction was observed beyond that expected by chance. Analyses repeated using data from 11 population-based studies gave very similar results. CONCLUSIONS: The relative risks for breast cancer associated with the common susceptibility variants identified to date do not appear to vary across women with different reproductive histories or BMI. The assumption of multiplicative combined effects for these established genetic and other risk factors in risk prediction models appears justified.
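The interaction test described in the METHODS can be written as one logistic model per SNP and risk factor pair. This is a sketch under stated assumptions: the symbols are introduced here for illustration, g_i is the number (0, 1 or 2) of risk alleles carried by woman i, x_i is the risk factor coded for a linear trend, Y_i = 1 for a case, and any adjustment terms (for example for study) are omitted.

```latex
\begin{aligned}
\operatorname{logit}\Pr(Y_i = 1) &= \beta_0 + \beta_G\, g_i + \beta_X\, x_i + \beta_{GX}\, g_i x_i \\
\mathrm{OR}(g, x) &= \exp\left(\beta_G\, g + \beta_X\, x + \beta_{GX}\, g x\right) \\
H_0 &: \beta_{GX} = 0
\end{aligned}
```

Here OR(g, x) is the odds ratio relative to g = 0, x = 0. Under H_0 it factorizes into exp(β_G g) · exp(β_X x), i.e. independent multiplicative effects; a nonzero β_GX is the single-parameter linear departure from that model, tested with one degree of freedom.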