ABSTRACT
MOTIVATION: Computational identification of copy number variants (CNVs) in sequencing data is a challenging task. Existing CNV-detection methods account for various sources of variation and apply different normalization strategies. However, their applicability and predictions are restricted to specific enrichment protocols. Here, we introduce a novel tool named varAmpliCNV, specifically designed for CNV detection in amplicon-based targeted resequencing data (Haloplex™ enrichment protocol) in the absence of matched controls. varAmpliCNV uses principal component analysis (PCA) and/or metric multidimensional scaling (MDS) to control the variance of amplicon-associated read counts, enabling effective detection of CNV signals. RESULTS: The performance of varAmpliCNV was compared against three existing methods (ConVaDING, ONCOCNV and DECoN) on data of 167 samples run with an aortic aneurysm gene panel (n = 30), including 9 positive control samples. Additionally, we validated the performance on a large deafness gene panel (n = 145) run on 138 samples, containing 4 positive controls. varAmpliCNV achieved higher sensitivity (100%) and specificity (99.78%) in comparison to competing methods. In addition, unsupervised clustering of CNV segments and visualization plots of amplicons spanning these regions are included as a downstream strategy to filter out false positives. AVAILABILITY AND IMPLEMENTATION: The tool is freely available through the Galaxy ToolShed and at: https://hub.docker.com/r/cmgantwerpen/varamplicnv. Supplementary Data File S1: https://tinyurl.com/2yzswyhh; Supplementary Data File S2: https://tinyurl.com/ycyf2fb4. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
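The idea of using PCA to control read-count variance can be illustrated with a minimal sketch. This is not the varAmpliCNV implementation: the function name `remove_top_components`, the depth normalization, the pseudocount and the toy data are all assumptions made for illustration. The sketch removes the largest-variance components from a samples-by-amplicons matrix of log2 coverage, on the assumption that those components capture technical variation shared across samples rather than copy-number signal.

```python
import numpy as np

def remove_top_components(counts, n_remove=1):
    """Return residual log2 coverage after removing the top principal components.

    counts: 2-D array, rows = samples, columns = amplicons (read counts).
    """
    counts = np.asarray(counts, dtype=float)
    # Scale each sample by its total depth; the pseudocount avoids log2(0).
    depth = counts.sum(axis=1, keepdims=True)
    log_cov = np.log2((counts + 1.0) / depth)
    # Center each amplicon across samples so PCA sees deviations only.
    centered = log_cov - log_cov.mean(axis=0, keepdims=True)
    # The SVD of the centered matrix yields the principal components.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    s_kept = s.copy()
    s_kept[:n_remove] = 0.0  # drop the largest-variance components
    return (u * s_kept) @ vt

# Hypothetical toy data: 12 samples x 40 amplicons with one shared batch effect.
rng = np.random.default_rng(0)
efficiency = rng.uniform(50, 500, size=40)    # per-amplicon capture efficiency
batch = rng.normal(0.0, 0.5, size=(12, 1))    # per-sample batch strength
loading = rng.normal(0.0, 1.0, size=(1, 40))  # per-amplicon batch sensitivity
toy_counts = efficiency * 2.0 ** (batch * loading) * 10.0
toy_counts[0, 10:15] *= 0.5                   # simulated heterozygous deletion
residual = remove_top_components(toy_counts, n_remove=1)
```

In a sketch like this, candidate CNVs would be sought as runs of adjacent amplicons whose residuals deviate consistently in one sample. The number of components to remove is a tuning choice that the abstract does not specify.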
Subjects
Algorithms; DNA Copy Number Variations; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods
ABSTRACT
BACKGROUND: Although carpal tunnel syndrome (CTS) is the most common form of peripheral entrapment neuropathy, its pathogenesis remains largely unknown. An estimated heritability index of 0.46 and an increased familial occurrence indicate that genetic factors must play a role in the pathogenesis. METHODS AND RESULTS: We report on a family in which CTS occurred in subsequent generations at an unusually young age. Additional clinical features included brachydactyly and short Achilles tendons resulting in toe walking in childhood. Using exome sequencing, we identified a heterozygous variant (c.5009T>G; p.Phe1670Cys) in the fibrillin-2 (FBN2) gene that co-segregated with the phenotype in the family. Functional assays showed that the missense variant impaired integrin-mediated cell adhesion and migration. Moreover, we observed increased transforming growth factor-β signalling and fibrosis in the carpal tissues of affected individuals. A variant burden test in a large cohort of patients with CTS revealed a significantly increased frequency of rare (6.7% vs 2.5%-3.4%, p<0.001) and high-impact (6.9% vs 2.7%, p<0.001) FBN2 variants in patient alleles compared with controls. CONCLUSION: The identification of a novel FBN2 variant (p.Phe1670Cys) in a unique family with early onset CTS, together with the observed increased frequency of rare and high-impact FBN2 variants in patients with sporadic CTS, strongly suggests a role of FBN2 in the pathogenesis of CTS.
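A burden comparison of this kind can be sketched as a test on a 2x2 table of allele counts. The abstract does not say which statistic was used; the sketch below assumes a one-sided Fisher's exact test, and the allele counts are hypothetical values chosen only to resemble the reported frequencies. The function name `fisher_exact_greater` is likewise an illustrative choice.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: cases, controls. Columns: variant alleles, non-variant alleles.
    Returns P(X >= a) under the hypergeometric null of no enrichment.
    """
    cases = a + b
    variant = a + c
    total = a + b + c + d
    upper = min(cases, variant)
    tail = sum(
        comb(variant, k) * comb(total - variant, cases - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, cases)

# Hypothetical counts: 67 of 1000 case alleles vs 120 of 4000 control alleles.
p_value = fisher_exact_greater(67, 933, 120, 3880)
print(f"one-sided p = {p_value:.3g}")
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow in the tail sum, at the cost of speed for very large cohorts.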
Subjects
Carpal Tunnel Syndrome/genetics; Fibrillin-2/genetics; Achilles Tendon/abnormalities; Body Height/genetics; Carpal Tunnel Syndrome/diagnostic imaging; Carpal Tunnel Syndrome/etiology; Humans; Male; Mutation, Missense; Pedigree
ABSTRACT
Motivation: Computational gene prioritization can aid in disease gene identification. Here, we propose pBRIT (prioritization using Bayesian Ridge regression and Information Theoretic model), a novel adaptive and scalable prioritization tool, integrating Pubmed abstracts, Gene Ontology, Sequence similarities, Mammalian and Human Phenotype Ontology, Pathway, Interactions, Disease Ontology, Gene Association database and Human Genome Epidemiology database, into the prediction model. We explore and address effects of sparsity and inter-feature dependencies within annotation sources, and the impact of bias towards specific annotations. Results: pBRIT models feature dependencies and sparsity by an Information-Theoretic (data driven) approach and applies intermediate integration based data fusion. Following the hypothesis that genes underlying similar diseases will share functional and phenotype characteristics, it incorporates Bayesian Ridge regression to learn a linear mapping between functional and phenotype annotations. Genes are prioritized on phenotypic concordance to the training genes. We evaluated pBRIT against nine existing methods, and on over 2000 HPO-gene associations retrieved after construction of pBRIT data sources. We achieve maximum AUC scores ranging from 0.92 to 0.96 against benchmark datasets and of 0.80 against the time-stamped HPO entries, indicating good performance with high sensitivity and specificity. Our model shows stable performance with regard to changes in the underlying annotation data, is fast and scalable for implementation in routine pipelines. Availability and implementation: http://biomina.be/apps/pbrit/; https://bitbucket.org/medgenua/pbrit. Supplementary information: Supplementary data are available at Bioinformatics online.
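The AUC figures quoted for gene prioritization can be read as the probability that a true disease gene is ranked above a non-disease gene. The sketch below computes that quantity directly from scores and labels. It is not pBRIT code; the function name `ranking_auc` and the toy scores are assumptions for illustration.

```python
def ranking_auc(scores, labels):
    """Area under the ROC curve from prioritization scores.

    scores: higher means the gene is ranked as more likely causal.
    labels: truthy for known disease genes, falsy otherwise.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical scores for four candidate genes, two of them true disease genes.
toy_auc = ranking_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of genes; a rank-based formulation would be preferable for genome-wide candidate lists.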
Subjects
Biological Ontologies; Computational Biology/methods; Information Storage and Retrieval/methods; Phenotype; Software; Animals; Bayes Theorem; Genomics/methods; Humans; Sequence Analysis, DNA/methods
ABSTRACT
BACKGROUND: Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated on a daily basis. However, the popular existing methods use outdated versions of GO. Moreover, these tools are slow to process large datasets consisting of more than 20,000 genes. RESULTS: We have developed GOParGenPy, a platform independent software tool to generate the binary data matrix showing the GO class membership, including parental classes, of a set of GO annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis and it can handle larger datasets than the existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases. CONCLUSIONS: GOParGenPy is an easy to use software tool which can generate sparse or full binary matrices from GO annotated gene sets. The obtained binary matrix can then be used with any analysis environment and with any analysis methods.
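The binary class-membership matrix described above, including parental classes, can be sketched in a few lines. This is an illustration of the data structure only, not GOParGenPy's code: the function names, the dictionary-based inputs and the toy term identifiers are assumptions, and a real run would read the ontology and annotation files instead.

```python
def ancestors(term, parents, cache):
    """All direct and indirect parents of `term` in the ontology DAG."""
    if term not in cache:
        found = set()
        for parent in parents.get(term, ()):
            found.add(parent)
            found |= ancestors(parent, parents, cache)
        cache[term] = found
    return cache[term]

def membership_matrix(annotations, parents):
    """Build a genes x classes 0/1 matrix, propagating annotations to parents."""
    cache = {}
    expanded = {}
    for gene, terms in annotations.items():
        classes = set(terms)
        for term in terms:
            classes |= ancestors(term, parents, cache)
        expanded[gene] = classes
    genes = sorted(expanded)
    columns = sorted(set().union(*expanded.values()))
    matrix = [[int(c in expanded[g]) for c in columns] for g in genes]
    return genes, columns, matrix

# Hypothetical toy ontology (child -> parents) and direct annotations.
toy_parents = {"leaf_a": ["mid"], "leaf_b": ["mid", "root"], "mid": ["root"]}
toy_annotations = {"gene1": ["leaf_a"], "gene2": ["leaf_b"], "gene3": ["root"]}
genes, columns, matrix = membership_matrix(toy_annotations, toy_parents)
```

A dense list of lists is used here for clarity. For gene sets of the size mentioned in the abstract, a sparse representation would be the more practical choice.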
Subjects
Gene Ontology; Genes; Molecular Sequence Annotation/methods; Proteins/genetics; Software; Artificial Intelligence; Molecular Sequence Annotation/classification; Proteins/chemistry; Proteins/classification; Search Engine/methods; Software/classification; Vocabulary, Controlled
ABSTRACT
On average, a human cell type expresses around 10,000 different protein-coding genes, which give rise to all the different molecular forms of the protein product (proteoforms) found in a cell. In a typical shotgun bottom-up proteomic approach, the proteins are enzymatically cleaved, producing several hundred thousand different peptides that are analyzed with liquid chromatography-tandem mass spectrometry (LC-MS/MS). One of the major consequences of this high sample complexity is that coelution of peptides cannot be avoided. Moreover, low-abundance peptides are difficult to identify, as they have a lower chance of being selected for fragmentation due to ion-suppression effects and the semi-stochastic nature of precursor selection in data-dependent shotgun proteomic analysis, where peptides are selected for fragmentation analysis one by one as they elute from the column. In the current study we explore a simple novel approach that has the potential to counter some of the effects of peptide coelution and to improve the number of peptide identifications in a bottom-up proteomic analysis. In this method, peptides from a HeLa cell digest were eluted from the reversed-phase column using three different elution solvents (acetonitrile, methanol and acetone) in three replicate reversed-phase LC-MS/MS shotgun proteomic analyses. Results were compared with three technical replicates using the same solvent, which is common practice in proteomic analysis. In total, we see an increase of up to 10% in unique protein identifications and up to 30% in unique peptide identifications from the combined analysis using different elution solvents when compared to the combined identifications from the three replicates of the same solvent. In addition, the overlap of unique peptide identifications common to all three LC-MS analyses in our approach is only 23%, compared to 50% in the replicates using the same solvent.
The approach presented here thus offers an easy-to-implement way to significantly reduce the effects of coelution and ion suppression of peptides and to improve protein coverage in shotgun proteomics. Data are available via ProteomeXchange with identifier PXD011908.
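The comparison of combined identifications and their overlap across three runs amounts to simple set arithmetic, sketched below. The abstract does not define how the overlap percentage was calculated; this sketch assumes it is the number of peptides found in every run divided by the number found in any run. The function name and the toy peptide labels are illustrative only.

```python
def identification_overlap(runs):
    """Summarize peptide identifications across LC-MS/MS runs.

    runs: iterable of sets of peptide identifiers, one set per run.
    Returns (combined, shared_fraction): the number of distinct peptides
    seen in any run, and the share of those seen in every run.
    """
    runs = [set(run) for run in runs]
    if not runs:
        raise ValueError("at least one run is required")
    combined = set().union(*runs)
    shared = set.intersection(*runs)
    fraction = len(shared) / len(combined) if combined else 0.0
    return len(combined), fraction

# Hypothetical identifications from three runs with different elution solvents.
toy_runs = [{"A", "B", "C"}, {"B", "C", "D"}, {"C", "E"}]
combined, shared_fraction = identification_overlap(toy_runs)
```

Under this definition, a lower shared fraction with a larger combined count indicates that the runs are complementary rather than redundant.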
Subjects
Chromatography, Liquid/methods; Proteome/chemistry; Proteomics/methods; Tandem Mass Spectrometry/methods; HeLa Cells; Humans; Peptides/chemistry
ABSTRACT
Bicuspid aortic valve (BAV) is a common congenital heart defect (population incidence, 1-2%) [1-3] that frequently presents with ascending aortic aneurysm (AscAA) [4]. BAV/AscAA shows autosomal dominant inheritance with incomplete penetrance and male predominance. Causative gene mutations (for example, NOTCH1, SMAD6) are known for ≤1% of nonsyndromic BAV cases with and without AscAA [5-8], impeding mechanistic insight and development of therapeutic strategies. Here, we report the identification of variants in ROBO4 (which encodes a factor known to contribute to endothelial performance) that segregate with disease in two families. Targeted sequencing of ROBO4 showed enrichment for rare variants in BAV/AscAA probands compared with controls. Targeted silencing of ROBO4 or mutant ROBO4 expression in endothelial cell lines results in impaired barrier function and a synthetic repertoire suggestive of endothelial-to-mesenchymal transition. This is consistent with BAV/AscAA-associated findings in patients and in animal models deficient for ROBO4. These data identify a novel endothelial etiology for this common human disease phenotype.
Subjects
Aortic Aneurysm, Thoracic/genetics; Aortic Valve/abnormalities; Heart Valve Diseases/genetics; Mutation/genetics; Receptors, Cell Surface/genetics; Animals; Bicuspid Aortic Valve Disease; Cells, Cultured; Disease Models, Animal; Endothelial Cells/physiology; Female; Humans; Male; Mice; Mice, Inbred C57BL; Mice, Knockout; Phenotype; Zebrafish
ABSTRACT
Intellectual disability (ID) affects approximately 1-2% of the general population and is characterized by impaired cognitive abilities. ID is both clinically and genetically heterogeneous: up to 2000 genes are estimated to be involved in the emergence of the disease, with various clinical presentations. For many genes, only a few patients have been reported, and the causality of some genes has been questioned upon the discovery of apparent loss-of-function mutations in healthy controls. Description of additional patients strengthens the evidence for the involvement of a gene in the disease and can clarify the clinical phenotype associated with mutations in a particular gene. Here, we present two large four-generation families with a total of 11 males affected with ID caused by mutations in ZNF711, thereby expanding the total number of families with ID and a ZNF711 mutation to four. Patients with mutations in ZNF711 all present with mild to moderate ID and poor speech, accompanied in some patients by additional features, including autistic features and mild facial dysmorphisms, suggesting that ZNF711 mutations cause non-syndromic ID.
Subjects
Articulation Disorders/genetics; Autism Spectrum Disorder/genetics; DNA-Binding Proteins/genetics; Genes, X-Linked; Genetic Predisposition to Disease; Intellectual Disability/genetics; Mutation; Adolescent; Adult; Articulation Disorders/diagnosis; Articulation Disorders/physiopathology; Autism Spectrum Disorder/diagnosis; Autism Spectrum Disorder/physiopathology; Base Sequence; Child; Exome; Female; Gene Expression; Genome-Wide Association Study; Humans; Intellectual Disability/diagnosis; Intellectual Disability/physiopathology; Male; Middle Aged; Pedigree; Phenotype; Sequence Analysis, DNA; Severity of Illness Index
ABSTRACT
Taphrina deformans is a fungus responsible for peach leaf curl, an important plant disease. It is phylogenetically assigned to the Taphrinomycotina subphylum, which includes the fission yeast and the mammalian pathogens of the genus Pneumocystis. We describe here the genome of T. deformans in the light of its dual plant-saprophytic/plant-parasitic lifestyle. The 13.3-Mb genome contains few identifiable repeated elements (ca. 1.5%) and a relatively high GC content (49.5%). A total of 5,735 protein-coding genes were identified, among which 83% share similarities with other fungi. Adaptation to the plant host seems reflected in the genome, since the genome carries genes involved in plant cell wall degradation (e.g., cellulases and cutinases), secondary metabolism, the hallmark glyoxylate cycle, detoxification, and sterol biosynthesis, as well as genes involved in the biosynthesis of plant hormones. Genes involved in lipid metabolism may play a role in its virulence. Several locus candidates for putative MAT cassettes and sex-related genes akin to those of Schizosaccharomyces pombe were identified. A mating-type-switching mechanism similar to that found in ascomycetous yeasts could be in effect. Taken together, the findings are consistent with the alternate saprophytic and parasitic-pathogenic lifestyles of T. deformans. IMPORTANCE: Peach leaf curl is an important plant disease which causes significant losses of fruit production. We report here the genome sequence of the causative agent of the disease, the fungus Taphrina deformans. The genome carries characteristic genes that are important for the plant infection process.
These include (i) proteases that allow degradation of the plant tissues; (ii) secondary metabolites which are products favoring interaction of the fungus with the environment, including the host; (iii) hormones that are responsible for the symptom of severely distorted leaves on the host; and (iv) drug detoxification enzymes that confer resistance to fungicides. The availability of the genome allows the design of new drug targets as well as the elaboration of specific management strategies to fight the disease.