Results 1 - 20 of 143
1.
Cell ; 158(1): 213-25, 2014 Jul 03.
Article in English | MEDLINE | ID: mdl-24995987

ABSTRACT

The availability of diverse genomes makes it possible to predict gene function based on shared evolutionary history. This approach can be challenging, however, for pathways whose components do not exhibit a shared history but rather consist of distinct "evolutionary modules." We introduce a computational algorithm, clustering by inferred models of evolution (CLIME), which takes as input a eukaryotic species tree, a homology matrix, and a pathway (gene set) of interest. CLIME partitions the gene set into disjoint evolutionary modules, simultaneously learning the number of modules and a tree-based evolutionary history that defines each module. CLIME then expands each module by scanning the genome for new components that likely arose under the inferred evolutionary model. Application of CLIME to ∼1,000 annotated human pathways and to the proteomes of yeast, red algae, and malaria reveals unanticipated evolutionary modularity and coevolving components. CLIME is freely available and should become increasingly powerful with the growing wealth of eukaryotic genomes.


Subject(s)
Algorithms , Cluster Analysis , Phylogeny , Humans , Mitochondria/metabolism , Plasmodium falciparum/genetics , Plasmodium falciparum/metabolism , Proteome/analysis , Rhodophyta/genetics , Rhodophyta/metabolism , Signal Transduction , Yeasts/genetics , Yeasts/metabolism
2.
PLoS Comput Biol ; 20(4): e1011995, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38656999

ABSTRACT

Genomes contain conserved non-coding sequences that perform important biological functions, such as gene regulation. We present a phylogenetic method, PhyloAcc-C, that associates nucleotide substitution rates with changes in a continuous trait of interest. The method takes as input a multiple sequence alignment of conserved elements, continuous trait data observed in extant species, and a background phylogeny and substitution process. Gibbs sampling is used to assign rate categories (background, conserved, accelerated) to lineages and to explore whether the assigned rate categories are associated with increases or decreases in the rate of trait evolution. We test our method using simulations and then illustrate its application using mammalian body size and lifespan data previously analyzed with respect to protein-coding genes. Consistent with other studies, we find that processes such as tumor suppression, telomere maintenance, and p53 regulation are related to changes in longevity and body size. We also find that skeletal genes and developmental processes, such as sprouting angiogenesis, are relevant.


Subject(s)
Evolution, Molecular , Models, Genetic , Phylogeny , Animals , Longevity/genetics , Humans , Computational Biology/methods , Computer Simulation , Body Size/genetics , Nucleotides/genetics , Sequence Alignment/methods
3.
Mol Biol Evol ; 40(9)2023 09 01.
Article in English | MEDLINE | ID: mdl-37665177

ABSTRACT

An important goal of evolutionary genomics is to identify genomic regions whose substitution rates differ among lineages. For example, genomic regions experiencing accelerated molecular evolution in some lineages may provide insight into links between genotype and phenotype. Several comparative genomics methods have been developed to identify genomic accelerations between species, including a Bayesian method called PhyloAcc, which models shifts in substitution rate in multiple target lineages on a phylogeny. However, few methods consider the possibility of discordance between the trees of individual loci and the species tree due to incomplete lineage sorting, which might cause false positives. Here, we present PhyloAcc-GT, which extends PhyloAcc by modeling gene tree heterogeneity. Given a species tree, we adopt the multispecies coalescent model as the prior distribution of gene trees, use Markov chain Monte Carlo (MCMC) for inference, and design novel MCMC moves to sample gene trees efficiently. Through extensive simulations, we show that PhyloAcc-GT outperforms PhyloAcc and other methods in identifying target lineage-specific accelerations and detecting complex patterns of rate shifts, and is robust to specification of population size parameters. PhyloAcc-GT is usually more conservative than PhyloAcc in calling convergent rate shifts because it identifies more accelerations on ancestral than on terminal branches. We apply PhyloAcc-GT to two examples of convergent evolution: flightlessness in ratites and marine mammal adaptations, and show that PhyloAcc-GT is a robust tool to identify shifts in substitution rate associated with specific target lineages while accounting for incomplete lineage sorting.


Subject(s)
Biological Evolution , Models, Genetic , Animals , Bayes Theorem , Phylogeny , Genomics , Mammals
4.
Proc Natl Acad Sci U S A ; 118(13)2021 03 30.
Article in English | MEDLINE | ID: mdl-33766915

ABSTRACT

Microglial-derived inflammation has been linked to a broad range of neurodegenerative and neuropsychiatric conditions, including amyotrophic lateral sclerosis (ALS). Using single-cell RNA sequencing, researchers have characterized a class of Disease-Associated Microglia (DAMs) in neurodegeneration. However, the DAM phenotype alone is insufficient to explain the functional complexity of microglia, particularly with regard to regulating the inflammation that is a hallmark of many neurodegenerative diseases. Here, we identify a subclass of microglia in mouse models of ALS which we term RIPK1-Regulated Inflammatory Microglia (RRIMs). RRIMs show significant up-regulation of classical proinflammatory pathways, including increased levels of Tnf and Il1b RNA and protein. We find that RRIMs are highly regulated by TNFα signaling and that the prevalence of these microglia can be suppressed by inhibiting receptor-interacting protein kinase 1 (RIPK1) activity downstream of TNF receptor 1. These findings help to elucidate a mechanism by which RIPK1 kinase inhibition has been shown to provide therapeutic benefit in mouse models of ALS and may provide an additional biomarker for analysis in ongoing phase 2 clinical trials of RIPK1 inhibitors in ALS.


Subject(s)
Amyotrophic Lateral Sclerosis/enzymology , Inflammation/enzymology , Microglia/enzymology , Receptor-Interacting Protein Serine-Threonine Kinases/metabolism , Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/pathology , Animals , Cell Cycle Proteins/genetics , Disease Models, Animal , Interleukin-1beta/metabolism , Membrane Transport Proteins/genetics , Mice , Mice, Mutant Strains , Microglia/pathology , Receptor-Interacting Protein Serine-Threonine Kinases/antagonists & inhibitors , Receptor-Interacting Protein Serine-Threonine Kinases/genetics , Single-Cell Analysis , Superoxide Dismutase-1/genetics , Transcriptome , Tumor Necrosis Factor-alpha/metabolism
5.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-34020537

ABSTRACT

Deciphering microRNA (miRNA) targets is important for understanding the function of miRNAs as well as for miRNA-based diagnostics and therapeutics. Given the highly cell-specific nature of miRNA regulation, recent computational approaches typically exploit expression data to identify the most physiologically relevant target messenger RNAs (mRNAs). Although effective, those methods usually require a large sample size to infer miRNA-mRNA interactions, thus limiting their applications in personalized medicine. In this study, we developed a novel miRNA target prediction algorithm called miRACLe (miRNA Analysis by a Contact modeL). It integrates sequence characteristics and RNA expression profiles into a random contact model, and determines target preferences by the relative probability of effective contacts in an individual-specific manner. Evaluation by a variety of measures shows that fitting TargetScan, a frequently used prediction tool, into the framework of miRACLe can improve its predictive power by a significant margin and consistently outperform other state-of-the-art methods in prediction accuracy, regulatory potential and biological relevance. Notably, the superiority of miRACLe is robust to various biological contexts, types of expression data and validation datasets, and the computation is fast and efficient. Additionally, we show that the model can be readily applied to other sequence-based algorithms, such as DIANA-microT-CDS, miRanda-mirSVR and MirTarget4, to improve their predictive power. MiRACLe is publicly available at https://github.com/PANWANG2014/miRACLe.
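A minimal sketch of the "random contact" idea, under loudly stated assumptions: the weight of an effective contact between a miRNA and an mRNA in one sample is taken to be the product of the two expression levels and a sequence-based site score (for example, a TargetScan-style score), and target preference is that weight divided by the total over all candidate pairs. The function name `contact_probabilities` and this exact normalisation are hypothetical illustrations, not miRACLe's published model or code.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (miRNA, mRNA)


def contact_probabilities(
    mirna_expr: Dict[str, float],
    mrna_expr: Dict[str, float],
    site_score: Dict[Pair, float],
) -> Dict[Pair, float]:
    """Relative probability of an effective contact for each candidate pair in one sample.

    Assumption (illustrative, not miRACLe's published formula): a pair's contact weight is
    miRNA expression * mRNA expression * sequence-based site score, and the weights are
    normalised over all candidate pairs so that they sum to 1.
    """
    weights = {
        (mir, gene): mirna_expr[mir] * mrna_expr[gene] * score
        for (mir, gene), score in site_score.items()
    }
    total = sum(weights.values())
    if total == 0:
        # No expressed pair with a positive site score: nothing to rank.
        return {pair: 0.0 for pair in weights}
    return {pair: w / total for pair, w in weights.items()}
```

Because each sample supplies its own expression levels, the same site scores yield a different, sample-specific ranking of targets, which is the individual-specific property the abstract emphasises.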


Subject(s)
Databases, Nucleic Acid , Gene Expression Regulation , MicroRNAs , Models, Genetic , Transcriptome , HeLa Cells , Humans , MicroRNAs/biosynthesis , MicroRNAs/genetics
6.
Bioinformatics ; 38(7): 1938-1946, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35020805

ABSTRACT

MOTIVATION: Polygenic risk scores (PRS) have been widely used for genetic risk prediction due to their accuracy and conceptual simplicity. We introduce a unified Bayesian regression framework, NeuPred, for PRS construction, which accommodates varying genetic architectures and improves overall prediction accuracy for complex diseases by allowing for a wide class of prior choices. To take full advantage of the framework, we propose a summary-statistics-based cross-validation strategy to automatically select suitable chromosome-level priors; this strategy reveals striking variability in the preferred prior across chromosomes for the same complex disease and further significantly improves prediction accuracy. RESULTS: Simulation studies and real data applications with seven disease datasets from the Wellcome Trust Case Control Consortium cohort and eight groups of large-scale genome-wide association studies demonstrate that NeuPred achieves substantial and consistent improvements in predictive r2 over existing methods. In addition, NeuPred has similar or better computational efficiency compared with state-of-the-art Bayesian methods. AVAILABILITY AND IMPLEMENTATION: The R package implementing NeuPred is available at https://github.com/shuangsong0110/NeuPred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
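Whatever prior is used to estimate them, per-SNP weights are ultimately turned into a score as a weighted sum of allele dosages, PRS_i = Σ_j x_ij β̂_j, where x_ij ∈ {0, 1, 2} counts effect alleles of SNP j in individual i. The sketch below shows only this final scoring step and takes the weights as given; NeuPred's contribution (Bayesian estimation of β̂ with chromosome-level priors) is not reproduced, and `polygenic_risk_scores` is a hypothetical name.

```python
from typing import List, Sequence


def polygenic_risk_scores(dosages: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Compute PRS_i = sum_j dosage[i][j] * weight[j] for every individual i.

    `dosages` has one row per individual and one column per SNP (0, 1 or 2 copies of the
    effect allele, or imputed fractional dosages); `weights` holds one effect size per SNP.
    """
    for row in dosages:
        if len(row) != len(weights):
            raise ValueError("each genotype row must have one dosage per SNP")
    return [sum(x * b for x, b in zip(row, weights)) for row in dosages]
```

In practice the weights would come from the fitted model, and SNPs must be aligned so that dosage and weight refer to the same effect allele.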


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Bayes Theorem , Genome-Wide Association Study/methods , Computer Simulation , Case-Control Studies
7.
Bioinformatics ; 37(24): 4737-4743, 2021 12 11.
Article in English | MEDLINE | ID: mdl-34260700

ABSTRACT

MOTIVATION: Identification and interpretation of non-coding variations that affect disease risk remain a paramount challenge in genome-wide association studies (GWAS) of complex diseases. Experimental efforts have provided comprehensive annotations of functional elements in the human genome. On the other hand, advances in computational biology, especially machine learning approaches, have facilitated accurate predictions of cell-type-specific functional annotations. Integrating functional annotations with GWAS signals has advanced the understanding of disease mechanisms. In previous studies, functional annotations were treated as static properties of a genomic region, ignoring potential functional differences imposed by different genotypes across individuals. RESULTS: We develop a computational approach, Openness Weighted Association Studies (OWAS), to leverage and aggregate predictions of chromatin accessibility in personal genomes for prioritizing GWAS signals. The approach relies on an analytical expression we derived for identifying disease-associated genomic segments and evaluating their effects in the etiology of complex diseases. In extensive simulations and real data analysis, OWAS identifies genes/segments that explain more heritability than existing methods, and has a better replication rate in independent cohorts than GWAS. Moreover, the identified genes/segments show tissue-specific patterns and are enriched in disease-relevant pathways. We use rheumatoid arthritis and asthma as examples to demonstrate how OWAS can be exploited to provide novel insights into complex diseases. AVAILABILITY AND IMPLEMENTATION: The R package OWAS that implements our method is available at https://github.com/shuangsong0110/OWAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Software , Humans , Genome-Wide Association Study/methods , Genotype , Genomics , Computational Biology
8.
Mol Biol Evol ; 36(5): 1086-1100, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30851112

ABSTRACT

Conservation of DNA sequence over evolutionary time is a strong indicator of function, and gain or loss of sequence conservation can be used to infer changes in function across a phylogeny. Changes in evolutionary rates on particular lineages in a phylogeny can indicate shared functional shifts, and thus can be used to detect genomic correlates of phenotypic convergence. However, existing methods do not allow easy detection of patterns of rate variation, which causes challenges for detecting convergent rate shifts or other complex evolutionary scenarios. Here we introduce PhyloAcc, a new Bayesian method to model substitution rate changes in conserved elements across a phylogeny. The method assumes several categories of substitution rate for each branch on the phylogenetic tree, estimates substitution rates per category, and detects changes of substitution rate as the posterior probability of a category switch. Simulations show that PhyloAcc can detect genomic regions with rate shifts in multiple target species better than previous methods and has a higher accuracy of reconstructing complex patterns of substitution rate changes than prevalent Bayesian relaxed clock models. We demonstrate the utility of PhyloAcc in two classic examples of convergent phenotypes: loss of flight in birds and the transition to marine life in mammals. In each case, our approach reveals numerous examples of conserved nonexonic elements with accelerations specific to the phenotypically convergent lineages. Our method is widely applicable to any set of conserved elements where multiple rate changes are expected on a phylogeny.


Subject(s)
Evolution, Molecular , Genetic Techniques , Models, Genetic , Phylogeny , Animals , Bayes Theorem , Birds/genetics , Computer Simulation , Mammals/genetics , Software
9.
Entropy (Basel) ; 22(3)2020 Mar 02.
Article in English | MEDLINE | ID: mdl-33286064

ABSTRACT

Traditional hypothesis-margin research focuses on obtaining large margins and on feature selection. In this work, we show that the robustness of margins is also critical and can be measured using entropy. In addition, our approach provides clear mathematical formulations and explanations to uncover feature interactions, which are often lacking in large hypothesis-margin based approaches. We design an algorithm, termed IMMIGRATE (Iterative max-min entropy margin-maximization with interaction terms), for training the weights associated with the interaction terms. IMMIGRATE simultaneously utilizes both local and global information and can be used as a base learner in Boosting. We evaluate IMMIGRATE on a wide range of tasks, in which it demonstrates exceptional robustness and achieves state-of-the-art results with high interpretability.
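For readers unfamiliar with the term, the hypothesis margin of a sample (the quantity behind Relief-style feature weighting) is its distance to the nearest miss (closest sample of another class) minus its distance to the nearest hit (closest other sample of the same class); a positive margin means the sample sits closer to its own class. The sketch below measures it with a quadratic distance (a - b)^T W (a - b), whose off-diagonal entries stand in for interaction weights. That parameterisation is our assumption, and the sketch omits IMMIGRATE's entropy-based weighting and its iterative max-min training of W; the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def quad_distance(a: Vector, b: Vector, W: Matrix) -> float:
    """(a - b)^T W (a - b); off-diagonal entries of W weight pairwise feature interactions."""
    d = [ai - bi for ai, bi in zip(a, b)]
    return sum(d[p] * W[p][q] * d[q] for p in range(len(d)) for q in range(len(d)))


def hypothesis_margins(X: Sequence[Vector], y: Sequence[int], W: Matrix) -> List[float]:
    """Margin of each sample: distance to its nearest miss minus distance to its nearest hit.

    Requires at least two classes and at least two samples in every class.
    """
    margins = []
    for i, xi in enumerate(X):
        hit = min(quad_distance(xi, xj, W) for j, xj in enumerate(X) if j != i and y[j] == y[i])
        miss = min(quad_distance(xi, xj, W) for j, xj in enumerate(X) if y[j] != y[i])
        margins.append(miss - hit)
    return margins
```

Margin-based learners then adjust W so that these margins become large (and, in IMMIGRATE's framing, robust), which is how feature and interaction weights acquire their interpretation.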

10.
Nucleic Acids Res ; 45(10): 5653-5665, 2017 Jun 02.
Article in English | MEDLINE | ID: mdl-28472449

ABSTRACT

Competing endogenous RNAs (ceRNAs) are RNA molecules that sequester shared microRNAs (miRNAs), thereby affecting the expression of other targets of those miRNAs. Whether genetic variants in ceRNAs can affect their biological function and disease development is still an open question. Here we identified a large number of genetic variants that are associated with ceRNA function using Geuvadis RNA-seq data for 462 individuals from the 1000 Genomes Project. We call these loci competing endogenous RNA expression quantitative trait loci, or 'cerQTLs', and found that a large number of them were unexplored in conventional eQTL mapping. We identified many cerQTLs that have undergone recent positive selection in different human populations, and showed that single nucleotide polymorphisms in gene 3'UTRs at miRNA seed binding regions can simultaneously regulate gene expression in both cis and trans through the ceRNA mechanism. We also discovered that cerQTLs in miRNA binding sites are significantly enriched for trait- and disease-associated variants reported by genome-wide association studies, suggesting that disease susceptibility could be attributed to ceRNA regulation. Further in vitro functional experiments demonstrated that the cerQTL rs11540855 can regulate ceRNA function. These results provide a comprehensive catalog of functional non-coding regulatory variants that may be responsible for ceRNA crosstalk at the post-transcriptional level.


Subject(s)
Gene Expression Regulation , Gene Regulatory Networks , Genome, Human , MicroRNAs/genetics , Quantitative Trait Loci , RNA, Untranslated/genetics , 3' Untranslated Regions , Base Pairing , Binding Sites , Chromosome Mapping , Genome-Wide Association Study , Humans , MicroRNAs/metabolism , Polymorphism, Single Nucleotide , RNA, Untranslated/metabolism
11.
Proc Natl Acad Sci U S A ; 113(22): 6154-9, 2016 May 31.
Article in English | MEDLINE | ID: mdl-27185919

ABSTRACT

With the growing availability of digitized text data both publicly and privately, there is a great need for effective computational tools to automatically extract information from texts. Because the Chinese language differs most significantly from alphabet-based languages in not specifying word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases from large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than those obtained from the outputs of a supervised segmentation method.
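Segmentation without explicit word boundaries can be illustrated with a unigram word model: a text is treated as a concatenation of words drawn independently with probabilities θ_w, and the most likely segmentation is found by dynamic programming. The sketch assumes the dictionary and its probabilities are already known; TopWORDS instead has to discover and estimate them without supervision, which this sketch does not attempt. `best_segmentation` and its `max_len` cap on word length are hypothetical.

```python
import math
from typing import Dict, List, Optional


def best_segmentation(text: str, word_prob: Dict[str, float], max_len: int = 4) -> Optional[List[str]]:
    """Maximum-likelihood segmentation of `text` under a unigram word model (Viterbi-style DP).

    best[i] holds the best log-probability of segmenting text[:i]; back[i] records where the
    last word of that segmentation starts. Returns None if no dictionary cover exists.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            p = word_prob.get(text[start:end], 0.0)
            if p > 0 and best[start] > -math.inf:
                score = best[start] + math.log(p)
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        return None
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]
```

Working in log space avoids underflow on long texts, and the cost is O(n · max_len) dictionary lookups.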

12.
Genome Res ; 25(8): 1147-57, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26063738

ABSTRACT

The CRISPR/Cas9 system has revolutionized mammalian somatic cell genetics. Genome-wide functional screens using CRISPR/Cas9-mediated knockout or dCas9 fusion-mediated inhibition/activation (CRISPRi/a) are powerful techniques for discovering phenotype-associated gene function. We systematically assessed the DNA sequence features that contribute to single guide RNA (sgRNA) efficiency in CRISPR-based screens. Leveraging the information from multiple designs, we derived a new sequence model for predicting sgRNA efficiency in CRISPR/Cas9 knockout experiments. Our model confirmed known features and suggested new features including a preference for cytosine at the cleavage site. The model was experimentally validated for sgRNA-mediated mutation rate and protein knockout efficiency. Tested on independent data sets, the model achieved significant results in both positive and negative selection conditions and outperformed existing models. We also found that the sequence preference for CRISPRi/a is substantially different from that for CRISPR/Cas9 knockout and propose a new model for predicting sgRNA efficiency in CRISPRi/a experiments. These results facilitate the genome-wide design of improved sgRNA for both knockout and CRISPRi/a studies.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats , Computational Biology/methods , RNA, Guide, Kinetoplastida/metabolism , DNA/analysis , Gene Knockout Techniques , HL-60 Cells , Humans , Models, Genetic , Mutation Rate
14.
PLoS Comput Biol ; 13(7): e1005653, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28719601

ABSTRACT

In recent years, there has been a huge rise in the number of publicly available transcriptional profiling datasets. These massive compendia comprise billions of measurements and provide a special opportunity to predict the function of unstudied genes based on co-expression with well-studied pathways. Such analyses can be very challenging, however, since biological pathways are modular and may exhibit co-expression only in specific contexts. To overcome these challenges we introduce CLIC, CLustering by Inferred Co-expression. CLIC accepts as input a pathway consisting of two or more genes. It then uses a Bayesian partition model to simultaneously partition the input gene set into coherent co-expressed modules (CEMs), while assigning the posterior probability for each dataset in support of each CEM. CLIC then expands each CEM by scanning the transcriptome for additional co-expressed genes, quantified by an integrated log-likelihood ratio (LLR) score weighted for each dataset. As a byproduct, CLIC automatically learns the conditions (datasets) within which a CEM is operative. We implemented CLIC using a compendium of 1774 mouse microarray datasets (28628 microarrays) or 1887 human microarray datasets (45158 microarrays). CLIC analysis reveals that of 910 canonical biological pathways, 30% consist of strongly co-expressed gene modules for which new members are predicted. For example, CLIC predicts a functional connection between protein C7orf55 (FMC1) and the mitochondrial ATP synthase complex that we have experimentally validated. CLIC is freely available at www.gene-clic.org. We anticipate that CLIC will be valuable both for revealing new components of biological pathways and for identifying the conditions in which they are active.
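The expansion step can be sketched as a weighted sum: each candidate gene's per-dataset log-likelihood ratio (co-expressed with the CEM vs. not) is weighted by the posterior probability that the dataset supports the CEM, and candidates are ranked by the total. This weighted-sum form is our reading of "an integrated LLR score weighted for each dataset"; the per-dataset LLRs and weights are assumed to be precomputed, and `rank_candidates` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def rank_candidates(
    dataset_weight: Dict[str, float],
    llr: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Integrated score per gene = sum over datasets d of weight[d] * LLR[gene][d].

    `dataset_weight` maps dataset id -> posterior support for the CEM (missing ids count as 0);
    `llr` maps gene -> {dataset id -> LLR}. Returns (gene, score) pairs, highest score first.
    """
    scores = {
        gene: sum(dataset_weight.get(d, 0.0) * value for d, value in per_dataset.items())
        for gene, per_dataset in llr.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Down-weighting datasets with little support for the module keeps strong but irrelevant co-expression elsewhere from dominating the ranking.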


Subject(s)
Databases, Factual , Gene Expression Profiling/methods , Genomics/methods , Models, Biological , Software , Transcriptome , Algorithms , Cluster Analysis , Gene Regulatory Networks , Humans , Signal Transduction
15.
Nature ; 485(7398): 376-80, 2012 Apr 11.
Article in English | MEDLINE | ID: mdl-22495300

ABSTRACT

The spatial organization of the genome is intimately linked to its biological function, yet our understanding of higher order genomic structure is coarse, fragmented and incomplete. In the nucleus of eukaryotic cells, interphase chromosomes occupy distinct chromosome territories, and numerous models have been proposed for how chromosomes fold within chromosome territories. These models, however, provide few mechanistic details about the relationship between higher order chromatin structure and genome function. Recent advances in genomic technologies have led to rapid advances in the study of three-dimensional genome organization. In particular, Hi-C has been introduced as a method for identifying higher order chromatin interactions genome wide. Here we investigate the three-dimensional organization of the human and mouse genomes in embryonic stem cells and terminally differentiated cell types at unprecedented resolution. We identify large, megabase-sized local chromatin interaction domains, which we term 'topological domains', as a pervasive structural feature of the genome organization. These domains correlate with regions of the genome that constrain the spread of heterochromatin. The domains are stable across different cell types and highly conserved across species, indicating that topological domains are an inherent property of mammalian genomes. Finally, we find that the boundaries of topological domains are enriched for the insulator binding protein CTCF, housekeeping genes, transfer RNAs and short interspersed element (SINE) retrotransposons, indicating that these factors may have a role in establishing the topological domain structure of the genome.
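A statistic commonly used to call such domains from a binned Hi-C contact matrix is the directionality index (DI), which asks whether a bin interacts more with upstream or downstream bins. With A and B the contact counts between the bin and the upstream and downstream windows, and E = (A + B)/2, DI = sign(B − A) · ((A − E)²/E + (B − E)²/E). The sketch computes DI per bin for a symmetric matrix; the window size (in bins) is an illustrative parameter, and downstream steps such as segmenting the DI track into domains (for example with a hidden Markov model) are omitted. `directionality_index` is a hypothetical name.

```python
from typing import List, Sequence


def directionality_index(contacts: Sequence[Sequence[float]], window: int) -> List[float]:
    """DI per bin: sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E), with E = (A + B) / 2.

    A = contacts with up to `window` upstream bins, B = contacts with up to `window`
    downstream bins. Bins with A == B (including A == B == 0) get DI = 0.
    """
    n = len(contacts)
    di = []
    for i in range(n):
        a = sum(contacts[i][j] for j in range(max(0, i - window), i))
        b = sum(contacts[i][j] for j in range(i + 1, min(n, i + window + 1)))
        e = (a + b) / 2
        if a == b or e == 0:
            di.append(0.0)
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di
```

Within a domain, bins near the upstream boundary tend to have positive DI (their contacts are biased downstream, into the domain) and bins near the downstream boundary negative DI, so sign changes in the DI track mark candidate boundaries.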


Subject(s)
Chromatin/genetics , Chromatin/metabolism , Genome , Animals , Binding Sites , CCCTC-Binding Factor , Cell Differentiation , Chromatin/chemistry , Chromosomes/chemistry , Chromosomes/genetics , Chromosomes/metabolism , Embryonic Stem Cells/metabolism , Evolution, Molecular , Female , Genes, Essential/genetics , Heterochromatin/chemistry , Heterochromatin/genetics , Heterochromatin/metabolism , Humans , Male , Mammals/genetics , Mice , RNA, Transfer/genetics , Repressor Proteins/metabolism , Short Interspersed Nucleotide Elements/genetics
16.
Proc Natl Acad Sci U S A ; 112(25): 7731-6, 2015 Jun 23.
Article in English | MEDLINE | ID: mdl-26056275

ABSTRACT

Despite the rapid accumulation of tumor-profiling data and transcription factor (TF) ChIP-seq profiles, efforts integrating TF binding with tumor-profiling data to understand how TFs regulate tumor gene expression are still limited. To systematically search for cancer-associated TFs, we comprehensively integrated 686 ENCODE ChIP-seq profiles representing 150 TFs with 7484 TCGA tumor profiles across 18 cancer types. For efficient and accurate inference of gene regulatory rules across a large number and variety of datasets, we developed an algorithm, RABIT (regression analysis with background integration). In each tumor sample, RABIT tests whether the TF target genes from ChIP-seq show strong differential regulation after controlling for background effects from copy number alteration and DNA methylation. When multiple ChIP-seq profiles are available for a TF, RABIT prioritizes the most relevant ChIP-seq profile in each tumor. In each cancer type, RABIT further tests whether the TF expression and somatic mutation variations are correlated with differential expression patterns of its target genes across tumors. Our predicted TF impact on tumor gene expression is highly consistent with the knowledge from cancer-related gene databases and reveals many previously unidentified aspects of transcriptional regulation in tumor progression. We also applied RABIT to RNA-binding protein motifs and found that some alternative splicing factors could affect tumor-specific gene expression by binding to target gene 3'UTR regions. Thus, RABIT (rabit.dfci.harvard.edu) is a general platform for predicting the oncogenic role of gene expression regulators.
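In simplified form, the per-tumor test can be pictured as a linear regression: each gene's differential expression is regressed on background covariates (copy number, methylation) plus an indicator of whether the gene is a ChIP-seq target of the TF, and the target coefficient is tested against zero. The sketch fits that model by ordinary least squares with NumPy and returns the coefficient and its t-statistic. The plain OLS t-test and the argument names are our assumptions; RABIT's actual procedure, including profile selection, may differ.

```python
import numpy as np


def target_effect(diff_expr, copy_number, methylation, is_target):
    """OLS of differential expression on background covariates plus a TF-target indicator.

    All arguments are per-gene arrays of equal length. Returns (coefficient, t_statistic)
    for the target indicator, i.e. the TF effect after controlling for the background.
    """
    y = np.asarray(diff_expr, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                          # intercept
        np.asarray(copy_number, dtype=float),     # background: copy number alteration
        np.asarray(methylation, dtype=float),     # background: DNA methylation
        np.asarray(is_target, dtype=float),       # 1 if the gene is a ChIP-seq target
    ])
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    if rank < X.shape[1]:
        raise ValueError("design matrix is rank-deficient")
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1], beta[-1] / np.sqrt(cov[-1, -1])
```

Including the background covariates matters because a gene can look up-regulated in a tumor simply because its locus is amplified, which would otherwise be credited to the TF.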


Subject(s)
Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Transcription, Genetic , Humans
17.
Proteins ; 85(8): 1402-1412, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28378911

ABSTRACT

In the prediction of protein structure from amino acid sequence, loops are challenging regions for computational methods. Since loops are often located on the protein surface, they can have significant roles in determining protein functions and binding properties. Loop prediction without the aid of a structural template requires extensive conformational sampling and energy minimization, which are computationally difficult. In this article we present a new de novo loop sampling method, the Parallely filtered Energy Targeted All-atom Loop Sampler (PETALS), to rapidly locate low energy conformations. PETALS explores both backbone and side-chain positions of the loop region simultaneously according to the energy function selected by the user, and constructs a nonredundant ensemble of low energy loop conformations using filtering criteria. The method is illustrated with the DFIRE potential and DiSGro energy function for loops, and shown to be highly effective at discovering conformations with near-native (or better) energy. Using the same energy function as the DiSGro algorithm, PETALS samples conformations with both lower RMSDs and lower energies. PETALS is also useful for assessing the accuracy of different energy functions. PETALS runs rapidly, requiring an average of 10 minutes for a 12-residue loop on a single 3.2 GHz processor core, comparable to the fastest existing de novo methods for generating an ensemble of conformations.


Subject(s)
Algorithms , Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry , Amino Acid Sequence , Computer Simulation , Models, Molecular , Protein Conformation, alpha-Helical , Protein Interaction Domains and Motifs , Thermodynamics
18.
Bioinformatics ; 32(18): 2729-36, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27273672

ABSTRACT

MOTIVATION: Prediction and prioritization of human non-coding regulatory variants is critical for understanding the regulatory mechanisms of disease pathogenesis and promoting personalized medicine. Existing tools utilize functional genomics data and evolutionary information to evaluate the pathogenicity or regulatory functions of non-coding variants. However, different algorithms lead to inconsistent and even conflicting predictions. Combining multiple methods may increase accuracy in regulatory variant prediction. RESULTS: Here, we compiled an integrative resource for predictions from eight different tools on functional annotation of non-coding variants. We further developed a composite strategy to integrate multiple predictions and computed the composite likelihood of a given variant being a regulatory variant. Benchmarking on multiple independent datasets of causal variants, we demonstrated that our composite model significantly improves prediction performance. AVAILABILITY AND IMPLEMENTATION: We implemented our model and scoring procedure as a tool, named PRVCS, which is freely available for academic and non-profit use at http://jjwanglab.org/PRVCS. CONTACT: wang.junwen@mayo.edu, jliu@stat.harvard.edu, or limx54@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Models, Theoretical , Molecular Sequence Annotation , Software , Biological Evolution , Genetic Variation , Humans , RNA, Untranslated
19.
Hum Genomics ; 10 Suppl 2: 22, 2016 07 25.
Article in English | MEDLINE | ID: mdl-27461247

ABSTRACT

BACKGROUND: Snail is a typical transcription factor that can induce epithelial-mesenchymal transition (EMT) and cancer progression. Several studies have reported on the clinical significance of Snail protein expression in gastric cancer, but their results were not fully consistent. This study aimed to investigate Snail expression and its clinical significance in gastric cancer. RESULTS: A systematic review of the PubMed, CNKI, Weipu, and Wanfang databases before March 2015 was conducted. We established inclusion criteria based on subjects, detection method, and evaluation of Snail protein results. Meta-analysis was conducted using RevMan 4.2 software, and merged odds ratios (ORs) with 95% confidence intervals (CIs) were calculated. Forest plots and a funnel plot were also used to assess potential publication bias. A total of 10 studies were included. The meta-analysis evaluated the positive rate of Snail protein expression. ORs and 95% CIs for the group comparisons were as follows: (1) gastric cancer vs. para-carcinoma tissue [OR = 6.15, 95% CI (4.70, 8.05)]; (2) gastric cancer vs. normal gastric tissue [OR = 17.00, 95% CI (10.08, 28.67)]; (3) no lymph node metastasis vs. lymph node metastasis [OR = 0.40, 95% CI (0.18, 0.93)]; (4) poorly differentiated vs. highly and moderately differentiated cancer [OR = 3.34, 95% CI (2.22, 5.03)]; (5) clinical stage TI + TII vs. stage TIII + TIV [OR = 0.38, 95% CI (0.23, 0.60)]; (6) superficial muscularis vs. deep muscularis [OR = 0.18, 95% CI (0.11, 0.31)]. CONCLUSIONS: Our results indicate that increased Snail protein expression may play an important role in the carcinogenesis, progression, and metastasis of gastric cancer, and this finding may help guide the diagnosis, therapy, and prognosis of gastric cancer.
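To show how a pooled OR and its 95% CI are obtained from per-study 2×2 tables, the sketch below uses fixed-effect inverse-variance pooling of log odds ratios with Woolf's variance 1/a + 1/b + 1/c + 1/d and a Wald interval. This is a swapped-in, commonly taught method: RevMan also offers Mantel-Haenszel and random-effects pooling, and the review does not say which model produced the figures above. `pooled_odds_ratio` is a hypothetical name, and zero cells must be handled (for example by a continuity correction) before calling it.

```python
import math
from typing import List, Tuple

# (a, b, c, d) = group 1 positive, group 1 negative, group 2 positive, group 2 negative
Table = Tuple[int, int, int, int]


def pooled_odds_ratio(tables: List[Table], z: float = 1.96) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance pooled OR with a Wald confidence interval.

    Returns (pooled OR, lower bound, upper bound); z = 1.96 gives a 95% interval.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for a, b, c, d in tables:
        if min(a, b, c, d) <= 0:
            raise ValueError("zero cell: apply a continuity correction first")
        log_or = math.log((a * d) / (b * c))
        weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)  # inverse of Woolf's variance
        weighted_sum += weight * log_or
        total_weight += weight
    pooled = weighted_sum / total_weight
    se = 1.0 / math.sqrt(total_weight)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)
```

Pooling happens on the log scale because log ORs are approximately normal; the interval is asymmetric once exponentiated, as in the reported CIs.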


Subject(s)
Gastric Mucosa/metabolism , Gene Expression Regulation, Neoplastic , Snail Family Transcription Factors/genetics , Stomach Neoplasms/genetics , Gene Regulatory Networks , Humans , Lymphatic Metastasis , Neoplasm Invasiveness , Neoplasm Staging , Odds Ratio , Prognosis , Signal Transduction/genetics , Snail Family Transcription Factors/metabolism , Stomach/pathology , Stomach Neoplasms/diagnosis , Stomach Neoplasms/metabolism
20.
Bioinformatics ; 31(11): 1842-4, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25609796

ABSTRACT

UNLABELLED: Many statistical problems in bioinformatics and genetics can be formulated as testing for association between a categorical variable and a continuous variable. A dynamic slicing method was proposed for non-parametric dependence testing and has been shown to have higher power than traditional methods such as the Kolmogorov-Smirnov test. We introduce an R package, dslice, to facilitate the use of the dynamic slicing method in bioinformatic applications such as quantitative trait loci studies and gene set enrichment analysis. AVAILABILITY AND IMPLEMENTATION: dslice is implemented in Rcpp and available in the Comprehensive R Archive Network. The package is distributed under the GNU General Public License (version 2 or later).
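The dynamic slicing idea can be sketched as follows: sort the observations by the continuous variable, then use dynamic programming to find the partition of the sorted categorical labels into contiguous slices that maximises a penalised log-likelihood ratio against the single-slice (no dependence) model. The per-slice penalty λ·log n used here is an assumption, and this O(n²K) function is an illustration, not the dslice package's implementation or API; `dynamic_slicing_statistic` is a hypothetical name.

```python
import math
from typing import Hashable, List, Sequence


def dynamic_slicing_statistic(x: Sequence[float], y: Sequence[Hashable], lam: float = 1.0) -> float:
    """Penalised log-likelihood-ratio statistic, maximised over contiguous slicings of the
    categorical labels y after sorting by the continuous variable x.

    Assumption: each slice beyond the first costs lam * log(n). Returns 0.0 when no slicing
    beats a single slice, i.e. no evidence of dependence.
    """
    n = len(x)
    if n < 2:
        return 0.0
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    index = {label: k for k, label in enumerate(dict.fromkeys(ys))}
    K = len(index)

    # prefix[i][k] = number of observations with label k among ys[:i]
    prefix: List[List[int]] = [[0] * K]
    for label in ys:
        row = prefix[-1][:]
        row[index[label]] += 1
        prefix.append(row)

    def slice_loglik(s: int, e: int) -> float:
        """Multinomial log-likelihood of ys[s:e] at its own empirical label frequencies."""
        m = e - s
        total = 0.0
        for k in range(K):
            c = prefix[e][k] - prefix[s][k]
            if c > 0:
                total += c * math.log(c / m)
        return total

    def is_cut(b: int) -> bool:
        """Slices may only break between distinct x values, so ties stay together."""
        return b == 0 or b == n or xs[b - 1] != xs[b]

    penalty = lam * math.log(n)
    best = [-math.inf] * (n + 1)  # best[e]: best penalised score for ys[:e]
    best[0] = 0.0
    for e in range(1, n + 1):
        if not is_cut(e):
            continue
        for s in range(e):
            if is_cut(s) and best[s] > -math.inf:
                best[e] = max(best[e], best[s] + slice_loglik(s, e) - penalty)
    # Add back one penalty (a single slice is free) and subtract the one-slice baseline.
    return max(0.0, best[n] + penalty - slice_loglik(0, n))
```

A value of zero means no slicing beats the single-slice model; turning a positive value into a p-value would need a null distribution, for example by permuting y, which is omitted here.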


Subject(s)
Gene Expression , Quantitative Trait Loci , Software , Animals , Computational Biology/methods , Genetic Variation , Humans , Mice , Phenotype , Statistics, Nonparametric