Results 1 - 12 of 12
1.
Oncogenesis ; 3: e87, 2014 Feb 10.
Article in English | MEDLINE | ID: mdl-24513630

ABSTRACT

DICER1 is a critical gene in the biogenesis of mature microRNAs, short non-coding RNAs that derive from either the -3p or the -5p strand of the precursor microRNA. Germline mutations of DICER1 are associated with a range of human malignancies, including pleuropulmonary blastoma (PPB). Additional somatic 'hotspot' mutations in the microRNA-processing ribonuclease IIIb (RNase IIIb) domain of DICER1 have been reported in cancer; these affect microRNA biogenesis, resulting in a -3p mature microRNA strand bias. Here, in a PPB with a germline DICER1 mutation (exon11 c.1806_1810insATTGA), we first confirmed by conventional sequencing the presence of an additional somatic RNase IIIb hotspot mutation (exon25 c.5425G>A [p.G1809R]). Second, we investigated serum levels of mature microRNAs at the time of PPB diagnosis and compared the findings with serum results from a comprehensive range of pediatric cancer patients and controls (n=52). We identified a panel of 45 microRNAs that were present at elevated levels in the serum at the time of PPB diagnosis, a significant majority of which were derived from the -3p strand (P=0.013). In addition, we identified a subset of 10 serum microRNAs (namely miR-125a-3p, miR-125b-2-3p, miR-380-5p, miR-125b-1-3p, let-7f-2-3p, let-7a-3p, let-7b-3p, miR-708-3p, miR-138-1-3p and miR-532-3p) that were most abundant in the PPB case. Serum levels of two representative microRNAs, miR-125a-3p and miR-125b-2-3p, were not elevated in DICER1 germline-mutated relatives. In the PPB case, serum levels of miR-125a-3p and miR-125b-2-3p increased before chemotherapy and then showed an early reduction following treatment. These microRNAs may offer future utility as serum biomarkers for screening patients with known germline DICER1 mutations for early detection of PPB, and for potential disease monitoring in cases with confirmed PPB.
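The strand-bias finding rests on classifying each mature microRNA by arm. A minimal sketch of that bookkeeping, applied to the 10 most abundant serum microRNAs named above, tallies names by their -3p / -5p suffix and adds an exact one-sided binomial (sign) test against an assumed equal-arm null. This is an illustration only: the reported P=0.013 refers to the 45-microRNA panel, and the test the authors used is not stated in the abstract.

```python
from math import comb

# The ten most abundant serum microRNAs in the PPB case, as listed in the abstract.
TOP_MIRNAS = [
    "miR-125a-3p", "miR-125b-2-3p", "miR-380-5p", "miR-125b-1-3p",
    "let-7f-2-3p", "let-7a-3p", "let-7b-3p", "miR-708-3p",
    "miR-138-1-3p", "miR-532-3p",
]


def arm_counts(names):
    """Tally mature microRNAs by arm using the -3p / -5p name suffix."""
    three = sum(1 for n in names if n.endswith("-3p"))
    five = sum(1 for n in names if n.endswith("-5p"))
    return three, five


def one_sided_binomial_p(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p); an illustrative sign test only."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


three, five = arm_counts(TOP_MIRNAS)
p_value = one_sided_binomial_p(three, three + five)
print(three, five, round(p_value, 4))
```

For this subset the script prints `9 1 0.0107`, i.e. 11/1024 under the 50:50 null; a real analysis would need a null that reflects the background -3p/-5p composition of the profiled panel.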

2.
Br J Cancer ; 108(2): 450-60, 2013 Feb 05.
Article in English | MEDLINE | ID: mdl-23299538

ABSTRACT

BACKGROUND: When designing therapeutic short-interfering RNAs (siRNAs), off-target effects (OTEs) are usually predicted by computational quantification of messenger RNAs (mRNAs) that contain matches to the siRNA seed sequence in their 3' UTRs. It is assumed that the higher the number of predicted transcriptional OTEs, the greater the size of the actual OTE signature and the more detrimental the phenotypic consequences in target-negative cells. METHODS: We tested this general assumption by investigating the OTEs of potential therapeutic siRNAs targeting the human papillomavirus (HPV) type-16 E7 oncogene. We studied HPV-negative squamous epithelial cells from normal cervix (NCx) and skin (HaCaT), which would be vulnerable to 'bystander' OTEs following transfection in vivo. RESULTS: We observed no correlation between the number of computationally predicted OTEs and the actual number of seed-dependent OTEs (P=0.76). On average, only 20.5% of actual transcriptional OTEs were seed-dependent (i.e., predicted). The unpredicted OTEs included stimulation of innate immune pathways, as well as indirect (downstream) effects of other OTEs, which affected important cancer-associated pathways. Although most significant OTEs were observed in both NCx and HaCaT cells, only 0-5.9% of differentially expressed genes overlapped between the two cell types. CONCLUSION: These data do not support the assumption that actual OTEs correlate well with predicted OTEs.


Subject(s)
Human papillomavirus 16/genetics , Papillomavirus E7 Proteins/genetics , Uterine Cervical Neoplasms/virology , Carcinoma, Squamous Cell/genetics , Carcinoma, Squamous Cell/virology , Cell Line, Tumor , Cervix Uteri/cytology , Epithelial Cells/virology , Female , Humans , RNA Interference , RNA, Small Interfering , Skin/cytology , Uterine Cervical Neoplasms/genetics
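The computational prediction that this study tests (counting mRNAs whose 3' UTRs contain a seed match) can be sketched as below. The sketch assumes the common convention that the seed is nucleotides 2-8 of the guide strand and that a predicted off-target is any 3' UTR containing an exact 7-nt complementary site; the guide and UTR sequences are hypothetical, and this is not the authors' pipeline.

```python
# Complement table for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_match_site(guide):
    """Target site complementary to guide positions 2-8 (1-based), written 5'->3'."""
    seed = guide[1:8]
    return "".join(COMPLEMENT[b] for b in reversed(seed))


def predicted_off_targets(guide, utrs):
    """Names of transcripts whose 3' UTR contains at least one exact seed-match site."""
    site = seed_match_site(guide)
    return sorted(name for name, utr in utrs.items() if site in utr)


# Hypothetical guide strand and 3' UTR fragments, for illustration only.
guide = "UCAGGAUACCGUUAAGCUGAUC"
utrs = {
    "geneA": "CCUAUCCUGGA",
    "geneB": "GGGGGGGGGGG",
    "geneC": "AAUAUCCUGAA",
}
print(seed_match_site(guide), predicted_off_targets(guide, utrs))
```

The number of predicted OTEs is then simply the length of the returned list; the study's point is that this count did not track the measured seed-dependent OTE signature.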
3.
Nucleic Acids Res ; 30(7): 1575-84, 2002 Apr 01.
Article in English | MEDLINE | ID: mdl-11917018

ABSTRACT

Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, to the prediction of function based on domain architecture or the presence of sequence motifs, and to comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome, and the resulting families have been used to annotate a large proportion of human proteins.


Subject(s)
Algorithms , Databases, Protein , Proteins/genetics , Amino Acid Sequence , Genome, Human , Humans , Internet , Molecular Sequence Data , Sequence Alignment , Sequence Homology, Amino Acid , Transcription Factor TFIIB , Transcription Factors/genetics
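The Markov cluster (MCL) step used by TRIBE-MCL alternates expansion (raising a column-stochastic matrix to an integer power) with inflation (element-wise powering followed by column re-normalisation) until the matrix stops changing. The dense pure-Python sketch below assumes a small symmetric similarity matrix as input; the parameter defaults are illustrative, and how TRIBE-MCL derives its similarity weights is not shown.

```python
def _normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Minimal MCL sketch; `expansion` must be an integer >= 2."""
    n = len(adjacency)
    # Self-loops keep every column non-zero and damp oscillation.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: matrix power, i.e. several random-walk steps.
        expanded = m
        for _ in range(expansion - 1):
            expanded = _matmul(expanded, m)
        # Inflation: element-wise power, then re-normalise each column.
        new = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows that retain mass are attractors; their non-zero columns form a cluster.
    clusters = []
    for i in range(n):
        members = sorted(j for j in range(n) if m[i][j] > eps)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical similarity graph: two disjoint triangles.
adj = [[0] * 6 for _ in range(6)]
for group in ((0, 1, 2), (3, 4, 5)):
    for i in group:
        for j in group:
            if i != j:
                adj[i][j] = 1
print(mcl(adj))
```

Raising `inflation` generally yields finer (more numerous) clusters; a production implementation would use sparse matrices and prune small entries after each iteration.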
4.
Bioinformatics ; 17(9): 853-4, 2001 Sep.
Article in English | MEDLINE | ID: mdl-11590107

ABSTRACT

Graph layout is extensively used in mathematics and computer science; however, these ideas and methods have not been extended in a general fashion to the construction of graphs for biological data. To this end, we have implemented a version of the Fruchterman-Reingold graph layout algorithm, extensively modified for the purpose of similarity analysis in biology. This algorithm rapidly and effectively generates clear two-dimensional (2D) or three-dimensional (3D) graphs representing similarity relationships such as protein sequence similarity. The implementation of the algorithm is general and applicable to most types of similarity information for biological data. AVAILABILITY: BioLayout is available for most UNIX platforms at the following web-site: http://www.ebi.ac.uk/research/cgg/services/layout.


Subject(s)
Algorithms , Computer Graphics , Amino Acid Sequence , Computer Graphics/statistics & numerical data , Computer Graphics/trends , Databases, Protein/statistics & numerical data , Databases, Protein/trends , Image Processing, Computer-Assisted/statistics & numerical data , Image Processing, Computer-Assisted/trends , Imaging, Three-Dimensional/statistics & numerical data , Imaging, Three-Dimensional/trends , Software/statistics & numerical data , Software/trends
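For readers unfamiliar with the force-directed method BioLayout builds on, the sketch below is the plain 2D Fruchterman-Reingold scheme: all node pairs repel with force k²/d, edges attract with force d²/k, and a cooling "temperature" caps each move. BioLayout's similarity-specific modifications and its 3D mode are not reproduced; the frame size, iteration count and seeded random start are assumptions.

```python
import math
import random


def fruchterman_reingold(n, edges, width=1.0, height=1.0, iterations=50, seed=0):
    """Plain 2D Fruchterman-Reingold layout; returns a list of (x, y) positions."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-width / 2, width / 2), rng.uniform(-height / 2, height / 2)]
           for _ in range(n)]
    k = math.sqrt(width * height / n)  # ideal edge length
    t = width / 10                     # initial temperature caps the step size
    cooling = t / iterations
    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsion between every pair of nodes: f_r(d) = k^2 / d.
        for v in range(n):
            for u in range(n):
                if u == v:
                    continue
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
        # Attraction along edges pulls both endpoints together: f_a(d) = d^2 / k.
        for u, v in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
        # Move each node by at most t, then clamp it inside the frame.
        for v in range(n):
            dl = math.hypot(disp[v][0], disp[v][1])
            if dl > 0:
                step = min(dl, t)
                pos[v][0] += disp[v][0] / dl * step
                pos[v][1] += disp[v][1] / dl * step
            pos[v][0] = min(width / 2, max(-width / 2, pos[v][0]))
            pos[v][1] = min(height / 2, max(-height / 2, pos[v][1]))
        t = max(t - cooling, 0.0)
    return [tuple(p) for p in pos]


# A hypothetical four-node path graph.
layout = fruchterman_reingold(4, [(0, 1), (1, 2), (2, 3)])
print(layout)
```

The pairwise repulsion loop is O(n²) per iteration; large similarity graphs typically need spatial binning or tree-based approximations.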
5.
Pac Symp Biocomput ; : 384-95, 2001.
Article in English | MEDLINE | ID: mdl-11262957

ABSTRACT

We present an algorithm for large-scale document clustering of biological text, obtained from Medline abstracts. The algorithm is based on statistical treatment of terms, stemming, the idea of a 'go-list', unsupervised machine learning and graph layout optimization. The method is flexible and robust, controlled by a small number of parameter values. Experiments show that the resulting document clusters are meaningful as assessed by cluster-specific terms. Despite the statistical nature of the approach, with minimal semantic analysis, the terms provide a shallow description of the document corpus and support concept discovery.


Subject(s)
Abstracting and Indexing , Algorithms , MEDLINE , Molecular Biology , Animals , Artificial Intelligence , Cluster Analysis , Drosophila/embryology , Terminology as Topic
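The term-statistics front end of such a document-clustering pipeline can be sketched as stemmed term vectors compared by cosine similarity. This sketch reads the 'go-list' as a list of terms to keep (the inverse of a stop-list), which is an assumption, and replaces a real stemmer with a tiny hypothetical suffix stripper; the clustering and graph layout stages are not shown.

```python
import math
import re
from collections import Counter


def crude_stem(word):
    """Tiny suffix stripper; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ations", "ation", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_vector(text, go_list):
    """Count stemmed tokens, keeping only terms on the go-list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in map(crude_stem, tokens) if t in go_list)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical go-list and abstract titles.
go_list = {"embryo", "segment", "gene", "kinase"}
va = term_vector("Segmentation genes in Drosophila embryos", go_list)
vb = term_vector("Embryo segment gene expression", go_list)
vc = term_vector("Protein kinases in yeast", go_list)
print(round(cosine(va, vb), 3), cosine(va, vc))
```

A pairwise similarity matrix built this way could then feed a graph-based clustering or layout step.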
6.
Bioinformatics ; 17(1): 95-7, 2001 Jan.
Article in English | MEDLINE | ID: mdl-11222266

ABSTRACT

The mechanisms controlling gene regulation appear to be fundamentally different in eukaryotes and prokaryotes (Struhl (1999) Cell, 98, 1-4). To investigate this diversity further, we have analysed the distribution of all known transcription-associated proteins (TAPs), as reflected by sequence database annotations. Our results for the primary phylogenetic domains (Archaea, Bacteria and Eukaryota) show that TAP families are mostly taxon-specific and very few transcriptional regulators are common across these domains.


Subject(s)
Computational Biology , Proteins/genetics , Databases, Factual , Phylogeny , Proteins/classification , Transcription Factors/classification , Transcription Factors/genetics , Transcription, Genetic
7.
RNA ; 7(12): 1693-701, 2001 Dec.
Article in English | MEDLINE | ID: mdl-11780626

ABSTRACT

Domains rich in alternating arginine and serine residues (RS domains) are frequently found in metazoan proteins involved in pre-mRNA splicing. The RS domains of splicing factors associate with each other and are important for the formation of protein-protein interactions required for both constitutive and regulated splicing. The prevalence of the RS domain in splicing factors suggests that it might serve as a useful signature for the identification of new proteins that function in pre-mRNA processing, although it remains to be determined whether RS domains also participate in other cellular functions. Using database search and sequence clustering methods, we have identified and categorized RS domain proteins encoded within the entire genomes of Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae. This genome-wide survey revealed a surprising complexity of RS domain proteins in metazoans with functions associated with chromatin structure, transcription by RNA polymerase II, cell cycle, and cell structure, as well as pre-mRNA processing. Also identified were RS domain proteins in S. cerevisiae with functions associated with cell structure, osmotic regulation, and cell cycle progression. The results thus demonstrate an effective strategy for the genomic mining of RS domain proteins. The identification of many new proteins using this strategy has provided a database of factors that are candidates for forming RS domain-mediated interactions associated with different steps in pre-mRNA processing, in addition to other cellular functions.


Subject(s)
Amino Acid Motifs/genetics , Computational Biology/methods , Molecular Biology/methods , Protein Structure, Tertiary/genetics , Animals , Arginine/genetics , Caenorhabditis elegans/genetics , Cell Cycle , Chromatin/metabolism , Drosophila melanogaster/genetics , Evolution, Molecular , Genome , Humans , Phosphoprotein Phosphatases , Protein Kinases , RNA Polymerase II/metabolism , RNA Processing, Post-Transcriptional , Research Design , Saccharomyces cerevisiae/genetics , Serine/genetics , Transcription, Genetic
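A first-pass scan for candidate RS domains can be sketched as a search for runs of consecutive RS/SR dipeptides. The abstract does not state the authors' search criteria, so the minimum run length of five dipeptides is a hypothetical threshold; real RS domains are often interrupted by other residues, which this strict-run version would miss.

```python
import re

# One or more consecutive RS or SR dipeptides.
_RS_RUN = re.compile(r"(?:RS|SR)+")


def find_rs_stretches(seq, min_dipeptides=5):
    """Return (start, end) spans (0-based, end exclusive) of RS/SR dipeptide runs."""
    spans = []
    for m in _RS_RUN.finditer(seq):
        if (m.end() - m.start()) // 2 >= min_dipeptides:
            spans.append((m.start(), m.end()))
    return spans


print(find_rs_stretches("MRSRSRSRSRSK"))  # hypothetical protein fragment
```

A more tolerant variant would score RS/SR dipeptide density in a sliding window instead of requiring an unbroken run.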
8.
Genome Biol ; 2(9): RESEARCH0034, 2001.
Article in English | MEDLINE | ID: mdl-11820254

ABSTRACT

BACKGROUND: It has recently been shown that the detection of gene fusion events across genomes can be used for predicting functional associations of proteins, including physical interaction or complex formation. To obtain such predictions we have made an exhaustive search for gene fusion events within 24 available completely sequenced genomes. RESULTS: Each genome was used as a query against the remaining 23 complete genomes to detect gene fusion events. Using an improved, fully automatic protocol, a total of 7,224 single-domain proteins that are components of gene fusions in other genomes were detected, many of which were identified for the first time. The total number of predicted pairwise functional associations is 39,730 for all genomes. Component pairs were identified by virtue of their similarity to 2,365 multidomain composite proteins. We also show for the first time that gene fusion is a complex evolutionary process with a number of contributory factors, including paralogy, genome size and phylogenetic distance. On average, 9% of genes in a given genome appear to code for single-domain, component proteins predicted to be functionally associated. These proteins are detected by an additional 4% of genes that code for fused, composite proteins. CONCLUSIONS: These results provide an exhaustive set of functionally associated genes and also delineate the power of fusion analysis for the prediction of protein interactions.


Subject(s)
Artificial Gene Fusion , Evolution, Molecular , Genome , Proteins/genetics , Proteins/metabolism , Recombination, Genetic/genetics , Algorithms , Animals , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Computational Biology/methods , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Fungal Proteins/genetics , Fungal Proteins/metabolism , Gene Expression Profiling , Multigene Family/genetics , Phylogeny , Protein Binding , Recombinant Fusion Proteins/genetics , Recombinant Proteins/genetics , Reproducibility of Results , Two-Hybrid System Techniques
9.
Bioinformatics ; 16(10): 915-22, 2000 Oct.
Article in English | MEDLINE | ID: mdl-11120681

ABSTRACT

MOTIVATION: Sensitive detection and masking of low-complexity regions in protein sequences. Filtered sequences can be used in sequence comparison without the risk of matching compositionally biased regions. The main advantage of the method over similar approaches is the selective masking of single residue types without affecting other, possibly important, regions. RESULTS: A novel algorithm for low-complexity region detection and selective masking. The algorithm is based on multiple-pass Smith-Waterman comparison of the query sequence against twenty homopolymers with infinite gap penalties. The output of the algorithm is both the masked query sequence for further analysis (e.g. database searches) and the regions of low complexity. The detection of low-complexity regions is highly specific for single residue types. It is shown that this approach is sufficient for masking database query sequences without generating false positives. The algorithm is benchmarked against widely available algorithms using the 210 genes of Plasmodium falciparum chromosome 2, a dataset known to contain a large number of low-complexity regions. AVAILABILITY: CAST (version 1.0) executable binaries are available to academic users free of charge under license. Web site entry point, server and additional material: http://www.ebi.ac.uk/research/cgg/services/cast/


Subject(s)
Algorithms , DNA, Protozoan/chemistry , Plasmodium falciparum/genetics , Sequence Analysis, DNA/methods , Animals , DNA, Protozoan/genetics , Databases, Factual , Genes, Protozoan , Open Reading Frames
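With infinite gap penalties, a Smith-Waterman comparison against a homopolymer reduces to finding the maximum-scoring ungapped segment, which Kadane's algorithm computes in linear time. The sketch below applies that idea in repeated passes for each of the 20 residue types and masks only residues of the current type. The +1/-1 scores and the threshold of 5 are placeholders (a real implementation would score with a substitution matrix), and `mask_low_complexity` is a hypothetical name, not CAST's interface.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def max_scoring_segment(scores):
    """Kadane's algorithm: best ungapped local score and its inclusive span."""
    best, best_start, best_end = 0, 0, -1
    cur, cur_start = 0, 0
    for i, s in enumerate(scores):
        if cur <= 0:
            cur, cur_start = s, i
        else:
            cur += s
        if cur > best:
            best, best_start, best_end = cur, cur_start, i
    return best, best_start, best_end


def mask_low_complexity(seq, threshold=5, match=1, mismatch=-1, mask_char="X"):
    """Mask single-residue-type low-complexity regions; returns (masked, regions)."""
    residues = list(seq)
    regions = []
    for aa in AMINO_ACIDS:
        while True:  # multiple passes per homopolymer
            scores = [match if r == aa else mismatch for r in residues]
            best, start, end = max_scoring_segment(scores)
            if best <= 0 or best < threshold:
                break
            for i in range(start, end + 1):
                if residues[i] == aa:  # selective: only this residue type is masked
                    residues[i] = mask_char
            regions.append((aa, start, end, best))
    return "".join(residues), regions


print(mask_low_complexity("AQQQQGQQQQA"))
print(mask_low_complexity("MKTA" + "Q" * 9 + "LLV"))
```

Because masked positions no longer match the homopolymer, each extra pass can only find a weaker segment, so the loop terminates.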
10.
Bioinformatics ; 16(5): 451-7, 2000 May.
Article in English | MEDLINE | ID: mdl-10871267

ABSTRACT

MOTIVATION: Efficient, accurate and automatic clustering of large protein sequence datasets, such as complete proteomes, into families, according to sequence similarity. Detection and correction of false positive and negative relationships with subsequent detection and resolution of multi-domain proteins. RESULTS: A new algorithm for the automatic clustering of protein sequence datasets has been developed. This algorithm represents all similarity relationships within the dataset in a binary matrix. Removal of false positives is achieved through subsequent symmetrification of the matrix using a Smith-Waterman dynamic programming alignment algorithm. Detection of multi-domain protein families and further false positive relationships within the symmetrical matrix is achieved through iterative processing of matrix elements with successive rounds of Smith-Waterman dynamic programming alignments. Recursive single-linkage clustering of the corrected matrix allows efficient and accurate family representation for each protein in the dataset. Initial clusters containing multi-domain families are split into their constituent clusters using the information obtained by the multi-domain detection step. This algorithm can hence quickly and accurately cluster large protein datasets into families. Problems due to the presence of multi-domain proteins are minimized, allowing more precise clustering information to be obtained automatically. AVAILABILITY: GeneRAGE (version 1.0) executable binaries for most platforms may be obtained from the authors on request. The system is available to academic users free of charge under license.


Subject(s)
Algorithms , Proteins/chemistry , Proteins/genetics , Sequence Alignment/methods , Amino Acid Sequence , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Cluster Analysis , Databases, Factual , Fungal Proteins/chemistry , Fungal Proteins/genetics , Genome, Bacterial , Genome, Fungal , Protein Structure, Tertiary , Sequence Alignment/statistics & numerical data
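Two of the steps described above, symmetrifying the binary similarity relation and single-linkage clustering, can be sketched as follows. The `verify` callback is a hypothetical stand-in for the Smith-Waterman re-check of one-directional hits, and the multi-domain detection and cluster-splitting steps are not shown; single-linkage here is computed as connected components with union-find.

```python
def symmetrify(pairs, verify):
    """Keep (i, j) if found in both directions or if `verify(i, j)` confirms it."""
    kept = set()
    for i, j in pairs:
        if i == j:
            continue
        if (j, i) in pairs or verify(i, j):
            kept.add((min(i, j), max(i, j)))
    return kept


def single_linkage(n, edges):
    """Connected components of an undirected graph via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Hypothetical directed hits: (1, 2) is one-way but confirmed, (3, 4) is rejected.
pairs = {(0, 1), (1, 0), (1, 2), (3, 4)}
edges = symmetrify(pairs, verify=lambda i, j: (i, j) == (1, 2))
print(single_linkage(5, edges))
```

Single-linkage is fast but chains families together through multi-domain proteins, which is why the abstract describes a separate step for detecting and splitting such clusters.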
11.
Yeast ; 17(1): 22-36, 2000 Apr.
Article in English | MEDLINE | ID: mdl-10797599

ABSTRACT

BACKGROUND: Knowledge of the amount of gene order and synteny conservation between two species gives insights into the extent and mechanisms of divergence. The vertebrate Fugu rubripes (pufferfish) has a small genome with little repetitive sequence, which makes it attractive as a model genome. Genome compaction and synteny conservation between human and Fugu were studied using data from public databases. METHODS: Intron length and map positions of human and Fugu orthologues were compared to analyse relative genome compaction and synteny conservation, respectively. The divergence of these two genomes by genome rearrangement was simulated and the results were compared to the real data. RESULTS: Analysis of 199 introns in 22 orthologous genes showed an eight-fold average size reduction in Fugu, consistent with the ratio of total genome sizes. There was no consistent pattern relating the size reduction in individual introns or genes to gene base composition in either species. For genes that are neighbours in Fugu (genes from the same cosmid or GenBank entry), 40-50% have conserved synteny with a human chromosome. This figure may be underestimated by as much as two-fold, due to problems caused by incomplete human genome sequence data and the existence of dispersed gene families. Some genes that are neighbours in Fugu have human orthologues that are several megabases and tens of genes apart. This is probably caused by small inversions or other intrachromosomal rearrangements. CONCLUSIONS: Comparison of observed data to computer simulations suggests that 4,000-16,000 chromosomal rearrangements have occurred since Fugu and human shared a common ancestor, implying a faster rate of rearrangement than seen in human/mouse comparisons.


Subject(s)
Fishes/genetics , Genome, Human , Genome , Animals , Base Sequence , Computer Simulation , Conserved Sequence , Gene Library , Humans , Introns , Models, Genetic , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid
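The kind of rearrangement simulation described above can be sketched as applying random inversions to a gene order and measuring how many originally adjacent gene pairs remain adjacent. The abstract does not give the authors' model; this sketch assumes a single linear chromosome and inversions only, so it illustrates the idea rather than reproducing the published estimate.

```python
import random


def random_inversions(genome, k, rng):
    """Apply k random inversions (segment reversals) to a copy of a gene order."""
    g = list(genome)
    for _ in range(k):
        i, j = sorted(rng.sample(range(len(g) + 1), 2))
        g[i:j] = reversed(g[i:j])
    return g


def conserved_adjacency_fraction(original, rearranged):
    """Fraction of originally adjacent gene pairs that are still adjacent."""
    pos = {gene: idx for idx, gene in enumerate(rearranged)}
    pairs = list(zip(original, original[1:]))
    kept = sum(1 for a, b in pairs if abs(pos[a] - pos[b]) == 1)
    return kept / len(pairs)


rng = random.Random(1)
genome = list(range(100))  # hypothetical chromosome of 100 genes
for k in (0, 1, 50, 500):
    shuffled = random_inversions(genome, k, rng)
    print(k, round(conserved_adjacency_fraction(genome, shuffled), 3))
```

Each inversion breaks at most two adjacencies (at its endpoints), so comparing simulated curves like this against observed neighbour conservation gives a rough handle on how many events have occurred.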
12.
Nature ; 402(6757): 86-90, 1999 Nov 04.
Article in English | MEDLINE | ID: mdl-10573422

ABSTRACT

A large-scale effort to measure, detect and analyse protein-protein interactions using experimental methods is under way. These include biochemical methods such as co-immunoprecipitation or crosslinking, molecular biology methods such as the two-hybrid system or phage display, and genetic methods such as unlinked noncomplementing mutant detection. Using the two-hybrid system, an international effort to analyse the complete yeast genome is in progress. Evidently, all these approaches are tedious, labour-intensive and inaccurate. From a computational perspective, the question is how we can predict that two proteins interact from structure or sequence alone. Here we present a method that identifies gene-fusion events in complete genomes, solely based on sequence comparison. Because there must be selective pressure for certain genes to be fused over the course of evolution, we are able to predict functional associations of proteins. We show that 215 genes or proteins in the complete genomes of Escherichia coli, Haemophilus influenzae and Methanococcus jannaschii are involved in 64 unique fusion events. The approach is general, and can be applied even to genes of unknown function.


Subject(s)
Artificial Gene Fusion , Bacterial Proteins/genetics , Genome, Bacterial , Bacterial Proteins/metabolism , Bacterial Proteins/physiology , Escherichia coli/genetics , Escherichia coli/physiology , Haemophilus influenzae/genetics , Haemophilus influenzae/physiology , Methanococcus/genetics , Methanococcus/physiology , Protein Binding , Two-Hybrid System Techniques
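The core inference in the gene-fusion approach (entries 8 and 12) is that two separate proteins in one genome, each similar to a different, non-overlapping part of a single composite protein in another genome, are predicted to be functionally associated. The sketch below assumes alignment hits are already available as (query, subject, start, end) records from some sequence comparison tool; the `Hit` record, the `similar` filter for excluding mutually similar queries, and all protein names are hypothetical, and this is not the authors' protocol.

```python
from collections import namedtuple

# A similarity hit of a query protein onto a region of a subject protein (inclusive coordinates).
Hit = namedtuple("Hit", "query subject start end")


def fusion_candidates(hits, similar, min_gap=0):
    """(query_a, query_b, composite) triples where two dissimilar queries hit disjoint regions."""
    by_subject = {}
    for h in hits:
        by_subject.setdefault(h.subject, []).append(h)
    found = set()
    for subject, group in by_subject.items():
        for a in group:
            for b in group:
                if a.query >= b.query:  # skip self-pairs and mirrored duplicates
                    continue
                disjoint = a.end + min_gap < b.start or b.end + min_gap < a.start
                if disjoint and not similar(a.query, b.query):
                    found.add((a.query, b.query, subject))
    return sorted(found)


# Hypothetical hits of genome-A proteins onto a composite protein C1 from genome B.
hits = [
    Hit("A1", "C1", 1, 250),
    Hit("A2", "C1", 260, 600),
    Hit("A3", "C1", 100, 300),
    Hit("A4", "C2", 1, 200),
]
print(fusion_candidates(hits, similar=lambda p, q: False))
```

Requiring that the two components are not similar to each other guards against calling a fusion when the composite protein merely contains two copies of the same domain.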