ABSTRACT
Next-generation sequencing has allowed identification of millions of somatic mutations in human cancer cells. A key challenge in interpreting cancer genomes is to distinguish the drivers of cancer development among the available genetic mutations. To address this issue, we present the first web-based application, consensus cancer driver gene caller (C3), to identify consensus driver genes using six complementary strategies: frequency-based, machine learning-based, functional bias-based, clustering-based, statistical model-based, and network-based. The application allows users to specify customized operations when calling driver genes, and provides solid statistical evaluations and interpretable visualizations of the integration results. C3 is implemented in Python and is freely available for public use at http://drivergene.rwebox.com/c3.
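The idea of a consensus call across the six strategy families can be illustrated with a simple vote count. This is only a minimal sketch and not C3's actual integration procedure, which the abstract does not describe: the function name `consensus_drivers`, the `min_support` threshold, and the example gene lists are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical per-strategy calls; these gene sets are placeholders,
# not results produced by C3.
calls = {
    "frequency": {"TP53", "KRAS", "PIK3CA"},
    "machine_learning": {"TP53", "KRAS", "BRAF"},
    "functional_bias": {"TP53", "PIK3CA"},
    "clustering": {"KRAS", "BRAF", "TP53"},
    "statistical_model": {"TP53", "KRAS"},
    "network": {"TP53", "EGFR"},
}

def consensus_drivers(calls, min_support=3):
    """Return genes called by at least `min_support` strategies.

    Genes are ordered by decreasing number of supporting strategies,
    with ties broken alphabetically.
    """
    votes = Counter(gene for genes in calls.values() for gene in genes)
    kept = [gene for gene, n in votes.items() if n >= min_support]
    return sorted(kept, key=lambda gene: (-votes[gene], gene))

print(consensus_drivers(calls))
```

Raising or lowering `min_support` trades specificity against sensitivity; a real integration would likely weight strategies and attach a statistical evaluation rather than count votes.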
Subject(s)
Algorithms , Neoplasms/genetics , Cluster Analysis , Humans , Internet , Machine Learning
ABSTRACT
Paper text documents are easy to tamper with, and their integrity is difficult to authenticate. To address this problem, this paper proposes a robust content authentication method for printed documents based on a text watermarking scheme that resists print-and-scan attacks. First, an authentication watermark signal sequence tied to the content of the text document is generated using the Logistic chaotic map model. Next, the authentication watermark sequence is embedded into the printed paper document using a robust text watermarking scheme. Finally, the watermark information is extracted from a scanned image of the paper document and compared with the authentication watermark computed in real time from the document text recovered by OCR, thereby authenticating the content integrity of the paper document. Experimental results show that the method achieves robust content integrity authentication of paper text documents and can accurately locate the tampered position. In addition, the watermarked document retains good visual quality, and the text watermarking scheme has a large information capacity.
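The generation and comparison steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's scheme: seeding the logistic map x_{n+1} = mu * x_n * (1 - x_n) from a SHA-256 digest of the text, the value mu = 3.99, the 0.5 threshold, and per-line localization are all choices made here for the example. Embedding into and extracting from the printed page are out of scope.

```python
import hashlib

def logistic_bits(text, n_bits=64, mu=3.99, burn_in=100):
    """Derive a content-dependent bit sequence from the logistic map.

    Assumption: the initial value is taken from a SHA-256 digest of the
    text; the abstract does not say how content seeds the map.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map the first 4 digest bytes into the open interval (0, 1).
    x = (int.from_bytes(digest[:4], "big") + 1) / (2**32 + 2)
    bits = []
    for step in range(burn_in + n_bits):
        x = mu * x * (1.0 - x)  # logistic map iteration
        if step >= burn_in:     # discard the transient
            bits.append(1 if x >= 0.5 else 0)
    return bits

def tampered_lines(reference_marks, scanned_lines):
    """Return indices of lines whose recomputed mark differs from the
    mark extracted from the scan (hypothetical per-line localization)."""
    return [
        i
        for i, (mark, line) in enumerate(zip(reference_marks, scanned_lines))
        if mark != logistic_bits(line)
    ]

original = ["Pay 100 to Alice", "Signed: Bob"]
marks = [logistic_bits(line) for line in original]  # stands in for extracted watermarks
print(tampered_lines(marks, ["Pay 900 to Alice", "Signed: Bob"]))
```

Because the map is sensitive to its initial value, a small change in the text yields an unrelated bit sequence, which is what makes the comparison useful for detecting edits.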
Subject(s)
Computer Security , Medical Informatics/instrumentation , Algorithms , Computer Graphics/standards , Data Compression/methods , Language , Medical Informatics/methods , Nonlinear Dynamics , Pattern Recognition, Automated/methods , Software
ABSTRACT
OBJECTIVE: To explore the expression of CD66c (CEACAM6) in adult acute leukemia and its significance. METHODS: Acute leukemia cell lines HL-60, K562, LCL721.221 and Jurkat were cultured in vitro. RT-PCR and multi-parameter flow cytometry were used to analyze CD66c mRNA and protein expression, respectively, in the cell lines and in patients' bone marrow leukemic cells. Cytogenetic analysis of 199 bone marrow samples from leukemia patients and minimal residual disease (MRD) detection for 25 CD66c-positive B-lineage ALL cases were performed. RESULTS: (1) CD66c expression, both on the cell surface and in plasma, was negative in all the cell lines. (2) Four of 127 AML cases (3.15%), mainly M2 and M4, and 28 of 79 ALL cases (35.44%), all B-lineage ALL, were CD66c positive; the ALL subtypes were common B-ALL (20/54) and pre-B-ALL (8/11), including 8 Ph+ B-lineage ALL cases. (3) The six-month relapse rate differed significantly between MRD-positive and MRD-negative patients. (4) CD66c mRNA was strongly expressed in B-lineage ALL. Among the cell lines, only HL-60 cells weakly expressed CD66c mRNA. CONCLUSION: CD66c expression could be a useful biomarker for MRD analysis in ALL, and is closely associated with its transcription level.
Subject(s)
Antigens, CD/biosynthesis , Carcinoembryonic Antigen/biosynthesis , Cell Adhesion Molecules/biosynthesis , Leukemia, Myeloid, Acute/metabolism , Precursor Cell Lymphoblastic Leukemia-Lymphoma/metabolism , Adolescent , Adult , Aged , Carcinoembryonic Antigen/genetics , GPI-Linked Proteins , HL-60 Cells , Humans , K562 Cells , Male , Middle Aged , Neoplasm, Residual/metabolism , RNA, Messenger/biosynthesis
ABSTRACT
Several "head-to-head" (or "bidirectional") gene pairs have been studied in individual experiments, but genome-wide analysis of this gene organization, especially in terms of transcriptional correlation and functional association, is still insufficient. We conducted a systematic investigation of head-to-head gene organization focusing on structural features, evolutionary conservation, expression correlation and functional association. Of the 1,262, 1,071, and 491 head-to-head pairs identified in the human, mouse, and rat genomes, respectively, pairs with a distance of 1 to 400 base pairs between transcription start sites form the majority of each dataset (62.36%, 64.15%, and 55.19% for human, mouse, and rat, respectively), and the largest group is always the one with a transcription start site distance of 101 to 200 base pairs. The phylogenetic analysis among Fugu, chicken, and human indicates negative selection on the separation of head-to-head genes across vertebrate evolution, and thus the ancestral existence of this gene organization. The expression analysis shows that most of the human head-to-head genes are significantly correlated, and the correlation could be positive, negative, or alternative depending on the experimental conditions. Finally, head-to-head genes statistically tend to perform similar functions, and gene pairs associated with significant cofunctions seem to have stronger expression correlations. The findings indicate that the head-to-head gene organization is ancient and conserved, which subjects functionally related genes to correlated transcriptional regulation and thus provides an exquisite mechanism of transcriptional regulation based on gene organization. These results have significantly expanded the knowledge about head-to-head gene organization. Supplementary materials for this study are available at http://www.scbit.org/h2h.
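The structural criterion can be made concrete with a small sketch that finds divergently transcribed neighbors and bins them by transcription start site (TSS) distance. It rests on assumptions not stated in the abstract: the `max_dist` cutoff of 1,000 bp, the restriction to adjacent TSSs, the `Gene` record, and the toy coordinates are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate

def head_to_head_pairs(genes, max_dist=1000):
    """Find adjacent genes transcribed away from each other.

    A pair qualifies when the minus-strand gene's TSS lies upstream of
    the plus-strand gene's TSS by 1..max_dist bp (cutoff is an assumption).
    """
    ordered = sorted(genes, key=lambda g: (g.chrom, g.tss))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom:
            continue
        dist = right.tss - left.tss
        if left.strand == "-" and right.strand == "+" and 1 <= dist <= max_dist:
            pairs.append((left.name, right.name, dist))
    return pairs

def distance_bin(dist, width=100):
    """Return the inclusive bin (e.g. 101..200) that holds a TSS distance."""
    low = ((dist - 1) // width) * width + 1
    return (low, low + width - 1)

genes = [  # toy coordinates, not real annotations
    Gene("A", "chr1", "-", 1000), Gene("B", "chr1", "+", 1150),
    Gene("C", "chr1", "+", 5000), Gene("D", "chr1", "-", 5200),
    Gene("E", "chr2", "-", 300),  Gene("F", "chr2", "+", 2000),
]
for first, second, dist in head_to_head_pairs(genes):
    print(first, second, dist, distance_bin(dist))
```

Counting pairs per bin over a full annotation would give the kind of distance distribution summarized above, in which the 101 to 200 bp bin is the largest.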
Subject(s)
Computational Biology/methods , Evolution, Molecular , Animals , Chickens , Chromosome Mapping , Databases, Genetic , Genetic Linkage , Genome , Humans , Mice , Models, Biological , Open Reading Frames , Phylogeny , Rats , Species Specificity , Systems Biology , Transcription, Genetic
ABSTRACT
BACKGROUND: Several high-throughput searches for potential natural antisense transcripts (NATs) have been performed recently, but most of the reports focused on the cis type. A thorough in silico analysis of human transcripts will help expand our knowledge of NATs. RESULTS: We have identified 568 NATs from human RefSeq RNA sequences. Among them, 403 NATs are reported for the first time, and at least 157 novel NATs are of the trans type. According to the pairing region of a sense and antisense RNA pair, hNATs are divided into 6 classes, of which about 87% involve 5' or 3' UTR sequences, supporting the regulatory role of UTRs. Among a total of 535 NAT pairs related to splice variants, 77.4% (414/535) have their pairing regions affected or completely eliminated by alternative splicing, suggesting a significant relationship between alternative splicing and antisense-directed regulation. The extensive occurrence of splice variants in hNATs and other multiple pairing patterns result in a one-to-many relationship, allowing the formation of complex regulation networks. Based on microarray data from the Stanford Microarray Database, two hNAT pairs were found to display significant inverse expression patterns before and after insulin injection. CONCLUSION: NATs might carry out more extensive and complex functions than previously thought. Combined with endogenous microRNAs, hNATs could be regarded as a special group of transcripts contributing to the complex regulation networks.
Subject(s)
Algorithms , Chromosome Mapping/methods , Proteome/genetics , RNA, Antisense/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Transcription Factors/genetics , Base Sequence , Databases, Protein , Humans , Molecular Sequence Data
ABSTRACT
The genomic sequences of severe acute respiratory syndrome coronaviruses from human and palm civet in the 2003/2004 outbreak in the city of Guangzhou, China, were nearly identical. Phylogenetic analysis suggested an independent viral invasion from animal to human in this new episode. Combining all existing data but excluding singletons, we identified 202 single-nucleotide variations. Among them, 17 are polymorphic in palm civets only. The ratio of nonsynonymous to synonymous nucleotide substitutions in palm civets collected 1 yr apart from different geographic locations is very high, suggesting rapid evolution of viral proteins in the civet as well, much like their adaptation in the human host in the early 2002-2003 epidemic. Major genetic variations in some critical genes, particularly the Spike gene, seemed essential for the transition from animal-to-human transmission to human-to-human transmission, which eventually caused the first severe acute respiratory syndrome outbreak of 2002/2003.
Subject(s)
Evolution, Molecular , Severe Acute Respiratory Syndrome/virology , Severe acute respiratory syndrome-related coronavirus/genetics , Viverridae/virology , Amino Acid Substitution , Animals , China/epidemiology , Disease Outbreaks , Genes, Viral , Humans , Membrane Glycoproteins/genetics , Phylogeny , Polymorphism, Single Nucleotide , Severe acute respiratory syndrome-related coronavirus/isolation & purification , Severe acute respiratory syndrome-related coronavirus/pathogenicity , Severe acute respiratory syndrome-related coronavirus/physiology , Severe Acute Respiratory Syndrome/epidemiology , Severe Acute Respiratory Syndrome/transmission , Species Specificity , Spike Glycoprotein, Coronavirus , Viral Envelope Proteins/genetics , Zoonoses/epidemiology , Zoonoses/transmission , Zoonoses/virology
ABSTRACT
The function of a protein is closely correlated with its subcellular location. With the success of the Human Genome Project and the rapid increase in the number of newly found protein sequences entering data banks, it is highly desirable to develop an automated method for predicting the subcellular location of proteins. Such a predictor would expedite the functional characterization of newly found proteins and the prioritization of genes and proteins identified by genomics efforts as potential molecular targets for drug design. Based on the concept of pseudo amino acid composition originally proposed by K. C. Chou (Proteins: Struct. Funct. Genet. 43: 246-255, 2001), a digital signal processing approach has been introduced to partially incorporate the sequence order effect. One notable merit of doing so is that many existing tools in mathematics and engineering can be applied directly to predicting protein subcellular location. The results thus obtained are quite encouraging. It is anticipated that digital signal processing may serve as a useful vehicle for many other areas of protein science as well.
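One way to picture how a signal processing step can add sequence order information to plain amino acid composition is sketched here. It is an assumption-laden illustration, not the published method: the abstract does not say which residue property or transform was used, so the Kyte-Doolittle hydropathy scale, the discrete Fourier transform, the `n_freq` parameter, and the function name `features` are all choices made for this example.

```python
import cmath

# Kyte-Doolittle hydropathy values (the choice of scale is an assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)

def features(seq, n_freq=3):
    """Return 20 composition fractions plus n_freq low-frequency DFT
    magnitudes of the mean-centered hydropathy signal.

    Only the 20 standard residues are supported; others raise KeyError.
    """
    seq = seq.upper()
    n = len(seq)
    composition = [seq.count(a) / n for a in AMINO_ACIDS]
    signal = [KD[a] for a in seq]
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    spectrum = []
    for k in range(1, n_freq + 1):
        coeff = sum(
            c * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        spectrum.append(abs(coeff) / n)
    return composition + spectrum

blocky = features("AAAAKKKK")
alternating = features("AKAKAKAK")
print(blocky[:20] == alternating[:20], blocky[20:], alternating[20:])
```

The two toy sequences have identical composition, so the first 20 features match, while the spectral features differ because the residues are ordered differently. That difference is the sequence order effect that composition alone cannot capture.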
Subject(s)
Amino Acids/analysis , Cells/metabolism , Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Algorithms , Cells/cytology , Humans , Protein Transport , Stochastic Processes , Subcellular Fractions/chemistry , Subcellular Fractions/metabolism
ABSTRACT
AIM: To obtain information on ligand-receptor binding between the S protein of SARS-CoV and CD13, identify the possible interacting domains or motifs related to binding sites, and provide clues for studying the functions of SARS proteins and designing anti-SARS drugs and vaccines. METHODS: On the basis of comparative genomics, homology search, phylogenetic analyses, and multi-sequence alignment were used to predict CD13-related interacting domains and binding sites in the S protein of SARS-CoV. Molecular modeling and docking simulation methods were employed to examine the interaction features between CD13 and the S protein of SARS-CoV in order to validate the bioinformatics predictions. RESULTS: Possible binding sites for CD13 in the SARS-CoV S protein have been mapped using bioinformatics analysis tools. The binding for one protein-protein interaction pair (D757-R761 motif of the SARS-CoV S protein to P585-A653 domain of CD13) has been simulated by molecular modeling and docking simulation methods. CONCLUSION: CD13 may be a possible receptor of the SARS-CoV S protein, which may be associated with SARS infection. This study also provides a possible strategy for mapping the possible binding receptors of the proteins in a genome.
Subject(s)
CD13 Antigens/metabolism , Membrane Glycoproteins/metabolism , Severe Acute Respiratory Syndrome/virology , Severe acute respiratory syndrome-related coronavirus/chemistry , Viral Envelope Proteins/metabolism , Amino Acid Sequence , Binding Sites , CD13 Antigens/chemistry , CD13 Antigens/genetics , Catalytic Domain , Computational Biology , Humans , Membrane Glycoproteins/chemistry , Membrane Glycoproteins/genetics , Molecular Sequence Data , Protein Binding , Protein Interaction Mapping , Protein Structure, Tertiary , Severe acute respiratory syndrome-related coronavirus/genetics , Sequence Alignment , Spike Glycoprotein, Coronavirus , Viral Envelope Proteins/chemistry , Viral Envelope Proteins/genetics
ABSTRACT
AIM: To predict the probable genomic packaging signal of SARS-CoV by bioinformatics analysis. The derived packaging signal may be used to design antisense RNA and RNA interference (RNAi) drugs for treating SARS. METHODS: Based on studies of the genomic packaging signals of MHV and BCoV, especially the information about primary and secondary structures, the putative genomic packaging signal of SARS-CoV was analyzed using bioinformatic tools. Multiple alignment of the genomic sequences was performed among SARS-CoV, MHV, BCoV, PEDV and HCoV 229E. Secondary structures of RNA sequences were also predicted to identify the possible genomic packaging signals. In addition, the N and M proteins of all five viruses were analyzed to study their evolutionary relationship with the genomic packaging signals. RESULTS: The putative genomic packaging signal of SARS-CoV is located at the 3' end of ORF1b, near that of MHV and BCoV, which is the most variable region of this gene. The RNA secondary structure of the SARS-CoV genomic packaging signal is very similar to that of MHV and BCoV. The same result was also obtained for the genomic packaging signals of PEDV and HCoV 229E. Furthermore, the genomic sequence multiple alignment indicated that the locations of the packaging signals of SARS-CoV, PEDV, and HCoV overlap one another. The mutation rate of the packaging signal sequences appears to be much higher than that of the N protein, while the M protein shows only subtle variations. CONCLUSIONS: The probable genomic packaging signal of SARS-CoV is analogous to that of MHV and BCoV, with the corresponding RNA secondary structure located in a similar region of ORF1b. The positions where genomic packaging signals exist have undergone rounds of mutation, which may consequently influence the primary structures of the N and M proteins.