Results 1 - 16 of 16
1.
BMC Bioinformatics ; 16: 395, 2015 Nov 25.
Article in English | MEDLINE | ID: mdl-26608050

ABSTRACT

BACKGROUND: Inferring gene regulatory networks (GRNs) has been an important topic in bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Due to the presence of time delays in the regulatory relationships, the High-Order Dynamic Bayesian Network (HO-DBN) is a good model of a GRN. However, previous GRN inference methods assume causal sufficiency, i.e. no unobserved common cause. This assumption is convenient but unrealistic, because it is possible that relevant factors have not even been conceived of and are therefore unmeasured. An inference method that also handles hidden common cause(s) is therefore highly desirable. Also, previous methods for discovering hidden common causes either do not handle multi-step time delays or require that the parents of hidden common causes not be observed genes. RESULTS: We have developed a discrete HO-DBN learning algorithm that can also infer hidden common cause(s) from discrete time series expression data. It makes some assumptions on the conditional distribution but is less restrictive than previous methods. We assume that each hidden variable has only observed variables as children and parents, with at least two children and possibly no parents. We also make the simplifying assumption that children of hidden variable(s) are not linked to each other. Moreover, our proposed algorithm can also utilize multiple short time series (not necessarily of the same length), as long time series are difficult to obtain. CONCLUSIONS: We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. Experimental results show that our proposed algorithm can recover the causal GRNs adequately given the incomplete data. Using the limited real expression data and small subnetworks of the YEASTRACT network, we have also demonstrated the potential of our algorithm on real data, though more time series expression data is needed.


Subject(s)
Algorithms; Bayes Theorem; Computational Biology/methods; Gene Expression Profiling; Gene Regulatory Networks; Gene Expression Regulation; Humans
2.
Nucleic Acids Res ; 40(19): 9392-403, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22904079

ABSTRACT

In protein-DNA interactions, particularly transcription factor (TF) and transcription factor binding site (TFBS) bindings, associated residue variations form patterns denoted as subtypes. Subtypes may lead to changed binding preferences, distinguish conserved from flexible binding residues and reveal novel binding mechanisms. However, subtypes must be studied in the context of core bindings. While solving 3D structures would require huge experimental efforts, recent sequence-based associated TF-TFBS pattern discovery has been shown to be promising, making a large-scale subtype study both possible and desirable. In this article, we investigate residue-varying subtypes based on associated TF-TFBS patterns. By re-categorizing the patterns with respect to varying TF amino acids, statistically significant (P values ≤ 0.005) subtypes leading to varying TFBS patterns are discovered without using TF family or domain annotations. The resultant subtypes have various biological meanings. The subtypes reflect familial and functional properties and exhibit changed binding preferences supported by 3D structures. Conserved residues critical for maintaining TF-TFBS bindings are revealed by analyzing the subtypes. In-depth analysis of the subtype pair PKVVIL-CACGTG versus PKVEIL-CAGCTG shows that the V/E variation is indicative for distinguishing the Myc from the MRF families. Discovered from sequences only, the TF-TFBS subtypes are informative and promising for more biological findings, complementing and extending recent one-sided subtype and familial studies with comprehensive evidence.


Subject(s)
DNA/chemistry; Transcription Factors/chemistry; Transcription Factors/classification; Binding Sites; Chromatin Immunoprecipitation; DNA/metabolism; Databases, Protein; Models, Molecular; Nucleotide Motifs; Position-Specific Scoring Matrices; Protein Binding; Sequence Analysis, DNA; Transcription Factors/metabolism
3.
Bioinformatics ; 27(4): 471-8, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21193520

ABSTRACT

MOTIVATION: The bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) are fundamental protein-DNA interactions in transcriptional regulation. Extensive efforts have been made to better understand the protein-DNA interactions. Recent mining of exact TF-TFBS-associated sequence patterns (rules) has shown great potential and achieved very promising results. However, exact rules cannot handle variations in real data, resulting in a limited number of informative rules. In this article, we generalize the exact rules to approximate ones for both TFs and TFBSs, which are essential for biological variations. RESULTS: A progressive approach is proposed to address the approximation and to alleviate the computational requirements. Firstly, similar TFBSs are grouped from the available TF-TFBS data (TRANSFAC database). Secondly, approximate and highly conserved binding cores are discovered from TF sequences corresponding to each TFBS group. A customized algorithm is developed for the specific objective. We discover the approximate TF-TFBS rules by associating the grouped TFBS consensuses and TF cores. The rules discovered are evaluated by matching (verifying with) the actual protein-DNA binding pairs from Protein Data Bank (PDB) 3D structures. The approximate results exhibit many more verified rules and up to 300% better verification ratios than the exact ones. The customized algorithm achieves over 73% better verification ratios than traditional methods. Approximate rules (64-79%) are shown to be statistically significant. Detailed variation analysis and conservation verification on NCBI records demonstrate that the approximate rules reveal both the flexible and specific protein-DNA interactions accurately. The approximate TF-TFBS rules discovered show great capability for generalization and for exploring more informative binding rules.


Subject(s)
Algorithms; DNA-Binding Proteins/genetics; DNA/genetics; Transcription Factors/genetics; Base Sequence; Binding Sites; Computational Biology/methods; DNA/metabolism; DNA-Binding Proteins/metabolism; Gene Expression Regulation; Protein Binding; Protein Structure, Tertiary; Transcription Factors/metabolism
4.
Nucleic Acids Res ; 38(19): 6324-37, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20529874

ABSTRACT

Protein-DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play an essential role in transcriptional regulation. Over the past decades, significant efforts have been made to study the principles of protein-DNA bindings. However, it is considered that there are no simple one-to-one rules between amino acids and nucleotides. Many methods impose complicated features beyond sequence patterns. Protein-DNA bindings are formed from associated amino acid and nucleotide sequence pairs, which determine many functional characteristics. Therefore, it is desirable to investigate associated sequence patterns between TFs and TFBSs. With increasing computational power, the availability of massive experimental databases on DNA and proteins, and mature data mining techniques, we propose a framework to discover associated TF-TFBS binding sequence patterns in the most explicit and interpretable form from TRANSFAC. The framework is based on association rule mining with the Apriori algorithm. The patterns found are evaluated by quantitative measurements at several levels on TRANSFAC. With further independent verification from the literature, the Protein Data Bank and homology modeling, there is strong evidence that the patterns discovered reveal real TF-TFBS bindings across different TFs and TFBSs, which can drive further work toward a better understanding of TF-TFBS bindings.
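
As a rough illustration of the association-rule idea described in this abstract (not the authors' implementation), the sketch below counts how often a TF k-mer and a TFBS k-mer occur together across binding records and reports the support and confidence of each pair. The toy sequences, the k-mer lengths, the thresholds, and the function names are all assumptions made for illustration, and only the pair-counting step of an Apriori-style miner is shown.

```python
from collections import Counter
from itertools import product


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mine_pair_rules(records, k_tf=3, k_dna=4, min_support=2, min_conf=0.8):
    """Mine TF k-mer -> TFBS k-mer rules from (tf_seq, tfbs_seq) records.

    Support is the number of records containing both k-mers; confidence is
    that support divided by the number of records containing the TF k-mer.
    """
    tf_count = Counter()
    pair_count = Counter()
    for tf_seq, tfbs_seq in records:
        tf_set = kmers(tf_seq, k_tf)
        dna_set = kmers(tfbs_seq, k_dna)
        tf_count.update(tf_set)
        pair_count.update(product(tf_set, dna_set))

    rules = []
    for (tf_k, dna_k), support in pair_count.items():
        if support < min_support:
            continue  # prune infrequent pairs, as Apriori does
        confidence = support / tf_count[tf_k]
        if confidence >= min_conf:
            rules.append((tf_k, dna_k, support, confidence))
    return sorted(rules, key=lambda r: (-r[2], -r[3], r[0], r[1]))


# Hypothetical toy data: (TF fragment, binding-site fragment) pairs.
toy_records = [
    ("AKRSTL", "GGATCC"),
    ("QKRSTV", "GGATCC"),
    ("QKDSTV", "GGTACC"),
]
rules = mine_pair_rules(toy_records)
```

In this toy input only the TF k-mers shared by the first two records (KRS and RST) pass the support threshold, and each pairs with the k-mers of the site those two records have in common.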


Subject(s)
DNA-Binding Proteins/chemistry; DNA/chemistry; Data Mining/methods; Regulatory Elements, Transcriptional; Sequence Analysis, DNA; Transcription Factors/chemistry; Algorithms; Binding Sites; DNA/metabolism; DNA-Binding Proteins/metabolism; Databases, Genetic; Structural Homology, Protein; Transcription Factors/metabolism
5.
BMC Bioinformatics ; 10: 321, 2009 Oct 07.
Article in English | MEDLINE | ID: mdl-19811641

ABSTRACT

BACKGROUND: Identification of transcription factor binding sites (TFBSs) is a central problem in bioinformatics on gene regulation. De novo motif discovery serves as a promising way to predict and better understand TFBSs for biological verification. Real TFBSs of a motif may vary in their widths and their conservation degrees within a certain range. Deciding on a single motif width with existing models may be biased and misleading. Additionally, multiple, possibly overlapping, candidate motifs are desired and necessary for biological verification in practice. However, current techniques either prohibit overlapping TFBSs or lack explicit control of different motifs. RESULTS: We propose a new generalized model to tackle the motif widths by considering and evaluating a width range of interest simultaneously, which should better address the width uncertainty. Moreover, a meta-convergence framework for genetic algorithms (GAs) is proposed to provide multiple overlapping optimal motifs simultaneously in an effective and flexible way. Users can easily specify the difference amongst expected motif kinds via a similarity test. Incorporating Genetic Algorithm with Local Filtering (GALF) for searching, the new GALF-G (G for generalized) algorithm is proposed based on the generalized model and meta-convergence framework. CONCLUSION: GALF-G was tested extensively on over 970 synthetic, real and benchmark datasets, and is usually better than the state-of-the-art methods. The range model shows an increase in sensitivity compared with the single-width ones, while providing competitive precision on the E. coli benchmark. Effectiveness can be maintained even when using a very small population, exhibiting very competitive efficiency. In discovering multiple overlapping motifs in a real liver-specific dataset, GALF-G outperforms MEME by up to 73% in overall F-scores. GALF-G also helps to discover an additional motif which has probably not been annotated in the dataset.
http://www.cse.cuhk.edu.hk/%7Etmchan/GALFG/


Subject(s)
Computational Biology/methods; Transcription Factors/chemistry; Transcription Factors/metabolism; Algorithms; Binding Sites; Sequence Analysis, DNA
6.
J Virol ; 82(7): 3604-11, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18216102

ABSTRACT

We aimed to identify genomic markers in hepatitis B virus (HBV) that are associated with hepatocellular carcinoma (HCC) development by comparing the complete genomic sequences of HBVs among patients with HCC and those without. One hundred patients with HBV-related HCC and 100 age-matched HBV-infected non-HCC patients (controls) were studied. HBV DNA from serum was directly sequenced to study the whole viral genome. Data mining and rule learning were employed to develop diagnostic algorithms. An independent cohort of 132 cases (43 HCC and 89 non-HCC) was used to validate the accuracy of these algorithms. Among the 100 cases of HCC, 37 had genotype B (all subgenotype Ba) and 63 had genotype C (16 subgenotype Ce and 47 subgenotype Cs) HBV infection. In the control group, 51 had genotype B and 49 had genotype C (10 subgenotype Ce and 39 subgenotype Cs) HBV infection. Genomic algorithms associated with HCC were derived based on genotype/subgenotype-specific mutations. In genotype B HBV, mutations C1165T, A1762T and G1764A, T2712C/A/G, and A/T2525C were associated with HCC. HCC-related mutations T31C, T53C, and A1499G were associated with HBV subgenotype Ce, and mutations G1613A, G1899A, T2170C/G, and T2441C were associated with HBV subgenotype Cs. Amino acid changes caused by these mutations were found in the X, envelope, and precore/core regions in association with HBV genotype B, Ce, and Cs, respectively. In conclusion, infections with different genotypes of HBV (B, Ce, and Cs) carry different genomic markers for HCC at different parts of the HBV genome. Different HBV genotypes may have different virologic mechanisms of hepatocarcinogenesis.


Subject(s)
Carcinoma, Hepatocellular/virology; DNA, Viral/genetics; Genome, Viral; Hepatitis B virus/genetics; Adult; Aged; Amino Acid Substitution; Female; Genetic Markers; Genotype; Hepatitis B virus/isolation & purification; Humans; Male; Middle Aged; Phylogeny; Point Mutation; Sequence Analysis, DNA; Sequence Homology; Viral Proteins/genetics
7.
Bioinformatics ; 24(3): 341-9, 2008 Feb 01.
Article in English | MEDLINE | ID: mdl-18065426

ABSTRACT

MOTIVATION: Identification of transcription factor binding sites (TFBSs) plays an important role in deciphering the mechanisms of gene regulation. Recently, GAME, a Genetic Algorithm (GA)-based approach with iterative post-processing, has shown superior performance in TFBS identification. However, the basic GA in GAME is not elaborately designed, and may be trapped in local optima in real problems. The feature operators are only applied in the post-processing, but the final performance heavily depends on the GA output. Hence, both effectiveness and efficiency of the overall algorithm can be improved by introducing more advanced representations and novel operators in the GA, as well as designing the post-processing in an adaptive way. RESULTS: We propose a novel framework GALF-P, consisting of Genetic Algorithm with Local Filtering (GALF) and adaptive post-processing techniques (-P), to achieve both effectiveness and efficiency for TFBS identification. GALF combines the position-led and consensus-led representations used separately in current GAs and employs a novel local filtering operator to get rid of false positives within an individual efficiently during the evolutionary process in the GA. Pre-selection is used to maintain diversity and avoid local optima. Post-processing with adaptive adding and removing is developed to handle general cases with arbitrary numbers of instances per sequence. GALF-P shows superior performance to GAME, MEME, BioProspector and BioOptimizer on synthetic datasets with difficult scenarios and real test datasets. GALF-P is also more robust and reliable when further compared with GAME, the current state-of-the-art approach. AVAILABILITY: http://www.cse.cuhk.edu.hk/~tmchan/GALFP/.


Subject(s)
Algorithms; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Transcription Factors/genetics; Base Sequence; Binding Sites; Molecular Sequence Data; Protein Binding
8.
IEEE Trans Syst Man Cybern B Cybern ; 37(1): 84-91, 2007 Feb.
Article in English | MEDLINE | ID: mdl-17278562

ABSTRACT

This correspondence introduces a multidrug cancer chemotherapy model to simulate the possible response of the tumor cells under drug administration. We formulate the model as an optimal control problem. The algorithm in this correspondence optimizes the multidrug cancer chemotherapy schedule. The objective is to minimize the tumor size under a set of constraints. We combine the adaptive elitist genetic algorithm with a local search algorithm called iterative dynamic programming (IDP) to form a new memetic algorithm (MA-IDP) for solving the problem. MA-IDP has been shown to be very efficient in solving the multidrug scheduling optimization problem.


Subject(s)
Algorithms; Antineoplastic Combined Chemotherapy Protocols/administration & dosage; Biomimetics/methods; Drug Administration Schedule; Drug Therapy, Computer-Assisted/methods; Models, Biological; Neoplasms/drug therapy; Antineoplastic Combined Chemotherapy Protocols/pharmacokinetics; Artificial Intelligence; Computer Simulation; Humans; Neoplasms/metabolism; Software; Systems Theory
9.
Article in English | MEDLINE | ID: mdl-26451828

ABSTRACT

Inferring a gene regulatory network (GRN) from microarray expression data is an important problem in bioinformatics, because knowing the GRN is an essential first step in understanding the inner workings of the cell and the related diseases. Time delays exist in the regulatory effects from one gene to another due to the time needed for transcription, translation, and the accumulation of a sufficient number of needed proteins. Also, it is known that the delays are important for oscillatory phenomena. Therefore, it is crucial to develop a causal gene network model, preferably as a function of time. In this paper, we propose an algorithm, CLINDE, to infer causal directed links in a GRN, together with the time delays and regulatory effects of the links, from time-series microarray gene expression data. It is one of the most comprehensive in terms of features compared to the state-of-the-art discrete gene network models. We have tested CLINDE on synthetic data, the in vivo IRMA (On and Off) datasets and the [1] yeast expression data validated using KEGG pathways. Results show that CLINDE can effectively recover the links, the time delays and the regulatory effects in the synthetic data, and outperforms other algorithms on the IRMA in vivo datasets.
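
The abstract does not describe how CLINDE scores candidate links, so the following is only a generic sketch of the underlying idea of a delayed regulatory effect, not CLINDE itself. It scans candidate delays d and keeps the one that maximizes the absolute Pearson correlation between source[t] and target[t + d]; the sign of that correlation hints at activation or repression. The toy series and all names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_delay(source, target, max_delay):
    """Return (delay, correlation) maximizing |corr(source[t], target[t + delay])|."""
    best = (0, 0.0)
    for d in range(1, max_delay + 1):
        # Align source at time t with target at time t + d.
        r = pearson(source[:len(source) - d], target[d:])
        if abs(r) > abs(best[1]):
            best = (d, r)
    return best


# Hypothetical toy series: gene_b mirrors gene_a two steps later with inverted sign.
gene_a = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.5, 0.4, 0.6, 0.0]
gene_b = [0.0, 0.0] + [-x for x in gene_a[:-2]]
delay, corr = best_delay(gene_a, gene_b, max_delay=4)
```

A real method would also need a significance test and a way to prune indirect links, which this sketch leaves out.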


Subject(s)
Gene Expression Profiling/methods; Models, Biological; Oligonucleotide Array Sequence Analysis/methods; Protein Interaction Mapping/methods; Proteome/metabolism; Signal Transduction/physiology; Algorithms; Animals; Computer Simulation; Gene Expression Regulation/physiology; Humans; Time Factors
10.
PLoS One ; 10(9): e0138596, 2015.
Article in English | MEDLINE | ID: mdl-26394325

ABSTRACT

Inferring the gene regulatory network (GRN) is crucial to understanding the workings of the cell. Many computational methods attempt to infer the GRN from time series expression data, instead of through expensive and time-consuming experiments. However, existing methods make the convenient but unrealistic assumption of causal sufficiency, i.e. all the relevant factors in the causal network have been observed and there are no unobserved common causes. In principle, in the real world, it is impossible to be certain that all relevant factors or common causes have been observed, because some factors may not have been conceived of, and therefore are impossible to measure. In view of this, we have developed a novel algorithm named HCC-CLINDE to infer a GRN from time series data allowing the presence of hidden common cause(s). We assume there is a sparse causal graph (possibly with cycles) of interest, where the variables are continuous and each causal link has a delay (possibly more than one time step). A small but unknown number of variables are not observed. Each unobserved variable has only observed variables as children and parents, with at least two children, and the children are not linked to each other. Since it is difficult to obtain very long time series, our algorithm is also capable of utilizing multiple short time series, which is more realistic. To our knowledge, our algorithm is far less restrictive than previous works. We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. The results show that our algorithm can adequately recover the true causal GRN and is robust to slight deviation from Gaussian distribution in the error terms. We have also demonstrated the potential of our algorithm on small YEASTRACT subnetworks using limited real data.


Subject(s)
Algorithms; Computational Biology/methods; Gene Regulatory Networks; Models, Genetic; Databases, Genetic/statistics & numerical data; Gene Expression Profiling/methods; Gene Expression Profiling/statistics & numerical data; Gene Expression Regulation; Kinetics; Reproducibility of Results; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/genetics; Time Factors
11.
Article in English | MEDLINE | ID: mdl-21233523

ABSTRACT

Extraction of meaningful information from large experimental data sets is a key element in bioinformatics research. One of the challenges is to identify genomic markers in Hepatitis B Virus (HBV) that are associated with HCC (liver cancer) development by comparing the complete genomic sequences of HBV among patients with HCC and those without HCC. In this study, a data mining framework, which includes molecular evolution analysis, clustering, feature selection, classifier learning, and classification, is introduced. Our research group has collected HBV DNA sequences, either genotype B or C, from over 200 patients specifically for this project. In the molecular evolution analysis and clustering, three subgroups have been identified in genotype C and a clustering method has been developed to separate the subgroups. In the feature selection process, potential markers are selected based on Information Gain for further classifier learning. Then, meaningful rules are learned by our algorithm called the Rule Learning, which is based on Evolutionary Algorithm. Also, a new classification method by Nonlinear Integral has been developed. Good performance of this method comes from the use of the fuzzy measure and the relevant nonlinear integral. The nonadditivity of the fuzzy measure reflects the importance of the feature attributes as well as their interactions. These two classifiers give explicit information on the importance of the individual mutated sites and their interactions toward the classification (potential causes of liver cancer in our case). A thorough comparison study of these two methods with existing methods is detailed. Important mutation markers (sites) have been found for genotype B and for genotype C subgroups C1, C2, and C3. These two classification methods have been applied to classify never-seen-before examples for validation. The results show that the classification methods have more than 70 percent accuracy and 80 percent sensitivity for most data sets, which is considered high for an initial scanning method for liver cancer diagnosis.
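
The feature-selection step mentioned above ranks candidate sites by Information Gain. A minimal sketch of that standard quantity follows, assuming each sample is described by the nucleotide seen at a site plus a class label. The site names, labels, and data are invented for illustration and do not come from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: nucleotide observed at two made-up sites in six samples.
labels = ["HCC", "HCC", "HCC", "control", "control", "control"]
site_a = ["T", "T", "T", "C", "C", "C"]  # separates the two classes perfectly
site_b = ["A", "G", "A", "G", "A", "G"]  # carries almost no class information
gains = {
    "site_a": information_gain(site_a, labels),
    "site_b": information_gain(site_b, labels),
}
ranked = sorted(gains, key=gains.get, reverse=True)
```

Sites with the highest gain would be passed on as candidate markers; the cutoff used in the study is not stated in the abstract.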


Subject(s)
DNA, Viral/chemistry; Data Mining/methods; Hepatitis B virus/genetics; Sequence Analysis, DNA; Base Sequence; Carcinoma, Hepatocellular/genetics; Carcinoma, Hepatocellular/virology; Cluster Analysis; Genomics/methods; Genotype; Humans; Liver Neoplasms/genetics; Liver Neoplasms/virology; Mutation
12.
Article in English | MEDLINE | ID: mdl-21030733

ABSTRACT

Finding Transcription Factor Binding Sites, i.e., motif discovery, is crucial for understanding the gene regulatory relationship. Motifs are weakly conserved and motif discovery is an NP-hard problem. We propose a new approach called Cluster Refinement Algorithm for Motif Discovery (CRMD). CRMD employs a flexible statistical motif model allowing a variable number of motifs and motif instances. CRMD first uses a novel entropy-based clustering to find complete and good starting candidate motifs from the DNA sequences. CRMD then employs an effective greedy refinement to search for optimal motifs from the candidate motifs. The refinement is fast, and it changes the number of motif instances based on the adaptive thresholds. The performance of CRMD is further enhanced if the problem has one occurrence of motif instance per sequence. Using an appropriate similarity test of motifs, CRMD is also able to find multiple motifs. CRMD has been tested extensively on synthetic and real data sets. The experimental results verify that CRMD usually outperforms four other state-of-the-art algorithms in terms of the qualities of the solutions with competitive computing time. It finds a good balance between finding true motif instances and screening false motif instances, and is robust on problems of various levels of difficulty.
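
The abstract does not specify the entropy measure that CRMD uses, so the sketch below is a generic illustration of how entropy quantifies motif conservation rather than a piece of CRMD. For a set of aligned, equal-length DNA instances it computes the per-column information content 2 - H, assuming a uniform background and no pseudocounts. The instances and the function name are hypothetical.

```python
import math
from collections import Counter


def column_information(instances):
    """Information content (bits) of each column of equal-length DNA motif instances.

    Computed as 2 - H(column), i.e. against a uniform four-letter background,
    with raw frequencies and no pseudocounts.
    """
    width = len(instances[0])
    info = []
    for i in range(width):
        counts = Counter(seq[i] for seq in instances)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(2.0 - h)
    return info


# Hypothetical aligned instances of a weakly conserved motif.
instances = ["TGACTC", "TGAGTC", "TGACTA", "TGAATC"]
info = column_information(instances)
```

Fully conserved columns reach 2 bits, while the variable fourth column drops to 0.5 bits, which is the kind of weak conservation the abstract refers to.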


Subject(s)
Algorithms; Computational Biology/methods; Regulatory Elements, Transcriptional; Sequence Analysis, DNA/methods; Transcription Factors/metabolism; Binding Sites; Cluster Analysis; Entropy
13.
IEEE Trans Syst Man Cybern B Cybern ; 38(4): 1036-49, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18632395

ABSTRACT

In genetic programming (GP), evolving tree nodes separately would reduce the huge solution space. However, tree nodes are highly interdependent with respect to their fitness. In this paper, we propose a new GP framework, namely, instruction-matrix (IM)-based GP (IMGP), to handle their interactions. IMGP maintains an IM to evolve tree nodes and subtrees separately. IMGP extracts program trees from an IM and updates the IM with the information of the extracted program trees. As the IM actually keeps most of the information of the schemata of GP and evolves the schemata directly, IMGP is effective and efficient. Our experimental results on benchmark problems have verified that IMGP is not only better than canonical GP in terms of the quality of the solutions and the number of program evaluations, but also better than some of the related GP algorithms. IMGP can also be used to evolve programs for classification problems. The classifiers obtained have higher classification accuracies than four other GP classification algorithms on four benchmark classification problems. The testing errors are also comparable to or better than those obtained with well-known classifiers. Furthermore, an extended version, called condition matrix for rule learning, has been used successfully to handle multiclass classification problems.


Subject(s)
Algorithms; Artificial Intelligence; Models, Theoretical; Pattern Recognition, Automated/methods; Programming, Linear; Systems Theory; Computer Simulation; Feedback; Models, Genetic
14.
Evol Comput ; 14(2): 129-56, 2006.
Article in English | MEDLINE | ID: mdl-16831104

ABSTRACT

This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.


Subject(s)
Computational Biology/methods; Software; Algorithms; Artificial Intelligence; Computing Methodologies; Evolution, Molecular; Genotype; Humans; Models, Genetic; Models, Statistical; Pattern Recognition, Automated; Programming Languages
15.
J Clin Microbiol ; 44(3): 681-7, 2006 Mar.
Article in English | MEDLINE | ID: mdl-16517839

ABSTRACT

Hepatitis B virus (HBV) with T-1856 of the precore region is always associated with C-1858 (i.e., TCC at nucleotides 1856 to 1858), and it is reported only in genotype C HBV isolates. We aimed to investigate the phylogenetic, virological, and clinical characteristics of HBV isolates bearing TCC at nucleotides 1856 to 1858. We have previously reported on the presence of two major subgroups in genotype C HBV, namely, HBV genotype Cs (Southeast Asia) and HBV genotype Ce (Far East). We have designed a novel 5' nuclease technology based on the nucleotide polymorphism (C or A) at nucleotide 2733 to differentiate the two genotype C HBV subgroups. The mutations at the basal core promoter and precore regions were analyzed by direct sequencing. Among 214 genotype C HBV-infected patients, 31% had TCC, 37% had CCC, 3% had CTC, and 29% had CCT at nucleotides 1856 to 1858. All except one HBV strain with TCC at nucleotides 1856 to 1858 belonged to subgroup Cs, which has been reported only in Hong Kong; Guangzhou, China; and Vietnam. HBV with TCC at nucleotides 1856 to 1858 was associated with the G1898A mutation (64%). Patients infected with HBV harboring TCC had more liver cirrhosis than those infected with HBV harboring CCC (18% versus 5%; P = 0.008), and more of the patients infected with HBV harboring TCC were positive for HBeAg (58% versus 36%; P = 0.01) and had higher median alanine aminotransferase levels (65 IU/liter versus 49 IU/liter; P = 0.006); but similar proportions of patients infected with HBV harboring TCC and those infected with HBV harboring CCT had liver cirrhosis (18% versus 13%; P = 0.43). In summary, we report that HBV with TCC at nucleotides 1856 to 1858 of the precore region might represent a specific HBV strain associated with more aggressive liver disease than other genotype C HBV strains.


Subject(s)
Hepatitis B virus/genetics; Base Sequence; Codon/genetics; DNA, Viral/chemistry; DNA, Viral/genetics; Genetic Variation; Genotype; Hepatitis B/virology; Hepatitis B Core Antigens/genetics; Hepatitis B virus/classification; Hepatitis B virus/pathogenicity; Humans; Nucleic Acid Conformation; Phylogeny; Point Mutation; Promoter Regions, Genetic; Virulence/genetics
16.
J Infect Dis ; 191(12): 2022-32, 2005 Jun 15.
Article in English | MEDLINE | ID: mdl-15897987

ABSTRACT

BACKGROUND: We aimed to investigate the characteristics of hepatitis B virus (HBV) genotype C subgroups in Hong Kong and their relationship with HBV genotype C in other parts of Asia. METHODS: Full-genome nucleotide sequences of 49 HBV genotype C isolates from Chinese patients with chronic hepatitis B were compared with the sequences of 69 HBV genotype C isolates and 12 non-genotype C isolates in the GenBank database. Phylogenetic analysis was performed to define the subgroups of HBV genotype C on the basis of >4% heterogeneity of the entire HBV genome. RESULTS: HBV in 80% of patients in Hong Kong belonged to a subgroup predominantly found in Southeast Asia (Vietnam, Thailand, Myanmar, and southern China) designated as HBV genotype "Cs," and HBV in the remaining 20% of patients belonged to another subgroup, predominantly found in the Far East (Korea, Japan, and northern China), designated as HBV genotype "Ce." Overall, the mean ± SD nucleotide sequence difference between HBV genotype Cs and HBV genotype Ce was 4.2% ± 0.3%. When HBV genotype Cs and HBV genotype Ce were compared among patients in Hong Kong, HBV genotype Cs was associated with a higher tendency to develop basal core promoter mutations (80% vs. 50%; P=.14), a higher prevalence of C at nucleotide 1858 (95% vs. 0%; P<.001), and a lower prevalence of precore stop codon mutations (5% vs. 50%; P=.002). CONCLUSIONS: HBV genotype C can be differentiated into 2 subgroups, namely genotype Ce and genotype Cs, that have different epidemiological distributions and virological characteristics.
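
The subgroup definition above rests on a percentage nucleotide difference between aligned genomes. The sketch below shows one simple way such a figure could be computed for a pair of aligned sequences. It is an assumption-laden illustration: the abstract does not say how gaps or ambiguous bases were handled, and the 50-nt fragments here are made up rather than real HBV sequence.

```python
def percent_difference(seq_a, seq_b):
    """Percentage of differing positions between two aligned, equal-length sequences.

    Positions where either sequence has a gap ('-') are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Hypothetical 50-nt aligned fragments with three substitutions.
ref = "ACGT" * 12 + "AC"
alt = "TCGT" + "ACGT" * 4 + "ACGA" + "ACGT" * 6 + "AG"
diff = percent_difference(ref, alt)
exceeds_threshold = diff > 4.0  # the >4% heterogeneity criterion from the abstract
```

Applied to full genomes across all isolate pairs, values like these would feed the phylogenetic grouping; the tree-building itself is outside this sketch.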


Subject(s)
Hepatitis B virus/genetics; Hepatitis B/epidemiology; Hepatitis B/virology; Adult; Female; Genotype; Hepatitis B virus/classification; Hong Kong/epidemiology; Humans; Male; Middle Aged; Phylogeny