Results 1 - 20 of 44
1.
Int J Cancer ; 137(1): 86-95, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25422082

ABSTRACT

Gastric cancer is one of the most prevalent and aggressive cancers worldwide, and its molecular mechanism remains largely elusive. Here we report the genomic landscape of human primary gastric adenocarcinoma, based on the complete genome sequences of five pairs of cancer and matched normal samples. In total, 103,464 somatic point mutations, including 407 nonsynonymous ones, were identified, and the most recurrent mutations were harbored by mucins (MUC3A and MUC12) and transcription factors (ZNF717, ZNF595 and TP53). A total of 679 genomic rearrangements were detected, affecting 355 protein-coding genes, and 76 genes showed copy number changes. By mapping the boundaries of the rearranged regions onto the folded three-dimensional structure of human chromosomes, we determined that 79.6% of the chromosomal rearrangements occur between DNA fragments in close spatial proximity, especially when the two endpoints are in a similar replication phase. We found evidence that microhomology-mediated break-induced replication acted as a mechanism inducing ∼40.9% of the identified genomic changes in gastric tumors. Our analyses also revealed potential integrations of Helicobacter pylori DNA into the gastric cancer genomes. Overall, a large set of novel genomic variations was detected in these gastric cancer genomes, which may be essential for studying the genetic basis and molecular mechanism of gastric tumorigenesis.


Subject(s)
Adenocarcinoma/genetics , Chromosome Aberrations , Genetic Variation , Helicobacter Infections/genetics , Helicobacter pylori/physiology , Stomach Neoplasms/genetics , Adenocarcinoma/pathology , Adenocarcinoma/virology , Aged , DNA Copy Number Variations , DNA, Viral/analysis , Genome, Human , Humans , Male , Middle Aged , Point Mutation , Polymorphism, Single Nucleotide , Stomach Neoplasms/pathology , Stomach Neoplasms/virology
2.
J Bioinform Comput Biol ; 12(1): 1350019, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24467758

ABSTRACT

GPCR genes have a variety of exon-intron structures even though the proteins they encode are all structurally homologous. We examined all 199 human GPCR genes with at least two functional protein isoforms, aiming to understand what may have contributed to the large diversity of exon-intron structures among GPCR genes. The 199 genes have a total of 808 known protein splicing isoforms with experimentally verified functions. Our analysis reveals that 1,301 (80.6%) of the 1,613 adjacent exon-exon pairs in the 199 genes have either exactly one exon skipped or the intervening intron retained in at least one of the 808 protein splicing isoforms. This observation is statistically significant (p-value = 2.051762 × 10^-9), assuming that the observed splicing isoforms are independent of the exon-intron structures. We interpret this observation to mean that the exon boundaries of GPCR genes are not randomly determined; instead, they may be selected to facilitate specific alternative splicing for functional purposes.


Subject(s)
Protein Isoforms , Receptors, G-Protein-Coupled/genetics , Alternative Splicing , Exons , Humans , Introns , Models, Genetic , Receptors, G-Protein-Coupled/metabolism
3.
PLoS One ; 8(2): e56726, 2013.
Article in English | MEDLINE | ID: mdl-23457606

ABSTRACT

We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have shown that this visual capability makes some challenging genome analysis problems relatively easy to solve, and we have applied it to a number of such problems, including (a) identification of horizontally transferred genes, (b) identification of genomic islands with special properties and (c) binning of metagenomic sequences, with highly encouraging results. These results inspired us to develop this barcode-based genome analysis server as a public service, which supports the following capabilities: (a) calculation of the k-mer-based barcode image for a provided DNA sequence; (b) detection of sequence fragments in a given genome whose barcodes differ from those of the majority of the genome; (c) clustering of provided DNA sequences into groups with similar barcodes; and (d) homology-based search using BLAST against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides job management, allowing a large number of analysis jobs to be processed for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode.


Subject(s)
Computer Graphics , Genomics/methods , Software , Algorithms , Cluster Analysis , Data Mining , Escherichia coli K12/genetics , Escherichia coli O157/genetics , Genomic Islands/genetics , Metagenomics , Sequence Analysis
5.
PLoS One ; 7(1): e29496, 2012.
Article in English | MEDLINE | ID: mdl-22235300

ABSTRACT

Regulons, as groups of transcriptionally co-regulated operons, are the basic units of cellular response systems in bacterial cells. Although the concept has been widely used in bacterial studies since it was first proposed in 1964, very little is known about how its component operons are arranged in a bacterial genome. We present a computational study to elucidate the organizational principles of regulons in a bacterial genome, based on the experimentally validated regulons of E. coli and B. subtilis. Our results indicate that (1) the genomic locations of transcription factors (TFs) are under stronger evolutionary constraints than those of the operons they regulate, so changing a TF's genomic location would have a larger impact on the bacterium than changing the genomic position of any of its target operons; (2) the operons of a regulon are generally not uniformly distributed in the genome but tend to form a few closely located clusters, which generally consist of genes working in the same metabolic pathways; and (3) the global arrangement of the component operons of all regulons in a genome tends to minimize a simple scoring function, indicating that the global arrangement of regulons follows simple organizational principles.


Subject(s)
Computational Biology , Genome, Bacterial/genetics , Regulon/genetics , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Evolution, Molecular , Operon/genetics , Transcription Factors/metabolism
6.
World J Gastroenterol ; 17(14): 1910-4, 2011 Apr 14.
Article in English | MEDLINE | ID: mdl-21528067

ABSTRACT

AIM: To identify and assess novel markers for the detection of Shiga toxin-producing Escherichia coli (STEC) O157:H7 with an integrated computational and experimental approach. METHODS: High-throughput NCBI BLAST (E-value cutoff 1e-5) was used to search all sequenced prokaryotic genomes for homologs of each gene encoded in each of the three STEC O157:H7 strains with complete genomes, aiming to find genes unique to O157:H7 as its potential markers. To ensure that the markers identified from the three strains could serve as general markers for all STEC O157:H7 strains, a genomic barcode approach was used to select the markers, minimizing the possibility of choosing a marker gene that is part of a transposable element. The effectiveness of the predicted markers was then validated by polymerase chain reaction (PCR) on 18 strains of O157:H7, with 5 additional genomes used as negative controls. RESULTS: The BLAST search identified 20, 16 and 20 genes, respectively, in the three sequenced strains of STEC O157:H7 that had no homologs in any of the other prokaryotic genomes. Three genes common to the three gene lists, wzy, Z0372 and Z0344, were selected based on the genomic barcode approach. PCR showed an identification accuracy of 100% on the 18 tested strains and the 5 controls. CONCLUSION: The three identified novel markers, wzy, Z0372 and Z0344, are highly promising for the detection of STEC O157:H7, complementing the known markers.


Subject(s)
Escherichia coli Infections/diagnosis , Escherichia coli O157/genetics , Genetic Markers , Animals , Genome, Bacterial , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Data , Polymerase Chain Reaction , Reproducibility of Results , Shiga Toxins
7.
Structure ; 19(4): 484-95, 2011 Apr 13.
Article in English | MEDLINE | ID: mdl-21481772

ABSTRACT

Nuclear magnetic resonance paramagnetic relaxation enhancement (PRE) measures long-range distances to isotopically labeled residues, providing useful constraints for protein structure prediction. The method usually requires labor-intensive conjugation of nitroxide labels to multiple locations on the protein, one at a time. Here we present a computational procedure, based on protein sequence and simple secondary structure models, that facilitates optimal placement of the minimum number of labels needed to determine the correct topology of a helical transmembrane protein. Tests on DsbB (four helices) using just one label led to correct topology predictions in four of five cases, with the predicted structures within 6 Å of the native structure. Benchmark results using simulated PRE data show that we can generally predict the correct topology for proteins with five helices using two labels and with six to seven helices using three labels, with an average success rate of 76% and structures of similar precision. The results show promise for facilitating experimentally constrained structure prediction of membrane proteins.


Subject(s)
Computational Biology/methods , Membrane Proteins/chemistry , Mutation , Protein Structure, Secondary , Animals , Binding Sites/genetics , Humans , Magnetic Resonance Spectroscopy , Membrane Proteins/genetics , Models, Molecular , Reproducibility of Results
8.
Nucleic Acids Res ; 39(4): 1197-207, 2011 Mar.
Article in English | MEDLINE | ID: mdl-20965966

ABSTRACT

This report describes an integrated study on the identification of potential markers for gastric cancer in patients' cancer tissues and sera, based on (i) genome-scale transcriptomic analyses of 80 paired gastric cancer/reference tissues and (ii) computational prediction of blood-secretory proteins supported by experimental validation. Our findings show that: (i) 715 and 150 genes are significantly differentially expressed in all cancers and in early-stage cancers, respectively, versus reference tissues, and a substantial percentage of these alterations is influenced by age and/or gender; (ii) 21 co-expressed gene clusters have been identified, some of which are specific to certain subtypes or stages of the cancer; (iii) the top-ranked gene signatures give better than 94% classification accuracy between cancer and reference tissues, and some of them are gender-specific; and (iv) 136 of the differentially expressed genes were predicted to have their proteins secreted into blood, 81 of which were detected experimentally in the sera of 13 validation samples, and 29 were found to have differential abundances in the sera of cancer patients versus controls. Overall, the novel information obtained in this study has led to the identification of promising diagnostic markers for gastric cancer and can benefit further analyses of the key (early) abnormalities during its development.


Subject(s)
Biomarkers, Tumor/blood , Stomach Neoplasms/genetics , Adult , Age Factors , Aged , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Computational Biology , Gene Expression Profiling , Humans , Male , Middle Aged , Sex Factors , Stomach Neoplasms/blood , Stomach Neoplasms/classification
9.
PLoS One ; 5(10): e13696, 2010 Oct 27.
Article in English | MEDLINE | ID: mdl-21060876

ABSTRACT

A comparative study of public gene-expression data of seven types of cancers (breast, colon, kidney, lung, pancreatic, prostate and stomach cancers) was conducted with the aim of deriving marker genes, along with associated pathways, that are either common to multiple types of cancers or specific to individual cancers. The analysis results indicate that (a) each of the seven cancer types can be distinguished from its corresponding control tissue based on the expression patterns of a small number of genes, e.g., 2, 3 or 4; (b) the expression patterns of some genes can distinguish multiple cancer types from their corresponding control tissues, potentially serving as general markers for all or some groups of cancers; (c) the proteins encoded by some of these genes are predicted to be blood secretory, thus providing potential cancer markers in blood; (d) the numbers of differentially expressed genes across different cancer types in comparison with their control tissues correlate well with the five-year survival rates associated with the individual cancers; and (e) some metabolic and signaling pathways are abnormally activated or deactivated across all cancer types, while other pathways are more specific to certain cancers or groups of cancers. The novel findings of this study offer considerable insight into these seven cancer types and have the potential to provide exciting new directions for diagnostic and therapeutic development.


Subject(s)
Gene Expression Profiling , Neoplasms/genetics , Humans , Oligonucleotide Array Sequence Analysis , Survival Rate
10.
BMC Genomics ; 11: 291, 2010 May 10.
Article in English | MEDLINE | ID: mdl-20459751

ABSTRACT

BACKGROUND: Osmotic stress is caused by sudden changes in the impermeable solute concentration around a cell, which induce instantaneous water flow into or out of the cell to balance the concentration. Very little is known about the detailed response mechanism to osmotic stress in marine Synechococcus, one of the major oxygenic phototrophic cyanobacterial genera, which contributes greatly to global CO2 fixation. RESULTS: We present here a computational study of the osmoregulation network of Synechococcus sp. strain WH8102 in response to hyperosmotic stress, using comparative genome analyses and computational prediction. In this study, we identified the key transporters, synthetases, signal sensor proteins and transcriptional regulator proteins, and found experimentally that 15 of the corresponding genes showed significantly changed expression levels under mild hyperosmotic stress. CONCLUSIONS: From the predicted network model, we have made a number of interesting observations about WH8102. Specifically, we found that (i) the organism likely uses glycine betaine as the major osmolyte, and others such as glucosylglycerol, glucosylglycerate, trehalose, sucrose and arginine as minor osmolytes, making it efficient and adaptable to its changing environment; and (ii) sigma38, one of the seven types of sigma factors, probably serves as a global regulator coordinating the osmoregulation network and the other relevant networks.


Subject(s)
Polysaccharides/metabolism , Synechococcus/chemistry , Synechococcus/metabolism , Water-Electrolyte Balance , Arginine/metabolism , Betaine/metabolism , Synechococcus/enzymology
11.
Proc Natl Acad Sci U S A ; 107(14): 6310-5, 2010 Apr 06.
Article in English | MEDLINE | ID: mdl-20308592

ABSTRACT

It is generally known that bacterial genes working in the same biological pathways tend to group into operons, possibly to facilitate cotranscription and to provide stoichiometry. However, very little is understood about what may determine the global arrangement of bacterial genes in a genome beyond the operon level. Here we present evidence that the global arrangement of operons in a bacterial genome is largely influenced by the tendency of a bacterium to keep operons encoding the same biological pathway in nearby genomic locations, and by the tendency to keep operons involved in multiple pathways close to the other members of their participating pathways. We also observed that the activation frequencies of pathways influence the genomic locations of their encoding operons, with operons of more frequently activated pathways tending to be more tightly clustered. We have quantitatively assessed the influence of different factors on the global genomic arrangement of operons, and found that the current arrangements of operons in most of the bacterial genomes we studied tend to minimize the overall distance between consecutive operons of the same pathway across all pathways encoded in the genome.


Subject(s)
Bacillus subtilis/genetics , Escherichia coli/genetics , Genome, Bacterial , Operon , Multigene Family
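The distance criterion in the abstract above can be illustrated with a small sketch. It assumes a circular chromosome, one coordinate per operon, and a per-pathway score equal to the summed distance between consecutive operons going around the chromosome with the single largest gap left out (the shortest arc that contains all of the pathway's operons). The function names, the toy layout and the permutation test are hypothetical illustrations, not the authors' published scoring function.

```python
import random


def pathway_distance(positions, genome_length):
    """Summed distance between consecutive operons of one pathway on a circular chromosome.

    Walking around the circle, the gaps between consecutive operons add up to the
    genome length; leaving out the single largest gap gives the length of the
    shortest arc that contains every operon of the pathway.
    """
    if len(positions) < 2:
        return 0
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(genome_length - ordered[-1] + ordered[0])  # wrap-around gap
    return sum(gaps) - max(gaps)


def arrangement_score(layout, pathways, genome_length):
    """Total score over all pathways; layout maps operon id -> position (bp)."""
    return sum(
        pathway_distance([layout[op] for op in operons], genome_length)
        for operons in pathways.values()
    )


def permutation_test(layout, pathways, genome_length, trials=1000, seed=0):
    """Compare the observed score with scores obtained after shuffling operon positions."""
    rng = random.Random(seed)
    observed = arrangement_score(layout, pathways, genome_length)
    operons = list(layout)
    positions = list(layout.values())
    at_least_as_compact = 0
    for _ in range(trials):
        rng.shuffle(positions)
        shuffled = dict(zip(operons, positions))
        if arrangement_score(shuffled, pathways, genome_length) <= observed:
            at_least_as_compact += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (at_least_as_compact + 1) / (trials + 1)


# Toy example: 20 operons spaced 50 bp apart on a hypothetical 1,000 bp chromosome.
layout = {f"op{i}": i * 50 for i in range(20)}
pathways = {"A": ["op0", "op1", "op2"], "B": ["op10", "op11", "op12"]}
score, p = permutation_test(layout, pathways, genome_length=1000, trials=500)
print(score)  # 200 (100 per pathway)
print(p)
```

A low empirical p-value means random placements rarely keep each pathway's operons as close together as the observed layout does, which is the kind of tendency the abstract reports.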
12.
FEBS Lett ; 584(1): 194-8, 2010 Jan 04.
Article in English | MEDLINE | ID: mdl-19941858

ABSTRACT

The genome of the lethal animal pathogenic bacterium enterohemorrhagic Escherichia coli (EHEC) O157:H7 is characterized by the presence of multiple pathogenicity islands (PAIs). Computational methods have been developed to identify PAIs based on the G+C levels that distinguish some PAIs from non-PAI regions. We observed that PAIs can have a G+C level very similar to that of the host chromosome, which may have led to false-negative predictions by these methods. We have applied a novel genomic barcode method to identify PAIs. Using this technique, we have successfully identified both known and novel PAIs in the genomes of three strains of EHEC O157:H7.


Subject(s)
Base Composition , Chromosomes, Bacterial/genetics , Escherichia coli O157/pathogenicity , Genomic Islands/genetics , Genomics/methods , Sequence Analysis, DNA/methods , Escherichia coli O157/genetics
13.
Article in English | MEDLINE | ID: mdl-19407357

ABSTRACT

Solving the cluster identification problem on large bioinformatics data sets is time-consuming, which is why a parallel algorithm is needed for identifying dense clusters in a noisy background. Our algorithm works on a graph representation of the data set to be analyzed. It identifies clusters through the identification of densely intraconnected subgraphs. We employ a minimum spanning tree (MST) representation of the graph and solve the cluster identification problem using this representation. The computational bottleneck of our algorithm is the construction of an MST of a graph, for which a parallel algorithm is employed. Our high-level strategy for parallel MST construction is to first partition the graph, then construct MSTs for the partitioned subgraphs and for auxiliary bipartite graphs based on the subgraphs, and finally merge these MSTs to derive an MST of the original graph. The computational results indicate that, when running on 150 CPUs, our algorithm can solve a cluster identification problem on a data set with 1,000,000 data points almost 100 times faster than on a single CPU, indicating that the program can handle very large data clustering problems efficiently. We have implemented the clustering algorithm as the software CLUMP.


Subject(s)
Algorithms , Cluster Analysis , Computational Biology/methods , Databases, Genetic , Pattern Recognition, Automated/methods , Linear Models , Multigene Family , Reproducibility of Results , Software , Systems Integration
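The partition-then-merge MST strategy described in the abstract above can be sketched sequentially in Python. This is an illustration under stated assumptions, not CLUMP: the pieces are processed one after another rather than on separate CPUs, and clusters are obtained by simply cutting MST edges heavier than a hypothetical threshold `max_edge` instead of CLUMP's dense-subgraph criterion. The merge is valid because every edge lies either inside one part or in the bipartite graph between two parts, and an edge dropped from its piece's spanning forest is the heaviest edge on some cycle, so it cannot belong to the overall MST.

```python
class DisjointSet:
    """Union-find with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def kruskal(n, edges):
    """Minimum spanning forest of (weight, u, v) edges over vertices 0..n-1."""
    ds = DisjointSet(n)
    return [e for e in sorted(edges) if ds.union(e[1], e[2])]


def partitioned_mst(n, edges, parts):
    """Build a spanning forest for every induced subgraph and every bipartite graph
    between two parts, then merge the surviving edges with a final Kruskal pass.
    Each piece could run on its own CPU; here they run one by one."""
    label = {v: i for i, part in enumerate(parts) for v in part}
    pieces = {}
    for e in edges:
        key = tuple(sorted((label[e[1]], label[e[2]])))  # (i, i) subgraph, (i, j) bipartite
        pieces.setdefault(key, []).append(e)
    survivors = [e for piece in pieces.values() for e in kruskal(n, piece)]
    return kruskal(n, survivors)


def mst_clusters(n, mst, max_edge):
    """Cut MST edges heavier than max_edge; the remaining components are clusters."""
    ds = DisjointSet(n)
    for w, u, v in mst:
        if w <= max_edge:
            ds.union(u, v)
    groups = {}
    for v in range(n):
        groups.setdefault(ds.find(v), []).append(v)
    return sorted(groups.values())


# Toy 1-D data: two tight groups and one outlier, complete graph with |x_i - x_j| weights.
points = [0, 1, 2, 10, 11, 12, 30]
edges = [(abs(points[i] - points[j]), i, j)
         for i in range(len(points)) for j in range(i + 1, len(points))]
mst = partitioned_mst(len(points), edges, parts=[[0, 3, 6], [1, 4], [2, 5]])
print(mst_clusters(len(points), mst, max_edge=5))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Sorting `(weight, u, v)` tuples gives every edge a unique rank, so the per-piece forests and the final pass agree on tie-breaking and the merged tree has the same total weight as a direct Kruskal run on the full graph.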
14.
Genomics Proteomics Bioinformatics ; 7(4): 194-9, 2009 Dec.
Article in English | MEDLINE | ID: mdl-20172492

ABSTRACT

Cellulases are important glycosyl hydrolases (GHs) that hydrolyze cellulose polymers into smaller oligosaccharides by breaking the cellulose β(1→4) bonds, and they are widely used to produce cellulosic ethanol from plant biomass. N-linked and O-linked glycosylations have been proposed to affect the catalytic efficiency, cellulose binding affinity and stability of cellulases, based on observations of individual cellulases. As far as we know, there has not been any systematic analysis of the distributions of N-linked and O-linked glycosylated residues in cellulases, mainly due to the limited annotation of the relevant functional domains and glycosylated residues. We have computationally annotated the functional domains and glycosylated residues in cellulases, and conducted a systematic analysis of the distributions of the N-linked and O-linked glycosylated residues in these enzymes. Many N-linked glycosylated residues were known to be in the GH domains of cellulases, but they are probably there just by chance, since the GH domain usually occupies more than half of the sequence length of a cellulase. Our analysis indicates that the O-linked glycosylated residues are significantly enriched in the linker regions between the carbohydrate binding module (CBM) domains and GH domains of cellulases. Possible mechanisms are discussed.


Subject(s)
Cellulases/chemistry , Cellulases/metabolism , Cellulose/metabolism , Glycosylation , Protein Structure, Tertiary
15.
Nucleic Acids Res ; 37(Database issue): D459-63, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18988623

ABSTRACT

We present a database DOOR (Database for prOkaryotic OpeRons) containing computationally predicted operons of all the sequenced prokaryotic genomes. All the operons in DOOR are predicted using our own prediction program, which was ranked to be the best among 14 operon prediction programs by a recent independent review. Currently, the DOOR database contains operons for 675 prokaryotic genomes, and supports a number of search capabilities to facilitate easy access and utilization of the information stored in it. (1) Querying the database: the database provides a search capability for a user to find desired operons and associated information through multiple querying methods. (2) Searching for similar operons: the database provides a search capability for a user to find operons that have similar composition and structure to a query operon. (3) Prediction of cis-regulatory motifs: the database provides a capability for motif identification in the promoter regions of a user-specified group of possibly coregulated operons, using motif-finding tools. (4) Operons for RNA genes: the database includes operons for RNA genes. (5) OperonWiki: the database provides a wiki page (OperonWiki) to facilitate interactions between users and the developer of the database. We believe that DOOR provides a useful resource to many biologists working on bacteria and archaea, which can be accessed at http://csbl1.bmb.uga.edu/OperonDB.


Subject(s)
Databases, Genetic , Genome, Archaeal , Genome, Bacterial , Operon , Genomics , Software
16.
BMC Bioinformatics ; 9: 546, 2008 Dec 17.
Article in English | MEDLINE | ID: mdl-19091119

ABSTRACT

BACKGROUND: Each genome has a stable distribution of the combined frequency for each k-mer and its reverse complement measured in sequence fragments as short as 1000 bps across the whole genome, for 1

Subject(s)
Algorithms , Base Sequence/genetics , Computational Biology/methods , Genome/genetics , Genomics/methods , Species Specificity
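A minimal sketch of a k-mer barcode along the lines described in the (truncated) abstract above: each row is one genome fragment, each column one k-mer pooled with its reverse complement. It assumes non-overlapping windows of `window` bases (1,000 by default, matching the fragment length mentioned), frequencies normalized per window, and k-mers containing ambiguous bases skipped. How the original method chooses k or scales the values into a grayscale image is not stated in the record, so the function names and defaults are hypothetical and image rendering is omitted.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def barcode(seq, k=4, window=1000):
    """Return (columns, rows): one row per non-overlapping window of `window` bases,
    one column per k-mer/reverse-complement pair. A trailing partial window is dropped."""
    columns = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    index = {kmer: i for i, kmer in enumerate(columns)}
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, window):
        fragment = seq[start:start + window]
        counts = [0] * len(columns)
        total = 0
        for i in range(len(fragment) - k + 1):
            kmer = fragment[i:i + k]
            if set(kmer) <= _VALID:  # skip k-mers containing N or other ambiguity codes
                counts[index[canonical(kmer)]] += 1
                total += 1
        rows.append([c / total if total else 0.0 for c in counts])
    return columns, rows


columns, rows = barcode("AAAA" + "ACGT", k=1, window=4)
print(columns)  # ['A', 'C']  (the A/T pair and the C/G pair)
print(rows)     # [[1.0, 0.0], [0.5, 0.5]]
```

Pooling each k-mer with its reverse complement makes a row independent of which DNA strand was sequenced; for k = 4 this leaves 136 columns instead of 256.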
17.
J Bioinform Comput Biol ; 6(3): 585-602, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18574864

ABSTRACT

As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or of allowing insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as equally probable events. We have analyzed the alignment patterns of homologous and analogous sequences to determine patterns of insertion and deletion, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as insertion/deletion (indel) frequency arrays (IFAs). By applying IFAs to the protein threading problem, we have been able to improve alignment accuracy, especially for proteins with low sequence identity. We have also demonstrated that applying this information can improve fold recognition.


Subject(s)
Computational Biology , INDEL Mutation , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Molecular Sequence Data , Mutagenesis, Insertional/methods , Protein Conformation , Sequence Deletion , Software , Structure-Activity Relationship
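One plausible way to compute per-residue indel statistics from a set of pairwise alignments is sketched below. It assumes the alignments are given as gapped strings with `-` for gaps, counts a deletion when a target residue faces a gap, counts an insertion when the partner sequence has residues between two target residues, and normalizes by the number of alignments. The record does not give the authors' exact definition of an IFA (for example, whether statistics are pooled per amino acid type), so the function name and normalization are hypothetical.

```python
def indel_frequency_arrays(target, alignments):
    """Per-residue deletion and insertion frequencies for `target`.

    alignments: list of (aligned_target, aligned_other) pairs of equal-length strings,
    '-' marking a gap. deletion[i] is the fraction of alignments in which residue i of
    the target faces a gap; insertion[i] is the fraction with one or more residues of the
    other sequence inserted directly after residue i. Leading insertions (before the
    first target residue) are ignored in this sketch.
    """
    n = len(target)
    deletion = [0] * n
    insertion = [0] * n
    for aligned_t, aligned_o in alignments:
        if len(aligned_t) != len(aligned_o) or aligned_t.replace("-", "") != target:
            raise ValueError("alignment does not match the target sequence")
        pos = -1          # index of the most recent target residue
        counted = False   # whether the current insertion run has already been counted
        for t, o in zip(aligned_t, aligned_o):
            if t != "-":
                pos += 1
                counted = False
                if o == "-":
                    deletion[pos] += 1
            elif o != "-" and pos >= 0 and not counted:
                insertion[pos] += 1
                counted = True
    m = max(len(alignments), 1)  # avoid dividing by zero when no alignments are given
    return [d / m for d in deletion], [x / m for x in insertion]


dels, ins = indel_frequency_arrays(
    "ACDE",
    [("ACDE", "A-DE"),      # C faces a gap in the partner (deletion)
     ("AC--DE", "ACGGDE"),  # two residues inserted after C (one insertion event)
     ("ACDE", "ACDE")],     # no indels
)
print(dels)  # [0.0, 0.3333333333333333, 0.0, 0.0]
print(ins)   # [0.0, 0.3333333333333333, 0.0, 0.0]
```

In a threading setting, arrays like these could replace a uniform gap penalty with position-specific penalties that are lower where indels are frequent.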
18.
Comput Biol Chem ; 32(3): 176-84, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18440870

ABSTRACT

Functional classification of genes represents one of the most basic problems in genome analysis and annotation. Our analysis of some popular methods for the functional classification of genes shows that these methods are not always consistent with each other and may not be specific enough for high-resolution functional annotation of genes. We have developed a method that integrates the genomic neighborhood information of genes with their sequence similarity information for the functional classification of prokaryotic genes. The application of our method to 93 proteobacterial genomes has shown that (i) genomic neighborhoods are much more conserved across prokaryotic genomes than expected by chance, and such conservation can be used to improve the functional classification of genes; (ii) while our method is as consistent with the existing popular schemes as they are among themselves, it provides functional classification at higher resolution and hence allows functional assignment of (new) genes at a more specific level; and (iii) our method is fairly stable when applied to different genomes.


Subject(s)
Algorithms , Classification/methods , Computational Biology/methods , Genes, Bacterial/physiology , Genomics/methods , Prokaryotic Cells/physiology , Cluster Analysis , Computer Simulation , Genes, Bacterial/genetics , Sensitivity and Specificity
19.
BMC Genomics ; 9: 36, 2008 Jan 24.
Article in English | MEDLINE | ID: mdl-18218090

ABSTRACT

BACKGROUND: Mobile genetic elements (MGEs) play an essential role in genome rearrangement and evolution, and are widely used as an important genetic tool. RESULTS: In this article, we present genetic maps of recently active insertion sequence (IS) elements, the simplest form of MGEs, for all sequenced cyanobacteria and archaea, predicted based on ~1,500 previously identified IS elements. Our predicted IS maps are consistent with the NCBI annotations of the IS elements. By linking the predicted IS elements to various characteristics of the organisms under study and their living conditions, we found that (a) the activities of IS elements depend heavily on the environments in which the host organisms live; (b) the number of recently active IS elements in a genome tends to increase with genome size; (c) the flanking regions of recently active IS elements are significantly enriched with genes encoding DNA binding factors, transporters and enzymes; and (d) IS movements show no tendency to disrupt operonic structures. CONCLUSION: These are the first genome-scale maps of IS elements with detailed structural information at the sequence level. These genetic maps of recently active IS elements, together with the observations reported here, should help improve our understanding of how IS elements proliferate and how they are involved in the evolution of their host genomes.


Subject(s)
Archaea/genetics , Archaea/metabolism , Cyanobacteria/genetics , Cyanobacteria/metabolism , DNA Transposable Elements , Mutagenesis, Insertional , Base Sequence , Chromosome Mapping , Chromosomes, Bacterial , Genome, Archaeal , Genome, Bacterial , Models, Genetic , Nucleic Acid Conformation , Open Reading Frames , Phylogeny , Repetitive Sequences, Nucleic Acid , Templates, Genetic , Terminal Repeat Sequences
20.
Article in English | MEDLINE | ID: mdl-17951836

ABSTRACT

As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or of allowing insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as equally probable events. We have analyzed the alignment patterns of homologous and analogous sequences to determine patterns of insertions and deletions, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as Insertion/Deletion (Indel) Frequency Arrays (IFA). By applying IFA to the protein threading problem, we have been able to improve alignment accuracy, especially for proteins with low sequence identity.


Subject(s)
Algorithms , Proteins/chemistry , Proteins/genetics , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Sequence Analysis/methods , Amino Acid Sequence , Gene Deletion , INDEL Mutation , Molecular Sequence Data , Structure-Activity Relationship