Abstract
Sequence similarity between a translated nucleotide sequence and a known biological protein can provide strong evidence for the presence of a homologous coding region, even between distantly related genes. The computer program BLASTX performs conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step. We characterized the sensitivity of BLASTX recognition to the presence of substitution, insertion and deletion errors in the query sequence and to sequence divergence. Reading frames were reliably identified in the presence of 1% query errors, a rate that is typical for primary sequence data. BLASTX is appropriate for use in moderate and large scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.
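The conceptual translation step described above can be illustrated with a minimal sketch. It assumes the standard genetic code and an unambiguous A/C/G/T query; the function names are hypothetical and this is not BLASTX's own code. It produces the six reading-frame translations that a protein database search would then be run against.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCCTAA")["+1"]` gives `"MA*"`. A single inserted or deleted base shifts the downstream coding sequence into a different frame, which is why all six frames are searched.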
Subjects
Databases, Factual; Proteins/genetics; Algorithms; Amino Acid Sequence; Animals; Molecular Sequence Data; Mutation; Probability; Rats; Ribosomal Proteins/genetics; Sequence Homology, Amino Acid; Software
Abstract
Colony-stimulating factor-1 (CSF-1) is essential for macrophage growth, differentiation and survival. Myeloid cells expressing a CSF-1 receptor mutant (DeltaKI) show markedly impaired CSF-1-mediated proliferation and survival, accompanied by absent signal transducers and activators of transcription 3 (Stat3) phosphorylation and reduced PI3-kinase/Akt activity. Restoring phosphatidylinositol 3-kinase (PI3-kinase) but not Stat3 signals reverses the mitogenic defect. CSF-1-induced proliferation and survival are sensitive to glycolytic inhibitors, 2-deoxyglucose and 3-bromopyruvate. Consistent with a critical role for PI3-kinase-regulated glycolysis, DeltaKI cells reconstituted with active PI3-kinase or Akt are hypersensitive to these inhibitors. CSF-1 upregulates hexokinase II (HKII) expression through PI3-kinase, and PI3-kinase transcriptionally activates the HKII promoter. Moreover, HKII overexpression partially restores mitogenicity. In contrast, Bcl-x(L) expression does not enhance long-term proliferation, although short-term cell death is suppressed in a glycolysis-independent manner. This study identifies robust PI3-kinase activation as essential for optimal CSF-1-mediated mitogenesis in myeloid cells, in part through regulation of HKII and support of glycolysis.
Subjects
Cell Proliferation/drug effects; Macrophage Colony-Stimulating Factor/pharmacology; Myeloid Cells/cytology; Myeloid Cells/drug effects; Phosphatidylinositol 3-Kinases/metabolism; Animals; Apoptosis/drug effects; Caspases/metabolism; Cell Survival/drug effects; Enzyme Stability/drug effects; Extracellular Signal-Regulated MAP Kinases/metabolism; Glycolysis/drug effects; Hexokinase/metabolism; Humans; Mice; Mutant Proteins/metabolism; Proto-Oncogene Proteins c-akt/metabolism; Receptor, Macrophage Colony-Stimulating Factor/metabolism; STAT3 Transcription Factor/metabolism; Signal Transduction/drug effects; bcl-X Protein/metabolism
Abstract
Molecular sequences are experimentally derived data that can be expected to contain errors as a result of diverse phenomena such as biological variation, molecular cloning artifacts, imperfect sequence determination, and data handling during contig assembly. Errors will affect the reliability of database searches and sequence alignments, but their impact may be minimized by the use of analytical techniques that anticipate that the data will be imperfect.
Subjects
Data Interpretation, Statistical; Molecular Sequence Data; Amino Acid Sequence; Reproducibility of Results
Abstract
Colony-stimulating factor 1 (CSF-1) supports the proliferation, survival, and differentiation of bone marrow-derived cells of the monocytic lineage. In the myeloid progenitor 32D cell line expressing CSF-1 receptor (CSF-1R), CSF-1 activation of the extracellular signal-regulated kinase (ERK) pathway is both Ras and phosphatidylinositol 3-kinase (PI3-kinase) dependent. PI3-kinase inhibition did not influence events leading to Ras activation. Using the activity of the PI3-kinase effector, Akt, as readout, studies with dominant-negative and oncogenic Ras failed to place PI3-kinase downstream of Ras. Thus, PI3-kinase appears to act in parallel to Ras. PI3-kinase inhibitors enhanced CSF-1-stimulated A-Raf and c-Raf-1 activities, and dominant-negative A-Raf but not dominant-negative c-Raf-1 reduced CSF-1-provoked ERK activation, suggesting that A-Raf mediates a part of the stimulatory signal from Ras to MEK/ERK, acting in parallel to PI3-kinase. Unexpectedly, a CSF-1R lacking the PI3-kinase binding site (DeltaKI) remained capable of activating MEK/ERK in a PI3-kinase-dependent manner. To determine if Src family kinases (SFKs) are involved, we demonstrated that CSF-1 activated Fyn and Lyn in cells expressing wild-type (WT) or DeltaKI receptors. Moreover, CSF-1-induced Akt activity in cells expressing DeltaKI is SFK dependent since Akt activation was prevented by pharmacological or genetic inhibition of SFK activity. The docking protein Gab2 may link SFK to PI3-kinase. CSF-1 induced Gab2 tyrosyl phosphorylation and association with PI3-kinase in cells expressing WT or DeltaKI receptors. However, only in DeltaKI cells are these events prevented by PP1. Thus in myeloid progenitors, CSF-1 can activate the PI3-kinase/Akt pathway by at least two mechanisms, one involving direct receptor binding and one involving SFKs.
Subjects
Adaptor Proteins, Signal Transducing; Macrophage Colony-Stimulating Factor/metabolism; Mitogen-Activated Protein Kinase 1/metabolism; Mitogen-Activated Protein Kinase Kinases/metabolism; Phosphatidylinositol 3-Kinases/metabolism; Protein Serine-Threonine Kinases/metabolism; src-Family Kinases/metabolism; Animals; Binding Sites; Cell Line; Cyclic AMP/metabolism; Enzyme Activation; GRB2 Adaptor Protein; Humans; Interleukin-3/metabolism; Interleukin-3/pharmacology; MAP Kinase Kinase 1; Mice; Phosphoinositide-3 Kinase Inhibitors; Proteins/metabolism; Proto-Oncogene Proteins/metabolism; Proto-Oncogene Proteins c-akt; Proto-Oncogene Proteins c-raf/metabolism; Receptor, Macrophage Colony-Stimulating Factor/metabolism; Signal Transduction; Stem Cells; ras Proteins/metabolism
Abstract
The apparently complete refolding of reduced bovine pancreatic trypsin inhibitor (BPTI) is shown to produce a mixture of two species. One of these is native BPTI, but the other lacks the disulphide bond between cysteines 30 and 51. The latter species has a folded conformation very like that of native BPTI, and is oxidized by air to native BPTI on warming in aqueous solution. The two unreactive cysteine thiol groups appear to be buried in the interior of the molecule, which restricts access by reagents that can alkylate them or oxidize them to form the disulphide bond. The implications of this intermediate and its conformation for the understanding of protein folding are discussed.
Subjects
Aprotinin; Disulfides; Amino Acid Sequence; Animals; Cattle; Chromatography; Cysteine; Electrophoresis; Magnetic Resonance Spectroscopy; Protein Conformation
Abstract
Intermediates in the folding pathway of the bovine pancreatic trypsin inhibitor (PTI) have been examined by 1H nuclear magnetic resonance (n.m.r.). The intermediates were trapped during the reoxidation and consequent refolding of reduced PTI by alkylating free thiols; each intermediate contained different disulphide linkages. The n.m.r. spectra reveal that conformational features of the native protein are present in the intermediate containing just one of the three normal disulphide linkages (30-51). As additional normal disulphide bonds are formed, the conformation becomes more similar to that of the native protein. Introduction of additional but incorrect disulphide bonds does not lead to an increase in observable globular structure. A description of the folding process in terms of the conformations of the different intermediates is proposed. The significance of these results for the general mechanism of protein folding is outlined.
Subjects
Aprotinin; Amino Acid Sequence; Animals; Cattle; Magnetic Resonance Spectroscopy; Protein Conformation; Temperature; Thermodynamics
Abstract
The pH dependence of the exchange rates for a number of tryptophan and amide hydrogen atoms in hen egg-white lysozyme has been determined at temperatures well below the thermal denaturation temperature. The pH behaviour of each hydrogen is unique and can differ markedly from that of simple compounds. A model for electrostatic effects in proteins is described and used to explain a number of the features of the pH dependence of the exchange rates of certain hydrogens. The results indicate that exchange takes place from a conformation of the protein closely similar to that of the native protein, with local fluctuations providing the mechanism for exchange. For the more-buried hydrogens at low pH values there is a general increase in the exchange rates caused by the decreasing stability of the protein as calculated from the electrostatic model. The analysis shows how evidence from hydrogen exchange studies can be used to provide information about electrostatic interactions in localized regions of proteins. A description of the electrostatic model and some applications are given in the Appendix.
Subjects
Hydrogen/metabolism; Muramidase/metabolism; Amino Acid Sequence; Animals; Cytochrome c Group; Electrophysiology; Hydrogen-Ion Concentration; Models, Biological; Poultry; Protein Conformation; Temperature; Tryptophan/metabolism
Abstract
A computer program called BLASTX was previously shown to be effective in identifying and assigning putative function to likely protein coding regions by detecting significant similarity between a conceptually translated nucleotide query sequence and members of a protein sequence database. We present and assess the sensitivity of a new option to this software tool, herein called BLASTC, which employs information obtained from biases in codon utilization, along with the information obtained from sequence similarity. A rationale for combining these diverse information sources was derived, and analyses of the information available from codon utilization in several species were performed, with wide variation seen. Codon bias information was found on average to improve the sensitivity of detection of short coding regions of human origin by about a factor of 5. The implications of combining information sources on the interpretation of positive findings are discussed.
Subjects
Codon; Sequence Analysis, DNA/methods; Software; Algorithms; Amino Acid Sequence; Animals; Bacillus subtilis; Base Sequence; Databases, Factual; Drosophila melanogaster; Escherichia coli; Humans; Molecular Sequence Data; Saccharomyces cerevisiae; Schizosaccharomyces; Sequence Homology, Amino Acid; Sequence Homology, Nucleic Acid
Abstract
There is an inherent relationship between the process of pairwise sequence alignment and the estimation of evolutionary distance. This relationship is explored and made explicit. Assuming an evolutionary model and given a specific pattern of observed base mismatches, the relative probabilities of evolution at each evolutionary distance are computed using a Bayesian framework. The mean or the median of this probability distribution provides a robust estimate of the central value. The evolutionary distance has traditionally been computed as zero for an observed homology of 20 bases with no mismatches; we prove that it is highly probable that the distance is greater than 0.01. The mean of the distribution is 0.047, which is a better estimate of the evolutionary distance. Bayesian estimates of the evolutionary distance incorporate arbitrary prior information about variable mutation rates both over time and along sequence position, thus requiring only a weak form of the molecular-clock hypothesis. The endpoints of the similarity between genomic DNA sequences are often ambiguous. The probability of evolution at each evolutionary distance can be estimated over the entire set of alignments by choosing the best alignment at each distance and the corresponding probability of duplication at that evolutionary distance. A central value of this distribution provides a robust evolutionary distance estimate. We provide an efficient algorithm for computing the parametric alignment, considering evolutionary distance as the only parameter. These techniques and estimates are used to infer the duplication history of the genomic sequence in C. elegans and in S. cerevisiae. Our results indicate that repeats discovered using a single scoring matrix show a considerable bias in subsequent evolutionary distance estimates.
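The Bayesian reasoning in this abstract can be illustrated with a minimal numerical sketch. The abstract does not state the evolutionary model or prior used, so the Jukes-Cantor substitution model, the flat prior on distances in [0, 1], and all function names below are assumptions made for illustration. The resulting values are therefore not expected to match the reported 0.047 exactly.

```python
import math


def match_probability(d):
    """Jukes-Cantor probability that a site is identical at distance d
    (expected substitutions per site). Assumed model, not the paper's."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def posterior_over_distance(n_match, n_mismatch, d_max=1.0, steps=10000):
    """Grid approximation of the posterior over d under a flat prior on [0, d_max]."""
    grid = [d_max * (i + 0.5) / steps for i in range(steps)]
    weights = []
    for d in grid:
        p = match_probability(d)
        # Likelihood of the observed pattern: matches and mismatches are independent sites.
        weights.append(p ** n_match * (1.0 - p) ** n_mismatch)
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_mean(grid, post):
    """Mean of the gridded posterior distribution."""
    return sum(d * w for d, w in zip(grid, post))


def prob_greater_than(grid, post, threshold):
    """Posterior probability that the distance exceeds the threshold."""
    return sum(w for d, w in zip(grid, post) if d > threshold)
```

Calling `posterior_over_distance(20, 0)` and passing the result to `posterior_mean` and `prob_greater_than(..., 0.01)` shows the qualitative point made above: even with 20 matching bases and no mismatches, most of the posterior mass lies at distances above 0.01, so a point estimate of zero understates the distance.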
Subjects
Bayes Theorem; Biological Evolution; Sequence Alignment; Animals; Base Sequence; Caenorhabditis elegans/genetics; Computer Simulation; DNA Mutational Analysis; Molecular Sequence Data; Probability; Saccharomyces cerevisiae/genetics; Sequence Homology, Nucleic Acid
Abstract
In four-color fluorescence-based automated DNA sequencing, a 4 x 4 filter matrix parameterizes the relationship between the dye-intensity signals of interest and the data collected by an optical imaging system. The filter matrix is important because the estimated DNA sequence is based on the dye intensities that can only be recovered via inversion of the matrix. In this paper, we present a calibration method for the estimation of the columns of this matrix, using data generated through a special experiment in which DNA samples are labeled with only one fluorescent dye at a time. Simulations and applications of the method to real data are provided, with promising results.
Subjects
Image Processing, Computer-Assisted; Sequence Analysis, DNA/methods; Algorithms; Coloring Agents; Computer Simulation; Linear Models; Models, Genetic; Optics and Photonics; Random Allocation; Signal Processing, Computer-Assisted
Subjects
Antigens, CD/chemistry; Antigens, Differentiation/chemistry; Aplysia/enzymology; N-Glycosyl Hydrolases/chemistry; ADP-ribosyl Cyclase; ADP-ribosyl Cyclase 1; Amino Acid Sequence; Animals; Humans; Membrane Glycoproteins; Molecular Sequence Data; Sequence Homology, Amino Acid
Abstract
Determining whether two DNA sequences are similar is an essential component of DNA sequence analysis. Dynamic programming is the algorithm of choice if computational time is not the most important consideration. Heuristic search tools, such as BLAST, are computationally more efficient, but they may miss some of the sequence similarities (Altschul et al., 1990). These tools often use common k-tuples (words) between the two sequences to determine anchor points for the alignment, and spend most of their computational time extending the alignment beyond these anchor points. We discuss and provide a DNA sequence similarity search implementation (called SENSEI) that improves upon the performance of BLASTN by almost an order of magnitude for comparable sensitivity. This improvement is a result of using compactly encoded scoring tables for k-tuples, encoding bases with a single bit, filtering the sequence to remove the simple sequence repeats using XNUN, and masking the known species-specific repeats in the query sequence. To reduce memory requirements, especially for large genomic DNA query sequences, we recommend generating the neighborhood words from the target sequence at run-time, instead of generating them by preprocessing the query sequence.
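The anchor-and-extend strategy described above can be sketched as follows. This is a simplified illustration using exact k-tuple matches and ungapped extension only; it does not implement SENSEI's scoring tables, neighborhood words, or repeat masking, and the function names are hypothetical.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map each k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def find_anchors(query, target, k):
    """Return (query_pos, target_pos) pairs where the two sequences share a k-tuple."""
    index = index_words(query, k)
    anchors = []
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            anchors.append((i, j))
    return anchors


def extend_ungapped(query, target, i, j, k):
    """Extend an exact word match in both directions while bases keep matching.
    Returns (query_start, target_start, length) of the extended segment."""
    left = 0
    while i - left - 1 >= 0 and j - left - 1 >= 0 and query[i - left - 1] == target[j - left - 1]:
        left += 1
    right = k
    while i + right < len(query) and j + right < len(target) and query[i + right] == target[j + right]:
        right += 1
    return i - left, j - left, left + right
```

Here the target words are generated while scanning rather than stored, which keeps memory use proportional to the indexed sequence. A production tool would score extensions with a substitution matrix and a drop-off threshold instead of stopping at the first mismatch.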
Subjects
Sequence Analysis, DNA/methods; Sequence Homology, Nucleic Acid; Base Sequence; Databases, Factual; Gene Library; Glucosephosphate Dehydrogenase/genetics; Humans; Molecular Sequence Data; Repetitive Sequences, Nucleic Acid; Software
Abstract
Molecular sequences, like all experimental data, have finite error rates. The impact of errors on the information content of molecular sequence data is dependent on the analytic paradigm used to interpret the data. We studied the impact of nucleic acid sequence errors on the ability to align predicted amino acid sequences with the sequences of related proteins. We found that with a simultaneous translation and alignment algorithm, identification of sequence homologies is resilient to the introduction of random errors. Proteins with greater than 30% sequence identity can be reliably recognized even in the presence of 1% frameshifting (insertion or deletion) error rates and 5% base substitution rates. Incorporation of prior knowledge about the location and characteristics of errors improves tolerance to error of amino acid sequence alignments. Similarly, inclusion of prior knowledge of biased codon utilization by yeast (Saccharomyces cerevisiae) allows reliable detection of correct reading frames in yeast sequences even in the presence of 5% substitution and 1% frameshift errors.
Subjects
Amino Acid Sequence; Base Sequence; Codon; Algorithms; Animals; Bayes Theorem; Humans; Neutrophils/enzymology; Pancreatic Elastase/genetics; Saccharomyces cerevisiae/genetics; Schistosoma/enzymology; Sequence Homology, Nucleic Acid; Trypsin/genetics
Abstract
DNA sequence analysis depends on the accurate assembly of fragment reads for the determination of a consensus sequence. This report examines the possibility of analyzing multiple, independent restriction digests as a method for testing the fidelity of sequence assembly. A dynamic programming algorithm to determine the maximum likelihood alignment of error prone electrophoretic mobility data to the expected fragment mobilities given the consensus sequence and restriction enzymes is derived and used to assess the likelihood of detecting rearrangements in genomic sequencing projects. The method is shown to reliably detect errors in sequence fragment assembly without the necessity of making reference to an overlying physical map. An HTML form-based interface is available at http://www.ibc.wustl.edu/services/validate.html.
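The "expected fragment" side of this comparison can be sketched in a few lines. The example below only predicts fragment lengths from a consensus sequence for one enzyme, assuming a linear molecule and a single-strand exact-match search; it does not implement the maximum likelihood alignment to mobility data, and the function names are hypothetical.

```python
def cut_positions(seq, site, cut_offset):
    """Positions at which an enzyme cuts: every occurrence of the
    recognition site, shifted by the cut offset within the site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site, cut_offset=0):
    """Expected fragment lengths (largest first) for a complete digest
    of a linear molecule."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)
```

For instance, EcoRI recognizes GAATTC and cuts after the first G, so `fragment_lengths("AAGAATTCTTTTGAATTCGG", "GAATTC", 1)` returns `[10, 7, 3]`. A misassembly that moves or inverts a segment containing a site changes this list, which is what makes independent digests a useful check.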
Subjects
Restriction Mapping/methods; Sequence Analysis, DNA/methods; Algorithms; Artificial Intelligence; Base Sequence; DNA/genetics; DNA Fingerprinting; Reproducibility of Results; Restriction Mapping/statistics & numerical data; Sequence Alignment/methods; Sequence Alignment/statistics & numerical data; Sequence Analysis, DNA/statistics & numerical data
Abstract
Over 3.6 million bases of DNA sequence from chromosome III of C. elegans have been determined. The availability of this extended region of contiguous sequence has allowed us to analyze the nature and prevalence of repetitive sequences in the genome of a eukaryotic organism with a high gene density. We have assembled a Repeat Pattern Toolkit (RPT) to analyze the patterns of repeats occurring in DNA. The tools include identifying significant local alignments (utilizing both two-way and three-way alignments), dividing the set of alignments into connected components (signifying repeat families), computing evolutionary distance between repeat family members, constructing minimum spanning trees from the connected components, and visualizing the evolution of the repeat families. Over 7000 families of repetitive sequences were identified. The size of the families ranged from isolated pairs to over 1600 segments of similar sequence. Approximately 12.3% of the analyzed sequence participates in a repeat element.
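The step of dividing alignments into connected components can be sketched with a union-find structure. This is an illustrative sketch, not the RPT code: segments are assumed to be numbered from 0, each significant alignment is assumed to be given as a pair of segment indices, and the function name is hypothetical.

```python
def repeat_families(n_segments, alignments):
    """Group segments into families: two segments share a family if they
    are linked by a chain of pairwise alignments (connected components)."""
    parent = list(range(n_segments))

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in alignments:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = {}
    for segment in range(n_segments):
        families.setdefault(find(segment), []).append(segment)
    # A segment with no alignment to any other segment is not a repeat.
    return [members for members in families.values() if len(members) > 1]
```

With six segments and alignments `[(0, 1), (1, 2), (4, 5)]`, the result contains one family of three segments and one isolated pair, while segment 3 is left out.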
Subjects
Caenorhabditis elegans/genetics; Repetitive Sequences, Nucleic Acid/genetics; Animals; Biological Evolution; Genome; Models, Theoretical; Sequence Analysis
Abstract
MOTIVATION: Searching a protein sequence database for homologs is a powerful tool for discovering the structure and function of a sequence. Two new methods for searching sequence databases have recently been described: Probabilistic Smith-Waterman (PSW), which is based on Hidden Markov models for a single sequence using a standard scoring matrix, and a new version of BLAST (WU-BLAST2), which uses Sum statistics for gapped alignments. RESULTS: This paper compares and contrasts the effectiveness of these methods with three older methods (Smith-Waterman: SSEARCH, FASTA and BLASTP). The analysis indicates that the new methods are useful, and often offer improved accuracy. These tools are compared using a curated (by Bill Pearson) version of the annotated portion of PIR 39. Three different statistical criteria are utilized: equivalence number, minimum errors and the receiver operating characteristic. For complete-length protein query sequences from large families, PSW's accuracy is superior to that of the other methods, but its accuracy is poor when used with partial-length query sequences. False negatives are twice as common as false positives irrespective of the search methods if a family-specific threshold score that minimizes the total number of errors (i.e. the most favorable threshold score possible) is used. Thus, sensitivity, not selectivity, is the major problem. Among the analyzed methods using default parameters, the best accuracy was obtained from SSEARCH and PSW for complete-length proteins, and the two BLAST programs, plus SSEARCH, for partial-length proteins.
Subjects
Databases, Factual; Sequence Alignment/methods; Sequence Homology, Amino Acid; Information Storage and Retrieval; Mathematical Computing; Proteins/chemistry
Abstract
Flexible regions of proteins play an important role in catalysis, ligand binding, and macromolecular interactions. Because of its enhanced sensitivity to motional narrowing, two-dimensional coupling constant J-correlated 1H NMR may be used to observe these regions selectively. Dynamic filtering is an intrinsic feature of this experiment because cross-peak amplitude decays rapidly as linewidths approach the coupling constant. We demonstrate here the flexibility of the NH2-terminal arm of phage lambda repressor, which is thought to wrap around the double helix in the repressor-operator complex. The assignment of arm resonances is made possible by the construction of mutant repressor genes containing successive NH2-terminal deletions.
Subjects
Bacteriophage lambda/genetics; Protein Conformation; Repressor Proteins; Transcription Factors; DNA Ligases; Escherichia coli/genetics; Magnetic Resonance Spectroscopy/methods; Mutation
Abstract
MOTIVATION: Current methods for identifying sequence specific binding sites in DNA sequence using position specific weight matrices are limited in both sensitivity and specificity. The double-stranded DNA helix exhibits sequence dependent variations in conformation. Interactions between macromolecules result from complementarity of the two tertiary structures. We hypothesize that this conformational variation plays a role in transcription factor binding site recognition, and that the use of this structure information will improve the predictive power of transcription factor binding site models. RESULTS: Conformation models for the sequence dependence of DNA helix distortion have been developed. Using our conformational models, we defined a tertiary structure template for the met operon repressor MetJ binding site. Both naturally occurring sites and precursor binding sites identified through in vitro selection were used as the basis for template definition. The conformational model appears to recognize features of protein binding sites that are distinct from the features recognized by primary sequence based profiles. Combining the conformational model and primary sequence profile yields a hybrid model with improved discriminatory power compared with either the conformational model or sequence profile alone. Using our hybrid model, we searched the E. coli genome. We are able to identify the documented MetJ sites in the promoter regions of metA, metB, metC, metR and metF. In addition, we find several novel loci with characteristics suggesting that they are functional MetJ repressor binding sites. Novel MetJ binding sites are found upstream of the metK gene, as well as upstream of abc, a gene that encodes a component of a multifunction transporter which may transport amino acids across the membrane. The false positive rate is significantly lower than that of the sequence profile method.
AVAILABILITY: Programs implementing this algorithm are available upon request. The list of crystal structures used for compiling the mean base step parameters of DNA is available by anonymous ftp at http://stateslab.wustl.edu/pub/helix/StructureList.