Results 1 - 20 of 32
1.
J Cancer Res Clin Oncol ; 150(5): 258, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38753091

ABSTRACT

PURPOSE: Breast cancer (BC) is the most prevalent malignant tumor worldwide among women, with the highest incidence rate. The mechanisms underlying nucleotide metabolism on biological functions in BC remain incompletely elucidated. MATERIALS AND METHODS: We harnessed differentially expressed nucleotide metabolism-related genes from The Cancer Genome Atlas-BRCA, constructing a prognostic risk model through univariate Cox regression and LASSO regression analyses. A validation set and the GSE7390 dataset were used to validate the risk model. Clinical relevance, survival and prognosis, immune infiltration, functional enrichment, and drug sensitivity analyses were conducted. RESULTS: Our findings identified four signature genes (DCTPP1, IFNG, SLC27A2, and MYH3) as nucleotide metabolism-related prognostic genes. Subsequently, patients were stratified into high- and low-risk groups, revealing the risk model's independence as a prognostic factor. Nomogram calibration underscored superior prediction accuracy. Gene Set Variation Analysis (GSVA) uncovered activated pathways in low-risk cohorts and mobilized pathways in high-risk cohorts. Distinctions in immune cells were noted between risk cohorts. Subsequent experiments validated that reducing SLC27A2 expression in BC cell lines or using the SLC27A2 inhibitor, Lipofermata, effectively inhibited tumor growth. CONCLUSIONS: We pinpointed four nucleotide metabolism-related prognostic genes, demonstrating promising accuracy as a risk prediction tool for patients with BC. SLC27A2 appears to be a potential therapeutic target for BC among these genes.


Subjects
Breast Neoplasms; Humans; Female; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Prognosis; Risk Assessment/methods; Nucleotides/genetics; Nomograms; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; Animals; Gene Expression Regulation, Neoplastic; Mice; Cell Line, Tumor
2.
Genome Biol ; 22(1): 165, 2021 05 27.
Article in English | MEDLINE | ID: mdl-34044851

ABSTRACT

Advancing RNA structural probing techniques with next-generation sequencing has generated demands for complementary computational tools to robustly extract RNA structural information amidst sampling noise and variability. We present diffBUM-HMM, a noise-aware model that enables accurate detection of RNA flexibility and conformational changes from high-throughput RNA structure-probing data. diffBUM-HMM is widely compatible, accounting for sampling variation and sequence coverage biases, and displays higher sensitivity than existing methods while robust against false positives. Our analyses of datasets generated with a variety of RNA probing chemistries demonstrate the value of diffBUM-HMM for quantitatively detecting RNA structural changes and RNA-binding protein binding sites.


Subjects
Algorithms; High-Throughput Nucleotide Sequencing; Markov Chains; Models, Statistical; RNA/chemistry; RNA/genetics; Base Sequence; Binding Sites; Databases, Genetic; Models, Theoretical; Mutation/genetics; Nucleotides/genetics; Protein Binding; RNA Precursors/genetics; RNA, Long Noncoding/genetics; Ribosomes/metabolism
3.
Mol Phylogenet Evol ; 154: 106966, 2021 01.
Article in English | MEDLINE | ID: mdl-32971285

ABSTRACT

Although numerous studies have demonstrated the theoretical and empirical importance of treating gaps as insertion/deletion (indel) events in phylogenetic analyses, the standard approach to maximum likelihood (ML) analysis employed in the vast majority of empirical studies codes gaps as nucleotides of unknown identity ("missing data"). Therefore, it is imperative to understand the empirical consequences of different numbers and distributions of gaps treated as missing data. We evaluated the effects of variation in the number and distribution of gaps (i.e., no base, coded as IUPAC "." or "-") treated as missing data (i.e., any base, coded as "?" or IUPAC "N") in standard ML analysis. We obtained alignments with variable numbers and arrangements of gaps by aligning seven diverse empirical datasets under different gap opening costs using MAFFT. We selected the optimal substitution model for each alignment using the corrected Akaike Information Criterion in jModelTest2 and searched for optimal trees using GARLI. We also employed a Monte Carlo approach to randomly replace nucleotides with gaps (treated as missing data) in an empirical dataset to understand more precisely the effects of varying their number and distribution. To compare alignments, we developed four new indices and used several existing measures to quantify the number and distribution of gaps in all alignments. Our most important finding is that ML scores correlate negatively with gap opening costs and the amount of missing data. However, this negative relationship is not due to the increase in missing data per se (which increases ML scores) but instead to the effect of gaps on nucleotide homology. These variables also cause significant but largely unpredictable effects on tree topology.


Subjects
Phylogeny; Databases, Genetic; Likelihood Functions; Monte Carlo Method; Nucleotides/genetics; Reference Standards; Sequence Alignment
4.
PLoS Genet ; 16(10): e1009100, 2020 10.
Article in English | MEDLINE | ID: mdl-33085659

ABSTRACT

Elucidating the functional consequence of molecular defects underlying genetic diseases enables appropriate design of therapeutic options. Treatment of cystic fibrosis (CF) is an exemplar of this paradigm as the development of CFTR modulator therapies has allowed for targeted and effective treatment of individuals harboring specific genetic variants. However, the mechanism of these drugs limits effectiveness to particular classes of variants that allow production of CFTR protein. Thus, assessment of the molecular mechanism of individual variants is imperative for proper assignment of these precision therapies. This is particularly important when considering variants that affect pre-mRNA splicing, thus limiting success of the existing protein-targeted therapies. Variants affecting splicing can occur throughout exons and introns and the complexity of the process of splicing lends itself to a variety of outcomes, both at the RNA and protein levels, further complicating assessment of disease liability and modulator response. To investigate the scope of this challenge, we evaluated splicing and downstream effects of 52 naturally occurring CFTR variants (exonic = 15, intronic = 37). Expression of constructs containing select CFTR intronic sequences and complete CFTR exonic sequences in cell line models allowed for assessment of RNA and protein-level effects on an allele by allele basis. Characterization of primary nasal epithelial cells obtained from individuals harboring splice variants corroborated in vitro data. Notably, we identified exonic variants that result in complete missplicing and thus a lack of modulator response (e.g. c.2908G>A, c.523A>G), as well as intronic variants that respond to modulators due to the presence of residual normally spliced transcript (e.g. c.4242+2T>C, c.3717+40A>G). 
Overall, our data reveal diverse molecular outcomes among both exonic and intronic variants, emphasizing the need to delineate RNA, protein, and functional effects of each variant in order to accurately assign precision therapies.


Subjects
Cystic Fibrosis Transmembrane Conductance Regulator/genetics; Cystic Fibrosis/genetics; Cystic Fibrosis/therapy; RNA Splicing/genetics; Alternative Splicing/genetics; Amino Acid Substitution/genetics; Chlorides/metabolism; Cystic Fibrosis/pathology; Electromyography; Exons/genetics; Genetic Variation/genetics; HEK293 Cells; Humans; Introns/genetics; Nasal Mucosa/metabolism; Nasal Mucosa/pathology; Nucleotides/genetics; Precision Medicine/methods; Primary Cell Culture; RNA, Messenger/genetics
5.
J Math Biol ; 80(4): 995-1019, 2020 03.
Article in English | MEDLINE | ID: mdl-31705189

ABSTRACT

Deciding whether a substitution matrix is embeddable (i.e. the corresponding Markov process has a continuous-time realization) is an open problem even for [Formula: see text] matrices. We study the embedding problem and rate identifiability for the K80 model of nucleotide substitution. For these [Formula: see text] matrices, we fully characterize the set of embeddable K80 Markov matrices and the set of embeddable matrices for which rates are identifiable. In particular, we describe an open subset of embeddable matrices with non-identifiable rates. This set contains matrices with positive eigenvalues and also diagonal largest in column matrices, which might lead to consequences in parameter estimation in phylogenetics. Finally, we compute the relative volumes of embeddable K80 matrices and of embeddable matrices with identifiable rates. This study concludes the embedding problem for the more general model K81 and its submodels, which had been initiated by the last two authors in a separate work.
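The link between a K80 Markov matrix and its rates can be illustrated with a minimal sketch. It assumes the usual K80 parameterization (one transition rate alpha, one rate beta for each of the two transversions) and only examines the principal logarithm, so it is narrower than the full characterization described in the abstract; the function name and arguments are hypothetical.

```python
import math

def k80_principal_rates(b, c):
    """Recover (alpha*t, beta*t) from the principal logarithm of a K80 matrix.

    b: probability of the transition (e.g. A->G)
    c: probability of each of the two transversions (e.g. A->C and A->T)
    Returns None when the principal logarithm is not a valid rate matrix.
    """
    lam_tv = 1.0 - 4.0 * c            # eigenvalue exp(-4*beta*t)
    lam_ts = 1.0 - 2.0 * b - 2.0 * c  # double eigenvalue exp(-2*(alpha+beta)*t)
    if lam_tv <= 0.0 or lam_ts <= 0.0:
        return None  # the principal logarithm is not real
    beta_t = -math.log(lam_tv) / 4.0
    alpha_t = -math.log(lam_ts) / 2.0 - beta_t
    if alpha_t < 0.0 or beta_t < 0.0:
        return None  # negative off-diagonal entries: not a rate matrix
    return alpha_t, beta_t
```

A matrix rejected here may still be embeddable through a non-principal logarithm, which is the kind of case the abstract's non-identifiability result concerns.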


Subjects
Models, Genetic; Mutation Rate; Phylogeny; Evolution, Molecular; Markov Chains; Mathematical Concepts; Mutation; Nucleotides/genetics
6.
FEBS Lett ; 593(9): 918-925, 2019 05.
Article in English | MEDLINE | ID: mdl-30941752

ABSTRACT

Base composition asymmetry and gene orientation bias are two common genomic structures in bacterial genomes. Here, correlation coefficients between nucleotide disparities and coding sequence (CDS) skew have been calculated, which provides insights into their relationship from an individual genome perspective. Consequently, we find GC and RY disparities correlate significantly with CDS skew, since around 60% of the bacterial genomes under study have correlation coefficients > 0.9. Then, we present a model for quantitative assessment of nucleotide disparity and CDS skew in which a numerical index R2 is used for evaluation. We find that skew curves with higher R2 perform better on the prediction of replication origins in bacteria.
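As a rough illustration of a skew curve, the sketch below computes a cumulative GC skew over fixed windows and reports the window where it is lowest, which is commonly taken as a candidate replication origin. This is a generic technique, not the authors' model or their R2 index; the function names and the default window size are assumptions.

```python
def cumulative_gc_skew(seq, window=1000):
    """Cumulative (G - C) / (G + C) over consecutive non-overlapping windows."""
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # skip windows with no G or C to avoid division by zero
            total += (g - c) / (g + c)
        curve.append(total)
    return curve

def lowest_skew_window(curve):
    """Index of the window where the cumulative skew reaches its minimum."""
    return min(range(len(curve)), key=curve.__getitem__)
```

For a toy sequence such as `"C" * 20 + "G" * 20` with `window=10`, the curve falls over the C-rich half and rises over the G-rich half, so the minimum sits at the boundary between them.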


Subjects
Genome, Bacterial/genetics; Base Composition; Genomics; Models, Genetic; Nucleotides/genetics
7.
Sci Rep ; 9(1): 3125, 2019 02 28.
Article in English | MEDLINE | ID: mdl-30816181

ABSTRACT

Next-generation sequencing (NGS) technologies play a powerful role in the preparation of large DNA databases, such as those for DNA barcoding, since they can produce a large number of sequence reads. Here we demonstrate a primer-induced sample labeling method aimed at sequencing a large number of samples simultaneously on NGS platforms. The strategy is to label samples with a unique oligo attached to the 5'-ends of primers. As a case study, 894 unique pentanucleotide oligos were attached to the 5'-ends of three pairs of primers (for amplifying ITS, matK and rbcL) to label 894 samples. All PCR products of the three barcodes of the 894 samples were mixed together and sequenced on a high-throughput sequencing platform. The results showed that 87.02%, 89.15% and 95.53% of the samples were successfully sequenced for rbcL, matK and ITS, respectively. The mean ratio of label mismatches for the three barcodes was 5.68%, and a sequencing depth of 30× to 40× was enough to obtain reliable sequences. Any number of samples can be labeled simply by adjusting the length of the oligos. This easy, reliable and cost-efficient method is useful for sequencing a large number of samples to construct reference libraries for DNA barcoding, population biology and community phylogenetics.
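The relationship between label length and sample count can be sketched as follows: with a four-letter alphabet there are 4^k distinct labels of length k, so 894 samples need k = 5 (1,024 possible pentanucleotides). The code is an illustrative assumption, not the authors' label design; in particular, it simply enumerates labels in alphabetical order and applies no filtering for sequence quality.

```python
from itertools import product

def min_label_length(n_samples, alphabet="ACGT"):
    """Smallest k such that len(alphabet)**k labels cover n_samples."""
    length = 1
    while len(alphabet) ** length < n_samples:
        length += 1
    return length

def make_labels(n_samples, alphabet="ACGT"):
    """First n_samples distinct labels of the minimal sufficient length."""
    k = min_label_length(n_samples, alphabet)
    labels = ("".join(p) for p in product(alphabet, repeat=k))
    return [next(labels) for _ in range(n_samples)]
```

Calling `min_label_length(1025)` gives 6, showing how one extra nucleotide quadruples the number of samples that can be pooled.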


Subjects
DNA Barcoding, Taxonomic/methods; High-Throughput Nucleotide Sequencing/methods; Plants/genetics; DNA Barcoding, Taxonomic/economics; DNA Primers/genetics; DNA, Plant/genetics; High-Throughput Nucleotide Sequencing/economics; Nucleotides/genetics
8.
Mol Biol Rep ; 46(1): 1327-1333, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30456740

ABSTRACT

We report the complete mitochondrial genome of the Northern Indian red muntjac, Muntiacus vaginalis, and its phylogenetic inferences. The mitogenome was 16,352 bp in length, and the overall base composition of the circular genome was A = 33.2%, T = 29.0%, C = 24.50% and G = 13.30%. It exhibited a typical mitogenome structure, including 22 transfer RNA genes, 13 protein-coding genes, two ribosomal RNA genes and a major non-coding control region (D-loop region). All the genes except ND6 and eight tRNAs were encoded on the heavy strand. Phylogenetic analyses showed that M. vaginalis is closely related to M. muntjak and formed a sister relationship with Elaphodus cephalophus. In view of the unclear distribution range and escalating habitat loss, it is important to identify its population genetic status. The complete mitogenome described in this study can be used in further phylogenetic studies, in identifying extant maternal lineages and evolutionarily significant units, and in genetic conservation.


Subjects
Genome, Mitochondrial; Muntjacs/genetics; Phylogeny; Animals; Bayes Theorem; India; Markov Chains; Monte Carlo Method; Nucleotides/genetics; Open Reading Frames/genetics; RNA, Ribosomal/genetics; RNA, Transfer/genetics
9.
Biochem Biophys Res Commun ; 473(1): 243-248, 2016 Apr 22.
Article in English | MEDLINE | ID: mdl-27005821

ABSTRACT

During DNA replication, bacterial helicase is recruited as a complex in association with loader proteins to unwind the parental duplex. Previous structural studies have reported saturated 6:6 helicase-loader complexes with different conformations. However, structural information on the sub-stoichiometric conformations of these previously documented helicase-loader complexes remains elusive. Here, with the aid of single-particle electron microscopy (EM) image reconstruction, we present the Geobacillus kaustophilus HTA426 helicase-loader (DnaC-DnaI) complex with a 6:2 binding stoichiometry in the presence of ATPγS. In the 19 Å resolution EM map, the undistorted and unopened helicase ring holds a robust loader density above the C-terminal RecA-like domain. Meanwhile, the path of the central DNA binding channel appears to be obstructed by the reconstructed loader density, implying its potential role as a checkpoint conformation to prevent the loading of an immature complex onto DNA. Our data also reveal that the bound nucleotides and the consequently induced conformational changes in the helicase hexamer are essential for active association with loader proteins. These observations provide fundamental insights into the formation of the helicase-loader complex in bacteria that regulates the DNA replication process.


Subjects
Bacterial Proteins/chemistry; DNA Helicases/chemistry; Escherichia coli/metabolism; Geobacillus/enzymology; Adenosine Triphosphate/analogs & derivatives; Adenosine Triphosphate/chemistry; Binding Sites; DNA Replication; DNA, Single-Stranded/chemistry; Escherichia coli Proteins/chemistry; Hydrolysis; Image Processing, Computer-Assisted; Microscopy, Electron; Nucleotides/genetics; Protein Binding; Protein Structure, Tertiary
10.
PLoS One ; 10(6): e0130411, 2015.
Article in English | MEDLINE | ID: mdl-26121655

ABSTRACT

Most mutations are deleterious and require energetically costly repairs. Therefore, it seems that any minimization of mutation rate is beneficial. On the other hand, mutations generate genetic diversity indispensable for evolution and adaptation of organisms to changing environmental conditions. Thus, it is expected that a spontaneous mutational pressure should be an optimal compromise between these two extremes. In order to study the optimization of the pressure, we compared mutational transition probability matrices from bacterial genomes with artificial matrices fulfilling the same general features as the real ones, e.g., the stationary distribution and the speed of convergence to the stationarity. The artificial matrices were optimized on real protein-coding sequences based on Evolutionary Strategies approach to minimize or maximize the probability of non-synonymous substitutions and costs of amino acid replacements depending on their physicochemical properties. The results show that the empirical matrices have a tendency to minimize the effects of mutations rather than maximize their costs on the amino acid level. They were also similar to the optimized artificial matrices in the nucleotide substitution pattern, especially the high transitions/transversions ratio. We observed no substantial differences between the effects of mutational matrices on protein-coding sequences in genomes under study in respect of differently replicated DNA strands, mutational cost types and properties of the referenced artificial matrices. The findings indicate that the empirical mutational matrices are rather adapted to minimize mutational costs in the studied organisms in comparison to other matrices with similar mathematical constraints.


Subjects
Genes, Bacterial; Genome, Bacterial; Mutation Rate; Mutation; Algorithms; Amino Acids/chemistry; Borrelia burgdorferi/genetics; Chlamydia muridarum/genetics; Chlamydia trachomatis/genetics; DNA Mutational Analysis; DNA Repair; Escherichia coli/genetics; Evolution, Molecular; Markov Chains; Models, Theoretical; Nucleotides/genetics; Phylogeny; Principal Component Analysis; Rickettsia/genetics; Staphylococcus aureus/genetics; Streptococcus pyogenes/genetics
11.
Molecules ; 19(12): 20113-27, 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25470277

ABSTRACT

Despite substantial advances in genotyping techniques and massively accumulated data over the past half century, a uniform measurement of neutral genetic diversity derived by different molecular markers across a wide taxonomical range has not yet been formulated. We collected genetic diversity data on seed plants derived by AFLP, allozyme, ISSR, RAPD, SSR and nucleotide sequences, converted expected heterozygosity (He) to nucleotide diversity (π), and reassessed the relationship between plant genetic diversity and life history traits or extinction risk. We successfully established a uniform π criterion and developed a comprehensive plant genetic diversity database. The mean population-level and species-level π values across seed plants were 0.00374 (966 taxa, 155 families, 47 orders) and 0.00569 (728 taxa, 130 families, 46 orders), respectively. Significant differences were recovered for breeding system (p < 0.001) at the population level and geographic range (p = 0.023) at the species level. Selfing taxa had significantly lower π values than outcrossing and mixed-mating taxa, whereas narrowly distributed taxa had significantly lower π values than widely distributed taxa. Despite significant differences between the two extreme threat categories (critically endangered and least concern), the genetic diversity reduction on the way to extinction was difficult to detect in early stages.


Subjects
Genetic Variation; Plants/genetics; Seeds/genetics; Base Sequence; Heterozygote; Linear Models; Nucleotides/genetics; Regression Analysis; Statistics, Nonparametric
12.
Nat Commun ; 5: 4587, 2014 Aug 11.
Article in English | MEDLINE | ID: mdl-25109325

ABSTRACT

Cytoplasmic dynein is a dimeric motor that transports intracellular cargoes towards the minus end of microtubules (MTs). In contrast to other processive motors, stepping of the dynein motor domains (heads) is not precisely coordinated. Therefore, the mechanism of dynein processivity remains unclear. Here, by engineering the mechanical and catalytic properties of the motor, we show that dynein processivity minimally requires a single active head and a second inert MT-binding domain. Processivity arises from a high ratio of MT-bound to unbound time, and not from interhead communication. In addition, nucleotide-dependent microtubule release is gated by tension on the linker domain. Intramolecular tension sensing is observed in dynein's stepping motion at high interhead separations. On the basis of these results, we propose a quantitative model for the stepping characteristics of dynein and its response to chemical and mechanical perturbation.


Subjects
Adenosine Triphosphate/chemistry; Dyneins/chemistry; Microtubules/chemistry; Adenosine Triphosphatases/chemistry; Animals; Cytoplasm/metabolism; Glutathione Transferase/metabolism; Green Fluorescent Proteins/chemistry; Monte Carlo Method; Motion; Mutation; Nucleotides/chemistry; Nucleotides/genetics; Optics and Photonics; Protein Conformation; Protein Engineering/methods; Protein Multimerization; Protein Structure, Tertiary; Saccharomyces cerevisiae/metabolism; Sea Urchins; Stress, Mechanical; Thermus/metabolism
13.
PLoS One ; 8(7): e69187, 2013.
Article in English | MEDLINE | ID: mdl-23935949

ABSTRACT

Continuous-time Markov processes are often used to model the complex natural phenomenon of sequence evolution. To make the process of sequence evolution tractable, simplifying assumptions are often made about the sequence properties and the underlying process. The validity of one such assumption, time-homogeneity, has never been explored. Violations of this assumption can be found by identifying non-embeddability. A process is non-embeddable if it cannot be embedded in a continuous time-homogeneous Markov process. In this study, non-embeddability was demonstrated to exist when modelling sequence evolution with Markov models. Evidence of non-embeddability was found primarily at the third codon position, possibly resulting from changes in mutation rate over time. Outgroup edges and those with a deeper time depth were found to have an increased probability of the underlying process being non-embeddable. Overall, low levels of non-embeddability were detected when examining individual edges of triads across a diverse set of alignments. Subsequent phylogenetic reconstruction analyses demonstrated that non-embeddability could affect the correct prediction of phylogenies, but at extremely low levels. Despite the existence of non-embeddability, there is minimal evidence of violations of the local time-homogeneity assumption, and consequently the impact is likely to be minor.


Subjects
Evolution, Molecular; Markov Chains; Models, Genetic; Mutation; Algorithms; Animals; Humans; Introns; Mice; Nucleotides/genetics; Open Reading Frames/genetics; Phylogeny; Rats
14.
Methods Mol Biol ; 942: 17-55, 2013.
Article in English | MEDLINE | ID: mdl-23027044

ABSTRACT

Short interfering RNA (siRNA) has been widely used for studying gene function in mammalian cells but varies markedly in its gene silencing efficacy. Although many design rules/guidelines for effective siRNAs based on various criteria have been reported recently, there are only a few consistencies among them. This makes it difficult to select effective siRNA sequences in mammalian genes. This chapter first reviews the recently reported siRNA design guidelines and then proposes new methods for selecting effective siRNA sequences from many possible candidates by using decision tree learning, Bayes' theorem, and average silencing probability on the basis of a large number of known effective siRNAs. These methods differ from the previous score-based siRNA design techniques and can predict the probability that a candidate siRNA sequence will be effective. Evaluation of these methods by applying them to recently reported effective and ineffective siRNA sequences for a number of genes indicates that they would be useful for many other genes. They should, therefore, be of general utility for selecting effective siRNA sequences for mammalian genes. The chapter also describes another method using a hidden Markov model to select the optimal functional siRNAs and discusses the frequencies of combinations of two successive nucleotides as an important characteristic of effective siRNA sequences.
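The last feature mentioned, the frequency of combinations of two successive nucleotides, can be computed with a few lines. This is a generic counting sketch with a hypothetical function name, not the chapter's selection method, and it says nothing about which dinucleotides favor silencing.

```python
from collections import Counter

def dinucleotide_frequencies(seq):
    """Relative frequency of each pair of successive nucleotides in seq."""
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    if total == 0:
        return {}  # sequences shorter than two nucleotides have no pairs
    return {pair: count / total for pair, count in pairs.items()}
```

Pairs are counted with overlap, so a sequence of length n contributes n - 1 observations.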


Subjects
Genetic Engineering/methods; RNA, Small Interfering/genetics; Statistics as Topic/methods; Base Sequence; Bayes Theorem; Decision Trees; Gene Silencing; Humans; Markov Chains; Nucleotides/genetics
15.
J Biol Chem ; 287(46): 38442-8, 2012 Nov 09.
Article in English | MEDLINE | ID: mdl-22942285

ABSTRACT

Polymerase δ is widely accepted as the lagging strand replicative DNA polymerase in eukaryotic cells. It forms a replication complex in the presence of replication factor C and proliferating cell nuclear antigen to perform efficient DNA synthesis in vivo. In this study, the human lagging strand holoenzyme was reconstituted in vitro. The rate of DNA synthesis of this holoenzyme, measured with a singly primed ssM13 DNA substrate, is 4.0 ± 0.4 nucleotides. Results from adenosine 5'-(3-thiotriphosphate) tetralithium salt (ATPγS) inhibition experiments revealed the nonprocessive characteristic of the human DNA polymerase (Pol δ) holoenzyme (150 bp for one binding event), consistent with data from chase experiments with catalytically inactive mutant Pol δ(AA). The ATPase activity of replication factor C was characterized and found to be stimulated ∼10-fold in the presence of both proliferating cell nuclear antigen and DNA, but the activity was not shut down by Pol δ in accord with rapid association/dissociation of the holoenzyme to/from DNA. It is noted that high concentrations of ATP inhibit the holoenzyme DNA synthesis activity, most likely due to its inhibition of the clamp loading process.


Subjects
DNA Polymerase III/chemistry; Holoenzymes/chemistry; Adenosine Triphosphate/analogs & derivatives; Adenosine Triphosphate/chemistry; Adenosine Triphosphate/metabolism; Catalysis; Computer Simulation; DNA/genetics; DNA/metabolism; DNA Polymerase III/metabolism; DNA Replication; Dose-Response Relationship, Drug; Holoenzymes/genetics; Humans; Hydrolysis; Kinetics; Monte Carlo Method; Nucleotides/chemistry; Nucleotides/genetics; Plasmids/metabolism
16.
Biochem Genet ; 50(7-8): 642-56, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22573136

ABSTRACT

Functional motif-directed profiling was performed with 15 nucleotide binding site (NBS) primer-enzyme combinations to identify and elucidate the phylogenetic relationships among 15 genotypes of the family Zingiberaceae. We retrieved 167 polymorphic bands (24.85 %), with an average of 11.13 bands per primer. Mean polymorphism rates were detected using MseI (26 %), RsaI (21 %), and AluI (28 %) as restriction enzymes. The polymorphism information content (PIC) for each NBS primer-enzyme combination ranged from 0.48 to 0.76 with a mean value of 0.65. The 38 NBS profiling markers had PIC values ranging from 0.3 to 0.6 and exhibited good power to discriminate between genotypes. Comparison of NBS profiling with microsatellite data for the same set of genotypes exhibited a correlation value of 0.78, P ≤ 0.001. Our study suggests that genetic variability assessment could be more efficient if it targeted genes that exhibit functionally relevant variation, rather than random markers.


Subjects
Genotyping Techniques/methods; Nucleotide Motifs; Nucleotides/genetics; Nucleotides/metabolism; Polymorphism, Genetic/genetics; Zingiberaceae/genetics; Binding Sites; Genetic Markers/genetics; Phylogeny
17.
BMC Genomics ; 12: 245, 2011 May 19.
Article in English | MEDLINE | ID: mdl-21592414

ABSTRACT

BACKGROUND: The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Current strategies for decision-making and error-correction are based on an initial analysis by Huse et al. in 2007, for the older GS20 system based on experimental sequences. We analyze here the quality of 454 sequencing data and identify factors playing a role in sequencing error, through the use of an extensive dataset for Roche control DNA fragments. RESULTS: We obtained a mean error rate for 454 sequences of 1.07%. More importantly, the error rate is not randomly distributed; it occasionally rose to more than 50% in certain positions, and its distribution was linked to several experimental variables. The main factors related to error are the presence of homopolymers, position in the sequence, size of the sequence and spatial localization in PT plates for insertion and deletion errors. These factors can be described by considering seven variables. No single variable can account for the error rate distribution, but most of the variation is explained by the combination of all seven variables. CONCLUSIONS: The pattern identified here calls for the use of internal controls and error-correcting base callers, to correct for errors, when available (e.g. when sequencing amplicons). For shotgun libraries, the use of both sequencing primers and deep coverage, combined with the use of random sequencing primer sites, should partly compensate for even high error rates, although it may prove more difficult than previously thought to distinguish between low-frequency alleles and errors.
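Since homopolymers are named as a main error factor, a simple way to flag them in a read is sketched below. The function name and the minimum run length of 3 are assumptions chosen for illustration; the abstract does not specify a threshold, and this is not the authors' error model.

```python
from itertools import groupby

def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of a single base >= min_len."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)  # size of this run of identical bases
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs
```

Start positions are zero-based, so `homopolymer_runs("ACGTTTTAGGGC")` reports the T run at position 3 and the G run at position 8.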


Subjects
Sequence Analysis, DNA/methods; Titanium; Humans; Nucleotides/genetics; Quality Control; Research Design; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/standards
18.
BMC Bioinformatics ; 9: 511, 2008 Dec 01.
Article in English | MEDLINE | ID: mdl-19046431

ABSTRACT

BACKGROUND: The nucleotide substitution rate matrix is a key parameter of molecular evolution. Several methods for inferring this parameter have been proposed, with different mathematical bases. These methods include counting sequence differences and taking the log of the resulting probability matrices, methods based on Markov triples, and maximum likelihood methods that infer the substitution probabilities that lead to the most likely model of evolution. However, the speed and accuracy of these methods have not been compared. RESULTS: Different methods differ in performance by orders of magnitude (ranging from 1 ms to 10 s per matrix), but differences in accuracy of rate matrix reconstruction appear to be relatively small. Encouragingly, relatively simple and fast methods can provide results at least as accurate as far more complex and computationally intensive methods, especially when the sequences to be compared are relatively short. CONCLUSION: Based on the conditions tested, we recommend the use of the method of Gojobori et al. (1982) for long sequences (> 600 nucleotides), and the method of Goldman et al. (1996) for shorter sequences (< 600 nucleotides). The method of Barry and Hartigan (1987) can provide somewhat more accuracy, measured as the Euclidean distance between the true and inferred matrices, on long sequences (> 2000 nucleotides) at the expense of substantially longer computation time. The availability of methods that are both fast and accurate will allow us to gain a global picture of change in the nucleotide substitution rate matrix on a genomewide scale across the tree of life.


Subjects
Computational Biology/methods; DNA Mutational Analysis/methods; Evolution, Molecular; Nucleotides/genetics; Algorithms; Computer Simulation; DNA/genetics; Data Interpretation, Statistical; Logistic Models; Markov Chains; Models, Genetic; Phylogeny; Reproducibility of Results; Sensitivity and Specificity
19.
Mol Biol Evol ; 25(12): 2525-35, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18682605

ABSTRACT

Markov models describing the evolution of the nucleotide substitution process, widely used in phylogeny reconstruction, usually assume the hypotheses of stationarity and time reversibility. Although these models give meaningful results when applied to biological data, it is not clear if the 2 assumptions mentioned above hold and, if not, how much sequence evolution processes deviate from them. To this aim, we introduce 2 sets of indices that can be calculated from the nucleotide distribution and the substitution rates. The stationarity indices (STIs) can be used to test the validity of the equilibrium assumption. The irreversibility indices (IRIs) are derived from the Kolmogorov cycle conditions for time reversibility and quantify the degree of nontime reversibility of a process. We have computed STIs and IRIs for the evolutionary process of 2 lineages, Drosophila simulans and Homo sapiens. In the latter case, we use a modified form of the indices that takes into account the CpG decay process. In both cases, we find statistically significant deviations from the ideal case of a process that has reached stationarity and is time reversible.
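The Kolmogorov cycle conditions behind the irreversibility indices can be illustrated with a minimal check on three-state cycles: a process is time reversible only if the product of rates around every cycle equals the product in the opposite direction. The sketch below is not the authors' IRI definition; the function name is hypothetical, and restricting the check to three-state cycles is assumed to be adequate only when all off-diagonal rates are positive.

```python
from itertools import permutations

def max_cycle_imbalance(q):
    """Largest |q_ij*q_jk*q_ki - q_ik*q_kj*q_ji| over all three-state cycles.

    q is a square rate matrix given as a list of lists; diagonal entries
    are never used. A value of zero is consistent with time reversibility.
    """
    worst = 0.0
    for i, j, k in permutations(range(len(q)), 3):
        forward = q[i][j] * q[j][k] * q[k][i]
        backward = q[i][k] * q[k][j] * q[j][i]
        worst = max(worst, abs(forward - backward))
    return worst
```

For a rate matrix of the form q_ij = pi_j (equal exchangeabilities) the imbalance is zero, whereas a matrix that favors one direction around a cycle gives a positive value.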


Subjects
Models, Genetic; Nucleotides/genetics; Animals; Drosophila/genetics; Evolution, Molecular; Humans; Markov Chains
20.
J Bioinform Comput Biol ; 3(2): 477-90, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15852516

ABSTRACT

To date, the idea that microarrays may shed light on cellular processes by identifying groups of genes that appear to be co-expressed remains a dream. This is partly because there are some blank (meaning the knowledge is unavailable) or even erroneous areas in the fundamental theory of this field. This paper attempts to present an account of the microarray hybridization system based on chemical thermodynamics, theoretically clarifying some misunderstandings and looking for answers to some critical questions around this technology, such as the mechanisms and conditions of quantitative measurement by the hybridization reaction, the reasons for inconsistency between the data and the analysis results and the solutions, and how to analyze the data. A theoretical model for the next generation of microarrays is proposed. We believe that this model is universal, laying the foundation for microarray technology from array design through data analysis.


Subjects
DNA/chemistry; Gene Expression Profiling/methods; In Situ Hybridization/methods; Models, Chemical; Nucleotides/chemistry; Oligonucleotide Array Sequence Analysis/methods; Technology Assessment, Biomedical; Computer Simulation; DNA/analysis; DNA/genetics; Equipment Failure Analysis; Gene Expression Profiling/instrumentation; In Situ Hybridization/instrumentation; Models, Genetic; Nucleotides/analysis; Nucleotides/genetics; Oligonucleotide Array Sequence Analysis/instrumentation; Thermodynamics