Results 1 - 20 of 30
1.
Nat Chem Biol ; 10(5): 331-9, 2014 May.
Article in English | MEDLINE | ID: mdl-24743257

ABSTRACT

If methane, the main component of natural gas, can be efficiently converted to liquid fuels, world reserves of methane could satisfy the demand for transportation fuels in addition to use in other sectors. However, the direct activation of strong C-H bonds in methane and conversion to desired products remains a difficult technological challenge. This perspective reveals an opportunity to rethink the logic of biological methane activation and conversion to liquid fuels. We formulate a vision for a new foundation for methane bioconversion and suggest paths to develop technologies for the production of liquid transportation fuels from methane at high carbon yield and high energy efficiency and with low CO2 emissions. These technologies could support natural gas bioconversion facilities with a low capital cost and at small scales, which in turn could monetize the use of natural gas resources that are frequently flared, vented or emitted.


Subjects
Biofuels , Methane/metabolism , Aerobiosis , Anaerobiosis , Biotransformation , Oxidation-Reduction
2.
Biochemistry ; 53(2): 333-43, 2014 Jan 21.
Article in English | MEDLINE | ID: mdl-24392967

ABSTRACT

Proton uptake accompanies the reduction of all known substrates by nitrogenase. As a consequence, a higher pH should limit the availability of protons as a substrate essential for turnover, thereby increasing the proportion of more highly reduced forms of the enzyme for further study. The utility of the high-pH approach would appear to be problematic in view of the observation reported by Pham and Burgess [(1993) Biochemistry 32, 13725-13731] that the MoFe-protein undergoes irreversible protein denaturation above pH 8.65. In contrast, we found by both enzyme activity and crystallographic analyses that the MoFe-protein is stable when incubated at pH 9.5. We did observe, however, that at higher pHs and under turnover conditions, the MoFe-protein is slowly inactivated. While a normal, albeit low, level of substrate reduction occurs under these conditions, the MoFe-protein undergoes a complex transformation; initially, the enzyme is reversibly inhibited for substrate reduction at pH 9.5, yet in a second, slower process, the MoFe-protein becomes irreversibly inactivated as measured by substrate reduction activity at the optimal pH of 7.8. The final inactivated MoFe-protein has an increased hydrodynamic radius compared to that of the native MoFe-protein, yet it has a full complement of iron and molybdenum. Significantly, the modified MoFe-protein retains the ability to specifically interact with its nitrogenase partner, the Fe-protein, as judged by the support of ATP hydrolysis and by formation of a tight complex with the Fe-protein in the presence of ATP and aluminum fluoride. The turnover-dependent inactivation coupled to conformational change suggests a mechanism-based transformation that may provide a new probe of nitrogenase catalysis.


Subjects
Molybdoferredoxin/antagonists & inhibitors , Molybdoferredoxin/metabolism , Nitrogenase/antagonists & inhibitors , Nitrogenase/metabolism , Adenosine Triphosphate/metabolism , Azotobacter vinelandii/chemistry , Crystallography, X-Ray , Hydrogen-Ion Concentration , Hydrolysis , Models, Molecular , Molybdoferredoxin/chemistry , Nitrogenase/chemistry , Time Factors
3.
PLoS Comput Biol ; 8(10): e1002709, 2012.
Article in English | MEDLINE | ID: mdl-23055912

ABSTRACT

The effects of disease mutations on protein structure and function have been extensively investigated, and many predictors of the functional impact of single amino acid substitutions are publicly available. The majority of these predictors are based on protein structure and evolutionary conservation, following the assumption that disease mutations predominantly affect folded and conserved protein regions. However, the prevalence of the intrinsically disordered proteins (IDPs) and regions (IDRs) in the human proteome together with their lack of fixed structure and low sequence conservation raise a question about the impact of disease mutations in IDRs. Here, we investigate annotated missense disease mutations and show that 21.7% of them are located within such intrinsically disordered regions. We further demonstrate that 20% of disease mutations in IDRs cause local disorder-to-order transitions, which represents a 1.7-2.7 fold increase compared to annotated polymorphisms and neutral evolutionary substitutions, respectively. Secondary structure predictions show elevated rates of transition from helices and strands into loops and vice versa in the disease mutations dataset. Disease disorder-to-order mutations also influence predicted molecular recognition features (MoRFs) more often than the control mutations. The repertoire of disorder-to-order transition mutations is limited, with five most frequent mutations (R→W, R→C, E→K, R→H, R→Q) collectively accounting for 44% of all deleterious disorder-to-order transitions. As a proof of concept, we performed accelerated molecular dynamics simulations on a deleterious disorder-to-order transition mutation of tumor protein p63 and, in agreement with our predictions, observed an increased α-helical propensity of the region harboring the mutation. Our findings highlight the importance of mutations in IDRs and refine the traditional structure-centric view of disease mutations. The results of this study offer a new perspective on the role of mutations in disease, with implications for improving predictors of the functional impact of missense mutations.


Subjects
Disease/genetics , Models, Genetic , Mutation , Proteins/genetics , Arginine/genetics , Cluster Analysis , Computational Biology , Humans , Molecular Dynamics Simulation , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, DNA , Transcription Factors , Tumor Suppressor Proteins
4.
Hum Hered ; 70(2): 102-8, 2010.
Article in English | MEDLINE | ID: mdl-20606457

ABSTRACT

BACKGROUND/AIMS: There is a growing interest regarding the effect of differential misclassification on power and type I error rate in genome-wide association studies. We present an extension of a previously published test statistic: the likelihood ratio test allowing for errors (LRTAE). This test uses double-sample information on a subset of individuals to increase power for genetic association in the presence of nondifferential misclassification. METHODS: We extend the original LRTAE by allowing for differential genotype misclassification between case and control populations. We label this new statistic as LRT(D)A(M)E. We test the performance of this statistic with data simulated under differential misclassification specifications and two different types of genetic models: null and power. For simulations using the null model, we specify that there is no difference between case and control genotype frequencies before the introduction of errors. For simulations under power, we consider three modes of inheritance: dominant, multiplicative, and recessive. RESULTS: We show that the LRT(D)A(M)E, with p values computed using permutation, maintains a correct type I error rate under the null model after the introduction of differential genotyping errors. Also, we find that as little as 10 to 15% of double-sampled genotype data is needed to achieve this effect. Aside from a few situations (particularly recessive mode of inheritance simulations) the LRT(D)A(M)E version that calculates p values through permutation requires 15 to 20% double sampling to maintain an 80% power for a 0.05 significance level and approximately 20% double sampling for a 0.01 significance level.


Subjects
Cost-Benefit Analysis , Genome-Wide Association Study/economics , Genome-Wide Association Study/methods , Models, Statistical , Case-Control Studies , Computer Simulation , Genotype , Humans , Models, Genetic
5.
Genet Epidemiol ; 33(2): 172-80, 2009 Feb.
Article in English | MEDLINE | ID: mdl-18814273

ABSTRACT

Individuals are frequently observed to have long segments of uninterrupted sequences of homozygous markers. One of the major mechanisms that gives rise to such long homozygous segments is consanguineous marriages, where parents pass shared chromosomal segments to their child. Such chromosomal segments are also known as autozygous segments. The clinical evidence that progeny from inbred individuals may have reduced health and fitness because of homozygosity of recessive alleles is well known. As the length of such homozygous segments depends on the degree of parental consanguinity, it would be logical to observe shorter homozygous segments in more outbred populations. However, a recent study identified long homozygous regions, thus likely to be autozygous segments in the HapMap populations. While an abundance of homozygous segments may significantly reduce the ability to fine map disease genes using association studies, detecting tracts of extended homozygosity related to disease status seems the natural next step in genome-wide association studies beyond allele, genotype and haplotype association analyses. In this study, we propose a new algorithm to map disease-related segments based on autozygosity using case-control data. The underlying rationale for the proposed method is that shared autozygosity regions that differ between diseased and healthy individuals may harbor mutations underlying diseases. Specifically, our algorithm uses a sliding-window framework and employs a logarithm of the odds score measure of autozygosity coupled with permutation-based methods to identify disease-related regions. We illustrate the advantage of the algorithm with its application to a genome-wide association study on Parkinson's disease.


Subjects
Chromosome Mapping/statistics & numerical data , Genome, Human , Genome-Wide Association Study/statistics & numerical data , Algorithms , Alleles , Chromosomes, Human, Pair 20/genetics , Consanguinity , Female , Gene Frequency , Genetics, Medical/statistics & numerical data , Homozygote , Humans , Lod Score , Male , Molecular Epidemiology/statistics & numerical data , Parkinson Disease/genetics , Polymorphism, Single Nucleotide
6.
Proteins ; 78(2): 365-80, 2010 Feb 01.
Article in English | MEDLINE | ID: mdl-19722269

ABSTRACT

Ubiquitination plays an important role in many cellular processes and is implicated in many diseases. Experimental identification of ubiquitination sites is challenging due to rapid turnover of ubiquitinated proteins and the large size of the ubiquitin modifier. We identified 141 new ubiquitination sites using a combination of liquid chromatography, mass spectrometry, and mutant yeast strains. Investigation of the sequence biases and structural preferences around known ubiquitination sites indicated that their properties were similar to those of intrinsically disordered protein regions. Using a combined set of new and previously known ubiquitination sites, we developed a random forest predictor of ubiquitination sites, UbPred. The class-balanced accuracy of UbPred reached 72%, with the area under the ROC curve at 80%. The application of UbPred showed that high confidence Rsp5 ubiquitin ligase substrates and proteins with very short half-lives were significantly enriched in the number of predicted ubiquitination sites. Proteome-wide prediction of ubiquitination sites in Saccharomyces cerevisiae indicated that highly ubiquitinated substrates were prevalent among transcription/enzyme regulators and proteins involved in cell cycle control. In the human proteome, cytoskeletal, cell cycle, regulatory, and cancer-associated proteins display higher extent of ubiquitination than proteins from other functional categories. We show that gain and loss of predicted ubiquitination sites may likely represent a molecular mechanism behind a number of disease-associated mutations. UbPred is available at http://www.ubpred.org.


Subjects
Proteome/analysis , Saccharomyces cerevisiae Proteins/analysis , Saccharomyces cerevisiae/metabolism , Ubiquitinated Proteins/analysis , Amino Acid Sequence , Databases, Protein , Endosomal Sorting Complexes Required for Transport/metabolism , Humans , Mass Spectrometry , Molecular Sequence Data , Proteome/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Sequence Analysis, Protein , Ubiquitin-Protein Ligase Complexes/metabolism , Ubiquitinated Proteins/metabolism , Ubiquitination
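
The class-balanced accuracy quoted in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition (the mean of sensitivity and specificity); the abstract does not spell the formula out, and the function name and the counts are hypothetical rather than taken from UbPred.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity for a binary classifier.

    Assumed definition of "class-balanced accuracy"; not confirmed by the abstract.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / positives  # fraction of true sites that are predicted
    specificity = tn / negatives  # fraction of non-sites that are rejected
    return (sensitivity + specificity) / 2


# Hypothetical confusion-matrix counts with a 1:10 class imbalance.
print(balanced_accuracy(tp=70, fn=30, tn=740, fp=260))
```

Averaging the two per-class rates keeps the score from being dominated by the much larger negative class, which is the typical situation for modification-site prediction.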
7.
Stat Appl Genet Mol Biol ; 7(1): Article23, 2008.
Article in English | MEDLINE | ID: mdl-18673292

ABSTRACT

Genome-wide association studies are now widely used tools to identify genes and/or regions which may contribute to the development of various diseases. With case-control data a 2x3 contingency table can be constructed for each SNP to perform genotype-based tests of association. An increasingly common technique to increase the power to detect an association is to collapse each 2x3 table into a table assuming either a dominant or recessive mode of inheritance (2x2 table). We consider three different methods of determining which genetic model to choose and show that each of these methods of collapsing genotypes increases the type I error rate (i.e., the rate of false positives). However, one of these methods does lead to an increase in power compared with the usual genotype- and allele-based tests for most genetic models.


Subjects
Case-Control Studies , Genetic Predisposition to Disease , Genotype , Models, Genetic , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Polymorphism, Single Nucleotide , Alleles , Analysis of Variance , Chi-Square Distribution , Genes, Dominant , Genes, Recessive , Humans , Research Design
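
The table collapsing described in the abstract above can be sketched as follows. This is an illustration only: it assumes rows are cases and controls, columns are genotypes carrying 0, 1, and 2 copies of the putative risk allele, and every expected cell count is nonzero. The function names and counts are hypothetical, and the three model-selection methods the authors compare are not reproduced.

```python
def collapse_dominant(table):
    """2x3 -> 2x2 under a dominant model: non-carriers vs. carriers (1 or 2 copies)."""
    return [[row[0], row[1] + row[2]] for row in table]


def collapse_recessive(table):
    """2x3 -> 2x2 under a recessive model: 0 or 1 copies vs. 2 copies."""
    return [[row[0] + row[1], row[2]] for row in table]


def pearson_chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical genotype counts: rows = cases, controls.
counts = [[30, 50, 20], [50, 40, 10]]
print(pearson_chi_square(counts))                      # genotype test, 2 df
print(pearson_chi_square(collapse_dominant(counts)))   # dominant 2x2, 1 df
print(pearson_chi_square(collapse_recessive(counts)))  # recessive 2x2, 1 df
```

Picking whichever collapsed table gives the larger statistic after seeing the data is the kind of selection the abstract reports as raising the type I error rate, so the nominal 1-df reference distribution should not be applied to the selected statistic without correction.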
8.
Genomics ; 92(5): 273-8, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18692127

ABSTRACT

While there have been significant advances in understanding the genetic etiology of human hair loss over the previous decade, there remain a number of hereditary disorders for which a causative gene has yet to be identified. We studied a large, consanguineous Brazilian family that presented with woolly hair at birth that progressed to severe hypotrichosis by the age of 5, in which 6 of the 14 offspring were affected. After exclusion of known candidate genes, a genome-wide scan was performed to identify the disease locus. Autozygosity mapping revealed a highly significant region of extended homozygosity (lod score of 10.41) that contained a haplotype with a linkage lod score of 3.28. Results of these two methods defined a 9-Mb region on chromosome 13q14.11-q14.2. The interval contains the P2RY5 gene, in which we recently identified pathogenic mutations in several families of Pakistani origin affected with autosomal recessive woolly and sparse hair. After the exclusion of several other candidate genes, we sequenced the P2RY5 gene and identified a homozygous mutation (C278Y) in all affected individuals in this family. Our findings show that mutations in P2RY5 display variable expressivity, underlying both hypotrichosis and woolly hair, and underscore the essential role of P2RY5 in the tissue integrity and maintenance of the hair follicle.


Subjects
Genes, Recessive , Genetic Predisposition to Disease , Hypotrichosis/genetics , Mutation , Receptors, Purinergic P2/genetics , Amino Acid Sequence , Animals , Brazil , Chromosome Mapping , Chromosomes, Human, Pair 13/genetics , Female , Genetic Linkage , Humans , Lod Score , Male , Mice , Models, Molecular , Molecular Sequence Data , Pedigree , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
9.
Nucleic Acids Res ; 34(1): 305-12, 2006.
Article in English | MEDLINE | ID: mdl-16407336

ABSTRACT

Serine/arginine-rich (SR) splicing factors play an important role in constitutive and alternative splicing as well as during several steps of RNA metabolism. Despite the wealth of functional information about SR proteins accumulated to-date, structural knowledge about the members of this family is very limited. To gain a better insight into structure-function relationships of SR proteins, we performed extensive sequence analysis of SR protein family members and combined it with ordered/disordered structure predictions. We found that SR proteins have properties characteristic of intrinsically disordered (ID) proteins. The amino acid composition and sequence complexity of SR proteins were very similar to those of the disordered protein regions. More detailed analysis showed that the SR proteins, and their RS domains in particular, are enriched in the disorder-promoting residues and are depleted in the order-promoting residues as compared to the entire human proteome. Moreover, disorder predictions indicated that RS domains of SR proteins were completely unstructured. Two different classification methods, the charge-hydropathy measure and the cumulative distribution function (CDF) of the disorder scores, were in agreement with each other, and they both strongly predicted members of the SR protein family to be disordered. This study emphasizes the importance of the disordered structure for several functions of SR proteins, such as for spliceosome assembly and for interaction with multiple partners. In addition, it demonstrates the usefulness of order/disorder predictions for inferring protein structure from sequence.


Subjects
RNA Splicing , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/classification , Amino Acids/analysis , Arginine/analysis , Humans , Protein Structure, Tertiary , Sequence Analysis, Protein , Serine/analysis
10.
BMC Genomics ; 8: 238, 2007 Jul 16.
Article in English | MEDLINE | ID: mdl-17634103

ABSTRACT

BACKGROUND: Studies of association methods using DNA pooling of single nucleotide polymorphisms (SNPs) have focused primarily on the effects of "machine-error", number of replicates, and the size of the pool. We use the non-centrality parameter (NCP) for the analysis of variance test to compute the approximate power for genetic association tests with DNA pooling data on cases and controls. We incorporate genetic model parameters into the computation of the NCP. Parameters involved in the power calculation are disease allele frequency, frequency of the marker SNP allele in coupling with the disease locus, disease prevalence, genotype relative risk, sample size, genetic model, number of pools, number of replicates of each pool, and the proportion of variance of the pooled frequency estimate due to machine variability. We compute power for different settings of number of replicates and total number of genotypings when the genetic model parameters are fixed. Several significance levels are considered, including stringent significance levels (due to the increasing popularity of 100 K and 500 K SNP "chip" data). We use a factorial design with two to four settings of each parameter and multiple regression analysis to assess which parameters most significantly affect power. RESULTS: The power can increase substantially as the genotyping number increases. For a fixed number of genotypings, the power is a function of the number of replicates of each pool such that there is a setting with maximum power. The four most significant parameters affecting power for association are: (1) genotype relative risk, (2) genetic model, (3) sample size, and (4) the interaction term between disease and SNP marker allele probabilities. CONCLUSION: For a fixed number of genotypings, there is an optimal number of replicates of each pool that increases as the number of genotypings increases. Power is not substantially reduced when the number of replicates is close to but not equal to the optimal setting.


Subjects
DNA Mutational Analysis/methods , Genetic Predisposition to Disease , Genetics, Population/economics , Models, Genetic , Research Design , Specimen Handling , Case-Control Studies , Cost-Benefit Analysis , DNA Mutational Analysis/economics , Gene Frequency , Genetics, Population/methods , Genotype , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Regression Analysis
11.
PLoS Comput Biol ; 2(8): e100, 2006 Aug 04.
Article in English | MEDLINE | ID: mdl-16884331

ABSTRACT

Recent proteome-wide screening approaches have provided a wealth of information about interacting proteins in various organisms. To test for a potential association between protein connectivity and the amount of predicted structural disorder, the disorder propensities of proteins with various numbers of interacting partners from four eukaryotic organisms (Caenorhabditis elegans, Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens) were investigated. The results of PONDR VL-XT disorder analysis show that for all four studied organisms, hub proteins, defined here as those that interact with ≥10 partners, are significantly more disordered than end proteins, defined here as those that interact with just one partner. The proportion of predicted disordered residues, the average disorder score, and the number of predicted disordered regions of various lengths were higher overall in hubs than in ends. A binary classification of hubs and ends into ordered and disordered subclasses using the consensus prediction method showed a significant enrichment of wholly disordered proteins and a significant depletion of wholly ordered proteins in hubs relative to ends in worm, fly, and human. The functional annotation of yeast hubs and ends using GO categories and the correlation of these annotations with disorder predictions demonstrate that proteins with regulation, transcription, and development annotations are enriched in disorder, whereas proteins with catalytic activity, transport, and membrane localization annotations are depleted in disorder. The results of this study demonstrate that intrinsic structural disorder is a distinctive and common characteristic of eukaryotic hub proteins, and that disorder may serve as a determinant of protein interactivity.


Subjects
Caenorhabditis elegans Proteins/metabolism , Carrier Proteins/metabolism , Drosophila Proteins/metabolism , ELAV Proteins/metabolism , Ligases/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Amino Acids/chemistry , Animals , Caenorhabditis elegans/chemistry , Caenorhabditis elegans/genetics , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/chemistry , Caenorhabditis elegans Proteins/classification , Caenorhabditis elegans Proteins/genetics , Carrier Proteins/chemistry , Carrier Proteins/classification , Carrier Proteins/genetics , Computational Biology , Drosophila Proteins/chemistry , Drosophila Proteins/classification , Drosophila Proteins/genetics , Drosophila melanogaster/chemistry , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , ELAV Proteins/chemistry , ELAV Proteins/classification , ELAV Proteins/genetics , ELAV-Like Protein 2 , Humans , Ligases/chemistry , Ligases/classification , Ligases/genetics , Models, Molecular , Protein Binding , Protein Structure, Tertiary , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/classification , Saccharomyces cerevisiae Proteins/genetics
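
The hub and end definitions used in the abstract above (at least 10 partners versus exactly one partner) can be sketched as a simple degree count over an interaction list. The function name, the "other" label, the decision to ignore self-interactions, and the toy data are assumptions made for illustration; the abstract does not describe how the interaction data were preprocessed.

```python
def classify_by_degree(interactions, hub_min=10):
    """Label each protein "hub", "end", or "other" from undirected interaction pairs."""
    partners = {}
    for a, b in interactions:
        if a == b:
            continue  # assumption: self-interactions do not count as partners
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    labels = {}
    for protein, neighbours in partners.items():
        degree = len(neighbours)  # sets make duplicate pairs count once
        if degree >= hub_min:
            labels[protein] = "hub"
        elif degree == 1:
            labels[protein] = "end"
        else:
            labels[protein] = "other"
    return labels


# Toy network: H binds ten partners; P0 and P1 also bind each other.
edges = [("H", f"P{i}") for i in range(10)] + [("P0", "P1")]
labels = classify_by_degree(edges)
print(labels["H"], labels["P5"], labels["P0"])
```

Proteins with two to nine partners fall outside both classes, which matches the abstract's comparison of hubs against ends only.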
12.
Structure ; 25(7): 978-987.e4, 2017 Jul 05.
Article in English | MEDLINE | ID: mdl-28578873

ABSTRACT

Nitroreductase (NR) from Enterobacter cloacae reduces diverse nitroaromatics including herbicides, explosives, and prodrugs, and holds promise for bioremediation, prodrug activation, and enzyme-assisted synthesis. We solved crystal structures of NR complexes with bound substrate or analog for each of its two half-reactions. We complemented these with kinetic isotope effect (KIE) measurements elucidating H-transfer steps essential to each half-reaction. KIEs indicate hydride transfer from NADH to the flavin consistent with our structure of NR with the NADH analog nicotinic acid adenine dinucleotide (NAAD). The KIE on reduction of p-nitrobenzoic acid (p-NBA) also indicates hydride transfer, and requires revision of prior computational mechanisms. Our mechanistic information provided a structural restraint for the orientation of bound substrate, placing the nitro group closer to the flavin N5 in the pocket that binds the amide of NADH. KIEs show that solvent provides a proton, enabling accommodation of different nitro group placements, consistent with the broad repertoire of NR.


Subjects
Bacterial Proteins/chemistry , Nitroreductases/chemistry , Bacterial Proteins/metabolism , Binding Sites , Enterobacter cloacae/enzymology , Flavins/metabolism , NAD/metabolism , Nitrobenzoates/metabolism , Nitroreductases/metabolism , Protein Binding , Substrate Specificity
13.
Invest Ophthalmol Vis Sci ; 47(12): 5453-9, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17122136

ABSTRACT

PURPOSE: To investigate further the genetic contribution to age-related macular degeneration (AMD), increasing the power of a previous analysis and reproducing the original findings. METHODS: A large cohort of families with this condition was assembled, and an expanded genome scan was performed with 556 microsatellite markers. In 2003, the results were reported of a genome-wide linkage analysis of 70 of these pedigrees. Members of 51 new families have now been ascertained and many of the original pedigrees expanded. Parametric and nonparametric linkage analyses were performed with a denser map of markers. In addition, analyses were performed with the sample stratified by age at ascertainment and by two major advanced phenotypes for the disease: neovascular AMD (choroidal neovascularization) and geographic atrophy. RESULTS: The results corroborate the macular degeneration-susceptibility loci consistently reported by the authors and others in genome-wide scans. New loci were identified, including the finding of a two-point HLOD of 3.70 at 6q25.2. CONCLUSIONS: The results suggest that the use of families enriched in predisposition to AMD has legitimacy. Genetic analyses of a genome-wide scan performed on our large cohort of families add further confirmatory evidence that susceptibility loci lie on 1q, 3p, 9q, and 10q. Furthermore, new loci have been identified, including a locus on 6q.


Subjects
Chromosome Mapping , Genetic Predisposition to Disease , Genome, Human , Macular Degeneration/genetics , Aged , Chromosomes, Human, Pair 1/genetics , Chromosomes, Human, Pair 10/genetics , Chromosomes, Human, Pair 3/genetics , Chromosomes, Human, Pair 9/genetics , Family Health , Genetic Linkage , Humans , Lod Score , Microsatellite Repeats , Pedigree
14.
BMC Genet ; 7: 24, 2006 Apr 27.
Article in English | MEDLINE | ID: mdl-16689984

ABSTRACT

BACKGROUND: In the field of statistical genetics, phenotype and genotype misclassification errors can substantially reduce power to detect association with genetic case/control studies. Misclassification also can bias population frequency parameters such as genotype, haplotype, or multi-locus genotype frequencies. These problems are of particular concern in case/control designs because, short of repeated sampling, there is no way to detect misclassification errors. We developed a double-sampling procedure for case/control genetic association using a likelihood ratio test framework. Different approaches have been proposed to deal with misclassification errors. We have chosen the likelihood framework because of the ease with which misclassification probabilities may be incorporated into the statistical framework and hypothesis testing. The statistic is called the Likelihood Ratio Test allowing for errors (LRTae) and is freely available via software download. RESULTS: We applied our procedure to 10,000 replicates of simulated case/control data in which we introduced phenotype misclassification errors. The phenotype considered is Ankylosing Spondylitis (AS). The LRTae method power was always greater than LRTstd power for the significance levels considered (5%, 1%, 0.1%, 0.01%). Power gains for the LRTae method over the LRTstd method increased as the significance level became more stringent. Multi-locus genotype frequency estimates using LRTae method were more accurate than estimates using LRTstd method. CONCLUSION: The LRTae method can be applied to single-locus genotypes, multi-locus genotypes, or multi-locus haplotypes in a case/control framework and can be more powerful to detect association in case/control studies when both genotype and/or phenotype errors are present. Furthermore, the LRTae method provides asymptotically unbiased estimates of case and control genotype frequencies, as well as estimates of phenotype and/or genotype misclassification rates.


Subjects
Data Interpretation, Statistical , Research Design , Case-Control Studies , Genetics, Population , Genotype , Humans , Models, Genetic , Models, Statistical , Phenotype , Reproducibility of Results
15.
Stat Appl Genet Mol Biol ; 4: Article37, 2005.
Article in English | MEDLINE | ID: mdl-16646856

ABSTRACT

It is well established that phenotype and genotype misclassification errors reduce the power to detect genetic association. Resampling a subset of the data (e.g., double-sampling) of genotype and/or phenotype with a gold standard measurement is one method to address this issue. We derive the non-centrality parameter (NCP) for the recently published Likelihood Ratio Test Allowing for Error (LRTae) in the presence of random phenotype and genotype errors. With the NCP, power and sample size can be analytically determined at any significance level. We verify analytic power with simulations using a 2**k factorial design given high and low settings of: case and control genotype frequencies, phenotype and genotype misclassification probabilities, total sample size, ratio of cases to controls, and proportions of phenotype and/or genotype double-samples. We also perform example applications of our method assuming equal costs for the LRTae method and the standard method that does not use double-sample information (LRTstd) to determine if power gain due to double-sampling a proportion of samples outweighs the reduction in sample size due to additional costs in obtaining double-samples. Our results showed a median difference of at most 0.01 between analytic and simulation power for the factorial design settings, with maximum difference of 0.054. For our cost/benefits analysis calculations, results for genotype errors are that double-sampling appears most beneficial (in terms of power gain) when cost of double-sampling is relatively low, irrespective of the proportion of individuals double-sampled. In the presence of phenotype error, there is always power gain using the LRTae method for the parameter settings considered. We have freely available software that performs power and sample size calculations for the LRTae method and cost/benefits analyses comparing power for LRTae and LRTstd methods assuming equal costs.

16.
BMC Genet ; 6 Suppl 1: S150, 2005 Dec 30.
Article in English | MEDLINE | ID: mdl-16451611

ABSTRACT

BACKGROUND: Two factors impacting robustness of the original transmission disequilibrium test (TDT) are: i) missing parental genotypes and ii) undetected genotype errors. While it is known that independently these factors can inflate false-positive rates for the original TDT, no study has considered either the joint impact of these factors on false-positive rates or the precision score of TDT statistics regarding these factors. By precision score, we mean the absolute difference between disease gene position and the position of markers whose TDT statistic exceeds some threshold. METHODS: We apply our transmission disequilibrium test allowing for errors (TDTae) and the original TDT to phenotype and modified single-nucleotide polymorphism genotype simulation data from Genetic Analysis Workshop. We modify genotype data by randomly introducing genotype errors and removing a percentage of parental genotype data. We compute empirical distributions of each statistic's precision score for a chromosome harboring a simulated disease locus. We also consider inflation in type I error by studying markers on a chromosome harboring no disease locus. RESULTS: The TDTae shows median precision scores of approximately 13 cM, 2 cM, 0 cM, and 0 cM at the 5%, 1%, 0.1%, and 0.01% significance levels, respectively. By contrast, the original TDT shows median precision scores of approximately 23 cM, 21 cM, 15 cM, and 7 cM at the corresponding significance levels, respectively. For null chromosomes, the original TDT falsely rejects the null hypothesis for 28.8%, 14.8%, 5.4%, and 1.7% at the 5%, 1%, 0.1% and 0.01%, significance levels, respectively, while TDTae maintains the correct false-positive rate. 
CONCLUSION: Because missing parental genotypes and undetected genotype errors are unknown to the investigator, but are expected to be increasingly prevalent in multilocus datasets, we strongly recommend TDTae methods as a standard procedure, particularly where stricter significance levels are required.
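The precision score defined in the abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses the original McNemar-type TDT statistic for a biallelic marker (not the TDTae likelihood method), the function names are hypothetical, and the marker positions and statistics are made-up values rather than Genetic Analysis Workshop data.

```python
from statistics import median


def tdt_statistic(b, c):
    """Original TDT for a biallelic marker.

    b, c: counts of heterozygous parents transmitting allele 1 and allele 2.
    Asymptotically chi-square with 1 df under the null hypothesis.
    """
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)


def precision_scores(markers, disease_pos_cm, threshold):
    """Absolute distance (cM) from the disease locus for each marker whose
    statistic exceeds `threshold`. `markers` is a list of (position_cm, statistic)."""
    return [abs(pos - disease_pos_cm) for pos, stat in markers if stat > threshold]


if __name__ == "__main__":
    # Hypothetical markers: (map position in cM, TDT statistic).
    markers = [(10.0, 1.2), (42.0, 8.5), (55.0, 12.0), (90.0, 4.1)]
    scores = precision_scores(markers, disease_pos_cm=50.0, threshold=3.841)
    print(scores, median(scores))
```

Here 3.841 is the 5% critical value of a chi-square with 1 df. A stricter threshold would keep fewer markers, which is how the median precision score can shrink at stricter significance levels.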


Subjects
Linkage Disequilibrium/genetics , Statistical Models , Parents , Human Chromosomes, Pair 3/genetics , Female , Genetic Markers , Genotype , Humans , Male , Genetic Models , Pedigree , Penetrance , Single Nucleotide Polymorphism/genetics , Research Design
17.
BMC Genet ; 6: 18, 2005 Apr 08.
Article in English | MEDLINE | ID: mdl-15819990

ABSTRACT

BACKGROUND: Phenotype error causes reduction in power to detect genetic association. We present a quantification of phenotype error, also known as diagnostic error, on power and sample size calculations for case-control genetic association studies between a marker locus and a disease phenotype. We consider the classic Pearson chi-square test for independence as our test of genetic association. To determine asymptotic power analytically, we compute the distribution's non-centrality parameter, which is a function of the case and control sample sizes, genotype frequencies, disease prevalence, and phenotype misclassification probabilities. We derive the non-centrality parameter in the presence of phenotype errors and equivalent formulas for misclassification cost (the percentage increase in minimum sample size needed to maintain constant asymptotic power at a fixed significance level for each percentage increase in a given misclassification parameter). We use a linear Taylor Series approximation for the cost of phenotype misclassification to determine lower bounds for the relative costs of misclassifying a true affected (respectively, unaffected) as a control (respectively, case). Power is verified by computer simulation. RESULTS: Our major findings are that: (i) the median absolute difference between analytic power with our method and simulation power was 0.001 and the absolute difference was no larger than 0.011; (ii) as the disease prevalence approaches 0, the cost of misclassifying an unaffected as a case becomes infinitely large while the cost of misclassifying an affected as a control approaches 0. CONCLUSION: Our work enables researchers to specifically quantify power loss and minimum sample size requirements in the presence of phenotype errors, thereby allowing for more realistic study design. For most diseases of current interest, verifying that cases are correctly classified is of paramount importance.
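The dependence of the non-centrality parameter on prevalence and misclassification can be sketched as follows. This is a minimal illustration, not the paper's derivation: the parameterization (theta, phi), the function names, and the numeric inputs are assumptions. It assumes classified cases and controls are drawn at random from the people classified as affected and unaffected, and it uses the standard non-centrality formula for a Pearson chi-square test on a 2 x k table with fixed row totals.

```python
def observed_frequencies(p_aff, p_unaff, prevalence, theta, phi):
    """Genotype frequencies in the *classified* case and control groups.

    theta = Pr(classified as case | truly unaffected)      (assumed notation)
    phi   = Pr(classified as control | truly affected)     (assumed notation)
    """
    k = prevalence
    # Probability that a classified case is truly affected (Bayes' rule).
    ppv = (1 - phi) * k / ((1 - phi) * k + theta * (1 - k))
    # Probability that a classified control is truly unaffected.
    npv = (1 - theta) * (1 - k) / ((1 - theta) * (1 - k) + phi * k)
    cases = [ppv * a + (1 - ppv) * u for a, u in zip(p_aff, p_unaff)]
    controls = [npv * u + (1 - npv) * a for a, u in zip(p_aff, p_unaff)]
    return cases, controls


def chi_square_ncp(n_cases, n_controls, p_cases, p_controls):
    """NCP of the Pearson chi-square test for a 2 x k case-control table."""
    return sum(
        n_cases * n_controls * (p - q) ** 2 / (n_cases * p + n_controls * q)
        for p, q in zip(p_cases, p_controls)
        if n_cases * p + n_controls * q > 0
    )


if __name__ == "__main__":
    # Hypothetical genotype frequencies in true affecteds and unaffecteds.
    p_aff, p_unaff = [0.5, 0.4, 0.1], [0.6, 0.3, 0.1]
    clean = chi_square_ncp(500, 500, p_aff, p_unaff)
    cases, controls = observed_frequencies(p_aff, p_unaff, 0.05, 0.05, 0.05)
    noisy = chi_square_ncp(500, 500, cases, controls)
    print(round(clean, 3), round(noisy, 3))
```

In this sketch a low prevalence makes even a small theta dilute the classified case group heavily, which is in line with finding (ii) of the abstract.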


Subjects
Case-Control Studies , Diagnostic Errors , Genetic Predisposition to Disease/genetics , Genetic Models , Alzheimer Disease/genetics , Apolipoproteins E/genetics , Genetic Testing , Humans , Phenotype , Sample Size
18.
Stat Appl Genet Mol Biol ; 3: Article26, 2004.
Article in English | MEDLINE | ID: mdl-16646805

ABSTRACT

Phenotype and/or genotype misclassification can (i) significantly increase type II error probabilities for genetic case/control association, decreasing statistical power, and (ii) produce inaccurate estimates of population frequency parameters. We present a method, the likelihood ratio test allowing for errors (LRTae), that incorporates double-sample information for phenotypes and/or genotypes on a sub-sample of cases/controls. Population frequency parameters and misclassification probabilities are determined using a double-sample procedure as implemented in the Expectation-Maximization (EM) method. We perform null simulations assuming a SNP marker or a 4-allele (multi-allele) marker locus. To compare our method with the standard method that makes no adjustment for errors (LRTstd), we perform power simulations using a 2^k factorial design with high and low settings of: case/control samples, phenotype/genotype costs, double-sampled phenotypes/genotypes costs, phenotype/genotype error, and proportions of double-sampled individuals. All power simulations are performed fixing equal costs for the LRTstd and LRTae methods. We also consider case/control ApoE genotype data for an actual Alzheimer's study. The LRTae method maintains correct type I error proportions for all null simulations and all significance level thresholds (10%, 5%, 1%). LRTae average estimates of population frequencies and misclassification probabilities are equal to the true values, with variances of 10e-7 to 10e-8. For power simulations, the median power difference LRTae-LRTstd at the 5% significance level is 0.06 for multi-allele data and 0.01 for SNP data. For the ApoE data example, the LRTae and LRTstd p-values are 5.8 x 10e-5 and 1.6 x 10e-3, respectively. The increase in significance is due to adjustment in the LRTae for misclassification of the most commonly reported risk allele. We have developed freely available software that computes our LRTae statistic.

19.
Eur J Hum Genet ; 12(9): 752-61, 2004 Sep.
Article in English | MEDLINE | ID: mdl-15162128

ABSTRACT

Two issues regarding the robustness of the original transmission disequilibrium test (TDT) developed by Spielman et al. are: (i) missing parental genotype data and (ii) the presence of undetected genotype errors. While extensions of the TDT that are robust to items (i) and (ii) have been developed, there is to date no single TDT statistic that is robust to both for general pedigrees. We present here a likelihood method, the TDT(ae), which is robust to these issues in general pedigrees. The TDT(ae) assumes a more general disease model than the traditional TDT, which assumes a multiplicative inheritance model for genotypic relative risk. Our model is based on Weinberg's work. To assess robustness, we perform simulations. Also, we apply our method to two data sets from actual diseases: psoriasis and sitosterolemia. Maximization under alternative and null hypotheses is performed using Powell's method. Results of our simulations indicate that our method maintains correct type I error rates at the 1, 5, and 10% levels of significance. Furthermore, a Kolmogorov-Smirnov goodness-of-fit test suggests that the data are drawn from a central chi-square distribution with 2 df, the correct asymptotic null distribution. The psoriasis results suggest two loci as being significantly linked to the disease, even in the presence of genotyping errors and missing data, and the sitosterolemia results show a P-value of 1.5 x 10^-9 for the marker locus nearest to the sitosterolemia disease genes. We have developed software to perform TDT(ae) calculations, which may be accessed from our ftp site.
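The goodness-of-fit check mentioned in the abstract can be illustrated with a one-sample Kolmogorov-Smirnov statistic against a central chi-square with 2 df, whose CDF has the closed form 1 - exp(-x/2). The sketch below is an assumption-laden illustration: the function names are hypothetical, the null statistics are simulated from the reference distribution itself rather than produced by the TDT(ae), and the large-sample 5% critical value of roughly 1.358 / sqrt(n) is an approximation.

```python
import math
import random


def chi2_2df_cdf(x):
    """CDF of a central chi-square distribution with 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0) if x > 0 else 0.0


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare F with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical null statistics: an exponential with mean 2 is exactly
    # a chi-square with 2 df, so these stand in for simulated test statistics.
    stats = [rng.expovariate(0.5) for _ in range(1000)]
    d = ks_statistic(stats, chi2_2df_cdf)
    approx_crit = 1.358 / math.sqrt(len(stats))  # approximate 5% critical value
    print(d, approx_crit, d < approx_crit)
```

A value of D below the critical value gives no evidence against the 2-df chi-square null distribution. It does not prove the fit.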


Subjects
Inheritance Patterns/genetics , Linkage Disequilibrium , Genetic Models , Research Design , Computer Simulation , Genotype , Likelihood Functions , Pedigree , Psoriasis/genetics
20.
Mol Diagn Ther ; 11(1): 15-9, 2007.
Article in English | MEDLINE | ID: mdl-17286447

ABSTRACT

The Biology of Addictive Diseases-Database (BiolAD-DB) system is a research bioinformatics system for archiving, analyzing, and processing complex clinical and genetic data. The database schema employs design principles for handling complex clinical information, such as response items in genetic questionnaires. Data access and validation are provided by the BiolAD-DB client application, which features a data validation engine tightly coupled to a graphical user interface. Data integrity is provided by the password-protected, SQL-compliant BiolAD-DB server and database. BiolAD-DB tools further provide functionality for generating customized reports and views. The BiolAD-DB system schema, client, and installation instructions are freely available at http://www.rockefeller.edu/biolad-db/.


Subjects
Medical Informatics/methods , Computational Biology/methods , Computer Graphics , Database Management Systems , Factual Databases , Genetic Databases , Electronic Data Processing , Humans , Internet , Research Design , Software