Search | BVS IEC

1.

Evaluating a large language model's ability to solve programming exercises from an introductory bioinformatics course.

Piccolo, Stephen R; Denny, Paul; Luxton-Reilly, Andrew; Payne, Samuel H; Ridge, Perry G.

PLoS Comput Biol ; 19(9): e1011511, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37769024

ABSTRACT

Computer programming is a fundamental tool for life scientists, allowing them to carry out essential research tasks. However, despite various educational efforts, learning to write code can be a challenging endeavor for students and researchers in life-sciences disciplines. Recent advances in artificial intelligence have made it possible to translate human-language prompts to functional code, raising questions about whether these technologies can aid (or replace) life scientists' efforts to write code. Using 184 programming exercises from an introductory-bioinformatics course, we evaluated the extent to which one such tool, OpenAI's ChatGPT, could successfully complete programming tasks. ChatGPT solved 139 (75.5%) of the exercises on its first attempt. For the remaining exercises, we provided natural-language feedback to the model, prompting it to try different approaches. Within 7 or fewer attempts, ChatGPT solved 179 (97.3%) of the exercises. These findings have implications for life-sciences education and research. Instructors may need to adapt their pedagogical approaches and assessment techniques to account for these new capabilities that are available to the general public. For some programming tasks, researchers may be able to work in collaboration with machine-learning models to produce functional code.

2.

Short Sequence Aligner Benchmarking for Chromatin Research.

Carter, John Lawrence; Stevens, Harlan; Ridge, Perry G; Johnson, Steven Michael.

Int J Mol Sci ; 24(18), 2023 Sep 14.

Article in English | MEDLINE | ID: mdl-37762379

ABSTRACT

Much of today's molecular science revolves around next-generation sequencing. Frequently, the first step in analyzing such data is aligning sequencing reads to a reference genome. This step is often taken for granted, but any analysis downstream of the alignment will be affected by the aligner's ability to correctly map sequences. In most cases, for research into chromatin structure and nucleosome positioning, ATAC-seq, ChIP-seq, and MNase-seq experiments use short read lengths. How well aligners manage these reads is critical. Most aligner programs will output mapped reads and unmapped reads. However, from a biological point of view, reads will fall into one of three categories: correctly mapped, incorrectly mapped, and unmapped. While increased sequencing depth can often compensate for unmapped reads, incorrectly and correctly mapped reads appear algorithmically identical but can produce biologically significant alterations in the results. For this reason, we are benchmarking various alignment programs to determine their propensity to incorrectly map short reads. As short-read alignment is an important step in ATAC-seq, ChIP-seq, and MNase-seq experiments, caution should be taken in mapping reads to ensure that the most accurate conclusions can be made from the data generated. Our analysis is intended to help investigators new to the field pick the alignment program best suited for their experimental conditions. In general, the aligners we tested performed well. BWA, Bowtie2, and Chromap were all exceptionally accurate, and we recommend using them. Furthermore, we show that longer read lengths do in fact lead to more accurate mappings.

Subjects

Benchmarking , Chromatin , Chromatin/genetics , Sequence Alignment , Genome , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Software , Algorithms

3.

ExtRamp: a novel algorithm for extracting the ramp sequence based on the tRNA adaptation index or relative codon adaptiveness.

Miller, Justin B; Brase, Logan R; Ridge, Perry G.

Nucleic Acids Res ; 47(3): 1123-1131, 2019 Feb 20.

Article in English | MEDLINE | ID: mdl-30649455

ABSTRACT

Different species, genes, and locations within genes use different codons to fine-tune gene expression. Within genes, the ramp sequence assists in ribosome spacing and decreases downstream collisions by incorporating slowly-translated codons at the beginning of a gene. Although previously reported as occurring in some species, no previous attempt at extracting the ramp sequence from specific genes has been published. We present ExtRamp, a software package that quickly extracts ramp sequences from any species using the tRNA adaptation index or relative codon adaptiveness. Different filters facilitate the analysis of codon efficiency and enable identification of genes with a ramp sequence. We validate the existence of a ramp sequence in most species by running ExtRamp on 229 742 339 genes across 23 428 species. We evaluate differences in reported ramp sequences when we use different parameters. Using the strictest ramp sequence cut-off, we show that across most taxonomic groups, ramp sequences are approximately 20-40 codons long and occur in about 10% of gene sequences. We also show that in Drosophila melanogaster as gene expression increases, a higher proportion of genes have ramp sequences. We provide a framework for performing this analysis on other species. ExtRamp is freely available at https://github.com/ridgelab/ExtRamp.

Subjects

Algorithms , Codon , DNA Sequence Analysis/methods , Animals , Transfer RNA , RNA Sequence Analysis/methods , Software

4.

JustOrthologs: a fast, accurate and user-friendly ortholog identification algorithm.

Miller, Justin B; Pickett, Brandon D; Ridge, Perry G.

Bioinformatics ; 35(4): 546-552, 2019 Feb 15.

Article in English | MEDLINE | ID: mdl-30084941

ABSTRACT

MOTIVATION: Orthologous gene identification is fundamental to all aspects of biology. For example, ortholog identification between species can provide functional insights for genes of unknown function and is a necessary step in phylogenetic inference. Currently, most ortholog identification algorithms require all-versus-all BLAST comparisons, which are time-consuming and memory intensive. RESULTS: In contrast to existing approaches, JustOrthologs exploits the conservation of gene structure by using the lengths of coding sequence regions and dinucleotide percentages to identify orthologs. In comparison to OrthoMCL, OMA and OrthoFinder, JustOrthologs decreases ortholog identification runtime by more than 96% and achieves comparable precision and recall scores. The computational speedup allowed us to conduct pairwise comparisons of 1197 complete genomes (780 eukaryotes and 417 archaea). We confirmed gene annotations for 384 120 genes, grouped 1 675 415 genes in previously unreported ortholog groups, and identified 51 429 potentially mislabeled genes across 622 843 ortholog groups. AVAILABILITY AND IMPLEMENTATION: JustOrthologs is an open source collaborative software package available in the GitHub repository: https://github.com/ridgelab/JustOrthologs/. All test FASTA files used for comparisons are freely available at https://github.com/ridgelab/JustOrthologs/comparisonFastaFiles/. Reference genomes used in this work are available for download from the NCBI repository: ftp://ftp.ncbi.nih.gov/genomes/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Genomics , Software , Computational Biology , Molecular Sequence Annotation , Phylogeny

5.

Codon use and aversion is largely phylogenetically conserved across the tree of life.

Miller, Justin B; McKinnon, Lauren M; Whiting, Michael F; Ridge, Perry G.

Mol Phylogenet Evol ; 144: 106697, 2020 Mar.

Article in English | MEDLINE | ID: mdl-31805345

ABSTRACT

Using parsimony, we analyzed codon usages across 12,337 species and 25,727 orthologous genes to rank specific genes and codons according to their phylogenetic signal. We examined each codon within each ortholog to determine the codon usage for each species. In total, 890,814 codons were parsimony informative. Next, we compared species that used a codon with species that did not use the codon. We assessed each codon's congruence with species relationships provided in the Open Tree of Life (OTL) and determined the statistical probability of observing these results by random chance. We determined that 25,771 codons had no parallelisms or reversals when mapped to the OTL. Codon usages from orthologous genes spanning many species were 1109× more likely to be congruent with species relationships in the OTL than would be expected by random chance. Using the OTL as a reference, we show that codon usage is phylogenetically conserved within orthologous genes in archaea, bacteria, plants, mammals, and other vertebrates. We also show how to use our provided framework to test different tree hypotheses by confirming the placement of turtles as sister taxa to archosaurs.

Subjects

Codon Usage/physiology , Codon/genetics , Genetic Databases , Genetic Speciation , Phylogeny , Animals , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Conserved Sequence , Genetic Databases/statistics & numerical data , Mammals/classification , Mammals/genetics , Plants/classification , Plants/genetics , Sequence Homology , Turtles/classification , Turtles/genetics , Vertebrates/classification , Vertebrates/genetics

6.

Rare coding variants in the phospholipase D3 gene confer risk for Alzheimer's disease.

Cruchaga, Carlos; Karch, Celeste M; Jin, Sheng Chih; Benitez, Bruno A; Cai, Yefei; Guerreiro, Rita; Harari, Oscar; Norton, Joanne; Budde, John; Bertelsen, Sarah; Jeng, Amanda T; Cooper, Breanna; Skorupa, Tara; Carrell, David; Levitch, Denise; Hsu, Simon; Choi, Jiyoon; Ryten, Mina; Sassi, Celeste; Bras, Jose; Gibbs, Raphael J; Hernandez, Dena G; Lupton, Michelle K; Powell, John; Forabosco, Paola; Ridge, Perry G; Corcoran, Christopher D; Tschanz, JoAnn T; Norton, Maria C; Munger, Ronald G; Schmutz, Cameron; Leary, Maegan; Demirci, F Yesim; Bamne, Mikhil N; Wang, Xingbin; Lopez, Oscar L; Ganguli, Mary; Medway, Christopher; Turton, James; Lord, Jenny; Braae, Anne; Barber, Imelda; Brown, Kristelle; Pastor, Pau; Lorenzo-Betancor, Oswaldo; Brkanac, Zoran; Scott, Erick; Topol, Eric; Morgan, Kevin; Rogaeva, Ekaterina.

Nature ; 505(7484): 550-554, 2014 Jan 23.

Article in English | MEDLINE | ID: mdl-24336208

ABSTRACT

Genome-wide association studies (GWAS) have identified several risk variants for late-onset Alzheimer's disease (LOAD). These common variants have replicable but small effects on LOAD risk and generally do not have obvious functional effects. Low-frequency coding variants, not detected by GWAS, are predicted to include functional variants with larger effects on risk. To identify low-frequency coding variants with large effects on LOAD risk, we carried out whole-exome sequencing (WES) in 14 large LOAD families and follow-up analyses of the candidate variants in several large LOAD case-control data sets. A rare variant in PLD3 (phospholipase D3; Val232Met) segregated with disease status in two independent families and doubled risk for Alzheimer's disease in seven independent case-control series with a total of more than 11,000 cases and controls of European descent. Gene-based burden analyses in 4,387 cases and controls of European descent and 302 African American cases and controls, with complete sequence data for PLD3, reveal that several variants in this gene increase risk for Alzheimer's disease in both populations. PLD3 is highly expressed in brain regions that are vulnerable to Alzheimer's disease pathology, including hippocampus and cortex, and is expressed at significantly lower levels in neurons from Alzheimer's disease brains compared to control brains. Overexpression of PLD3 leads to a significant decrease in intracellular amyloid-β precursor protein (APP) and extracellular Aβ42 and Aβ40 (the 42- and 40-residue isoforms of the amyloid-β peptide), and knockdown of PLD3 leads to a significant increase in extracellular Aβ42 and Aβ40. Together, our genetic and functional data indicate that carriers of PLD3 coding variants have a twofold increased risk for LOAD and that PLD3 influences APP processing. This study provides an example of how densely affected families may help to identify rare variants with large effects on risk for disease or other complex traits.

Subjects

Alzheimer Disease/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Phospholipase D/genetics , Black or African American/genetics , Age of Onset , Aged , Aged 80 and over , Alzheimer Disease/metabolism , Amyloid beta-Peptides/metabolism , Amyloid beta-Protein Precursor/metabolism , Brain/metabolism , Case-Control Studies , Europe/ethnology , Exome/genetics , Female , Humans , Male , Peptide Fragments/metabolism , Phospholipase D/deficiency , Phospholipase D/metabolism , Post-Translational Protein Processing/genetics , Proteolysis

7.

Arabidopsis thaliana organelles mimic the T7 phage DNA replisome with specific interactions between Twinkle protein and DNA polymerases Pol1A and Pol1B.

Morley, Stewart A; Peralta-Castro, Antolín; Brieba, Luis G; Miller, Justin; Ong, Kai Li; Ridge, Perry G; Oliphant, Amanda; Aldous, Stephen; Nielsen, Brent L.

BMC Plant Biol ; 19(1): 241, 2019 Jun 06.

Article in English | MEDLINE | ID: mdl-31170927

ABSTRACT

BACKGROUND: Plant chloroplasts and mitochondria utilize nuclear encoded proteins to replicate their DNA. These proteins are purposely built for replication in the organelle environment and are distinct from those involved in replication of the nuclear genome. These organelle-localized proteins have ancestral roots in bacterial and bacteriophage genes, supporting the endosymbiotic theory of their origin. We examined the interactions between three of these proteins from Arabidopsis thaliana: a DNA helicase-primase similar to bacteriophage T7 gp4 protein and animal mitochondrial Twinkle, and two DNA polymerases, Pol1A and Pol1B. We used a three-pronged approach to analyze the interactions, including Yeast-two-hybrid analysis, Direct Coupling Analysis (DCA), and thermophoresis. RESULTS: Yeast-two-hybrid analysis reveals residues 120-295 of Twinkle as the minimal region that can still interact with Pol1A or Pol1B. This region is a part of the primase domain of the protein and slightly overlaps the zinc-finger and RNA polymerase subdomains located within. Additionally, we observed that Arabidopsis Twinkle interacts much more strongly with Pol1A versus Pol1B. Thermophoresis also confirms that the primase domain of Twinkle has higher binding affinity than any other region of the protein. Direct-Coupling-Analysis identified specific residues in Twinkle and the DNA polymerases critical to positive interaction between the two proteins. CONCLUSIONS: The interaction of Twinkle with Pol1A or Pol1B mimics the minimal DNA replisomes of T7 phage and those present in mammalian mitochondria. However, while T7 and mammals absolutely require their homolog of Twinkle DNA helicase-primase, Arabidopsis Twinkle mutants are seemingly unaffected by this loss. This implies that while Arabidopsis mitochondria mimic minimal replisomes from T7 and mammalian mitochondria, there is an extra level of redundancy specific to loss of Twinkle function.

Subjects

Arabidopsis Proteins/genetics , Arabidopsis/genetics , Bacteriophage T7/genetics , DNA-Directed DNA Polymerase/genetics , Multienzyme Complexes/genetics , Multifunctional Enzymes/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/metabolism , DNA-Directed DNA Polymerase/metabolism , Mitochondria/metabolism , Multifunctional Enzymes/metabolism

8.

Molecular epidemiology of carbapenem-resistance plasmids using publicly available sequences.

Card, Galen E; Pickett, Brandon D; Ridge, Perry G; Robison, Richard A.

Genome ; 62(12): 785-792, 2019 Dec.

Article in English | MEDLINE | ID: mdl-31491336

ABSTRACT

Carbapenem-resistant bacteria have quickly become a worldwide concern in nosocomial infections. Of the seven known carbapenemases, four have been shown to be particularly problematic: KPC, NDM, IMP, and VIM. To date, many local and species- or carbapenemase-specific epidemiological studies have been performed, which often focus on the organism itself. This report attempts to perform an inclusive (encompassing both species and carbapenemase) epidemiologic study using publicly available plasmid sequences from NCBI. In this report, the gene content of these various plasmids has been characterized, replicon types of the plasmids identified, and the global spread and species promiscuity of the plasmids analyzed. Additionally, support is given to several groups targeting plasmid maintenance and transfer mechanisms to slow the spread of resistance plasmids.

Subjects

Bacterial Proteins/genetics , Bacterial Drug Resistance/genetics , Plasmids/genetics , beta-Lactamases/genetics , Anti-Bacterial Agents , Carbapenem-Resistant Enterobacteriaceae/genetics , Carbapenems , China , Nucleic Acid Databases , Plasmids/classification , Replicon , United States

9.

Kmer-SSR: a fast and exhaustive SSR search algorithm.

Pickett, Brandon D; Miller, Justin B; Ridge, Perry G.

Bioinformatics ; 33(24): 3922-3928, 2017 Dec 15.

Article in English | MEDLINE | ID: mdl-28968741

ABSTRACT

MOTIVATION: One of the main challenges with bioinformatics software is that the size and complexity of datasets necessitate trading speed for accuracy, or completeness. To combat this problem of computational complexity, a plethora of heuristic algorithms have arisen that report a 'good enough' solution to biological questions. However, in instances such as Simple Sequence Repeats (SSRs), a 'good enough' solution may not accurately portray results in population genetics, phylogenetics and forensics, which require accurate SSRs to calculate intra- and inter-species interactions. RESULTS: We present Kmer-SSR, which finds all SSRs faster than most heuristic SSR identification algorithms in a parallelized, easy-to-use manner. The exhaustive Kmer-SSR option has 100% precision and 100% recall and accurately identifies every SSR of any specified length. To identify more biologically pertinent SSRs, we also developed several filters that allow users to easily view a subset of SSRs based on user input. Kmer-SSR, coupled with the filter options, accurately and intuitively identifies SSRs quickly and in a more user-friendly manner than any other SSR identification algorithm. AVAILABILITY AND IMPLEMENTATION: The source code is freely available on GitHub at https://github.com/ridgelab/Kmer-SSR. CONTACT: perry.ridge@byu.edu.

Subjects

Algorithms , Microsatellite Repeats , Software , Computational Biology/methods
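
As an illustration of what an exhaustive SSR search means in entry 9, the sketch below scans a sequence for every tandem run of a short motif. It is a naive scan written for clarity, not the Kmer-SSR algorithm itself; the function name `find_ssrs` and its parameters (`min_period`, `max_period`, `min_repeats`) are assumptions made for this example and do not come from the tool.

```python
def find_ssrs(seq, min_period=1, max_period=3, min_repeats=3):
    """Naive exhaustive scan: return sorted (start, motif, repeat_count) tuples.

    Hypothetical helper for illustration only; it does not reproduce
    Kmer-SSR's implementation, filters, or output format.
    """
    seq = seq.upper()
    hits = []
    for period in range(min_period, max_period + 1):
        i = 0
        while i + period <= len(seq):
            motif = seq[i:i + period]
            # Skip motifs that are just a shorter unit repeated (e.g. "TT" for period 2),
            # so each run is reported once, under its shortest motif.
            if any(period % p == 0 and motif == motif[:p] * (period // p)
                   for p in range(1, period)):
                i += 1
                continue
            # Count how many consecutive copies of the motif follow position i.
            repeats = 1
            j = i + period
            while seq[j:j + period] == motif:
                repeats += 1
                j += period
            if repeats >= min_repeats:
                hits.append((i, motif, repeats))
                i = j  # continue after the run
            else:
                i += 1
    return sorted(hits)


print(find_ssrs("GGACACACACTTTTG"))  # → [(2, 'AC', 4), (10, 'T', 4)]
```

Because every start position and every motif length in the chosen range is examined, nothing is missed within those bounds; the cost is running time, which is the trade-off the abstract says heuristic tools make in the other direction.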

10.

Assembly of 809 whole mitochondrial genomes with clinical, imaging, and fluid biomarker phenotyping.

Ridge, Perry G; Wadsworth, Mark E; Miller, Justin B; Saykin, Andrew J; Green, Robert C; Kauwe, John S K.

Alzheimers Dement ; 14(4): 514-519, 2018 Apr.

Article in English | MEDLINE | ID: mdl-29306584

ABSTRACT

INTRODUCTION: Mitochondrial genetics are an important but largely neglected area of research in Alzheimer's disease. A major impediment is the lack of data sets. METHODS: We used an innovative, rigorous approach, combining several existing tools with our own, to accurately assemble and call variants in 809 whole mitochondrial genomes. RESULTS: To help address this impediment, we prepared a data set that consists of 809 complete and annotated mitochondrial genomes with samples from the Alzheimer's Disease Neuroimaging Initiative. These whole mitochondrial genomes include rich phenotyping, such as clinical, fluid biomarker, and imaging data, all of which is available through the Alzheimer's Disease Neuroimaging Initiative website. Genomes are cleaned, annotated, and prepared for analysis. DISCUSSION: These data provide an important resource for investigating the impact of mitochondrial genetic variation on risk for Alzheimer's disease and other phenotypes that have been measured in the Alzheimer's Disease Neuroimaging Initiative samples.

Subjects

Alzheimer Disease/genetics , Mitochondrial Genome , Aged , Alzheimer Disease/blood , Alzheimer Disease/cerebrospinal fluid , Alzheimer Disease/diagnostic imaging , Apolipoproteins E/genetics , Biomarkers/blood , Biomarkers/cerebrospinal fluid , Brain/diagnostic imaging , Female , Genetic Variation , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Male , Neuroimaging , Phenotype

11.

Missing something? Codon aversion as a new character system in phylogenetics.

Miller, Justin B; Hippen, Ariel A; Belyeu, Jonathon R; Whiting, Michael F; Ridge, Perry G.

Cladistics ; 33(5): 545-556, 2017 Oct.

Article in English | MEDLINE | ID: mdl-34706488

ABSTRACT

Although many studies have documented codon usage bias in different species, the importance of codon usage in a phylogenetic framework remains largely unknown. We demonstrate that a phylogenetic signal is present in the codon usage and non-usage biases of 17 717 orthologues evaluated across 72 tetrapod species using a simple parsimony analysis of a binary matrix of codon characters. Phylogenies estimated using stop codons were more congruent with previous hypotheses than phylogenies based on any other single codon or a combination of codons. Although each codon is present in every species, specific genes have different codon preferences and may or may not use every possible codon. This observation allowed us to map the pattern of codon usage and non-usage across the topology. These results suggest that codon usage is phylogenetically conserved across shallow and deep levels within tetrapods.
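
The "binary matrix of codon characters" in entry 11 can be pictured with a small sketch: each species' copy of an orthologue becomes a row of 64 presence/absence values, and a codon column is parsimony informative when both states occur in at least two species. The helper names (`codon_presence`, `informative_codons`) and the toy sequences are assumptions for illustration; the authors' actual pipeline is not described in the abstract.

```python
from itertools import product

# All 64 codons in a fixed (alphabetical) order, used as matrix columns.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_presence(cds):
    """Turn one in-frame coding sequence into a 64-long 0/1 vector (1 = codon used)."""
    cds = cds.upper()
    # Step through the sequence codon by codon, ignoring any trailing partial codon.
    used = {cds[i:i + 3] for i in range(0, len(cds) - 2, 3)}
    return [1 if codon in used else 0 for codon in CODONS]


def informative_codons(matrix):
    """Return codons whose 0/1 character has each state in at least two species."""
    informative = []
    for col, codon in enumerate(CODONS):
        ones = sum(row[col] for row in matrix.values())
        zeros = len(matrix) - ones
        if ones >= 2 and zeros >= 2:
            informative.append(codon)
    return informative


# Toy orthologue from four hypothetical species (not real data).
toy = {
    "sp_a": "ATGGCTTAA",
    "sp_b": "ATGGCTTGA",
    "sp_c": "ATGGCCTAA",
    "sp_d": "ATGGCCTGA",
}
matrix = {name: codon_presence(seq) for name, seq in toy.items()}
print(informative_codons(matrix))  # → ['GCC', 'GCT', 'TAA', 'TGA']
```

ATG is used by all four toy species, so it carries no grouping information and is excluded, while each of the other four codons splits the species two against two.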

12.

Genome-wide association study of CSF levels of 59 Alzheimer's disease candidate proteins: significant associations with proteins involved in amyloid processing and inflammation.

Kauwe, John S K; Bailey, Matthew H; Ridge, Perry G; Perry, Rachel; Wadsworth, Mark E; Hoyt, Kaitlyn L; Staley, Lyndsay A; Karch, Celeste M; Harari, Oscar; Cruchaga, Carlos; Ainscough, Benjamin J; Bales, Kelly; Pickering, Eve H; Bertelsen, Sarah; Fagan, Anne M; Holtzman, David M; Morris, John C; Goate, Alison M.

PLoS Genet ; 10(10): e1004758, 2014 Oct.

Article in English | MEDLINE | ID: mdl-25340798

ABSTRACT

Cerebrospinal fluid (CSF) 42 amino acid species of amyloid beta (Aβ42) and tau levels are strongly correlated with the presence of Alzheimer's disease (AD) neuropathology including amyloid plaques and neurodegeneration and have been successfully used as endophenotypes for genetic studies of AD. Additional CSF analytes may also serve as useful endophenotypes that capture other aspects of AD pathophysiology. Here we have conducted a genome-wide association study of CSF levels of 59 AD-related analytes. All analytes were measured using the Rules Based Medicine Human DiscoveryMAP Panel, which includes analytes relevant to several disease-related processes. Data from two independently collected and measured datasets, the Knight Alzheimer's Disease Research Center (ADRC) and Alzheimer's Disease Neuroimaging Initiative (ADNI), were analyzed separately, and combined results were obtained using meta-analysis. We identified genetic associations with CSF levels of 5 proteins (Angiotensin-converting enzyme (ACE), Chemokine (C-C motif) ligand 2 (CCL2), Chemokine (C-C motif) ligand 4 (CCL4), Interleukin 6 receptor (IL6R) and Matrix metalloproteinase-3 (MMP3)) with study-wide significant p-values (p < 1.46 × 10^-10) and significant, consistent evidence for association in both the Knight ADRC and the ADNI samples. These proteins are involved in amyloid processing and pro-inflammatory signaling. SNPs associated with ACE, IL6R and MMP3 protein levels are located within the coding regions of the corresponding structural gene. The SNPs associated with CSF levels of CCL4 and CCL2 are located in known chemokine binding proteins. The genetic associations reported here are novel and suggest mechanisms for genetic control of CSF and plasma levels of these disease-related proteins. Significant SNPs in ACE and MMP3 also showed association with AD risk. Our findings suggest that these proteins/pathways may be valuable therapeutic targets for AD. Robust associations in cognitively normal individuals suggest that these SNPs also influence regulation of these proteins more generally and may therefore be relevant to other diseases.

Subjects

Alzheimer Disease/genetics , Amyloid beta-Peptides/genetics , Matrix Metalloproteinase 3/genetics , Renin/genetics , Alzheimer Disease/blood , Alzheimer Disease/cerebrospinal fluid , Alzheimer Disease/pathology , Amyloid beta-Peptides/cerebrospinal fluid , Blood Proteins/genetics , Chemokine CCL2/genetics , Chemokine CCL4/genetics , Female , Genome-Wide Association Study , Humans , Male , Nerve Growth Factor/genetics , Single Nucleotide Polymorphism , Interleukin-6 Receptors/genetics , Lipoprotein Receptors/genetics , tau Proteins/cerebrospinal fluid , tau Proteins/genetics

13.

A novel approach for multi-SNP GWAS and its application in Alzheimer's disease.

Bodily, Paul M; Fujimoto, M Stanley; Page, Justin T; Clement, Mark J; Ebbert, Mark T W; Ridge, Perry G.

BMC Bioinformatics ; 17 Suppl 7: 268, 2016 Jul 25.

Article in English | MEDLINE | ID: mdl-27453991

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have effectively identified genetic factors for many diseases. Many diseases, including Alzheimer's disease (AD), have epistatic causes, requiring more sophisticated analyses to identify groups of variants which together affect phenotype. RESULTS: Based on the GWAS statistical model, we developed a multi-SNP GWAS analysis to identify pairs of variants whose common occurrence signaled the Alzheimer's disease phenotype. CONCLUSIONS: Despite not having sufficient data to demonstrate significance, our preliminary experimentation identified a high correlation between GRIA3 and HLA-DRB5 (an AD gene). GRIA3 has not been previously reported in association with AD, but is known to play a role in learning and memory.

Subjects

Alzheimer Disease/genetics , Computational Biology/methods , Genetic Epistasis , Genome-Wide Association Study/methods , Single Nucleotide Polymorphism , Alzheimer Disease/metabolism , Female , Genetic Predisposition to Disease , HLA-DRB5 Chains/genetics , Humans , Male , Statistical Models , AMPA Receptors/genetics

14.

Evaluating the necessity of PCR duplicate removal from next-generation sequencing data and a comparison of approaches.

Ebbert, Mark T W; Wadsworth, Mark E; Staley, Lyndsay A; Hoyt, Kaitlyn L; Pickett, Brandon; Miller, Justin; Duce, John; Kauwe, John S K; Ridge, Perry G.

BMC Bioinformatics ; 17 Suppl 7: 239, 2016 Jul 25.

Article in English | MEDLINE | ID: mdl-27454357

ABSTRACT

BACKGROUND: Analyzing next-generation sequencing data is difficult because datasets are large, second-generation sequencing platforms have high error rates, and each position in the target genome (exome, transcriptome, etc.) is sequenced multiple times. Given these challenges, numerous bioinformatic algorithms have been developed to analyze these data. These algorithms aim to find an appropriate balance between data loss, errors, analysis time, and memory footprint. Typical analysis pipelines require multiple steps. If one or more of these steps is unnecessary, removing it would significantly decrease compute time and data manipulation. One step in many pipelines is PCR duplicate removal, where PCR duplicates arise from multiple PCR products from the same template molecule binding on the flowcell. These are often removed because there is concern they can lead to false positive variant calls. Picard (MarkDuplicates) and SAMTools (rmdup) are the two main software tools used for PCR duplicate removal. RESULTS: Approximately 92% of the 17+ million variants called were called whether we removed duplicates with Picard or SAMTools, or left the PCR duplicates in the dataset. There were no significant differences between the unique variant sets when comparing the transition/transversion ratios (p = 1.0), percentage of novel variants (p = 0.99), average population frequencies (p = 0.99), and the percentage of protein-changing variants (p = 1.0). Results were similar for variants in the American College of Medical Genetics genes. Genotype concordance between NGS and SNP chips was above 99% for all genotype groups (e.g., homozygous reference). CONCLUSIONS: Our results suggest that PCR duplicate removal has minimal effect on the accuracy of subsequent variant calls.

Subjects

High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism , DNA Sequence Analysis/methods , Software , Data Accuracy , Human Genome , Genomics/methods , Humans , Polymerase Chain Reaction

15.

Genome-wide association study of prolactin levels in blood plasma and cerebrospinal fluid.

Staley, Lyndsay A; Ebbert, Mark T W; Parker, Sheradyn; Bailey, Matthew; Ridge, Perry G; Goate, Alison M; Kauwe, John S K.

BMC Genomics ; 17 Suppl 3: 436, 2016 Jun 29.

Article in English | MEDLINE | ID: mdl-27357110

ABSTRACT

BACKGROUND: Prolactin is a polypeptide hormone secreted by the anterior pituitary gland that plays an essential role in lactation, tissue growth, and suppressing apoptosis to increase cell survival. Prolactin serves as a key player in many life-critical processes, including immune system and reproduction. Prolactin is also found in multiple fluids throughout the body, including plasma and cerebrospinal fluid (CSF). METHODS: In this study, we measured prolactin levels in both plasma and CSF, and performed a genome-wide association study. We then performed meta-analyses using METAL with a significance threshold of p < 5 × 10^-8 and removed SNPs where the direction of the effect was different between the two datasets. RESULTS: We identified 12 SNPs associated with increased prolactin levels in both biological fluids. CONCLUSIONS: Our efforts will help researchers understand how prolactin is regulated in both CSF and plasma, which could be beneficial in research for the immune system and reproduction.

Subjects

Genome-Wide Association Study/methods , Single Nucleotide Polymorphism , Prolactin/blood , Prolactin/cerebrospinal fluid , Adult , Aged , Aged 80 and over , Alzheimer Disease/blood , Alzheimer Disease/cerebrospinal fluid , Alzheimer Disease/genetics , Biomarkers/blood , Biomarkers/cerebrospinal fluid , Calcium-Calmodulin-Dependent Protein Kinases/genetics , Female , Gene Frequency , Genotype , Humans , Intracellular Signaling Peptides and Proteins/genetics , Linear Models , Linkage Disequilibrium , Meta-Analysis as Topic , Middle Aged , Sulfotransferases/genetics
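
The filtering step described in entry 15 (a genome-wide threshold of p < 5 × 10^-8 plus removal of SNPs whose effect direction differs between the two datasets) can be sketched as follows. The combination rule used here is a standard inverse-variance-weighted fixed-effect meta-analysis, chosen as a stand-in because the abstract does not say which METAL scheme was used; the function names and the toy effect sizes are assumptions, and the code does not call METAL.

```python
import math


def fixed_effect_meta(beta1, se1, beta2, se2):
    """Inverse-variance-weighted combination of two effect estimates.

    Returns (combined_beta, combined_se, two_sided_p) under a normal approximation.
    """
    w1, w2 = 1 / se1 ** 2, 1 / se2 ** 2
    beta = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    se = math.sqrt(1 / (w1 + w2))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value for a standard normal z
    return beta, se, p


def consistent_hits(snps, alpha=5e-8):
    """Keep SNP ids that pass the threshold and share effect direction in both datasets."""
    hits = []
    for snp_id, (b1, s1, b2, s2) in snps.items():
        same_direction = b1 * b2 > 0
        _, _, p = fixed_effect_meta(b1, s1, b2, s2)
        if same_direction and p < alpha:
            hits.append(snp_id)
    return hits


# Hypothetical per-SNP (beta, se) pairs for dataset 1 and dataset 2; not real results.
toy = {
    "rs_a": (0.50, 0.08, 0.45, 0.09),   # strong and concordant: kept
    "rs_b": (0.60, 0.05, -0.10, 0.10),  # significant but discordant: removed
    "rs_c": (0.10, 0.08, 0.05, 0.09),   # concordant but not significant: removed
}
print(consistent_hits(toy))  # → ['rs_a']
```

The direction check is applied separately from the p-value because a combined estimate can be highly significant while being driven by only one dataset, as the `rs_b` toy example shows.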

16.

Variants in ACPP are associated with cerebrospinal fluid Prostatic Acid Phosphatase levels.

Staley, Lyndsay A; Ebbert, Mark T W; Bunker, Daniel; Bailey, Matthew; Ridge, Perry G; Goate, Alison M; Kauwe, John S K.

BMC Genomics ; 17 Suppl 3: 439, 2016 Jun 29.

Article in English | MEDLINE | ID: mdl-27357282

ABSTRACT

BACKGROUND: Prostatic Acid Phosphatase (PAP) is an enzyme that is produced primarily in the prostate and functions as a cell growth regulator and potential tumor suppressor. Understanding the genetic regulation of this enzyme is important because PAP plays an important role in prostate cancer and is expressed in other tissues such as the brain. METHODS: We tested association between 5.8 M SNPs and PAP levels in cerebrospinal fluid across 543 individuals in two datasets using linear regression. We then performed meta-analyses using METAL with a significance threshold of p < 5 × 10^-8 and removed SNPs where the direction of the effect was different between the two datasets, identifying 289 candidate SNPs that affect PAP cerebrospinal fluid levels. We analyzed each of these SNPs individually and prioritized SNPs that had biologically meaningful functional annotations in wANNOVAR (e.g. non-synonymous, stop gain, 3' UTR, etc.) or had a RegulomeDB score less than 3. RESULTS: Thirteen SNPs met our criteria, suggesting they are candidate causal alleles that underlie ACPP regulation and expression. CONCLUSIONS: Given PAP's expression in the brain and its role as a cell-growth regulator and tumor suppressor, our results have important implications in brain health such as cancer and other brain diseases including neurodegenerative diseases (e.g., Alzheimer's disease and Parkinson's disease) and mental health (e.g., anxiety, depression, and schizophrenia).

Assuntos

Fosfatase Ácida/líquido cefalorraquidiano , Fosfatase Ácida/genética , Metanálise como Assunto , Polimorfismo de Nucleotídeo Único , Idoso , Idoso de 80 Anos ou mais , Alelos , Doença de Alzheimer/líquido cefalorraquidiano , Doença de Alzheimer/genética , Encéfalo/enzimologia , Encéfalo/metabolismo , Neoplasias Encefálicas/enzimologia , Neoplasias Encefálicas/genética , Regulação Enzimológica da Expressão Gênica , Frequência do Gene , Estudo de Associação Genômica Ampla/métodos , Genótipo , Humanos , Modelos Lineares , Pessoa de Meia-Idade , Doenças Neurodegenerativas/enzimologia , Doenças Neurodegenerativas/genética , Fatores de Risco

17.

Variants in CCL16 are associated with blood plasma and cerebrospinal fluid CCL16 protein levels.

Ebbert, Mark T W; Staley, Lyndsay A; Parker, Joshua; Parker, Sheradyn; Bailey, Matthew; Ridge, Perry G; Goate, Alison M; Kauwe, John S K.

BMC Genomics ; 17 Suppl 3: 437, 2016 06 29.

Artigo em Inglês | MEDLINE | ID: mdl-27357396

RESUMO

BACKGROUND: CCL16 is a chemokine predominantly expressed in the liver, but is also found in the blood and brain, and is known to play important roles in immune response and angiogenesis. Little is known about the gene's regulation. METHODS: Here, we test for potential causal SNPs that affect CCL16 protein levels in both blood plasma and cerebrospinal fluid in a genome-wide association study across two datasets. We then use METAL to perform meta-analyses with a significance threshold of p < 5 × 10⁻⁸. We removed SNPs where the direction of the effect was different between the two datasets. RESULTS: We identify 10 SNPs associated with increased CCL16 protein levels in both biological fluids. CONCLUSIONS: Our results will help understand CCL16's regulation, allowing researchers to better understand the gene's effects on human health.

Assuntos

Quimiocinas CC/genética , Estudo de Associação Genômica Ampla/métodos , Metanálise como Assunto , Polimorfismo de Nucleotídeo Único , Idoso , Idoso de 80 Anos ou mais , Alelos , Doença de Alzheimer/sangue , Doença de Alzheimer/líquido cefalorraquidiano , Doença de Alzheimer/genética , Quimiocinas CC/sangue , Quimiocinas CC/líquido cefalorraquidiano , Regulação da Expressão Gênica , Frequência do Gene , Genótipo , Humanos , Desequilíbrio de Ligação , Pessoa de Meia-Idade

18.

Germline mutations in NFKB2 implicate the noncanonical NF-κB pathway in the pathogenesis of common variable immunodeficiency.

Chen, Karin; Coonrod, Emily M; Kumánovics, Attila; Franks, Zechariah F; Durtschi, Jacob D; Margraf, Rebecca L; Wu, Wilfred; Heikal, Nahla M; Augustine, Nancy H; Ridge, Perry G; Hill, Harry R; Jorde, Lynn B; Weyrich, Andrew S; Zimmerman, Guy A; Gundlapalli, Adi V; Bohnsack, John F; Voelkerding, Karl V.

Am J Hum Genet ; 93(5): 812-24, 2013 Nov 07.

Artigo em Inglês | MEDLINE | ID: mdl-24140114

RESUMO

Common variable immunodeficiency (CVID) is a heterogeneous disorder characterized by antibody deficiency, poor humoral response to antigens, and recurrent infections. To investigate the molecular cause of CVID, we carried out exome sequence analysis of a family diagnosed with CVID and identified a heterozygous frameshift mutation, c.2564delA (p.Lys855Serfs*7), in NFKB2 affecting the C terminus of NF-κB2 (also known as p100/p52 or p100/p49). Subsequent screening of NFKB2 in 33 unrelated CVID-affected individuals uncovered a second heterozygous nonsense mutation, c.2557C>T (p.Arg853*), in one simplex case. Affected individuals in both families presented with an unusual combination of childhood-onset hypogammaglobulinemia with recurrent infections, autoimmune features, and adrenal insufficiency. NF-κB2 is the principal protein involved in the noncanonical NF-κB pathway, is evolutionarily conserved, and functions in peripheral lymphoid organ development, B cell development, and antibody production. In addition, Nfkb2 mouse models demonstrate a CVID-like phenotype with hypogammaglobulinemia and poor humoral response to antigens. Immunoblot analysis and immunofluorescence microscopy of transformed B cells from affected individuals show that the NFKB2 mutations affect phosphorylation and proteasomal processing of p100 and, ultimately, p52 nuclear translocation. These findings describe germline mutations in NFKB2 and establish the noncanonical NF-κB signaling pathway as a genetic etiology for this primary immunodeficiency syndrome.

Assuntos

Imunodeficiência de Variável Comum/genética , Mutação em Linhagem Germinativa , Subunidade p52 de NF-kappa B/genética , Transdução de Sinais , Adolescente , Adulto , Sequência de Aminoácidos , Animais , Linfócitos B/citologia , Linfócitos B/metabolismo , Linhagem Celular , Criança , Imunodeficiência de Variável Comum/patologia , Modelos Animais de Doenças , Feminino , Testes Genéticos , Heterozigoto , Humanos , Imunoglobulina A/sangue , Imunoglobulina G/sangue , Imunoglobulina M/sangue , Masculino , Microscopia Confocal , Dados de Sequência Molecular , Subunidade p52 de NF-kappa B/metabolismo , Linhagem , Fenótipo , Adulto Jovem

19.

Interaction between variants in CLU and MS4A4E modulates Alzheimer's disease risk.

Ebbert, Mark T W; Boehme, Kevin L; Wadsworth, Mark E; Staley, Lyndsay A; Mukherjee, Shubhabrata; Crane, Paul K; Ridge, Perry G; Kauwe, John S K.

Alzheimers Dement ; 12(2): 121-129, 2016 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-26449541

RESUMO

INTRODUCTION: Ebbert et al. reported gene-gene interactions between rs11136000-rs670139 (CLU-MS4A4E) and rs3865444-rs670139 (CD33-MS4A4E). We evaluate these interactions in the largest data set for an epistasis study. METHODS: We tested interactions using 3837 cases and 4145 controls from Alzheimer's Disease Genetics Consortium using meta-analyses and permutation analyses. We repeated meta-analyses stratified by apolipoprotein E (APOE) ε4 status, estimated combined odds ratio (OR) and population attributable fraction (cPAF), and explored causal variants. RESULTS: Results support the CLU-MS4A4E interaction and a dominant effect. An association between CLU-MS4A4E and APOE ε4 negative status exists. The estimated synergy factor, OR, and cPAF for rs11136000-rs670139 are 2.23, 2.45, and 8.0, respectively. We identified potential causal variants. DISCUSSION: We replicated the CLU-MS4A4E interaction in a large case-control series and observed APOE ε4 and possible dominant effect. The CLU-MS4A4E OR is higher than any Alzheimer's disease locus except APOE ε4, APP, and TREM2. We estimated an 8% decrease in Alzheimer's disease incidence without CLU-MS4A4E risk alleles and identified potential causal variants.

Assuntos

Doença de Alzheimer/genética , Apolipoproteína E4/genética , Clusterina/genética , Epistasia Genética , Proteínas de Membrana/genética , Lectina 3 Semelhante a Ig de Ligação ao Ácido Siálico/genética , Alelos , Feminino , Predisposição Genética para Doença , Variação Genética , Estudo de Associação Genômica Ampla/métodos , Humanos , Masculino , Fatores de Risco

20.

Genetic analysis, structural modeling, and direct coupling analysis suggest a mechanism for phosphate signaling in Escherichia coli.

Gardner, Stewart G; Miller, Justin B; Dean, Tanner; Robinson, Tanner; Erickson, McCall; Ridge, Perry G; McCleary, William R.

BMC Genet ; 16 Suppl 2: S2, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-25953406

RESUMO

BACKGROUND: Proper phosphate signaling is essential for robust growth of Escherichia coli and many other bacteria. The phosphate signal is mediated by a classic two component signal system composed of PhoR and PhoB. The PhoR histidine kinase is responsible for phosphorylating/dephosphorylating the response regulator, PhoB, which controls the expression of genes that aid growth in low phosphate conditions. The mechanism by which PhoR receives a signal of environmental phosphate levels has remained elusive. A transporter complex composed of the PstS, PstC, PstA, and PstB proteins as well as a negative regulator, PhoU, have been implicated in signaling environmental phosphate to PhoR. RESULTS: This work confirms that PhoU and the PstSCAB complex are necessary for proper signaling of high environmental phosphate. Also, we identify residues important in PhoU/PhoR interaction with genetic analysis. Using protein modeling and docking methods, we show an interaction model that points to a potential mechanism for PhoU mediated signaling to PhoR to modify its activity. This model is tested with direct coupling analysis. CONCLUSIONS: These bioinformatics tools, in combination with genetic and biochemical analysis, help to identify and test a model for phosphate signaling and may be applicable to several other systems.

Assuntos

Escherichia coli/metabolismo , Fosfatos/metabolismo , Transdução de Sinais , Transportadores de Cassetes de Ligação de ATP/química , Transportadores de Cassetes de Ligação de ATP/metabolismo , Proteínas de Bactérias/química , Proteínas de Bactérias/metabolismo , Escherichia coli/genética , Proteínas de Escherichia coli/química , Proteínas de Escherichia coli/metabolismo , Proteínas de Membrana Transportadoras/química , Proteínas de Membrana Transportadoras/metabolismo , Modelos Moleculares , Fatores de Transcrição/química , Fatores de Transcrição/metabolismo
