Results 1 - 20 of 29
1.
Patterns (N Y) ; 2(9): 100322, 2021 Sep 10.
Article in English | MEDLINE | ID: mdl-34553169

ABSTRACT

Reproducible computational research (RCR) is the keystone of the scientific method for in silico analyses, packaging the transformation of raw data to published results. In addition to its role in research integrity, improving the reproducibility of scientific studies can accelerate evaluation and reuse. This potential and wide support for the FAIR principles have motivated interest in metadata standards supporting reproducibility. Metadata provide context and provenance to raw data and methods and are essential to both discovery and validation. Despite this shared connection with scientific data, few studies have explicitly described how metadata enable reproducible computational research. This review employs a functional content analysis to identify metadata standards that support reproducibility across an analytic stack consisting of input data, tools, notebooks, pipelines, and publications. Our review provides background context, explores gaps, and discovers component trends of embeddedness and methodology weight from which we derive recommendations for future work.

2.
Cancer Genet ; 235-236: 1-12, 2019 06.
Article in English | MEDLINE | ID: mdl-31296308

ABSTRACT

Identifying genetic biomarkers of patient survival remains a major goal of large-scale cancer profiling studies. Using gene expression data to predict the outcome of a patient's tumor makes biomarker discovery a compelling tool for improving patient care. As genomic technologies expand, multiple data types may serve as informative biomarkers, and bioinformatic strategies have evolved around these different applications. For categorical variables such as a gene's mutation status, biomarker identification to predict survival time is straightforward. However, for continuous variables like gene expression, the available methods generate highly variable results, and studies on best practices are lacking. We investigated the performance of eight methods that deal specifically with continuous data. K-means, Cox regression, concordance index, D-index, 25th-75th percentile split, median-split, distribution-based splitting, and KaplanScan were applied to four RNA-sequencing (RNA-seq) datasets from The Cancer Genome Atlas. The reliability of the eight methods was assessed by splitting each dataset into two groups and comparing the overlap of the results. Gene sets that had been identified from the literature for a specific tumor type served as positive controls to assess the accuracy of each biomarker using receiver operating characteristic (ROC) curves. Artificial RNA-seq data were generated to test the robustness of these methods under fixed levels of gene expression noise. Our results show that methods based on dichotomizing tend to have consistently poor performance, while C-index, D-index, and k-means perform well in most settings. Overall, the Cox regression method had the strongest performance based on tests of accuracy, reliability, and robustness.
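
As a rough illustration of one of the continuous-data methods named in this abstract, the concordance index (C-index) can be computed directly from survival times, event indicators, and a per-patient expression value. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name and the toy data are hypothetical, higher expression is assumed to mean higher risk, and pairs with tied survival times are skipped.

```python
def concordance_index(times, events, scores):
    """Harrell-style C-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    scores : risk score per patient (here: expression of one gene)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair (i, j) is comparable only if patient i had an observed
        # event strictly before patient j's follow-up time.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0      # higher risk died earlier
                elif scores[i] == scores[j]:
                    concordant += 0.5      # tied scores count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: expression falls as survival time rises.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_expression = [0.9, 0.7, 0.3, 0.1]
print(concordance_index(toy_times, toy_events, toy_expression))  # 1.0
```

Unlike a median split, this statistic uses the full ordering of expression values rather than a single cut point, which is one plausible reason a dichotomizing method could behave differently on noisy data.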


Subjects
Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic/genetics , Neoplasms/genetics , Neoplasms/mortality , Base Sequence , Biomarkers, Tumor/genetics , Data Interpretation, Statistical , Humans , Kaplan-Meier Estimate , Prognosis , Proportional Hazards Models , ROC Curve , Sequence Analysis, RNA/methods , Survival Analysis
3.
Cell Metab ; 29(1): 78-90.e5, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30174309

ABSTRACT

Nuclear-encoded mutations causing metabolic and degenerative diseases have highly variable expressivity. Patients sharing the homozygous mutation (c.523delC) in the adenine nucleotide translocator 1 gene (SLC25A4, ANT1) develop cardiomyopathy that varies from slowly progressive to fulminant. This variability correlates with the mitochondrial DNA (mtDNA) lineage. To confirm that mtDNA variants can modulate the expressivity of nuclear DNA (nDNA)-encoded diseases, we combined in mice the nDNA Slc25a4-/- null mutation with a homoplasmic mtDNA ND6P25L or COIV421A variant. The ND6P25L variant significantly increased the severity of cardiomyopathy while the COIV421A variant was phenotypically neutral. The adverse Slc25a4-/- and ND6P25L combination was associated with impaired mitochondrial complex I activity, increased oxidative damage, decreased l-Opa1, altered mitochondrial morphology, sensitization of the mitochondrial permeability transition pore, augmented somatic mtDNA mutation levels, and shortened lifespan. The strikingly different phenotypic effects of these mild mtDNA variants demonstrate that mtDNA can be an important modulator of autosomal disease.


Subjects
Cardiomyopathies/genetics , DNA, Mitochondrial/genetics , Electron Transport Complex I/genetics , Mitochondria/genetics , Animals , Disease Models, Animal , Mice , Mice, Inbred C57BL , Mutation
4.
PLoS Comput Biol ; 13(12): e1005867, 2017 12.
Article in English | MEDLINE | ID: mdl-29227991

ABSTRACT

Novel or rare variants in mitochondrial tRNA sequences may be observed after mitochondrial DNA analysis. Determining whether these variants are pathogenic is critical, but confirmation of the effect of a variant on mitochondrial function can be challenging. We have used available databases of benign and pathogenic variants, alignment between diverse tRNAs, structural information and comparative genomics to predict the impact of all possible single-base variants and deletions. The Mitochondrial tRNA Informatics Predictor (MitoTIP) is available through MITOMAP at www.mitomap.org. The source code for MitoTIP is available at www.github.com/sonneysa/MitoTIP.


Subjects
Mitochondria/genetics , RNA, Transfer/genetics , Virulence , Nucleic Acid Conformation , RNA, Transfer/chemistry
5.
JAMA Psychiatry ; 74(11): 1161-1168, 2017 11 01.
Article in English | MEDLINE | ID: mdl-28832883

ABSTRACT

Importance: Autism spectrum disorders (ASD) are characterized by impairments in social interaction, communication, and repetitive or restrictive behavior. Although multiple physiologic and biochemical studies have reported defects in mitochondrial oxidative phosphorylation in patients with ASD, the role of mitochondrial DNA (mtDNA) variation has remained relatively unexplored. Objective: To assess what impact mitochondrial lineages encompassing ancient mtDNA functional polymorphisms, termed haplogroups, have on ASD risk. Design, Setting, and Participants: In this cohort study, individuals with autism and their families were studied using the Autism Genetic Resource Exchange cohort genome-wide association studies data previously generated at the Children's Hospital of Philadelphia. From October 2010 to January 2017, we analyzed the data and used the mtDNA single-nucleotide polymorphisms interrogated by the Illumina HumanHap 550 chip to determine the mtDNA haplogroups of the individuals. Taking into account the familial structure of the Autism Genetic Resource Exchange data, we then determined whether the mtDNA haplogroups correlate with ASD risk. Main Outcomes and Measures: Odds ratios of mitochondrial haplogroup as predictors of ASD risk. Results: Of 1624 patients with autism included in this study, 1299 were boys (80%) and 325 were girls (20%). Families in the Autism Genetic Resource Exchange collection (933 families, encompassing 4041 individuals: 1624 patients with ASD and 2417 healthy parents and siblings) had been previously recruited in the United States with no restrictions on age, sex, race/ethnicity, or socioeconomic status. Relative to the most common European haplogroup HHV, European haplogroups I, J, K, O-X, T, and U were associated with increased risk of ASD, as were Asian and Native American haplogroups A and M, with odds ratios ranging from 1.55 (95% CI, 1.16-2.06) to 2.18 (95% CI, 1.59-3) (adjusted P < .04). 
Hence, mtDNA haplogroup variation is an important risk factor for ASD. Conclusions and Relevance: Because haplogroups I, J, K, O-X, T, and U encompass 55% of the European population, mtDNA lineages must make a significant contribution to overall ASD risk.
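
The odds ratios and 95% confidence intervals reported in this abstract can be related to a simple 2x2 table. The sketch below shows the textbook unadjusted odds ratio with a Wald interval on the log scale. It is an assumption-laden illustration only: the study accounted for familial structure, which this calculation does not, and the function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: cases carrying the haplogroup of interest
    b: cases carrying the reference haplogroup
    c: controls carrying the haplogroup of interest
    d: controls carrying the reference haplogroup
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that excludes 1, as in the ranges quoted in the abstract, is what supports a claim of increased risk relative to the reference haplogroup.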


Subjects
Autism Spectrum Disorder/genetics , DNA, Mitochondrial/genetics , Genetic Predisposition to Disease/genetics , Female , Genome-Wide Association Study , Haplotypes/genetics , Humans , Male , Polymorphism, Single Nucleotide/genetics
6.
Proc Natl Acad Sci U S A ; 114(10): 2705-2710, 2017 03 07.
Article in English | MEDLINE | ID: mdl-28223503

ABSTRACT

Diabetes is associated with impaired glucose metabolism in the presence of excess insulin. Glucose and fatty acids provide reducing equivalents to mitochondria to generate energy, and studies have reported mitochondrial dysfunction in type II diabetes patients. If mitochondrial dysfunction can cause diabetes, then we hypothesized that increased mitochondrial metabolism should render animals resistant to diabetes. This was confirmed in mice in which the heart-muscle-brain adenine nucleotide translocator isoform 1 (ANT1) was inactivated. ANT1-deficient animals are insulin-hypersensitive, glucose-tolerant, and resistant to high fat diet (HFD)-induced toxicity. In ANT1-deficient skeletal muscle, mitochondrial gene expression is induced in association with the hyperproliferation of mitochondria. The ANT1-deficient muscle mitochondria produce excess reactive oxygen species (ROS) and are partially uncoupled. Hence, the muscle respiration under nonphosphorylating conditions is increased. Muscle transcriptome analysis revealed the induction of mitochondrial biogenesis, down-regulation of diabetes-related genes, and increased expression of the genes encoding the myokines FGF21 and GDF15. However, FGF21 was not elevated in serum, and FGF21 and UCP1 mRNAs were not induced in liver or brown adipose tissue (BAT). Hence, increased oxidation of dietary-reducing equivalents by elevated muscle mitochondrial respiration appears to be the mechanism by which ANT1-deficient mice prevent diabetes, demonstrating that the rate of mitochondrial oxidation of calories is important in the etiology of metabolic disease.


Subjects
Adenine Nucleotide Translocator 1/genetics , Diabetes Mellitus, Type 2/genetics , Fibroblast Growth Factors/genetics , Growth Differentiation Factor 15/genetics , Adenine Nucleotide Translocator 1/deficiency , Adipose Tissue, Brown/metabolism , Adipose Tissue, Brown/pathology , Animals , Cell Proliferation/genetics , Diabetes Mellitus, Type 2/metabolism , Diabetes Mellitus, Type 2/pathology , Diet, High-Fat/adverse effects , Energy Metabolism/genetics , Glucose/metabolism , Humans , Insulin Resistance/genetics , Mice , Mitochondria, Muscle/genetics , Mitochondria, Muscle/metabolism , Mitochondria, Muscle/pathology , Muscle, Skeletal/metabolism , Reactive Oxygen Species/metabolism , Transcriptome/genetics , Uncoupling Protein 1/genetics
7.
Brief Bioinform ; 18(3): 530-536, 2017 05 01.
Article in English | MEDLINE | ID: mdl-27013646

ABSTRACT

High-throughput bioinformatic analyses increasingly rely on pipeline frameworks to process sequence and metadata. Modern implementations of these frameworks differ on three key dimensions: using an implicit or explicit syntax, using a configuration, convention or class-based design paradigm and offering a command line or workbench interface. Here I survey and compare the design philosophies of several current pipeline frameworks. I provide practical recommendations based on analysis requirements and the user base.


Subjects
Computational Biology , Humans , Software
8.
Hum Mutat ; 37(6): 540-548, 2016 06.
Article in English | MEDLINE | ID: mdl-26919060

ABSTRACT

MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources.


Subjects
Computational Biology/methods , Databases, Genetic , Mitochondrial Diseases/genetics , Genetic Variation , Genome, Mitochondrial , Genomics , Humans , Information Dissemination , User-Computer Interface , Web Browser
9.
PLoS One ; 10(6): e0130927, 2015.
Article in English | MEDLINE | ID: mdl-26098565

ABSTRACT

BACKGROUND: Traumatic brain injury (TBI) has been shown to activate the peripheral innate immune system and systemic inflammatory response, possibly through the central release of damage associated molecular patterns (DAMPs). Our main purpose was to gain an initial understanding of the peripheral mitochondrial response following TBI, and how this response could be utilized to determine cerebral mitochondrial bioenergetics. We hypothesized that TBI would increase peripheral whole blood relative mtDNA copy number, and that these alterations would be associated with cerebral mitochondrial bioenergetics triggered by TBI. METHODOLOGY: Blood samples were obtained before, 6 h after, and 25 h after focal (controlled cortical impact injury: CCI) and diffuse (rapid non-impact rotational injury: RNR) TBI. PCR primers, unique to mtDNA, were identified by aligning segments of nuclear DNA (nDNA) to mtDNA, normalizing values to nuclear 16S rRNA, for a relative mtDNA copy number. Three unique mtDNA regions were selected, and PCR primers were designed within those regions, limited to 25-30 base pairs to further ensure sequence specificity, and measured utilizing qRT-PCR. RESULTS: Mean relative mtDNA copy numbers increased significantly at 6 and 25 h following both focal and diffuse traumatic brain injury. Specifically, the mean relative mtDNA copy number from three mitochondrial-specific regions pre-injury was 0.84 ± 0.05. At 6 and 25 h after diffuse non-impact TBI, mean mtDNA copy number was significantly higher: 2.07 ± 0.19 (P < 0.0001) and 2.37 ± 0.42 (P < 0.001), respectively. Following focal impact TBI, relative mtDNA copy number was also significantly higher, 1.35 ± 0.12 (P < 0.0001) at 25 h. Alterations in mitochondrial respiration in the hippocampus and cortex post-TBI correlated with changes in the relative mtDNA copy number measured in peripheral blood.
CONCLUSIONS: Alterations in peripheral blood relative mtDNA copy numbers may be a novel biosignature of cerebral mitochondrial bioenergetics with exciting translational potential for non-invasive diagnostic and interventional studies.
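
The abstract reports mtDNA copy number relative to a nuclear reference but does not state the formula used. A common way to derive such a ratio from quantitative PCR is the delta-Ct method, sketched below as an assumption rather than as the authors' procedure. It assumes 100% amplification efficiency (a doubling per cycle); the function names and Ct values are hypothetical.

```python
def relative_mtdna_copy_number(ct_mito, ct_nuclear):
    """Delta-Ct estimate of mtDNA abundance relative to a nuclear reference.

    A lower Ct means more starting template, so the ratio is
    2 ** (Ct_nuclear - Ct_mito) under perfect doubling per cycle.
    """
    return 2.0 ** (ct_nuclear - ct_mito)


def mean_relative_copy_number(ct_mito_regions, ct_nuclear):
    """Average the estimate over several mtDNA amplicons (e.g. three regions)."""
    values = [relative_mtdna_copy_number(ct, ct_nuclear) for ct in ct_mito_regions]
    return sum(values) / len(values)


# Hypothetical Ct values for three mtDNA regions and one nuclear reference.
print(mean_relative_copy_number([20.0, 21.0, 22.0], 22.0))
```

Averaging across several mtDNA regions, as the study did with three, reduces the influence of any single primer pair on the estimate.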


Subjects
Biomarkers/blood , Brain Injuries/complications , DNA, Mitochondrial/blood , Energy Metabolism/physiology , Mitochondrial Diseases/diagnosis , Mitochondrial Diseases/etiology , Animals , Computational Biology , DNA Primers/genetics , DNA, Mitochondrial/genetics , Real-Time Polymerase Chain Reaction , Statistics, Nonparametric , Swine
10.
Bioinformatics ; 31(8): 1310-2, 2015 Apr 15.
Article in English | MEDLINE | ID: mdl-25505086

ABSTRACT

MOTIVATION: All current mitochondrial haplogroup classification tools require variants to be detected from an alignment with the reference sequence and to be properly named according to the canonical nomenclature standards for describing mitochondrial variants, before they can be compared with the haplogroup-determining polymorphisms. With the emergence of high-throughput sequencing technologies and hence greater availability of mitochondrial genome sequences, there is a strong need for an automated haplogroup classification tool that is alignment-free and agnostic to reference sequence. RESULTS: We have developed a novel mitochondrial genome haplogroup-defining algorithm using a k-mer approach, named Phy-Mer. Phy-Mer performs as well as the leading haplogroup classifier, HaploGrep, while avoiding the errors that may occur when preparing variants in the required formats and notations. We have further expanded Phy-Mer functionality such that next-generation sequencing data can be used directly as input. AVAILABILITY AND IMPLEMENTATION: Phy-Mer is publicly available under the GNU Affero General Public License v3.0 on GitHub (https://github.com/danielnavarrogomez/phy-mer). CONTACT: Xiaowu_Gai@meei.harvard.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
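
To make the idea of alignment-free, k-mer based classification concrete, the sketch below scores a query sequence against a set of labelled reference sequences by the fraction of query k-mers each reference contains. This is a generic toy illustration and is not Phy-Mer's actual algorithm or API: the function names, the scoring rule, the value of k, and the short sequences are all hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all overlapping k-mers in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def classify_by_kmers(query, references, k=5):
    """Pick the reference label sharing the most k-mers with the query.

    references: dict mapping a label (e.g. a haplogroup name) to a sequence.
    Returns (best_label, scores) where each score is the fraction of the
    query's k-mers found in that reference. No alignment is performed.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        raise ValueError("query is shorter than k")
    scores = {
        label: len(query_kmers & kmers(ref, k)) / len(query_kmers)
        for label, ref in references.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical toy references differing at a single position.
toy_references = {"H": "ACGTACGTTAGC", "U": "ACGTTCGTTAGC"}
print(classify_by_kmers("GTACGTTA", toy_references, k=5))
```

Because the comparison works on raw sequence content, no variant calling or nomenclature step is needed before classification, which is the property the abstract emphasizes.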


Subjects
Algorithms , DNA, Mitochondrial/genetics , Genetic Variation/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Humans , Software
11.
Mol Genet Metab ; 114(3): 388-96, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25542617

ABSTRACT

Success rates for genomic analyses of highly heterogeneous disorders can be greatly improved if a large cohort of patient data is assembled to enhance collective capabilities for accurate sequence variant annotation, analysis, and interpretation. Indeed, molecular diagnostics requires the establishment of robust data resources to enable data sharing that informs accurate understanding of genes, variants, and phenotypes. The "Mitochondrial Disease Sequence Data Resource (MSeqDR) Consortium" is a grass-roots effort facilitated by the United Mitochondrial Disease Foundation to identify and prioritize specific genomic data analysis needs of the global mitochondrial disease clinical and research community. A central Web portal (https://mseqdr.org) facilitates the coherent compilation, organization, annotation, and analysis of sequence data from both nuclear and mitochondrial genomes of individuals and families with suspected mitochondrial disease. This Web portal provides users with a flexible and expandable suite of resources to enable variant-, gene-, and exome-level sequence analysis in a secure, Web-based, and user-friendly fashion. Users can also elect to share data with other MSeqDR Consortium members, or even the general public, either by custom annotation tracks or through the use of a convenient distributed annotation system (DAS) mechanism. A range of data visualization and analysis tools are provided to facilitate user interrogation and understanding of genomic, and ultimately phenotypic, data of relevance to mitochondrial biology and disease. Currently available tools for nuclear and mitochondrial gene analyses include an MSeqDR GBrowse instance that hosts optimized mitochondrial disease and mitochondrial DNA (mtDNA) specific annotation tracks, as well as an MSeqDR locus-specific database (LSDB) that curates variant data on more than 1300 genes that have been implicated in mitochondrial disease and/or encode mitochondria-localized proteins. 
MSeqDR is integrated with a diverse array of mtDNA data analysis tools that are both freestanding and incorporated into an online exome-level dataset curation and analysis resource (GEM.app) that is being optimized to support needs of the MSeqDR community. In addition, MSeqDR supports mitochondrial disease phenotyping and ontology tools, and provides variant pathogenicity assessment features that enable community review, feedback, and integration with the public ClinVar variant annotation resource. A centralized Web-based informed consent process is being developed, with implementation of a Global Unique Identifier (GUID) system to integrate data deposited on a given individual from different sources. Community-based data deposition into MSeqDR has already begun. Future efforts will enhance capabilities to incorporate phenotypic data that enhance genomic data analyses. MSeqDR will fill the existing void in bioinformatics tools and centralized knowledge that are necessary to enable efficient nuclear and mtDNA genomic data interpretation by a range of stakeholders across both clinical diagnostic and research settings. Ultimately, MSeqDR is focused on empowering the global mitochondrial disease community to better define and explore mitochondrial diseases.


Subjects
Databases, Genetic , Genome, Mitochondrial , User-Computer Interface , Computational Biology , Exome , Female , Genomics , Humans , Information Dissemination , Internet , Male , Mitochondrial Diseases/genetics , Phenotype , Software
12.
Circ Res ; 115(10): 884-896, 2014 Oct 24.
Article in English | MEDLINE | ID: mdl-25205790

ABSTRACT

RATIONALE: Congenital heart disease (CHD) is among the most common birth defects. Most cases are of unknown pathogenesis. OBJECTIVE: To determine the contribution of de novo copy number variants (CNVs) in the pathogenesis of sporadic CHD. METHODS AND RESULTS: We studied 538 CHD trios using genome-wide dense single nucleotide polymorphism arrays and whole exome sequencing. Results were experimentally validated using digital droplet polymerase chain reaction. We compared validated CNVs in CHD cases with CNVs in 1301 healthy control trios. The 2 complementary high-resolution technologies identified 63 validated de novo CNVs in 51 CHD cases. A significant increase in CNV burden was observed when comparing CHD trios with healthy trios, using either single nucleotide polymorphism array (P=7×10⁻⁵; odds ratio, 4.6) or whole exome sequencing data (P=6×10⁻⁴; odds ratio, 3.5) and remained after removing 16% of de novo CNV loci previously reported as pathogenic (P=0.02; odds ratio, 2.7). We observed recurrent de novo CNVs on 15q11.2 encompassing CYFIP1, NIPA1, and NIPA2 and single de novo CNVs encompassing DUSP1, JUN, JUP, MED15, MED9, PTPRE, SREBF1, TOP2A, and ZEB2, genes that interact with established CHD proteins NKX2-5 and GATA4. Integrating de novo variants in whole exome sequencing and CNV data suggests that ETS1 is the pathogenic gene altered by 11q24.2-q25 deletions in Jacobsen syndrome and that CTBP2 is the pathogenic gene in 10q subtelomeric deletions. CONCLUSIONS: We demonstrate a significantly increased frequency of rare de novo CNVs in CHD patients compared with healthy controls and suggest several novel genetic loci for CHD.


Subjects
DNA Copy Number Variations/genetics , Exome/genetics , Gene Frequency/genetics , Heart Defects, Congenital/genetics , Polymorphism, Single Nucleotide/genetics , Case-Control Studies , Cohort Studies , Gene Regulatory Networks/genetics , Heart Defects, Congenital/diagnosis , Humans , Molecular Sequence Data
13.
Nature ; 498(7453): 220-3, 2013 Jun 13.
Article in English | MEDLINE | ID: mdl-23665959

ABSTRACT

Congenital heart disease (CHD) is the most frequent birth defect, affecting 0.8% of live births. Many cases occur sporadically and impair reproductive fitness, suggesting a role for de novo mutations. Here we compare the incidence of de novo mutations in 362 severe CHD cases and 264 controls by analysing exome sequencing of parent-offspring trios. CHD cases show a significant excess of protein-altering de novo mutations in genes expressed in the developing heart, with an odds ratio of 7.5 for damaging (premature termination, frameshift, splice site) mutations. Similar odds ratios are seen across the main classes of severe CHD. We find a marked excess of de novo mutations in genes involved in the production, removal or reading of histone 3 lysine 4 (H3K4) methylation, or ubiquitination of H2BK120, which is required for H3K4 methylation. There are also two de novo mutations in SMAD2, which regulates H3K27 methylation in the embryonic left-right organizer. The combination of both activating (H3K4 methylation) and inactivating (H3K27 methylation) chromatin marks characterizes 'poised' promoters and enhancers, which regulate expression of key developmental genes. These findings implicate de novo point mutations in several hundreds of genes that collectively contribute to approximately 10% of severe CHD.


Subjects
Heart Diseases/congenital , Heart Diseases/genetics , Histones/metabolism , Adult , Case-Control Studies , Child , Chromatin/chemistry , Chromatin/metabolism , DNA Mutational Analysis , Enhancer Elements, Genetic/genetics , Exome/genetics , Female , Genes, Developmental/genetics , Heart Diseases/metabolism , Histones/chemistry , Humans , Lysine/chemistry , Lysine/metabolism , Male , Methylation , Mutation , Odds Ratio , Promoter Regions, Genetic/genetics
14.
Curr Protoc Bioinformatics ; 44: 1.23.1-26, 2013 Dec.
Article in English | MEDLINE | ID: mdl-25489354

ABSTRACT

The Mitomap database of human mitochondrial DNA (mtDNA) information has been an important compilation of mtDNA variation for researchers, clinicians and genetic counselors for the past twenty-five years. The Mitomap protocol shows how users may look up human mitochondrial gene loci, search for public mitochondrial sequences, and browse or search for reported general population nucleotide variants as well as those reported in clinical disease. Within Mitomap is the powerful sequence analysis tool for human mitochondrial DNA, Mitomaster. The Mitomaster protocol gives step-by-step instructions showing how to submit sequences to identify nucleotide variants relative to the rCRS, to determine the haplogroup, and to view species conservation. User-supplied sequences, GenBank identifiers and single nucleotide variants may be analyzed.


Subjects
DNA, Mitochondrial/genetics , Databases, Genetic , Genetic Variation , Software , Base Sequence , Genetic Code , Haplotypes/genetics , Humans , Molecular Sequence Data , Polymorphism, Restriction Fragment Length , Polymorphism, Single Nucleotide/genetics , Sequence Alignment
15.
BMC Bioinformatics ; 14 Suppl 11: S3, 2013.
Article in English | MEDLINE | ID: mdl-24564231

ABSTRACT

BACKGROUND: High-throughput sequencing (HTS) technologies are spearheading the accelerated development of biomedical research. Processing and summarizing the large amount of data generated by HTS presents a non-trivial challenge to bioinformatics. A commonly adopted standard is to store sequencing reads aligned to a reference genome in SAM (Sequence Alignment/Map) or BAM (Binary Alignment/Map) files. Quality control of SAM/BAM files is a critical checkpoint before downstream analysis. The goal of the current project is to facilitate and standardize this process. RESULTS: We developed bamchop, a robust program to efficiently summarize key statistical metrics of HTS data stored in BAM files, and to visually present the results in a formatted report. The report documents information about various aspects of HTS data, such as sequencing quality, mapping to a reference genome, sequencing coverage, and base frequency. Bamchop uses the R language and Bioconductor packages to calculate statistical matrices and the Sweave utility and associated LaTeX markup for documentation. Bamchop's efficiency and robustness were tested on BAM files generated by local sequencing facilities and the 1000 Genomes Project. Source code, instruction and example reports of bamchop are freely available from https://github.com/CBMi-BiG/bamchop. CONCLUSIONS: Bamchop enables biomedical researchers to quickly and rigorously evaluate HTS data by providing a convenient synopsis and user-friendly reports.
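
As a small illustration of the kind of mapping statistics such a quality-control report summarizes, the sketch below tallies mapped and unmapped reads and the mean mapping quality from SAM-format text lines. It is not bamchop (which the abstract says is built on R and Bioconductor) and covers only a tiny part of what that tool reports; the function name and the example records are hypothetical. It relies only on the SAM column layout, in which field 2 is FLAG (bit 0x4 marks an unmapped read) and field 5 is MAPQ.

```python
def summarize_sam(lines):
    """Count reads, mapped reads, and mean MAPQ of mapped reads.

    lines: iterable of SAM text lines; header lines start with '@'.
    """
    total = 0
    mapped = 0
    mapq_sum = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header records
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # bitwise FLAG column
        mapq = int(fields[4])   # mapping quality column
        total += 1
        if not flag & 0x4:      # bit 0x4 set means the read is unmapped
            mapped += 1
            mapq_sum += mapq
    mean_mapq = mapq_sum / mapped if mapped else None
    return {"total": total, "mapped": mapped, "mean_mapq": mean_mapq}


# Hypothetical SAM records: two mapped reads and one unmapped read.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t200\t30\t4M\t*\t0\t0\tACGT\tIIII",
]
print(summarize_sam(example))
```

For binary BAM input, the records would first need to be decoded to this text form by a dedicated reader; that step is outside this sketch.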


Subjects
High-Throughput Nucleotide Sequencing/methods , Base Sequence , Chromosomes , Exons , Genome , Reproducibility of Results , Sequence Alignment , Software
16.
Gene Regul Syst Bio ; 6: 127-37, 2012.
Article in English | MEDLINE | ID: mdl-23071390

ABSTRACT

Physarum polycephalum is a unicellular eukaryote belonging to the amoebozoa group of organisms. The complex life cycle involves various cell types that differ in morphology, function, and biochemical composition. Sporulation, one step in the life cycle, is a stimulus-controlled differentiation response of macroscopic plasmodial cells that develop into fruiting bodies. Well-established Mendelian genetics and the occurrence of macroscopic cells with a naturally synchronous population of nuclei as source of homogeneous cell material for biochemical analyses make Physarum an attractive model organism for studying the regulatory control of cell differentiation. Here, we develop an approach using RNA-sequencing (RNA-seq), without needing to rely on a genome sequence as a reference, for studying the transcriptomic changes during stimulus-triggered commitment to sporulation in individual plasmodial cells. The approach is validated through the obtained expression patterns and annotations, and particularly the results from up- and downregulated genes, which correlate well with previous studies.

17.
Gene ; 487(1): 21-8, 2011 Nov 01.
Article in English | MEDLINE | ID: mdl-21803131

ABSTRACT

IRF1 is a transcription factor that participates in interferon signaling. Previous studies of IRF1 binding have utilized in vitro assays. We used ChIP-seq in human monocytes to better define the recognition motif for IRF1. The newly identified 18bp motif (RAAASNGAAAGTGAAASY) is a refinement of the 13bp IRF1 motif commonly used. We utilized the 18bp consensus motif and identified 345 potential target genes. To compare the 18bp motif with the 13bp motif, we compared putative gene targets. Only 56 potential gene targets were defined by both consensus motifs. To compare biological effects of interferon on the 13bp and the 18bp consensus targets, we mined expression data from cells exposed to interferons or transfected with IRF1. In all cases, the 18bp consensus motif was more strongly associated with transcriptional responses than the 13bp motif. Therefore, the new 18bp consensus motif appears to have a greater association with biological activities of IRF1.
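
The 18bp consensus quoted in this abstract is written in IUPAC nucleotide codes, so scanning a sequence for it amounts to expanding each code into the bases it allows. The sketch below does this with a regular expression. It is an illustrative assumption about how such a scan could be done, not the authors' pipeline: the function names and the test sequence are hypothetical, only the codes present in this motif (plus the four bases) are handled, and only the forward strand is searched.

```python
import re

# IUPAC codes needed for this motif: R = A/G, S = C/G, N = any base, Y = C/T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "S": "[CG]", "Y": "[CT]", "N": "[ACGT]",
}

IRF1_MOTIF_18BP = "RAAASNGAAAGTGAAASY"  # consensus as given in the abstract


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular expression pattern string."""
    return "".join(IUPAC[code] for code in motif.upper())


def find_motif(sequence, motif=IRF1_MOTIF_18BP):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Hypothetical sequence containing one site that fits the consensus.
print(find_motif("TTGAAACTGAAAGTGAAAGCTT"))  # [2]
```

A genome-wide target search would also need to scan the reverse complement and map hits to nearby genes, which this sketch leaves out.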


Subjects
Gene Expression Profiling , Genome, Human/genetics , Interferon Regulatory Factor-1/genetics , Monocytes/metabolism , Adaptor Proteins, Signal Transducing , Apoptosis Regulatory Proteins , Base Sequence , Binding Sites/genetics , Carrier Proteins/genetics , Carrier Proteins/metabolism , Cells, Cultured , Chromatin Immunoprecipitation/methods , Gene Expression/drug effects , Humans , Interferon Regulatory Factor-1/metabolism , Interferons/pharmacology , Interleukin-4/pharmacology , Intracellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/metabolism , Monocytes/cytology , Monocytes/drug effects , Protein Binding , Proteins/genetics , Proteins/metabolism , RNA-Binding Proteins , Sequence Analysis, DNA/methods
18.
Nucleic Acids Res ; 36(9): e49, 2008 May.
Article in English | MEDLINE | ID: mdl-18411205

ABSTRACT

Gene transfer has been used to correct inherited immunodeficiencies, but in several patients integration of therapeutic retroviral vectors activated proto-oncogenes and caused leukemia. Here, we describe improved methods for characterizing integration site populations from gene transfer studies using DNA bar coding and pyrosequencing. We characterized 160,232 integration site sequences in 28 tissue samples from eight mice, where Rag1 or Artemis deficiencies were corrected by introducing the missing gene with gamma-retroviral or lentiviral vectors. The integration sites were characterized for their genomic distributions, including proximity to proto-oncogenes. Several mice harbored abnormal lymphoproliferations following therapy--in these cases, comparison of the location and frequency of isolation of integration sites across multiple tissues helped clarify the contribution of specific proviruses to the adverse events. We also took advantage of the large number of pyrosequencing reads to show that recovery of integration sites can be highly biased by the use of restriction enzyme cleavage of genomic DNA, which is a limitation in all widely used methods, but describe improved approaches that take advantage of the power of pyrosequencing to overcome this problem. The methods described here should allow integration site populations from human gene therapy to be deeply characterized with spatial and temporal resolution.


Subjects
Genetic Therapy/adverse effects , Sequence Analysis, DNA/methods , Animals , Cell Proliferation , DNA Restriction Enzymes , Disease Models, Animal , Gene Transfer Techniques , Genes, Tumor Suppressor , Genetic Vectors , Lentivirus/genetics , Lymphoproliferative Disorders/genetics , Mice , Proto-Oncogenes
19.
PLoS Pathog ; 4(3): e1000027, 2008 Mar 21.
Article in English | MEDLINE | ID: mdl-18369476

ABSTRACT

Human T-lymphotropic virus type 1 (HTLV-1) causes leukaemia or chronic inflammatory disease in approximately 5% of infected hosts. The level of proviral expression of HTLV-1 differs significantly among infected people, even at the same proviral load (proportion of infected mononuclear cells in the circulation). A high level of expression of the HTLV-1 provirus is associated with a high proviral load and a high risk of the inflammatory disease of the central nervous system known as HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP). But the factors that control the rate of HTLV-1 proviral expression remain unknown. Here we show that proviral integration sites of HTLV-1 in vivo are not randomly distributed within the human genome but are associated with transcriptionally active regions. Comparison of proviral integration sites between individuals with high and low levels of proviral expression, and between provirus-expressing and provirus non-expressing cells from within an individual, demonstrated that frequent integration into transcription units was associated with an increased rate of proviral expression. An increased frequency of integration sites in transcription units in individuals with high proviral expression was also associated with the inflammatory disease HAM/TSP. By comparing the distribution of integration sites in human lymphocytes infected in short-term cell culture with those from persistent infection in vivo, we infer the action of two selective forces that shape the distribution of integration sites in vivo: positive selection for cells containing proviral integration sites in transcriptionally active regions of the genome, and negative selection against cells with proviral integration sites within transcription units.


Subjects
Gene Expression Regulation, Viral , Human T-lymphotropic virus 1/genetics , Paraparesis, Tropical Spastic/virology , Proviruses/genetics , Virus Integration/genetics , Animals , Cloning, Molecular , DNA, Viral/blood , Genome, Human , Human T-lymphotropic virus 1/immunology , Humans , Jurkat Cells , Leukocytes, Mononuclear/virology , Paraparesis, Tropical Spastic/blood , RNA, Messenger , RNA, Viral/blood , Viral Load
20.
Nucleic Acids Res ; 35(13): e91, 2007.
Article in English | MEDLINE | ID: mdl-17576693

ABSTRACT

Treatment of HIV-infected individuals with antiretroviral agents selects for drug-resistant mutants, resulting in frequent treatment failures. Although the major antiretroviral resistance mutations are routinely characterized by DNA sequencing, treatment failures are still common, probably in part because undetected rare resistance mutations facilitate viral escape. Here we combined DNA bar coding and massively parallel pyrosequencing to quantify rare drug resistance mutations. Using DNA bar coding, we were able to analyze seven viral populations in parallel, overall characterizing 118,093 sequence reads of average length 103 bp. Analysis of a control HIV mixture showed that resistance mutations present as 5% of the population could be readily detected without false positive calls. In three samples of multidrug-resistant HIV populations from patients, all the drug-resistant mutations called by conventional analysis were identified, as well as four additional low abundance drug resistance mutations, some of which would be expected to influence the response to antiretroviral therapy. Methods for sensitive characterization of HIV resistance alleles have been reported, but only the pyrosequencing method allows all the positions at risk for drug resistance mutations to be interrogated deeply for many HIV populations in a single experiment.


Subjects
Anti-HIV Agents/therapeutic use , HIV/genetics , Mutation , Sequence Analysis, DNA/methods , Alleles , DNA, Viral/chemistry , Data Interpretation, Statistical , Drug Resistance, Viral/genetics , HIV/drug effects , HIV Infections/drug therapy , Humans , Polymerase Chain Reaction