Results 1 - 20 of 27
1.
Bioinformatics ; 30(5): 652-9, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24135263

ABSTRACT

MOTIVATION: Inferring lengths of inherited microsatellite alleles with single base pair resolution from short sequence reads is challenging due to several sources of noise caused by the repetitive nature of microsatellites and the technologies used to generate raw sequence data. RESULTS: We have developed a program, GenoTan, using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information. It effectively distinguishes length variants from noise including insertion/deletion errors in homopolymer runs by addressing the bidirectional aspect of insertion and deletion errors in sequence reads. Here we first introduce a homopolymer decomposition method which estimates error bias toward insertion or deletion in homopolymer sequence runs. Combining these approaches, GenoTan was able to genotype 94.9% of microsatellite loci accurately from simulated data with 40x sequence coverage quickly while the other programs showed <90% correct calls for the same data and required 5∼30× more computational time than GenoTan. It also showed the highest true-positive rate for real data using mixed sequence data of two Drosophila inbred lines, which was a novel validation approach for genotyping. AVAILABILITY: GenoTan is open-source software available at http://genotan.sourceforge.net.
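The genotyping idea above (a discretized Gaussian mixture over allele lengths) can be illustrated with a toy caller. This is a minimal sketch, not GenoTan's implementation: it assumes a fixed per-allele standard deviation (`sigma`, set arbitrarily to 0.6 bp), equal mixture weights for the two alleles of a diploid genotype, and candidate alleles restricted to lengths seen in the reads. It omits GenoTan's rules-based filtering and homopolymer decomposition, and the function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement
from statistics import NormalDist


def discretized_pmf(length: int, mean: float, sigma: float) -> float:
    """Probability that a read reports `length` bp when the true allele is `mean` bp.

    Integrating the Gaussian over [length - 0.5, length + 0.5] turns the
    continuous error model into a distribution over whole-base-pair lengths.
    """
    dist = NormalDist(mu=mean, sigma=sigma)
    return dist.cdf(length + 0.5) - dist.cdf(length - 0.5)


def call_genotype(read_lengths: list[int], sigma: float = 0.6, floor: float = 1e-12):
    """Return the diploid genotype (a, b) that maximizes the mixture log-likelihood.

    Each genotype is a two-component mixture with weights (0.5, 0.5);
    a homozygote is the special case a == b. Hypothetical helper, not GenoTan's API.
    """
    candidates = sorted(set(read_lengths))
    best, best_ll = None, -math.inf
    for a, b in combinations_with_replacement(candidates, 2):
        ll = 0.0
        for length in read_lengths:
            p = 0.5 * discretized_pmf(length, a, sigma) + 0.5 * discretized_pmf(length, b, sigma)
            ll += math.log(max(p, floor))  # floor avoids log(0) for distant outlier reads
        if ll > best_ll:
            best, best_ll = (a, b), ll
    return best, best_ll
```

For example, reads dominated by 20 bp and 24 bp lengths, with a few reads one base off, resolve to the heterozygous genotype (20, 24), because the ±1 bp reads are explained as noise around the two main components.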


Subjects
Genotyping Techniques , Microsatellite Repeats , Sequence Analysis, DNA/methods , Alleles , Animals , Drosophila/genetics , Genetic Loci , Genotype , Humans , Normal Distribution , Software
2.
Genomics ; 104(6 Pt B): 453-8, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25173571

ABSTRACT

Several studies have demonstrated that unmapped reads in next-generation sequencing data could be used to identify infectious agents or structural variants, but there has been no intensive effort to analyze and classify all non-human sequences found in individual large data sets. To identify common non-human sequences originating from infectious agents and putative contamination events, we analyzed non-human sequences in 150 genomic sequencing data files from the 1000 Genomes Project and observed that, on average, 0.13% of reads showed similarity to non-human genomes. We compared results among sample groups divided by ethnicity, sequencing center and enrichment method (whole genome sequencing vs. exome sequencing) and found that sequencing centers had specific signatures of contaminating genomes that act as 'time stamps'. We also observed many unmapped reads that falsely indicated contamination because of the high similarity of human sequences to sequences in non-human genome assemblies such as mouse and Nicotiana.


Subjects
DNA Contamination , Genome, Human , DNA, Bacterial/chemistry , DNA, Plant/chemistry , DNA, Viral/chemistry , Humans
3.
Bioinformatics ; 29(14): 1734-41, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23677944

ABSTRACT

MOTIVATION: Simple tandem repeats are highly variable genetic elements and widespread in genomes of many organisms. Next-generation sequencing technologies have enabled a robust comparison of large numbers of simple tandem repeat loci; however, analysis of their variation using traditional sequence analysis approaches remains limited and problematic, because variants in repeat sequences can confuse alignment programs into mapping reads to incorrect loci when the reads differ significantly from the reference sequence. RESULTS: We have developed a program, ReviSTER, an automated pipeline that uses a 'local mapping reference reconstruction method' to revise mismapped or partially misaligned reads at simple tandem repeat loci. ReviSTER estimates alleles of repeat loci using a local alignment method, creates temporary local mapping reference sequences, and finally remaps reads to the local mapping references. Using this approach, ReviSTER successfully revised reads misaligned to repeat loci in both simulated and real data. AVAILABILITY: ReviSTER is open-source software available at http://revister.sourceforge.net. CONTACT: garner@vbi.vt.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
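The "local mapping reference" step can be pictured as splicing an estimated allele between the reference flanks of a repeat locus. The sketch below is a simplified illustration under stated assumptions, not ReviSTER's code: it assumes the repeat coordinates, motif, and allele copy numbers are already known (ReviSTER estimates alleles by local alignment, which is not shown), and it leaves out the remapping step. The function names and the 50 bp default flank are hypothetical.

```python
def build_local_reference(reference: str, start: int, end: int, motif: str,
                          copies: int, flank: int = 50) -> str:
    """Return left flank + (motif * copies) + right flank for one candidate allele.

    `start`/`end` are 0-based, half-open coordinates of the repeat tract in `reference`.
    """
    if not (0 <= start <= end <= len(reference)):
        raise ValueError("repeat coordinates fall outside the reference")
    left = reference[max(0, start - flank):start]
    right = reference[end:end + flank]
    return left + motif * copies + right


def local_references(reference: str, start: int, end: int, motif: str,
                     allele_copies: list[int], flank: int = 50) -> dict[int, str]:
    """Build one temporary local reference per estimated allele (keyed by copy number)."""
    return {
        copies: build_local_reference(reference, start, end, motif, copies, flank)
        for copies in allele_copies
    }
```

Reads that failed to align well against the original reference could then be remapped to these short sequences, so a read carrying seven CA units is compared against a seven-unit template rather than the five-unit reference tract.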


Subjects
Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Tandem Repeat Sequences , Alleles , Exome , Genomics , Genotyping Techniques , Haploidy , High-Throughput Nucleotide Sequencing , Humans
4.
Genomics ; 100(5): 271-6, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22967795

ABSTRACT

Sequencing data analysis remains limiting and problematic, especially for low complexity repeat sequences and transposon elements due to inherent sequencing errors and short sequence read lengths. We have developed a program, ReviSeq, which uses a hybrid method composed of iterative remapping and local assembly upon a bacterial sequence backbone. Application of this method to six Brucella suis field isolates compared to the newly revised B. suis 1330 reference genome identified on average 13, 15, 19 and 9 more variants per sample than STAMPY/SAMtools, BWA/SAMtools, iCORN and BWA/PINDEL pipelines, and excluded on average 4, 2, 3 and 19 variants per sample, respectively. In total, using this iterative approach, we identified on average 87 variants including SNVs, short INDELs and long INDELs per strain when compared to the reference. Our program outperforms other methods especially for long INDEL calling. The program is available at http://reviseq.sourceforge.net.


Subjects
Brucella suis/genetics , Genetic Techniques , Genetic Variation , Genome, Bacterial/genetics , Software , Base Sequence , Cluster Analysis , INDEL Mutation/genetics , Molecular Sequence Data , Phylogeny , Sequence Analysis, DNA/methods
5.
J Bacteriol ; 194(4): 910, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22275106

ABSTRACT

Brucella suis is the causative agent of swine brucellosis and is known to be able to infect several different hosts, including cattle, dogs, and horses, without causing disease symptoms. Here we report the complete genome sequence of Brucella suis VBI22, which was isolated from raw milk from an infected cow.


Subjects
Brucella suis/genetics , Brucella suis/isolation & purification , Genome, Bacterial , Milk/microbiology , Animals , Base Sequence , Brucellosis, Bovine/microbiology , Cattle , Molecular Sequence Data , Sequence Analysis, DNA
6.
BMC Bioinformatics ; 13: 247, 2012 Sep 26.
Article in English | MEDLINE | ID: mdl-23009593

ABSTRACT

BACKGROUND: With the advent of next-generation sequencing (NGS) technologies, full cDNA shotgun sequencing has become a major approach in the study of transcriptomes, and several different protocols for 454 sequencing have been developed. As each protocol uses its own short DNA tags or adapters attached to the ends of cDNA fragments for labeling or sequencing, different contaminants may lead to mis-assembly and inaccurate sequence products. RESULTS: We have designed and implemented a new program for raw sequence cleaning, available both with a graphical user interface and as a batch script. The cleaning process consists of several modules, including barcode trimming, sequencing adapter trimming, amplification primer trimming, poly-A tail trimming, vector screening and low-quality region trimming. These modules can be combined to suit various sequencing applications. CONCLUSIONS: ESTclean is a software package not only for cleaning cDNA sequences but also for helping to develop sequencing protocols, by providing summary tables and figures for sequencing quality control in a graphical user interface. It outperforms other tools in cleaning read sequences from complicated sequencing protocols that use barcodes and multiple amplification primers.
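The modular cleaning pipeline described above can be sketched as a list of composable trimming steps applied in order. This is an illustrative sketch, not ESTclean's code: all function names are hypothetical, the barcode/adapter step requires an exact 5' match (real tools usually tolerate mismatches), quality trimming simply cuts 3' bases below a Phred threshold, and vector screening and primer trimming are omitted.

```python
from typing import Callable

Read = tuple[str, list[int]]  # (sequence, per-base Phred qualities)
Module = Callable[[Read], Read]


def trim_prefix(tag: str) -> Module:
    """Module: drop an exact 5' barcode or adapter match (no mismatches allowed)."""
    def step(read: Read) -> Read:
        seq, qual = read
        if seq.startswith(tag):
            return seq[len(tag):], qual[len(tag):]
        return read
    return step


def trim_low_quality(threshold: int = 20) -> Module:
    """Module: remove 3' bases whose Phred quality is below `threshold`."""
    def step(read: Read) -> Read:
        seq, qual = read
        end = len(seq)
        while end > 0 and qual[end - 1] < threshold:
            end -= 1
        return seq[:end], qual[:end]
    return step


def trim_poly_a(min_run: int = 8) -> Module:
    """Module: remove a 3' run of at least `min_run` A's (a poly-A tail)."""
    def step(read: Read) -> Read:
        seq, qual = read
        stripped = seq.rstrip("A")
        if len(seq) - len(stripped) >= min_run:
            return stripped, qual[:len(stripped)]
        return read
    return step


def clean(read: Read, modules: list[Module]) -> Read:
    """Apply the selected modules in order, mirroring a configurable pipeline."""
    for module in modules:
        read = module(read)
    return read
```

Because each module has the same signature, a protocol without barcodes can simply omit `trim_prefix` from the list; the order matters, since quality trimming here runs before poly-A removal so that low-quality bases after the tail do not hide it.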


Subjects
Expressed Sequence Tags , Sequence Analysis, DNA/methods , Software , Transcriptome , Animals , DNA Primers/genetics , DNA, Complementary/genetics , Drosophila melanogaster/genetics , High-Throughput Nucleotide Sequencing
7.
Genes Chromosomes Cancer ; 50(4): 275-83, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21319262

ABSTRACT

Using a custom CGH-like oligonucleotide array to measure the global microsatellite content in the genomes of 72 cancer, cancer-free, and high-risk patient and cell line samples (56 germline DNA and 16 tumor or tumor cell line DNA samples), we found a unique, reproducible, and statistically significant pattern of 18 motif-specific microsatellite families (out of 962 possible 1-6 mer repeats) in breast cancer patient germline and tumor DNA, but not in germline DNA of cancer-free volunteer controls or of breast cancer patients with BRCA1/2 mutations. These high-similarity A/T-rich repetitive motifs were also more pronounced in the germlines and tumors of colon cancer patients (3/6 samples) and in microsatellite-unstable colon cancer cell lines; however, germline DNA of sporadic breast cancer patients exhibited the largest global content shift for those motifs with extreme AT/GC ratios. These results indicate that global microsatellite variability is complex, suggest the existence of a previously unknown genomic destabilization mechanism in breast cancer patients' germline DNA, and warrant further testing of such microsatellite variability as a predictor of future breast cancer development.


Subjects
AT Rich Sequence , Breast Neoplasms/genetics , Microsatellite Instability , Microsatellite Repeats/genetics , Cell Line, Tumor , Colonic Neoplasms/genetics , DNA, Neoplasm/genetics , Female , Genes, BRCA1 , Genes, BRCA2 , Genetic Variation , Humans , Mutation , Oligonucleotide Array Sequence Analysis/methods
8.
J Bacteriol ; 193(22): 6410, 2011 Nov.
Article in English | MEDLINE | ID: mdl-22038969

ABSTRACT

Brucella suis is a causative agent of porcine brucellosis. We report the resequencing of the original sample upon which the published sequence of Brucella suis 1330 is based and describe the differences between the published assembly and our assembly at 12 loci.


Subjects
Brucella suis/genetics , Genome, Bacterial , Base Sequence , Molecular Sequence Annotation , Molecular Sequence Data
9.
BMC Genomics ; 11: 703, 2010 Dec 14.
Article in English | MEDLINE | ID: mdl-21156066

ABSTRACT

BACKGROUND: Horned beetles, in particular in the genus Onthophagus, are important models for studies on sexual selection, biological radiations, the origin of novel traits, developmental plasticity, biocontrol, conservation, and forensic biology. Despite their growing prominence as models for studying both basic and applied questions in biology, little genomic or transcriptomic data are available for this genus. We used massively parallel pyrosequencing (Roche 454-FLX platform) to produce a comprehensive EST dataset for the horned beetle Onthophagus taurus. To maximize sequence diversity, we pooled RNA extracted from a normalized library encompassing diverse developmental stages and both sexes. RESULTS: We used 454 pyrosequencing to sequence ESTs from all post-embryonic stages of O. taurus. Approximately 1.36 million reads assembled into 50,080 non-redundant sequences encompassing a total of 26.5 Mbp. The non-redundant sequences match over half of the genes in Tribolium castaneum, the most closely related species with a sequenced genome. Analyses of Gene Ontology annotations and biochemical pathways indicate that the O. taurus sequences reflect a wide and representative sampling of biological functions and biochemical processes. An analysis of sequence polymorphisms revealed that SNP frequency was negatively related to overall expression level and the number of tissue types in which a given gene is expressed. The most variable genes were enriched for a limited number of GO annotations, whereas the least variable genes were enriched for a wide range of GO terms directly related to fitness. CONCLUSIONS: This study provides the first large-scale EST database for horned beetles, a much-needed resource for advancing the study of these organisms. Furthermore, we identified instances of gene duplications and alternative splicing, useful for future study of gene regulation, and a large number of SNP markers that could be used in population-genetic studies of O. taurus and possibly other horned beetles.


Subjects
Coleoptera/anatomy & histology , Coleoptera/genetics , Genes, Insect/genetics , Horns , Alternative Splicing/genetics , Animals , Base Sequence , Cluster Analysis , Databases, Genetic , Databases, Protein , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Phylogeny , Polymorphism, Single Nucleotide/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA
10.
BMC Genomics ; 11: 694, 2010 Dec 07.
Article in English | MEDLINE | ID: mdl-21138572

ABSTRACT

BACKGROUND: The reptiles, characterized by both diversity and unique evolutionary adaptations, provide a comprehensive system for comparative studies of metabolism, physiology, and development. However, molecular resources for ectothermic reptiles are severely limited, hampering our ability to study the genetic basis for many evolutionarily important traits such as metabolic plasticity, extreme longevity, limblessness, venom, and freeze tolerance. Here we use massively parallel sequencing (454 GS-FLX Titanium) to generate a transcriptome of the western terrestrial garter snake (Thamnophis elegans) with two goals in mind. First, we develop a molecular resource for an ectothermic reptile; and second, we use these sex-specific transcriptomes to identify differences in the presence of expressed transcripts and potential genes of evolutionary interest. RESULTS: Using sex-specific pools of RNA (one pool for females, one pool for males) representing 7 tissue types and 35 diverse individuals, we produced 1.24 million sequence reads, which averaged 366 bp in length after cleaning. Assembly of the cleaned reads from both sexes with NEWBLER and MIRA resulted in 96,379 contigs containing 87% of the cleaned reads. Over 34% of these contigs and 13% of the singletons were annotated based on homology to previously identified proteins. From these homology assignments, additional clustering, and ORF predictions, we estimate that this transcriptome contains ~13,000 unique genes that were previously identified in other species and over 66,000 transcripts from unidentified protein-coding genes. Furthermore, we use a graph-clustering method to identify contigs linked by NEWBLER-split reads that represent divergent alleles, gene duplications, and alternatively spliced transcripts. Beyond gene identification, we identified 95,295 SNPs and 31,651 INDELs. 
From these sex-specific transcriptomes, we identified 190 genes that were only present in the mRNA sequenced from one of the sexes (84 female-specific, 106 male-specific), and many highly variable genes of evolutionary interest. CONCLUSIONS: This is the first large-scale, multi-organ transcriptome for an ectothermic reptile. This resource provides the most comprehensive set of EST sequences available for an individual ectothermic reptile species, increasing the number of snake ESTs 50-fold. We have identified genes that appear to be under evolutionary selection and those that are sex-specific. This resource will assist studies on gene expression and comparative genomics, and will facilitate the study of evolutionarily important traits at the molecular level.
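The graph-clustering step (grouping contigs that are linked by NEWBLER-split reads) can be illustrated with connected components of an undirected graph. The abstract does not say which clustering algorithm was used, so connected components is our assumption as the simplest such method; the input format (one contig pair per split read), the `min_support` threshold, and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def link_clusters(split_read_links: list[tuple[str, str]], min_support: int = 1) -> list[set[str]]:
    """Group contigs into connected components of an undirected link graph.

    Each (contig_a, contig_b) pair records one read split across two contigs;
    a pair becomes an edge once at least `min_support` reads link it.
    """
    support = Counter(frozenset(pair) for pair in split_read_links if pair[0] != pair[1])
    graph: dict[str, set[str]] = defaultdict(set)
    for pair, count in support.items():
        if count >= min_support:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)

    seen: set[str] = set()
    clusters: list[set[str]] = []
    for start in graph:
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:  # iterative depth-first search over linked contigs
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters
```

Each resulting cluster is a candidate group of divergent alleles, duplicated genes, or splice variants; telling those cases apart would need further analysis that this sketch does not attempt.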


Subjects
Colubridae/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods , Sex Characteristics , Animals , Base Sequence , Cluster Analysis , Female , Gene Expression Regulation , Genome/genetics , Lizards/genetics , Major Histocompatibility Complex/genetics , Male , Molecular Sequence Annotation , Mutation/genetics , Phylogeny , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Titanium
11.
Cancer Med ; 9(17): 6452-6460, 2020 09.
Article in English | MEDLINE | ID: mdl-32644297

ABSTRACT

Microsatellite instability (MSI) is a key secondary effect of a defective DNA mismatch repair mechanism, resulting in incorrectly replicated microsatellites in many malignant tumors. Historically, MSI detection has been performed by fragment analysis (FA) on a panel of representative genomic markers. More recently, using next-generation sequencing (NGS) to analyze thousands of microsatellites has been shown to improve the robustness and sensitivity of MSI detection. However, NGS-based MSI tests can be prone to population biases if NGS results are aligned to a reference genome instead of patient-matched normal tissue. We observed an increased rate of false positives in patients of African ancestry with an NGS-based diagnostic for MSI status utilizing 7317 microsatellite loci. We then minimized this bias by training a modified calling model that utilized 2011 microsatellite loci. With these adjustments, 100% (95% CI: 89.1% to 100%) of African ancestry patients in an independent validation test were called correctly using the updated model. This is not only a significant technical improvement but also has an important clinical impact on directing immune checkpoint inhibitor therapy.
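A confidence interval whose upper bound is 100% arises when every call in the validation set is correct. The abstract does not state the interval method or the validation sample size, so the following is only an arithmetic illustration under an assumption: if the interval is an exact (Clopper-Pearson) binomial interval, then with all n of n calls correct the upper bound is 1 and the lower bound has the closed form (α/2)^(1/n). The function name is hypothetical.

```python
def clopper_pearson_all_correct(n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Exact (Clopper-Pearson) interval for a proportion when all n of n calls are correct.

    With x == n successes, the upper bound is 1 and the lower bound reduces to
    (alpha / 2) ** (1 / n), so no beta-quantile function is needed.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1.0 - confidence
    return (alpha / 2) ** (1.0 / n), 1.0
```

Under that assumption, 32 of 32 correct calls would give a lower bound of about 0.891, which is consistent with the reported 89.1%; this is a back-calculation, not a cohort size reported in the abstract.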


Subjects
DNA Mismatch Repair , High-Throughput Nucleotide Sequencing , Microsatellite Instability , Neoplasms/genetics , Bias , Black People , Confidence Intervals , DNA-Binding Proteins/analysis , False Positive Reactions , Female , Genetic Markers , Humans , Male , Mismatch Repair Endonuclease PMS2/analysis , MutL Protein Homolog 1/analysis , MutS Homolog 2 Protein/analysis , Reproducibility of Results , Sex Factors
12.
J Immunother Cancer ; 8(1)2020 03.
Article in English | MEDLINE | ID: mdl-32217756

ABSTRACT

BACKGROUND: Tumor mutational burden (TMB), defined as the number of somatic mutations per megabase of interrogated genomic sequence, demonstrates predictive biomarker potential for the identification of patients with cancer most likely to respond to immune checkpoint inhibitors. TMB is optimally calculated by whole exome sequencing (WES), but next-generation sequencing targeted panels provide TMB estimates in a time-effective and cost-effective manner. However, differences in panel size and gene coverage, in addition to the underlying bioinformatics pipelines, are known drivers of variability in TMB estimates across laboratories. By directly comparing panel-based TMB estimates from participating laboratories, this study aims to characterize the theoretical variability of panel-based TMB estimates, and provides guidelines on TMB reporting, analytic validation requirements and reference standard alignment in order to maintain consistency of TMB estimation across platforms. METHODS: Eleven laboratories used WES data from The Cancer Genome Atlas Multi-Center Mutation calling in Multiple Cancers (MC3) samples and calculated TMB from the subset of the exome restricted to the genes covered by their targeted panel using their own bioinformatics pipeline (panel TMB). A reference TMB value was calculated from the entire exome using a uniform bioinformatics pipeline all members agreed on (WES TMB). Linear regression analyses were performed to investigate the relationship between WES and panel TMB for all 32 cancer types combined and separately. Variability in panel TMB values at various WES TMB values was also quantified using 95% prediction limits. RESULTS: Study results demonstrated that variability within and between panel TMB values increases as the WES TMB values increase. 
For each panel, prediction limits based on linear regression analyses that modeled panel TMB as a function of WES TMB were calculated and found to approximately capture the intended 95% of observed panel TMB values. Certain cancer types, such as uterine, bladder and colon cancers exhibited greater variability in panel TMB values, compared with lung and head and neck cancers. CONCLUSIONS: Increasing uptake of TMB as a predictive biomarker in the clinic creates an urgent need to bring stakeholders together to agree on the harmonization of key aspects of panel-based TMB estimation, such as the standardization of TMB reporting, standardization of analytical validation studies and the alignment of panel-based TMB values with a reference standard. These harmonization efforts should improve consistency and reliability of panel TMB estimates and aid in clinical decision-making.
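The two calculations named in this abstract, TMB as mutations per megabase and 95% prediction limits from a linear regression of panel TMB on WES TMB, can be sketched with the standard library. This is an illustration, not the study's pipeline: the function names are hypothetical, no MC3 data are included, the abstract does not describe any data transformation, and the caller must supply the t critical value (the 0.975 quantile of a t distribution with n - 2 degrees of freedom) to avoid a SciPy dependency.

```python
import math


def tmb(mutation_count: int, interrogated_bp: int) -> float:
    """Tumor mutational burden: somatic mutations per megabase of interrogated sequence."""
    return mutation_count / (interrogated_bp / 1_000_000)


def fit_with_prediction_limits(x: list[float], y: list[float], t_crit: float):
    """Ordinary least-squares fit of panel TMB (y) on WES TMB (x).

    Returns (slope, intercept, predict), where predict(x0) gives
    (y_hat, lower, upper) using the prediction-interval half-width
    t_crit * s * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx).
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error

    def predict(x0: float) -> tuple[float, float, float]:
        y_hat = intercept + slope * x0
        half = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
        return y_hat, y_hat - half, y_hat + half

    return slope, intercept, predict
```

The `(x0 - x_bar) ** 2 / sxx` term widens the limits for WES TMB values far from the mean, which is one way prediction limits can become wider at high TMB; the study's observation that variability grows with WES TMB may also reflect non-constant variance, which this simple model does not capture.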


Subjects
Guidelines as Topic/standards , Immune Checkpoint Inhibitors/therapeutic use , Tumor Burden/genetics , Computer Simulation , Humans , Immune Checkpoint Inhibitors/pharmacology , Mutation
13.
Bioprocess Biosyst Eng ; 32(6): 723-7, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19205748

ABSTRACT

Polyketides have diverse biological activities, including pharmacological functions such as antibiotic, antitumor and agrochemical properties. They are biosynthesized from short carboxylic acid precursors by polyketide synthases (PKSs). As natural polyketide products include many clinically important drugs and the volume of data on polyketides is rapidly increasing, the development of a database system to manage polyketide data is essential. MapsiDB is an integrated web database formulated to contain data on type I polyketides and their PKSs, including domain and module composition and related genome information. Data on polyketides were collected from journals and online resources and processed with analysis programs. Web interfaces were utilized to construct and to access this database, allowing polyketide researchers to add their data to this database and to use it easily.


Subjects
Databases, Protein , Polyketide Synthases/chemistry , Database Management Systems , Databases, Genetic , Internet , Macrolides/chemistry , Macrolides/classification , Macrolides/metabolism , Polyketide Synthases/genetics , Polyketide Synthases/metabolism , User-Computer Interface
14.
J Microbiol Biotechnol ; 19(2): 140-6, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19307762

ABSTRACT

MAPSI (Management and Analysis for Polyketide Synthase Type I) has been developed to offer computational analysis methods to detect type I PKS (polyketide synthase) gene clusters in genome sequences. MAPSI provides a genome analysis component, which detects PKS gene clusters by identifying domains in proteins of a genome. MAPSI also contains databases on polyketides and genome annotation data, as well as analytic components such as new PKS assembly and domain analysis. The polyketide data and analysis component are accessible through Web interfaces and are displayed with diverse information. MAPSI, which was developed to aid researchers studying type I polyketides, provides diverse components to access and analyze polyketide information and should become a very powerful computational tool for polyketide research. The system can be extended through further studies of factors related to the biological activities of polyketides.
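Detecting PKS gene clusters "by identifying domains in proteins of a genome" can be approximated in two stages: annotate domains per protein, then merge nearby PKS genes into clusters. The sketch below covers only the second stage and is not MAPSI's algorithm: domain annotation is assumed to be precomputed, a gene counts as a PKS gene if it carries both a ketosynthase (KS) and an acyltransferase (AT) domain (our simplifying assumption), and `max_gap` (the number of non-PKS genes tolerated between hits) is a hypothetical parameter.

```python
REQUIRED_DOMAINS = frozenset({"KS", "AT"})  # assumed minimal signature of a type I PKS gene


def find_pks_clusters(genes: list[tuple[str, set[str]]], max_gap: int = 2) -> list[list[str]]:
    """Group PKS genes into clusters.

    `genes` is a list of (gene_id, domain_names) in genomic order. Consecutive
    PKS genes separated by at most `max_gap` non-PKS genes share a cluster.
    """
    clusters: list[list[str]] = []
    current: list[str] = []
    last_hit = None
    for idx, (gene_id, domains) in enumerate(genes):
        if not REQUIRED_DOMAINS <= domains:
            continue  # not a PKS gene under our assumed criterion
        if current and idx - last_hit - 1 > max_gap:
            clusters.append(current)  # gap too large: close the previous cluster
            current = []
        current.append(gene_id)
        last_hit = idx
    if current:
        clusters.append(current)
    return clusters
```

Tuning `max_gap` trades sensitivity for specificity: a larger value keeps clusters that are interrupted by tailoring or regulatory genes together, but it can also merge unrelated neighboring clusters.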


Subjects
Bacteremia/genetics , Multigene Family , Polyketide Synthases/genetics , Software , Algorithms , Computational Biology , Databases, Protein , Genome, Bacterial , Markov Chains , Sequence Alignment , Sequence Analysis, Protein , Sequence Homology, Amino Acid , User-Computer Interface
15.
JCO Precis Oncol ; 3: 1-13, 2019 Dec.
Article in English | MEDLINE | ID: mdl-35100709

ABSTRACT

PURPOSE: Tumor mutational burden (TMB) is a developing biomarker in non-small-cell lung cancer (NSCLC). Little is known regarding differences in TMB by sample location, histology, or other biomarkers. METHODS: A total of 3,424 unmatched NSCLC samples, including 2,351 lung adenocarcinomas (LUADs) and 1,073 lung squamous cell carcinomas (LUSCs), underwent profiling, including next-generation sequencing of 592 cancer-related genes, programmed death ligand 1 immunohistochemistry, and TMB. The rate of TMB of 10 mutations per megabase (Mb) or greater was compared between primary and metastatic LUAD and LUSC. Molecular alteration frequency was compared at a cutoff of 10 mutations/Mb. RESULTS: LUAD metastases were more likely to have a TMB of 10 mutations/Mb or greater compared with primary LUADs (38% v 25%; P < .001), and this difference was most pronounced with brain metastases (61% v 35% for other metastases; P < .001). The median TMB for LUAD brain metastases was 13 mutations/Mb compared with six mutations/Mb for primary LUADs. Variability existed for other LUAD metastasis sites, with adrenal metastases most likely to meet the cutoff of 10 mutations/Mb (51%) and bone metastases least likely to meet the cutoff (19%). TMB was more commonly 10 mutations/Mb or greater for LUSC primary tumors than for LUAD primary tumors (35% v 25%, respectively; P < .001). LUSC metastases were more likely to have a TMB of 10 mutations/Mb or greater than LUSC primary tumors. Poorly differentiated disease was more likely to have a TMB of 10 mutations/Mb or greater when stratified by histology and primary tumor or metastasis. Site-specific molecular differences existed at this TMB cutoff, including programmed death ligand 1 positivity and STK11 and KRAS mutation rates. CONCLUSION: TMB is a site-specific biomarker in NSCLC with important spatial and histologic differences. TMB is more frequently 10 mutations/Mb or greater in LUAD and LUSC metastases and highest in LUAD brain metastases. At this TMB cutoff, clinically informative distinctions exist in other tumor profiling characteristics. Further investigation is needed to expand on these findings.

16.
BMC Bioinformatics ; 8: 327, 2007 Sep 03.
Article in English | MEDLINE | ID: mdl-17764579

ABSTRACT

BACKGROUND: Polyketides are secondary metabolites of microorganisms with diverse biological activities, including pharmacological functions such as antibiotic, antitumor and agrochemical properties. Polyketides are synthesized by serialized reactions of a set of enzymes called polyketide synthases (PKSs), which coordinate the elongation of carbon skeletons by the stepwise condensation of short carbon precursors. Due to their importance as drugs, the volume of data on polyketides is rapidly increasing, creating a need for computational analysis methods for efficient polyketide research. Moreover, the increasing use of genetic engineering to research new kinds of polyketides requires genome-wide analysis. RESULTS: We describe a system named ASMPKS (Analysis System for Modular Polyketide Synthesis) for computational analysis of PKSs against genome sequences. It also provides overall management of information on modular PKSs, including polyketide database construction, new PKS assembly, and chain visualization. ASMPKS operates through a web interface to construct the database and to analyze PKSs, allowing polyketide researchers to add their data to this database and to use it easily. In addition, ASMPKS can predict functional modules for a protein sequence submitted by users, estimate the chemical composition of a polyketide synthesized from the modules, and display the carbon chain structure on the web interface. CONCLUSION: ASMPKS has powerful computational features to aid modular PKS research. Because various factors, such as starter units and post-processing, are related to polyketide biosynthesis, ASMPKS will be improved through further development to study these factors.
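Estimating a polyketide's composition from its modules follows textbook modular-PKS logic: each extension module adds one extender unit, and the reductive domains present (KR, DH, ER) set the oxidation state of the new β-carbon. The sketch below encodes only those general rules and is not ASMPKS's algorithm; it ignores inactive domains, stereochemistry, chain release and cyclization, and post-PKS tailoring, and all names are hypothetical.

```python
from dataclasses import dataclass

STARTER_CARBONS = {"acetyl": 2, "propionyl": 3}
# Carbons contributed after decarboxylation: 2 backbone carbons, plus 1 methyl branch for methylmalonyl.
EXTENDER_CARBONS = {"malonyl": 2, "methylmalonyl": 3}


@dataclass(frozen=True)
class Module:
    extender: str                 # "malonyl" or "methylmalonyl"
    reductive_domains: frozenset  # subset of {"KR", "DH", "ER"}


def beta_group(domains: frozenset) -> str:
    """Oxidation state of the beta-carbon left by one module's reductive domains."""
    if {"KR", "DH", "ER"} <= domains:
        return "CH2"    # fully reduced
    if {"KR", "DH"} <= domains:
        return "C=C"    # dehydrated to an alkene
    if "KR" in domains:
        return "CH-OH"  # ketoreduced to a hydroxyl
    return "C=O"        # no reduction: ketone


def predict_chain(starter: str, modules: list[Module]) -> tuple[int, list[str]]:
    """Return (total chain carbons, beta-carbon group per extension module)."""
    carbons = STARTER_CARBONS[starter] + sum(EXTENDER_CARBONS[m.extender] for m in modules)
    return carbons, [beta_group(m.reductive_domains) for m in modules]
```

As a sanity check on the carbon bookkeeping, a propionyl starter followed by six methylmalonyl extensions gives 3 + 6 × 3 = 21 carbons, matching the C21 aglycone 6-deoxyerythronolide B.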


Subjects
Computational Biology/methods , Polyketide Synthases/chemistry , Polyketide Synthases/genetics , Algorithms , Carbon/chemistry , Catalytic Domain , Computers , Genetic Engineering , Genome, Bacterial , Genomics/methods , Models, Biological , Models, Theoretical , Multienzyme Complexes/chemistry , Software
17.
Sci Rep ; 6: 27722, 2016 06 09.
Article in English | MEDLINE | ID: mdl-27278669

ABSTRACT

The human genome is 99% complete. This study contributes to filling the remaining 1% gap by enriching previously unknown repeat regions called microsatellites (MST). We devised a Global MST Enrichment (GME) kit to enrich and next-generation sequence 2 colorectal cell lines and 16 normal human samples, illustrating its utility in identifying contigs from reads that do not map to the genome reference. The analysis of these samples yielded 790 novel extra-referential concordant contigs that are observed in more than one sample. We searched for evidence of functional elements in the concordant contigs in two ways: (1) BLAST-ing each contig against normal RNA-Seq samples, and (2) checking for predicted functional elements using GlimmerHMM. Of the 790 concordant contigs, 37 had an exact match to at least one RNA-Seq read; 15 aligned to more than 100 RNA-Seq reads. Of the 249 concordant contigs predicted by GlimmerHMM to have functional elements, 6 had at least one exact RNA-Seq match. BLAST-ing these novel contigs against all publicly available sequences confirmed that they were found in human and chimpanzee BAC and FOSMID clones sequenced as part of the original Human Genome Project. These extra-referential contigs predominantly contained pentameric repeats, especially two motifs: AATGG and GTGGA.
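Classifying a contig by its dominant pentameric repeat can be done by counting every 5-mer window and collapsing rotations of the same motif (in an AATGG repeat, the windows read AATGG, ATGGA, TGGAA, and so on). This is a hypothetical sketch, not the authors' method, which the abstract does not describe: it canonicalizes by lexicographically smallest rotation only and does not merge reverse complements, so GTGGA is reported as its rotation AGTGG.

```python
from collections import Counter
from typing import Optional


def canonical_rotation(kmer: str) -> str:
    """Lexicographically smallest rotation, so all phases of one repeat unit compare equal."""
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def dominant_pentamer(contig: str, k: int = 5) -> Optional[tuple[str, float]]:
    """Return (canonical motif, fraction of k-mer windows it accounts for), or None if too short."""
    windows = [contig[i:i + k] for i in range(len(contig) - k + 1)]
    if not windows:
        return None
    counts = Counter(canonical_rotation(w) for w in windows)
    motif, count = counts.most_common(1)[0]
    return motif, count / len(windows)
```

A pure AATGG array returns ("AATGG", 1.0); contigs with only short or interrupted repeat tracts give lower fractions, so a threshold on that fraction could separate repeat-dominated contigs from the rest.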


Subjects
Microsatellite Repeats , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Algorithms , Animals , Cell Line , Contig Mapping , Genome, Human , Genomics , Humans , Pan troglodytes/genetics
18.
Genome Biol Evol ; 8(5): 1482-8, 2016 05 30.
Article in English | MEDLINE | ID: mdl-27189993

ABSTRACT

The Hawaiian archipelago provides a natural arena for understanding adaptive radiation and speciation. The Hawaiian Drosophila are one of the most diverse endemic groups in Hawaiʻi, with up to 1,000 species. We sequenced and analyzed entire genomes of recently diverged species of Hawaiian picture-winged Drosophila, Drosophila silvestris and Drosophila heteroneura from Hawaiʻi Island, in comparison with Drosophila planitibia, their sister species from Maui, a neighboring island where a common ancestor of all three likely occurred. Genome-wide single nucleotide polymorphism patterns suggest the more recent origin of D. silvestris and D. heteroneura, as well as a pervasive influence of positive selection on divergence of the three species, with the signatures of positive selection more prominent in sympatry than allopatry. Positively selected genes were significantly enriched for functional terms related to sensory detection and mating, suggesting that sexual selection played an important role in speciation of these species. In particular, sequence variation in Olfactory receptor and Gustatory receptor genes seems to play a major role in adaptive radiation in Hawaiian picture-winged Drosophila.


Subjects
Drosophila/genetics , Genetic Speciation , Genetic Variation , Genetics, Population , Animals , Genome, Insect , Hawaii , High-Throughput Nucleotide Sequencing , Phylogeny , Species Specificity
19.
PLoS One ; 9(11): e110263, 2014.
Article in English | MEDLINE | ID: mdl-25402475

ABSTRACT

Microsatellites (MST), tandem repeats of 1-6 nucleotide motifs, are mutational hot-spots with a bias for insertions and deletions (INDELs) rather than single nucleotide polymorphisms (SNPs). The majority of MST instability studies are limited to a small number of loci, the Bethesda markers, which are only informative for a subset of colorectal cancers. In this paper we use non-haplotype alleles present within next-gen sequencing data to evaluate somatic MST variation (SMV) within DNA repair proficient and DNA repair defective cell lines. We confirm that alleles present within next-gen data that do not contribute to the haplotype can be reliably quantified and used to evaluate SMV without requiring comparisons of matched samples. We observed that SMV patterns in the DNA repair proficient cell lines MCF10A, HEK293 and PD20 RV:D2 were consistent among samples. Further, we were able to confirm that changes in SMV patterns in cell lines lacking functional BRCA2, FANCD2 and mismatch repair were consistent with the different pathways perturbed. Using this new exome sequencing analysis approach, we show that DNA instability can be identified in a sample, that patterns of instability vary depending on the impaired DNA repair mechanism, and that genes harboring minor alleles are strongly associated with cancer pathways. The MST Minor Allele Caller used for this study is available at https://github.com/zalmanv/MST_minor_allele_caller.
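The notion of "non-haplotype" (minor) alleles can be made concrete at a single locus: count the repeat lengths observed in reads, treat the most supported length(s) as the haplotype alleles, and quantify everything else. This is a hypothetical sketch, not the MST Minor Allele Caller's logic: it assumes a diploid sample, accepts the second most common length as a haplotype allele only if it reaches `het_fraction` of reads (so stutter is not mistaken for a heterozygous allele), and skips loci below `min_depth`; all thresholds are illustrative.

```python
from collections import Counter
from typing import Optional


def minor_allele_fraction(read_lengths: list[int], ploidy: int = 2,
                          het_fraction: float = 0.2,
                          min_depth: int = 10) -> Optional[tuple[float, dict[int, int]]]:
    """Return (fraction of reads supporting minor alleles, {length: read count}) for one locus.

    Returns None when the locus has fewer than `min_depth` reads.
    """
    counts = Counter(read_lengths)
    total = sum(counts.values())
    if total < min_depth:
        return None
    ranked = counts.most_common()
    haplotype = {ranked[0][0]}  # the best-supported length is always a haplotype allele
    haplotype.update(length for length, c in ranked[1:ploidy] if c / total >= het_fraction)
    minor = {length: c for length, c in counts.items() if length not in haplotype}
    return sum(minor.values()) / total, minor
```

Aggregating these per-locus fractions across many loci in one sample gives a sample-level SMV profile without a matched normal, which is the property the abstract emphasizes; how the published caller aggregates or filters loci is not specified here.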


Subjects
DNA Repair-Deficiency Disorders/genetics , DNA Repair , Exome , Genetic Variation , Microsatellite Repeats , Alleles , Cell Line , Chromosomes, Human, Pair 1 , Female , Genetic Loci , Haplotypes , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , INDEL Mutation , Male , Polymorphism, Single Nucleotide , Reproducibility of Results
20.
Oncotarget ; 5(13): 4788-98, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24947164

ABSTRACT

Although the connection between cancer and cigarette smoke is well established, nicotine is not characterized as a carcinogen. Here, we used exome sequencing to identify nicotine- and oxidative stress-induced somatic mutations in normal human epithelial cells and their correlation with cancer. We identified over 6,400 SNVs, indels and microsatellite variants in each of the stress-exposed cells relative to the control, of which 2,159 were consistently observed at all nicotine doses. These included 429 nonsynonymous SNVs (nsSNVs), of which 158 were novel and 79 cancer-associated. Over 80% of consistently nicotine-induced variants overlap with variations detected in oxidatively stressed cells, indicating that nicotine-induced genomic alterations could be mediated through oxidative stress. Nicotine-induced mutations were distributed across 1,585 genes, of which 49% were associated with cancer. MUC family genes were among the top mutated genes. Analysis of 591 lung carcinoma tumor exomes from The Cancer Genome Atlas (TCGA) revealed that 20% of non-small-cell lung cancer tumors in smokers have mutations in at least one of the MUC4, MUC6 or MUC12 genes, in contrast to only 6% in non-smokers. These results indicate that nicotine induces genomic variations and promotes instability, potentially mediated by oxidative stress, implicating nicotine in carcinogenesis and establishing MUC genes as potential targets.


Subjects
Exome/genetics , Hydrogen Peroxide/pharmacology , Mutation/drug effects , Neoplasms/genetics , Nicotine/pharmacology , Adenocarcinoma/genetics , Base Sequence , Carcinogens/pharmacology , Carcinoma, Non-Small-Cell Lung , Carcinoma, Squamous Cell/genetics , Cell Line , Humans , INDEL Mutation/drug effects , Lung Neoplasms/genetics , Microsatellite Repeats/drug effects , Microsatellite Repeats/genetics , Mucin-2/genetics , Mucin-4/genetics , Mucins/genetics , Oxidants/pharmacology , Oxidative Stress , Sequence Analysis, DNA/methods , Smoking