Results 1 - 20 of 27
1.
Bioinformatics ; 30(5): 652-9, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24135263

ABSTRACT

MOTIVATION: Inferring lengths of inherited microsatellite alleles with single base pair resolution from short sequence reads is challenging due to several sources of noise caused by the repetitive nature of microsatellites and the technologies used to generate raw sequence data. RESULTS: We have developed a program, GenoTan, using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information. It effectively distinguishes length variants from noise including insertion/deletion errors in homopolymer runs by addressing the bidirectional aspect of insertion and deletion errors in sequence reads. Here we first introduce a homopolymer decomposition method which estimates error bias toward insertion or deletion in homopolymer sequence runs. Combining these approaches, GenoTan was able to genotype 94.9% of microsatellite loci accurately from simulated data with 40x sequence coverage quickly while the other programs showed <90% correct calls for the same data and required 5∼30× more computational time than GenoTan. It also showed the highest true-positive rate for real data using mixed sequence data of two Drosophila inbred lines, which was a novel validation approach for genotyping. AVAILABILITY: GenoTan is open-source software available at http://genotan.sourceforge.net.


Subject(s)
Genotyping Techniques , Microsatellite Repeats , Sequence Analysis, DNA/methods , Alleles , Animals , Drosophila/genetics , Genetic Loci , Genotype , Humans , Normal Distribution , Software
2.
Genomics ; 104(6 Pt B): 453-8, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25173571

ABSTRACT

Several studies have demonstrated that unmapped reads in next generation sequencing data could be used to identify infectious agents or structural variants, but there has been no intensive effort to analyze and classify all non-human sequences found in individual large data sets. To identify commonality in non-human sequences by infectious agents and putative contamination events, we analyzed non-human sequences in 150 genomic sequencing data files from the 1000 Genomes Project and observed that 0.13% of reads on average showed similarities to non-human genomes. We compared results among different sample groups divided based on ethnicities, sequencing centers and enrichment methods (whole genome sequencing vs. exome sequencing) and found that sequencing centers had specific signatures of contaminating genomes as 'time stamps'. We also observed many unmapped reads that falsely indicated contamination because of the high similarity of human sequences to sequences in non-human genome assemblies such as mouse and Nicotiana.


Subject(s)
DNA Contamination , Genome, Human , DNA, Bacterial/chemistry , DNA, Plant/chemistry , DNA, Viral/chemistry , Humans
3.
Bioinformatics ; 29(14): 1734-41, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23677944

ABSTRACT

MOTIVATION: Simple tandem repeats are highly variable genetic elements and widespread in genomes of many organisms. Next-generation sequencing technologies have enabled a robust comparison of large numbers of simple tandem repeat loci; however, analysis of their variation using traditional sequence analysis approaches still remains limiting and problematic due to variants occurring in repeat sequences confusing alignment programs into mapping sequence reads to incorrect loci when the sequence reads are significantly different from the reference sequence. RESULTS: We have developed a program, ReviSTER, which is an automated pipeline using a 'local mapping reference reconstruction method' to revise mismapped or partially misaligned reads at simple tandem repeat loci. ReviSTER estimates alleles of repeat loci using a local alignment method and creates temporary local mapping reference sequences, and finally remaps reads to the local mapping references. Using this approach, ReviSTER was able to successfully revise reads misaligned to repeat loci from both simulated data and real data. AVAILABILITY: ReviSTER is open-source software available at http://revister.sourceforge.net. CONTACT: garner@vbi.vt.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Tandem Repeat Sequences , Alleles , Exome , Genomics , Genotyping Techniques , Haploidy , High-Throughput Nucleotide Sequencing , Humans
4.
Genomics ; 100(5): 271-6, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22967795

ABSTRACT

Sequencing data analysis remains limiting and problematic, especially for low complexity repeat sequences and transposon elements due to inherent sequencing errors and short sequence read lengths. We have developed a program, ReviSeq, which uses a hybrid method composed of iterative remapping and local assembly upon a bacterial sequence backbone. Application of this method to six Brucella suis field isolates compared to the newly revised B. suis 1330 reference genome identified on average 13, 15, 19 and 9 more variants per sample than STAMPY/SAMtools, BWA/SAMtools, iCORN and BWA/PINDEL pipelines, and excluded on average 4, 2, 3 and 19 variants per sample, respectively. In total, using this iterative approach, we identified on average 87 variants including SNVs, short INDELs and long INDELs per strain when compared to the reference. Our program outperforms other methods especially for long INDEL calling. The program is available at http://reviseq.sourceforge.net.


Subject(s)
Brucella suis/genetics , Genetic Techniques , Genetic Variation , Genome, Bacterial/genetics , Software , Base Sequence , Cluster Analysis , INDEL Mutation/genetics , Molecular Sequence Data , Phylogeny , Sequence Analysis, DNA/methods
5.
J Bacteriol ; 194(4): 910, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22275106

ABSTRACT

Brucella suis is the causative agent of swine brucellosis and is known to be able to infect several different hosts, including cattle, dogs, and horses, without causing disease symptoms. Here we report the complete genome sequence of Brucella suis VBI22, which was isolated from raw milk from an infected cow.


Subject(s)
Brucella suis/genetics , Brucella suis/isolation & purification , Genome, Bacterial , Milk/microbiology , Animals , Base Sequence , Brucellosis, Bovine/microbiology , Cattle , Molecular Sequence Data , Sequence Analysis, DNA
6.
BMC Bioinformatics ; 13: 247, 2012 Sep 26.
Article in English | MEDLINE | ID: mdl-23009593

ABSTRACT

BACKGROUND: With the advent of next-generation sequencing (NGS) technologies, full cDNA shotgun sequencing has become a major approach in the study of transcriptomes, and several different protocols in 454 sequencing have been invented. As each protocol uses its own short DNA tags or adapters attached to the ends of cDNA fragments for labeling or sequencing, different contaminants may lead to mis-assembly and inaccurate sequence products. RESULTS: We have designed and implemented a new program for raw sequence cleaning in a graphical user interface and a batch script. The cleaning process consists of several modules including barcode trimming, sequencing adapter trimming, amplification primer trimming, poly-A tail trimming, vector screening and low quality region trimming. These modules can be combined based on various sequencing applications. CONCLUSIONS: ESTclean is a software package not only for cleaning cDNA sequences, but also for helping to develop sequencing protocols by providing summary tables and figures for sequencing quality control in a graphical user interface. It outperforms in cleaning read sequences from complicated sequencing protocols which use barcodes and multiple amplification primers.


Subject(s)
Expressed Sequence Tags , Sequence Analysis, DNA/methods , Software , Transcriptome , Animals , DNA Primers/genetics , DNA, Complementary/genetics , Drosophila melanogaster/genetics , High-Throughput Nucleotide Sequencing
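
One of the cleaning modules named in the abstract above, poly-A tail trimming, can be illustrated with a minimal Python sketch. This is not ESTclean's implementation: the function name `trim_poly_a` and the `min_len` threshold are assumptions made only for this example, and a real cleaner would also need to tolerate sequencing errors inside the tail.

```python
import re


def trim_poly_a(read: str, min_len: int = 10) -> str:
    """Remove a trailing run of at least `min_len` A bases from a read.

    Hypothetical helper for illustration; exact-match only, so a tail
    interrupted by a miscalled base is not removed.
    """
    pattern = re.compile(r"A{%d,}$" % min_len, re.IGNORECASE)
    return pattern.sub("", read)


# Made-up reads: the first ends in a 15-base poly-A tail, the second does not.
reads = ["ACGTACGT" + "A" * 15, "ACGTAAAC"]
trimmed = [trim_poly_a(r) for r in reads]
```

In this sketch only the first read is shortened; the second is returned unchanged because its A run does not reach the end of the read.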
7.
Genes Chromosomes Cancer ; 50(4): 275-83, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21319262

ABSTRACT

Using a custom CGH-like oligonucleotide array to measure the global microsatellite content in the genomes of 72 cancer, cancer-free, and high risk patient and cell line samples (56 germline DNA and 16 in tumor or tumor cell line DNA) we found a unique, reproducible, and statistically significant pattern of 18 motif-specific microsatellite families (out of 962 possible 1-6 mer repeats) in breast cancer patient germline and tumor DNA, but not in germline DNA of cancer-free volunteer controls or in breast cancer patients with BRCA1/2 mutations. These high-similarity A/T rich repetitive motifs were also more pronounced in the germlines and tumors of colon cancer tumor patients (3/6 samples) and microsatellite unstable colon cancer cell lines; however, germline DNA of sporadic breast cancer patients exhibited the largest global content shift for those motifs with extreme AT/GC ratios. These results indicate that global microsatellite variability is complex, suggest the existence of a previously unknown genomic destabilization mechanism in breast cancer patients' germline DNA, and warrant further testing of such microsatellite variability as a predictor of future breast cancer development.


Subject(s)
AT Rich Sequence , Breast Neoplasms/genetics , Microsatellite Instability , Microsatellite Repeats/genetics , Cell Line, Tumor , Colonic Neoplasms/genetics , DNA, Neoplasm/genetics , Female , Genes, BRCA1 , Genes, BRCA2 , Genetic Variation , Humans , Mutation , Oligonucleotide Array Sequence Analysis/methods
8.
J Bacteriol ; 193(22): 6410, 2011 Nov.
Article in English | MEDLINE | ID: mdl-22038969

ABSTRACT

Brucella suis is a causative agent of porcine brucellosis. We report the resequencing of the original sample upon which the published sequence of Brucella suis 1330 is based and describe the differences between the published assembly and our assembly at 12 loci.


Subject(s)
Brucella suis/genetics , Genome, Bacterial , Base Sequence , Molecular Sequence Annotation , Molecular Sequence Data
9.
BMC Genomics ; 11: 703, 2010 Dec 14.
Article in English | MEDLINE | ID: mdl-21156066

ABSTRACT

BACKGROUND: Horned beetles, in particular in the genus Onthophagus, are important models for studies on sexual selection, biological radiations, the origin of novel traits, developmental plasticity, biocontrol, conservation, and forensic biology. Despite their growing prominence as models for studying both basic and applied questions in biology, little genomic or transcriptomic data are available for this genus. We used massively parallel pyrosequencing (Roche 454-FLX platform) to produce a comprehensive EST dataset for the horned beetle Onthophagus taurus. To maximize sequence diversity, we pooled RNA extracted from a normalized library encompassing diverse developmental stages and both sexes. RESULTS: We used 454 pyrosequencing to sequence ESTs from all post-embryonic stages of O. taurus. Approximately 1.36 million reads assembled into 50,080 non-redundant sequences encompassing a total of 26.5 Mbp. The non-redundant sequences match over half of the genes in Tribolium castaneum, the most closely related species with a sequenced genome. Analyses of Gene Ontology annotations and biochemical pathways indicate that the O. taurus sequences reflect a wide and representative sampling of biological functions and biochemical processes. An analysis of sequence polymorphisms revealed that SNP frequency was negatively related to overall expression level and the number of tissue types in which a given gene is expressed. The most variable genes were enriched for a limited number of GO annotations whereas the least variable genes were enriched for a wide range of GO terms directly related to fitness. CONCLUSIONS: This study provides the first large-scale EST database for horned beetles, a much-needed resource for advancing the study of these organisms. Furthermore, we identified instances of gene duplications and alternative splicing, useful for future study of gene regulation, and a large number of SNP markers that could be used in population-genetic studies of O. taurus and possibly other horned beetles.


Subject(s)
Beetles/anatomy & histology , Beetles/genetics , Genes, Insect/genetics , Horns , Alternative Splicing/genetics , Animals , Base Sequence , Cluster Analysis , Databases, Genetic , Databases, Protein , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Phylogeny , Polymorphism, Single Nucleotide/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA
10.
BMC Genomics ; 11: 694, 2010 Dec 07.
Article in English | MEDLINE | ID: mdl-21138572

ABSTRACT

BACKGROUND: The reptiles, characterized by both diversity and unique evolutionary adaptations, provide a comprehensive system for comparative studies of metabolism, physiology, and development. However, molecular resources for ectothermic reptiles are severely limited, hampering our ability to study the genetic basis for many evolutionarily important traits such as metabolic plasticity, extreme longevity, limblessness, venom, and freeze tolerance. Here we use massively parallel sequencing (454 GS-FLX Titanium) to generate a transcriptome of the western terrestrial garter snake (Thamnophis elegans) with two goals in mind. First, we develop a molecular resource for an ectothermic reptile; and second, we use these sex-specific transcriptomes to identify differences in the presence of expressed transcripts and potential genes of evolutionary interest. RESULTS: Using sex-specific pools of RNA (one pool for females, one pool for males) representing 7 tissue types and 35 diverse individuals, we produced 1.24 million sequence reads, which averaged 366 bp in length after cleaning. Assembly of the cleaned reads from both sexes with NEWBLER and MIRA resulted in 96,379 contigs containing 87% of the cleaned reads. Over 34% of these contigs and 13% of the singletons were annotated based on homology to previously identified proteins. From these homology assignments, additional clustering, and ORF predictions, we estimate that this transcriptome contains ~13,000 unique genes that were previously identified in other species and over 66,000 transcripts from unidentified protein-coding genes. Furthermore, we use a graph-clustering method to identify contigs linked by NEWBLER-split reads that represent divergent alleles, gene duplications, and alternatively spliced transcripts. Beyond gene identification, we identified 95,295 SNPs and 31,651 INDELs. 
From these sex-specific transcriptomes, we identified 190 genes that were only present in the mRNA sequenced from one of the sexes (84 female-specific, 106 male-specific), and many highly variable genes of evolutionary interest. CONCLUSIONS: This is the first large-scale, multi-organ transcriptome for an ectothermic reptile. This resource provides the most comprehensive set of EST sequences available for an individual ectothermic reptile species, increasing the number of snake ESTs 50-fold. We have identified genes that appear to be under evolutionary selection and those that are sex-specific. This resource will assist studies on gene expression and comparative genomics, and will facilitate the study of evolutionarily important traits at the molecular level.


Subject(s)
Colubridae/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods , Sex Characteristics , Animals , Base Sequence , Cluster Analysis , Female , Gene Expression Regulation , Genome/genetics , Lizards/genetics , Major Histocompatibility Complex/genetics , Male , Molecular Sequence Annotation , Mutation/genetics , Phylogeny , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Titanium
11.
Cancer Med ; 9(17): 6452-6460, 2020 09.
Article in English | MEDLINE | ID: mdl-32644297

ABSTRACT

Microsatellite instability (MSI) is a key secondary effect of a defective DNA mismatch repair mechanism resulting in incorrectly replicated microsatellites in many malignant tumors. Historically, MSI detection has been performed by fragment analysis (FA) on a panel of representative genomic markers. More recently, using next-generation sequencing (NGS) to analyze thousands of microsatellites has been shown to improve the robustness and sensitivity of MSI detection. However, NGS-based MSI tests can be prone to population biases if NGS results are aligned to a reference genome instead of patient-matched normal tissue. We observed an increased rate of false positives in patients of African ancestry with an NGS-based diagnostic for MSI status utilizing 7317 microsatellite loci. We then minimized this bias by training a modified calling model that utilized 2011 microsatellite loci. With these adjustments 100% (95% CI: 89.1% to 100%) of African ancestry patients in an independent validation test were called correctly using the updated model. This not only represents a significant technical improvement but also has an important clinical impact on directing immune checkpoint inhibitor therapy.


Subject(s)
DNA Mismatch Repair , High-Throughput Nucleotide Sequencing , Microsatellite Instability , Neoplasms/genetics , Bias , Black People , Confidence Intervals , DNA-Binding Proteins/analysis , False Positive Reactions , Female , Genetic Markers , Humans , Male , Mismatch Repair Endonuclease PMS2/analysis , MutL Protein Homolog 1/analysis , MutS Homolog 2 Protein/analysis , Reproducibility of Results , Sex Factors
12.
J Immunother Cancer ; 8(1)2020 03.
Article in English | MEDLINE | ID: mdl-32217756

ABSTRACT

BACKGROUND: Tumor mutational burden (TMB), defined as the number of somatic mutations per megabase of interrogated genomic sequence, demonstrates predictive biomarker potential for the identification of patients with cancer most likely to respond to immune checkpoint inhibitors. TMB is optimally calculated by whole exome sequencing (WES), but next-generation sequencing targeted panels provide TMB estimates in a time-effective and cost-effective manner. However, differences in panel size and gene coverage, in addition to the underlying bioinformatics pipelines, are known drivers of variability in TMB estimates across laboratories. By directly comparing panel-based TMB estimates from participating laboratories, this study aims to characterize the theoretical variability of panel-based TMB estimates, and provides guidelines on TMB reporting, analytic validation requirements and reference standard alignment in order to maintain consistency of TMB estimation across platforms. METHODS: Eleven laboratories used WES data from The Cancer Genome Atlas Multi-Center Mutation calling in Multiple Cancers (MC3) samples and calculated TMB from the subset of the exome restricted to the genes covered by their targeted panel using their own bioinformatics pipeline (panel TMB). A reference TMB value was calculated from the entire exome using a uniform bioinformatics pipeline all members agreed on (WES TMB). Linear regression analyses were performed to investigate the relationship between WES and panel TMB for all 32 cancer types combined and separately. Variability in panel TMB values at various WES TMB values was also quantified using 95% prediction limits. RESULTS: Study results demonstrated that variability within and between panel TMB values increases as the WES TMB values increase. 
For each panel, prediction limits based on linear regression analyses that modeled panel TMB as a function of WES TMB were calculated and found to approximately capture the intended 95% of observed panel TMB values. Certain cancer types, such as uterine, bladder and colon cancers exhibited greater variability in panel TMB values, compared with lung and head and neck cancers. CONCLUSIONS: Increasing uptake of TMB as a predictive biomarker in the clinic creates an urgent need to bring stakeholders together to agree on the harmonization of key aspects of panel-based TMB estimation, such as the standardization of TMB reporting, standardization of analytical validation studies and the alignment of panel-based TMB values with a reference standard. These harmonization efforts should improve consistency and reliability of panel TMB estimates and aid in clinical decision-making.


Subject(s)
Guidelines as Topic/standards , Immune Checkpoint Inhibitors/therapeutic use , Tumor Burden/genetics , Computer Simulation , Humans , Immune Checkpoint Inhibitors/pharmacology , Mutation
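
The definition of TMB given in the abstract above (somatic mutations per megabase of interrogated sequence) reduces to a single division, sketched below in Python. The function name and the example counts are hypothetical and are not taken from the study; which mutations are counted and which bases are treated as interrogated differ between pipelines, and that choice is one of the sources of variability the abstract describes.

```python
def tumor_mutational_burden(somatic_mutations: int, interrogated_bases: int) -> float:
    """Return somatic mutations per megabase (Mb) of interrogated sequence."""
    if interrogated_bases <= 0:
        raise ValueError("interrogated_bases must be positive")
    return somatic_mutations * 1_000_000 / interrogated_bases


# Hypothetical counts, chosen only to illustrate the arithmetic.
wes_tmb = tumor_mutational_burden(300, 30_000_000)   # exome-sized region
panel_tmb = tumor_mutational_burden(12, 1_200_000)   # panel-sized region
```

Both made-up inputs give 10 mutations/Mb, but the panel value rests on only 12 observed mutations, so a difference of a few calls shifts it far more than it would shift the exome-wide value.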
13.
Bioprocess Biosyst Eng ; 32(6): 723-7, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19205748

ABSTRACT

Polyketides have diverse biological activities, including pharmacological functions such as antibiotic, antitumor and agrochemical properties. They are biosynthesized from short carboxylic acid precursors by polyketide synthases (PKSs). As natural polyketide products include many clinically important drugs and the volume of data on polyketides is rapidly increasing, the development of a database system to manage polyketide data is essential. MapsiDB is an integrated web database formulated to contain data on type I polyketides and their PKSs, including domain and module composition and related genome information. Data on polyketides were collected from journals and online resources and processed with analysis programs. Web interfaces were utilized to construct and to access this database, allowing polyketide researchers to add their data to this database and to use it easily.


Subject(s)
Databases, Protein , Polyketide Synthases/chemistry , Database Management Systems , Databases, Genetic , Internet , Macrolides/chemistry , Macrolides/classification , Macrolides/metabolism , Polyketide Synthases/genetics , Polyketide Synthases/metabolism , User-Computer Interface
14.
J Microbiol Biotechnol ; 19(2): 140-6, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19307762

ABSTRACT

MAPSI (Management and Analysis for Polyketide Synthase Type I) has been developed to offer computational analysis methods to detect type I PKS (polyketide synthase) gene clusters in genome sequences. MAPSI provides a genome analysis component, which detects PKS gene clusters by identifying domains in proteins of a genome. MAPSI also contains databases on polyketides and genome annotation data, as well as analytic components such as new PKS assembly and domain analysis. The polyketide data and analysis component are accessible through Web interfaces and are displayed with diverse information. MAPSI, which was developed to aid researchers studying type I polyketides, provides diverse components to access and analyze polyketide information and should become a very powerful computational tool for polyketide research. The system can be extended through further studies of factors related to the biological activities of polyketides.


Subject(s)
Bacteremia/genetics , Multigene Family , Polyketide Synthases/genetics , Software , Algorithms , Computational Biology , Databases, Protein , Genome, Bacterial , Markov Chains , Sequence Alignment , Sequence Analysis, Protein , Sequence Homology, Amino Acid , User-Computer Interface
15.
JCO Precis Oncol ; 3: 1-13, 2019 Dec.
Article in English | MEDLINE | ID: mdl-35100709

ABSTRACT

PURPOSE: Tumor mutational burden (TMB) is a developing biomarker in non-small-cell lung cancer (NSCLC). Little is known regarding differences between TMB and sample location, histology, or other biomarkers. METHODS: A total of 3,424 unmatched NSCLC samples, including 2,351 lung adenocarcinomas (LUADs) and 1,073 lung squamous cell carcinomas (LUSCs), underwent profiling, including next-generation sequencing of 592 cancer-related genes, programmed death ligand 1 immunohistochemistry, and TMB. The rate of TMB of 10 mutations per megabase (Mb) or greater was compared between primary and metastatic LUAD and LUSC. Molecular alteration frequency was compared at a cutoff of 10 mutations/Mb. RESULTS: LUAD metastases were more likely to have a TMB of 10 mutations/Mb or greater compared with primary LUADs (38% v 25%; P < .001), and this difference was most pronounced with brain metastases (61% v 35% for other metastases; P < .001). The median TMB for LUAD brain metastases was 13 mutations/Mb compared with six mutations/Mb for primary LUADs. Variability existed for other LUAD metastasis sites, with adrenal metastases most likely to meet the cutoff of 10 mutations/Mb (51%) and bone metastases least likely to meet the cutoff (19%). TMB was more commonly 10 mutations/Mb or greater for LUSC primary tumors than for LUAD primary tumors (35% v 25%, respectively; P < .001). LUSC metastases were more likely to have a TMB of 10 mutations/Mb or greater than LUSC primary tumors. Poorly differentiated disease was more likely to have a TMB of 10 mutations/Mb or greater when stratified by histology and primary tumor or metastasis. Site-specific molecular differences existed at this TMB cutoff including programmed death ligand 1 positivity and STK11 and KRAS mutation rate. CONCLUSION: TMB is a site-specific biomarker in NSCLC with important spatial and histologic differences. TMB is more frequently 10 mutations/Mb or greater in LUAD and LUSC metastases and highest in LUAD brain metastases. Along this TMB cutoff, clinically informative distinctions exist in other tumor profiling characteristics. Further investigation is needed to expand on these findings.

16.
BMC Bioinformatics ; 8: 327, 2007 Sep 03.
Article in English | MEDLINE | ID: mdl-17764579

ABSTRACT

BACKGROUND: Polyketides are secondary metabolites of microorganisms with diverse biological activities, including pharmacological functions such as antibiotic, antitumor and agrochemical properties. Polyketides are synthesized by serialized reactions of a set of enzymes called polyketide synthases (PKSs), which coordinate the elongation of carbon skeletons by the stepwise condensation of short carbon precursors. Due to their importance as drugs, the volume of data on polyketides is rapidly increasing and creating a need for computational analysis methods for efficient polyketide research. Moreover, the increasing use of genetic engineering to research new kinds of polyketides requires genome-wide analysis. RESULTS: We describe a system named ASMPKS (Analysis System for Modular Polyketide Synthesis) for computational analysis of PKSs against genome sequences. It also provides overall management of information on modular PKS, including polyketide database construction, new PKS assembly, and chain visualization. ASMPKS operates on a web interface to construct the database and to analyze PKSs, allowing polyketide researchers to add their data to this database and to use it easily. In addition, ASMPKS can predict functional modules for a protein sequence submitted by users, estimate the chemical composition of a polyketide synthesized from the modules, and display the carbon chain structure on the web interface. CONCLUSION: ASMPKS has powerful computation features to aid modular PKS research. As various factors, such as starter units and post-processing, are related to polyketide biosynthesis, ASMPKS will be improved through further development to study these factors.


Subject(s)
Computational Biology/methods , Polyketide Synthases/chemistry , Polyketide Synthases/genetics , Algorithms , Carbon/chemistry , Catalytic Domain , Computers , Genetic Engineering , Genome, Bacterial , Genomics/methods , Models, Biological , Models, Theoretical , Multienzyme Complexes/chemistry , Software
17.
Sci Rep ; 6: 27722, 2016 06 09.
Article in English | MEDLINE | ID: mdl-27278669

ABSTRACT

The human genome is 99% complete. This study contributes to filling the 1% gap by enriching previously unknown repeat regions called microsatellites (MST). We devised a Global MST Enrichment (GME) kit to enrich and nextgen sequence 2 colorectal cell lines and 16 normal human samples to illustrate its utility in identifying contigs from reads that do not map to the genome reference. The analysis of these samples yielded 790 novel extra-referential concordant contigs that are observed in more than one sample. We searched for evidence of functional elements in the concordant contigs in two ways: (1) BLAST-ing each contig against normal RNA-Seq samples, (2) Checking for predicted functional elements using GlimmerHMM. Of the 790 concordant contigs, 37 had an exact match to at least one RNA-Seq read; 15 aligned to more than 100 RNA-Seq reads. Of the 249 concordant contigs predicted by GlimmerHMM to have functional elements, 6 had at least one exact RNA-Seq match. BLAST-ing these novel contigs against all publicly available sequences confirmed that they were found in human and chimpanzee BAC and FOSMID clones sequenced as part of the original human genome project. These extra-referential contigs predominantly contained pentameric repeats, especially two motifs: AATGG and GTGGA.


Subject(s)
Microsatellite Repeats , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Algorithms , Animals , Cell Line , Contig Mapping , Genome, Human , Genomics , Humans , Pan troglodytes/genetics
18.
Genome Biol Evol ; 8(5): 1482-8, 2016 05 30.
Article in English | MEDLINE | ID: mdl-27189993

ABSTRACT

The Hawaiian archipelago provides a natural arena for understanding adaptive radiation and speciation. The Hawaiian Drosophila are one of the most diverse endemic groups in Hawai'i with up to 1,000 species. We sequenced and analyzed entire genomes of recently diverged species of Hawaiian picture-winged Drosophila, Drosophila silvestris and Drosophila heteroneura from Hawai'i Island, in comparison with Drosophila planitibia, their sister species from Maui, a neighboring island where a common ancestor of all three had likely occurred. Genome-wide single nucleotide polymorphism patterns suggest the more recent origin of D. silvestris and D. heteroneura, as well as a pervasive influence of positive selection on divergence of the three species, with the signatures of positive selection more prominent in sympatry than allopatry. Positively selected genes were significantly enriched for functional terms related to sensory detection and mating, suggesting that sexual selection played an important role in speciation of these species. In particular, sequence variation in Olfactory receptor and Gustatory receptor genes seems to play a major role in adaptive radiation in Hawaiian picture-winged Drosophila.


Subject(s)
Drosophila/genetics , Genetic Speciation , Genetic Variation , Genetics, Population , Animals , Genome, Insect , Hawaii , High-Throughput Nucleotide Sequencing , Phylogeny , Species Specificity
19.
PLoS One ; 9(11): e110263, 2014.
Article in English | MEDLINE | ID: mdl-25402475

ABSTRACT

Microsatellites (MST), tandem repeats of 1-6 nucleotide motifs, are mutational hot-spots with a bias for insertions and deletions (INDELs) rather than single nucleotide polymorphisms (SNPs). The majority of MST instability studies are limited to a small number of loci, the Bethesda markers, which are only informative for a subset of colorectal cancers. In this paper we evaluate non-haplotype alleles present within next-gen sequencing data to evaluate somatic MST variation (SMV) within DNA repair proficient and DNA repair defective cell lines. We confirm that alleles present within next-gen data that do not contribute to the haplotype can be reliably quantified and utilized to evaluate the SMV without requiring comparisons of matched samples. We observed that SMV patterns found in DNA repair proficient cell lines without DNA repair defects, MCF10A, HEK293 and PD20 RV:D2, had consistent patterns among samples. Further, we were able to confirm that changes in SMV patterns in cell lines lacking functional BRCA2, FANCD2 and mismatch repair were consistent with the different pathways perturbed. Using this new exome sequencing analysis approach we show that DNA instability can be identified in a sample and that patterns of instability vary depending on the impaired DNA repair mechanism, and that genes harboring minor alleles are strongly associated with cancer pathways. The MST Minor Allele Caller used for this study is available at https://github.com/zalmanv/MST_minor_allele_caller.


Subject(s)
DNA Repair-Deficiency Disorders/genetics , DNA Repair , Exome , Genetic Variation , Microsatellite Repeats , Alleles , Cell Line , Chromosomes, Human, Pair 1 , Female , Genetic Loci , Haplotypes , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , INDEL Mutation , Male , Polymorphism, Single Nucleotide , Reproducibility of Results
20.
Oncotarget ; 5(13): 4788-98, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24947164

ABSTRACT

Although the connection between cancer and cigarette smoke is well established, nicotine is not characterized as a carcinogen. Here, we used exome sequencing to identify nicotine- and oxidative stress-induced somatic mutations in normal human epithelial cells and their correlation with cancer. We identified over 6,400 SNVs, indels and microsatellites in each of the stress-exposed cells relative to the control, of which 2,159 were consistently observed at all nicotine doses. These included 429 nsSNVs, of which 158 were novel and 79 cancer-associated. Over 80% of consistently nicotine-induced variants overlap with variations detected in oxidatively stressed cells, indicating that nicotine-induced genomic alterations could be mediated through oxidative stress. Nicotine-induced mutations were distributed across 1,585 genes, of which 49% were associated with cancer. MUC family genes were among the top mutated genes. Analysis of 591 lung carcinoma tumor exomes from The Cancer Genome Atlas (TCGA) revealed that 20% of non-small-cell lung cancer tumors in smokers have mutations in at least one of the MUC4, MUC6 or MUC12 genes, in contrast to only 6% in non-smokers. These results indicate that nicotine induces genomic variations and promotes instability, potentially mediated by oxidative stress, implicating nicotine in carcinogenesis, and establish MUC genes as potential targets.


Subject(s)
Exome/genetics , Hydrogen Peroxide/pharmacology , Mutation/drug effects , Neoplasms/genetics , Nicotine/pharmacology , Adenocarcinoma/genetics , Base Sequence , Carcinogens/pharmacology , Carcinoma, Non-Small-Cell Lung , Carcinoma, Squamous Cell/genetics , Cell Line , Humans , INDEL Mutation/drug effects , Lung Neoplasms/genetics , Microsatellite Repeats/drug effects , Microsatellite Repeats/genetics , Mucin-2/genetics , Mucin-4/genetics , Mucins/genetics , Oxidants/pharmacology , Oxidative Stress , Sequence Analysis, DNA/methods , Smoking