1.
BMC Med Inform Decis Mak ; 14 Suppl 1: S1, 2014.
Article in English | MEDLINE | ID: mdl-25521230

ABSTRACT

To answer the need for the rigorous protection of biomedical data, we organized the Critical Assessment of Data Privacy and Protection initiative as a community effort to evaluate privacy-preserving dissemination techniques for biomedical data. We focused on the challenge of sharing aggregate human genomic data (e.g., allele frequencies) in a way that preserves the privacy of the data donors, without undermining the utility of genome-wide association studies (GWAS) or impeding their dissemination. Specifically, we designed two problems for disseminating the raw data and the analysis outcome, respectively, based on publicly available data from HapMap and from the Personal Genome Project. A total of six teams participated in the challenges. The final results were presented at a workshop of the iDASH (integrating Data for Analysis, 'anonymization,' and SHaring) National Center for Biomedical Computing. We report the results of the challenge and our findings about the current genome privacy protection techniques.


Subjects
Genetic Privacy/standards , Genome-Wide Association Study/standards , Information Dissemination , Humans
2.
BMC Bioinformatics ; 15 Suppl 9: S8, 2014.
Article in English | MEDLINE | ID: mdl-25253067

ABSTRACT

BACKGROUND: Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing complementary information to the metagenomic sequencing of the community. The acquisition of the metatranscriptomic sequences will enable us to refine the annotations of the metagenomes, and to study the gene activities and their regulation in complex microbial communities and their dynamics. RESULTS: In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts, into a Hidden Markov Model, TransGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. CONCLUSION: We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TransGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) on predicting protein coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply that, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture of the genes (and their functions) in a microbial community. AVAILABILITY: TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.


Subjects
Bacteria/genetics , Genes, Bacterial , Metagenomics/methods , Transcriptome , Genome, Bacterial , Markov Chains , Metagenome , Models, Genetic , Operon , RNA, Antisense/genetics , Sequence Analysis, RNA/methods , Software
3.
PLoS Comput Biol ; 9(3): e1002981, 2013.
Article in English | MEDLINE | ID: mdl-23555216

ABSTRACT

Shotgun metagenomics has been applied to the studies of the functionality of various microbial communities. As a critical analysis step in these studies, biological pathways are reconstructed based on the genes predicted from metagenomic shotgun sequences. Pathway reconstruction provides insights into the functionality of a microbial community and can be used for comparing multiple microbial communities. The utilization of pathway reconstruction, however, can be jeopardized because of imperfect functional annotation of genes, and ambiguity in the assignment of predicted enzymes to biochemical reactions (e.g., some enzymes are involved in multiple biochemical reactions). Considering that metabolic functions in a microbial community are carried out by many enzymes in a collaborative manner, we present a probabilistic sampling approach to profiling functional content in a metagenomic dataset, by sampling functions of catalytically promiscuous enzymes within the context of the entire metabolic network defined by the annotated metagenome. We test our approach on metagenomic datasets from environmental and human-associated microbial communities. The results show that our approach provides a more accurate representation of the metabolic activities encoded in a metagenome, and thus improves the comparative analysis of multiple microbial communities. In addition, our approach reports likelihood scores of putative reactions, which can be used to identify important reactions and metabolic pathways that reflect the environmental adaptation of the microbial communities. Source code for sampling metabolic networks is available online at http://omics.informatics.indiana.edu/mg/MetaNetSam/.


Subjects
Metabolic Networks and Pathways/genetics , Metagenome/genetics , Metagenomics/methods , Algorithms , Cluster Analysis , Databases, Genetic , Environmental Microbiology , Humans , Markov Chains
4.
Proc Natl Acad Sci U S A ; 109(41): E2774-83, 2012 Oct 09.
Article in English | MEDLINE | ID: mdl-22991466

ABSTRACT

Knowledge of the rate and nature of spontaneous mutation is fundamental to understanding evolutionary and molecular processes. In this report, we analyze spontaneous mutations accumulated over thousands of generations by wild-type Escherichia coli and a derivative defective in mismatch repair (MMR), the primary pathway for correcting replication errors. The major conclusions are (i) the mutation rate of a wild-type E. coli strain is ~1 × 10⁻³ per genome per generation; (ii) mutations in the wild-type strain have the expected mutational bias for G:C > A:T mutations, but the bias changes to A:T > G:C mutations in the absence of MMR; (iii) during replication, A:T > G:C transitions preferentially occur with A templating the lagging strand and T templating the leading strand, whereas G:C > A:T transitions preferentially occur with C templating the lagging strand and G templating the leading strand; (iv) there is a strong bias for transition mutations to occur at 5'ApC3'/3'TpG5' sites (where bases 5'A and 3'T are mutated) and, to a lesser extent, at 5'GpC3'/3'CpG5' sites (where bases 5'G and 3'C are mutated); (v) although the rate of small (≤4 nt) insertions and deletions is high at repeat sequences, these events occur at only 1/10th the genomic rate of base-pair substitutions. MMR activity is genetically regulated, and bacteria isolated from nature often lack MMR capacity, suggesting that modulation of MMR can be adaptive. Thus, comparing results from the wild-type and MMR-defective strains may lead to a deeper understanding of factors that determine mutation rates and spectra, how these factors may differ among organisms, and how they may be shaped by environmental conditions.
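
As a rough sense of scale for conclusion (i), the per-genome rate can be converted to a per-nucleotide rate. The genome size G used here (about 4.6 × 10^6 bp, the approximate size of an E. coli K-12 genome) is an assumption and is not stated in the abstract:

```latex
\mu_{\text{bp}} \approx \frac{\mu_{\text{genome}}}{G}
  = \frac{1 \times 10^{-3}}{4.6 \times 10^{6}}
  \approx 2.2 \times 10^{-10} \ \text{per nucleotide per generation}
```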


Subjects
Escherichia coli/genetics , Genome, Bacterial/genetics , Mutation , Sequence Analysis, DNA/methods , Adenosine Triphosphatases/genetics , Base Sequence , Binding Sites/genetics , DNA Methylation , DNA Mismatch Repair/genetics , DNA Replication/genetics , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Escherichia coli Proteins/genetics , Genes, Bacterial/genetics , INDEL Mutation , Monte Carlo Method , MutL Proteins , Mutation Rate , Point Mutation , Polymorphism, Single Nucleotide , Selection, Genetic
5.
Nucleic Acids Res ; 37(21): e143, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19762481

ABSTRACT

Computational methods for genome-wide identification of mobile genetic elements (MGEs) have become increasingly necessary for both genome annotation and evolutionary studies. Non-long terminal repeat (non-LTR) retrotransposons are a class of MGEs that have been found in most eukaryotic genomes, sometimes in extremely high numbers. In this article, we present a computational tool, MGEScan-non-LTR, for the identification of non-LTR retrotransposons in genomic sequences, following a computational approach inspired by a generalized hidden Markov model (GHMM). Three different states represent two different protein domains and inter-domain linker regions encoded in the non-LTR retrotransposons, and their scores are evaluated by using profile hidden Markov models (for protein domains) and Gaussian Bayes classifiers (for linker regions), respectively. In order to classify the non-LTR retrotransposons into one of the 12 previously characterized clades using the same model, we defined separate states for different clades. MGEScan-non-LTR was tested on the genome sequences of four eukaryotic organisms, Drosophila melanogaster, Daphnia pulex, Ciona intestinalis and Strongylocentrotus purpuratus. For the D. melanogaster genome, MGEScan-non-LTR found all known 'full-length' elements and simultaneously classified them into the clades CR1, I, Jockey, LOA and R1. Notably, for the D. pulex genome, in which no non-LTR retrotransposon has been annotated, MGEScan-non-LTR found a significantly larger number of elements than did RepeatMasker, using the current version of the RepBase Update library. We also identified novel elements in the other two genomes, which have only been partially studied for non-LTR retrotransposons.
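
To make the state-based idea concrete, here is a minimal sketch of ordinary Viterbi decoding over a toy three-state hidden Markov model (two domains and a linker). It is not the MGEScan-non-LTR model: the abstract describes a GHMM-inspired approach scored with profile HMMs and Gaussian Bayes classifiers, whereas this sketch uses a plain HMM with single-symbol emissions. The state names, the observation symbols `a`, `l`, `b`, and every probability below are hypothetical values chosen only for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for an observation sequence."""
    # best[t][s]: best log-probability of any path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximises the path score into s
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical toy model: all numbers are illustrative, not from the paper.
STATES = ["domain_A", "linker", "domain_B"]
LOG_START = logs({"domain_A": 0.8, "linker": 0.1, "domain_B": 0.1})
LOG_TRANS = {
    "domain_A": logs({"domain_A": 0.70, "linker": 0.25, "domain_B": 0.05}),
    "linker": logs({"domain_A": 0.05, "linker": 0.70, "domain_B": 0.25}),
    "domain_B": logs({"domain_A": 0.05, "linker": 0.05, "domain_B": 0.90}),
}
LOG_EMIT = {
    "domain_A": logs({"a": 0.8, "l": 0.1, "b": 0.1}),
    "linker": logs({"a": 0.1, "l": 0.8, "b": 0.1}),
    "domain_B": logs({"a": 0.1, "l": 0.1, "b": 0.8}),
}

print(viterbi(list("aallbb"), STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

Working in log space avoids numerical underflow on long sequences. A generalized HMM would additionally let each state emit a variable-length segment scored by its own model, which this sketch omits.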


Subjects
Genomics/methods , Retroelements , Amino Acid Sequence , Animals , Ciona intestinalis/genetics , Daphnia/classification , Daphnia/genetics , Drosophila melanogaster/genetics , Markov Chains , Molecular Sequence Data , Phylogeny , Protein Structure, Tertiary , Sequence Homology, Amino Acid , Strongylocentrotus purpuratus/genetics