Results 1 - 7 of 7
1.
J Biomed Inform ; 54: 58-64, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25625550

ABSTRACT

The recent exponential growth of genomic databases has made the common task of sequence alignment one of the major bottlenecks in computational biology. These large datasets and complex computations typically require cost-prohibitive High Performance Computing (HPC) resources. Parallelised solutions have been proposed, but many exhibit scalability limitations and cannot effectively process "Big Data", the name given to datasets that are extremely large and complex and that require rapid processing. The Hadoop framework, comprising distributed storage and a parallel programming model known as MapReduce, is designed specifically for such datasets, but efficiently redesigning and implementing bioinformatics algorithms according to this paradigm is not trivial. The "divide and conquer" parallelisation strategy for alignment algorithms can be applied to both datasets and input query sequences. However, scalability remains an issue owing to memory constraints and large database sizes, and segmenting very large databases causes further performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that offers a flexible method for partitioning both databases and input query sequences using "virtual partitioning". HBlast shows improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.
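The abstract describes partitioning both the database and the query set rather than physically splitting the database into many segments. The sketch below is a minimal illustration under stated assumptions, not HBlast's actual implementation: it assumes a "virtual" partition is simply an offset range over an index of sequences, so no database file is rewritten, and that each (query chunk, database chunk) pair becomes one independent unit of work, such as one map task. The helper names `virtual_partitions` and `make_tasks` are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# A half-open range [start, end) over a sequence index (hypothetical representation).
Range = Tuple[int, int]


def virtual_partitions(n_items: int, n_parts: int) -> List[Range]:
    """Split indices 0..n_items-1 into contiguous, near-equal ranges.

    No data is copied or rewritten: each partition is only a pair of
    offsets, which is what makes it "virtual" in this sketch.
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    # Never create more partitions than items (but keep at least one).
    n_parts = min(n_parts, n_items) or 1
    base, extra = divmod(n_items, n_parts)
    ranges: List[Range] = []
    start = 0
    for i in range(n_parts):
        # The first `extra` partitions take one extra item to balance load.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def make_tasks(n_queries: int, n_db_seqs: int,
               query_parts: int, db_parts: int) -> List[Tuple[Range, Range]]:
    """Pair every query chunk with every database chunk.

    Each (query_range, db_range) pair is an independent task; hits for a
    given query must be merged across database chunks afterwards.
    """
    queries = virtual_partitions(n_queries, query_parts)
    db = virtual_partitions(n_db_seqs, db_parts)
    return list(product(queries, db))
```

For example, `make_tasks(10, 7, 2, 3)` yields six tasks, pairing the query ranges (0, 5) and (5, 10) with the database ranges (0, 3), (3, 5) and (5, 7). Because BLAST statistics depend on database size, a real pipeline built this way would likely also need to score each chunk against the size of the full database so that e-values remain comparable after merging.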


Subjects
Databases, Genetic , Genomics/methods , Sequence Alignment/methods , Software , Algorithms , Computing Methodologies , Databases, Factual , Genetic Techniques , Humans , Internet
2.
J Biomed Inform ; 46(5): 774-81, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23872175

ABSTRACT

Since the completion of the Human Genome Project at the turn of the century, there has been an unprecedented proliferation of genomic sequence data. As a consequence, the medical discoveries of the future will largely depend on our ability to process and analyse large genomic data sets, which continue to expand as the cost of sequencing decreases. Herein, we provide an overview of cloud computing and big data technologies, and discuss how such expertise can be used to deal with biology's big data sets. In particular, we discuss big data technologies such as the Apache Hadoop project, which provides distributed and parallelised processing and analysis of petabyte (PB)-scale data sets, together with an overview of the current usage of Hadoop within the bioinformatics community.


Subjects
Human Genome Project , Internet , Humans , Software
3.
Cancer Res ; 65(7): 2662-7, 2005 Apr 01.
Article in English | MEDLINE | ID: mdl-15805263

ABSTRACT

We have identified a t(8;9)(p21-23;p23-24) in seven male patients (mean age 50, range 32-74) with diverse hematologic malignancies and clinical outcomes: atypical chronic myeloid leukemia/chronic eosinophilic leukemia (n = 5), secondary acute myeloid leukemia (n = 1), and pre-B-cell acute lymphoblastic leukemia (n = 1). Initial fluorescence in situ hybridization studies of one patient indicated that the nonreceptor tyrosine kinase Janus-activated kinase 2 (JAK2) at 9p24 was disrupted. Rapid amplification of cDNA ends-PCR identified the 8p22 partner gene as human autoantigen pericentriolar material (PCM1), a gene encoding a large centrosomal protein with multiple coiled-coil domains. Reverse transcription-PCR and fluorescence in situ hybridization confirmed the fusion in this case and also identified PCM1-JAK2 in the six other t(8;9) patients. The breakpoints were variable in both genes, but in all cases the chimeric mRNA is predicted to encode a protein that retains several of the predicted coiled-coil domains from PCM1 and the entire tyrosine kinase domain of JAK2. Reciprocal JAK2-PCM1 mRNA was not detected in any patient. We conclude that human autoantigen pericentriolar material (PCM1)-JAK2 is a novel, recurrent fusion gene in hematologic malignancies. Patients with PCM1-JAK2 disease are attractive candidates for targeted signal transduction therapy.


Subjects
Cell Cycle Proteins/genetics , Chromosomes, Human, Pair 8/genetics , Chromosomes, Human, Pair 9/genetics , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics , Leukemia, Myeloid/genetics , Oncogene Proteins, Fusion/genetics , Protein-Tyrosine Kinases/genetics , Proto-Oncogene Proteins/genetics , Translocation, Genetic , Acute Disease , Adult , Aged , Amino Acid Sequence , Autoantigens , Base Sequence , Humans , Janus Kinase 2 , Male , Middle Aged , Molecular Sequence Data , Reverse Transcriptase Polymerase Chain Reaction
4.
PLoS One ; 11(2): e0148028, 2016.
Article in English | MEDLINE | ID: mdl-26849217

ABSTRACT

Rapid advancements in sequencing technologies, along with falling costs, present widespread opportunities for microbiome studies across a vast and diverse array of environments. These impressive technological developments have been accompanied by considerable growth in the number of methodological variables, including sampling, storage, DNA extraction, primer pairs, sequencing technology, chemistry version, read length, insert size, and analysis pipelines, amongst others. This increase in variability threatens to compromise both the reproducibility and the comparability of studies. Here we perform the first reported study comparing both amplicon and shotgun sequencing for the three leading next-generation sequencing technologies. These were applied to six human stool samples using Illumina HiSeq, MiSeq and Ion PGM shotgun sequencing, as well as amplicon sequencing across two variable 16S rRNA gene regions. Notably, we found that the factor responsible for the greatest variance in microbiota composition was the chosen methodology rather than natural inter-individual variance, which is commonly one of the most significant drivers in microbiome studies. Amplicon sequencing suffered from this to a large extent, and the issue was particularly apparent when the 16S rRNA V1-V2 region amplicons were sequenced with MiSeq. Somewhat surprisingly, the choice of taxonomic binning software for shotgun sequences proved to be of crucial importance, with even greater discriminatory power than sequencing technology and choice of amplicon. Optimal N50 assembly values for the HiSeq were obtained at 10 million reads per sample, whereas the applied MiSeq and PGM sequencing depths proved less sufficient for shotgun sequencing of stool samples. The latter technologies, on the other hand, provide a better basis for functional gene categorisation, possibly due to their longer read lengths.
Hence, in addition to highlighting methodological biases, this study demonstrates the risks associated with comparing data generated using different strategies. We also recommend that laboratories with particular interests in certain microbes should optimise their protocols to accurately detect these taxa using different techniques.
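The abstract reports N50 assembly values without defining the metric. Under the standard definition (this is not the study's own pipeline), N50 is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch, where `n50` is a hypothetical helper name:

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths.

    Contigs are taken longest first; N50 is the length of the contig at
    which the running total first reaches half of the total assembly size.
    """
    # Ignore empty or invalid lengths and sort longest first.
    lengths = sorted((x for x in contig_lengths if x > 0), reverse=True)
    if not lengths:
        return 0
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached for positive lengths; kept as a safeguard
```

For contig lengths 2, 3, 4, 5, 6, 8 and 10 (total 38), the running sum of the longest contigs first reaches half the total (19) at 10 + 8 + 6 = 24, so `n50` returns 6. Higher N50 values indicate that more of the assembly is contained in long contigs.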


Subjects
High-Throughput Nucleotide Sequencing , Metagenomics , Microbiota/genetics , Aged , DNA Primers/genetics , Feces/microbiology , Humans , RNA, Ribosomal, 16S/genetics , Sequence Analysis, RNA
5.
Artif DNA PNA XNA ; 4(2): 37-8, 2013.
Article in English | MEDLINE | ID: mdl-23912716

ABSTRACT

The application of ex vivo synthetic DNA as a high-capacity information storage medium is well documented. Herein, we consider the potential for synthetic DNA to be incorporated into the human genome, providing a definitive, accessible, in vivo database of patient history.


Subjects
DNA/genetics , Information Storage and Retrieval/methods , Chromosomes, Artificial/genetics , Genetics, Medical , Humans
6.
Bioengineered ; 4(3): 123-5, 2013.
Article in English | MEDLINE | ID: mdl-23514938

ABSTRACT

With worldwide data predicted to exceed 40 trillion gigabytes by 2020, big data storage is a very real and escalating problem. Herein, we discuss the utility of synthetic DNA as a robust and eco-friendly archival data storage solution for the future.


Subjects
Computational Biology/instrumentation , DNA/chemical synthesis , Databases, Nucleic Acid/instrumentation , Information Storage and Retrieval/methods , Computational Biology/trends , DNA/genetics , Information Storage and Retrieval/trends , Software