ABSTRACT
BACKGROUND: The emergence of high-throughput technologies that produce vast amounts of genomic data, such as next-generation sequencing (NGS), is transforming biological research. The dramatic increase in the volume of data, together with the variety and continuous change of data processing tools, algorithms, and databases, makes analysis the main bottleneck for scientific discovery. The processing of high-throughput datasets typically involves many different computational programs, each of which performs a specific step in a pipeline. Given the wide range of applications and organizational infrastructures, there is a great need for highly parallel, flexible, portable, and reproducible data processing frameworks. Several platforms currently exist for the design and execution of complex pipelines. Unfortunately, these platforms lack the combination of parallelism, portability, flexibility, and/or reproducibility that the current research environment requires. To address these shortcomings, workflow frameworks that provide a platform to develop and share portable pipelines have recently arisen. We complement these new platforms by providing a graphical user interface to create, maintain, and execute complex pipelines. Such a platform will simplify robust and reproducible workflow creation for non-technical users and give large organizations a dependable way to maintain pipelines. RESULTS: To simplify the development, maintenance, and execution of complex pipelines, we created DolphinNext. DolphinNext facilitates building and deploying complex pipelines using a modular approach implemented in a graphical interface that relies on the powerful Nextflow workflow framework. It provides: (1) a drag-and-drop user interface that visualizes pipelines and allows users to create pipelines without familiarity with the underlying programming languages; (2) modules to execute and monitor pipelines in distributed computing environments such as high-performance clusters and/or the cloud; (3) reproducible pipelines with version tracking and stand-alone versions that can be run independently; (4) modular process design with process revisioning support to increase reusability and pipeline development efficiency; (5) pipeline sharing with GitHub and automated testing; and (6) extensive reports with R Markdown and Shiny support for interactive data visualization and analysis. CONCLUSION: DolphinNext is a flexible, intuitive, web-based data processing and analysis platform that enables creating, deploying, sharing, and executing complex Nextflow pipelines, with extensive revisioning and interactive reporting to enhance the reproducibility of results.
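The modular design described in this abstract can be pictured as a dependency graph of processes. The following is a minimal Python sketch of that idea only; it is not DolphinNext's or Nextflow's actual implementation, and the process names (`trim`, `align`, `count`) and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each process maps to the set of processes it depends on.
pipeline = {
    "trim": set(),
    "align": {"trim"},
    "count": {"align"},
}


def execution_order(processes):
    """Return process names in an order that respects every dependency."""
    return list(TopologicalSorter(processes).static_order())


print(execution_order(pipeline))
```

Keeping each process as an independent node is what makes it possible, in principle, to revise or reuse one step without touching the others.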
Subjects
Chromatin Immunoprecipitation Sequencing; Genomics/methods; RNA-Seq; Software; Algorithms; Databases, Factual; Programming Languages; Reproducibility of Results; User-Computer Interface; Workflow
ABSTRACT
BACKGROUND: Targeted next-generation sequencing (NGS) assays are cost-efficient and reliable alternatives to Sanger sequencing. For sequencing very large sets of genes, the target enrichment approach is suitable. However, for smaller genomic regions, the target amplification method is more efficient than both the target enrichment method and Sanger sequencing. The major difficulty of the target amplification method is the preparation of amplicons, in terms of the time, equipment, and labor required. Multiplex PCR (MPCR) is a good solution to these problems. RESULTS: We propose a novel method to design MPCR primers for a continuous genomic region, following the best practices of clinically reliable PCR design processes. In an experimental setup with 48 different combinations of factors, we show that multiple parameters might affect finding the first feasible solution. Increasing the length of the initial primer candidate selection sequence gives better results, whereas waiting longer to find the first feasible solution does not have a significant impact. CONCLUSIONS: We generated MPCR primer designs for the whole HBB gene, the MEFV coding regions, and human exons between 2000 and 2100 bp long. Our benchmarking experiments show that the proposed MPCR approach is able to produce reliable NGS assay primers for a given sequence in a reasonable amount of time.
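To make the notion of selecting primer candidates from an initial sequence window concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method: the function names, the 20-nt primer length, and the GC and melting-temperature thresholds are all hypothetical, and the melting temperature uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short oligonucleotides.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def candidate_primers(window, length=20, gc_range=(0.4, 0.6), tm_range=(56, 64)):
    """Slide over the window and keep subsequences that pass both filters.

    The default thresholds are illustrative assumptions, not published values.
    """
    hits = []
    for start in range(len(window) - length + 1):
        primer = window[start:start + length].upper()
        gc_ok = gc_range[0] <= gc_fraction(primer) <= gc_range[1]
        tm_ok = tm_range[0] <= wallace_tm(primer) <= tm_range[1]
        if gc_ok and tm_ok:
            hits.append((start, primer))
    return hits
```

A longer window yields more start positions to test, which is one plausible reading of why a longer initial selection sequence could help find a feasible design.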
Subjects
Algorithms; DNA Primers/metabolism; DNA/metabolism; Multiplex Polymerase Chain Reaction/methods; DNA/chemistry; DNA Primers/chemistry; Exons; High-Throughput Nucleotide Sequencing; Humans; Pyrin/chemistry; Pyrin/genetics
ABSTRACT
BACKGROUND: Klippel-Feil syndrome (KFS) is characterized by the developmental failure of the cervical spine and has two dominantly inherited subtypes. Affected individuals who are the children of a consanguineous marriage are extremely rare in the medical literature, but the gene responsible for this recessive subtype of KFS has recently been reported. RESULTS: We identified a family with the KFS phenotype in which the parents of the affected individuals are consanguineous. Radiological examinations revealed that the affected individuals have fusion defects and numerical abnormalities in the cervical spine, scoliosis, malformations of the cranial base, and Sprengel's deformity. We applied whole-genome linkage and whole-exome sequencing analysis to identify the chromosomal locus and the gene mutated in this family. Whole-genome linkage analysis revealed significant linkage to chromosome 17q12-q33 with a LOD score of 4.2. Exome sequencing identified the G > A p.Q84X mutation in the MEOX1 gene, which segregates in accordance with pedigree status. Homozygous MEOX1 mutations have reportedly caused a similar phenotype in knockout mice. CONCLUSIONS: Here, we report a truncating mutation in the MEOX1 gene in a KFS family with an autosomal recessive trait. Together with another recently reported study and the knockout mouse model, our results suggest that mutations in MEOX1 cause a recessive KFS phenotype in humans.
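For readers unfamiliar with the statistic, the standard definition of the LOD score is given here as background. The symbols $L$ (likelihood) and $\theta$ (recombination fraction) are notation introduced for this note and do not appear in the abstract.

```latex
\mathrm{LOD}(\theta) = \log_{10}\frac{L(\theta)}{L(\theta = 0.5)},
\qquad
\mathrm{LOD} = 4.2 \;\Rightarrow\; \frac{L(\theta)}{L(0.5)} = 10^{4.2} \approx 1.6 \times 10^{4}
```

Under this definition, a score of 4.2 corresponds to odds of roughly 16,000 to 1 in favor of linkage, above the conventional significance threshold of 3.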
Subjects
Klippel-Feil Syndrome/genetics; Transcription Factors/genetics; Adult; Animals; Chromosomes, Human, Pair 17; Female; Genetic Linkage; Genome, Human; High-Throughput Nucleotide Sequencing; Homeodomain Proteins; Homozygote; Humans; Klippel-Feil Syndrome/diagnostic imaging; Lod Score; Male; Mice; Pedigree; Phenotype; Polymorphism, Single Nucleotide; Spine/abnormalities; Tomography, X-Ray Computed
ABSTRACT
Nuclease-directed genome editing is a powerful tool for investigating physiology and has great promise as a therapeutic approach to correct mutations that cause disease. In its most precise form, genome editing can use cellular homology-directed repair (HDR) pathways to insert information from an exogenously supplied DNA-repair template (donor) directly into a targeted genomic location. Unfortunately, particularly for long insertions, toxicity and delivery considerations associated with repair template DNA can limit HDR efficacy. Here, we explore chemical modifications to both double-stranded and single-stranded DNA-repair templates. We describe 5'-terminal modifications, including in its simplest form the incorporation of triethylene glycol (TEG) moieties, that consistently increase the frequency of precision editing in the germlines of three animal models (Caenorhabditis elegans, zebrafish, mice) and in cultured human cells.
Subjects
Caenorhabditis elegans/genetics; DNA Repair; DNA, Single-Stranded/genetics; DNA/genetics; Gene Editing/methods; Mice/genetics; Zebrafish/genetics; Animals; HEK293 Cells; Humans; K562 Cells
ABSTRACT
BACKGROUND: Accuracy in the diagnosis of breast cancer and classification of cancer subtypes has improved over the years with the development of well-established immunohistopathological criteria. More recently, diagnostic gene sets at the mRNA expression level have been tested as better predictors of disease state. However, breast cancer is heterogeneous in nature; thus, extracting differentially expressed gene sets that stably distinguish normal tissue from various pathologies poses challenges. Meta-analysis of high-throughput expression data using a collection of statistical methodologies leads to the identification of robust tumor gene expression signatures. METHODS: A resampling-based meta-analysis strategy was developed that combines resampling with distribution statistics to assess the degree of significance of differential expression between sample classes. Two independent microarray datasets that contain normal breast, invasive ductal carcinoma (IDC), and invasive lobular carcinoma (ILC) samples were used for the meta-analysis. Expression of genes selected from the gene list for classification of normal breast samples and breast tumors encompassing both the ILC and IDC subtypes was tested on 10 independent primary IDC samples and matched non-tumor controls by real-time qRT-PCR. Other existing breast cancer microarray datasets were used in support of the resampling-based meta-analysis. RESULTS: The two independent microarray studies were found to be comparable despite differing in their experimental methodologies (Pearson correlation coefficient, R = 0.9389 and R = 0.8465 for ductal and lobular samples, respectively). The resampling-based meta-analysis led to the identification of a highly stable set of genes for classification of normal breast samples and breast tumors encompassing both the ILC and IDC subtypes. The expression results of the selected genes obtained through real-time qRT-PCR supported the meta-analysis results. CONCLUSION: The proposed meta-analysis approach can detect a set of differentially expressed genes with the least amount of within-group variability, thus providing highly stable gene lists for class prediction. The increased statistical power and stringent filtering criteria used in the present study also make identification of novel candidate genes possible and may provide further insight into breast cancer development.
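The abstract does not specify the resampling scheme, so the following Python sketch substitutes a plain exact permutation test on a single gene to show how resampling can assess differential expression between two sample classes. The function name `permutation_p_value` and the expression values are hypothetical and are not taken from the study.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means.

    Every way of relabelling the pooled samples is enumerated, so this is
    only practical for small groups.
    """
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance so the observed labelling always counts as extreme.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical expression values for one gene in two sample classes.
normal = [1.0, 1.2, 0.9, 1.1, 1.0]
tumor = [3.0, 3.2, 2.9, 3.1, 3.0]
print(permutation_p_value(normal, tumor))
```

A gene whose p-value stays small across repeated resamples and across independent datasets would be a candidate for a stable gene list in the sense the abstract describes.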