Results 1 - 20 of 60
1.
Brief Bioinform ; 22(1): 308-314, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32008042

ABSTRACT

The use of machine learning (ML) has become prevalent in the genome engineering space, with applications ranging from predicting target site efficiency to forecasting the outcome of repair events. However, jargon and ML-specific accuracy measures have made it hard to assess the validity of individual approaches, potentially leading to misinterpretation of ML results. This review aims to close the gap by discussing ML approaches and pitfalls in the context of CRISPR gene-editing applications. Specifically, we address common considerations, such as algorithm choice, as well as problems, such as overestimating accuracy and data interoperability, by providing tangible examples from the genome-engineering domain. Equipping researchers with the knowledge to effectively use ML to better design gene-editing experiments and predict experimental outcomes will help advance the field more rapidly.


Subject(s)
CRISPR-Cas Systems , Gene Editing/methods , Machine Learning , Animals , Gene Editing/standards , Genomics/methods , Genomics/standards , Humans
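The review warns that ML-specific accuracy measures can lead to overestimated performance. A toy calculation makes this concrete. The sketch below is not taken from the paper. It uses made-up labels in which only 5 of 100 guides are active, and shows that a "model" which never predicts an active guide still scores 95% accuracy while its precision and recall are both zero. The function names `confusion_counts` and `summarize` are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN and TN for binary labels where 1 means 'active'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Return accuracy, precision and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy labels: 95 inactive guides (0) and 5 active guides (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that predicts every guide as inactive.
always_inactive = [0] * 100

print(summarize(y_true, always_inactive))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0}
```

On imbalanced data like this, precision and recall (or related metrics) reveal what accuracy alone hides.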
2.
Prenat Diagn ; 43(1): 109-116, 2023 01.
Article in English | MEDLINE | ID: mdl-36484552

ABSTRACT

OBJECTIVE: European and Australian guidelines for cystic fibrosis (CF) reproductive carrier screening recommend testing a small number of high-frequency CF-causing variants, rather than comprehensive CFTR sequencing. The study objective was to determine variant detection rates of commercially available targeted reproductive carrier screening tests in Australia. METHODS: Next-generation DNA sequencing of the CFTR gene was performed on 2552 individuals from a whole population sample to identify CF-causing variants. The variant detection rates in this population of two commercially available Australian reproductive carrier screening tests, which target 50 or 175 CF-causing variants, were calculated. The ethnicity of individuals was determined using principal component analysis. RESULTS: Variant detection rates of the tests for 50 and 175 CF-causing variants were 88.2% and 90.8%, respectively. No CF-causing variants in individuals of East Asian ethnicity (n = 3) were detected by either test, while >86.6% (n = 69) of CF-causing variants in Europeans would be identified by either test. CONCLUSIONS: Reproductive carrier screening tests for a targeted set of high-frequency CF variants are unable to detect approximately 10% of CF variants in a multiethnic Australian population, and individuals of East Asian ethnicity are disproportionately affected by this test limitation.


Subject(s)
Cystic Fibrosis , Humans , Cystic Fibrosis/diagnosis , Cystic Fibrosis/epidemiology , Cystic Fibrosis/genetics , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Australia/epidemiology , Genetic Testing , Ethnicity , Mutation
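A minimal sketch of how a detection rate of this kind can be computed follows. It assumes the rate is the fraction of CF-causing alleles found by sequencing whose variant is on a test's target list; the abstract does not spell out the exact counting unit. The three-variant panel is a hypothetical subset, not either commercial panel, and `variant_X` is a placeholder for an off-panel variant. The second function tallies missed alleles by ancestry group, mirroring the East Asian finding.

```python
from collections import Counter


def detection_rate(observed, panel):
    """Fraction of observed CF-causing alleles whose variant is on the panel.

    observed: list of (variant, ancestry_group) tuples, one per allele found by sequencing.
    panel: set of variant names targeted by the screening test.
    """
    if not observed:
        raise ValueError("no observed variants")
    detected = sum(1 for variant, _ in observed if variant in panel)
    return detected / len(observed)


def missed_by_group(observed, panel):
    """Count alleles the panel would miss, broken down by ancestry group."""
    return Counter(group for variant, group in observed if variant not in panel)


# Hypothetical panel and observations, for illustration only.
panel = {"F508del", "G542X", "N1303K"}
observed = (
    [("F508del", "European")] * 7
    + [("G542X", "European"), ("N1303K", "European")]
    + [("variant_X", "East Asian")]
)

print(detection_rate(observed, panel))   # 0.9
print(missed_by_group(observed, panel))  # Counter({'East Asian': 1})
```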
3.
Nucleic Acids Res ; 49(18): 10785-10795, 2021 10 11.
Article in English | MEDLINE | ID: mdl-34534334

ABSTRACT

Precise genomic modification using prime editing (PE) holds enormous potential for research and clinical applications. In this study, we generated all-in-one prime editing (PEA1) constructs that carry all the components required for PE, along with a selection marker. We tested these constructs (with selection) in HEK293T, K562, HeLa and mouse embryonic stem (ES) cells. We discovered that PE efficiency in HEK293T cells was much higher than previously observed, reaching up to 95% (mean 67%). The efficiency in K562 and HeLa cells, however, remained low. To improve PE efficiency in K562 and HeLa cells, we generated a nuclease prime editor and tested this system in these cell lines as well as in mouse ES cells. PE-nuclease greatly increased prime editing initiation; however, installation of the intended edits was often accompanied by extra insertions derived from the repair template. Finally, we show that zygotic injection of the nuclease prime editor can generate correct modifications in mouse fetuses with up to 100% efficiency.


Subject(s)
CRISPR-Associated Protein 9 , Gene Editing , Animals , CRISPR-Associated Protein 9/genetics , Cells, Cultured , Embryonic Stem Cells/metabolism , HEK293 Cells , HeLa Cells , Humans , K562 Cells , Mice , Plasmids/genetics , Zygote
4.
Brief Bioinform ; 21(6): 1920-1936, 2020 12 01.
Article in English | MEDLINE | ID: mdl-31774481

ABSTRACT

Oncogenesis and cancer can arise as a consequence of a wide range of genomic aberrations, including mutations, copy number alterations, expression changes and epigenetic modifications encompassing multiple omics layers. Integrating genomic, transcriptomic, proteomic and epigenomic datasets via multi-omics analysis provides the opportunity to derive a deeper and more holistic understanding of the development and progression of cancer. There are two primary approaches to integrating multi-omics data: multi-staged (focused on identifying genes driving cancer) and meta-dimensional (focused on establishing clinically relevant tumour or sample classifications). A number of ready-to-use bioinformatics tools are available to perform both multi-staged and meta-dimensional integration of multi-omics data. In this study, we compared nine different integration tools using real and simulated cancer datasets. The performance of the multi-staged integration tools was assessed at the gene, function and pathway levels, while the meta-dimensional integration tools were assessed based on sample classification performance. Additionally, we discuss the influence of factors such as data representation, sample size, signal and noise on multi-omics data integration. Our results provide current and much-needed guidance regarding the selection and use of the most appropriate and best-performing multi-omics integration tools.


Subject(s)
Computational Biology , Genomics , Neoplasms , Proteomics , Computational Biology/methods , DNA Copy Number Variations , Epigenomics , Gene Expression Profiling , Genomics/methods , Humans , Neoplasms/genetics , Oncogenes , Transcriptome
5.
J Med Genet ; 2020 May 14.
Article in English | MEDLINE | ID: mdl-32409511

ABSTRACT

BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with phenotypic and genetic heterogeneity. Approximately 10% of cases are familial, while remaining cases are classified as sporadic. To date, >30 genes and several hundred genetic variants have been implicated in ALS. METHODS: Seven hundred and fifty-seven sporadic ALS cases were recruited from Australian neurology clinics. Detailed clinical data and whole genome sequencing (WGS) data were available from 567 and 616 cases, respectively, of which 426 cases had both datasets available. As part of a comprehensive genetic analysis, 853 genetic variants previously reported as ALS-linked mutations or disease-associated alleles were interrogated in sporadic ALS WGS data. Statistical analyses were performed to identify correlation between clinical variables, and between phenotype and the number of ALS-implicated variants carried by an individual. Relatedness between individuals carrying identical variants was assessed using identity-by-descent analysis. RESULTS: Forty-three ALS-implicated variants from 18 genes, including C9orf72, ATXN2, TARDBP, SOD1, SQSTM1 and SETX, were identified in Australian sporadic ALS cases. One-third of cases carried at least one variant and 6.82% carried two or more variants, implicating a potential oligogenic or polygenic basis of ALS. Relatedness was detected between two sporadic ALS cases carrying a SOD1 p.I114T mutation, and among three cases carrying a SQSTM1 p.K238E mutation. Oligogenic/polygenic sporadic ALS cases showed earlier age of onset than those with no reported variant. CONCLUSION: We confirm phenotypic associations among ALS cases, and highlight the contribution of genetic variation to all forms of ALS.

6.
Brief Bioinform ; 19(2): 179-187, 2018 03 01.
Article in English | MEDLINE | ID: mdl-27802932

ABSTRACT

Motivation: Despite being essential for numerous clinical and research applications, high-resolution human leukocyte antigen (HLA) typing remains challenging, and laboratory tests are time-consuming and labour-intensive. With next-generation sequencing data becoming widely accessible, on-demand in silico HLA typing offers an economical and efficient alternative. Results: In this study we evaluate the HLA typing accuracy and efficiency of five computational HLA typing methods by comparing their predictions against a curated set of >1000 published polymerase chain reaction-derived HLA genotypes on three different data sets (whole genome sequencing, whole exome sequencing and transcriptomic sequencing data). At clinically relevant (four-digit) resolution, the highest accuracy we observe is 81% on RNA-seq data, by PHLAT, and 99% by OptiType when limited to Class I genes only. We also observed variability between the tools in resource consumption, with average runtime ranging from 5 h (HLAminer) to 7 min (seq2HLA) and memory from 12.8 GB (HLA-VBSeq) to 0.46 GB (HLAminer) per sample. While a minimal coverage is required, other factors also determine prediction accuracy, and the results between tools do not correlate well. Therefore, by combining tools, there is the potential to develop a highly accurate ensemble method that is able to deliver fast, economical HLA typing from existing sequencing data.


Subject(s)
Algorithms , HLA Antigens/genetics , Histocompatibility Testing/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , Exome , Genotype , Humans
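Two-field ("four-digit") resolution keeps only the first two colon-separated fields of an allele name, so A*02:01:01:01 and A*02:01 count as the same call. The sketch below works under that assumption: it truncates calls, scores them against reference genotypes, and combines tools by simple majority vote. Majority voting is one possible ensemble scheme, chosen here for illustration. It is not a method described in the paper, and the calls and tools are invented.

```python
from collections import Counter


def two_field(allele):
    """Truncate an HLA allele name to two-field ("four-digit") resolution.

    Example: "A*02:01:01:01" -> "A*02:01".
    """
    gene, _, fields = allele.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:2])


def accuracy(predicted, truth):
    """Fraction of reference alleles matched at two-field resolution (same order in both lists)."""
    hits = sum(two_field(p) == two_field(t) for p, t in zip(predicted, truth))
    return hits / len(truth)


def majority_vote(calls):
    """Combine several tools' calls for one allele; ties go to the call seen first."""
    counts = Counter(two_field(c) for c in calls)
    return counts.most_common(1)[0][0]


# Invented reference genotypes and calls from three hypothetical tools.
truth = ["A*02:01", "A*24:02"]
tool_calls = [
    ["A*02:01:01:01", "A*02:01", "A*02:06"],  # allele 1: tools 1, 2, 3
    ["A*24:02", "A*24:03", "A*24:02:01"],     # allele 2: tools 1, 2, 3
]
tool3 = [calls[2] for calls in tool_calls]
ensemble = [majority_vote(calls) for calls in tool_calls]

print(accuracy(tool3, truth))     # 0.5
print(accuracy(ensemble, truth))  # 1.0
```

Voting only helps when tools make different mistakes, which matches the observation above that results between tools do not correlate well.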
7.
Genome Res ; 26(6): 719-31, 2016 06.
Article in English | MEDLINE | ID: mdl-27053337

ABSTRACT

A three-dimensional chromatin state underpins the structural and functional basis of the genome by bringing regulatory elements and genes into close spatial proximity to ensure proper, cell-type-specific gene expression profiles. Here, we performed Hi-C chromosome conformation capture sequencing to investigate how three-dimensional chromatin organization is disrupted in the context of copy-number variation, long-range epigenetic remodeling, and atypical gene expression programs in prostate cancer. We find that cancer cells retain the ability to segment their genomes into megabase-sized topologically associated domains (TADs); however, these domains are generally smaller due to establishment of additional domain boundaries. Interestingly, a large proportion of the new cancer-specific domain boundaries occur at regions that display copy-number variation. Notably, a common deletion on 17p13.1 in prostate cancer spanning the TP53 tumor suppressor locus results in bifurcation of a single TAD into two distinct smaller TADs. Change in domain structure is also accompanied by novel cancer-specific chromatin interactions within the TADs that are enriched at regulatory elements such as enhancers, promoters, and insulators, and associated with alterations in gene expression. We also show that differential chromatin interactions across regulatory regions occur within long-range epigenetically activated or silenced regions of concordant gene activation or repression in prostate cancer. Finally, we present a novel visualization tool that enables integrated exploration of Hi-C interaction data, the transcriptome, and epigenome. This study provides new insights into the relationship between long-range epigenetic and genomic dysregulation and changes in higher-order chromatin interactions in cancer.


Subject(s)
Chromatin/genetics , Epigenesis, Genetic , Neoplasms/genetics , CCCTC-Binding Factor , Cell Line, Tumor , Enhancer Elements, Genetic , Gene Expression Regulation, Neoplastic , Genome, Human , Histones/metabolism , Humans , Molecular Sequence Annotation , Neoplasms/metabolism , Protein Binding , Protein Processing, Post-Translational , Repressor Proteins/physiology
8.
BMC Biotechnol ; 19(1): 40, 2019 06 27.
Article in English | MEDLINE | ID: mdl-31248401

ABSTRACT

BACKGROUND: Natural variations in a genome can drastically alter the CRISPR-Cas9 off-target landscape by creating or removing sites. Despite the resulting potential side effects from such unaccounted-for sites, current off-target detection pipelines are not equipped to include variant information. To address this, we developed VARiant-aware detection and SCoring of Off-Targets (VARSCOT). RESULTS: VARSCOT identifies only 0.6% of off-targets as common to 4 individual genomes and the reference, with an average of 82% of off-targets unique to an individual. VARSCOT is the most sensitive detection method for off-targets, finding 40 to 70% more experimentally verified off-targets compared to other popular software tools, and its machine learning model allows for CRISPR-Cas9 concentration-aware off-target activity scoring. CONCLUSIONS: VARSCOT allows researchers to take genomic variation into account when designing individual or population-wide targeting strategies. VARSCOT is available from https://github.com/BauerLab/VARSCOT .


Subject(s)
CRISPR-Cas Systems , Computational Biology/methods , Gene Editing/methods , Gene Targeting/methods , Genomics/methods , Software , Gene Editing/standards , Gene Targeting/standards , Genomics/standards , Internet , Reproducibility of Results
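The sketch below illustrates how a single variant can create an off-target site. It scans the forward strand for 20-nt protospacers followed by an NGG PAM (assuming SpCas9), first in a reference sequence and then after applying a hypothetical SNV. It is not VARSCOT's algorithm, which also handles mismatched sites, both strands and activity scoring. `pam_sites` and `apply_snv` are illustrative names, and the sequence is made up.

```python
import re


def pam_sites(seq, protospacer_len=20):
    """Start positions of forward-strand protospacers immediately followed by an NGG PAM.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pattern = re.compile(r"(?=[ACGT]{%d}[ACGT]GG)" % protospacer_len)
    return [m.start() for m in pattern.finditer(seq.upper())]


def apply_snv(seq, pos, ref, alt):
    """Return seq with a single-nucleotide variant ref>alt applied at 0-based position pos."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


# Made-up reference: a 20-nt protospacer followed by "TGA" (no NGG PAM).
ref_seq = "ACGTACGTACGTACGTACGT" + "TGA" + "TTTT"
# A hypothetical A>G variant at position 22 turns "TGA" into the PAM "TGG".
alt_seq = apply_snv(ref_seq, 22, "A", "G")

print(pam_sites(ref_seq))  # []
print(pam_sites(alt_seq))  # [0]
```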
9.
Blood ; 128(9): 1290-301, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27465915

ABSTRACT

The factors that determine red blood cell (RBC) lifespan and the rate of RBC aging have not been fully elucidated. In several genetic conditions, including sickle cell disease, thalassemia, and G6PD deficiency, erythrocyte lifespan is significantly shortened. Many of these diseases are also associated with protection from severe malaria, suggesting a role for accelerated RBC senescence and clearance in malaria resistance. Here, we report a novel, N-ethyl-N-nitrosourea-induced mutation that causes a gain of function in adenosine 5'-monophosphate deaminase (AMPD3). Mice carrying the mutation exhibit rapid RBC turnover, with increased erythropoiesis, dramatically shortened RBC lifespan, and signs of increased RBC senescence/eryptosis, suggesting a key role for AMPD3 in determining RBC half-life. Mice were also found to be resistant to infection with the rodent malaria Plasmodium chabaudi. We propose that resistance to P. chabaudi is mediated by increased RBC turnover and higher rates of erythropoiesis during infection.


Subject(s)
AMP Deaminase , Erythrocytes/immunology , Immunity, Innate , Malaria , Mutation , Plasmodium chabaudi/immunology , AMP Deaminase/genetics , AMP Deaminase/immunology , Animals , Cellular Senescence/genetics , Cellular Senescence/immunology , Erythrocytes/parasitology , Erythropoiesis/genetics , Erythropoiesis/immunology , Ethylnitrosourea/toxicity , Half-Life , Malaria/genetics , Malaria/immunology , Male , Mice
10.
BMC Genomics ; 16: 866, 2015 Oct 26.
Article in English | MEDLINE | ID: mdl-26503232

ABSTRACT

BACKGROUND: The N-ethyl-N-nitrosourea (ENU) mutagen has become the method of choice for inducing random mutations for forward genetics applications. However, distinguishing induced mutations from sequencing errors or sporadic mutations is difficult, which has hampered past surveys of potential biases in the methodology. Addressing this issue, we created a large cohort of mice with biological replicates enabling the confident calling of induced mutations, which in turn allowed us to conduct a comprehensive analysis of potential biases in mutation properties and genomic location. RESULTS: In the exome sequencing data we observe the known preference of ENU to cause A:T=>G:C transitions in longer genes. Mutations were frequently clustered and inherited in blocks, hampering attempts to pinpoint individual causative mutations by genome analysis alone. Furthermore, ENU mutations were biased towards areas in the genome that are accessible in testis, potentially limiting the scope of forward genetic approaches to only 1-10% of the genome. CONCLUSION: ENU provides a powerful tool for exploring the genome-phenome relationship; however, forward genetic applications that require the mutation to be passed on through the germ line may be limited to exploring only genes that are accessible in testis.


Subject(s)
Ethylnitrosourea/toxicity , Mutagens/toxicity , Mutation/genetics , Animals , Exome/drug effects , Exome/genetics , Genome-Wide Association Study , Male , Mice , Mutagenesis/drug effects , Mutagenesis/genetics , Testis/drug effects , Testis/metabolism
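The A:T=>G:C notation names a base-pair change independent of strand: A>G read on one strand is T>C on the other. The minimal sketch below collapses single-base substitutions into the six base-pair classes and tallies a mutation spectrum. It assumes single-base substitutions and a purine-first convention, written with > instead of =>. The calls are invented, and the sketch does not reproduce the paper's gene-length or chromatin-accessibility analyses.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a single-base substitution to its strand-independent base-pair class.

    Classes are written relative to the purine of the reference base pair,
    so A>G and T>C both map to 'A:T>G:C'.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("C", "T"):
        # Read the change from the other strand so the reference base is a purine.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


# Invented variant calls as (reference base, alternate base) pairs.
calls = [("A", "G"), ("T", "C"), ("C", "T"), ("A", "T"), ("T", "C")]
spectrum = Counter(substitution_class(ref, alt) for ref, alt in calls)
print(spectrum.most_common(1))  # [('A:T>G:C', 3)]
```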
11.
BMC Genomics ; 16: 1052, 2015 Dec 10.
Article in English | MEDLINE | ID: mdl-26651996

ABSTRACT

BACKGROUND: Genomic information is increasingly used in medical practice, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. The widely used Hadoop MapReduce architecture and associated machine learning library, Mahout, provide the means for tackling computationally challenging tasks. However, many genomic analyses do not fit the MapReduce paradigm. We therefore utilise the recently developed SPARK engine, along with its associated machine learning library, MLlib, which offers more flexibility in the parallelisation of population-scale bioinformatics tasks. The resulting tool, VARIANTSPARK, provides an interface from MLlib to the standard variant format (VCF), offers seamless genome-wide sampling of variants and provides a pipeline for visualising results. RESULTS: To demonstrate the capabilities of VARIANTSPARK, we clustered more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VARIANTSPARK is 80% faster than the SPARK-based genome clustering approach, ADAM, the comparable implementation using Hadoop/Mahout, as well as ADMIXTURE, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. CONCLUSION: The benefits of speed, resource consumption and scalability enable VARIANTSPARK to open up the usage of advanced, efficient machine learning algorithms to genomic data.


Subject(s)
Computational Biology/methods , Genotype , Algorithms , Cluster Analysis , Humans , Polymorphism, Single Nucleotide , Software
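Before VCF genotypes can be fed to a machine learning library, each call has to become a number. One common encoding is the count of non-reference alleles (0, 1 or 2 for diploid calls). It is assumed here, because the abstract does not say which encoding VARIANTSPARK uses. The sketch parses the GT sub-field, which the VCF specification places first in the FORMAT column when present, and turns missing calls into None. The function names and the example record are hypothetical.

```python
import re


def gt_to_dosage(gt):
    """Convert a VCF GT value to the number of non-reference alleles.

    Handles unphased ('/') and phased ('|') separators; any non-zero allele
    index counts as alternate. Returns None if any allele is missing ('.').
    """
    alleles = re.split(r"[/|]", gt)
    if any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")


def sample_dosages(vcf_line):
    """Return the dosage for every sample column of one tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("line has no sample columns")
    if fields[8].split(":")[0] != "GT":
        raise ValueError("GT must be the first FORMAT key")
    return [gt_to_dosage(sample.split(":")[0]) for sample in fields[9:]]


# Hypothetical VCF record with four samples (hom-ref, phased het, hom-alt, missing).
line = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:10\t0|1:12\t1/1:8\t./.:0"
print(sample_dosages(line))  # [0, 1, 2, None]
```

Rows of such dosages, one per individual, are the kind of numeric matrix a clustering algorithm can consume. How missing values are imputed before clustering is a further design choice not covered here.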
12.
Genome Res ; 22(7): 1372-81, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22550012

ABSTRACT

Double-stranded DNA is able to form triple-helical structures by accommodating a third nucleotide strand in its major groove. This sequence-specific process offers a potent mechanism for targeting genomic loci of interest that is of great value for biotechnological and gene-therapeutic applications. It is likely that nature has leveraged this addressing system for gene regulation, because computational studies have uncovered an abundance of putative triplex target sites in various genomes, with enrichment particularly in gene promoters. However, to draw a more complete picture of the in vivo role of triplexes, not only the putative targets but also the sequences acting as the third strand and their capability to pair with the predicted target sites need to be studied. Here we present Triplexator, the first computational framework that integrates all aspects of triplex formation, and showcase its potential by discussing research examples for which the different aspects of triplex formation are important. We find that chromatin-associated RNAs have a significantly higher fraction of sequence features able to form triplexes than expected at random, suggesting their involvement in gene regulation. We furthermore identify hundreds of human genes that contain sequence features in their promoter predicted to be able to form a triplex with a target within the same promoter, suggesting the involvement of triplexes in feedback-based gene regulation. With focus on biotechnological applications, we screen mammalian genomes for high-affinity triplex target sites that can be used to target genomic loci specifically and find that triplex formation offers a resolution of ~1300 nt.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Genomics/methods , Oligonucleotides/chemistry , RNA-Binding Proteins/chemistry , Animals , Chromatin/chemistry , Chromatin/genetics , Circular Dichroism , Computational Biology/methods , DNA/chemistry , DNA/genetics , Genetic Loci , Genome, Human , Humans , Hydrogen Bonding , Nucleic Acid Conformation , Oligonucleotides/genetics , Promoter Regions, Genetic , RNA Stability , RNA-Binding Proteins/genetics , Time Factors
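Canonical triplex target sites are polypurine/polypyrimidine tracts, with the third strand binding the purine-rich strand in the major groove. The simplified sketch below scans one strand for purine runs and pyrimidine runs; a pyrimidine run here is a purine run on the complementary strand. It assumes uninterrupted tracts and an arbitrary minimum length. Triplexator itself tolerates interruptions and also models the third strand, so this only illustrates the target-site idea. The sequence is invented.

```python
import re


def triplex_target_candidates(seq, min_len=15):
    """Find uninterrupted purine or pyrimidine runs of at least min_len bases.

    Returns sorted (start, end, note) tuples with 0-based, end-exclusive
    coordinates. A pyrimidine run on the given strand is a purine run on the
    complementary strand, which is where a third strand would bind.
    """
    seq = seq.upper()
    hits = []
    for pattern, note in ((r"[AG]{%d,}" % min_len, "purine on + strand"),
                          (r"[CT]{%d,}" % min_len, "purine on - strand")):
        for m in re.finditer(pattern, seq):
            hits.append((m.start(), m.end(), note))
    return sorted(hits)


# Invented sequence with one 10-nt purine run and one 10-nt pyrimidine run.
seq = "ACAC" + "GAGGAAGAGG" + "TCTA" + "CTTCCTCTTC" + "G"
print(triplex_target_candidates(seq, min_len=10))
# [(4, 14, 'purine on + strand'), (18, 28, 'purine on - strand')]
```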
13.
Bioinformatics ; 30(19): 2723-32, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24919879

ABSTRACT

MOTIVATION: Bioinformatics tools, such as assemblers and aligners, are expected to produce more accurate results when given better quality sequence data as their starting point. This expectation has led to the development of stand-alone tools whose sole purpose is to detect and remove sequencing errors. A good error-correcting tool would be a transparent component in a bioinformatics pipeline, simply taking sequence data in any of the standard formats and producing a higher quality version of the same data containing far fewer errors. It should not only be able to correct all of the types of errors found in real sequence data (substitutions, insertions, deletions and uncalled bases), but it has to be both fast enough and scalable enough to be usable on the large datasets being produced by current sequencing technologies, and work on data derived from both haploid and diploid organisms. RESULTS: This article presents Blue, an error-correction algorithm based on k-mer consensus and context. Blue can correct substitution, deletion and insertion errors, as well as uncalled bases. It accepts both FASTQ and FASTA formats, and corrects quality scores for corrected bases. Blue also maintains the pairing of reads, both within a file and between pairs of files, making it compatible with downstream tools that depend on read pairing. Blue is memory efficient, scalable and faster than other published tools, and usable on large sequencing datasets. On the tests undertaken, Blue also proved to be generally more accurate than other published algorithms, resulting in more accurately aligned reads and the assembly of longer contigs containing fewer errors. One significant feature of Blue is that its k-mer consensus table does not have to be derived from the set of reads being corrected. 
This decoupling makes it possible to correct one dataset, such as a small set of 454 mate-pair reads, with the consensus derived from another dataset, such as Illumina reads derived from the same DNA sample. Such cross-correction can greatly improve the quality of small (and expensive) sets of long reads, leading to even better assemblies and higher quality finished genomes. AVAILABILITY AND IMPLEMENTATION: The code for Blue and its related tools is available from http://www.bioinformatics.csiro.au/Blue. These programs are written in C# and run natively under Windows and under Mono on Linux.


Subject(s)
Algorithms , Computational Biology/methods , Sequence Analysis, DNA/methods , Animals , DNA, Bacterial/analysis , Databases, Genetic , Genome , Genome, Bacterial , Genome, Human , Humans , Ploidies , Reproducibility of Results , Sequence Deletion , Software
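The sketch below illustrates the k-mer consensus idea. It assumes exact k-mer counts and a fixed trust threshold, both chosen arbitrarily here: k-mers seen often across the data are trusted, and read positions covered only by rare k-mers are flagged as likely errors. The table is built from a separate set of reads, echoing Blue's decoupling of the consensus from the reads being corrected. The sketch only flags suspect bases. It does not implement Blue's correction, quality-score or read-pairing logic, and the data are invented.

```python
from collections import Counter


def kmer_table(reads, k):
    """Count every k-mer across a collection of reads (the consensus table)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def suspect_positions(read, table, k, min_count):
    """Return positions of read not covered by any k-mer seen at least min_count times."""
    trusted = [False] * len(read)
    for i in range(len(read) - k + 1):
        if table[read[i:i + k]] >= min_count:
            for j in range(i, i + k):
                trusted[j] = True
    return [pos for pos, ok in enumerate(trusted) if not ok]


# Invented data: three error-free copies of a short sequence build the table...
genome = "ACGGACCAGCGAACGGCAAC"
table = kmer_table([genome] * 3, k=4)
# ...and a separate read carries a single substitution error (G>T) at position 10.
noisy = genome[:10] + "T" + genome[11:]

print(suspect_positions(noisy, table, k=4, min_count=3))   # [10]
print(suspect_positions(genome, table, k=4, min_count=3))  # []
```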
14.
Bioinformatics ; 30(10): 1471-2, 2014 May 15.
Article in English | MEDLINE | ID: mdl-24470576

ABSTRACT

SUMMARY: The initial steps in the analysis of next-generation sequencing data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot-swappable modular components, as opposed to the more rigid program call wrapping by higher-level languages implemented in comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for setting up and processing new projects, yet maintains the full flexibility of custom scripting when processing raw sequence data. AVAILABILITY AND IMPLEMENTATION: NGSANE is implemented in bash and publicly available under the BSD (3-Clause) licence via GitHub at https://github.com/BauerLab/ngsane. CONTACT: Denis.Bauer@csiro.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Automation, Laboratory , Humans , Software
15.
Crit Rev Microbiol ; 41(3): 326-40, 2015.
Article in English | MEDLINE | ID: mdl-24645635

ABSTRACT

The capacity of our gut microbial communities to maintain a stable and balanced state, termed 'resilience', in spite of perturbations is vital to our achieving and maintaining optimal health. A loss of microbial resilience is observed in a number of diseases including obesity, diabetes and metabolic syndrome. There are large gaps in our understanding of why an individual's co-evolved microflora consortium fails to develop resilience, thereby establishing a trajectory towards poor metabolic health. This review examines the connections between the developing gut microbiota and intestinal barrier function in the neonate, in the infant and during the first years of life. We propose that the effects of early life events on the gut microflora and permeability, whilst it is in a dynamic and vulnerable state, are fundamental in shaping the microbial consortia's resilience, and that it is the maintenance of resilience that is pivotal for metabolic health throughout life. We review the literature supporting this concept, suggesting new potential research directions aimed at developing a greater understanding of the longitudinal effects of the gut microflora on metabolic health and potential interventions to recalibrate the 'at risk' infant gut microflora in the direction of enhanced metabolic health.


Subject(s)
Gastrointestinal Microbiome/physiology , Gastrointestinal Tract/microbiology , Intestinal Mucosa/microbiology , Microbial Consortia/physiology , Permeability , Age Factors , Anti-Infective Agents/pharmacology , Female , Humans , Infant , Infant, Newborn , Intestinal Mucosa/immunology , Pregnancy , Tight Junctions/physiology
16.
Bioinformatics ; 29(15): 1895-7, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23740745

ABSTRACT

SUMMARY: At the heart of many modern biotechnological and therapeutic applications lies the need to target specific genomic loci with pinpoint accuracy. Although landmark experiments demonstrate technological maturity in manufacturing and delivering genetic material, the genomic sequence analysis to find suitable targets lags behind. We provide a computational aid for the sophisticated design of sequence-specific ligands and selection of appropriate targets, taking gene location and genomic architecture into account. AVAILABILITY: Source code and binaries are downloadable from www.bioinformatics.org.au/triplexator/inspector. CONTACT: t.bailey@uq.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA/chemistry , Gene Targeting , Software , Genetic Loci , Genomics , Humans , Peptide Nucleic Acids/chemistry
17.
Nucleic Acids Res ; 40(16): 7633-43, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22705792

ABSTRACT

Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNA detection from RNA-Seq data. NorahDesk utilizes the coverage distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on its performance for microRNAs (miRNAs) and piwi-interacting small RNAs (piRNAs). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.


Subject(s)
High-Throughput Nucleotide Sequencing , RNA, Small Untranslated/chemistry , Sequence Analysis, RNA , Software , Animals , Mice , MicroRNAs/chemistry , Nucleic Acid Conformation , Nucleic Acid Hybridization , RNA Precursors/chemistry , RNA, Small Interfering/chemistry , RNA, Small Untranslated/classification , RNA, Small Untranslated/metabolism
18.
Stud Health Technol Inform ; 310: 770-774, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269913

ABSTRACT

With the advancement of genome engineering and genetic modification techniques, the uptake of computational tools to design guide RNAs has increased drastically. Searching for genomic targets to design guides with maximum on-target activity (efficiency) and minimum off-target activity (specificity) is now an essential part of genome editing experiments. Today, a variety of tools exist that allow the search of genomic targets and let users customize their search parameters to better suit their experiments. Here we present an overview of different ways to visualize the CRISPR target sites returned by such searches, along with specific downstream information like primer design, restriction enzyme activity and mutational outcome prediction after a double-strand break. We discuss the importance of a good visualization summary for interpreting information, along with different ways to represent similar information effectively.


Subject(s)
CRISPR-Cas Systems , Data Visualization , RNA, Guide, CRISPR-Cas Systems , Engineering , Genomics
19.
Stud Health Technol Inform ; 310: 1021-1025, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269969

ABSTRACT

Coronary artery disease (CAD) has the highest disease burden worldwide. To manage this burden, predictive models are required to screen patients for preventative treatment. A range of variables have been explored for their capacity to predict disease, including phenotypic (age, sex, BMI and smoking status), medical imaging (carotid artery thickness) and genotypic variables. We use machine learning models and the UK Biobank cohort to measure the prediction capacity of these 3 variable categories, both in combination and in isolation. We demonstrate that phenotypic variables from the Framingham risk score have the best prediction capacity, although a combination of phenotypic, medical imaging and genotypic variables delivers the most specific models. Furthermore, we demonstrate that Variant Spark, a random forest-based GWAS platform, performs effective feature selection for SNP-based genotype variables, identifying 115 SNPs significantly associated with the CAD phenotype.


Subject(s)
Coronary Artery Disease , Humans , Coronary Artery Disease/diagnostic imaging , Coronary Artery Disease/genetics , Carotid Intima-Media Thickness , Phenotype , Genotype , Machine Learning
20.
Stud Health Technol Inform ; 310: 810-814, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269921

ABSTRACT

Genetic data is limited and generating new datasets is often an expensive, time-consuming process, involving countless moving parts to genotype and phenotype individuals. While sharing data is beneficial for quality control and software development, privacy and security are of utmost importance. Generating synthetic data is a practical solution to mitigate the cost, time and sensitivities that hamper developers and researchers in producing and validating novel biotechnological solutions to data intensive problems. Existing methods focus on mutation frequencies at specific loci while ignoring epistatic interactions. Alternatively, programs that do consider epistasis are limited to two-way interactions or apply genomic constraints that make synthetic data generation arduous or computationally intensive. To solve this, we developed Polygenic Epistatic Phenotype Simulator (PEPS). Our tool is a probabilistic model that can generate synthetic phenotypes with a controllable level of complexity.


Subject(s)
Biotechnology , Models, Statistical , Humans , Computer Simulation , Phenotype , Genotype
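The following is a toy illustration of phenotype simulation with higher-order epistasis. Each individual's phenotype is the sum of three parts: additive SNP effects, a fixed bonus for every interaction set in which all listed SNPs carry at least one alternate allele, and Gaussian noise. The size of the interaction sets acts as the complexity control. This is a hypothetical model for illustration only, not the PEPS model, whose probabilistic formulation the abstract does not describe. All parameter names and values are invented.

```python
import random


def simulate_phenotypes(genotypes, additive, interaction_sets, interaction_effect,
                        noise_sd, seed=0):
    """Toy phenotype simulator with additive and higher-order epistatic effects.

    genotypes: one list of 0/1/2 alternate-allele dosages per individual.
    additive: dict mapping SNP index -> per-allele effect size.
    interaction_sets: tuples of SNP indices; an individual receives
        interaction_effect for each set in which every SNP has dosage > 0.
    noise_sd: standard deviation of Gaussian noise (0 gives deterministic output).
    """
    rng = random.Random(seed)
    phenotypes = []
    for g in genotypes:
        y = sum(beta * g[snp] for snp, beta in additive.items())
        for snps in interaction_sets:
            if all(g[snp] > 0 for snp in snps):
                y += interaction_effect
        y += rng.gauss(0.0, noise_sd)
        phenotypes.append(y)
    return phenotypes


# Three individuals, three SNPs, one three-way interaction, no noise.
genotypes = [[1, 1, 1], [1, 0, 2], [0, 0, 0]]
phenos = simulate_phenotypes(genotypes, additive={0: 0.5},
                             interaction_sets=[(0, 1, 2)],
                             interaction_effect=2.0, noise_sd=0.0)
print(phenos)  # [2.5, 0.5, 0.0]
```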