Results 1 - 20 of 61
1.
Brief Bioinform ; 22(1): 308-314, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32008042

ABSTRACT

The use of machine learning (ML) has become prevalent in the genome engineering space, with applications ranging from predicting target site efficiency to forecasting the outcome of repair events. However, jargon and ML-specific accuracy measures have made it hard to assess the validity of individual approaches, potentially leading to misinterpretation of ML results. This review aims to close the gap by discussing ML approaches and pitfalls in the context of CRISPR gene-editing applications. Specifically, we address common considerations, such as algorithm choice, as well as problems, such as overestimating accuracy and data interoperability, by providing tangible examples from the genome-engineering domain. Equipping researchers with the knowledge to effectively use ML to better design gene-editing experiments and predict experimental outcomes will help advance the field more rapidly.


Subjects
CRISPR-Cas Systems , Gene Editing/methods , Machine Learning , Animals , Gene Editing/standards , Genomics/methods , Genomics/standards , Humans
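To make the accuracy pitfall concrete, the following Python sketch (an illustration, not code from the review) scores a trivial classifier on a hypothetical, imbalanced set of guide-activity labels. Accuracy looks high only because low-activity guides dominate; recall shows that no high-activity guide is found. The labels and helper names are assumptions chosen for the example.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true positives that the model recovers (0.0 if there are none)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels: 95 low-activity guides (0) and 5 high-activity guides (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that calls every guide low-activity.
y_pred = [0] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.95
print(f"recall   = {recall(y_true, y_pred):.2f}")    # recall   = 0.00
```

Reporting a class-aware measure alongside accuracy is what exposes the problem here; which measure is appropriate depends on the question the model is meant to answer.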
2.
Prenat Diagn ; 43(1): 109-116, 2023 01.
Article in English | MEDLINE | ID: mdl-36484552

ABSTRACT

OBJECTIVE: European and Australian guidelines for cystic fibrosis (CF) reproductive carrier screening recommend testing a small number of high frequency CF causing variants, rather than comprehensive CFTR sequencing. The study objective was to determine variant detection rates of commercially available targeted reproductive carrier screening tests in Australia. METHODS: Next-generation DNA sequencing of the CFTR gene was performed on 2552 individuals from a whole population sample to identify CF causing variants. The variant detection rates of two commercially available Australian reproductive carrier screening tests, which target 50 or 175 CF causing variants, in this population were calculated. The ethnicity of individuals was determined using principal component analysis. RESULTS: Variant detection rates of the tests for 50 and 175 CF causing variants were 88.2% and 90.8%, respectively. No CF causing variants in individuals of East Asian ethnicity (n = 3) were detected by either test, while >86.6% (n = 69) of CF causing variants in Europeans would be identified by either test. CONCLUSIONS: Reproductive carrier screening tests for a targeted set of high frequency CF variants are unable to detect approximately 10% of CF variants in a multiethnic Australian population, and individuals of East Asian ethnicity are disproportionally affected by this test limitation.


Subjects
Cystic Fibrosis , Humans , Cystic Fibrosis/diagnosis , Cystic Fibrosis/epidemiology , Cystic Fibrosis/genetics , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Australia/epidemiology , Genetic Testing , Ethnicity , Mutation
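The detection rate reported here can be read as the share of CF-causing variant occurrences identified by sequencing that a targeted panel would also report. A minimal Python sketch of that arithmetic follows; it assumes one list entry per observed variant allele, and the variant identifiers and panels are hypothetical rather than the commercial panels studied.

```python
from typing import List, Set


def detection_rate(observed: List[str], panel: Set[str]) -> float:
    """Fraction of observed CF-causing variant occurrences whose variant is on the panel."""
    if not observed:
        raise ValueError("no CF-causing variants observed")
    detected = sum(1 for variant in observed if variant in panel)
    return detected / len(observed)


# Hypothetical sequencing calls: one entry per CF-causing variant occurrence.
observed = ["v1", "v1", "v1", "v2", "v3", "v1", "v4", "v2"]
small_panel = {"v1", "v2"}        # hypothetical smaller panel
large_panel = {"v1", "v2", "v3"}  # hypothetical larger panel

print(detection_rate(observed, small_panel))  # 0.75
print(detection_rate(observed, large_panel))  # 0.875
```

Computing the same ratio separately for each ancestry group is how a disproportionate shortfall, like the one reported for East Asian individuals, would become visible.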
3.
Nucleic Acids Res ; 49(18): 10785-10795, 2021 10 11.
Article in English | MEDLINE | ID: mdl-34534334

ABSTRACT

Precise genomic modification using prime editing (PE) holds enormous potential for research and clinical applications. In this study, we generated all-in-one prime editing (PEA1) constructs that carry all the components required for PE, along with a selection marker. We tested these constructs (with selection) in HEK293T, K562, HeLa and mouse embryonic stem (ES) cells. We discovered that PE efficiency in HEK293T cells was much higher than previously observed, reaching up to 95% (mean 67%). The efficiency in K562 and HeLa cells, however, remained low. To improve PE efficiency in K562 and HeLa, we generated a nuclease prime editor and tested this system in these cell lines as well as mouse ES cells. PE-nuclease greatly increased prime editing initiation; however, installation of the intended edits was often accompanied by extra insertions derived from the repair template. Finally, we show that zygotic injection of the nuclease prime editor can generate correct modifications in mouse fetuses with up to 100% efficiency.


Subjects
CRISPR-Associated Protein 9 , Gene Editing , Animals , CRISPR-Associated Protein 9/genetics , Cells, Cultured , Embryonic Stem Cells/metabolism , HEK293 Cells , HeLa Cells , Humans , K562 Cells , Mice , Plasmids/genetics , Zygote
4.
Brief Bioinform ; 21(6): 1920-1936, 2020 12 01.
Article in English | MEDLINE | ID: mdl-31774481

ABSTRACT

Oncogenesis and cancer can arise as a consequence of a wide range of genomic aberrations including mutations, copy number alterations, expression changes and epigenetic modifications encompassing multiple omics layers. Integrating genomic, transcriptomic, proteomic and epigenomic datasets via multi-omics analysis provides the opportunity to derive a deeper and holistic understanding of the development and progression of cancer. There are two primary approaches to integrating multi-omics data: multi-staged (focused on identifying genes driving cancer) and meta-dimensional (focused on establishing clinically relevant tumour or sample classifications). A number of ready-to-use bioinformatics tools are available to perform both multi-staged and meta-dimensional integration of multi-omics data. In this study, we compared nine different integration tools using real and simulated cancer datasets. The performance of the multi-staged integration tools was assessed at the gene, function and pathway levels, while meta-dimensional integration tools were assessed based on sample classification performance. Additionally, we discuss the influence of factors such as data representation, sample size, signal and noise on multi-omics data integration. Our results provide current and much-needed guidance regarding the selection and use of the most appropriate and best-performing multi-omics integration tools.


Subjects
Computational Biology , Genomics , Neoplasms , Proteomics , Computational Biology/methods , DNA Copy Number Variations , Epigenomics , Gene Expression Profiling , Genomics/methods , Humans , Neoplasms/genetics , Oncogenes , Transcriptome
5.
J Med Genet ; 2020 May 14.
Article in English | MEDLINE | ID: mdl-32409511

ABSTRACT

BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with phenotypic and genetic heterogeneity. Approximately 10% of cases are familial, while remaining cases are classified as sporadic. To date, >30 genes and several hundred genetic variants have been implicated in ALS. METHODS: Seven hundred and fifty-seven sporadic ALS cases were recruited from Australian neurology clinics. Detailed clinical data and whole genome sequencing (WGS) data were available from 567 and 616 cases, respectively, of which 426 cases had both datasets available. As part of a comprehensive genetic analysis, 853 genetic variants previously reported as ALS-linked mutations or disease-associated alleles were interrogated in sporadic ALS WGS data. Statistical analyses were performed to identify correlation between clinical variables, and between phenotype and the number of ALS-implicated variants carried by an individual. Relatedness between individuals carrying identical variants was assessed using identity-by-descent analysis. RESULTS: Forty-three ALS-implicated variants from 18 genes, including C9orf72, ATXN2, TARDBP, SOD1, SQSTM1 and SETX, were identified in Australian sporadic ALS cases. One-third of cases carried at least one variant and 6.82% carried two or more variants, implicating a potential oligogenic or polygenic basis of ALS. Relatedness was detected between two sporadic ALS cases carrying a SOD1 p.I114T mutation, and among three cases carrying a SQSTM1 p.K238E mutation. Oligogenic/polygenic sporadic ALS cases showed earlier age of onset than those with no reported variant. CONCLUSION: We confirm phenotypic associations among ALS cases, and highlight the contribution of genetic variation to all forms of ALS.
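Classifying cases by the number of ALS-implicated variants they carry (none, one, or two or more, the last being the potential oligogenic/polygenic group) is a simple counting step. The Python sketch below assumes variants have already been matched against a list of previously reported ALS variants; the case IDs and variant labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def burden_class(n_variants: int) -> str:
    """Group a case by the number of ALS-implicated variants it carries."""
    if n_variants == 0:
        return "none"
    if n_variants == 1:
        return "single"
    return "oligogenic/polygenic"


def summarise(cases: Dict[str, List[str]]) -> Counter:
    """Count cases per burden class."""
    return Counter(burden_class(len(variants)) for variants in cases.values())


# Hypothetical cases; the variant labels are placeholders, not real calls.
cases = {
    "case1": [],
    "case2": ["geneA:var1"],
    "case3": ["geneA:var1", "geneB:var2"],
    "case4": [],
    "case5": ["geneC:var3"],
    "case6": [],
}
counts = summarise(cases)
carriers = counts["single"] + counts["oligogenic/polygenic"]
print(f"{carriers}/{len(cases)} carry at least one variant; "
      f"{counts['oligogenic/polygenic']}/{len(cases)} carry two or more")
```

The resulting class labels could then be compared against clinical variables such as age of onset.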

6.
Brief Bioinform ; 19(2): 179-187, 2018 03 01.
Article in English | MEDLINE | ID: mdl-27802932

ABSTRACT

Motivation: Despite being essential for numerous clinical and research applications, high-resolution human leukocyte antigen (HLA) typing remains challenging, and laboratory tests are time-consuming and labour-intensive. With next-generation sequencing data becoming widely accessible, on-demand in silico HLA typing offers an economical and efficient alternative. Results: In this study we evaluate the HLA typing accuracy and efficiency of five computational HLA typing methods by comparing their predictions against a curated set of > 1000 published polymerase chain reaction-derived HLA genotypes on three different data sets (whole genome sequencing, whole exome sequencing and transcriptomic sequencing data). At clinically relevant (four-digit) resolution, the highest accuracy we observe is 81%, achieved by PHLAT on RNA-seq data, and 99%, achieved by OptiType when the comparison is limited to Class I genes. We also observed variability between the tools in resource consumption, with average runtime ranging from 5 h (HLAminer) to 7 min (seq2HLA) and memory from 12.8 GB (HLA-VBSeq) to 0.46 GB (HLAminer) per sample. While a minimum coverage is required, other factors also determine prediction accuracy, and the results of different tools do not correlate well. Therefore, by combining tools, there is the potential to develop a highly accurate ensemble method that delivers fast, economical HLA typing from existing sequencing data.


Subjects
Algorithms , HLA Antigens/genetics , Histocompatibility Testing/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , Exome , Genotype , Humans
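Four-digit HLA resolution corresponds to the first two fields of current allele names (for example A*02:01). The Python sketch below compares predicted and reference genotypes at that resolution and then forms a naive majority-vote ensemble across tools, one possible reading of the combination the abstract proposes. Tool outputs are assumed to be already parsed into allele strings; the alleles and calls are hypothetical, and no tool's real output format is implied.

```python
from collections import Counter
from typing import List, Tuple


def two_field(allele: str) -> str:
    """Truncate an allele such as 'HLA-A*02:01:01:01' to two-field ("four-digit") form, 'A*02:01'."""
    name = allele[4:] if allele.startswith("HLA-") else allele
    gene, _, fields = name.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:2])


def allele_matches(predicted: List[str], truth: List[str]) -> int:
    """Alleles (0-2) shared by a predicted and a reference genotype at one locus."""
    p = Counter(two_field(a) for a in predicted)
    t = Counter(two_field(a) for a in truth)
    return sum((p & t).values())  # multiset intersection handles homozygotes


def ensemble_genotype(calls: List[List[str]]) -> Tuple[str, ...]:
    """Naive majority vote over whole genotypes; ties go to the genotype seen first."""
    votes = Counter(tuple(sorted(two_field(a) for a in call)) for call in calls)
    return votes.most_common(1)[0][0]


# Hypothetical reference genotype and calls from three unnamed tools.
truth = ["HLA-A*02:01:01:01", "HLA-A*24:02:01"]
tool_calls = [
    ["A*02:01", "A*24:02"],
    ["A*02:01:01", "A*11:01"],
    ["A*24:02", "A*02:01"],
]
for i, call in enumerate(tool_calls, start=1):
    print(f"tool {i}: {allele_matches(call, truth)}/2 alleles correct")
consensus = ensemble_genotype(tool_calls)
print("ensemble:", consensus, f"{allele_matches(list(consensus), truth)}/2")
```

Voting on whole genotypes rather than on individual alleles keeps homozygous calls intact; a practical ensemble would likely also need tool-specific weights.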
7.
Genome Res ; 26(6): 719-31, 2016 06.
Article in English | MEDLINE | ID: mdl-27053337

ABSTRACT

A three-dimensional chromatin state underpins the structural and functional basis of the genome by bringing regulatory elements and genes into close spatial proximity to ensure proper, cell-type-specific gene expression profiles. Here, we performed Hi-C chromosome conformation capture sequencing to investigate how three-dimensional chromatin organization is disrupted in the context of copy-number variation, long-range epigenetic remodeling, and atypical gene expression programs in prostate cancer. We find that cancer cells retain the ability to segment their genomes into megabase-sized topologically associated domains (TADs); however, these domains are generally smaller due to establishment of additional domain boundaries. Interestingly, a large proportion of the new cancer-specific domain boundaries occur at regions that display copy-number variation. Notably, a common deletion on 17p13.1 in prostate cancer spanning the TP53 tumor suppressor locus results in bifurcation of a single TAD into two distinct smaller TADs. Change in domain structure is also accompanied by novel cancer-specific chromatin interactions within the TADs that are enriched at regulatory elements such as enhancers, promoters, and insulators, and associated with alterations in gene expression. We also show that differential chromatin interactions across regulatory regions occur within long-range epigenetically activated or silenced regions of concordant gene activation or repression in prostate cancer. Finally, we present a novel visualization tool that enables integrated exploration of Hi-C interaction data, the transcriptome, and epigenome. This study provides new insights into the relationship between long-range epigenetic and genomic dysregulation and changes in higher-order chromatin interactions in cancer.


Subjects
Chromatin/genetics , Epigenesis, Genetic , Neoplasms/genetics , CCCTC-Binding Factor , Cell Line, Tumor , Enhancer Elements, Genetic , Gene Expression Regulation, Neoplastic , Genome, Human , Histones/metabolism , Humans , Molecular Sequence Annotation , Neoplasms/metabolism , Protein Binding , Protein Processing, Post-Translational , Repressor Proteins/physiology
8.
BMC Biotechnol ; 19(1): 40, 2019 06 27.
Article in English | MEDLINE | ID: mdl-31248401

ABSTRACT

BACKGROUND: Natural variations in a genome can drastically alter the CRISPR-Cas9 off-target landscape by creating or removing sites. Despite the potential side effects of such unaccounted-for sites, current off-target detection pipelines are not equipped to include variant information. To address this, we developed VARiant-aware detection and SCoring of Off-Targets (VARSCOT). RESULTS: VARSCOT identifies only 0.6% of off-targets as common between 4 individual genomes and the reference, with an average of 82% of off-targets unique to an individual. VARSCOT is the most sensitive detection method for off-targets, finding 40 to 70% more experimentally verified off-targets than other popular software tools, and its machine learning model allows CRISPR-Cas9 concentration-aware off-target activity scoring. CONCLUSIONS: VARSCOT allows researchers to take genomic variation into account when designing individual or population-wide targeting strategies. VARSCOT is available from https://github.com/BauerLab/VARSCOT .


Subjects
CRISPR-Cas Systems , Computational Biology/methods , Gene Editing/methods , Gene Targeting/methods , Genomics/methods , Software , Gene Editing/standards , Gene Targeting/standards , Genomics/standards , Internet , Reproducibility of Results
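The idea that a variant can create or remove an off-target site can be shown with a brute-force search over a reference and a personalised sequence. The Python sketch below is not VARSCOT's implementation: it assumes SpCas9 with an NGG PAM, scans only the forward strand, allows mismatches but no bulges, and uses a shortened 8-nt guide and hypothetical sequences for readability.

```python
from typing import Dict, List


def apply_snvs(seq: str, snvs: Dict[int, str]) -> str:
    """Return a personalised copy of seq with SNVs applied (0-based position -> alternate base)."""
    bases = list(seq)
    for pos, alt in snvs.items():
        bases[pos] = alt
    return "".join(bases)


def find_sites(seq: str, guide: str, max_mismatches: int) -> List[int]:
    """Forward-strand starts of protospacers within max_mismatches of guide and followed by an NGG PAM."""
    k = len(guide)
    hits = []
    for i in range(len(seq) - k - 2):  # leave room for the 3-nt PAM
        if seq[i + k + 1:i + k + 3] != "GG":  # NGG: any base, then GG
            continue
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], guide))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# Toy example with a shortened 8-nt guide; sequences and variants are hypothetical.
guide = "ACGTACGT"
reference = "TTTT" + "ACGTACGA" + "TGG" + "CCCC"

print(find_sites(reference, guide, 0))                         # [] (one mismatch in reference)
print(find_sites(apply_snvs(reference, {11: "T"}), guide, 0))  # [4] (SNV creates a perfect site)
print(find_sites(apply_snvs(reference, {13: "A"}), guide, 1))  # [] (SNV destroys the PAM)
```

Running the same search on each individual's sequence, rather than only on the reference, is what lets individual-specific sites appear or disappear.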
9.
Blood ; 128(9): 1290-301, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27465915

ABSTRACT

The factors that determine red blood cell (RBC) lifespan and the rate of RBC aging have not been fully elucidated. In several genetic conditions, including sickle cell disease, thalassemia, and G6PD deficiency, erythrocyte lifespan is significantly shortened. Many of these diseases are also associated with protection from severe malaria, suggesting a role for accelerated RBC senescence and clearance in malaria resistance. Here, we report a novel, N-ethyl-N-nitrosourea-induced mutation that causes a gain of function in adenosine 5'-monophosphate deaminase (AMPD3). Mice carrying the mutation exhibit rapid RBC turnover, with increased erythropoiesis, dramatically shortened RBC lifespan, and signs of increased RBC senescence/eryptosis, suggesting a key role for AMPD3 in determining RBC half-life. Mice were also found to be resistant to infection with the rodent malaria Plasmodium chabaudi. We propose that resistance to P. chabaudi is mediated by increased RBC turnover and higher rates of erythropoiesis during infection.


Subjects
AMP Deaminase , Erythrocytes/immunology , Immunity, Innate , Malaria , Mutation , Plasmodium chabaudi/immunology , AMP Deaminase/genetics , AMP Deaminase/immunology , Animals , Cellular Senescence/genetics , Cellular Senescence/immunology , Erythrocytes/parasitology , Erythropoiesis/genetics , Erythropoiesis/immunology , Ethylnitrosourea/toxicity , Half-Life , Malaria/genetics , Malaria/immunology , Male , Mice
10.
BMC Genomics ; 16: 866, 2015 Oct 26.
Article in English | MEDLINE | ID: mdl-26503232

ABSTRACT

BACKGROUND: The mutagen N-ethyl-N-nitrosourea (ENU) has become the method of choice for inducing random mutations for forward genetics applications. However, distinguishing induced mutations from sequencing errors or sporadic mutations is difficult, which has hampered surveys of potential biases in the methodology in the past. Addressing this issue, we created a large cohort of mice with biological replicates enabling the confident calling of induced mutations, which in turn allowed us to conduct a comprehensive analysis of potential biases in mutation properties and genomic location. RESULTS: In the exome sequencing data we observe the known preference of ENU to cause A:T=>G:C transitions in longer genes. Mutations were frequently clustered and inherited in blocks, hampering attempts to pinpoint individual causative mutations by genome analysis alone. Furthermore, ENU mutations were biased towards areas in the genome that are accessible in testis, potentially limiting the scope of forward genetic approaches to only 1-10% of the genome. CONCLUSION: ENU provides a powerful tool for exploring the genome-phenome relationship; however, forward genetic applications that require the mutation to be passed on through the germ line may be limited to exploring genes that are accessible in testis.


Subjects
Ethylnitrosourea/toxicity , Mutagens/toxicity , Mutation/genetics , Animals , Exome/drug effects , Exome/genetics , Genome-Wide Association Study , Male , Mice , Mutagenesis/drug effects , Mutagenesis/genetics , Testis/drug effects , Testis/metabolism
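Mutation spectra such as the A:T=>G:C preference are usually tallied after collapsing each SNV and its reverse complement into one base-pair class. The Python sketch below names classes from the strand whose reference base is a purine and checks whether each change is a transition; the calls are hypothetical, and this naming convention is an assumption rather than the paper's stated pipeline.

```python
from collections import Counter
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def snv_class(ref: str, alt: str) -> str:
    """Name an SNV by base pair, reading from the strand whose reference base is a purine."""
    if ref in ("C", "T"):  # switch to the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def is_transition(ref: str, alt: str) -> bool:
    """Purine<->purine or pyrimidine<->pyrimidine change."""
    return (ref, alt) in TRANSITIONS


# Hypothetical (ref, alt) calls from an ENU screen.
calls: List[Tuple[str, str]] = [("A", "G"), ("T", "C"), ("A", "T"), ("C", "T"), ("T", "C")]
print(Counter(snv_class(r, a) for r, a in calls).most_common())
print(sum(is_transition(r, a) for r, a in calls), "of", len(calls), "are transitions")
```

Because T>C on one strand is A>G on the other, both land in the same A:T>G:C class, which is what makes a strand-independent preference measurable.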
11.
BMC Genomics ; 16: 1052, 2015 Dec 10.
Article in English | MEDLINE | ID: mdl-26651996

ABSTRACT

BACKGROUND: Genomic information is increasingly used in medical practice, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. The widely used Hadoop MapReduce architecture and associated machine learning library, Mahout, provide the means for tackling computationally challenging tasks. However, many genomic analyses do not fit the MapReduce paradigm. We therefore utilise the recently developed SPARK engine, along with its associated machine learning library, MLlib, which offers more flexibility in the parallelisation of population-scale bioinformatics tasks. The resulting tool, VARIANTSPARK, provides an interface from MLlib to the standard variant format (VCF), offers seamless genome-wide sampling of variants and provides a pipeline for visualising results. RESULTS: To demonstrate the capabilities of VARIANTSPARK, we clustered more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VARIANTSPARK is 80% faster than ADAM (the SPARK-based genome clustering approach), the comparable implementation using Hadoop/Mahout, and ADMIXTURE, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. CONCLUSION: The benefits of speed, resource consumption and scalability enable VARIANTSPARK to open up the use of advanced, efficient machine learning algorithms for genomic data.


Subjects
Computational Biology/methods , Genotype , Algorithms , Cluster Analysis , Humans , Polymorphism, Single Nucleotide , Software
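Before individuals can be clustered, VCF genotypes have to be turned into numeric feature vectors. The plain-Python sketch below shows one common encoding (alternate-allele counts) and the squared Euclidean distance that k-means works with. It does not use VARIANTSPARK or Spark MLlib; setting missing calls to 0 is a simplifying assumption, and the genotypes are hypothetical.

```python
from typing import Dict, List, Optional


def gt_to_dosage(gt: str) -> Optional[int]:
    """Count non-reference alleles in a VCF GT string ('0/1', '1|1', ...); None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def encode_samples(rows: List[List[str]], samples: List[str]) -> Dict[str, List[int]]:
    """Turn per-variant GT rows into one dosage vector per sample; missing calls become 0 (a simplification)."""
    vectors: Dict[str, List[int]] = {s: [] for s in samples}
    for gts in rows:
        for sample, gt in zip(samples, gts):
            dosage = gt_to_dosage(gt)
            vectors[sample].append(0 if dosage is None else dosage)
    return vectors


def sq_distance(u: List[int], v: List[int]) -> int:
    """Squared Euclidean distance, the quantity k-means minimises within clusters."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical genotype rows (one per variant) for three samples.
samples = ["s1", "s2", "s3"]
rows = [["0/0", "1/1", "0/0"], ["0/1", "1|1", "0/0"], ["./.", "0/1", "0/0"]]
vectors = encode_samples(rows, samples)
print(vectors)  # {'s1': [0, 1, 0], 's2': [2, 2, 1], 's3': [0, 0, 0]}
print(sq_distance(vectors["s1"], vectors["s3"]), sq_distance(vectors["s1"], vectors["s2"]))  # 1 6
```

At population scale the same per-sample vectors would be distributed across a cluster rather than held in a Python dictionary.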
12.
Genome Res ; 22(7): 1372-81, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22550012

ABSTRACT

Double-stranded DNA is able to form triple-helical structures by accommodating a third nucleotide strand in its major groove. This sequence-specific process offers a potent mechanism for targeting genomic loci of interest that is of great value for biotechnological and gene-therapeutic applications. It is likely that nature has leveraged this addressing system for gene regulation, because computational studies have uncovered an abundance of putative triplex target sites in various genomes, with enrichment particularly in gene promoters. However, to draw a more complete picture of the in vivo role of triplexes, not only the putative targets but also the sequences acting as the third strand and their capability to pair with the predicted target sites need to be studied. Here we present Triplexator, the first computational framework that integrates all aspects of triplex formation, and showcase its potential by discussing research examples for which the different aspects of triplex formation are important. We find that chromatin-associated RNAs have a significantly higher fraction of sequence features able to form triplexes than expected at random, suggesting their involvement in gene regulation. We furthermore identify hundreds of human genes that contain sequence features in their promoter predicted to be able to form a triplex with a target within the same promoter, suggesting the involvement of triplexes in feedback-based gene regulation. With focus on biotechnological applications, we screen mammalian genomes for high-affinity triplex target sites that can be used to target genomic loci specifically and find that triplex formation offers a resolution of ~1300 nt.


Subjects
Algorithms , Gene Expression Profiling/methods , Genomics/methods , Oligonucleotides/chemistry , RNA-Binding Proteins/chemistry , Animals , Chromatin/chemistry , Chromatin/genetics , Circular Dichroism , Computational Biology/methods , DNA/chemistry , DNA/genetics , Genetic Loci , Genome, Human , Humans , Hydrogen Bonding , Nucleic Acid Conformation , Oligonucleotides/genetics , Promoter Regions, Genetic , RNA Stability , RNA-Binding Proteins/genetics , Time Factors
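Triplex target sites are typically polypurine tracts on one strand of the duplex. As a simplified illustration, and not Triplexator's algorithm (which models triplex formation in far more detail), the Python sketch below reports uninterrupted purine runs of a minimum length on either strand of a hypothetical sequence; the default minimum length of 10 is an arbitrary choice.

```python
import re
from typing import List, Tuple


def purine_tracts(seq: str, min_len: int = 10) -> List[Tuple[int, int, str]]:
    """Maximal uninterrupted purine runs on either strand as (start, end, strand), 0-based, end-exclusive.

    A pyrimidine run on the given ('+') strand is a purine run on the complementary ('-') strand.
    """
    hits = []
    for pattern, strand in ((r"[AG]+", "+"), (r"[CT]+", "-")):
        for match in re.finditer(pattern, seq.upper()):
            if match.end() - match.start() >= min_len:
                hits.append((match.start(), match.end(), strand))
    return sorted(hits)


# Hypothetical sequence: a 12-nt purine run followed by a 12-nt pyrimidine run.
seq = "C" + "GAAGGAGAAGGA" + "C" + "TTCTCCTTTCT" + "A"
print(purine_tracts(seq))  # [(1, 13, '+'), (13, 25, '-')]
```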
13.
Bioinformatics ; 30(19): 2723-32, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24919879

ABSTRACT

MOTIVATION: Bioinformatics tools, such as assemblers and aligners, are expected to produce more accurate results when given better quality sequence data as their starting point. This expectation has led to the development of stand-alone tools whose sole purpose is to detect and remove sequencing errors. A good error-correcting tool would be a transparent component in a bioinformatics pipeline, simply taking sequence data in any of the standard formats and producing a higher quality version of the same data containing far fewer errors. It should not only be able to correct all of the types of errors found in real sequence data (substitutions, insertions, deletions and uncalled bases), but also be fast and scalable enough to be usable on the large datasets produced by current sequencing technologies, and work on data derived from both haploid and diploid organisms. RESULTS: This article presents Blue, an error-correction algorithm based on k-mer consensus and context. Blue can correct substitution, deletion and insertion errors, as well as uncalled bases. It accepts both FASTQ and FASTA formats, and corrects quality scores for corrected bases. Blue also maintains the pairing of reads, both within a file and between pairs of files, making it compatible with downstream tools that depend on read pairing. Blue is memory efficient, scalable and faster than other published tools, and usable on large sequencing datasets. On the tests undertaken, Blue also proved to be generally more accurate than other published algorithms, resulting in more accurately aligned reads and the assembly of longer contigs containing fewer errors. One significant feature of Blue is that its k-mer consensus table does not have to be derived from the set of reads being corrected. This decoupling makes it possible to correct one dataset, such as a small set of 454 mate-pair reads, with the consensus derived from another dataset, such as Illumina reads derived from the same DNA sample. Such cross-correction can greatly improve the quality of small (and expensive) sets of long reads, leading to even better assemblies and higher quality finished genomes. AVAILABILITY AND IMPLEMENTATION: The code for Blue and its related tools is available from http://www.bioinformatics.csiro.au/Blue. These programs are written in C# and run natively under Windows and under Mono on Linux.


Subjects
Algorithms , Computational Biology/methods , Sequence Analysis, DNA/methods , Animals , DNA, Bacterial/analysis , Databases, Genetic , Genome , Genome, Bacterial , Genome, Human , Humans , Ploidies , Reproducibility of Results , Sequence Deletion , Software
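The decoupling described in this abstract, building the k-mer table from one dataset and applying it to another, can be sketched in a few lines of Python. This is a simplified k-mer spectrum check, not Blue's algorithm: it only flags bases not covered by any k-mer seen at least `min_count` times and does not attempt a correction. The reads, `k` and the threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def kmer_table(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def suspect_positions(read: str, table: Counter, k: int, min_count: int) -> List[int]:
    """Positions in read not covered by any k-mer seen at least min_count times in table."""
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if table[read[i:i + k]] >= min_count:  # a missing k-mer counts as 0
            for j in range(i, i + k):
                covered[j] = True
    return [i for i, ok in enumerate(covered) if not ok]


# Hypothetical data: the table comes from one (high-coverage) read set ...
table = kmer_table(["ACGTTGCAAG"] * 3, k=4)
# ... and is used to check a read from a different set.
print(suspect_positions("ACGTAGCAAG", table, k=4, min_count=2))  # [4]
```

A substitution error makes every k-mer that overlaps it rare, so the erroneous base is the one position left uncovered by well-supported k-mers.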
14.
Bioinformatics ; 30(10): 1471-2, 2014 May 15.
Article in English | MEDLINE | ID: mdl-24470576

ABSTRACT

SUMMARY: The initial steps in the analysis of next-generation sequencing data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot-swappable modular components, as opposed to the more rigid wrapping of program calls in higher-level languages used by comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for setup and processing of new projects, yet maintains full flexibility of custom scripting when processing raw sequence data. AVAILABILITY AND IMPLEMENTATION: NGSANE is implemented in bash and publicly available under BSD (3-Clause) licence via GitHub at https://github.com/BauerLab/ngsane. CONTACT: Denis.Bauer@csiro.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing/methods , Automation, Laboratory , Humans , Software
15.
Crit Rev Microbiol ; 41(3): 326-40, 2015.
Article in English | MEDLINE | ID: mdl-24645635

ABSTRACT

The capacity of our gut microbial communities to maintain a stable and balanced state, termed 'resilience', in spite of perturbations is vital to our achieving and maintaining optimal health. A loss of microbial resilience is observed in a number of diseases including obesity, diabetes and metabolic syndrome. There are large gaps in our understanding of why an individual's co-evolved microflora consortium fails to develop resilience, thereby establishing a trajectory towards poor metabolic health. This review examines the connections between the developing gut microbiota and intestinal barrier function in the neonate, infant and during the first years of life. We propose that the effects of early life events on the gut microflora and permeability, while the microflora is still in a dynamic and vulnerable state, are fundamental in shaping the microbial consortia's resilience, and that it is the maintenance of resilience that is pivotal for metabolic health throughout life. We review the literature supporting this concept and suggest new potential research directions aimed at developing a greater understanding of the longitudinal effects of the gut microflora on metabolic health and potential interventions to recalibrate the 'at risk' infant gut microflora in the direction of enhanced metabolic health.


Subjects
Gastrointestinal Microbiome/physiology , Gastrointestinal Tract/microbiology , Intestinal Mucosa/microbiology , Microbial Consortia/physiology , Permeability , Age Factors , Anti-Infective Agents/pharmacology , Female , Humans , Infant , Infant, Newborn , Intestinal Mucosa/immunology , Pregnancy , Tight Junctions/physiology
16.
Bioinformatics ; 29(15): 1895-7, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23740745

ABSTRACT

SUMMARY: At the heart of many modern biotechnological and therapeutic applications lies the need to target specific genomic loci with pinpoint accuracy. Although landmark experiments demonstrate technological maturity in manufacturing and delivering genetic material, the genomic sequence analysis to find suitable targets lags behind. We provide a computational aid for the sophisticated design of sequence-specific ligands and selection of appropriate targets, taking gene location and genomic architecture into account. AVAILABILITY: Source code and binaries are downloadable from www.bioinformatics.org.au/triplexator/inspector. CONTACT: t.bailey@uq.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA/chemistry , Gene Targeting , Software , Genetic Loci , Genomics , Humans , Peptide Nucleic Acids/chemistry
17.
Nucleic Acids Res ; 40(16): 7633-43, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22705792

ABSTRACT

Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNA detection from RNA-Seq data. NorahDesk utilizes the coverage distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on the performance for microRNAs (miRNAs) and piwi-interacting small RNAs (piRNAs). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.


Subjects
High-Throughput Nucleotide Sequencing , RNA, Small Untranslated/chemistry , Sequence Analysis, RNA , Software , Animals , Mice , MicroRNAs/chemistry , Nucleic Acid Conformation , Nucleic Acid Hybridization , RNA Precursors/chemistry , RNA, Small Interfering/chemistry , RNA, Small Untranslated/classification , RNA, Small Untranslated/metabolism
18.
Stud Health Technol Inform ; 310: 770-774, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269913

ABSTRACT

With the advancement of genomic engineering and genetic modification techniques, the uptake of computational tools to design guide RNAs has increased drastically. Searching for genomic targets to design guides with maximum on-target activity (efficiency) and minimum off-target activity (specificity) is now an essential part of genome editing experiments. Today, a variety of tools exist that allow the search of genomic targets and let users customize their search parameters to better suit their experiments. Here we present an overview of different ways to visualize the resulting CRISPR target sites along with specific downstream information such as primer design, restriction enzyme activity and mutational outcome prediction after a double-stranded break. We discuss the importance of a good visualization summary for interpreting this information, along with different ways to represent similar information effectively.


Subjects
CRISPR-Cas Systems , Data Visualization , RNA, Guide, CRISPR-Cas Systems , Engineering , Genomics
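One piece of downstream information mentioned in this abstract, restriction enzyme activity, is often used to check whether a guide's cut site falls inside a restriction site, whose loss can then flag edited alleles. The Python sketch below assumes SpCas9 (blunt cut 3 bp upstream of an NGG PAM) and a forward-strand target, and uses GAATTC, the EcoRI recognition sequence, as the motif; the target sequence and helper names are hypothetical.

```python
from typing import List


def spcas9_cut_index(protospacer_start: int, protospacer_len: int = 20) -> int:
    """Index of the first base 3' of the SpCas9 blunt cut, 3 bp upstream of an NGG PAM (forward strand)."""
    return protospacer_start + protospacer_len - 3


def motifs_spanning_cut(seq: str, cut: int, motif: str) -> List[int]:
    """Start indices of motif occurrences with bases on both sides of the cut."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        if start < cut < start + len(motif):
            hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


# Hypothetical target: a 20-nt protospacer at index 0, then a TGG PAM.
seq = "ACGTACGTACGTAGAATTCA" + "TGG" + "CCC"
cut = spcas9_cut_index(0)
print(cut, motifs_spanning_cut(seq, cut, "GAATTC"))  # 17 [13]
```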
19.
Stud Health Technol Inform ; 310: 1021-1025, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269969

ABSTRACT

Coronary artery disease (CAD) has the highest disease burden worldwide. To manage this burden, predictive models are required to screen patients for preventative treatment. A range of variables have been explored for their capacity to predict disease, including phenotypic (age, sex, BMI and smoking status), medical imaging (carotid artery thickness) and genotypic variables. We use machine learning models and the UK Biobank cohort to measure the predictive capacity of these three variable categories, both in combination and in isolation. We demonstrate that phenotypic variables from the Framingham risk score have the best predictive capacity, although a combination of phenotypic, medical imaging and genotypic variables delivers the most specific models. Furthermore, we demonstrate that VariantSpark, a random forest-based GWAS platform, performs effective feature selection for SNP-based genotype variables, identifying 115 SNPs significantly associated with the CAD phenotype.


Subjects
Coronary Artery Disease , Humans , Coronary Artery Disease/diagnostic imaging , Coronary Artery Disease/genetics , Carotid Intima-Media Thickness , Phenotype , Genotype , Machine Learning
20.
Mutat Res Rev Mutat Res ; : 108509, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38977176

ABSTRACT

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder (NDD) influenced by genetic, epigenetic, and environmental factors. Recent advancements in genomic analysis have shed light on numerous genes associated with ASD, highlighting the significant role of both common and rare genetic mutations, as well as copy number variations (CNVs), single nucleotide polymorphisms (SNPs) and unique de novo variants. These genetic variations disrupt neurodevelopmental pathways, contributing to the disorder's complexity. Notably, CNVs are present in 10%-20% of individuals with autism, with 3%-7% detectable through cytogenetic methods. While the role of submicroscopic CNVs in ASD has been recently studied, their association with genomic loci and genes has not been thoroughly explored. In this review, we focus on 47 CNV regions linked to ASD, encompassing 1,632 genes, including protein-coding genes and long non-coding RNAs (lncRNAs), of which 659 show significant brain expression. Using a list of ASD-associated genes from SFARI, we detect 17 regions harboring at least one known ASD-related protein-coding gene. Of the remaining 30 regions, we identify 24 regions containing at least one protein-coding gene with brain-enriched expression and a nervous system phenotype in mouse mutants, and one lncRNA with both brain-enriched expression and upregulation in iPSC to neuron differentiation. This review not only expands our understanding of the genetic diversity associated with ASD but also underscores the potential of lncRNAs in contributing to its etiology. Additionally, the discovered CNVs will be a valuable resource for future diagnostic, therapeutic, and research endeavors aimed at prioritizing genetic variations in ASD.
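Linking CNV regions to the genes they contain, and then to a curated gene list such as SFARI's, is an interval-overlap query. The Python sketch below uses half-open coordinates and a brute-force scan; the coordinates, gene names and the stand-in gene list are hypothetical, and a real analysis over genome-wide annotations would more likely use an indexed interval tool.

```python
from typing import Dict, List, NamedTuple, Set


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_in_cnvs(cnvs: List[Interval], genes: List[Interval]) -> Dict[str, List[str]]:
    """Map each CNV region to the genes it overlaps (a brute-force scan, fine for small lists)."""
    return {c.name: [g.name for g in genes if overlaps(c, g)] for c in cnvs}


# Hypothetical coordinates and names.
cnvs = [Interval("chr1", 100, 500, "cnv1"), Interval("chr2", 0, 50, "cnv2")]
genes = [
    Interval("chr1", 450, 900, "geneA"),
    Interval("chr1", 500, 600, "geneB"),  # starts where cnv1 ends: no overlap
    Interval("chr2", 10, 20, "geneC"),
    Interval("chr3", 0, 1000, "geneD"),
]
known_genes: Set[str] = {"geneC"}  # stand-in for a curated list such as SFARI's

hits = genes_in_cnvs(cnvs, genes)
print(hits)  # {'cnv1': ['geneA'], 'cnv2': ['geneC']}
print([c for c, gs in hits.items() if known_genes & set(gs)])  # ['cnv2']
```

Regions without a known gene would then be the candidates for the further filtering the review describes, such as brain-enriched expression.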
