Results 1 - 20 of 43
1.
Funct Integr Genomics ; 24(5): 139, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39158621

ABSTRACT

Recent advancements in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the volume and density of data. High-dimensional NGS data are characterized by a large number of genomic, transcriptomic, proteomic, and metagenomic features relative to the number of biological samples. This high dimensionality poses significant challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Both can be categorized into statistical and machine learning methods. The present study conducts a comprehensive and comparative review of statistical, machine learning, and deep learning-based feature selection and extraction techniques tailored to the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. The techniques surveyed, including deep learning architectures, machine learning algorithms, and statistical methods, have been applied to microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data. This review helps readers apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insights into massive and complex NGS and microarray data.
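The difference between the two families of techniques named in this abstract can be sketched in a few lines. The example below is illustrative only and is not taken from the review: it assumes variance ranking as the selection rule and group averaging (for instance, a pathway-level score) as the extraction rule, and the function names and toy matrix are hypothetical.

```python
from statistics import pvariance

def select_top_variance(rows, k):
    """Feature selection: keep k of the ORIGINAL columns (largest variance)."""
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])  # restore the original column order
    return [[row[j] for j in keep] for row in rows], keep

def extract_group_means(rows, groups):
    """Feature extraction: build NEW features as means over groups of columns."""
    return [[sum(row[j] for j in g) / len(g) for g in groups] for row in rows]

# Toy matrix: 3 samples x 4 features (values are made up for illustration).
rows = [
    [1.0, 10.0, 5.0, 0.0],
    [1.0, 20.0, 5.5, 4.0],
    [1.0, 30.0, 5.0, 8.0],
]
selected, kept = select_top_variance(rows, 2)
extracted = extract_group_means(rows, [[0, 1], [2, 3]])
```

Selection returns a subset of the measured features, so the result stays directly interpretable; extraction returns derived features that combine several measurements.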


Subject(s)
High-Throughput Nucleotide Sequencing , Machine Learning , Humans , High-Throughput Nucleotide Sequencing/methods , Deep Learning
2.
Brief Bioinform ; 22(1): 55-65, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32249310

ABSTRACT

Precision medicine promises to revolutionize treatment, shifting therapeutic approaches from the classical one-size-fits-all to those more tailored to the patient's individual genomic profile, lifestyle and environmental exposures. Yet, to advance precision medicine's main objective-ensuring the optimum diagnosis, treatment and prognosis for each individual-investigators need access to large-scale clinical and genomic data repositories. Despite the vast proliferation of these datasets, locating and obtaining access to many remains a challenge. We sought to provide an overview of available patient-level datasets that contain both genotypic data, obtained by next-generation sequencing, and phenotypic data-and to create a dynamic, online catalog for consultation, contribution and revision by the research community. Datasets included in this review conform to six specific inclusion parameters that are: (i) contain data from more than 500 human subjects; (ii) contain both genotypic and phenotypic data from the same subjects; (iii) include whole genome sequencing or whole exome sequencing data; (iv) include at least 100 recorded phenotypic variables per subject; (v) accessible through a website or collaboration with investigators and (vi) make access information available in English. Using these criteria, we identified 30 datasets, reviewed them and provided results in the release version of a catalog, which is publicly available through a dynamic Web application and on GitHub. Users can review as well as contribute new datasets for inclusion (Web: https://avillachlab.shinyapps.io/genophenocatalog/; GitHub: https://github.com/hms-dbmi/GenoPheno-CatalogShiny).
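The six inclusion parameters listed in this abstract amount to a simple filter. The sketch below encodes them as such; the `Dataset` record, its field names, and the three example cohorts are hypothetical and do not come from the catalog itself.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    """Hypothetical record describing one candidate dataset."""
    name: str
    n_subjects: int
    genotype_and_phenotype_same_subjects: bool
    has_wgs_or_wes: bool
    phenotype_variables_per_subject: int
    accessible_via_website_or_collaboration: bool
    access_info_in_english: bool

def meets_inclusion_criteria(d: Dataset) -> bool:
    """Apply criteria (i) to (vi) as stated in the abstract."""
    return (
        d.n_subjects > 500                              # (i) more than 500 subjects
        and d.genotype_and_phenotype_same_subjects      # (ii)
        and d.has_wgs_or_wes                            # (iii)
        and d.phenotype_variables_per_subject >= 100    # (iv) at least 100 variables
        and d.accessible_via_website_or_collaboration   # (v)
        and d.access_info_in_english                    # (vi)
    )

candidates = [
    Dataset("cohort_a", 12000, True, True, 350, True, True),
    Dataset("cohort_b", 500, True, True, 350, True, True),  # fails (i): not more than 500
    Dataset("cohort_c", 3000, True, True, 40, True, True),  # fails (iv)
]
included = [d.name for d in candidates if meets_inclusion_criteria(d)]
```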


Subject(s)
Databases, Genetic , High-Throughput Nucleotide Sequencing/methods , Phenotype , Precision Medicine/methods , Genetic Predisposition to Disease , Humans , Whole Genome Sequencing/methods
3.
Genet Epidemiol ; 45(1): 36-45, 2021 02.
Article in English | MEDLINE | ID: mdl-32864779

ABSTRACT

The breakthroughs in next generation sequencing have allowed us to access data consisting of both common and rare variants, and in particular to investigate the impact of rare genetic variation on complex diseases. Although rare genetic variants are thought to be important components in explaining genetic mechanisms of many diseases, discovering these variants remains challenging, and most studies are restricted to population-based designs. Further, despite the shift in the field of genome-wide association studies (GWAS) towards studying rare variants due to the "missing heritability" phenomenon, little is known about rare X-linked variants associated with complex diseases. For instance, there is evidence that X-linked genes are highly involved in brain development and cognition when compared with autosomal genes; however, like most GWAS for other complex traits, previous GWAS for mental diseases have provided poor resources to deal with identification of rare variant associations on X-chromosome. In this paper, we address the two issues described above by proposing a method that can be used to test X-linked variants using sequencing data on families. Our method is much more general than existing methods, as it can be applied to detect both common and rare variants, and is applicable to autosomes as well. Our simulation study shows that the method is efficient, and exhibits good operational characteristics. An application to the University of Miami Study on Genetics of Autism and Related Disorders also yielded encouraging results.


Subject(s)
Genes, X-Linked , Genome-Wide Association Study , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Multifactorial Inheritance
4.
Clin Chem ; 68(2): 313-321, 2022 02 01.
Article in English | MEDLINE | ID: mdl-34871369

ABSTRACT

BACKGROUND: To date, the usage of Galaxy, an open-source bioinformatics platform, has been reported primarily in research. We report 5 years' experience (2015 to 2020) with Galaxy in our hospital, as part of the "Assistance Publique-Hôpitaux de Paris" (AP-HP), to demonstrate its suitability for high-throughput sequencing (HTS) data analysis in a clinical laboratory setting. METHODS: Our Galaxy instance has been running since July 2015 and is used daily to study inherited diseases, cancer, and microbiology. For the molecular diagnosis of hereditary diseases, 6970 patients were analyzed with Galaxy (corresponding to a total of 7029 analyses). RESULTS: Using Galaxy, the time to process a batch of 23 samples (equivalent to a targeted DNA sequencing MiSeq run) from raw data to an annotated variant call file was generally less than 2 h for panels between 1 and 500 kb. Over 5 years, we restarted the server only twice, for hardware maintenance, and experienced no significant problems, demonstrating the robustness of our Galaxy installation in conjunction with HTCondor as a job scheduler and a PostgreSQL database. The quality of our targeted exome sequencing method was externally evaluated annually by the European Molecular Genetics Quality Network (EMQN). Mean (SD) sensitivity was 99% (2%) for single nucleotide variants and 93% (9%) for small insertion-deletions. CONCLUSION: Our experience with Galaxy demonstrates it to be a suitable platform for HTS data analysis with vast potential to benefit patient care in a clinical laboratory setting.


Subject(s)
Computational Biology , Laboratories, Clinical , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA , Software
5.
Curr Issues Mol Biol ; 43(3): 1937-1949, 2021 Nov 06.
Article in English | MEDLINE | ID: mdl-34889894

ABSTRACT

The worldwide emergence and spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) since 2019 has highlighted the importance of rapid and reliable diagnostic testing to prevent and control the viral transmission. However, inaccurate results may occur due to false negatives (FN) caused by polymorphisms or point mutations related to the virus evolution and compromise the accuracy of the diagnostic tests. Therefore, PCR-based SARS-CoV-2 diagnostics should be evaluated and evolve together with the rapidly increasing number of new variants appearing around the world. However, even by using a large collection of samples, laboratories are not able to test a representative collection of samples that deals with the same level of diversity that is continuously evolving worldwide. In the present study, we proposed a methodology based on an in silico and in vitro analysis. First, we used all information offered by available whole-genome sequencing data for SARS-CoV-2 for the selection of the two PCR assays targeting two different regions in the genome, and to monitor the possible impact of virus evolution on the specificity of the primers and probes of the PCR assays during and after the development of the assays. Besides this first essential in silico evaluation, a minimal set of testing was proposed to generate experimental evidence on the method performance, such as specificity, sensitivity and applicability. Therefore, a duplex reverse-transcription droplet digital PCR (RT-ddPCR) method was evaluated in silico by using 154 489 whole-genome sequences of SARS-CoV-2 strains that were representative for the circulating strains around the world. The RT-ddPCR platform was selected as it presented several advantages to detect and quantify SARS-CoV-2 RNA in clinical samples and wastewater. Next, the assays were successfully experimentally evaluated for their sensitivity and specificity. 
A preliminary evaluation of the applicability of the developed method was performed using both clinical and wastewater samples.


Subject(s)
COVID-19 Nucleic Acid Testing/methods , COVID-19/virology , Diagnostic Tests, Routine/methods , Evolution, Molecular , RNA, Viral/genetics , SARS-CoV-2/genetics , COVID-19/diagnosis , Humans , ROC Curve , SARS-CoV-2/isolation & purification
6.
Methods ; 173: 61-68, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31271880

ABSTRACT

Structural variants (SVs) are a class of genomic variation shared by members of the same species. Though relatively rare, they represent an increasingly important class of variation, as SVs have been associated with diseases and susceptibility to some types of cancer. Common approaches to SV detection require the sequencing and mapping of fragments from a test genome to a high-quality reference genome. Candidate SVs correspond to fragments with discordant mapped configurations. However, because errors in the sequencing and mapping will also create discordant arrangements, many of these predictions will be spurious. When sequencing coverage is low, distinguishing true SVs from errors is even more challenging. In recent work, we have developed SV detection methods that exploit genome information of closely related individuals (parents and children). Our previous approaches were based on the assumption that any SV present in a child's genome must have come from one of their parents. However, this strict restriction may have resulted in failing to predict rare but novel variants present only in the child. In this work, we generalize our previous approaches to allow the child to carry novel variants. We consider a constrained optimization approach where variants in the child are of two types: either inherited, and therefore present in a parent, or novel. For simplicity, we consider only a single parent and a single child, each with a haploid genome. However, even in this restricted case, our approach has the power to improve variant prediction. We present results on simulated candidate variant regions, parent-child trios from the 1000 Genomes Project, and a subset of the 17 Platinum Genomes.


Subject(s)
Genome, Human/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Genomic Structural Variation/genetics , Humans
7.
BMC Genomics ; 21(Suppl 6): 405, 2020 Dec 21.
Article in English | MEDLINE | ID: mdl-33349236

ABSTRACT

BACKGROUND: Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for analysis of sequence data from such populations, their straightforward application is impeded by multiple challenges associated with technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures. RESULTS: We propose a novel preprocessing approach to transform irregular genomic data into normalized image data. Such a representation allows the problems of classification and comparison of heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method has been applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering has been performed on the data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy. CONCLUSIONS: The sequence image normalization method allows for a robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in the visualization of genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are equally or more accurate than other models.


Subject(s)
Genomics , Machine Learning , Algorithms , Cluster Analysis , Computational Biology , Humans , Quasispecies
8.
Stat Appl Genet Mol Biol ; 18(4)2019 05 30.
Article in English | MEDLINE | ID: mdl-31145697

ABSTRACT

Modeling high-throughput next generation sequencing (NGS) data from experiments that profile tumor and control samples to study DNA copy number variants (CNVs) remains a challenge in various ways. In this applied work, we provide an efficient method for detecting multiple CNVs using NGS reads ratio data. This method is based on a multiple statistical change-points model with a penalized regression approach, the 1d fused LASSO, which is designed for ordered data in a one-dimensional structure. In addition, since the path algorithm traces the solution as a function of a tuning parameter, the number and locations of potential CNV region boundaries can be estimated simultaneously in an efficient way. For tuning parameter selection, we then propose a new modified Bayesian information criterion, called JMIC, and compare the proposed JMIC with three different Bayes information criteria used in the literature. Simulation results show that JMIC performs better for tuning parameter selection than the other three criteria. We applied our approach to the reads ratio sequencing data between the breast tumor cell line HCC1954 and its matched normal cell line BL 1954, and the results are in line with those reported in the literature.
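The abstract does not state the objective function. As a hedged illustration, the standard one-dimensional fused LASSO (signal approximator form), which may differ in detail from the authors' model, can be written as follows. Here y_i is assumed to be the (possibly log-transformed) reads ratio at ordered genomic position i, β_i the fitted mean level at that position, and λ the tuning parameter:

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta \in \mathbb{R}^n}
  \frac{1}{2} \sum_{i=1}^{n} (y_i - \beta_i)^2
  + \lambda \sum_{i=1}^{n-1} \lvert \beta_{i+1} - \beta_i \rvert
```

Positions where the fitted levels of neighbouring positions differ are the estimated change-points, that is, candidate CNV boundaries. A larger λ yields fewer of them, which is why a criterion for choosing λ fixes both the number and the locations of the boundaries at once.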


Subject(s)
DNA Copy Number Variations , Genomics/methods , Sequence Analysis, DNA/methods , Algorithms , Bayes Theorem , Cell Line, Tumor , Computer Simulation , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Regression Analysis , Software
9.
BMC Genomics ; 20(Suppl 12): 1001, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31888490

ABSTRACT

BACKGROUND: Inadvertent sample swaps are a real threat to data quality in any medium to large scale omics studies. While matches between samples from the same individual can in principle be identified from a few well characterized single nucleotide polymorphisms (SNPs), omics data types often only provide low to moderate coverage, thus requiring integration of evidence from a large number of SNPs to determine if two samples derive from the same individual or not. METHODS: We select about six thousand SNPs in the human genome and develop a Bayesian framework that is able to robustly identify sample matches between next generation sequencing data sets. RESULTS: We validate our approach on a variety of data sets. Most importantly, we show that our approach can establish identity between different omics data types such as Exome, RNA-Seq, and MethylCap-Seq. We demonstrate how identity detection degrades with sample quality and read coverage, but show that twenty million reads of a fairly low quality RNA-Seq sample are still sufficient for reliable sample identification. CONCLUSION: Our tool, SMASH, is able to identify sample mismatches in next generation sequencing data sets between different sequencing modalities and for low quality sequencing data.


Subject(s)
Genomics/methods , Polymorphism, Single Nucleotide/genetics , Software , Bayes Theorem , Genome, Human/genetics , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results , Sequence Analysis, DNA
10.
BMC Bioinformatics ; 19(Suppl 4): 79, 2018 05 08.
Article in English | MEDLINE | ID: mdl-29745849

ABSTRACT

BACKGROUND: As one possible solution to the "missing heritability" problem, many methods have been proposed that apply pathway-based analyses, using rare variants that are detected by next generation sequencing technology. However, while a number of methods for pathway-based rare-variant analysis of multiple phenotypes have been proposed, no method considers a unified model that incorporates multiple pathways. RESULTS: Simulation studies demonstrated the advantages of multivariate analysis over univariate analysis, and comparison studies showed the proposed approach to outperform existing methods. Moreover, real data analysis of six type 2 diabetes-related traits, using large-scale whole exome sequencing data, identified significant pathways that were not found by univariate analysis. Furthermore, strong relationships between the identified pathways and their associated metabolic disorder risk factors were found via literature search, and one of the identified pathways was successfully replicated by an analysis with an independent dataset. CONCLUSIONS: Herein, we present a powerful, pathway-based approach to investigate associations between multiple pathways and multiple phenotypes. By reflecting the natural hierarchy of biological behavior, and considering correlation between pathways and phenotypes, the proposed method is capable of analyzing multiple phenotypes and multiple pathways simultaneously.


Subject(s)
Genetic Variation , Signal Transduction/genetics , Algorithms , Computer Simulation , Databases, Genetic , Diabetes Mellitus, Type 2/genetics , Exome/genetics , Humans , Models, Genetic , Multivariate Analysis , Phenotype
11.
Genet Epidemiol ; 41(4): 363-371, 2017 05.
Article in English | MEDLINE | ID: mdl-28300291

ABSTRACT

Recent advances in genotyping with high-density markers allow researchers access to genomic variants, including rare ones. Linkage disequilibrium (LD) is widely used to provide insight into evolutionary history. It is also the basis for association mapping in humans and other species. Better understanding of the genomic LD structure may lead to better-informed statistical tests that can improve the power of association studies. Although rare variant associations with common diseases (RVCD) have been extensively studied recently, there is very limited understanding of, and even controversy about, LD structures among rare variants and between rare and common variants. In fact, many popular RVCD tests assume that rare variants are independent. In this report, we show that two commonly used LD measures are not capable of detecting LD when rare variants are involved. We present this argument from two perspectives: the LD measures themselves and the computational issues associated with them. To address these issues, we propose an alternative LD measure, the polychoric correlation, which was originally designed for detecting associations among categorical variables. Using simulated data as well as the 1000 Genomes data, we explore the performance of LD measures in detail and discuss their implications for association studies.
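The abstract does not name the two LD measures it examines. Assuming, purely for illustration, that they are the widely used |D'| and r², the sketch below computes both from haplotype and allele frequencies and shows how they can diverge when one variant is rare. The function name and the example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying both allele A and allele B
    p_a, p_b: allele frequencies of A and B
    """
    d = p_ab - p_a * p_b
    # D is bounded by the allele frequencies; the bound depends on its sign.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# A rare allele (1%) that always sits on the same haplotype as a common allele (50%).
d, d_prime, r2 = ld_measures(p_ab=0.01, p_a=0.01, p_b=0.5)
```

In this toy case |D'| equals 1 while r² is about 0.01, so the two measures give very different pictures of the same pair of variants once allele frequencies are mismatched.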


Subject(s)
Genetic Variation , Genome-Wide Association Study , Chromosomes, Human, Pair 21/genetics , Computer Simulation , Gene Frequency/genetics , Genotype , Humans , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics
12.
BMC Bioinformatics ; 18(1): 426, 2017 Sep 26.
Article in English | MEDLINE | ID: mdl-28950836

ABSTRACT

BACKGROUND: Constructing alignments and phylogenies for a given locus from large genome sequencing studies with relevant outgroups allows novel evolutionary and anthropological insights. However, no user-friendly tool has been developed to integrate thousands of recently available and anthropologically relevant genome sequences to construct complete sequence alignments and phylogenies. RESULTS: Here, we provide VCFtoTree, a user-friendly tool with a graphical user interface that directly accesses online databases to download, parse and analyze genome variation data for regions of interest. Our pipeline combines popular sequence datasets and tree building algorithms with custom data parsing to generate accurate alignments and phylogenies using all the individuals from the 1000 Genomes Project, Neanderthal and Denisovan genomes, as well as reference genomes of Chimpanzee and Rhesus Macaque. It can also be applied to other phased human genomes, as well as genomes from other species. The output of our pipeline includes an alignment in FASTA format and a tree file in Newick format. CONCLUSION: VCFtoTree fulfills the increasing demand for constructing alignments and phylogenies for a given locus from thousands of available genomes. Our software provides a user-friendly interface for a wider audience without prerequisite knowledge in programming. VCFtoTree can be accessed from https://github.com/duoduoo/VCFtoTree_3.0.0 .


Subject(s)
Genetic Loci , Genome, Human , Phylogeny , Sequence Alignment/methods , Software , Algorithms , Animals , Base Sequence , Humans , INDEL Mutation/genetics , Primates , Sequence Analysis, DNA , User-Computer Interface
13.
Curr Genomics ; 18(4): 360-365, 2017 Aug.
Article in English | MEDLINE | ID: mdl-29081691

ABSTRACT

BACKGROUND: Recently, with the development of RNA-seq technology, identification and functional studies of circular RNAs, a type of non-coding RNA arising from ligation of the 3' and 5' ends of a linear RNA molecule, have been conducted in mammalian cells. METHOD: Because studies on circular RNAs in plants are less thorough than those in animals, a genome-wide identification of circular RNA candidates in Arabidopsis was conducted by applying our in-house bioinformatics tool to several existing RNA-seq datasets specific to non-coding RNAs. RESULTS: A total of 164 circular RNA candidates were identified from RNA-seq data, and 4 circular RNA transcripts, including both exonic and intronic circular RNAs, were experimentally validated. Interestingly, our results show that circular RNA transcripts are enriched in the photosynthesis system in leaf tissue and correlate with higher expression levels of their parent genes. Sixteen of the 40 genes that have circular RNA candidates are related to the photosynthesis system, and of the 146 exonic circular RNA candidates, 63 are found in the chloroplast.

14.
Ann Hum Genet ; 79(3): 199-208, 2015 May.
Article in English | MEDLINE | ID: mdl-25875492

ABSTRACT

Because next generation sequencing technology can rapidly genotype most genetic variation across the genome, there is considerable interest in investigating the effects of rare variants on complex diseases. In this paper, we propose four Kullback-Leibler distance-based Tests (KLTs) for detecting genotypic differences between cases and controls. Several features set the proposed tests apart from existing ones. First, because the KLTs explicitly consider and compare the distributions of genotypes, the existence of variants with opposite directional effects does not compromise their power. Second, it is not necessary to set a threshold for rare variants, as the KL definition makes it reasonable to consider rare and common variants together without worrying about the contribution from one type overshadowing the other. Third, KLTs are robust to null variants thanks to a built-in noise-fighting mechanism. Finally, correlation among variants is taken into account implicitly, so the KLTs work well regardless of the underlying LD structure. Through extensive simulations, we demonstrated good performance of KLTs compared to the sum of squared score test (SSU) and the optimal sequence kernel association test (SKAT-O). Moreover, application to the Dallas Heart Study data illustrates the feasibility and performance of KLTs in a realistic setting.
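The core idea of comparing genotype distributions between cases and controls can be illustrated with a symmetrized Kullback-Leibler divergence at a single variant. This is a minimal sketch under stated assumptions and not the authors' test statistics: the 0/1/2 genotype coding, the pseudocount smoothing, and all function names are hypothetical choices made for the example.

```python
from math import log

def genotype_distribution(genotypes, pseudocount=0.5):
    """Smoothed frequencies of genotypes 0, 1, 2 (minor-allele counts)."""
    counts = [pseudocount + sum(1 for g in genotypes if g == k) for k in (0, 1, 2)]
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q))

def symmetric_kl(cases, controls):
    """Symmetrized KL distance between case and control genotype distributions."""
    p = genotype_distribution(cases)
    q = genotype_distribution(controls)
    return kl_divergence(p, q) + kl_divergence(q, p)

# Made-up genotypes at one variant for 10 cases and 10 controls.
cases = [0, 0, 0, 1, 1, 1, 2, 0, 1, 0]
controls = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
distance = symmetric_kl(cases, controls)
```

Because the distance depends on how the whole genotype distribution shifts rather than on the direction of an allele's effect, risk and protective variants both contribute positively. In practice, significance would have to be assessed separately, for example by permuting case and control labels.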


Subject(s)
Genome-Wide Association Study/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Computer Simulation , Genetic Variation , Genotype , Humans , Models, Genetic
15.
Biometrics ; 70(2): 430-40, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24571656

ABSTRACT

The photoactivatable ribonucleoside enhanced cross-linking immunoprecipitation (PAR-CLIP) has been increasingly used for the global mapping of RNA-protein interaction sites. There are two key features of the PAR-CLIP experiments: The sequence read tags are likely to form an enriched peak around each RNA-protein interaction site; and the cross-linking procedure is likely to introduce a specific mutation in each sequence read tag at the interaction site. Several ad hoc methods have been developed to identify the RNA-protein interaction sites using either sequence read counts or mutation counts alone; however, rigorous statistical methods for analyzing PAR-CLIP are still lacking. In this article, we propose an integrative model to establish a joint distribution of observed read and mutation counts. To pinpoint the interaction sites at single base-pair resolution, we developed a novel modeling approach that adopts non-homogeneous hidden Markov models to incorporate the nucleotide sequence at each genomic location. Both simulation studies and data application showed that our method outperforms the ad hoc methods, and provides reliable inferences for the RNA-protein binding sites from PAR-CLIP data.


Subject(s)
Models, Statistical , Proteins/chemistry , RNA/chemistry , Bayes Theorem , Binding Sites , Biometry/methods , Computer Simulation , Cross-Linking Reagents , Fragile X Mental Retardation Protein/chemistry , Fragile X Mental Retardation Protein/metabolism , High-Throughput Nucleotide Sequencing , Humans , Immunoprecipitation , Markov Chains , Nucleic Acid Conformation , Protein Structure, Secondary , Proteins/metabolism , RNA/genetics , RNA/metabolism , RNA-Binding Protein FUS/chemistry , RNA-Binding Protein FUS/metabolism , Sequence Analysis, RNA
16.
Genes Genet Syst ; 2024 Oct 28.
Article in English | MEDLINE | ID: mdl-39462538

ABSTRACT

Next-generation sequencing (NGS) has become widely available and is routinely used in basic research and clinical practice. The reference genome sequence is an essential resource for NGS analysis, and several population-specific reference genomes have recently been constructed to provide a choice to deal with the vast genetic diversity of human samples. However, resources supporting population-specific references are insufficient, and it is burdensome to perform analysis using these reference genomes. Here, we constructed a set of resources to support NGS analysis using the Japanese reference genome, JG. We created resources for variant calling, variant-effect prediction, gene and repeat element annotations, read mappability, and RNA-seq analysis. We also provide a resource for reference coordinate conversion for further annotation enrichment. We then provide a variant calling protocol with JG. Our resources provide a guide to prepare sufficient resources for the use of population-specific reference genomes and can facilitate the migration of reference genomes.

17.
Stud Health Technol Inform ; 305: 194-197, 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37386994

ABSTRACT

The paper presents the current state of the FHIR Genomics resource, an assessment of FAIR data usage, and possible future directions. FHIR Genomics forges a path towards data interoperability. By integrating both the FAIR principles and the FHIR resources, we can achieve greater standardization across healthcare data collection and smoother data exchange. Using the FHIR Genomics resource as an example, we aim to pave the way towards integrating genomic data into an Obstetrics-Gynecology information system, as a future direction, so that possible disease predisposition in the fetus can be identified.


Subject(s)
Gynecology , Obstetrics , Female , Pregnancy , Humans , Genomics , Data Collection , Fetus
18.
Comput Struct Biotechnol J ; 21: 5382-5393, 2023.
Article in English | MEDLINE | ID: mdl-38022693

ABSTRACT

Analysis and interpretation of high-throughput transcriptional and chromatin accessibility data at single-cell (sc) resolution are still open challenges in the biomedical field. The existence of countless bioinformatics tools, for the different analytical steps, increases the complexity of data interpretation and the difficulty to derive biological insights. In this article, we present SCALA, a bioinformatics tool for analysis and visualization of single-cell RNA sequencing (scRNA-seq) and Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) datasets, enabling either independent or integrative analysis of the two modalities. SCALA combines standard types of analysis by integrating multiple software packages varying from quality control to the identification of distinct cell populations and cell states. Additional analysis options enable functional enrichment, cellular trajectory inference, ligand-receptor analysis, and regulatory network reconstruction. SCALA is fully parameterizable, presenting data in tabular format and producing publication-ready visualizations. The different available analysis modules can aid biomedical researchers in exploring, analyzing, and visualizing their data without any prior experience in coding. We demonstrate the functionality of SCALA through two use-cases related to TNF-driven arthritic mice, handling both scRNA-seq and scATAC-seq datasets. SCALA is developed in R, Shiny and JavaScript and is mainly available as a standalone version, while an online service of more limited capacity can be found at http://scala.pavlopouloslab.info or https://scala.fleming.gr.

19.
Front Immunol ; 14: 1146826, 2023.
Article in English | MEDLINE | ID: mdl-37180102

ABSTRACT

The human leukocyte antigen (HLA) locus plays a central role in adaptive immune function and has significant clinical implications for tissue transplant compatibility and allelic disease associations. Studies using bulk-cell RNA sequencing have demonstrated that HLA transcription may be regulated in an allele-specific manner and single-cell RNA sequencing (scRNA-seq) has the potential to better characterize these expression patterns. However, quantification of allele-specific expression (ASE) for HLA loci requires sample-specific reference genotyping due to extensive polymorphism. While genotype prediction from bulk RNA sequencing is well described, the feasibility of predicting HLA genotypes directly from single-cell data is unknown. Here we evaluate and expand upon several computational HLA genotyping tools by comparing predictions from human single-cell data to gold-standard, molecular genotyping. The highest 2-field accuracy averaged across all loci was 76% by arcasHLA and increased to 86% using a composite model of multiple genotyping tools. We also developed a highly accurate model (AUC 0.93) for predicting HLA-DRB345 copy number in order to improve genotyping accuracy of the HLA-DRB locus. Genotyping accuracy improved with read depth and was reproducible at repeat sampling. Using a metanalytic approach, we also show that HLA genotypes from PHLAT and OptiType can generate ASE ratios that are highly correlated (R2 = 0.8 and 0.94, respectively) with those derived from gold-standard genotyping.


Subject(s)
HLA Antigens , Transcriptome , Humans , Sequence Analysis, DNA , HLA Antigens/genetics , Histocompatibility Antigens Class I/genetics , Genotype , Histocompatibility Antigens Class II/genetics
20.
Mol Ecol Resour ; 2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36458971

ABSTRACT

Polyploids are cells or organisms with a genome consisting of more than two sets of homologous chromosomes. Polyploid plants have important traits that facilitate speciation and are thus often model systems for evolutionary, molecular ecology and agricultural studies. However, due to their unusual mode of inheritance and double-reduction, diploid models of population genetic analysis cannot properly be applied to autopolyploids. To overcome this problem, we developed a software package entitled vcfpop to perform a variety of population genetic analyses for autopolyploids, such as parentage analysis, analysis of molecular variance, principal coordinates analysis, hierarchical clustering analysis and Bayesian clustering. We used three data sets to evaluate the capability of vcfpop to analyse large data sets on a desktop computer. This software is freely available at http://github.com/huangkang1987/vcfpop.
