Results 1 - 20 of 89
1.
BMC Bioinformatics ; 25(1): 290, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39227760

ABSTRACT

BACKGROUND: Advancements over the past decade in DNA sequencing technology and computing power have created the potential to revolutionize medicine. There has been a marked increase in genetic data available, allowing for the advancement of areas such as personalized medicine. A crucial type of data in this context is genetic variant data which is stored in variant call format (VCF) files. However, the rapid growth in genomics has presented challenges in analyzing and comparing VCF files. RESULTS: In response to the limitations of existing tools, this paper introduces a novel web application that provides a user-friendly solution for VCF file analyses and comparisons. The software tool enables researchers and clinicians to perform high-level analysis with ease and enhances productivity. The application's interface allows users to conveniently upload, analyze, and visualize their VCF files using simple drag-and-drop and point-and-click operations. Essential visualizations such as Venn diagrams, clustergrams, and precision-recall plots are provided to users. A key feature of the application is its support for metadata-based file grouping, accomplished through flexible data matrix uploads, streamlining organization and analysis of user-defined categories. Additionally, the application facilitates standardized benchmarking of VCF files by integrating user-provided ground truth regions and variant lists. CONCLUSIONS: By providing a user-friendly interface and supporting essential visualizations, this software enhances the accessibility of VCF file analysis and assists researchers and clinicians in their scientific inquiries.
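
The benchmarking idea in this abstract (comparing a VCF against a user-provided ground-truth variant list) can be illustrated with a minimal Python sketch. It assumes variants are already loaded as sets of (chromosome, position, ref, alt) tuples; the function name `precision_recall` and this representation are hypothetical and are not taken from the published tool.

```python
def precision_recall(called, truth):
    """Return (precision, recall) of a call set against a truth set.

    Both arguments are sets of (chrom, pos, ref, alt) tuples. This
    representation is an assumption made for illustration only.
    """
    true_positives = len(called & truth)
    # Guard against empty sets so the sketch never divides by zero.
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
called = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"),
          ("chr2", 75, "T", "C"), ("chr3", 10, "A", "C")}
print(precision_recall(called, truth))
```

Sweeping a quality threshold over the called set and recomputing both values at each step would give the points of a precision-recall plot of the kind the abstract mentions.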


Subject(s)
Software , Genomics/methods , User-Computer Interface , Humans , Genetic Variation
2.
BMC Bioinformatics ; 25(1): 288, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39227781

ABSTRACT

BACKGROUND: The variant call format (VCF) file is a structured and comprehensive text file crucial for researchers and clinicians in interpreting and understanding genomic variation data. It contains essential information about variant positions in the genome, along with alleles, genotype calls, and quality scores. Analyzing and visualizing these files, however, poses significant challenges due to the need for diverse resources and robust features for in-depth exploration. RESULTS: To address these challenges, we introduce variant graph craft (VGC), a VCF file visualization and analysis tool. VGC offers a wide range of features for exploring genetic variations, including extraction of variant data, intuitive visualization, and graphical representation of samples with genotype information. VGC is designed primarily for the analysis of patient cohorts, but it can also be adapted for use with individual probands or families. It integrates seamlessly with external resources, providing insights into gene function and variant frequencies in sample data. VGC includes gene function and pathway information from Molecular Signatures Database (MSigDB) for GO terms, KEGG, Biocarta, Pathway Interaction Database, and Reactome. Additionally, it dynamically links to gnomAD for variant information and incorporates ClinVar data for pathogenic variant information. VGC supports the Human Genome Assembly Hg37 and Hg38, ensuring compatibility with a wide range of data sets, and accommodates various approaches to exploring genetic variation data. It can be tailored to specific user needs with optional phenotype input data. CONCLUSIONS: In summary, VGC provides a comprehensive set of features tailored to researchers working with genomic variation data. Its intuitive interface, rapid filtering capabilities, and the flexibility to perform queries using custom groups make it an effective tool in identifying variants potentially associated with diseases. 
VGC operates locally, ensuring data security and privacy by eliminating the need for cloud-based VCF uploads, making it a secure and user-friendly tool. It is freely available at https://github.com/alperuzun/VGC .
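
As a rough illustration of the information this abstract says a VCF record carries (position, alleles, genotype calls, quality score), the following Python sketch parses one tab-separated data line. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, samples); the function name `parse_vcf_line` is hypothetical and the sketch is unrelated to how VGC itself reads files.

```python
def parse_vcf_line(line):
    """Parse one VCF data line into a small dictionary (illustrative only)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _var_id, ref, alt, qual = fields[:6]
    record = {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alts": alt.split(","),                      # multi-allelic sites list several ALTs
        "qual": None if qual == "." else float(qual),  # "." marks a missing value
        "genotypes": [],
    }
    # Columns 9 (FORMAT) and 10+ (one per sample) are optional.
    if len(fields) > 9:
        keys = fields[8].split(":")
        for sample in fields[9:]:
            values = dict(zip(keys, sample.split(":")))
            record["genotypes"].append(values.get("GT"))
    return record


example = "chr1\t12345\trs1\tA\tG,T\t50.0\tPASS\tDP=30\tGT:DP\t0/1:14\t1/2:16\n"
print(parse_vcf_line(example))
```

A production parser would also handle header lines, INFO sub-fields, and phased genotypes; established libraries are a better choice than this sketch for real analyses.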


Subject(s)
Genetic Variation , Software , Humans , Genetic Variation/genetics , Databases, Genetic , Genomics/methods , Genotype
3.
Data Brief ; 55: 110759, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39169997

ABSTRACT

Forty-five accessions of the genus Phaseolus from the orthodox seed collection of the National Center for Genetic Resources (CNRG) of the National Institute of Forestry, Agricultural, and Livestock Research (INIFAP) of Mexico were sequenced using RADseq. The species utilized were: P. acutifolius (14), P. coccineus (12), P. lunatus (8), P. dumosus (6), P. leptostachyus (2), P. filiformis (2), and P. vulgaris (1). A variant call file (VCF) was generated using GATK with the P. vulgaris reference genome GCF_000499845.1, identifying 97,103 shared SNPs among the species. These data have the potential to be used for studies of intra- and interspecific genetic diversity, phylogeny, evolution, genetic resource conservation, and agricultural improvement.

4.
Gigascience ; 13, 2024 01 02.
Article in English | MEDLINE | ID: mdl-39028587

ABSTRACT

BACKGROUND: With the rise of large-scale genome sequencing projects, genotyping of thousands of samples has produced immense variant call format (VCF) files. It is becoming increasingly challenging to store, transfer, and analyze these voluminous files. Compression methods have been used to tackle these issues, aiming for both high compression ratio and fast random access. However, existing methods have not yet achieved a satisfactory compromise between these 2 objectives. FINDINGS: To address the aforementioned issue, we introduce GSC (Genotype Sparse Compression), a specialized and refined lossless compression tool for VCF files. In benchmark tests conducted across various open-source datasets, GSC showcased exceptional performance in genotype data compression. Compared with the industry's most advanced tools (namely, GBC and GTC), GSC achieved compression ratios that were higher by 26.9% to 82.4% over GBC and GTC on the datasets, respectively. In lossless compression scenarios, GSC also demonstrated robust performance, with compression ratios 1.5× to 6.5× greater than general-purpose tools like gzip, zstd, and BCFtools, a mode not supported by either GBC or GTC. Achieving such high compression ratios did require some reasonable trade-offs, including longer decompression times, with GSC being 1.2× to 2× slower than GBC, yet 1.1× to 1.4× faster than GTC. Moreover, GSC maintained decompression query speeds that were equivalent to its competitors. In terms of RAM usage, GSC outperformed both counterparts. Overall, GSC's comprehensive performance surpasses that of the most advanced technologies. CONCLUSION: GSC balances high compression ratios with rapid data access, enhancing genomic data management. It supports seamless PLINK binary format conversion, simplifying downstream analysis.


Subject(s)
Data Compression , Software , Data Compression/methods , Humans , Genotype , Computational Biology/methods , Algorithms , High-Throughput Nucleotide Sequencing/methods
5.
Diagnostics (Basel) ; 14(9), 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38732343

ABSTRACT

BACKGROUND AND OBJECTIVE: The symptoms of most neurodegenerative diseases, including Parkinson's disease (PD), usually do not occur until substantial neuronal loss occurs. This makes the process of early diagnosis very challenging. Hence, this research used variant call format (VCF) analysis to detect variants and novel genes that could be used as prognostic indicators in the early diagnosis of prodromal PD. MATERIALS AND METHODS: Data were obtained from the Parkinson's Progression Markers Initiative (PPMI), and we analyzed prodromal patients with gVCF data collected in the 2021 cohort. A total of 304 participants were included, including 100 healthy controls, 146 prodromal genetic individuals, 21 prodromal hyposmia individuals, and 37 prodromal individuals with RBD. A pipeline was developed to process the samples from gVCF to reach variant annotation and pathway and disease association analysis. RESULTS: Novel variant percentages were detected in the analyzed prodromal subgroups. The prodromal subgroup analysis revealed novel variations of 1.0%, 1.2%, 0.6%, 0.3%, 0.5%, and 0.4% for the genetic male, genetic female, hyposmia male, hyposmia female, RBD male, and RBD female groups, respectively. Interestingly, 12 potentially novel loci (MTF2, PIK3CA, ADD1, SYBU, IRS2, USP8, PIGL, FASN, MYLK2, USP25, EP300, and PPP6R2) that were recently detected in PD patients were detected in the prodromal stage of PD. CONCLUSIONS: Genetic biomarkers are crucial for the early detection of Parkinson's disease and its prodromal stage. The novel PD genes detected in prodromal patients could aid in the use of gene biomarkers for early diagnosis of the prodromal stage without relying only on phenotypic traits.

6.
BMC Bioinformatics ; 25(1): 173, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38693489

ABSTRACT

Principal component analysis (PCA) is an important and widely used unsupervised learning method that determines population structure based on genetic variation. Genome sequencing of thousands of individuals usually generates tens of millions of SNPs, making PCA analysis and interpretation challenging. Here we present VCF2PCACluster, a simple, fast and memory-efficient tool for kinship estimation, PCA and clustering analysis, and visualization based on VCF-formatted SNPs. We implemented five kinship estimation methods and three clustering methods for users to choose from. Moreover, unlike other PCA tools, VCF2PCACluster possesses a clustering function based on the PCA result, which enables users to identify population structure automatically and clearly. We demonstrated the same accuracy but a higher performance of this tool in performing PCA analysis on tens of millions of SNPs compared to the popular PLINK2 software, especially in peak memory usage, which is independent of the number of SNPs in VCF2PCACluster.


Subject(s)
Polymorphism, Single Nucleotide , Principal Component Analysis , Software , Cluster Analysis , Humans
7.
Front Oncol ; 14: 1291055, 2024.
Article in English | MEDLINE | ID: mdl-38665945

ABSTRACT

Background: Multiple myeloma is diagnosed in 5,800 people in the United Kingdom (UK) each year with up to 64% having vertebral compression fractures at the time of diagnosis. Painful vertebral compression fractures can be of significant detriment to patients' quality of life. Percutaneous vertebroplasty aims to provide long-term pain relief and stabilize fractured vertebrae. Methods and materials: Data was collected from all cases of percutaneous vertebroplasty performed on patients with multiple myeloma from November 2017 to January 2019. Pain scores were measured using the Visual Analogue Scale (VAS) and Oswestry Disability Index (ODI) pre-procedure, 2 months post-procedure and 4 years post-procedure. Procedure-related complications and analgesia use were also documented. Results: 22 patients were included with a total of 119 vertebrae treated. Patients reported a significant improvement in overall pain score with a median pre-procedure VAS of 8 and a median post-procedure VAS of 3.5 (p<0.0001). There was a median pre-procedure ODI score of 60% and a median post-procedure ODI score of 36% (p<0.0001). There was improvement across all ODI domains and a 77% reduction in analgesic requirement. There were small cement leaks into paravertebral veins or endplates at 15 levels (12%) which were asymptomatic. There were 8 responders to the long-term follow-up questionnaire at 4 years. This demonstrated an overall stable degree of pain relief in responders with a median VAS of 3.5 and median ODI of 30%. Conclusion: At this center, vertebroplasty has been shown to reduce both VAS and ODI pain scores and reduce analgesia requirements in patients with VCFs secondary to multiple myeloma with long-lasting relief at 4 years post-procedure.

8.
BMC Bioinformatics ; 25(1): 68, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38350858

ABSTRACT

BACKGROUND: The advent of Next-Generation Sequencing (NGS) has catalyzed a paradigm shift in medical genetics, enabling the identification of disease-associated variants. However, the vast quantum of data produced by NGS necessitates a robust and dependable mechanism for filtering irrelevant variants. Annotation-based variant filtering, a pivotal step in this process, demands a profound understanding of the case-specific conditions and the relevant annotation instruments. To tackle this complex task, we sought to design an accessible, efficient and more importantly easy to understand variant filtering tool. RESULTS: Our efforts culminated in the creation of 123VCF, a tool capable of processing both compressed and uncompressed Variant Calling Format (VCF) files. Built on a Java framework, the tool employs a disk-streaming real-time filtering algorithm, allowing it to manage sizable variant files on conventional desktop computers. 123VCF filters input variants in accordance with a predefined filter sequence applied to the input variants. Users are provided the flexibility to define various filtering parameters, such as quality, coverage depth, and variant frequency within the populations. Additionally, 123VCF accommodates user-defined filters tailored to specific case requirements, affording users enhanced control over the filtering process. We evaluated the performance of 123VCF by analyzing different types of variant files and comparing its runtimes to the most similar algorithms like BCFtools filter and GATK VariantFiltration. The results indicated that 123VCF performs relatively well. The tool's intuitive interface and potential for reproducibility make it a valuable asset for both researchers and clinicians. CONCLUSION: The 123VCF filtering tool provides an effective, dependable approach for filtering variants in both research and clinical settings. 
As an open-source tool available at https://project123vcf.sourceforge.io , it is accessible to the global scientific and clinical community, paving the way for the discovery of disease-causing variants and facilitating the advancement of personalized medicine.
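
The streaming, sequence-of-filters idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not 123VCF's actual algorithm (which the abstract says is built in Java): the helper names `parse_info` and `stream_filter`, the thresholds, and the use of the INFO keys `DP` and `AF` with a single value each are all assumptions made for the example.

```python
def parse_info(info):
    """Split an INFO column such as 'DP=20;AF=0.001' into a dict of strings."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def stream_filter(lines, filters):
    """Yield header lines and the data lines that pass every filter in order.

    Lines are processed one at a time, so memory use does not grow with
    file size. Each filter is a callable taking (qual, info_dict).
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = float(fields[5]) if fields[5] != "." else 0.0
        info = parse_info(fields[7])
        if all(check(qual, info) for check in filters):
            yield line


# Hypothetical filter sequence: quality, coverage depth, population frequency.
filters = [
    lambda qual, info: qual >= 30,
    lambda qual, info: int(info.get("DP", 0)) >= 10,
    lambda qual, info: float(info.get("AF", 0)) <= 0.01,
]
```

Because `stream_filter` accepts any iterable of text lines, it could be fed an open plain-text file or a handle from `gzip.open(path, "rt")` for compressed input.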


Subject(s)
Algorithms , Software , Reproducibility of Results , High-Throughput Nucleotide Sequencing
9.
Cancers (Basel) ; 15(24), 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38136357

ABSTRACT

(1) Purpose: To assess the safety and effectivity of stereotactic body radiotherapy (SBRT) on spinal metastases utilizing a simultaneous integrated boost (SIB) concept in oligometastatic cancer patients. (2) Methods: 62 consecutive patients with 71 spinal metastases received SIB-SBRT between 01/2013 and 09/2022 at our institution. We retrospectively analyzed toxicity, local tumor control (LC), and progression-free (PFS) and overall survival (OS) following SIB-SBRT and assessed possible influencing factors (Kaplan-Meier estimator, log-rank test and Cox proportional-hazards model). (3) Results: SIB-SBRT was delivered in five fractions, mostly with 25/40 Gy (n = 43; 60.56%) and 25/35 Gy (n = 19, 26.76%). Estimated rates of freedom from VCF were 96.1/90.4% at one/two years. VCF development was significantly associated with osteoporosis (p < 0.001). No ≥ grade III acute and one grade III late toxicity (VCF) were observed. Estimated LC rates at one/two years were 98.6/96.4%, and histology was significantly associated with local treatment failure (p = 0.039). Median PFS/OS was 10 months (95% CI 6.01-13.99)/not reached. Development of metastases ≥ one year after initial diagnosis and Karnofsky Performance Score ≥ 90% were predictors for superior PFS (p = 0.038) and OS (p = 0.012), respectively. (4) Conclusion: Spinal SIB-SBRT yields low toxicity and excellent LC. It may be utilized in selected oligometastatic patients to improve prognosis. To the best of our knowledge, we provide the first clinical data on the toxicity and effectivity of SIB-SBRT in spinal metastases in a larger patient cohort.

11.
BMC Bioinformatics ; 24(1): 354, 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37735350

ABSTRACT

BACKGROUND: Plummeting DNA sequencing cost in recent years has enabled genome sequencing projects to scale up by several orders of magnitude, which is transforming genomics into a highly data-intensive field of research. This development provides the much needed statistical power required for genotype-phenotype predictions in complex diseases. METHODS: In order to efficiently leverage the wealth of information, we here assessed several genomic data science tools. The rationale to focus on on-premise installations is to cope with situations where data confidentiality and compliance regulations etc. rule out cloud based solutions. We established a comprehensive qualitative and quantitative comparison between BCFtools, SnpSift, Hail, GEMINI, and OpenCGA. The tools were compared in terms of data storage technology, query speed, scalability, annotation, data manipulation, visualization, data output representation, and availability. RESULTS: Tools that leverage sophisticated data structures are noted as the most suitable for large-scale projects in varying degrees of scalability in comparison to flat-file manipulation (e.g., BCFtools, and SnpSift). Remarkably, for small to mid-size projects, even lightweight relational database. CONCLUSION: The assessment criteria provide insights into the typical questions posed in scalable genomics and serve as guidance for the development of scalable computational infrastructure in genomics.


Subject(s)
Data Science , Genomics , Chromosome Mapping , Databases, Factual , Sequence Analysis, DNA
12.
J Clin Med ; 12(11), 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37298048

ABSTRACT

Most studies of vertebral compression fractures (VCF) caused by stereotactic body radiotherapy (SBRT) do not discuss the symptoms of this complication. In this paper, we aimed to determine the rate and prognostic factors of painful VCF caused by SBRT for spinal metastases. Spinal segments with VCF in patients treated with spine SBRT between 2013 and 2021 were retrospectively reviewed. The primary endpoint was the rate of painful VCF (grades 2-3). Patient demographic and clinical characteristics were evaluated as prognosticators. In total, 779 spinal segments in 391 patients were analyzed. The median follow-up after SBRT was 18 (range: 1-107) months. Sixty iatrogenic VCFs (7.7%) were identified. The rate of painful VCF was 2.4% (19/779). Eight (1.0%) VCFs required surgery for internal fixation or spinal canal decompression. The painful VCF rate was significantly higher in patients with no posterolateral tumor involvement than in those with bilateral or unilateral involvement (50% vs. 23%; p = 0.042); it was also higher in patients with spine without fixation than in those with fixation (44% vs. 0%; p < 0.001). Painful VCFs were confirmed in only 2.4% of all the irradiated spinal segments. The absence of posterolateral tumor involvement and no fixation was significantly associated with painful VCF.

13.
BMC Bioinformatics ; 24(1): 121, 2023 Mar 28.
Article in English | MEDLINE | ID: mdl-36978010

ABSTRACT

BACKGROUND: In recent years, advances in high-throughput sequencing technologies have enabled the use of genomic information in many fields, such as precision medicine, oncology, and food quality control. The amount of genomic data being generated is growing rapidly and is expected to soon surpass the amount of video data. The majority of sequencing experiments, such as genome-wide association studies, have the goal of identifying variations in the gene sequence to better understand phenotypic variations. We present a novel approach for compressing gene sequence variations with random access capability: the Genomic Variant Codec (GVC). We use techniques such as binarization, joint row- and column-wise sorting of blocks of variations, as well as the image compression standard JBIG for efficient entropy coding. RESULTS: Our results show that GVC provides the best trade-off between compression and random access compared to the state of the art: it reduces the genotype information size from 758 GiB down to 890 MiB on the publicly available 1000 Genomes Project (phase 3) data, which is 21% less than the state of the art in random-access capable methods. CONCLUSIONS: By providing the best results in terms of combined random access and compression, GVC facilitates the efficient storage of large collections of gene sequence variations. In particular, the random access capability of GVC enables seamless remote data access and application integration. The software is open source and available at https://github.com/sXperfect/gvc/ .


Subject(s)
Data Compression , Data Compression/methods , Algorithms , Genome-Wide Association Study , Genomics/methods , Software , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
14.
Orthop Traumatol Surg Res ; 109(2): 103416, 2023 04.
Article in English | MEDLINE | ID: mdl-36967702

ABSTRACT

INTRODUCTION: Spinal fractures with a split component present specific bone union problems (pseudarthrosis). The purpose of this study was to assess the rate of pseudarthrosis after stand-alone percutaneous kyphoplasties and analyze clinical and radiographic parameters that are predictive of its efficacy in thoracolumbar spine fractures with a split-type of injury. HYPOTHESIS: Stand-alone kyphoplasty results in satisfactory bone union of the treated vertebral body despite the diastasis of fracture fragments. MATERIALS AND METHODS: A retrospective single-center study of 36 patients with posttraumatic monosegmental thoracolumbar vertebral fractures that were classified as either Magerl A2 or A3.2, without any neurologic deficits. Patients were treated with percutaneous kyphoplasty and PMMA bone cement. The assessment included both clinical (visual analog pain scale [VAS] and Oswestry disability index) and radiographic (pseudarthrosis, fracture gap, disk incarceration, vertebral height and length, and vertebral and regional kyphosis) criteria. RESULTS: A total of 36 patients (mean age 58 years) were included, with a mean follow-up of 19.1 months. Five of these patients (14%) had a pseudarthrosis. The fracture gap was significantly greater in these patients than in those who had bone union preoperatively (+3.94 mm, p<0.001) and at the last follow-up consultation (+9.3 mm, p<0.001). There was an association between the incarceration of adjacent disks located above (p=0.008) and below (p=0.003) the fracture site and the pseudarthrosis. The mean VAS decreased significantly on the first postoperative day (p<0.001) and remained lower than the initial assessment until the last follow-up (p<0.001). DISCUSSION: Stabilization by stand-alone kyphoplasty produces good clinical and radiographic results for split fractures, provided that the extent of the fragment diastasis has been carefully assessed preoperatively to prevent the risk of pseudarthrosis.
LEVEL OF EVIDENCE: IV; retrospective.


Subject(s)
Fractures, Compression , Kyphoplasty , Osteoporotic Fractures , Pseudarthrosis , Spinal Fractures , Humans , Middle Aged , Kyphoplasty/methods , Retrospective Studies , Pseudarthrosis/etiology , Pseudarthrosis/surgery , Treatment Outcome , Fractures, Compression/drug therapy , Fractures, Compression/surgery , Lumbar Vertebrae/diagnostic imaging , Lumbar Vertebrae/surgery , Lumbar Vertebrae/injuries , Spinal Fractures/diagnostic imaging , Spinal Fractures/surgery , Bone Cements/therapeutic use , Osteoporotic Fractures/surgery
15.
BMC Musculoskelet Disord ; 24(1): 165, 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36879285

ABSTRACT

BACKGROUND: We evaluated the diagnostic efficacy of deep learning radiomics (DLR) and hand-crafted radiomics (HCR) features in differentiating acute and chronic vertebral compression fractures (VCFs). METHODS: A total of 365 patients with VCFs were retrospectively analysed based on their computed tomography (CT) scan data. All patients completed MRI examination within 2 weeks. There were 315 acute VCFs and 205 chronic VCFs. Deep transfer learning (DTL) features and HCR features were extracted from CT images of patients with VCFs using DLR and traditional radiomics, respectively, and feature fusion was performed to establish the least absolute shrinkage and selection operator. The MRI display of vertebral bone marrow oedema was used as the gold standard for acute VCF, and the model performance was evaluated using the receiver operating characteristic (ROC). To separately evaluate the effectiveness of DLR, traditional radiomics and feature fusion in the differential diagnosis of acute and chronic VCFs, we constructed a nomogram based on the clinical baseline data to visualize the classification evaluation. The predictive power of each model was compared using the Delong test, and the clinical value of the nomogram was evaluated using decision curve analysis (DCA). RESULTS: Fifty DTL features were obtained from DLR, 41 HCR features were obtained from traditional radiomics, and 77 fused features were obtained after feature screening and fusion of the two. The areas under the curve (AUCs) of the DLR model in the training cohort and test cohort were 0.992 (95% confidence interval (CI), 0.983-0.999) and 0.871 (95% CI, 0.805-0.938), respectively, while the AUCs of the conventional radiomics model in the training cohort and test cohort were 0.973 (95% CI, 0.955-0.990) and 0.854 (95% CI, 0.773-0.934), respectively. The AUCs of the features fusion model in the training cohort and test cohort were 0.997 (95% CI, 0.994-0.999) and 0.915 (95% CI, 0.855-0.974), respectively.
The AUCs of nomogram constructed by the features fusion in combination with clinical baseline data were 0.998 (95% CI, 0.996-0.999) and 0.946 (95% CI, 0.906-0.987) in the training cohort and test cohort, respectively. The Delong test showed that the differences between the features fusion model and the nomogram in the training cohort and the test cohort were not statistically significant (P values were 0.794 and 0.668, respectively), and the differences in the other prediction models in the training cohort and the test cohort were statistically significant (P < 0.05). DCA showed that the nomogram had high clinical value. CONCLUSION: The features fusion model can be used for the differential diagnosis of acute and chronic VCFs, and its differential diagnosis ability is improved when compared with that when either radiomics is used alone. At the same time, the nomogram has a high predictive value for acute and chronic VCFs and can be a potential decision-making tool to assist clinicians, especially when a patient is unable to undergo spinal MRI examination.


Subject(s)
Fractures, Compression , Spinal Fractures , Humans , Fractures, Compression/diagnostic imaging , Retrospective Studies , Spinal Fractures/diagnostic imaging , Tomography, X-Ray Computed , Machine Learning
16.
Comput Struct Biotechnol J ; 20: 3729-3733, 2022.
Article in English | MEDLINE | ID: mdl-35891781

ABSTRACT

RNA sequence data are commonly summarized as read counts. By contrast, so far there is no alternative to genotype calling for investigating the relationship between genetic variants determined by next-generation sequencing (NGS) and a phenotype of interest. Here we propose and evaluate the direct analysis of allele counts for genetic association tests. Specifically, we assess the potential advantage of the ratio of alternative allele counts to the total number of reads aligned at a specific position of the genome (coverage) over called genotypes. We simulated association studies based on NGS data from HapMap individuals. Genotype quality scores and allele counts were simulated using NGS data from the Personal Genome Project. Real data from the 1000 Genomes Project was also used to compare the two competing approaches. The average proportions of probability values lower or equal to 0.05 amounted to 0.0496 for called genotypes and 0.0485 for the ratio of alternative allele counts to coverage in the null scenario, and to 0.69 for called genotypes and 0.75 for the ratio of alternative allele counts to coverage in the alternative scenario (9% power increase). The advantage in statistical power of the novel approach increased with decreasing coverage, with decreasing genotype quality and with decreasing allele frequency - 124% power increase for variants with a minor allele frequency lower than 0.05. We provide computer code in R to implement the novel approach, which does not preclude the use of complementary data quality filters before or after identification of the most promising association signals. Author summary: Genetic association tests usually rely on called genotypes. We postulate here that the direct analysis of allele counts from sequence data improves the quality of statistical inference. To evaluate this hypothesis, we investigate simulated and real data using distinct statistical approaches. 
We demonstrate that association tests based on allele counts rather than called genotypes achieve higher statistical power with controlled type I error rates.
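
The quantity at the centre of this abstract, the ratio of alternative allele counts to coverage, can be sketched as follows. The authors provide their own code in R; this Python fragment is only an illustration under stated assumptions: it reads a comma-separated allelic-depth string of the kind commonly stored in the VCF `AD` FORMAT field (reference count first), and it approximates coverage as the sum of those counts. The function name `alt_allele_ratio` is hypothetical.

```python
def alt_allele_ratio(ad_field):
    """Return alternative-allele reads divided by coverage, or None if no reads.

    `ad_field` is a string such as "15,5" (reference count, then one count
    per alternative allele). Treating the sum of these counts as the
    coverage is an assumption made for this sketch.
    """
    counts = [int(c) for c in ad_field.split(",")]
    coverage = sum(counts)
    if coverage == 0:
        return None  # no aligned reads, so the ratio is undefined
    return sum(counts[1:]) / coverage


for ad in ("15,5", "0,12", "0,0"):
    print(ad, alt_allele_ratio(ad))
```

Unlike a called genotype, which takes only a few discrete values, this ratio is continuous, so it could be used directly as the predictor in an association model.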

17.
F1000Res ; 11, 2022.
Article in English | MEDLINE | ID: mdl-35811804

ABSTRACT

In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of (meta-) data in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous machine actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. VCF files are an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant call data (for example, the HapMap format and the gVCF format), but none currently have the reach of VCF. In VCF, only the sites of variation are described, whereas in gVCF, all positions are listed, and confidence values are also provided. For the sake of simplicity, we will only discuss VCF and our recommendations for its use. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifier or recommended content. In practice, often only sparse (if any) descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields are frequently added that have not been agreed upon within the community which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from the plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.


Subject(s)
Metadata , Software , Genotype
18.
Methods Mol Biol ; 2510: 193-216, 2022.
Article in English | MEDLINE | ID: mdl-35776326

ABSTRACT

The long intracellular P2X7 C-terminus accounts for diverse downstream effects of P2X7 activation. Although the recent determination of the cryo-EM structure of the full-length P2X7 receptor finally revealed the structure and several unexpected features of the large cytoplasmic domain, its molecular function remains enigmatic. Incorporation of unnatural amino acids (UAA) via an amber Stop codon has been a powerful tool for structure-function analysis of proteins. Voltage clamp fluorometry (VCF) with the fluorescent unnatural amino acid L-3-(6-acetylnaphthalen-2-ylamino)-2-aminopropanoic acid (ANAP) provides a means to study intracellular domain movements of ion channel receptors. In the Xenopus laevis oocyte expression system, site-specific introduction of this environment-sensitive fluorophore can be achieved by the nuclear injection of cDNA encoding an orthogonal amber suppressor tRNA/aminoacyl-tRNA synthetase pair and subsequent cytoplasmic injection of ANAP together with the respective cRNA containing the amber Stop codon. Here, we describe this protocol for expression of ANAP-labeled P2X7. In addition, we provide a simplified alternative protocol, in which we coinject cRNAs encoding the tRNA synthetase and mutant P2X7 together with the synthesized amber suppressor tRNA and ANAP in one step into the cytosol. We found that the new protocol yielded more reproducible results and was less harmful for the oocytes. By selective fluorescence labeling of the ANAP-labeled P2X7 protein in the oocyte plasma membrane and VCF recordings, we show that this method results in comparable levels of functional ANAP-labeled P2X7 protein.


Subject(s)
Amino Acids , Aminoacyl-tRNA Synthetases , Amino Acids/chemistry , Aminoacyl-tRNA Synthetases/genetics , Aminoacyl-tRNA Synthetases/metabolism , Animals , Termination Codon , Oocytes/metabolism , Transfer RNA/genetics , Purinergic P2X7 Receptors/genetics , Xenopus laevis/genetics , Xenopus laevis/metabolism
19.
Methods Mol Biol ; 2481: 161-172, 2022.
Article in English | MEDLINE | ID: mdl-35641764

ABSTRACT

Structural variants (SVs) are known to have large functional impacts on phenotypes of agricultural interest, but they have yet to be routinely used for GWAS. Apart from the difficulty in obtaining high-quality SV genotype data for large populations, one of the main hurdles to using SVs for GWAS lies in formatting of genotype data for use with popular GWAS programs. This protocol describes how typical SV genotype data can be formatted for input to three GWAS programs commonly used by the plant genetics community: TASSEL, GAPIT, and mrMLM.
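As an illustration of the kind of reformatting involved, the sketch below converts the genotype columns of one VCF record into alternate-allele counts (0, 1, 2), a numeric encoding that GWAS tools commonly accept. This is an assumption-laden example and not the protocol's own procedure: the function names, the example record, and the choice of numeric encoding are all hypothetical, and the exact input layouts expected by TASSEL, GAPIT, and mrMLM are not given in the abstract.

```python
def gt_to_dosage(gt):
    """Map a VCF GT string such as '0/1' or '1|1' to a count of non-reference alleles."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # missing call
    return sum(1 for allele in alleles if allele != "0")

def dosage_row(vcf_line):
    """Return (variant ID, per-sample dosages) for one tab-separated VCF data line.

    Assumes GT is the first colon-separated subfield of each sample column.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    samples = fields[9:]  # columns after FORMAT hold per-sample data
    return fields[2], [gt_to_dosage(s.split(":")[0]) for s in samples]

# Hypothetical SV record with four samples, invented for this example.
record = "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT\t0/0\t0/1\t1|1\t./."
variant_id, dosages = dosage_row(record)
print(variant_id, dosages)
```

Treating every non-reference allele as one dose collapses multi-allelic sites into a single count; a real conversion would need to decide how to handle those sites and how each target program represents missing calls.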


Subject(s)
Genome-Wide Association Study , Genome-Wide Association Study/methods , Genotype , Phenotype
20.
Brief Bioinform ; 23(3), 2022 May 13.
Article in English | MEDLINE | ID: mdl-35438138

ABSTRACT

Since its launch in 2008, the European Genome-Phenome Archive (EGA) has been leading the archiving and distribution of human identifiable genomic data. One community concern is the usability of the stored data: at present, data submitters are not required to perform any quality control (QC) before uploading their data and associated metadata. Here, we present a new File QC Portal developed at EGA, along with QC reports generated for 1 694 442 files [Fastq, sequence alignment map (SAM)/binary alignment map (BAM)/CRAM and variant call format (VCF)] submitted to EGA. QC reports allow anonymous EGA users to view summary-level information about the files within a specific dataset, such as quality of reads, alignment quality, number and type of variants, and other features. Researchers benefit from being able to assess the quality of data before the data access decision, which increases the reusability of the data (https://ega-archive.org/blog/data-upcycling-powered-by-ega/).


Subject(s)
Genome , Genomics , High-Throughput Nucleotide Sequencing , Humans , Metadata , Quality Control , Software