Results 1 - 20 of 507
1.
J Med Libr Assoc ; 112(2): 81-87, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-39119170

ABSTRACT

Background: NYU Langone Health offers a collaborative research block for PGY3 Primary Care residents that employs a secondary data analysis methodology. As discussions of data reuse and secondary data analysis have grown in the data library literature, we sought to understand what attitudes internal medicine residents at a large urban academic medical center had around secondary data analysis. This case report describes a novel survey on resident attitudes around data sharing. Methods: We surveyed internal medicine residents in three tracks: Primary Care (PC), Categorical, and Clinician-Investigator (CI) tracks as part of a larger pilot study on implementation of a research block. All three tracks are in our institution's internal medicine program. In discussions with residency directors and the chief resident, the term "secondary data analysis" was chosen over "data reuse" due to this being more familiar to clinicians, but examples were given to define the concept. Results: We surveyed a population of 162 residents, and 67 residents responded, representing a 41.36% response rate. Strong majorities of residents exhibited positive views of secondary data analysis. Moreover, in our sample, those with exposure to secondary data analysis research opined that secondary data analysis takes less time and is less difficult to conduct compared to the other residents without curricular exposure to secondary analysis. Discussion: The survey reflects that residents believe secondary data analysis is worthwhile and this highlights opportunities for data librarians. As current residents matriculate into professional roles as clinicians, educators, and researchers, libraries have an opportunity to bolster support for data curation and education.


Subject(s)
Attitude of Health Personnel, Internal Medicine, Internship and Residency, Internship and Residency/statistics & numerical data, Humans, Internal Medicine/education, Surveys and Questionnaires, Male, Female, Adult, Information Dissemination/methods
2.
Front Pharmacol ; 15: 1444733, 2024.
Article in English | MEDLINE | ID: mdl-39170704

ABSTRACT

Background and Objective: Chronic atrophic gastritis (CAG) is a complex, multifactorial chronic disease that is frequently encountered in the clinic. The worldwide prevalence of CAG is high. Interestingly, clinical CAG patients often present with a variety of symptom phenotypes, which makes the disease more difficult for clinicians to treat. Therefore, there is an urgent need to improve our understanding of the complexity of the clinical CAG population, obtain more accurate disease subtypes, and explore the relationship between clinical symptoms and medication. Accordingly, based on the integrated platform of complex networks and clinical research, we classified the collected patients with CAG according to their different clinical characteristics and conducted correlation analysis on the classification results to identify more accurate disease subtypes to aid in personalized clinical treatment. Method: Traditional Chinese medicine (TCM) offers an empirical understanding of the clinical subtypes of complicated disorders since TCM therapy is tailored to the patient's symptom profile. We gathered 6,253 TCM clinical electronic medical records (EMRs) from CAG patients and manually annotated, extracted, and preprocessed the data. A shared symptom-patient similarity network (PSN) was created. CAG patient subgroups were established, and their clinical features were determined through enrichment analysis employing community identification methods. Different clinical features of relevant subgroups were correlated based on effectiveness to identify symptom-botanical drug correspondences. Moreover, network pharmacology was employed to identify possible biological relationships between screened symptoms and medications and to identify various clinical and molecular aspects of the key subtypes using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. 
Results: 5,132 patients were included in the study: 2,699 males (52.60%) and 2,433 females (47.41%). The population was divided into 176 modules. We selected the first 3 modules (M29, M3, and M0) to illustrate the characteristic phenotypes and genotypes of CAG disease subtypes. The M29 subgroup was characterized by gastric fullness disease and internal syndrome of turbidity and poison. The M3 subgroup was characterized by epigastric pain and disharmony between the liver and stomach. The M0 subgroup was characterized by epigastric pain and dampness-heat syndrome. In symptom analysis, the top symptoms for symptom improvement in all three subgroups were stomach pain, bloating, insomnia, poor appetite, and heartburn. However, the three subgroups differed. The M29 subgroup was more likely to have stomach distention, anorexia, and palpitations. Citrus medica, Solanum nigrum, Jiangcan, Shan ci mushrooms, and Dillon were the most popular botanical drugs. The M3 subgroup had a higher incidence of yellow urine, a bitter tongue, and stomachaches. Smilax glabra, Cyperus rotundus, Angelica sinensis, Conioselinum anthriscoides, and Paeonia lactiflora were the botanical drugs used. Vomiting, nausea, stomach pain, and appetite loss were common in the M0 subgroup. The primary medications were Scutellaria baicalensis, Smilax glabra, Picrorhiza kurroa, Lilium lancifolium, and Artemisia scoparia. Through GO and KEGG pathway analysis, we found that in the M29 subgroup, Citrus medica, Solanum nigrum, Jiangcan, Shan ci mushrooms, and Dillon may exert their therapeutic effects on the symptoms of gastric distension, anorexia, and palpitations by modulating apoptosis and NF-κB signaling pathways. In the M3 subgroup, Smilax glabra, Cyperus rotundus, Angelica sinensis, Conioselinum anthriscoides, and Paeonia lactiflora may act through the NF-κB and JAK-STAT signaling pathways to treat stomach pain, bitter mouth, and yellow urine. 
In the M0 subgroup, Scutellaria baicalensis, Smilax glabra, Picrorhiza kurroa, Lilium lancifolium, and Artemisia scoparia may exert their therapeutic effects on poor appetite, stomach pain, vomiting, and nausea through the PI3K-Akt signaling pathway. Conclusion: Based on PSN identification and community detection analysis, CAG population division can provide useful recommendations for clinical CAG treatment. This method is useful for CAG illness classification and genotyping investigations and can be used for other complicated chronic diseases.

3.
Am J Hum Genet ; 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39142283

ABSTRACT

The ENIGMA research consortium develops and applies methods to determine clinical significance of variants in hereditary breast and ovarian cancer genes. An ENIGMA BRCA1/2 classification sub-group, formed in 2015 as a ClinGen external expert panel, evolved into a ClinGen internal Variant Curation Expert Panel (VCEP) to align with Food and Drug Administration recognized processes for ClinVar contributions. The VCEP reviewed American College of Medical Genetics and Genomics/Association of Molecular Pathology (ACMG/AMP) classification criteria for relevance to interpreting BRCA1 and BRCA2 variants. Statistical methods were used to calibrate evidence strength for different data types. Pilot specifications were tested on 40 variants and documentation revised for clarity and ease of use. The original criterion descriptions for 13 evidence codes were considered non-applicable or overlapping with other criteria. Scenario of use was extended or re-purposed for eight codes. Extensive analysis and/or data review informed specification descriptions and weights for all codes. Specifications were applied to pilot variants with pre-existing ClinVar classification as follows: 13 uncertain significance or conflicting, 14 pathogenic and/or likely pathogenic, and 13 benign and/or likely benign. Review resolved classification for 11/13 uncertain significance or conflicting variants and retained or improved confidence in classification for the remaining variants. Alignment of pre-existing ENIGMA research classification processes with ACMG/AMP classification guidelines highlighted several gaps in the research processes and the baseline ACMG/AMP criteria. Calibration of evidence strength was key to justify utility and strength of different data types for gene-specific application. The gene-specific criteria demonstrated value for improving ACMG/AMP-aligned classification of BRCA1 and BRCA2 variants.

4.
Front Genet ; 15: 1296797, 2024.
Article in English | MEDLINE | ID: mdl-39036704

ABSTRACT

Objective: Fructose-1,6-bisphosphatase deficiency (FBP1D) is a rare inborn error due to mutations in the FBP1 gene. The genetic spectrum of FBP1D in China is unknown, and nonspecific manifestations complicate diagnosis. We systematically estimated the prevalence of FBP1D in the Chinese population and explored genotype-phenotype associations. Methods: We collected 101 FBP1 variants from our cohort and public resources and manually curated the pathogenicity of these variants. Ninety-seven pathogenic or likely pathogenic variants were used in our cohort to estimate Chinese FBP1D prevalence by three methods: 1) carrier frequency, 2) permutation and combination, 3) Bayesian framework. Allele frequencies (AFs) of these variants in our cohort, the China Metabolic Analytics Project (ChinaMAP), and gnomAD were compared to reveal the different hotspots in Chinese and other populations. Clinical and genetic information of 122 FBP1D patients from our cohort and published literature was collected to analyze genotype-phenotype associations. Phenotypes of 68 hereditary fructose intolerance (HFI) patients from our previous study were used to compare the phenotypic differences between these two fructose metabolism diseases. Results: The estimated Chinese FBP1D prevalence was 1/1,310,034. In the Chinese population, c.490G>A and c.355G>A had significantly higher AFs than in the non-Finnish European population, and c.841G>A had a significantly lower AF than in the South Asian population (all p values < 0.05). The genotype-phenotype association analyses showed that patients carrying homozygous c.841G>A were more likely to present with increased urinary glycerol, those carrying two CNVs (especially homozygous exon 1 deletion) often had hepatic steatosis, those carrying compound heterozygous variants usually had lethargy, and those carrying homozygous variants usually had ketosis and hepatic steatosis (all p values < 0.05). 
Compared with HFI patients, FBP1D patients were more likely to present with hypoglycemia, metabolic acidosis, and seizures (all p values < 0.05). Conclusion: The prevalence of FBP1D in the Chinese population is extremely low. Genetic sequencing could effectively help to diagnose FBP1D.
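
The abstract does not spell out how its carrier-frequency estimate is computed. As a minimal sketch only, the following shows one common way to turn summed pathogenic allele frequencies into a prevalence estimate for an autosomal recessive disorder, assuming Hardy-Weinberg equilibrium. The function names and the allele frequencies are hypothetical and are not taken from the study.

```python
def combined_allele_frequency(allele_freqs):
    """Sum the frequencies of all pathogenic alleles (assumed to be rare and distinct)."""
    return sum(allele_freqs)


def carrier_frequency(allele_freqs):
    """Expected fraction of heterozygous carriers, 2q(1 - q), under Hardy-Weinberg."""
    q = combined_allele_frequency(allele_freqs)
    return 2 * q * (1 - q)


def recessive_prevalence(allele_freqs):
    """Expected fraction of affected individuals, q^2, under Hardy-Weinberg.

    Assumes any pair of pathogenic alleles (homozygous or compound
    heterozygous) causes disease with full penetrance.
    """
    q = combined_allele_frequency(allele_freqs)
    return q * q


# Hypothetical allele frequencies for three pathogenic variants (illustration only).
example_afs = [4.0e-4, 3.0e-4, 1.0e-4]

prevalence = recessive_prevalence(example_afs)
carriers = carrier_frequency(example_afs)
print(f"carrier frequency: {carriers:.6f}")
print(f"prevalence: about 1 in {round(1 / prevalence):,}")
```

Because prevalence scales with the square of the combined allele frequency, small errors in variant curation or in the reference allele frequencies have a large effect on the final estimate, which may be one reason to compare several estimation methods.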

5.
J Cheminform ; 16(1): 82, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39030583

ABSTRACT

PURPOSE: Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the rule rather than the exception. The curation and correction of such incomplete entries is thus an urgent need. METHODS: The SynRBL framework addresses this issue with a dual-strategy: a rule-based method for non-carbon compounds, using atomic symbols and counts for prediction, alongside a Maximum Common Subgraph (MCS)-based technique for carbon compounds, aimed at aligning reactants and products to infer missing entities. RESULTS: The rule-based method exceeded 99% accuracy, while MCS-based accuracy varied from 81.19 to 99.33%, depending on reaction properties. Furthermore, an applicability domain and a machine learning scoring function were devised to quantify prediction confidence. The overall efficacy of this framework was delineated through its success rate and accuracy metrics, which spanned from 89.83 to 99.75% and 90.85 to 99.05%, respectively. CONCLUSION: The SynRBL framework offers a novel solution for recalibrating chemical reactions, significantly enhancing reaction completeness. With rigorous validation, it achieved groundbreaking accuracy in reaction rebalancing. This sets the stage for future improvement in particular of atom-atom mapping techniques as well as of downstream tasks such as automated synthesis planning. SCIENTIFIC CONTRIBUTION: SynRBL features a novel computational approach to correcting unbalanced entries in chemical reaction databases. 
By combining heuristic rules for inferring non-carbon compounds and common subgraph searches to address carbon imbalance, SynRBL successfully addresses most instances of this problem, which affects the majority of data in most large-scale resources. Compared to alternative solutions, SynRBL achieves a dramatic increase in both success rate and accuracy, and provides the first freely available open source solution for this problem.
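
The idea of using atomic symbols and counts to predict a missing non-carbon compound can be illustrated with a small sketch. This is not the SynRBL implementation or its API. It is a hypothetical, simplified example that parses plain molecular formulas (no brackets, charges, or isotopes) and looks up the atom imbalance in a small, assumed table of common by-products.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C2H6O' (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def side_counts(formulas):
    """Total atom counts over all molecules on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total.update(parse_formula(formula))
    return total


def atom_imbalance(reactants, products):
    """Return (atoms missing from products, atoms missing from reactants)."""
    left, right = side_counts(reactants), side_counts(products)
    # Counter subtraction keeps only positive differences.
    return left - right, right - left


# Hypothetical lookup table of small molecules that are often omitted.
KNOWN_SMALL_MOLECULES = {
    name: parse_formula(name) for name in ("H2O", "HCl", "NH3", "H2", "O2")
}


def suggest_missing(imbalance):
    """Names of known small molecules whose atoms exactly match the imbalance."""
    return [name for name, counts in KNOWN_SMALL_MOLECULES.items() if counts == imbalance]


# Esterification recorded without its by-product:
# acetic acid (C2H4O2) + ethanol (C2H6O) -> ethyl acetate (C4H8O2)
missing_products, missing_reactants = atom_imbalance(["C2H4O2", "C2H6O"], ["C4H8O2"])
print(dict(missing_products), suggest_missing(missing_products))
```

An exact-match lookup like this only covers imbalances made of a single known molecule. Cases where carbon atoms are missing need structural reasoning, which is the role the abstract assigns to the MCS-based technique.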

6.
Front Med (Lausanne) ; 11: 1455319, 2024.
Article in English | MEDLINE | ID: mdl-39045419

ABSTRACT

[This corrects the article DOI: 10.3389/fmed.2024.1365501.].

7.
Health Inf Manag ; : 18333583241256049, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39045683

ABSTRACT

In 2022 the Australian Data Availability and Transparency Act (DATA) commenced, enabling accredited "data users" to access data from "accredited data service providers." However, the DATA Scheme lacks guidance on "trustworthiness" of the data to be utilised for reuse purposes. Objectives: To determine: (i) Do researchers using government health datasets trust the data? (ii) What factors influence their perceptions of data trustworthiness? and (iii) What are the implications for government and data custodians? Method: Authors of published studies (2008-2020) that utilised Victorian government health datasets were surveyed via a case study approach. Twenty-eight trust constructs (identified via literature review) were grouped into data factors, management properties and provider factors. Results: Fifty experienced health researchers responded. Most (88%) believed that Victorian government health data were trustworthy. When grouped, data factors and management properties were more important than data provider factors in building trust. The most important individual trust constructs were: "compliant with ethical regulation" (100%) and "monitoring privacy and confidentiality" (98%). Constructs of least importance were knowledge of "participant consent" (56%) and "major focus of the data provider was research" (50%). Conclusion: Overall, the researchers trusted government health data, but data factors and data management properties were more important than data provider factors in building trust. Implications: Government should ensure the DATA Scheme incorporates mechanisms to validate those data utilised by accredited data users and data providers have sufficient quality (intrinsic and extrinsic) to meet the requirements of "trustworthiness," and that evidentiary documentation is provided to support these "accredited data."

8.
Ophthalmic Genet ; : 1-7, 2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39016008

ABSTRACT

PURPOSE: The biallelic variant of MAB21L1 has previously been documented in conjunction with the autosomal recessive cerebellar, ocular, craniofacial, and genital syndrome (COFG). The purpose of this study was to investigate the gene-disease association of MAB21L1 and the newly discovered autosomal dominant (AD) microphthalmia. METHODS: We report the presence of an exceptionally rare missense variant in a single allele of the Arg51 codon of MAB21L1 among four individuals from a single family diagnosed with microphthalmia, suggesting an autosomal dominant inheritance pattern. Subsequently, based on a comprehensive literature review, we identified another 13 families that have reported cases of autosomal dominant microphthalmos. RESULTS: Genotype-phenotype analysis revealed that patients with a single allele missense variant in MAB21L1 exhibited solely eye abnormalities. This starkly diverged from the clinical presentation of COFG, typified by the concurrent occurrence of ocular and extraocular symptoms stemming from the biallelic variant in MAB21L1. Our findings revealed that the heterozygous pathogenic variant in MAB21L1 resulted in the emergence of autosomal dominant microphthalmia. Combining the genetic and experimental evidence, the clinical validity of the association between MAB21L1 and the emerging autosomal dominant microphthalmia can be regarded as moderate. CONCLUSION: In summary, there is sufficient convincing evidence to prove that MAB21L1 is a novel pathogenic gene responsible for autosomal dominant microphthalmia, thus offering valuable insights for precise diagnosis and targeted therapeutic interventions in cases of microphthalmia.

9.
BMC Med ; 22(1): 288, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38987774

ABSTRACT

BACKGROUND: Ethnicity is known to be an important correlate of health outcomes, particularly during the COVID-19 pandemic, where some ethnic groups were shown to be at higher risk of infection and adverse outcomes. The recording of patients' ethnic groups in primary care can support research and efforts to achieve equity in service provision and outcomes; however, the coding of ethnicity is known to present complex challenges. We therefore set out to describe ethnicity coding in detail with a view to supporting the use of this data in a wide range of settings, as part of wider efforts to robustly describe and define methods of using administrative data. METHODS: We describe the completeness and consistency of primary care ethnicity recording in the OpenSAFELY-TPP database, containing linked primary care and hospital records in > 25 million patients in England. We also compared the ethnic breakdown in OpenSAFELY-TPP with that of the 2021 UK census. RESULTS: 78.2% of patients registered in OpenSAFELY-TPP on 1 January 2022 had their ethnicity recorded in primary care records, rising to 92.5% when supplemented with hospital data. The completeness of ethnicity recording was higher for women than for men. The rate of primary care ethnicity recording ranged from 77% in the South East of England to 82.2% in the West Midlands. Ethnicity recording rates were higher in patients with chronic or other serious health conditions. For each of the five broad ethnicity groups, primary care recorded ethnicity was within 2.9 percentage points of the population rate as recorded in the 2021 Census for England as a whole. For patients with multiple ethnicity records, 98.7% of the latest recorded ethnicities matched the most frequently coded ethnicity. Patients whose latest recorded ethnicity was categorised as Other were most likely to have a discordant ethnicity recording (32.2%). 
CONCLUSIONS: Primary care ethnicity data in OpenSAFELY is present for over three quarters of all patients, and combined with data from other sources can achieve a high level of completeness. The overall distribution of ethnicities across all English OpenSAFELY-TPP practices was similar to the 2021 Census, with some regional variation. This report identifies the best available codelist for use in OpenSAFELY and similar electronic health record data.
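
The consistency check described in the results, comparing each patient's latest recorded ethnicity with their most frequently coded ethnicity, can be sketched as follows. This is an illustrative example only, not the OpenSAFELY code. The record layout, patient identifiers, and tie-handling rule (a tie counts as a match if the latest value is among the tied categories) are assumptions.

```python
from collections import Counter


def latest_matches_modal(records):
    """Check whether the latest record agrees with the most frequent category.

    `records` is a list of (iso_date, category) pairs for one patient.
    ISO dates sort correctly as strings. Ties for most frequent are
    treated as a match if the latest category is among the tied ones
    (an assumption made for this sketch).
    """
    latest_category = max(records, key=lambda rec: rec[0])[1]
    counts = Counter(category for _, category in records)
    top = max(counts.values())
    modal_categories = {cat for cat, n in counts.items() if n == top}
    return latest_category in modal_categories


def concordance_rate(patients):
    """Share of patients with multiple records whose latest matches the modal category."""
    multi = [recs for recs in patients.values() if len(recs) > 1]
    if not multi:
        return None
    return sum(latest_matches_modal(recs) for recs in multi) / len(multi)


# Hypothetical records for three patients (illustration only).
example_patients = {
    "p1": [("2015-03-01", "White"), ("2019-06-12", "White"), ("2021-01-05", "White")],
    "p2": [("2012-05-20", "Asian"), ("2016-02-11", "Asian"), ("2020-09-30", "Other")],
    "p3": [("2018-07-07", "Black")],  # single record, excluded from the rate
}

rate = concordance_rate(example_patients)
print(f"concordance among patients with multiple records: {rate:.1%}")
```

Restricting the denominator to patients with more than one record mirrors the wording of the abstract ("for patients with multiple ethnicity records"). A different tie-handling rule would change the rate slightly.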


Subject(s)
Ethnicity, Primary Health Care, State Medicine, Adult, Aged, Female, Humans, Male, Middle Aged, Cohort Studies, England, Ethnicity/statistics & numerical data, Primary Health Care/statistics & numerical data, Infant, Newborn, Infant, Child, Preschool, Child, Adolescent, Young Adult, Aged, 80 and over
10.
Front Neuroinform ; 18: 1385526, 2024.
Article in English | MEDLINE | ID: mdl-38828185

ABSTRACT

There is an increasing desire to study neurodevelopmental disorders (NDDs) together to understand commonalities to develop generic health promotion strategies and improve clinical treatment. Common data elements (CDEs) collected across studies involving children with NDDs afford an opportunity to answer clinically meaningful questions. We undertook a retrospective, secondary analysis of data pertaining to sleep in children with different NDDs collected through various research studies. The objective of this paper is to share lessons learned for data management, collation, and harmonization from a sleep study in children within and across NDDs from large, collaborative research networks in the Ontario Brain Institute (OBI). Three collaborative research networks contributed demographic data and data pertaining to sleep, internalizing symptoms, health-related quality of life, and severity of disorder for children with six different NDDs: autism spectrum disorder; attention deficit/hyperactivity disorder; obsessive compulsive disorder; intellectual disability; cerebral palsy; and epilepsy. Procedures for data harmonization, derivations, and merging were shared and examples pertaining to severity of disorder and sleep disturbances were described in detail. Important lessons emerged from data harmonizing procedures: prioritizing the collection of CDEs to ensure data completeness; ensuring unprocessed data are uploaded for harmonization in order to facilitate timely analytic procedures; the value of maintaining variable naming that is consistent with data dictionaries at time of project validation; and the value of regular meetings with the research networks to discuss and overcome challenges with data harmonization. Buy-in from all research networks involved at study inception and oversight from a centralized infrastructure (OBI) identified the importance of collaboration to collect CDEs and facilitate data harmonization to improve outcomes for children with NDDs.

11.
J Cheminform ; 16(1): 74, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937840

ABSTRACT

This paper presents AutoTemplate, an innovative data preprocessing protocol, addressing the crucial need for high-quality chemical reaction datasets in the realm of machine learning applications in organic chemistry. Recent advances in artificial intelligence have expanded the application of machine learning in chemistry, particularly in yield prediction, retrosynthesis, and reaction condition prediction. However, the effectiveness of these models hinges on the integrity of chemical reaction datasets, which are often plagued by inconsistencies like missing reactants, incorrect atom mappings, and outright erroneous reactions. AutoTemplate introduces a two-stage approach to refine these datasets. The first stage involves extracting meaningful reaction transformation rules and formulating generic reaction templates using a simplified SMARTS representation. This simplification broadens the applicability of templates across various chemical reactions. The second stage is template-guided reaction curation, where these templates are systematically applied to validate and correct the reaction data. This process effectively amends missing reactant information, rectifies atom-mapping errors, and eliminates incorrect data entries. A standout feature of AutoTemplate is its capability to concurrently identify and correct false chemical reactions. It operates on the premise that most reactions in datasets are accurate, using these as templates to guide the correction of flawed entries. The protocol demonstrates its efficacy across a range of chemical reactions, significantly enhancing dataset quality. This advancement provides a more robust foundation for developing reliable machine learning models in chemistry, thereby improving the accuracy of forward and retrosynthetic predictions. 
AutoTemplate marks a significant progression in the preprocessing of chemical reaction datasets, bridging a vital gap and facilitating more precise and efficient machine learning applications in organic synthesis. SCIENTIFIC CONTRIBUTION: The proposed automated preprocessing tool for chemical reaction data aims to identify errors within chemical databases. Specifically, if the errors involve atom mapping or the absence of reactant types, corrections can be systematically applied using reaction templates, ultimately elevating the overall quality of the database.

12.
Mol Genet Metab ; 142(3): 108514, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38905920

ABSTRACT

Phenylketonuria (PKU) is a genetic disorder caused by variations in the phenylalanine hydroxylase (PAH) gene. Among the 3369 reported PAH variants, 33.7% are missense alterations. Unfortunately, 30% of these missense variants are classified as variants of unknown significance (VUS), posing challenges for genetic risk assessment. In our study, we focused on analyzing 836 missense PAH variants following the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines specified by ClinGen PAH Variant Curation Expert Panel (VCEP) criteria. We utilized and compared variant annotator tools like Franklin and Varsome, conducted 3D structural analysis of PAH, and examined active and regulatory site hotspots. In addition, we assessed potential splicing effect of apparent missense variants. By evaluating phenotype data from 22962 PKU patients, our aim was to reassess the pathogenicity of missense variants. Our comprehensive approach successfully reclassified 309 VUSs out of 836 missense variants as likely pathogenic or pathogenic (37%), upgraded 370 likely pathogenic variants to pathogenic, and reclassified one previously considered likely benign variant as likely pathogenic. Phenotypic information was available for 636 missense variants, with 441 undergoing 3D structural analysis and active site hotspot identification for 180 variants. After our analysis, only 6% of missense variants were classified as VUSs, and three of them (c.23A>C/p.Asn8Thr, c.59_60delinsCC/p.Gln20Pro, and c.278A>T/p.Asn93Ile) may be influenced by abnormal splicing. Moreover, a pathogenic variant (c.168G>T/p.Glu56Asp) was identified to have a risk exceeding 98% for modifications of the consensus splice site, with high scores indicating a donor loss of 0.94. 
The integration of ACMG/AMP guidelines with in silico structural analysis and phenotypic data significantly reduced the number of missense VUSs, providing a strong basis for genetic counseling and emphasizing the importance of metabolic phenotype information in variant curation. This study also sheds light on the current landscape of PAH variants.


Subject(s)
Mutation, Missense, Phenotype, Phenylalanine Hydroxylase, Phenylketonurias, Humans, Phenylalanine Hydroxylase/genetics, Phenylalanine Hydroxylase/chemistry, Phenylketonurias/genetics, Phenylketonurias/pathology, Computer Simulation
13.
Genomics Inform ; 22(1): 7, 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38907285

ABSTRACT

This study evaluated large language models (LLMs), particularly GPT-4 with vision (GPT-4V) and GPT-4 Turbo, for annotating biomedical figures, focusing on cellular senescence. We assessed the ability of LLMs to categorize and annotate complex biomedical images to enhance their accuracy and efficiency. Our experiments employed prompt engineering with figures from review articles, achieving more than 70% accuracy for label extraction and approximately 80% accuracy for node-type classification. Challenges were noted in the correct annotation of the relationship between directionality and inhibitory processes, which were exacerbated as the number of nodes increased. Using figure legends allowed more precise identification of sources and targets than using captions, but the legends sometimes lacked pathway details. This study underscores the potential of LLMs in decoding biological mechanisms from text and outlines avenues for improving inhibitory relationship representations in biomedical informatics.

14.
Food Chem X ; 22: 101398, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38694542

ABSTRACT

Since ancient times food has been preserved in vegetable oils for curation. Nevertheless, the transfer of bioactive compounds from these oils to curated foods has not been studied. This research has evaluated the phenolic enrichment of foods curated in olive oil. For this purpose, six foods (fish, vegetables, and cheese) were immersed in olive oil for 30 days and analyzed to determine these antioxidant phenols by LC-MS/MS. Oleuropein aglycone, hydroxytyrosol and tyrosol were the main phenols quantitatively enriched in the foods (up to 42.1, 26.2 and 53.0 mg/kg, respectively). The total phenolic content ranged from 5.8 to 12.1 mg in the evaluated foods taking as reference the recommended daily intake (150 g for fish, 200 g for vegetables, and 50 g for cheese). This research proves the phenolic enrichment of foods curated in olive oil, which can hypothetically increase their antioxidant and bioactive properties.

15.
Mob DNA ; 15(1): 10, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38711146

ABSTRACT

BACKGROUND: The advancement of sequencing technologies results in the rapid release of hundreds of new genome assemblies a year providing unprecedented resources for the study of genome evolution. Within this context, the significance of in-depth analyses of repetitive elements, transposable elements (TEs) in particular, is increasingly recognized in understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential due to the frequent incompleteness of automatically generated consensus sequences. RESULTS: Here, we present an example of a crowd-sourcing effort aimed at curating and annotating TE libraries of two non-model species built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning the methods of curation effectively and empowering the scientific community with high-quality, reusable repeat libraries. CONCLUSIONS: The collaborative manual curation of TEs from two tardigrade species, for which there were no TE libraries available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: A hidden treasure awaits discovery within non-model organisms.

16.
Methods Cell Biol ; 186: 107-130, 2024.
Article in English | MEDLINE | ID: mdl-38705596

ABSTRACT

Mass cytometry permits the high dimensional analysis of cellular systems at single-cell resolution with high throughput in various areas of biomedical research. Here, we provide a state-of-the-art protocol for the analysis of human peripheral blood mononuclear cells (PBMC) by mass cytometry. We focus on the implementation of measures promoting the harmonization of large and complex studies to aid robustness and reproducibility of immune phenotyping data.


Subject(s)
Flow Cytometry, Leukocytes, Mononuclear, Humans, Leukocytes, Mononuclear/cytology, Leukocytes, Mononuclear/immunology, Flow Cytometry/methods, Flow Cytometry/standards, Immunophenotyping/methods, Single-Cell Analysis/methods
17.
BMC Bioinformatics ; 25(1): 184, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724907

ABSTRACT

BACKGROUND: Major advances in sequencing technologies and the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with and especially curating public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, these present limitations that often lead to the need for further in-house curation and processing. RESULTS: Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a Python 3 package designed to accompany and guide the researcher during the curation process of metadata and fastq files of public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from different sources. CONCLUSIONS: Thus, it offers valuable tools for the in-house curation previously needed to re-use public omics data. Due to its workflow structure and capabilities, it can be easily used and benefit investigators in developing novel omics meta-analyses based on sequencing data.


Subject(s)
Data Curation , Software , Workflow , Data Curation/methods , Metadata , Databases, Genetic , Genomics/methods , Computational Biology/methods
18.
J Bioinform Comput Biol ; 22(2): 2450005, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38779780

ABSTRACT

Enzymes catalyze diverse biochemical reactions and are building blocks of cellular and metabolic pathways. Data and metadata of enzymes are distributed across databases and are archived in various formats. The enzyme databases provide utilities for efficient searches and downloading enzyme records in batch mode but do not support organism-specific extraction of subsets of data. Users are required to write scripts for parsing entries for customized data extraction prior to downstream analysis. Integrated Customized Extraction of Enzyme Data (iCEED) has been developed to provide organism-specific customized data extraction utilities for seven commonly used enzyme databases and brings these resources under an integrated portal. iCEED provides dropdown menus and search boxes with a typeahead utility for submission of queries, as well as an enzyme class-based browsing utility. A utility to facilitate mapping and visualization of functionally important features on the three-dimensional (3D) structures of enzymes is integrated. The customized data extraction utilities provided in iCEED are expected to be useful for biochemists, biotechnologists, computational biologists, and life science researchers to build curated datasets of their choice through an easy-to-navigate web-based interface. The integrated feature visualization system is useful for a fine-grained understanding of the enzyme structure-function relationship. Desired subsets of data, extracted and curated using iCEED, can be subsequently used for downstream processing, analyses, and knowledge discovery. iCEED can also be used for training and teaching purposes.


Subject(s)
Databases, Protein , Enzymes , Software , Enzymes/chemistry , Enzymes/metabolism , Computational Biology/methods , User-Computer Interface , Internet
19.
Orphanet J Rare Dis ; 19(1): 213, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38778413

ABSTRACT

BACKGROUND: Vascular anomalies caused by somatic (postzygotic) variants are clinically and genetically heterogeneous diseases with overlapping or distinct entities. The genetic knowledge in this field is rapidly growing, and genetic testing is now part of the diagnostic workup alongside the clinical, radiological and histopathological data. Nonetheless, access to genetic testing is still limited, and there is significant heterogeneity across the approaches used by the diagnostic laboratories, with direct consequences on test sensitivity and accuracy. The clinical utility of genetic testing is expected to increase progressively with improved theragnostics, which will be based on information about the efficacy and safety of the emerging drugs and future molecules. The aim of this study was to make recommendations for optimising and guiding the diagnostic genetic testing for somatic variants in patients with vascular malformations. RESULTS: Physicians and lab specialists from 11 multidisciplinary European centres for vascular anomalies reviewed the genes identified to date as being involved in non-hereditary vascular malformations, evaluated gene-disease associations, and made recommendations about the technical aspects for identification of low-level mosaicism and variant interpretation. A core list of 24 genes was selected based on the current practices in the participating laboratories, the ISSVA classification and the literature. In total, 45 gene-phenotype associations were evaluated: 16 were considered definitive, 16 strong, 3 moderate, 7 limited and 3 with no evidence. CONCLUSIONS: This work provides a detailed evidence-based view of the gene-disease associations in the field of vascular malformations caused by somatic variants. Knowing both the gene-phenotype relationships and the strength of the associations greatly helps laboratories in data interpretation and eventually in the clinical diagnosis. This study reflects the state of knowledge as of mid-2023 and will be regularly updated on the VASCERN-VASCA website (https://vascern.eu/groupe/vascular-anomalies/).


Subject(s)
Genetic Testing , Vascular Malformations , Humans , Genetic Testing/methods , Vascular Malformations/genetics , Vascular Malformations/diagnosis , Vascular Malformations/pathology , Genetic Association Studies
20.
Curr Med Res Opin ; : 1-7, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38700245

ABSTRACT

According to its own description, the biomedical meta-database PubMed exists "with the aim of improving health-both globally and personally." Unfortunately, PubMed contains an increasing amount of low-quality research that may detract from this goal. Currently, PubMed warns its users and protects itself from such problems with a disclaimer stating that the presence of any article, book, or document in PubMed does not imply an endorsement of, or concurrence with, its contents by the NLM, the National Institutes of Health (NIH), or the U.S. Federal Government. However, we are critical of a "disclaimer-only" stance and encourage PubMed to take further action against low-quality research being found and indexed in its database, and thus available for use. To address this problem, we offer two lines of reasoning to argue that PubMed should not function merely as a passive index of health-related research. Instead, we first argue that only trustworthy published research is able to further PubMed's goal of health improvement. Secondly, on the basis of surveys, we argue that researchers place a high level of trust in articles that are referenced in this meta-database. We cannot expect any one set of actors to ensure trustworthy content on PubMed, which requires collective responsibility among authors, peer reviewers, editors, and indexers alike. Instead, we propose a curation-based model that incorporates three mechanisms of collaborative content curation: open expert feedback on indexed content, journal auditing, and constant transparent reassessment of indexed entities.
