Search | BVS Paraguay Regional Portal

1.

Extracting Knowledge from MS Clinical Metabolomic Data: Processing and Analysis Strategies.

Meister, Isabel; Boccard, Julien; Rudaz, Serge.

Methods Mol Biol ; 2855: 539-554, 2025.

Article in English | MEDLINE | ID: mdl-39354326

ABSTRACT

Assessing potential alterations of metabolic pathways using large-scale approaches plays a central role in clinical research today. Because several thousand mass features can be measured for each sample with separation techniques hyphenated to mass spectrometry (MS) detection, adapted strategies have to be implemented to detect altered pathways and help elucidate the mechanisms of pathologies. These procedures include peak detection, sample alignment, normalization, statistical analysis, and metabolite annotation. Considerable advances have been made in recent years in analytics, bioinformatics, and chemometrics, so that massive and complex metabolomic data can be handled more adequately with automated processing and data analysis workflows. Recent developments and remaining challenges related to MS signal processing, metabolite annotation, and biomarker discovery based on statistical models are illustrated in this chapter in light of their application to clinical research.

Subject(s)

Biomarkers, Mass Spectrometry, Metabolomics, Metabolomics/methods, Humans, Mass Spectrometry/methods, Biomarkers/metabolism, Computational Biology/methods, Metabolome, Software

2.

A dental intraoral image dataset of gingivitis for image captioning.

Duy, Hoang Bao; Hue, Tran Thi; Son, Tong Minh; Nghia, Le Long; Lan, Luong Thi Hong; Duc, Nguyen Minh; Son, Le Hoang.

Data Brief ; 57: 110960, 2024 Dec.

Article in English | MEDLINE | ID: mdl-39386321

ABSTRACT

One of the most striking topics in artificial intelligence (AI) is image captioning, which aims to integrate computer vision and natural language processing to create a description for each image. In this paper, we propose a new dataset designed specifically for image captioning in gingivitis diagnosis using deep learning. It includes 1,096 high-resolution intraoral images of 12 anterior teeth and surrounding gingival tissue that were collected under controlled conditions with professional-grade photography equipment. Each image features detailed labels and descriptive captions. The labeling process involved three periodontists with over ten years of experience who assigned Modified Gingival Index (MGI) scores to each tooth in the images, achieving high inter-rater reliability through a rigorous calibration process. Captions were then created by the same periodontists, offering diverse descriptions of gingivitis severity and locations. The dataset is organized into training, validation, and testing subsets for systematic access. This dataset supports the development of advanced image captioning algorithms and is a valuable educational resource for integrating real-world data into dental research and curricula.

3.

Adaptation to space conditions of novel bacterial species isolated from the International Space Station revealed by functional gene annotations and comparative genome analysis.

Szydlowski, Lukasz M; Bulbul, Alper A; Simpson, Anna C; Kaya, Deniz E; Singh, Nitin K; Sezerman, Ugur O; Labaj, Pawel P; Kosciolek, Tomasz; Venkateswaran, Kasthuri.

Microbiome ; 12(1): 190, 2024 Oct 04.

Article in English | MEDLINE | ID: mdl-39363369

ABSTRACT

BACKGROUND: The extreme environment of the International Space Station (ISS) puts selective pressure on microorganisms unintentionally introduced during its 20+ years of service as a low-orbit science platform and human habitat. Such pressure leads to the development of new features not found in their Earth-bound relatives, which enable them to adapt to unfavorable conditions. RESULTS: In this study, we generated the functional annotation of the genomes of five newly identified species of Gram-positive bacteria, four of which are non-spore-forming and one spore-forming, all isolated from the ISS. Using a deep-learning-based tool, deepFRI, we were able to functionally annotate close to 100% of protein-coding genes in all studied species, outperforming other annotation tools. Our comparative genomic analysis highlights common characteristics across all five species and specific genetic traits that appear unique to these ISS microorganisms. Proteome analysis mirrored these genomic patterns, revealing similar traits. The collective annotations suggest adaptations to life in space, including the management of hypoosmotic stress related to microgravity via mechanosensitive channel proteins, increased DNA repair activity to counteract heightened radiation exposure, and the presence of mobile genetic elements enhancing metabolism. In addition, our findings suggest the evolution of certain genetic traits indicative of potential pathogenic capabilities, such as small molecule and peptide synthesis and ATP-dependent transporters. These traits, exclusive to the ISS microorganisms, further substantiate previous reports explaining why microbes exposed to space conditions demonstrate enhanced antibiotic resistance and pathogenicity. CONCLUSION: Our findings indicate that the microorganisms isolated from the ISS that we studied have adapted to life in space. Evidence such as mechanosensitive channel proteins, increased DNA repair activity, as well as metallopeptidases and novel S-layer oxidoreductases suggests a convergent adaptation among these diverse microorganisms, potentially complementing one another within the context of the microbiome. The common genes that facilitate adaptation to the ISS environment may enable bioproduction of essential biomolecules needed during future space missions, or serve as potential drug targets if these microorganisms pose health risks.

Subject(s)

Bacterial Genome, Space Flight, Weightlessness, Bacteria/genetics, Bacteria/classification, Bacteria/isolation & purification, Physiological Adaptation/genetics, Molecular Sequence Annotation, Spacecraft, Proteome, Phylogeny, Genomics, Humans

4.

Tissue ontogeny and chemical composition influence bacterial biodiversity in the wood and shoot tip of Populus nigra.

Bose, T; Mahomed, T G; Mbatha, K C; Joubert, J C; Hammerbacher, A.

Plant Biol (Stuttg) ; 2024 Oct 02.

Article in English | MEDLINE | ID: mdl-39356199

ABSTRACT

Plant-microbe interactions significantly influence plant growth dynamics and adaptability. This study explores the impact of metabolites on microbial biodiversity in shoot tips and wood of Populus nigra under greenhouse conditions, using high-throughput sequencing and metabolite profiling. Branches from P. nigra were harvested, rooted, and transplanted into pots for growth. After 3 months, tissue samples from shoot tips and wood were collected, and metabolites extracted and analysed using GC-MS and LC-MS. Genomic DNA was extracted and subjected to high-throughput sequencing for bacterial biodiversity profiling. Both datasets were analysed using bioinformatic and statistical pipelines. Metabolite profiling indicated that shoot tips had a higher relative abundance of primary and secondary metabolites, including sugars, fatty acids, organic acids, phenolic acid derivatives and salicinoids, while wood was enriched in flavonoids. Bacterial biodiversity also differed significantly between these tissues, with Clostridiales, Bacteroidales and Bacillales dominating in shoot tips, associated with rapid growth and anaerobic fermentation, while wood tissues were characterized by diazotrophs from Rhizobiales, Sphingomonadales and Frankiales. PCoA clustering confirmed tissue-specific microbial differences. Functional analysis revealed an enrichment of fundamental cellular processes in shoot tips, while wood exhibited pathways related to degradation and mortality. Metabolite profiling revealed significant variations in primary and secondary metabolites, highlighting their influence on microbial biodiversity across plant tissues. The dominance of specific bacterial orders and distinct functional pathways in each tissue suggests a tailored microbial response to the unique environments of shoot tips and wood.

5.

Automatic Landmark Annotation and Measurement of 3D Mandibular Morphology Using Non-Rigid Registration: A Preliminary Exploration and Accuracy Assessment.

Chen, Zhewei; Lei, Bowen; Li, Binghang; Ma, Hengyuan; Zhong, Yehong.

Cleft Palate Craniofac J ; : 10556656241288204, 2024 Oct 03.

Article in English | MEDLINE | ID: mdl-39360344

ABSTRACT

This study aimed to develop an automatic methodology for mandibular landmarking and measurement using non-rigid registration, and to analyze the accuracy of automatic landmarking and measurements. Statistical analysis. Digital technology center, tertiary hospital. 130 healthy Chinese adults with equal gender distribution, average age 28.2 ± 5.6 years. Four mean shape mesh templates were generated from 100 head CT scans. Following manual indication of landmarks, these templates were applied for automatic landmark annotation and measurements on mandibles from another 30 head CT scans, using non-rigid iterative closest point registration. Differences in landmark coordinates and measurements between automatic and manual annotation were analyzed using mean difference, centroid size, Euclidean distances and intraclass correlation coefficient (ICC), assessing the accuracy and validity of automatic landmark annotation. The majority of automatic landmarks (16/22) did not exhibit consistent displacement in any specific direction. ICCs of all landmark coordinates exceeded 0.950, with 87.9% larger than 0.990. The average Euclidean distance between manual and automatic landmarks was 2.038 ± 0.947 mm. Most ICCs of linear and angular measurements between manual and automatic annotation (20/26) exceeded 0.900, with the average errors being 1.425 ± 0.973 mm and 2.257 ± 0.649°, respectively. A novel and efficient method for automatic landmark annotation was established based on non-rigid registration. Its credibility and accuracy in mandibular annotation and measurements were demonstrated.
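
The Euclidean-distance comparison between manual and automatic landmarks mentioned in this abstract can be illustrated with a minimal Python sketch. The function name `mean_landmark_error` and the coordinates are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math

def mean_landmark_error(manual, automatic):
    """Average Euclidean distance between paired 3D landmarks.

    `manual` and `automatic` are equal-length lists of (x, y, z) tuples,
    where entry i in both lists refers to the same anatomical landmark.
    """
    if len(manual) != len(automatic) or not manual:
        raise ValueError("landmark lists must be non-empty and equal in length")
    distances = [math.dist(m, a) for m, a in zip(manual, automatic)]
    return sum(distances) / len(distances)

# Hypothetical coordinates in millimetres (illustrative values only).
manual = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
automatic = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
print(mean_landmark_error(manual, automatic))  # 2.5
```

The first pair is 5 mm apart and the second coincides, so the mean error is 2.5 mm.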

6.

Variation in forest root image annotation by experts, novices, and AI.

Handy, Grace; Carter, Imogen; Mackenzie, A Rob; Esquivel-Muelbert, Adriane; Smith, Abraham George; Yaffar, Daniela; Childs, Joanne; Arnaud, Marie.

Plant Methods ; 20(1): 154, 2024 Oct 01.

Article in English | MEDLINE | ID: mdl-39350215

ABSTRACT

BACKGROUND: The manual study of root dynamics using images requires huge investments of time and resources and is prone to annotator bias that has previously been poorly quantified. Artificial intelligence (AI) image-processing tools have been successful in overcoming limitations of manual annotation in homogeneous soils, but their efficiency and accuracy are yet to be widely tested on less homogeneous, non-agricultural soil profiles, e.g., those of forests, from which data on root dynamics are key to understanding the carbon cycle. Here, we quantify variance in root length measured by human annotators with varying experience levels. We evaluate the application of a convolutional neural network (CNN) model, trained using software accessible to researchers without a machine learning background, on a heterogeneous minirhizotron image dataset taken in a multispecies, mature, deciduous temperate forest. RESULTS: Less experienced annotators consistently identified more root length than experienced annotators. Root length annotation also varied between experienced annotators. The CNN root length results were neither precise nor accurate, taking ~10% of the time but significantly overestimating root length compared to expert manual annotation (p = 0.01). The CNN net root length change results were closer to manual (p = 0.08) but there remained substantial variation. CONCLUSIONS: Manual root length annotation is contingent on the individual annotator. The only accessible CNN model cannot yet produce root data of sufficient accuracy and precision for ecological applications when applied to a complex, heterogeneous forest image dataset. Continued evaluation and development of accessible CNNs for natural ecosystems is required.

7.

Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture.

Klein, Jonathan; Waller, Rebekah; Pirk, Sören; Palubicki, Wojtek; Tester, Mark; Michels, Dominik L.

Front Plant Sci ; 15: 1360113, 2024.

Article in English | MEDLINE | ID: mdl-39351023

ABSTRACT

The rise of artificial intelligence (AI) and in particular modern machine learning (ML) algorithms during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML developments, and while promising results on synthetically generated training data have been shown, generating such data is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we instead introduce a structured, quantitative approach. Our evaluation shows superior generalization results when compared to using non-task-specific real training data and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.

8.

The diagnostic utility of CT attenuation values in detecting calcification within costal cartilage.

Wang, Baohong; Dai, Yiqing; Chang, Lufan; Li, Yiyuan; Li, Datao; Xu, Feng; Xu, Zhicheng; Zhang, Qun; Liu, Hao; Chen, Xia; Zhang, Ruhong.

J Plast Reconstr Aesthet Surg ; 99: 103-109, 2024 Sep 20.

Article in English | MEDLINE | ID: mdl-39368266

ABSTRACT

BACKGROUND: To establish and verify diagnostic criteria for the identification of costal cartilage calcification based on computed tomography (CT) attenuation value. METHODS: 360 chest CT slices of 120 patients were reviewed and annotated retrospectively, and a receiver operating characteristic curve was used to evaluate the diagnostic ability of CT attenuation value. Another 20 slices containing calcification were randomly selected and annotated by 4 doctors for further validation. Hematoxylin and eosin and collagen type X (COLX) staining were performed on the residual costal cartilage. RESULTS: In total, 355,129 voxels were detected and 187.5 was confirmed as the optimal CT attenuation value threshold, with a sensitivity of 98.6% and a specificity of 99.7%, for costal cartilage calcification diagnosis. Threshold-based identification of calcification demonstrated a similarity of nearly 80% with specialists' assessments, and exhibited advantages in the identification of subtle calcifications in the further validation. We also observed that CT attenuation values among males demonstrated a centralized distribution, whereas those among females exhibited a bimodal distribution. Calcification identified by the threshold was positive for COLX. CONCLUSIONS: CT attenuation value could validly and reliably diagnose calcification within costal cartilage. Further investigations involving larger cohorts of patients are required to elucidate the risk factors and underlying mechanisms of costal cartilage calcification.
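
How sensitivity and specificity follow from a fixed attenuation threshold can be sketched in a few lines of Python. This is an illustrative sketch only: the voxel values and labels are invented, the helper name `sensitivity_specificity` is hypothetical, and the decision rule (a voxel at or above the threshold counts as calcified) is an assumption, since the abstract does not state the direction of the comparison.

```python
def sensitivity_specificity(values, labels, threshold):
    """Return (sensitivity, specificity) for a simple threshold classifier.

    values: CT attenuation values, one per voxel.
    labels: True where the reference annotation marks calcification.
    Assumption: value >= threshold is classified as calcified.
    """
    tp = fn = fp = tn = 0
    for value, is_calcified in zip(values, labels):
        predicted = value >= threshold
        if is_calcified and predicted:
            tp += 1
        elif is_calcified:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Invented example data; only the threshold 187.5 comes from the abstract.
values = [60, 110, 190, 250, 400, 150]
labels = [False, False, False, True, True, True]
sens, spec = sensitivity_specificity(values, labels, threshold=187.5)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Sweeping `threshold` over a range of values and recording each (sensitivity, specificity) pair gives the points of a receiver operating characteristic curve.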

9.

An integrated 3-M workflow for accelerated annotation of natural products: Flavonoids in Daemonorops draco as a case study.

Fan, Wenxiang; Li, Ziwei; Liu, Longchan; Wang, Yu; Chen, Kaixian; Li, Linnan; Wang, Zhengtao; Yang, Li.

Talanta ; 282: 126921, 2024 Sep 25.

Article in English | MEDLINE | ID: mdl-39368333

ABSTRACT

Efficient annotation and dereplication of metabolites, particularly those from resource-endangered plants lacking reference standards, is crucial for natural products development. Advanced techniques like liquid chromatography-high resolution mass spectrometry (LC-HRMS) have significantly enhanced metabolite characterization. However, challenges such as redundant spectral data, limited reference databases, and inferior dereplication capacity hinder its broad applicability. In this study, we propose an integrated annotation strategy utilizing various computational tools, including mass defect filters (MDF), molecular fingerprints, and molecular networks (3-M strategy). We demonstrate this approach using Daemonorops draco (D. draco), a renowned yet resource-endangered natural product rich in functional flavonoids. By applying pre-defined flavonoid MDF windows, the MS1 peaks were reduced by 85% (from 10,043 to 1,585) in positive mode. Subsequent de novo molecular formula annotation and molecular fingerprint-based structure elucidation were automatically performed using the SIRIUS machine learning platform. Additionally, two complementary clustering tools were incorporated, a feature-based molecular network (FBMN) and a t-distributed stochastic neighbor embedding (t-SNE) molecular network, to efficiently dereplicate metabolites and discover novel flavonoids in D. draco. In total, 108 flavonoids (containing flavones, flavanes, flavanones, chalcones, chalcanes, dihydrochalcones, anthocyanins, homoisoflavanes, homoisoflavanones, and isoflavones), 18 flavone derivatives, and 54 flavone oligomers were identified. Among them, 25 compounds were reported in D. draco for the first time. This 3-M workflow sheds light on the composition of D. draco and validates the effectiveness of our approach, which facilitated the rapid annotation and screening of subclass metabolites in complex natural products.
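
The idea behind a mass defect filter can be sketched as follows. This is a simplified illustration, not the authors' implementation: the window bounds (0.02 to 0.12 Da) and the m/z values are hypothetical, and real MDF windows are often defined as a function of mass rather than as one fixed interval.

```python
def mass_defect(mz):
    """Mass defect: measured m/z minus its nominal (nearest-integer) mass."""
    return mz - round(mz)

def mdf_filter(peaks, low, high):
    """Keep only peaks whose mass defect lies inside [low, high]."""
    return [mz for mz in peaks if low <= mass_defect(mz) <= high]

# Hypothetical MS1 peak list and a hypothetical window for the compound class.
peaks = [271.0601, 285.0405, 300.2897, 413.2662]
kept = mdf_filter(peaks, low=0.02, high=0.12)
print(kept)  # [271.0601, 285.0405]
```

Peaks outside the window are discarded before any formula annotation, which is how a filter of this kind shrinks the candidate list.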

10.

AI-Based Knowledge Extraction from the Bioprinting Literature for Identifying Technology Trends.

Bonatti, Amedeo Franco; Chiarello, Filippo; Vozzi, Giovanni; De Maria, Carmelo.

3D Print Addit Manuf ; 11(4): 1495-1509, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39360130

ABSTRACT

Bioprinting is a rapidly evolving field, as represented by the exponential growth of articles and reviews published each year on the topic. As the number of publications increases, there is a need for an automatic tool that can help researchers do more comprehensive literature analysis, standardize the nomenclature, and so accelerate the development of novel manufacturing techniques and materials for the field. In this context, we propose an automatic keyword annotation model, based on Natural Language Processing (NLP) techniques, that can be used to find insights in the bioprinting scientific literature. The approach is based on two main data sources, the abstracts and related author keywords, which are used to train a composite model based on (i) an embeddings part (using the FastText algorithm), which generates word vectors for an input keyword, and (ii) a classifier part (using the Support Vector Machine algorithm), to label the keyword based on its word vector into a manufacturing technique, employed material, or application of the bioprinted product. The composite model was trained and optimized based on a two-stage optimization procedure to yield the best classification performance. The annotated author keywords were then reprojected on the abstract collection to both generate a lexicon of the bioprinting field and extract relevant information, like technology trends and the relationship between manufacturing-material-application. The proposed approach can serve as a basis for more complex NLP-related analysis toward the automated analysis of the bioprinting literature.

11.

Draft genome sequence of a Lactiplantibacillus pentosus strain isolated from traditionally fermented rice.

Cheruvari, Athira; Kammara, Rajagopal.

Access Microbiol ; 6(10)2024.

Article in English | MEDLINE | ID: mdl-39371602

ABSTRACT

Lactiplantibacillus pentosus is a probiotic bacterium reported to be present in various fermented foods, such as fermented olives, and it significantly influences human health. The present study concerns a lactic acid bacterial strain designated L. pentosus krglsrbmofpi2, isolated from traditional fermented rice, and which has been shown to have an assortment of beneficial attributes. Using Illumina technologies, we have sequenced and investigated the whole genome sequence of L. pentosus krglsrbmofpi2 to understand its functionality and safety. The chromosomal genome was 3.7 Mb in size with 46% GC content and 3192 protein-coding genes. Additional extensive bioinformatics investigations were carried out involving whole genome sequence assembly and annotation.

12.

Modification $d_{x^2-y^2}$ Orbital Electronic States in Nickel-Based Hydroxides Via Cobalt/Iron Co-Doping for High-Efficiency Methanol Electrooxidation.

Li, Junhua; Wu, Chao; Wang, Zhen; Meng, Haoyan; Zhang, Qi; Tang, Ying; Zou, Anqi; Zhang, Yiming; Xi, Shibo; Xue, Junmin; Wang, Xiaopeng; Wu, Jiagang.

Small ; : e2406829, 2024 Oct 06.

Article in English | MEDLINE | ID: mdl-39370665

ABSTRACT

The nickel hydroxide-based (Ni(OH)2) methanol-to-formate electrooxidation reaction (MOR) performance is closely related to the $d_{x^2-y^2}$ orbital electronic states. Hence, optimizing the $d_{x^2-y^2}$ orbital electronic states to achieve enhanced MOR activity is highly desired. Here, cobalt (Co) and iron (Fe) doping are used to modify the $d_{x^2-y^2}$ orbital electronic states. Although both dopants can broaden the $d_{x^2-y^2}$ orbital, Co doping elevates the energy level of the $d_{x^2-y^2}$ highest occupied crystal orbital (HOCO), whereas Fe doping lowers it. This discrepancy in the regulation of $d_{x^2-y^2}$ orbital electronic states stems from the disparate partial electron transfer mechanisms among these transition metal ions, which differ in the energy levels and occupancy of their d orbitals. Motivated by this finding, NiCoFe hydroxide was prepared and exhibited excellent MOR performance. The results showed that the Co dopants effectively suppress the partial electron transfer from Ni to Fe, which, combined with the $d_{x^2-y^2}$ orbital broadening induced by NiO6 octahedra distortion, endows NiCoFe hydroxide with a high $d_{x^2-y^2}$ HOCO and a broad $d_{x^2-y^2}$ orbital. It is believed that this work gives an in-depth understanding of $d_{x^2-y^2}$ orbital electronic states regulation in Ni(OH)2, which is beneficial for designing Ni(OH)2-based catalysts with high MOR performance.

13.

Development of HepatIA: A computed tomography annotation platform and database for artificial intelligence training in hepatocellular carcinoma detection at a Brazilian tertiary teaching hospital.

Rocha, Bruno Aragão; Ferreira, Lorena Carneiro; Vianna, Luis Gustavo Rocha; Ciconelle, Ana Claudia Martins; Cortez Filho, João Martins; Nogueira, Lucas Salume Lima; Silva Filho, Maurício Ricardo Moreira da; Leite, Claudia da Costa; Nomura, Cesar Higar; Cerri, Giovanni Guido; Carrilho, Flair José; Ono, Suzane Kioko.

Clinics (Sao Paulo) ; 79: 100512, 2024 Oct 09.

Article in English | MEDLINE | ID: mdl-39388738

ABSTRACT

BACKGROUND: Hepatocellular carcinoma (HCC) is a prevalent tumor with high mortality rates. Computed tomography (CT) is crucial in the non-invasive diagnosis of HCC. Recent advancements in artificial intelligence (AI) have shown significant potential in medical imaging analysis. However, developing these AI algorithms is hindered by the scarcity of comprehensive, publicly available liver imaging datasets. OBJECTIVES: This study aims to detail the tools, data organization, and database structuring used in creating HepatIA, a medical imaging annotation platform and database at a Brazilian tertiary teaching hospital. HepatIA supports liver disease AI research at the institution. MATERIAL AND METHODS: The authors collected baseline characteristics and CT scans of 656 patients from 2008 to 2021. The database, designed using PostgreSQL and implemented with Django and Vue.js, includes 692 CT volumes from a four-phase abdominal CT protocol. Radiologists made segmentation annotations using the OHIF medical image viewer, incorporating MONAI Label for pre-annotation segmentation models. The annotation process included detailed descriptions of liver morphology and nodule characteristics. RESULTS: The HepatIA database currently includes healthy individuals and those with liver diseases such as HCC and cirrhosis. The database dashboard facilitates user interaction with intuitive plots and histograms. Key patient demographics include 64% males and an average age of 56.89 years. The database supports various filters for detailed searches, enhancing research capabilities. CONCLUSION: A comprehensive data structure was successfully created and integrated with the IT systems of a teaching hospital, enabling research on deep learning algorithms applied to abdominal CT scans for investigating hepatic lesions such as HCC.

14.

SpotGF: Denoising spatially resolved transcriptomics data using an optimal transport-based gene filtering algorithm.

Du, Lin; Kang, Jingmin; Hou, Yong; Sun, Hai-Xi; Zhang, Bohan.

Cell Syst ; 2024 Oct 01.

Article in English | MEDLINE | ID: mdl-39378875

ABSTRACT

Spatially resolved transcriptomics (SRT) combines gene expression profiles with the physical locations of cells in their native states but suffers from unpredictable spatial noise due to cell damage during cryosectioning and exposure to reagents for staining and mRNA release. To address this noise, we developed SpotGF, an algorithm for denoising SRT data using optimal transport-based gene filtering. SpotGF quantifies diffusion patterns numerically, distinguishing widespread expression genes from aggregated expression genes and filtering out the former as noise. Unlike conventional denoising methods, SpotGF preserves raw sequencing data, thereby avoiding false positives that can arise from imputation. Additionally, SpotGF demonstrates superior performance in cell clustering, identifying potential marker genes, and annotating cell types. Overall, SpotGF has the potential to become a crucial preprocessing step in the downstream analysis of SRT data. The SpotGF software is freely available at GitHub. A record of this paper's transparent peer review process is included in the supplemental information.

15.

Characterizing pituitary adenomas in clinical notes: Corpus construction and its application in LLMs.

Hu, Jiahui; Fu, Jin; Zhao, Wanqing; Lou, Pei; Feng, Ming; Ren, Huiling; Feng, Shanshan; Li, Yansheng; Fang, An.

Health Informatics J ; 30(4): 14604582241291442, 2024.

Article in English | MEDLINE | ID: mdl-39379071

ABSTRACT

Objective: Faced with the challenges of differential diagnosis caused by the complex clinical manifestations and high pathological heterogeneity of pituitary adenomas, this study aims to construct a high-quality annotated corpus to characterize pituitary adenomas in clinical notes containing rich diagnosis and treatment information. Methods: A dataset from a pituitary adenoma neurosurgery treatment center of a tertiary first-class hospital in China was retrospectively collected. A semi-automatic corpus construction framework was designed. A total of 2000 documents containing 9430 sentences and 524,232 words were annotated, and the text corpus of pituitary adenomas (TCPA) was constructed and analyzed. Its potential application in large language models (LLMs) was explored through fine-tuning and prompting experiments. Results: TCPA had 4782 medical entities and 28,998 tokens, achieving good quality with an inter-annotator agreement value of 0.862-0.986. The LLM experiments showed that TCPA can be used to automatically identify clinical information from free text, and that introducing instances with clinical characteristics can effectively reduce the need for training data, thereby reducing labor costs. Conclusion: This study characterized pituitary adenomas in clinical notes, and the proposed method can serve as a reference for relevant research in medical natural language scenarios with highly specialized language structure and terminology.

Subject(s)

Natural Language Processing, Pituitary Neoplasms, Humans, Pituitary Neoplasms/diagnosis, China, Retrospective Studies, Adenoma/diagnosis, Electronic Health Records/statistics & numerical data

16.

Fast Context-Aware Analysis of Genome Annotation Colocalization.

Gafurov, Askar; Vinar, Tomás; Medvedev, Paul; Brejová, Brona.

J Comput Biol ; 2024 Oct 09.

Article in English | MEDLINE | ID: mdl-39381845

ABSTRACT

An annotation is a set of genomic intervals sharing a particular function or property. Examples include genes or their exons, sequence repeats, regions with a particular epigenetic state, and copy number variants. A common task is to compare two annotations to determine if one is enriched or depleted in the regions covered by the other. We study the problem of assigning statistical significance to such a comparison based on a null model representing random unrelated annotations. To incorporate more background information into such analyses, we propose a new null model based on a Markov chain that differentiates among several genomic contexts. These contexts can capture various confounding factors, such as GC content or assembly gaps. We then develop a new algorithm for estimating p-values by computing the exact expectation and variance of the test statistic and then estimating the p-value using a normal approximation. Compared to the previous algorithm by Gafurov et al., the new algorithm provides three advances: (1) the running time is improved from quadratic to linear or quasi-linear, (2) the algorithm can handle two different test statistics, and (3) the algorithm can handle both simple and context-dependent Markov chain null models. We demonstrate the efficiency and accuracy of our algorithm on synthetic and real data sets, including the recent human telomere-to-telomere assembly. In particular, our algorithm computed p-values for 450 pairs of human genome annotations using 24 threads in under three hours. Moreover, the use of genomic contexts to correct for GC bias resulted in the reversal of some previously published findings.
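
The final step described in this abstract, turning an expectation and variance of a test statistic into a p-value through a normal approximation, can be sketched generically in Python. This is only an illustration of the approximation itself: the function name and the numbers are hypothetical, and computing the exact expectation and variance under the Markov chain null model, which is the paper's contribution, is not reproduced here.

```python
import math

def normal_approx_p_value(observed, mean, variance, tail="upper"):
    """One-sided p-value for `observed` under a normal approximation.

    `mean` and `variance` are the null-model expectation and variance of
    the test statistic; here they are placeholders supplied by the caller.
    tail="upper" tests for enrichment, tail="lower" for depletion.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    z = (observed - mean) / math.sqrt(variance)
    if tail == "upper":
        return 0.5 * math.erfc(z / math.sqrt(2))
    if tail == "lower":
        return 0.5 * math.erfc(-z / math.sqrt(2))
    raise ValueError("tail must be 'upper' or 'lower'")

# Hypothetical numbers: observed overlap 1196 against a null mean of 1000
# and null variance of 10000 (standard deviation 100), i.e. z = 1.96.
print(normal_approx_p_value(1196, 1000, 10000))
```

With z = 1.96 the upper-tail p-value is close to 0.025, the familiar one-sided value for that z-score.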

17.

BDPapayaLeaf: A dataset of papaya leaf for disease detection, classification, and analysis.

Mustofa, Sumaya; Ahad, Md Taimur; Emon, Yousuf Rayhan; Sarker, Arpita.

Data Brief ; 57: 110910, 2024 Dec.

Article in English | MEDLINE | ID: mdl-39381009

ABSTRACT

Papaya is a popular vegetable and fruit in both developing and developed countries, and papaya cultivation significantly influences Bangladesh's agricultural landscape. However, disease is a common impediment to papaya productivity, adversely affecting papaya quality and yield and leading to substantial economic losses for farmers. Research suggests that computer-aided disease diagnosis and machine learning (ML) models can improve papaya production by detecting and classifying diseases. To this end, a papaya dataset is required to diagnose the disease. Moreover, like many other fruits, papaya disease may vary from country to country. Therefore, a country-based papaya disease dataset is required. In this study, a papaya dataset is collected from Dhaka, Bangladesh. This dataset contains 2159 original images from five classes, including the healthy control class and four papaya leaf diseases: Anthracnose, Bacterial Spot, Curl, and Ring spot. Besides the original images, the dataset contains 210 annotated images for each of the five classes. The dataset thus contains two types of data: the whole image and the annotated image. The whole images will interest data scientists who apply disease detection through a convolutional neural network (CNN) and its variants. Furthermore, the annotated images will be helpful for detection and semantic segmentation with models such as You Only Look Once (YOLO), U-Net, Mask R-CNN, and Single Shot Detection (SSD). Since firm-applicable AI devices and mobile and web applications are in demand, the dataset collected in this study will offer multiple options for integrating ML models into AI devices. In countries with weather and climate similar to Bangladesh, data scientists may use this dataset in that context.

18.

High-throughput protein characterization by complementation using DNA barcoded fragment libraries.

Biggs, Bradley W; Price, Morgan N; Lai, Dexter; Escobedo, Jasmine; Fortanel, Yuridia; Huang, Yolanda Y; Kim, Kyoungmin; Trotter, Valentine V; Kuehl, Jennifer V; Lui, Lauren M; Chakraborty, Romy; Deutschbauer, Adam M; Arkin, Adam P.

Mol Syst Biol ; 2024 Oct 07.

Article in English | MEDLINE | ID: mdl-39375541

ABSTRACT

Our ability to predict, control, or design biological function is fundamentally limited by poorly annotated gene function. This can be particularly challenging in non-model systems. Accordingly, there is motivation for new high-throughput methods for accurate functional annotation. Here, we used complementation of auxotrophs and DNA barcode sequencing (Coaux-Seq) to enable high-throughput characterization of protein function. Fragment libraries from eleven genetically diverse bacteria were tested in twenty different auxotrophic strains of Escherichia coli to identify genes that complement missing biochemical activity. We recovered 41% of expected hits, with effectiveness varying by source genome, and observed success even with distant E. coli relatives like Bacillus subtilis and Bacteroides thetaiotaomicron. Coaux-Seq provided the first experimental validation for 53 proteins, of which 11 are less than 40% identical to an experimentally characterized protein. Among the unexpected functions identified were a sulfate uptake transporter, an O-succinylhomoserine sulfhydrylase for methionine synthesis, and an aminotransferase. We also identified instances of cross-feeding wherein protein overexpression and nearby non-auxotrophic strains enabled growth. Altogether, these results demonstrate Coaux-Seq's utility, with future applications in ecology, health, and engineering.

19.

Improved liver fat and $R_2^{\ast}$ quantification at 0.55 T using locally low-rank denoising.

Shih, Shu-Fu; Tasdelen, Bilal; Yagiz, Ecrin; Zhang, Zhaohuan; Zhong, Xiaodong; Cui, Sophia X; Nayak, Krishna S; Wu, Holden H.

Magn Reson Med ; 2024 Oct 09.

Article in English | MEDLINE | ID: mdl-39385473

ABSTRACT

PURPOSE: To improve liver proton density fat fraction (PDFF) and $R_2^{\ast}$ quantification at 0.55 T by systematically validating the acquisition parameter choices and investigating the performance of locally low-rank denoising methods. METHODS: A Monte Carlo simulation was conducted to design a protocol for PDFF and $R_2^{\ast}$ mapping at 0.55 T. Using this proposed protocol, we investigated the performance of robust locally low-rank (RLLR) and random matrix theory (RMT) denoising. In a reference phantom, we assessed quantification accuracy (concordance correlation coefficient [$\rho_c$] vs. reference values) and precision (using SD) across scan repetitions. We performed in vivo liver scans (11 subjects) and used regions of interest to compare means and SDs of PDFF and $R_2^{\ast}$ measurements. Kruskal-Wallis and Wilcoxon signed-rank tests were performed (p < 0.05 considered significant). RESULTS: In the phantom, RLLR and RMT denoising improved accuracy in PDFF and $R_2^{\ast}$ with $\rho_c$ > 0.992 and improved precision with >67% decrease in SD across 50 scan repetitions versus conventional reconstruction (i.e., no denoising). For in vivo liver scans, the mean PDFF and mean $R_2^{\ast}$ were not significantly different between the three methods (conventional reconstruction; RLLR and RMT denoising). Without denoising, the SDs of PDFF and $R_2^{\ast}$ were 8.80% and 14.17 s$^{-1}$. RLLR denoising significantly reduced the values to 1.79% and 5.31 s$^{-1}$ (p < 0.001); RMT denoising significantly reduced the values to 2.00% and 4.81 s$^{-1}$ (p < 0.001). CONCLUSION: We validated an acquisition protocol for improved PDFF and $R_2^{\ast}$ quantification at 0.55 T. Both RLLR and RMT denoising improved the accuracy and precision of PDFF and $R_2^{\ast}$ measurements.
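Editorial illustration (not part of the abstract): the accuracy metric named above, the concordance correlation coefficient, can be sketched as follows. This is a minimal sketch assuming Lin's standard definition, $\rho_c = 2 s_{xy} / (s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2)$, with population (1/n) variances; the abstract does not state which estimator the authors used. The function name `concordance_correlation` and the sample values are hypothetical.

```python
from statistics import fmean


def concordance_correlation(measured, reference):
    """Lin's concordance correlation coefficient for two paired samples.

    Assumption: population (1/n) variances and covariance are used.
    """
    if len(measured) != len(reference) or len(measured) < 2:
        raise ValueError("need two paired samples of equal length (>= 2)")
    n = len(measured)
    mean_m = fmean(measured)
    mean_r = fmean(reference)
    # Spread of each sample around its own mean.
    var_m = sum((m - mean_m) ** 2 for m in measured) / n
    var_r = sum((r - mean_r) ** 2 for r in reference) / n
    # Joint variation of the paired values.
    cov = sum((m - mean_m) * (r - mean_r)
              for m, r in zip(measured, reference)) / n
    # The squared mean difference penalizes systematic bias.
    denom = var_m + var_r + (mean_m - mean_r) ** 2
    if denom == 0:
        raise ValueError("both samples are constant and equal; undefined")
    return 2 * cov / denom


# Hypothetical values, for illustration only.
reference_values = [0.0, 10.0, 20.0, 30.0]
biased_values = [v + 5.0 for v in reference_values]
rho_c_exact = concordance_correlation(reference_values, reference_values)
rho_c_biased = concordance_correlation(biased_values, reference_values)
```

Unlike Pearson correlation, which stays at 1 under a constant offset, this coefficient drops below 1 for the biased sample, which is why it suits comparisons against reference values.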

20.

MultiSC: a deep learning pipeline for analyzing multiomics single-cell data.

Lin, Xiang; Jiang, Siqi; Gao, Le; Wei, Zhi; Wang, Junwen.

Brief Bioinform ; 25(6), 2024 Sep 23.

Article in English | MEDLINE | ID: mdl-39376034

ABSTRACT

Single-cell technologies enable researchers to investigate cell functions at the level of individual cells and to study cellular processes at higher resolution. Several multi-omics single-cell sequencing techniques have been developed to explore various aspects of cellular behavior. NEAT-seq, for example, simultaneously obtains three kinds of omics data for each cell: gene expression, chromatin accessibility, and protein expression of transcription factors (TFs). Consequently, NEAT-seq offers a more comprehensive understanding of cellular activities in multiple modalities. However, there is a lack of tools for effectively integrating the three types of omics data. To address this gap, we propose a novel pipeline called MultiSC for the analysis of MULTIomic Single-Cell data. Our pipeline leverages a multimodal constraint autoencoder (single-cell hierarchical constraint autoencoder) to integrate the multi-omics data during the clustering process and a matrix factorization-based model (scMF) to predict target genes regulated by a TF. Moreover, we use multivariate linear regression models to predict gene regulatory networks from the multi-omics data. Additional functionalities, including differential expression, mediation analysis, and causal inference, are also incorporated into the MultiSC pipeline. Extensive experiments were conducted to evaluate the performance of MultiSC. The results demonstrate that our pipeline enables researchers to gain a comprehensive view of cell activities and gene regulatory networks by fully leveraging the potential of multiomics single-cell data. By employing MultiSC, researchers can effectively integrate and analyze diverse omics data types, enhancing their understanding of cellular processes.

Subject(s)

Deep Learning , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Gene Regulatory Networks , Computational Biology/methods , Multiomics
