1.
Life (Basel) ; 13(1)2023 Jan 04.
Article in English | MEDLINE | ID: mdl-36676093

ABSTRACT

The skin is the human body's largest organ, and skin cancer is considered among the most dangerous kinds of cancer. Various pathological variations in the human body can cause abnormal cell growth due to genetic disorders. These changes in human skin cells are very dangerous. Skin cancer slowly spreads to other parts of the body, and because of its high mortality rate, early diagnosis is essential. Determining skin cancer through visual checkup and manual examination of skin lesions is very difficult. Considering these concerns, numerous early recognition approaches have been proposed for skin cancer. With the fast progress in computer-aided diagnosis systems, a variety of deep learning, machine learning, and computer vision approaches have been combined for the analysis of medical samples and uncommon skin lesion samples. This research provides an extensive literature review of the methodologies, techniques, and approaches applied for the examination of skin lesions to date. This survey includes preprocessing, segmentation, feature extraction, selection, and classification approaches for skin cancer recognition. The results of these approaches are very impressive, but some challenges remain in the analysis of skin lesions because of complex and rare features. Hence, the main objective is to examine the existing techniques used in the detection of skin cancer and to identify the obstacles, helping researchers contribute to future research.

2.
Curr Oncol ; 29(10): 7498-7511, 2022 10 07.
Article in English | MEDLINE | ID: mdl-36290867

ABSTRACT

The automated classification of brain tumors plays an important role in supporting radiologists in decision making. Recently, vision transformer (ViT)-based deep neural network architectures have gained attention in the computer vision research domain owing to the tremendous success of transformer models in natural language processing. Hence, in this study, the ability of an ensemble of standard ViT models for the diagnosis of brain tumors from T1-weighted (T1w) magnetic resonance imaging (MRI) is investigated. ViT models (B/16, B/32, L/16, and L/32) pretrained and fine-tuned on ImageNet were adopted for the classification task. A brain tumor dataset from figshare, consisting of 3064 T1w contrast-enhanced (CE) MRI slices with meningiomas, gliomas, and pituitary tumors, was used for the cross-validation and testing of the ensemble ViT model's ability to perform a three-class classification task. The best individual model was L/32, with an overall test accuracy of 98.2% at 384 × 384 resolution. The ensemble of all four ViT models demonstrated an overall testing accuracy of 98.7% at the same resolution, outperforming the individual models at both resolutions and their ensemble at 224 × 224 resolution. In conclusion, an ensemble of ViT models could be deployed for the computer-aided diagnosis of brain tumors based on T1w CE MRI, easing the workload of radiologists.
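
The abstract does not say how the four models' outputs were combined. As a minimal sketch, assuming the common approach of averaging per-class softmax probabilities, an ensemble prediction for one slice could look like the following. The function name `ensemble_predict` and the probability values are hypothetical illustrations, not taken from the paper.

```python
from typing import Sequence

CLASSES = ["meningioma", "glioma", "pituitary"]


def ensemble_predict(model_probs: Sequence[Sequence[float]]) -> int:
    """Average per-class probabilities across models; return the argmax class index."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    mean_probs = [
        sum(probs[c] for probs in model_probs) / n_models for c in range(n_classes)
    ]
    return max(range(n_classes), key=mean_probs.__getitem__)


# Hypothetical softmax outputs of four models for a single MRI slice.
slice_probs = [
    [0.50, 0.40, 0.10],
    [0.30, 0.60, 0.10],
    [0.20, 0.70, 0.10],
    [0.45, 0.45, 0.10],
]
predicted = CLASSES[ensemble_predict(slice_probs)]
```

In this toy input the first model alone would vote for meningioma, while the averaged probabilities favor glioma, which is the kind of disagreement an ensemble is meant to smooth out.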


Subject(s)
Brain Neoplasms , Glioma , Humans , Magnetic Resonance Imaging/methods , Brain Neoplasms/diagnostic imaging , Neural Networks, Computer
3.
Comput Intell Neurosci ; 2022: 1672677, 2022.
Article in English | MEDLINE | ID: mdl-35965760

ABSTRACT

Hypertension is the main cause of blood pressure (BP), which further causes various cardiovascular diseases (CVDs). The recent COVID-19 pandemic raised the burden on the healthcare system and also limited resources to these patients only. The treatment of chronic patients, especially those who suffer from CVD, has fallen behind, resulting in increased deaths from CVD around the world. Regular monitoring of BP is crucial to prevent CVDs, as it can be controlled and diagnosed through constant monitoring. To find an effective and convenient procedure for the early diagnosis of CVDs, photoplethysmography (PPG) is recognized as a low-cost technology. Through PPG technology, various cardiovascular parameters, including blood pressure, heart rate, and blood oxygen saturation, are detected. Merging the healthcare domain with information technology (IT) is a demanding area to reduce the rehospitalization of CVD patients. In the proposed model, PPG signals from Internet of things (IoT)-enabled wearable patient monitoring (WPM) devices are used to monitor the heart rate (HR), etc., of the patients remotely. This article investigates various machine learning techniques such as decision tree (DT), naïve Bayes (NB), and support vector machine (SVM) and the deep learning model one-dimensional convolutional neural network-long short-term memory (1D CNN-LSTM) to develop a system that assists physicians during continuous monitoring, which achieved an accuracy of 99.5% using the PPG-BP data set. The proposed system provides a cost-effective, efficient, and fully connected monitoring system for cardiac patients.


Subject(s)
COVID-19 , Cardiovascular Diseases , Bayes Theorem , COVID-19/diagnosis , Cardiovascular Diseases/diagnosis , Cloud Computing , Humans , Machine Learning , Pandemics , Photoplethysmography/methods
4.
Comput Intell Neurosci ; 2022: 5475313, 2022.
Article in English | MEDLINE | ID: mdl-35602638

ABSTRACT

Machine learning (ML) often provides applicable high-performance models to facilitate decision-makers in various fields. However, this high performance is achieved at the expense of the interpretability of these models, which has been criticized by practitioners and has become a significant hindrance in their application. Therefore, in highly sensitive decisions, black boxes of ML models are not recommended. We proposed a novel methodology that uses complex supervised ML models and transforms them into simple, interpretable, transparent statistical models. This methodology is like stacking ensemble ML in which the best ML models are used as a base learner to compute relative feature weights. The index of these weights is further used as a single covariate in the simple logistic regression model to estimate the likelihood of an event. We tested this methodology on the primary dataset related to cardiovascular diseases (CVDs), the leading cause of mortalities in recent times. Therefore, early risk assessment is an important dimension that can potentially reduce the burden of CVDs and their related mortality through accurate but interpretable risk prediction models. We developed an artificial neural network and support vector machines based on ML models and transformed them into a simple statistical model and heart risk scores. These simplified models were found transparent, reliable, valid, interpretable, and approximate in predictions. The findings of this study suggest that complex supervised ML models can be efficiently transformed into simple statistical models that can also be validated.
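
One possible reading of the two-stage idea above is sketched below: feature weights from a base learner are collapsed into a single index, and that index is the only covariate of a logistic model. This is an illustration under stated assumptions, not the authors' procedure. The weights, feature values, intercept, and slope are all hypothetical, and normalizing the weights to sum to one is an assumption.

```python
import math
from typing import Dict


def risk_index(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of features, with weights normalized to sum to 1 (assumption)."""
    total = sum(weights.values())
    return sum(weights[name] / total * features[name] for name in weights)


def event_probability(index: float, intercept: float, slope: float) -> float:
    """Simple logistic regression with the index as its single covariate."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * index)))


# Hypothetical relative feature weights from a base ML model.
weights = {"age": 2.0, "smoker": 1.0, "bp": 1.0}
# Hypothetical features for one subject, scaled to the range 0..1.
subject = {"age": 0.5, "smoker": 1.0, "bp": 0.0}

index = risk_index(subject, weights)  # 0.5*0.5 + 0.25*1.0 + 0.25*0.0
probability = event_probability(index, intercept=-1.0, slope=2.0)
```

In a real analysis the intercept and slope would be estimated from data rather than fixed by hand.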


Subject(s)
Cardiovascular Diseases , Supervised Machine Learning , Humans , Machine Learning , Neural Networks, Computer , Risk Factors , Support Vector Machine
5.
BMC Bioinformatics ; 22(Suppl 9): 105, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-34433410

ABSTRACT

BACKGROUND: Many systems biology studies leverage the integration of multiple data types (across different data sources) to offer a more comprehensive view of the biological system being studied. While SQL (Structured Query Language) databases are popular in the biomedical domain, NoSQL database technologies have been used as a more relationship-based, flexible and scalable method of data integration. RESULTS: We have created a graph database integrating data from multiple sources. In addition to using a graph-based query language (Cypher) for data retrieval, we have developed a web-based dashboard that allows users to easily browse and plot data without the need to learn Cypher. We have also implemented a visual graph query interface for users to browse graph data. Finally, we have built a prototype to allow the user to query the graph database in natural language. CONCLUSION: We have demonstrated the feasibility and flexibility of using a graph database for storing and querying immunological data with complex biological relationships. Querying a graph database through such relationships has the potential to discover novel relationships among heterogeneous biological data and metadata.


Subject(s)
Information Storage and Retrieval , Semantic Web , Databases, Factual , Language , Systems Biology
6.
Interdiscip Sci ; 13(2): 201-211, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33675528

ABSTRACT

BACKGROUND: In the broader healthcare domain, prediction bears more value than explanation, considering the cost of delays in services. There are various risk prediction models for cardiovascular diseases (CVDs) in the literature for early risk assessment. However, the substantial increase in CVD-related mortality is challenging global health systems, especially in developing countries. This situation allows researchers to improve CVD prediction models using new features and risk computing methods. This study aims to assess nonclinical features, which can be easily available in any healthcare system, in predicting CVDs using advanced and flexible machine learning (ML) algorithms. METHODS: A gender-matched case-control study was conducted in the largest public sector cardiac hospital of Pakistan, and the data of 460 subjects were collected. The dataset comprised eight nonclinical features. Four supervised ML algorithms were used to train and test the models to predict CVD status, with traditional logistic regression (LR) as the baseline model. The models were validated through the train-test split (70:30) and tenfold cross-validation approaches. RESULTS: Random forest (RF), a nonlinear ML algorithm, performed better than the other ML algorithms and LR. The area under the curve (AUC) of RF was 0.851 and 0.853 in the train-test split and tenfold cross-validation approaches, respectively. The nonclinical features yielded an admissible accuracy (minimum 71%) through the LR and ML models, exhibiting their predictive capability in risk estimation. CONCLUSION: The satisfactory performance of nonclinical features reveals that these features and flexible computational methodologies can reinforce the existing risk prediction models for better healthcare services.
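
For readers unfamiliar with tenfold cross-validation, the sketch below shows how 460 subjects could be partitioned into ten disjoint test folds, each paired with the remaining subjects as a training set. It is a generic illustration. The helper name `k_fold_indices`, the shuffling seed, and the absence of stratification are assumptions, and the study's actual splitting procedure is not described in the abstract.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k shuffled, disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 460 subjects, as in the study; each fold holds 46 test subjects.
splits = list(k_fold_indices(460, k=10))
```

Each subject appears in exactly one test fold, so every observation is used for evaluation once and for training nine times.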


Subject(s)
Cardiovascular Diseases , Case-Control Studies , Humans , Logistic Models , Machine Learning
7.
J Public Health Res ; 9(4): 1893, 2020 Oct 14.
Article in English | MEDLINE | ID: mdl-33244464

ABSTRACT

Background: Modifiable risk factors are associated with cardiovascular mortality (CVM), which is a leading form of global mortality. However, the diverse nature of urbanization and its objective measurement can modify their relationship. This study aims to investigate the moderating role of urbanization in the relationship between combined exposure (CE) to modifiable risk factors and CVM. Design and Methods: This is the first comprehensive study that considers different forms of urbanization to gauge its manifold impact. Therefore, in addition to the existing original quantitative form and the traditional two categories of urbanization, a new form consisting of four levels of urbanization was introduced. This study used data from 129 countries, mainly retrieved from a WHO report, Non-Communicable Diseases Country Profile 2014. Factor scores obtained through confirmatory factor analysis were used to compute the CE. An age-income adjusted regression model for CVM was tested as a baseline, with three bootstrap regression models developed for the three forms of urbanization. Results: Results revealed that the baseline relationship between CE and CVM was significantly moderated by the original quantitative form of urbanization. In contrast, the two traditional categories of urbanization could not capture the moderating impact. However, the four levels of urbanization objectively estimated the urbanization impact and indicated that the CE was more alarming in causing CVM in level 2 and level 3 urbanized countries, mainly low-middle-income countries. Conclusion: This study concluded that urbanization is a strong moderator and that it can be gauged effectively through four levels, whereas the sufficiency of the two traditional categories of urbanization is questionable.

8.
Diagnostics (Basel) ; 10(8)2020 Aug 06.
Article in English | MEDLINE | ID: mdl-32781795

ABSTRACT

Manual identification of brain tumors is an error-prone and tedious process for radiologists; therefore, it is crucial to adopt an automated system. Binary classification, such as malignant versus benign, is relatively trivial, whereas multimodal brain tumor classification (T1, T2, T1CE, and Flair) is a challenging task for radiologists. Here, we present an automated multimodal classification method using deep learning for brain tumor type classification. The proposed method consists of five core steps. In the first step, linear contrast stretching is employed using edge-based histogram equalization and discrete cosine transform (DCT). In the second step, deep learning feature extraction is performed. By utilizing transfer learning, two pre-trained convolutional neural network (CNN) models, namely VGG16 and VGG19, were used for feature extraction. In the third step, a correntropy-based joint learning approach was implemented along with the extreme learning machine (ELM) for the selection of the best features. In the fourth step, the partial least square (PLS)-based robust covariant features were fused in one matrix. The combined matrix was fed to the ELM for final classification. The proposed method was validated on the BraTS datasets, and accuracies of 97.8%, 96.9%, and 92.5% were achieved for BraTS2015, BraTS2017, and BraTS2018, respectively.

9.
J Trauma Acute Care Surg ; 89(4): 736-742, 2020 10.
Article in English | MEDLINE | ID: mdl-32773672

ABSTRACT

BACKGROUND: Trauma patients admitted to critical care are at high risk of mortality because of their injuries. Our aim was to develop a machine learning-based model to predict mortality using the Fahad-Liaqat-Ahmad Intensive Machine (FLAIM) framework. We hypothesized that machine learning could be applied to critically ill patients and would outperform currently used mortality scores. METHODS: The current Deep-FLAIM model evaluates the statistically significant risk factors and then supplies these risk factors to a deep neural network (DNN) to predict mortality in trauma patients admitted to the intensive care unit (ICU). We analyzed adult patients (≥18 years) admitted to the trauma ICU in the publicly available database Medical Information Mart for Intensive Care III version 1.4. The first-phase selection of risk factors was done using Cox regression univariate and multivariate analyses. In the second phase, we applied the DNN and other traditional machine learning models such as Linear Discriminant Analysis, Gaussian Naïve Bayes, Decision Tree Model, and k-nearest neighbor models. RESULTS: We identified a total of 3,041 trauma patients admitted to the trauma surgery ICU. We observed that several clinical and laboratory-based variables were statistically significant in both univariate and multivariate analyses, while others were not. The most significant laboratory variables were serum anion gap (hazard ratio [HR], 2.46; 95% confidence interval [CI], 1.94-3.11), sodium (HR, 2.11; 95% CI, 1.61-2.77), and chloride (HR, 2.11; 95% CI, 1.69-2.64) abnormalities, while clinical variables included the diagnosis of sepsis (HR, 2.03; 95% CI, 1.23-3.37), Quick Sequential Organ Failure Assessment score (HR, 1.52; 95% CI, 1.32-3.76), and Systemic Inflammatory Response Syndrome criteria (HR, 1.41; 95% CI, 1.24-1.26). After we used these clinically significant variables and applied various machine learning models to the data, we found that our proposed DNN outperformed all the other methods, with test set accuracy of 92.25%, sensitivity of 79.13%, and specificity of 94.16%; positive predictive value, 66.42%; negative predictive value, 96.87%; and area under the curve of the receiver-operator curve of 0.91 (1.45-1.29). CONCLUSION: Our novel Deep-FLAIM model outperformed all other machine learning models. The model is easy to implement, user friendly, and highly accurate. LEVEL OF EVIDENCE: Prognostic study, level II.
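
The performance figures reported above (accuracy, sensitivity, specificity, positive and negative predictive value) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions. The counts passed in are hypothetical and are not the study's data.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts: 50 deaths and 150 survivors in a test set of 200 patients.
metrics = diagnostic_metrics(tp=40, fp=10, tn=140, fn=10)
```

When the positive class (here, mortality) is rare, accuracy and negative predictive value can stay high even if sensitivity and positive predictive value are modest, which is why the abstract reports all five figures rather than accuracy alone.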


Subject(s)
Intensive Care Units , Machine Learning , Neural Networks, Computer , Wounds and Injuries/diagnosis , Wounds and Injuries/mortality , Adult , Critical Care , Critical Illness , Female , Hospitalization , Humans , Injury Severity Score , Male , Middle Aged , Multivariate Analysis , Organ Dysfunction Scores , Prognosis , Proportional Hazards Models , ROC Curve , Retrospective Studies , Sepsis/diagnosis , Sodium/blood , Young Adult
10.
Front Big Data ; 3: 22, 2020.
Article in English | MEDLINE | ID: mdl-33693395

ABSTRACT

The Adaptive Immune Receptor Repertoire (AIRR) Community is a research-driven group that is establishing a clear set of community-accepted data and metadata standards; standards-based reference implementation tools; and policies and practices for infrastructure to support the deposit, curation, storage, and use of high-throughput sequencing data from B-cell and T-cell receptor repertoires (AIRR-seq data). The AIRR Data Commons is a distributed system of data repositories that utilizes a common data model, a common query language, and common interoperability formats for storage, query, and downloading of AIRR-seq data. Here we describe the principal technical standards for the AIRR Data Commons, consisting of the AIRR Data Model for repertoires and rearrangements, the AIRR Data Commons (ADC) API for programmatic query of data repositories, a reference implementation for ADC API services, and tools for querying and validating data repositories that support the ADC API. AIRR-seq data repositories can become part of the AIRR Data Commons by implementing the data model and API. The AIRR Data Commons allows AIRR-seq data to be reused for novel analyses and empowers researchers to discover new biological insights about the adaptive immune system.

11.
J Med Syst ; 44(2): 32, 2019 Dec 17.
Article in English | MEDLINE | ID: mdl-31848728

ABSTRACT

Brain tumor detection is a difficult task because of variations in tumor shape, size, and appearance. In this manuscript, a deep learning model is deployed to classify input slices as tumor (unhealthy) or non-tumor (healthy). A high-pass filter image is employed to make the inhomogeneity field effect of the MR slices prominent and is fused with the input slices. Moreover, the median filter is applied to the fused slices. The quality of the resultant slices is improved, with smoothed and highlighted edges of the input slices. After that, based on these slices' intensity, a 4-connected seed growing algorithm is applied, where an optimal threshold clusters the similar pixels from the input slices. The segmented slices are then supplied to the proposed fine-tuned two-layer stacked sparse autoencoder (SSAE) model. The hyperparameters of the model are selected after extensive experiments. At the first layer, 200 hidden units and at the second layer 400 hidden units are utilized. The testing is performed on the softmax layer for the prediction of the images having tumors and no tumors. The suggested model is trained and checked on the BRATS datasets, i.e., 2012 (challenge and synthetic), 2013, 2013 Leaderboard, 2014, and 2015. The presented model is evaluated with a number of performance metrics, which demonstrate the improved performance.
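
The 4-connected seed growing step can be sketched as a breadth-first search that starts from a seed pixel and absorbs up, down, left, and right neighbors whose intensity is close to the seed's. This is a generic illustration of the technique. The function name `region_grow`, the fixed threshold, and the comparison against the seed intensity are assumptions, since the abstract does not specify how the optimal threshold is chosen or applied.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def region_grow(image: List[List[int]], seed: Pixel, threshold: int) -> Set[Pixel]:
    """Grow a region from `seed` over 4-connected pixels within `threshold` of the seed value."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # 4-connectivity: only vertical and horizontal neighbors, no diagonals.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= threshold:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Toy 3x4 intensity grid: a dark block at the top left, separated by bright pixels.
image = [
    [10, 12, 90, 91],
    [11, 13, 92, 10],
    [95, 94, 93, 11],
]
region = region_grow(image, seed=(0, 0), threshold=5)
```

The dark pixels at (1, 3) and (2, 3) have intensities similar to the seed but are not reached, because no 4-connected path of similar pixels links them to the seed region.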


Subject(s)
Algorithms , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/pathology , Deep Learning , Diagnosis, Computer-Assisted/methods , Humans , Image Processing, Computer-Assisted/methods
12.
Data Brief ; 27: 104565, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31656834

ABSTRACT

Fishes are the most diverse group of vertebrates, with more than 33,000 species. They are identified based on several visual characters, including their shape, color, and head. It is difficult for ordinary people to directly identify the fish species found in the market. Classifying fish species from images based on visual characteristics using computer vision and machine learning techniques is an interesting problem for researchers. However, a classifier's performance depends upon the quality of the image dataset on which it has been trained. An image dataset is needed to examine classification and recognition algorithms. This article presents Fish-Pak: an image dataset of 6 different fish species, captured by a single camera from different pools located near Head Qadirabad, Chenab River, in Punjab, Pakistan. The Fish-Pak dataset is useful for comparing various factors of classifiers, such as learning rate and momentum, and their impact on overall performance. The Convolutional Neural Network (CNN) is one of the most widely used architectures for image classification based on visual features. Six data classes, i.e., Ctenopharyngodon idella (Grass carp), Cyprinus carpio (Common carp), Cirrhinus mrigala (Mori), Labeo rohita (Rohu), Hypophthalmichthys molitrix (Silver carp), and Catla (Thala), with different numbers of images, have been included in the dataset. The fish species are captured by one camera to ensure a fair environment for all data. Fish-Pak is hosted by the Zoology Lab under the mutual affiliation of the Department of Computer Science and the Department of Zoology, University of Gujrat, Gujrat, Pakistan.

13.
Data Brief ; 26: 104340, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31516936

ABSTRACT

Plants are as vulnerable to diseases as animals. Citrus is a major plant grown mainly in the tropical areas of the world due to its richness in vitamin C and other important nutrients. The production of citrus fruit has been widely affected by citrus diseases, which ultimately degrade the fruit quality and cause financial loss to the growers. During the past decade, image processing and computer vision methods have been broadly adopted for the detection and classification of plant diseases. Early detection of diseases in citrus plants helps prevent them from spreading in the orchards, which minimizes the financial loss to the farmers. In this article, an image dataset of citrus fruits, leaves, and stems is presented. The dataset holds citrus fruit and leaf images of healthy plants and of plants infected with diseases such as Black spot, Canker, Scab, Greening, and Melanose. Most of the images were captured in December from orchards in the Sargodha region of Pakistan, when the fruit was about to ripen and the most diseases were found on citrus plants. The dataset is hosted by the Department of Computer Science, University of Gujrat and was acquired under the mutual cooperation of the University of Gujrat and the Citrus Research Center, Government of Punjab, Pakistan. The dataset would potentially be helpful to researchers who use machine learning and computer vision algorithms to develop computer applications that help farmers in early detection of plant diseases. The dataset is freely available at https://data.mendeley.com/datasets/3f83gxmv57/2.

14.
BMC Bioinformatics ; 20(Suppl 5): 182, 2019 Apr 25.
Article in English | MEDLINE | ID: mdl-31272390

ABSTRACT

BACKGROUND: Human immunology studies often rely on the isolation and quantification of cell populations from an input sample based on flow cytometry and related techniques. Such techniques classify cells into populations based on the detection of a pattern of markers. The description of the cell populations targeted in such experiments typically has two complementary components: the description of the cell type targeted (e.g. 'T cells'), and the description of the marker pattern utilized (e.g. CD14-, CD3+). RESULTS: We here describe our attempts to use ontologies to cross-compare cell types and marker patterns (also referred to as gating definitions). We used a large set of such gating definitions and corresponding cell types submitted by different investigators into ImmPort, a central database for immunology studies, to examine the ability to parse gating definitions using terms from the Protein Ontology (PRO) and cell type descriptions using the Cell Ontology (CL). We then used logical axioms from CL to detect discrepancies between the two. CONCLUSIONS: We suggest adoption of our proposed format for describing gating and cell type definitions to make comparisons easier. We also suggest a number of new terms to describe gating definitions in flow cytometry that are not based on molecular markers captured in PRO, but on forward- and side-scatter of light during data acquisition, which is more appropriate to capture in the Ontology for Biomedical Investigations (OBI). Finally, our approach results in suggestions on what logical axioms and new cell types could be considered for addition to the Cell Ontology.


Subject(s)
Biological Ontologies , Databases, Factual , Humans , Immune System/metabolism , Protein Subunits/metabolism , Proteins/metabolism
15.
Article in English | MEDLINE | ID: mdl-34707915

ABSTRACT

Systems biology involves the integration of multiple data types (across different data sources) to offer a more complete picture of the biological system being studied. While many existing biological databases are implemented using the traditional SQL (Structured Query Language) database technology, NoSQL database technologies have been explored as a more relationship-based, flexible and scalable method of data integration. In this paper, we describe how to use the Neo4J graph database to integrate a variety of types of data sets in the context of systems vaccinology. Specifically, we have converted into a common graph model diverse types of vaccine response measurement data from the NIH/NIAID ImmPort data repository, pathway data from Reactome, influenza virus strains from WHO, and taxonomic data from NCBI Taxon. While Neo4J provides a graph-based query language (Cypher) for data retrieval, we develop a web-based dashboard for users to easily browse and visualize data without the need to learn Cypher. In addition, we have prototyped a natural language query interface for users to interact with our system. In conclusion, we demonstrate the feasibility of using a graph-based database for storing and querying immunological data with complex biological relationships. Querying a graph database through such relationships has the potential to reveal novel relationships among heterogeneous biological data.

16.
Front Immunol ; 9: 2206, 2018.
Article in English | MEDLINE | ID: mdl-30323809

ABSTRACT

Increased interest in the immune system's involvement in pathophysiological phenomena coupled with decreased DNA sequencing costs have led to an explosion of antibody and T cell receptor sequencing data collectively termed "adaptive immune receptor repertoire sequencing" (AIRR-seq or Rep-Seq). The AIRR Community has been actively working to standardize protocols, metadata, formats, APIs, and other guidelines to promote open and reproducible studies of the immune repertoire. In this paper, we describe the work of the AIRR Community's Data Representation Working Group to develop standardized data representations for storing and sharing annotated antibody and T cell receptor data. Our file format emphasizes ease-of-use, accessibility, scalability to large data sets, and a commitment to open and transparent science. It is composed of a tab-delimited format with a specific schema. Several popular repertoire analysis tools and data repositories already utilize this AIRR-seq data format. We hope that others will follow suit in the interest of promoting interoperable standards.
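
Because the format is tab-delimited with a fixed schema, a file can be read with a standard TSV parser. The sketch below counts V gene calls among productive rearrangements. The column names (`sequence_id`, `v_call`, `j_call`, `junction_aa`, `productive`) and the `T`/`F` boolean encoding reflect my understanding of the AIRR Rearrangement schema and should be checked against the current specification. The three data rows are invented for illustration.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Hypothetical three-row excerpt of a rearrangement file (a small subset of columns).
AIRR_TSV = (
    "sequence_id\tv_call\tj_call\tjunction_aa\tproductive\n"
    "seq1\tIGHV1-69*01\tIGHJ4*02\tCARDYW\tT\n"
    "seq2\tIGHV3-23*01\tIGHJ6*02\tCAKGGW\tF\n"
    "seq3\tIGHV1-69*01\tIGHJ4*02\tCARDLW\tT\n"
)


def productive_v_gene_counts(handle: TextIO) -> Counter:
    """Count v_call values over rows whose `productive` field is 'T'."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["v_call"] for row in reader if row["productive"] == "T")


counts = productive_v_gene_counts(io.StringIO(AIRR_TSV))
```

For a real file, the same function can be called on an open file handle, for example `open(path, newline="")`, in place of the in-memory string.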


Subject(s)
Antibodies/genetics , Base Sequence , Database Management Systems , Information Dissemination/methods , Receptors, Antigen, T-Cell/genetics , Adaptive Immunity/genetics , Databases, Genetic , Datasets as Topic , High-Throughput Nucleotide Sequencing/economics , Humans , Receptors, Immunologic/genetics , Research Design
17.
Front Immunol ; 9: 1877, 2018.
Article in English | MEDLINE | ID: mdl-30166985

ABSTRACT

The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. 
Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, T-Cell/genetics , Software , Computational Biology/organization & administration , Data Mining , Gene Ontology , Humans , Metadata , Reproducibility of Results , User-Computer Interface , Workflow
18.
BMC Bioinformatics ; 19(1): 268, 2018 07 16.
Article in English | MEDLINE | ID: mdl-30012108

ABSTRACT

BACKGROUND: Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources. RESULTS: This work presents "CEDAR OnDemand", a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies which are recommended automatically based upon input fields' labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry. CONCLUSION: CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is available freely on the Google Chrome store https://chrome.google.com/webstore/search/CEDAROnDemand.


Subject(s)
Biological Ontologies , Internet , Metadata , Software , Algorithms , Humans
19.
Front Immunol ; 8: 1418, 2017.
Article in English | MEDLINE | ID: mdl-29163494

ABSTRACT

High-throughput sequencing (HTS) of immunoglobulin (B-cell receptor, antibody) and T-cell receptor repertoires has increased dramatically since the technique was introduced in 2009 (1-3). This experimental approach explores the maturation of the adaptive immune system and its response to antigens, pathogens, and disease conditions in exquisite detail. It holds significant promise for diagnostic and therapy-guiding applications. New technology often spreads rapidly, sometimes more rapidly than the understanding of how to make the products of that technology reliable, reproducible, or usable by others. As complex technologies have developed, scientific communities have come together to adopt common standards, protocols, and policies for generating and sharing data sets, such as the MIAME protocols developed for microarray experiments. The Adaptive Immune Receptor Repertoire (AIRR) Community formed in 2015 to address similar issues for HTS data of immune repertoires. The purpose of this perspective is to provide an overview of the AIRR Community's founding principles and present the progress that the AIRR Community has made in developing standards of practice and data sharing protocols. Finally, and most important, we invite all interested parties to join this effort to facilitate sharing and use of these powerful data sets (join@airr-community.org).
