ABSTRACT
Since tens of millions of chemical compounds have been accumulated in public chemical databases, fast and comprehensive computational methods that predict interactions between chemical compounds and proteins are needed for the virtual screening of lead compounds. Previously, we proposed a novel method for predicting protein-chemical interactions using two-layer Support Vector Machine classifiers that require only readily available biochemical data, i.e. the amino acid sequences of proteins and the structural formulas of chemical compounds. In this article, the method has been implemented as the COPICAT web service, with an easy-to-use front-end interface. Users can simply submit a protein-chemical interaction prediction job using a pre-trained classifier, or can even train their own classification model by uploading training data. COPICAT's fast and accurate computational prediction has enhanced lead compound discovery against a database of tens of millions of chemical compounds, implying that the search space for drug discovery is extended more than 1000-fold compared with widely used high-throughput screening methodologies. AVAILABILITY: The COPICAT server is available at http://copicat.dna.bio.keio.ac.jp. All functions, including the prediction function, are freely available via anonymous login without registration. Registered users, however, can use the system more intensively.
Subject(s)
Databases, Factual , Ligands , Proteins/metabolism , Software , Support Vector Machine , Protein Binding , Proteins/chemistry
ABSTRACT
BACKGROUND: Human health status can be measured on the basis of many different parameters. Statistical relationships among these parameters will enable several health care applications, including an approximation of an individual's current health status, which in turn will allow more personalized and preventive health care by informing individuals of potential risks and supporting the development of personalized interventions. Furthermore, a better understanding of the modifiable risk factors related to lifestyle, diet, and physical activity will facilitate the design of optimal treatment approaches for individuals. OBJECTIVE: This study aims to provide a high-dimensional, cross-sectional data set of comprehensive health care information in order to construct a combined statistical model as a single joint probability distribution and to enable further studies of the individual relationships among the multidimensional data obtained. METHODS: In this cross-sectional observational study, data were collected from a population of 1000 adult men and women (aged ≥20 years) matching the age ratio of the typical adult Japanese population. Data include biochemical and metabolic profiles from blood, urine, saliva, and oral glucose tolerance tests; bacterial profiles from feces, facial skin, scalp skin, and saliva; messenger RNA, proteome, and metabolite analyses of facial and scalp skin surface lipids; lifestyle surveys and questionnaires; physical, motor, cognitive, and vascular function analyses; alopecia analysis; and comprehensive analyses of body odor components. Statistical analyses will be performed in 2 modes: one to train a joint probability distribution by combining a commercially available health care data set containing large amounts of relatively low-dimensional data with the cross-sectional data set described in this paper, and another to individually investigate the relationships among the variables obtained in this study.
RESULTS: Recruitment for this study started in October 2021 and ended in February 2022, with a total of 997 participants enrolled. The collected data will be used to build a joint probability distribution called a Virtual Human Generative Model. Both the model and the collected data are expected to provide information on the relationships between various health statuses. CONCLUSIONS: As different degrees of health status correlations are expected to differentially affect individual health status, this study will contribute to the development of empirically justified interventions based on the population. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/47024.
ABSTRACT
Recent studies have demonstrated that convolutional neural networks (CNNs) can classify images of melanoma with accuracies comparable to those achieved by dermatologists. However, the performance of a CNN trained only on clinical images of pigmented skin lesions, compared directly with dermatologists on a clinical image classification task, has not been reported to date. In this study, we extracted 5846 clinical images of pigmented skin lesions from 3551 patients. The pigmented skin lesions included malignant tumors (malignant melanoma and basal cell carcinoma) and benign tumors (nevus, seborrhoeic keratosis, senile lentigo, and hematoma/hemangioma). We created the test dataset by randomly selecting 666 of these patients and picking one image per patient, and created the training dataset by adding bounding-box annotations to the images of the remaining patients (4732 images, 2885 patients). We then trained a faster region-based CNN (FRCNN) on the training dataset and evaluated the model on the test dataset. In addition, ten board-certified dermatologists (BCDs) and ten dermatologic trainees (TRNs) took the same test, and we compared their diagnostic accuracy with that of FRCNN. For six-class classification, the accuracy of FRCNN was 86.2%, whereas that of the BCDs and TRNs was 79.5% (p = 0.0081) and 75.1% (p < 0.00001), respectively. For two-class classification (benign or malignant), the accuracy, sensitivity, and specificity were 91.5%, 83.3%, and 94.5% for FRCNN; 86.6%, 86.3%, and 86.6% for the BCDs; and 85.3%, 83.5%, and 85.9% for the TRNs, respectively. The false positive rates and positive predictive values were 5.5% and 84.7% for FRCNN, 13.4% and 70.5% for the BCDs, and 14.1% and 68.5% for the TRNs, respectively. In summary, when its classification performance was compared with that of 20 dermatologists, FRCNN achieved higher classification accuracy.
In the future, we plan to deploy this system for use by the general public in order to improve the prognosis of skin cancer.
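The two-class metrics reported in this abstract follow the standard confusion-matrix definitions, with malignant treated as the positive class. One consequence is that the false positive rate equals 1 - specificity, which is consistent with the reported pairs (5.5% and 94.5% for FRCNN, 13.4% and 86.6% for the BCDs, 14.1% and 85.9% for the TRNs). Unlike sensitivity and specificity, the positive predictive value also depends on the proportion of malignant lesions in the test set. The minimal Python sketch below illustrates these definitions. The `ConfusionMatrix` type, the `binary_metrics` function, and the counts used are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical counts for a benign-vs-malignant task (malignant = positive)."""
    tp: int  # malignant lesions classified as malignant
    fn: int  # malignant lesions classified as benign
    fp: int  # benign lesions classified as malignant
    tn: int  # benign lesions classified as benign


def binary_metrics(cm: ConfusionMatrix) -> dict:
    """Return standard two-class metrics as fractions in [0, 1]."""
    total = cm.tp + cm.fn + cm.fp + cm.tn
    sensitivity = cm.tp / (cm.tp + cm.fn)  # true positive rate
    specificity = cm.tn / (cm.tn + cm.fp)  # true negative rate
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Fraction of benign lesions wrongly called malignant; equals 1 - specificity.
        "false_positive_rate": cm.fp / (cm.fp + cm.tn),
        # Positive predictive value (precision); depends on malignant prevalence.
        "ppv": cm.tp / (cm.tp + cm.fp),
    }


if __name__ == "__main__":
    # Illustrative counts only; NOT the study's data.
    cm = ConfusionMatrix(tp=150, fn=30, fp=25, tn=455)
    for name, value in binary_metrics(cm).items():
        print(f"{name}: {value:.1%}")
```

Keeping the raw counts in one immutable record makes it easy to check that every reported percentage is derived from the same underlying table.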