ABSTRACT
RNA design plays a crucial role in developing RNA vaccines, nucleic acid therapeutics, and innovative biotechnological tools. However, existing techniques frequently lack versatility across tasks and depend on predefined secondary structures or other prior knowledge. To address these limitations, we introduce GenerRNA, a Transformer-based model inspired by the success of large language models (LLMs) in protein and molecule generation. GenerRNA is pre-trained on large-scale RNA sequences and can generate novel RNA sequences with stable secondary structures that remain distinct from existing sequences, thereby expanding our exploration of the RNA space. Moreover, GenerRNA can be fine-tuned on smaller, specialized datasets for specific subtasks, enabling the generation of RNAs with desired functionalities or properties without requiring prior knowledge as input. As a demonstration, we fine-tuned GenerRNA and successfully generated novel RNA sequences exhibiting high affinity for target proteins. Our work is the first application of a generative language model to RNA generation, presenting an innovative approach to RNA design.
Subjects
Nucleic Acid Conformation, RNA, RNA/chemistry, RNA/genetics
ABSTRACT
BACKGROUND: Human health status can be measured on the basis of many different parameters. Statistical relationships among these different health parameters will enable several possible health care applications and an approximation of the current health status of individuals, which will allow for more personalized and preventive health care by informing the potential risks and developing personalized interventions. Furthermore, a better understanding of the modifiable risk factors related to lifestyle, diet, and physical activity will facilitate the design of optimal treatment approaches for individuals. OBJECTIVE: This study aims to provide a high-dimensional, cross-sectional data set of comprehensive health care information to construct a combined statistical model as a single joint probability distribution and enable further studies on individual relationships among the multidimensional data obtained. METHODS: In this cross-sectional observational study, data were collected from a population of 1000 adult men and women (aged ≥20 years) matching the age ratio of the typical adult Japanese population. Data include biochemical and metabolic profiles from blood, urine, saliva, and oral glucose tolerance tests; bacterial profiles from feces, facial skin, scalp skin, and saliva; messenger RNA, proteome, and metabolite analyses of facial and scalp skin surface lipids; lifestyle surveys and questionnaires; physical, motor, cognitive, and vascular function analyses; alopecia analysis; and comprehensive analyses of body odor components. Statistical analyses will be performed in 2 modes: one to train a joint probability distribution by combining a commercially available health care data set containing large amounts of relatively low-dimensional data with the cross-sectional data set described in this paper and another to individually investigate the relationships among the variables obtained in this study. 
RESULTS: Recruitment for this study started in October 2021 and ended in February 2022, with a total of 997 participants enrolled. The collected data will be used to build a joint probability distribution called a Virtual Human Generative Model. Both the model and the collected data are expected to provide information on the relationships between various health statuses. CONCLUSIONS: As different degrees of health status correlations are expected to differentially affect individual health status, this study will contribute to the development of empirically justified interventions based on the population. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/47024.
ABSTRACT
BACKGROUND: Noninvasive detection of early stage cancers with accurate prediction of tumor tissue-of-origin could improve patient prognosis. Because miRNA profiles differ between organs, circulating miRNomics represents a promising method for early detection of cancers, but this has not been shown conclusively. METHODS: A serum miRNA profile (miRNomes)-based classifier was evaluated for its ability to discriminate cancer types using advanced machine learning. The training set comprised 7931 serum samples from patients with 13 types of solid cancers and 5013 noncancer samples. The validation set consisted of 1990 cancer and 1256 noncancer samples. The contribution of each miRNA to the cancer-type classification was evaluated, and those with a high contribution were identified. RESULTS: Cancer type was predicted with an accuracy of 0.88 (95% confidence interval [CI] = 0.87 to 0.90) in all stages and an accuracy of 0.90 (95% CI = 0.88 to 0.91) in resectable stages (stages 0-II). The F1 score for the discrimination of the 13 cancer types was 0.93. Optimal classification performance was achieved with at least 100 miRNAs that contributed most strongly to accurate prediction of cancer type. Assessment of tissue expression patterns of these miRNAs suggested that miRNAs secreted from the tumor environment could be used to establish cancer type-specific serum miRNomes. CONCLUSIONS: This study demonstrates that large-scale serum miRNomics in combination with machine learning could lead to the development of a blood-based cancer classification system. Further investigations of the regulatory mechanisms of the miRNAs that contributed strongly to accurate prediction of cancer type could pave the way for the clinical use of circulating miRNA diagnostics.
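As a rough illustration of how a 95% confidence interval around a reported accuracy can be approximated, the sketch below applies the normal-approximation (Wald) formula. The abstract does not say how its intervals were computed or which sample count served as the denominator, so the function name `wald_interval` and the choice of the full validation set (1990 + 1256 samples) as the denominator are assumptions made only for this example.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    p_hat: observed proportion (here, a reported accuracy)
    n:     number of samples the proportion was computed on
    z:     standard normal quantile (1.96 for a two-sided 95% interval)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clamp to [0, 1] because the Wald interval can overshoot the valid range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# ASSUMPTION: the denominator is the whole validation set described in the
# abstract (1990 cancer + 1256 noncancer samples). The study may have used a
# different denominator or a different interval method.
n_validation = 1990 + 1256
low, high = wald_interval(0.88, n_validation)
print(f"approximate 95% CI: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval need not reproduce the published one exactly; exact (Clopper-Pearson) or bootstrap intervals are common alternatives and give slightly different bounds.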