1.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35649392

ABSTRACT

RNA binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Therefore, computational methods that are capable of accurately identifying RBPs are highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose a two-stage deep transfer learning-based framework, termed RBP-TSTL, for accurate prediction of RBPs. In the first stage, the knowledge from the self-supervised pre-trained model was extracted as feature embeddings and used to represent the protein sequences, while in the second stage, a customized deep learning model was initialized based on an annotated pre-training RBP dataset before being fine-tuned on each corresponding target species dataset. This two-stage transfer learning framework enables the RBP-TSTL model to be trained effectively and improves its prediction performance. Extensive benchmarking of the RBP-TSTL models trained on features generated by the self-supervised pre-trained model against other models trained on hand-crafted encoding features demonstrated the effectiveness of the proposed two-stage knowledge transfer strategy. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli, and Salmonella and established a computational compendium containing all the predicted putative RBP candidates. We anticipate that the proposed RBP-TSTL approach will be explored as a useful tool for the characterization of RNA-binding proteins and exploration of their sequence-structure-function relationships.


Subjects
RNA-Binding Proteins , RNA , Binding Sites/genetics , Genome , Humans , Machine Learning , RNA/chemistry , RNA-Binding Proteins/metabolism , Sequence Analysis, RNA/methods
2.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35176756

ABSTRACT

Protein secretion has a pivotal role in many biological processes and is particularly important for intercellular communication, from the cytoplasm to the host or external environment. Gram-positive bacteria can secrete proteins through multiple secretion pathways. Among these, the non-classical secretion pathway has recently received increasing attention, but its exact mechanism remains unclear. Non-classical secreted proteins (NCSPs) are a class of secreted proteins lacking signal peptides and motifs. Several NCSP predictors have been proposed, and most of them use the whole amino acid sequence of NCSPs to construct the model. However, the sequence length of different proteins varies greatly. In addition, not all regions of a protein are equally important, and some local regions are not relevant to secretion. The functional regions of the protein, particularly the N- and C-terminal regions, contain important determinants for secretion. In this study, we propose a new hybrid deep learning-based framework, referred to as ASPIRER, which improves the prediction of NCSPs from amino acid sequences. More specifically, it combines a whole sequence-based XGBoost model and an N-terminal sequence-based convolutional neural network model; 5-fold cross-validation and independent tests demonstrate that ASPIRER outperforms existing state-of-the-art approaches. The source code and curated datasets of ASPIRER are publicly available at https://github.com/yanwu20/ASPIRER/. ASPIRER is anticipated to be a useful tool for improved prediction of novel putative NCSPs from sequence information and prioritization of candidate proteins for follow-up experimental validation.
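
The N-terminal branch described above needs a fixed-length input even though protein lengths vary. A minimal sketch of one way to prepare such an input follows. The window length, the padding symbol `X`, and the function names are assumptions made for illustration; the abstract does not state the values ASPIRER uses.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def n_terminal_window(sequence, length=50, pad="X"):
    """Return the first `length` residues, right-padded with `pad` (assumed symbol)."""
    window = sequence[:length].upper()
    return window + pad * (length - len(window))


def one_hot(window):
    """Encode each residue as a 20-dimensional indicator vector.

    Padding and non-standard residues map to an all-zero vector.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in window:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        encoded.append(row)
    return encoded


# Hypothetical 10-residue sequence, padded to a 12-residue window.
features = one_hot(n_terminal_window("MKTAYIAKQR", length=12))
print(len(features), len(features[0]))  # prints: 12 20
```

A matrix of this shape (positions by residue channels) is the kind of input a 1D convolutional network typically consumes; a whole-sequence model would instead use length-independent summary features.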


Subjects
Deep Learning , Amino Acid Sequence , Computational Biology , Neural Networks, Computer , Proteins/chemistry , Software
3.
BMC Neurol ; 24(1): 71, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38378514

ABSTRACT

BACKGROUND: Little is known regarding the leading risk factors for dementia/Alzheimer's disease (AD) in individuals with and without APOE4, although identifying these factors is of significant importance in global health. METHODS: Our analysis included 110,354 APOE4 carriers and 220,708 age- and sex-matched controls aged 40-73 years at baseline (between 2006 and 2010) from UK Biobank. Incident dementia was ascertained using hospital inpatient or death records until January 2021. Individuals of non-European ancestry and individuals without medical record linkage were excluded from the analysis. Moderation analysis was tested for 134 individual factors. RESULTS: During a median follow-up of 11.9 years, 4,764 cases of incident all-cause dementia and 2,065 incident AD cases were documented. Hazard ratios (95% CIs) for all-cause dementia and AD associated with APOE4 were 2.70 (2.55-2.85) and 3.72 (3.40-4.07), respectively. In APOE4 carriers, the leading risk factors for all-cause dementia included low self-rated overall health, low household income, high multimorbidity risk score, long-term illness, high neutrophil percentage, and high nitrogen dioxide air pollution. In non-APOE4 carriers, the leading risk factors included high multimorbidity risk score, low overall self-rated health, low household income, long-term illness, high microalbumin in urine, high neutrophil count, and low greenspace percentage. Population attributable risk for these individual risk factors combined was 65.1% and 85.8% in APOE4 and non-APOE4 carriers, respectively. The associations of 20 risk factors, including multimorbidity risk score, unhealthy lifestyle habits, and particulate matter air pollutants, with incident dementia were stronger in non-APOE4 carriers. The associations of only 2 risk factors (mother's history of dementia, low C-reactive protein) with incident all-cause dementia were stronger in APOE4 carriers. CONCLUSIONS: Our findings provide evidence for personalized preventative approaches to dementia/AD in APOE4 and non-APOE4 carriers. A mother's history of dementia and low levels of C-reactive protein were more important risk factors for dementia in APOE4 carriers, whereas leading risk factors including unhealthy lifestyle habits, multimorbidity risk score, inflammation and immune-related markers were more predictive of dementia in non-APOE4 carriers.


Subjects
Alzheimer Disease , Humans , Alzheimer Disease/genetics , Apolipoprotein E4/genetics , Biomarkers , C-Reactive Protein/analysis , Genotype , Retrospective Studies
4.
Retina ; 44(3): 527-536, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-37972986

ABSTRACT

PURPOSE: To investigate fundus tessellation density (TD) and its association with axial length (AL) elongation and spherical equivalent (SE) progression in children. METHODS: The school-based prospective cohort study enrolled 1,997 individuals aged 7 to 9 years in 11 elementary schools in Mojiang, China. Cycloplegic refraction and biometry were performed at baseline and 4-year visits. Fundus photographs were taken at baseline, and TD, defined as the percentage of exposed choroidal vessel area in the photographs, was quantified using an artificial intelligence-assisted semiautomatic labeling approach. After the exclusion of 330 ineligible participants because of loss to follow-up or ineligible fundus photographs, logistic models were used to assess the association of TD with rapid AL elongation (>0.36 mm/year) and SE progression (>1.00 D/year). RESULTS: The prevalence of tessellation was 477 of 1,667 (28.6%) and mean TD was 0.008 ± 0.019. The mean AL elongation and SE progression in 4 years were 0.90 ± 0.58 mm and -1.09 ± 1.25 D. Higher TD was associated with longer baseline AL (β, 0.030; 95% confidence interval: 0.015-0.046; P < 0.001) and more myopic baseline SE (β, -0.017; 95% confidence interval: -0.032 to -0.002; P = 0.029). Higher TD was associated with rapid AL elongation (odds ratio, 1.128; 95% confidence interval: 1.055-1.207; P < 0.001) and SE progression (odds ratio, 1.123; 95% confidence interval: 1.020-1.237; P = 0.018). CONCLUSION: Tessellation density is a potential indicator of rapid AL elongation and refractive progression in children. TD measurement could become a routine tool for monitoring AL elongation.


Subjects
Artificial Intelligence , Myopia , Child , Humans , Prospective Studies , Refraction, Ocular , Vision Tests , Myopia/diagnosis , Myopia/epidemiology , Axial Length, Eye
5.
Brief Bioinform ; 22(2): 2126-2140, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32363397

ABSTRACT

Promoters are short consensus sequences of DNA, which are responsible for transcription activation or the repression of all genes. There are many types of promoters in bacteria with important roles in initiating gene transcription. Therefore, solving promoter-identification problems has important implications for improving the understanding of their functions. To this end, computational methods targeting promoter classification have been established; however, their performance remains unsatisfactory. In this study, we present a novel stacked-ensemble approach (termed SELECTOR) for identifying both promoters and their respective classification. SELECTOR combines the composition of k-spaced nucleic acid pairs, parallel correlation pseudo-dinucleotide composition, position-specific trinucleotide propensity based on single-strand, and DNA strand features, and uses five popular tree-based ensemble learning algorithms to build a stacked model. Both 5-fold cross-validation tests using benchmark datasets and independent tests using the newly collected independent test dataset showed that SELECTOR outperformed state-of-the-art methods in both general and specific types of promoter prediction in Escherichia coli. Furthermore, this novel framework provides essential interpretations that aid understanding of model success by leveraging the powerful Shapley Additive exPlanation algorithm, thereby highlighting the most important features relevant for predicting both general and specific types of promoters and overcoming the limitations of existing 'black-box' approaches that are unable to reveal causal relationships from large amounts of initially encoded features.
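
Of the encodings listed above, the composition of k-spaced nucleic acid pairs is the simplest to illustrate. The sketch below follows one common definition: for each gap size, count every pair of nucleotides separated by that many positions and normalize by the number of counted pairs. The maximum gap and the function name are assumptions; the abstract does not give SELECTOR's exact settings.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def cksnap(sequence, max_gap=2):
    """Frequencies of nucleotide pairs separated by 0..max_gap positions."""
    sequence = sequence.upper()
    pairs = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(sequence) - gap - 1):
            pair = sequence[i] + sequence[i + gap + 1]
            if pair in counts:  # skip pairs containing ambiguous bases such as N
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


# Hypothetical 8-base sequence; gaps 0 and 1 give 2 x 16 features.
vector = cksnap("ACGTACGT", max_gap=1)
print(len(vector))  # prints: 32
```

Each gap contributes 16 values that sum to 1, so the vector length grows linearly with the maximum gap and does not depend on sequence length.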


Subjects
Escherichia coli/genetics , Machine Learning , Promoter Regions, Genetic , Datasets as Topic , Genes, Bacterial , Reproducibility of Results
6.
Diabet Med ; 40(2): e14966, 2023 02.
Article in English | MEDLINE | ID: mdl-36177651

ABSTRACT

AIMS: To investigate the association of type 1 diabetes (T1D) and age at diagnosis of type 2 diabetes (T2D) with brain structure and incident dementia. METHODS: Our analysis was based on the UK Biobank. We included 1,376 participants with diabetes and 2,752 randomly selected controls for brain volume analysis, and 25,141 participants with diabetes and 50,282 randomly selected controls for dementia analysis. Brain volume was measured using magnetic resonance imaging. Dementia was identified using hospital inpatient records and mortality register data until January 2021. RESULTS: T2D diagnosed at a younger age was associated with larger reductions in brain volume. After adjustment for glycated haemoglobin (HbA1c) and other covariates, only T2D diagnosed <50 years was associated with smaller total brain volume (β [95% CI]: -14.56 [-24.67, -4.44] ml), and grey (-6.47 [-12.75, -0.20] ml) and white matter volumes (-8.08 [-14.66, -1.51] ml). Corresponding numbers for total brain, grey matter and white matter volumes associated with T1D were -62.86 (-93.71, -32.01), -34.27 (-53.72, -14.83), and -28.59 (-47.65, -9.52) ml, respectively. During a median follow-up of 11.9 years, 2,035 new dementia cases were identified. Younger age at diagnosis of T2D was associated with a greater excess risk of dementia, with T2D diagnosed <50 years associated with the largest hazard ratio (HR [95% CI]: 2.03 [1.53-2.69]) in the multivariable analysis. The HR (95% CI) for dementia associated with T1D was 2.08 (1.40-3.09). CONCLUSION: Individuals with T1D or T2D diagnosed at a younger age are at greater excess risk of brain volume reduction and dementia.


Subjects
Dementia , Diabetes Mellitus, Type 1 , Diabetes Mellitus, Type 2 , Humans , Middle Aged , Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/epidemiology , Diabetes Mellitus, Type 2/pathology , Diabetes Mellitus, Type 1/complications , Diabetes Mellitus, Type 1/epidemiology , Diabetes Mellitus, Type 1/pathology , Prospective Studies , Biological Specimen Banks , Independent Living , Brain/diagnostic imaging , Brain/pathology , Dementia/diagnostic imaging , Dementia/epidemiology , United Kingdom/epidemiology , Risk Factors
7.
J Biomed Inform ; 138: 104281, 2023 02.
Article in English | MEDLINE | ID: mdl-36638935

ABSTRACT

Interpreting medical images such as chest X-ray images and retina images is an essential step for diagnosing and treating relevant diseases. Proposing automatic and reliable medical report generation systems can reduce the time-consuming workload, improve efficiencies of clinical workflows, and decrease practical variations between different clinical professionals. Many recent approaches based on image-encoder and language-decoder structure have been proposed to tackle this task. However, some technical challenges remain to be solved, including the fusion efficacy between the language and visual cues and the difficulty of obtaining an effective pre-trained image feature extractor for medical-specific tasks. In this work, we proposed the weighted query-key interacting attention module, including both the second-order and first-order interactions. Compared with the conventional scaled dot-product attention, this design generates a strong fusion mechanism between language and visual signals. In addition, we also proposed the contrastive pre-training step to reduce the domain gap between the image encoder and the target dataset. To test the generalizability of our learning scheme, we collected and verified our model on the world-first multi-modality retina report generation dataset referred to as Retina ImBank and another large-scale retina Chinese-based report dataset referred to as Retina Chinese. These two datasets will be made publicly available and serve as benchmarks to encourage further research exploration in this field. From our experimental results, we demonstrate that our proposed method has outperformed multiple state-of-the-art image captioning and medical report generation methods on IU X-RAY, MIMIC-CXR, Retina ImBank, and Retina Chinese datasets.


Subjects
Benchmarking , Language , Learning , Medical Records , Records
8.
Int J Mol Sci ; 24(19)2023 Sep 27.
Article in English | MEDLINE | ID: mdl-37834093

ABSTRACT

Epilepsy is a group of brain disorders characterised by an enduring predisposition to generate unprovoked seizures. Fuelled by advances in sequencing technologies and computational approaches, more than 900 genes have now been implicated in epilepsy. The development and optimisation of tools and methods for analysing the vast quantity of genomic data is a rapidly evolving area of research. Deep learning (DL) is a subset of machine learning (ML) that brings opportunity for novel investigative strategies that can be harnessed to gain new insights into the genomic risk of people with epilepsy. DL is being harnessed to address limitations in accuracy of long-read sequencing technologies, which improve on short-read methods. Tools that predict the functional consequence of genetic variation can break new ground in addressing critical knowledge gaps, while methods that integrate independent but complementary data enhance the predictive power of genetic data. We provide an overview of these DL tools and discuss how they may be applied to the analysis of genetic data for epilepsy research.


Subjects
Deep Learning , Epilepsy , Humans , Epilepsy/genetics , Seizures , Genomics/methods , Machine Learning
9.
BMC Med ; 20(1): 185, 2022 05 27.
Article in English | MEDLINE | ID: mdl-35619136

ABSTRACT

BACKGROUND: Little is known regarding life-course trajectories of important diseases. We aimed to identify diseases that were strongly associated with mortality and test temporal trajectories of these diseases before mortality. METHODS: Our analysis was based on UK Biobank. Diseases were identified using questionnaires, nurses' interviews, or inpatient data. Mortality register data were used to identify mortality up to January 2021. The association between 60 individual diseases at baseline and in the life course and incident mortality was examined using Cox proportional hazards regression models. Those diseases with great contribution to mortality were identified and disease trajectories in life course were then derived. RESULTS: During a median follow-up of 11.8 years, 31,373 individuals (median age at death (interquartile range): 70.7 (65.3-74.8) years, 59.4% male) died from any cause (with complete data on diagnosis date of disease), with 16,237 dying with cancer and 6,702 with cardiovascular disease (CVD). We identified 37 diseases including cancers and heart diseases that were associated with an increased risk of mortality independent of other diseases (hazard ratio ranged from 1.09 to 7.77). Among those who died during follow-up, 2.2% did not have a diagnosis of any disease of interest and 90.1% were diagnosed with two or more diseases in their life course. Individuals who were diagnosed with more diseases in their life course were more likely to live longer. Cancer was more likely to be diagnosed following hypertension, hypercholesterolemia, CVD, or digestive disorders and more likely to be diagnosed ahead of CVD, chronic kidney disease (CKD), or digestive disorders. CVD was more likely to be diagnosed following hypertension, hypercholesterolemia, or digestive disorders and more likely to be diagnosed ahead of cancer or CKD. Hypertension was more likely to precede other diseases, and CKD was more likely to be diagnosed as the last disease before mortality. CONCLUSIONS: There are significant interplays between cancer and CVD for mortality. Cancer and CVD were frequently clustered with hypertension, CKD, and digestive disorders, with CKD frequently diagnosed as the last disease in the life course. Our findings underline the importance of health checks among middle-aged adults for the prevention of premature mortality.


Subjects
Cardiovascular Diseases , Hypercholesterolemia , Hypertension , Renal Insufficiency, Chronic , Adult , Biological Specimen Banks , Cardiovascular Diseases/etiology , Female , Humans , Hypercholesterolemia/complications , Hypertension/complications , Life Change Events , Male , Middle Aged , Mortality, Premature , Renal Insufficiency, Chronic/complications , Risk Factors , United Kingdom/epidemiology
10.
Bioinformatics ; 37(21): 3986-3988, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34061168

ABSTRACT

MOTIVATION: Tumor tile selection is a necessary prerequisite in patch-based cancer whole slide image analysis, which is labor-intensive and requires expertise. Whole slides are annotated as tumor or tumor free, but tiles within a tumor slide are not. As all tiles within a tumor free slide are tumor free, these can be used to capture tumor-free patterns using the one-class learning strategy. RESULTS: We present a Python package, termed OCTID, which combines a pretrained convolutional neural network (CNN) model, Uniform Manifold Approximation and Projection (UMAP) and one-class support vector machine to achieve accurate tumor tile classification using a training set of tumor free tiles. Benchmarking experiments on four H&E image datasets achieved remarkable performance in terms of F1-score (0.90 ± 0.06), Matthews correlation coefficient (0.93 ± 0.05) and accuracy (0.94 ± 0.03). AVAILABILITY AND IMPLEMENTATION: Detailed information can be found in the Supplementary File. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
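
The one-class strategy described above learns only from tumor-free tiles and flags anything that looks unlike them. The sketch below illustrates that idea with a deliberately simple stand-in: a centroid plus a distance threshold, not the one-class support vector machine that OCTID uses. The feature vectors stand for CNN embeddings after dimensionality reduction; the values, the quantile, and the function names are assumptions made for illustration.

```python
import math


def fit_one_class(normal_vectors, quantile=0.95):
    """Learn a centroid and a distance threshold from tumor-free examples only."""
    dim = len(normal_vectors[0])
    count = len(normal_vectors)
    centroid = [sum(v[d] for v in normal_vectors) / count for d in range(dim)]
    distances = sorted(math.dist(v, centroid) for v in normal_vectors)
    # Use the training distance at the chosen quantile as the cut-off.
    cut = min(count - 1, int(quantile * count))
    return centroid, distances[cut]


def is_tumor(vector, centroid, threshold):
    """Flag a tile whose features lie farther from the centroid than the threshold."""
    return math.dist(vector, centroid) > threshold


# Hypothetical 2-D features for four tumor-free tiles.
tumor_free = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
centroid, threshold = fit_one_class(tumor_free)
print(is_tumor([1.0, 1.0], centroid, threshold))    # prints: True
print(is_tumor([0.05, 0.05], centroid, threshold))  # prints: False
```

No tumor labels are needed at training time, which is the property that makes tumor-free slides usable as the sole training source.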


Subjects
Image Processing, Computer-Assisted , Neoplasms , Neural Networks, Computer , Programming Languages , Neoplasms/diagnostic imaging , Humans , Image Processing, Computer-Assisted/methods , Machine Learning , Datasets as Topic
11.
Age Ageing ; 51(12)2022 12 05.
Article in English | MEDLINE | ID: mdl-36580391

ABSTRACT

BACKGROUND: the Cardiovascular Risk Factors, Aging, and Incidence of Dementia (CAIDE) dementia risk score is a recognised tool for dementia risk stratification. However, its application is limited due to the requirements for multidimensional information and fasting blood draw. Consequently, an effective and non-invasive tool for screening individuals with high dementia risk in large population-based settings is urgently needed. METHODS: a deep learning algorithm based on fundus photographs for estimating the CAIDE dementia risk score was developed and internally validated using a medical check-up dataset that included 271,864 participants in 19 province-level administrative regions of China, and externally validated using an independent dataset that included 20,690 check-up participants in Beijing. The performance for identifying individuals with high dementia risk (CAIDE dementia risk score ≥ 10 points) was evaluated by area under the receiver operating curve (AUC) with 95% confidence interval (CI). RESULTS: the algorithm achieved an AUC of 0.944 (95% CI: 0.939-0.950) in the internal validation group and 0.926 (95% CI: 0.913-0.939) in the external group. In addition, the estimated CAIDE dementia risk score derived from the algorithm was significantly associated with both comprehensive cognitive function and specific cognitive domains. CONCLUSIONS: this algorithm trained via fundus photographs could identify individuals with high dementia risk in a population setting. Therefore, it has the potential to be utilised as a non-invasive and more expedient method for dementia risk stratification. It might also be adopted in dementia clinical trials, incorporated as inclusion criteria to efficiently select eligible participants.
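
The AUC reported above can be read as the probability that a randomly chosen high-risk individual receives a higher predicted score than a randomly chosen lower-risk one. A minimal sketch of that pairwise definition follows; the labels and scores are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = CAIDE score >= 10 (high risk), 0 = otherwise.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.6, 0.2]
print(round(roc_auc(labels, scores), 3))  # prints: 0.778
```

The double loop is quadratic in sample size, so for datasets as large as those in the study a rank-based implementation would be the practical choice; the result is the same.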


Subjects
Deep Learning , Dementia , Humans , Dementia/diagnosis , Dementia/epidemiology , Dementia/psychology , Aging/psychology , Risk Factors , Cognition
12.
J Med Internet Res ; 24(8): e37850, 2022 08 25.
Article in English | MEDLINE | ID: mdl-36006685

ABSTRACT

BACKGROUND: HIV and sexually transmitted infections (STIs) are major global public health concerns. Over 1 million curable STIs occur every day among people aged 15 to 49 years worldwide. Insufficient testing or screening substantially impedes the elimination of HIV and STI transmission. OBJECTIVE: The aim of our study was to develop an HIV and STI risk prediction tool using machine learning algorithms. METHODS: We used clinic consultations that tested for HIV and STIs at the Melbourne Sexual Health Centre between March 2, 2015, and December 31, 2018, as the development data set (training and testing data set). We also used 2 external validation data sets, including data from 2019 as external "validation data 1" and data from January 2020 and January 2021 as external "validation data 2." We developed 34 machine learning models to assess the risk of acquiring HIV, syphilis, gonorrhea, and chlamydia. We created an online tool to generate an individual's risk of HIV or an STI. RESULTS: The important predictors for HIV and STI risk were gender, age, men who reported having sex with men, number of casual sexual partners, and condom use. Our machine learning-based risk prediction tool, named MySTIRisk, performed at an acceptable or excellent level on testing data sets (area under the curve [AUC] for HIV=0.78; AUC for syphilis=0.84; AUC for gonorrhea=0.78; AUC for chlamydia=0.70) and had stable performance on both external validation data from 2019 (AUC for HIV=0.79; AUC for syphilis=0.85; AUC for gonorrhea=0.81; AUC for chlamydia=0.69) and data from 2020-2021 (AUC for HIV=0.71; AUC for syphilis=0.84; AUC for gonorrhea=0.79; AUC for chlamydia=0.69). CONCLUSIONS: Our web-based risk prediction tool could accurately predict the risk of HIV and STIs for clinic attendees using simple self-reported questions. MySTIRisk could serve as an HIV and STI screening tool on clinic websites or digital health platforms to encourage individuals at risk of HIV or an STI to be tested or start HIV pre-exposure prophylaxis. The public can use this tool to assess their risk and then decide whether to attend a clinic for testing. Clinicians or public health workers can use this tool to identify high-risk individuals for further interventions.


Subjects
Chlamydia Infections , Gonorrhea , HIV Infections , Sexually Transmitted Diseases , Syphilis , Algorithms , Chlamydia Infections/diagnosis , Gonorrhea/diagnosis , HIV Infections/diagnosis , HIV Infections/epidemiology , HIV Infections/prevention & control , Homosexuality, Male , Humans , Internet , Machine Learning , Male , Sexually Transmitted Diseases/diagnosis , Sexually Transmitted Diseases/epidemiology , Sexually Transmitted Diseases/prevention & control , Syphilis/diagnosis
13.
Bioinformatics ; 36(3): 704-712, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31393553

ABSTRACT

MOTIVATION: Gram-positive bacteria have developed secretion systems to transport proteins across their cell wall, a process that plays an important role during host infection. These secretion mechanisms have also been harnessed for therapeutic purposes in many biotechnology applications. Accordingly, the identification of features that select a protein for efficient secretion from these microorganisms has become an important task. Among all the secreted proteins, 'non-classical' secreted proteins are difficult to identify as they lack discernible signal peptide sequences and can make use of diverse secretion pathways. Currently, several computational methods have been developed to facilitate the discovery of such non-classical secreted proteins; however, the existing methods are based on either simulated or limited experimental datasets. In addition, they often employ basic features to train the models in a simple and coarse-grained manner. The availability of more experimentally validated datasets, advanced feature engineering techniques and novel machine learning approaches creates new opportunities for the development of improved predictors of 'non-classical' secreted proteins from sequence data. RESULTS: In this work, we first constructed a high-quality dataset of experimentally verified 'non-classical' secreted proteins, which we then used to create benchmark datasets. Using these benchmark datasets, we comprehensively analyzed a wide range of features and assessed their individual performance. Subsequently, we developed a two-layer Light Gradient Boosting Machine (LightGBM) ensemble model that integrates several single feature-based models into an overall prediction framework. At this stage, LightGBM, a gradient boosting machine, was used as a machine learning approach and the necessary parameter optimization was performed by a particle swarm optimization strategy. All single feature-based LightGBM models were then integrated into a unified ensemble model to further improve the predictive performance. Consequently, the final ensemble model achieved a superior performance with an accuracy of 0.900, an F-value of 0.903, a Matthews correlation coefficient of 0.803 and an area under the curve value of 0.963, outperforming previous state-of-the-art predictors on the independent test. Based on our proposed optimal ensemble model, we further developed an accessible online predictor, PeNGaRoo, to serve users' demands. We believe this online web server, together with our proposed methodology, will expedite the discovery of non-classically secreted effector proteins in Gram-positive bacteria and further inspire the development of next-generation predictors. AVAILABILITY AND IMPLEMENTATION: http://pengaroo.erc.monash.edu/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
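
Among the metrics reported above, the Matthews correlation coefficient is the one least often computed by hand. A minimal sketch of the standard confusion-matrix formula follows; the counts in the example are invented and are not the study's results.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical balanced test set of 100 proteins with 90% accuracy.
print(round(matthews_corrcoef(tp=45, fp=5, tn=45, fn=5), 3))  # prints: 0.8
```

Unlike accuracy, the coefficient ranges from -1 to 1 and stays near 0 for a classifier that ignores its input, which is why it is often reported alongside accuracy and the F-value.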


Subjects
Algorithms , Machine Learning , Computational Biology , Peptides , Proteins
14.
Epilepsy Behav ; 123: 108273, 2021 10.
Article in English | MEDLINE | ID: mdl-34507093

ABSTRACT

PURPOSE: There remain major challenges for the clinician in managing patients with epilepsy effectively. Choosing anti-seizure medications (ASMs) is subject to trial and error. About one-third of patients have drug-resistant epilepsy (DRE). Surgery may be considered for selected patients, but time from diagnosis to surgery averages 20 years. We reviewed the potential use of machine learning (ML) predictive models as clinical decision support tools to help address some of these issues. METHODS: We conducted a comprehensive search of Medline and Embase of studies that investigated the application of ML in epilepsy management in terms of predicting ASM responsiveness, predicting DRE, identifying surgical candidates, and predicting epilepsy surgery outcomes. Original articles addressing these 4 areas published in English between 2000 and 2020 were included. RESULTS: We identified 24 relevant articles: 6 on ASM responsiveness, 3 on DRE prediction, 2 on identifying surgical candidates, and 13 on predicting surgical outcomes. A variety of potential predictors were used including clinical, neuropsychological, imaging, electroencephalography, and health system claims data. A number of different ML algorithms and approaches were used for prediction, but only one study utilized deep learning methods. Some models show promising performance with areas under the curve above 0.9. However, most were single setting studies (18 of 24) with small sample sizes (median number of patients 55), with the exception of 3 studies that utilized large databases and 3 studies that performed external validation. There was a lack of standardization in reporting model performance. None of the models reviewed have been prospectively evaluated for their clinical benefits. CONCLUSION: The utility of ML models for clinical decision support in epilepsy management remains to be determined. Future research should be directed toward conducting larger studies with external validation, standardization of reporting, and prospective evaluation of the ML model on patient outcomes.


Subjects
Drug Resistant Epilepsy , Epilepsy , Algorithms , Electroencephalography , Epilepsy/therapy , Humans , Machine Learning
15.
Sensors (Basel) ; 22(1)2021 Dec 27.
Article in English | MEDLINE | ID: mdl-35009704

ABSTRACT

This paper focuses on improving the performance of scientific instrumentation that uses glass spray chambers for sample introduction, such as spectrometers, which are widely used in analytical chemistry, by detecting incidents using deep convolutional models. The performance of these instruments can be affected by the quality of the introduction of the sample into the spray chamber. Among the indicators of poor quality sample introduction are two primary incidents: the formation of liquid beads on the surface of the spray chamber, and flooding at the bottom of the spray chamber. Detecting such events autonomously as they occur can assist with improving the overall operational accuracy and efficacy of the chemical analysis, and avoid severe incidents such as malfunction and instrument damage. In contrast to objects commonly seen in the real world, beads and flooding are more challenging to detect because they are very small and transparent. Furthermore, their non-rigid nature increases the difficulty of detection, such that existing deep-learning-based object detection frameworks are prone to fail at this task. No prior work uses computer vision to detect these incidents in the chemistry industry. In this work, we propose two frameworks for the detection of these two incidents, which not only leverage modern deep learning architectures but also integrate expert knowledge of the problems. Specifically, the proposed networks first localize the regions of interest where the incidents are most likely generated and then refine these incident outputs. The use of data augmentation and synthesis, and choice of negative sampling in training, allows for a large increase in accuracy while remaining a real-time system for inference. In the data collected from our laboratory, our method surpasses widely used object detection baselines and can correctly detect 95% of the beads and 98% of the flooding. At the same time, our method can process four frames per second and can be deployed in real time.


Subjects
Neural Networks, Computer , Vision, Ocular
18.
Sensors (Basel) ; 16(8)2016 Aug 03.
Article in English | MEDLINE | ID: mdl-27527168

ABSTRACT

This paper presents a novel approach to fruit detection using deep convolutional neural networks. The aim is to build an accurate, fast and reliable fruit detection system, which is a vital element of an autonomous agricultural robotic platform and a key element for fruit yield estimation and automated harvesting. Recent work in deep neural networks has led to the development of a state-of-the-art object detector termed Faster Region-based CNN (Faster R-CNN). We adapt this model, through transfer learning, for the task of fruit detection using imagery obtained from two modalities: colour (RGB) and Near-Infrared (NIR). Early and late fusion methods are explored for combining the multi-modal (RGB and NIR) information. This leads to a novel multi-modal Faster R-CNN model, which achieves state-of-the-art results compared to prior work: the F1 score, which takes into account both precision and recall, improves from 0.807 to 0.838 for the detection of sweet pepper. In addition to improved accuracy, this approach is also much quicker to deploy for new fruits, as it requires bounding box annotation rather than pixel-level annotation (annotating bounding boxes is approximately an order of magnitude quicker to perform). The model is retrained to perform the detection of seven fruits, with the entire process of annotating and training the new model taking four hours per fruit.
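
As an illustration of the F1 score mentioned in the abstract, the following minimal sketch computes it as the harmonic mean of precision and recall. The function names and the example counts are hypothetical and are not taken from the paper; the abstract reports only the final F1 values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from detection counts: true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return f1_score(precision, recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(round(f1_from_counts(tp=8, fp=2, fn=2), 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F1 by trading recall for precision or the reverse.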


Subjects
Fruit , Image Processing, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Robotics , Algorithms , Capsicum , Humans , Neural Networks, Computer
20.
Med Image Anal ; 93: 103075, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38199069

ABSTRACT

Informative sample selection in an active learning (AL) setting helps a machine learning system attain optimum performance with minimum labeled samples, thus reducing annotation costs and boosting the performance of computer-aided diagnosis systems in the presence of limited labeled data. Another effective technique to enlarge datasets in a small labeled data regime is data augmentation. An intuitive active learning approach thus consists of combining informative sample selection and data augmentation to leverage their respective advantages and improve the performance of AL systems. In this paper, we propose a novel approach called GANDALF (Graph-based TrANsformer and Data Augmentation Active Learning Framework) to combine sample selection and data augmentation in a multi-label setting. Conventional sample selection approaches in AL have mostly focused on the single-label setting where a sample has only one disease label. These approaches do not perform optimally when a sample can have multiple disease labels (e.g., in chest X-ray images). We improve upon state-of-the-art multi-label active learning techniques by representing disease labels as graph nodes and using graph attention transformers (GAT) to learn more effective inter-label relationships. We identify the most informative samples by aggregating GAT representations. Subsequently, we generate transformations of these informative samples by sampling from a learned latent space. From these generated samples, we identify informative samples via a novel multi-label informativeness score which, going beyond the state of the art, ensures that generated samples (i) are not redundant with respect to the training data and (ii) make important contributions to the training stage. We apply our method to two public chest X-ray datasets, as well as breast, dermatology, retina and kidney tissue microscopy MedMNIST datasets, and report improved results over state-of-the-art multi-label AL techniques in terms of model performance, learning rates, and robustness.
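
To make the idea of informative sample selection in a multi-label setting concrete, the sketch below ranks unlabeled samples by the summed binary entropy of their per-label predicted probabilities. This is a common uncertainty-sampling baseline and is swapped in purely for illustration: it is not GANDALF's graph-attention-based informativeness score, and all function names and probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def binary_entropy(p: float) -> float:
    """Entropy in bits of a single label predicted with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def multilabel_uncertainty(probs: Sequence[float]) -> float:
    """Sum of per-label entropies; higher means the model is less certain."""
    return sum(binary_entropy(p) for p in probs)


def select_informative(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k most uncertain samples in the unlabeled pool."""
    scores = [multilabel_uncertainty(p) for p in pool_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical predicted probabilities for three samples and two disease labels.
    pool = [[0.99, 0.01], [0.5, 0.5], [0.9, 0.5]]
    print(select_informative(pool, k=2))
```

A baseline of this kind scores each label independently, which is the limitation the abstract addresses by modelling inter-label relationships.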


Subjects
Breast , Thorax , Humans , X-Rays , Radiography , Diagnosis, Computer-Assisted