Results 1 - 20 of 10,118
1.
J Chromatogr A ; 1644: 462119, 2021 May 10.
Article in English | MEDLINE | ID: mdl-33845426

ABSTRACT

Small molecule retention time prediction is a challenging task because the wide variety of separation techniques leaves only fragmented data available for training machine learning models. Predictions are typically made with traditional machine learning methods such as support vector machines, random forests, or gradient boosting. Another approach is to train on large data sets and then project the predictions. Here we evaluate the applicability of transfer learning for small molecule retention prediction as a new way to deal with small retention data sets. Transfer learning is a state-of-the-art technique for natural language processing (NLP) tasks. We propose using text-based molecular representations (SMILES), widely used in cheminformatics, for NLP-like modeling of molecules. We use self-supervised pre-training to capture relevant features from a large corpus of one million molecules, followed by fine-tuning on task-specific data. The mean absolute error (MAE) of predictions was in the range of 88-248 s for the tested reversed-phase data sets and 66 s for the HILIC data set, which is comparable with the MAE reported for traditional descriptor-based machine learning models or projection approaches on the same data.
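The NLP-style treatment of SMILES described above starts from tokenization. A minimal sketch, not the authors' code: the regular expression and vocabulary scheme below are illustrative choices for turning SMILES strings into token sequences a language model can consume.

```python
import re

# Regex-based SMILES tokenizer: bracket atoms ([NH4+]), two-letter atoms
# (Cl, Br, Si, Se), stereo marks, ring-closure digits, and bond/branch
# symbols each become one token, so a molecule reads like a "sentence".
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|Se|@@|[BCNOPSFIbcnops]|[=#/\\()+\-.%0-9@])"
)

def tokenize_smiles(smiles: str) -> list:
    """Split a SMILES string into model-ready tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens

def build_vocab(corpus) -> dict:
    """Assign an integer id to every token in the corpus (0/1 reserved)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in corpus:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab
```

In a pre-training setup, such token ids would feed an embedding layer of a transformer trained self-supervised on the million-molecule corpus before fine-tuning on retention data.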


Subjects
Machine Learning, Databases as Topic, Natural Language Processing, Reproducibility of Results, Support Vector Machine, Time Factors
2.
Nat Commun ; 12(1): 2078, 2021 04 06.
Article in English | MEDLINE | ID: mdl-33824310

ABSTRACT

Multiple sclerosis (MS) can be divided into four phenotypes based on clinical evolution. The pathophysiological boundaries of these phenotypes are unclear, limiting treatment stratification. Machine learning can identify groups with similar features using multidimensional data. Here, to classify MS subtypes based on pathological features, we apply unsupervised machine learning to brain MRI scans acquired in previously published studies. We use a training dataset from 6322 MS patients to define MRI-based subtypes and an independent cohort of 3068 patients for validation. Based on the earliest abnormalities, we define MS subtypes as cortex-led, normal-appearing white matter-led, and lesion-led. People with the lesion-led subtype have the highest risk of confirmed disability progression (CDP) and the highest relapse rate. People with the lesion-led MS subtype show positive treatment response in selected clinical trials. Our findings suggest that MRI-based subtypes predict MS disability progression and response to treatment and may be used to define groups of patients in interventional trials.


Subjects
Magnetic Resonance Imaging, Multiple Sclerosis/diagnostic imaging, Multiple Sclerosis/diagnosis, Unsupervised Machine Learning, Adult, Databases as Topic, Disease Progression, Female, Humans, Male, Middle Aged, Models, Biological, Placebos, Randomized Controlled Trials as Topic, Recurrence, Reproducibility of Results
4.
Molecules ; 26(4)2021 Feb 19.
Article in English | MEDLINE | ID: mdl-33669834

ABSTRACT

Applied datasets can vary from a few hundred to thousands of samples in typical quantitative structure-activity/property (QSAR/QSPR) relationships and classification. However, the size of the datasets and the train/test split ratios can greatly affect the outcome of the models, and thus the classification performance itself. We compared several combinations of dataset sizes and split ratios with five different machine learning algorithms to find the differences or similarities and to select the best parameter settings in nonbinary (multiclass) classification. It is also known that the models are ranked differently according to the performance merit(s) used. Here, 25 performance parameters were calculated for each model, then factorial ANOVA was applied to compare the results. The results clearly show the differences not just between the applied machine learning algorithms but also between the dataset sizes and to a lesser extent the train/test split ratios. The XGBoost algorithm could outperform the others, even in multiclass modeling. The performance parameters reacted differently to the change of the sample set size; some of them were much more sensitive to this factor than the others. Moreover, significant differences could be detected between train/test split ratios as well, exerting a great effect on the test validation of our models.
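The factorial design described above, crossing dataset sizes with train/test split ratios, can be sketched with a toy classifier on synthetic data. The nearest-centroid model and blob data below are stand-ins for the QSAR models and datasets actually compared, not a reproduction of the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, n_classes=3):
    """Synthetic 2-D, 3-class blobs standing in for a QSAR-style dataset."""
    y = rng.integers(0, n_classes, size=n)
    centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
    X = centers[y] + rng.normal(scale=1.0, size=(n, 2))
    return X, y

def nearest_centroid_accuracy(X, y, train_frac):
    """Split at the given ratio, fit class centroids, score on the rest."""
    n_train = int(len(y) * train_frac)
    Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    centroids = np.stack([Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)])
    pred = np.argmin(((Xte[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    return float((pred == yte).mean())

# Factorial grid: dataset size x train/test split ratio, one score per cell;
# the study additionally crosses five algorithms and 25 performance metrics
# and compares the cells with factorial ANOVA.
results = {
    (n, frac): nearest_centroid_accuracy(*make_data(n), frac)
    for n in (100, 500, 2000)
    for frac in (0.5, 0.7, 0.8)
}
```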


Subjects
Algorithms, Databases as Topic, Quantitative Structure-Activity Relationship, Confidence Intervals, Machine Learning
5.
Br J Surg ; 108(2): 182-187, 2021 03 12.
Article in English | MEDLINE | ID: mdl-33711146

ABSTRACT

BACKGROUND: Intraoperative nerve monitoring (IONM) is used increasingly in thyroid surgery to prevent recurrent laryngeal nerve (RLN) injury, despite lack of definitive evidence. This study analysed the United Kingdom Registry of Endocrine and Thyroid Surgery (UKRETS) to investigate whether IONM reduced the incidence of RLN injury. METHODS: UKRETS data were extracted on 28 July 2018. Factors related to risk of RLN palsy, such as age, sex, retrosternal goitre, reoperation, use of energy devices, extent of surgery, nodal dissection and IONM, were analysed. Data with missing entries for these risk factors were excluded. Outcomes of patients who had preoperative and postoperative laryngoscopy were analysed. RESULTS: RLN palsy occurred in 4.9 per cent of thyroidectomies. The palsy was temporary in 64.6 per cent and persistent in 35.4 per cent of patients. In multivariable analysis, IONM reduced the risk of RLN palsy (odds ratio (OR) 0.63, 95 per cent confidence interval (CI) 0.54 to 0.74, P < 0.001) and persistent nerve palsy (OR 0.47, 0.37 to 0.61, P < 0.001). Outpatient laryngoscopy was also associated with a reduced incidence of RLN palsy (OR 0.50, 0.37 to 0.67, P < 0.001). Bilateral RLN palsy occurred in 0.3 per cent. Reoperation (OR 12.30, 2.90 to 52.10, P = 0.001) and total thyroidectomy (OR 6.52, 1.50 to 27.80; P = 0.010) were significantly associated with bilateral RLN palsy. CONCLUSION: The use of IONM is associated with a decreased risk of RLN injury in thyroidectomy. These results based on analysis of UKRETS data support the routine use of RLN monitoring in thyroid surgery.


Assuntos
Nervos Laríngeos/fisiologia , Monitorização Intraoperatória , Traumatismos do Nervo Laríngeo Recorrente/prevenção & controle , Tireoidectomia/métodos , Adolescente , Adulto , Idoso , Idoso de 80 Anos ou mais , Bases de Dados como Assunto , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Monitorização Intraoperatória/métodos , Monitorização Intraoperatória/estatística & dados numéricos , Traumatismos do Nervo Laríngeo Recorrente/epidemiologia , Sistema de Registros , Fatores de Risco , Glândula Tireoide/cirurgia , Tireoidectomia/efeitos adversos , Tireoidectomia/estatística & dados numéricos , Reino Unido/epidemiologia , Adulto Jovem
6.
Sci Rep ; 11(1): 6375, 2021 03 18.
Article in English | MEDLINE | ID: mdl-33737679

ABSTRACT

We aimed to investigate the impact of comorbidity burden on mortality in patients with coronavirus disease (COVID-19). We analyzed the COVID-19 data from the nationwide health insurance claims of South Korea. Data on demographic characteristics, comorbidities, and mortality records of patients with COVID-19 were extracted from the database. The odds ratios of mortality according to comorbidities in these patients, with and without adjustment for age and sex, were calculated. The predictive value of the original Charlson comorbidity index (CCI) and the age-adjusted CCI (ACCI) for mortality in these patients was investigated using receiver operating characteristic (ROC) curve analysis. Among 7590 patients, 227 (3.0%) had died. After age and sex adjustment, hypertension, diabetes mellitus, congestive heart failure, dementia, chronic pulmonary disease, liver disease, renal disease, and cancer were significant risk factors for mortality. The ROC curve analysis showed that an ACCI threshold > 3.5 yielded the best cut-off point for predicting mortality (area under the ROC 0.92; 95% confidence interval 0.91-0.94). Our study revealed multiple risk factors for mortality in patients with COVID-19. The high predictive power of the ACCI for mortality in our results supports the importance of old age and comorbidities in the severity of COVID-19.
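The two ingredients of the analysis above, an age-adjusted comorbidity score and a cutoff search on that score, can be sketched as follows. The condition weights shown are a subset of the classic Charlson weights for illustration, and the toy cutoff routine maximizes Youden's J rather than reproducing the paper's exact procedure:

```python
# Age-adjusted Charlson comorbidity index (ACCI): the CCI sum of condition
# weights plus one point per decade of age from 50-59 upward, capped at 4.
CCI_WEIGHTS = {  # illustrative subset of the classic Charlson weights
    "diabetes": 1, "chf": 1, "dementia": 1, "copd": 1,
    "liver_mild": 1, "renal": 2, "cancer": 2, "liver_severe": 3,
}

def acci(age, conditions):
    score = sum(CCI_WEIGHTS[c] for c in conditions)
    if age >= 50:
        score += min((age - 40) // 10, 4)  # 50-59: +1 ... >=80: +4
    return score

def youden_cutoff(scores, labels):
    """Threshold maximizing sensitivity + specificity - 1 (Youden's J)."""
    best_t, best_j = None, -1.0
    pos = [s for s, died in zip(scores, labels) if died]
    neg = [s for s, died in zip(scores, labels) if not died]
    for t in sorted(set(scores)):
        sens = sum(s > t for s in pos) / len(pos)
        spec = sum(s <= t for s in neg) / len(neg)
        if sens + spec - 1 > best_j:
            best_j, best_t = sens + spec - 1, t
    return best_t
```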


Assuntos
/mortalidade , Adolescente , Adulto , Idoso , Idoso de 80 Anos ou mais , Criança , Estudos de Coortes , Comorbidade , Bases de Dados como Assunto , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , República da Coreia/epidemiologia , Adulto Jovem
8.
Nat Commun ; 12(1): 25, 2021 01 04.
Article in English | MEDLINE | ID: mdl-33397940

ABSTRACT

Droplet-based microfluidic devices hold immense potential in becoming inexpensive alternatives to existing screening platforms across life science applications, such as enzyme discovery and early cancer detection. However, the lack of a predictive understanding of droplet generation makes engineering a droplet-based platform an iterative and resource-intensive process. We present a web-based tool, DAFD, that predicts the performance and enables design automation of flow-focusing droplet generators. We capitalize on machine learning algorithms to predict the droplet diameter and rate with a mean absolute error of less than 10 µm and 20 Hz. This tool delivers a user-specified performance within 4.2% and 11.5% of the desired diameter and rate. We demonstrate that DAFD can be extended by the community to support additional fluid combinations, without requiring extensive machine learning knowledge or large-scale data-sets. This tool will reduce the need for microfluidic expertise and design iterations and facilitate adoption of microfluidics in life sciences.


Assuntos
Aprendizado de Máquina , Microfluídica , Reologia , Algoritmos , Automação , Bases de Dados como Assunto , Desenho de Equipamento , Dispositivos Lab-On-A-Chip , Redes Neurais de Computação
9.
PLoS One ; 16(1): e0242612, 2021.
Article in English | MEDLINE | ID: mdl-33417606

ABSTRACT

The Butterfly Optimization Algorithm (BOA) is a recent metaheuristic algorithm that mimics the mating and foraging behavior of butterflies. In this paper, three improved versions of BOA are developed to prevent the original algorithm from getting trapped in local optima and to strike a good balance between exploration and exploitation. The first version embeds an Opposition-Based Strategy in BOA, while the second embeds Chaotic Local Search. In the third, both strategies, Opposition-Based Strategy and Chaotic Local Search, are integrated to obtain the best optimal/near-optimal results. The proposed versions are compared against the original Butterfly Optimization Algorithm (BOA), Grey Wolf Optimizer (GWO), Moth-Flame Optimization (MFO), Particle Swarm Optimization (PSO), Sine Cosine Algorithm (SCA), and Whale Optimization Algorithm (WOA) using the CEC 2014 benchmark functions and four real-world engineering problems: welded beam design, tension/compression spring design, pressure vessel design, and speed reducer design. Furthermore, the proposed approaches are applied to the feature selection problem using five UCI datasets. The results show the superiority of the third version (CLSOBBOA) in achieving the best results in terms of speed and accuracy.
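The Opposition-Based Strategy mentioned above can be illustrated at the initialization step: each random candidate x is paired with its opposite lo + hi - x, and the fitter half of the union is kept. A minimal sketch on a sphere objective, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

def sphere(x):
    """Toy benchmark objective: minimize the sum of squares."""
    return float((x ** 2).sum())

def opposition_init(pop_size, dim, lo, hi, objective):
    """Opposition-Based initialization: evaluate each random candidate and
    its opposite (lo + hi - x), then keep the fitter half of the union."""
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    opposite = lo + hi - pop
    union = np.vstack([pop, opposite])
    fitness = np.array([objective(x) for x in union])
    return union[np.argsort(fitness)[:pop_size]]

pop = opposition_init(10, 5, -5.0, 5.0, sphere)
```

The same keep-the-better-of-x-and-its-opposite idea can be applied after each BOA iteration, which is what helps the search escape local optima.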


Assuntos
Algoritmos , Borboletas/fisiologia , Animais , Bases de Dados como Assunto , Heurística
10.
PLoS One ; 16(1): e0242600, 2021.
Article in English | MEDLINE | ID: mdl-33434209

ABSTRACT

Human behavior in financial activities is intimately connected to observed market dynamics. Despite many existing theories and studies on the fundamental motivations of human behavior in financial systems, there is still limited empirical deduction of the behavioral composition of financial agents from detailed market analysis. Blockchain technology has provided an avenue for such investigation through its voluminous data and the transparency of financial transactions. It has enabled us to perform empirical inference on the behavioral patterns of users in the market, which we explore in the bitcoin and ethereum cryptocurrency markets. In our study, we first determine various properties of bitcoin and ethereum users through a temporal complex network analysis. We then develop a methodology combining k-means clustering and Support Vector Machines to derive the behavioral types of users in the two cryptocurrency markets. Interestingly, we found four distinct strategies that are common to both markets: optimists, pessimists, positive traders and negative traders. The composition of user behavior differs remarkably between the bitcoin and ethereum markets during periods of local price fluctuations and large systemic events. We observe that bitcoin (ethereum) users tend to take a short-term (long-term) view of the market during local events. During large systemic events, ethereum (bitcoin) users consistently display a greater sense of pessimism (optimism) towards the future of the market.
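The two-stage pipeline described above, unsupervised clustering followed by a supervised classifier, can be sketched with scikit-learn. The synthetic user features below are stand-ins for the network-derived features in the study:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic "user features" (e.g. trade frequency, net flow, holding time)
# drawn around four well-separated centers; the study derives real features
# from temporal transaction-network analysis.
centers = rng.normal(scale=6.0, size=(4, 3))
membership = np.repeat(np.arange(4), 50)
X = centers[membership] + rng.normal(size=(200, 3))

# Stage 1: k-means discovers candidate behavioral types without labels.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Stage 2: an SVM learns a reusable decision rule for those types, so a
# previously unseen user can be assigned a behavioral class directly.
svm = SVC(kernel="rbf").fit(X, clusters)
assigned = svm.predict(X[:5])
```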


Assuntos
Comportamento , Comércio , Algoritmos , Bases de Dados como Assunto , Modelos Econômicos , Software
11.
Mol Phylogenet Evol ; 157: 107069, 2021 04.
Article in English | MEDLINE | ID: mdl-33421615

ABSTRACT

The tribe Arvicanthini (Muridae: Murinae) is a highly diversified group of rodents (ca. 100 species) whose 18 African genera (plus one Asian) probably represent the most successful adaptive radiation of extant mammals in Africa. They colonized a broad spectrum of habitats (from rainforests to semi-deserts) across sub-Saharan Africa, and their members are often among the most abundant components of mammal communities. Despite intensive efforts, the phylogenetic relationships among the major lineages (i.e. genera) remained obscure, likely because of the intensive radiation of the group, dated to the Late Miocene. Here we used genomic-scale data (377 nuclear loci; 581,030 bp) and produced the first fully resolved species tree containing all currently delimited genera of the tribe. Mitogenomes were also extracted, and while the results were largely congruent, there was less resolution at the basal nodes of the mitochondrial phylogeny. Results of a fossil-based divergence dating analysis suggest that the African radiation started soon after the colonization of Africa by a single arvicanthine ancestor from Asia during the Messinian stage (ca. 7 Ma) and was likely linked to the fragmentation of the pan-African Miocene forest. Some lineages remained in the rain forest, while many others successfully colonized the broad spectrum of new open habitats (e.g. savannas, wetlands or montane moorlands) that appeared at the beginning of the Pliocene. One lineage even evolved a partially arboricolous lifestyle in savanna woodlands, which allowed it to re-colonize equatorial forests. We also discuss the delimitation of genera in Arvicanthini and propose corresponding taxonomic changes.


Assuntos
Núcleo Celular/genética , Genoma Mitocondrial , Murinae/classificação , Murinae/genética , África ao Sul do Saara , Animais , Teorema de Bayes , DNA Mitocondrial/genética , Bases de Dados como Assunto , Loci Gênicos , Filogenia , Especificidade da Espécie
12.
Jpn J Clin Oncol ; 51(4): 630-638, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33395486

ABSTRACT

OBJECTIVE: We used the National Cancer Institute's Surveillance, Epidemiology, and End Results database to assess the role of salvage radiotherapy for women with unanticipated cervical cancer after simple hysterectomy. METHODS: Patients with non-metastatic cervical cancer meeting the inclusion criteria were divided into three groups based on treatment strategy: simple hysterectomy, salvage radiotherapy after hysterectomy, and radical surgery. Parallel propensity score-matched datasets were established for the salvage radiotherapy group vs. the simple hysterectomy group (matching ratio 1:1) and the salvage radiotherapy group vs. the radical surgery group (matching ratio 1:2). The primary endpoint was the overall survival advantage of salvage radiotherapy over simple hysterectomy or radical surgery within the propensity score-matched datasets. RESULTS: In total, 2682 patients were recruited: 647 in the simple hysterectomy group, 564 in the salvage radiotherapy group and 1471 in the radical surgery group. Age, race, histology, grade, FIGO stage, insurance and marital status, and chemotherapy were included in the propensity score matching. Matching resulted in comparison groups with negligible differences in most variables, except for black race, FIGO stage III and chemotherapy in the first matching. In the matched analysis for salvage radiotherapy vs. simple hysterectomy, the median follow-up time was 39 versus 32 months. In the matched analysis for salvage radiotherapy vs. radical surgery, the median follow-up time was 39 and 41 months, respectively. Salvage radiotherapy significantly improved overall survival compared with simple hysterectomy (HR 0.53, P = 0.046), whereas it did not achieve overall survival similar to that of radical surgery (HR 1.317, P = 0.045). CONCLUSIONS: This is the largest study of the effect of salvage radiotherapy on overall survival in patients with unanticipated cervical cancer. Salvage radiotherapy can improve overall survival compared with hysterectomy alone, but it cannot achieve survival comparable to radical surgery.
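The propensity score matching used above can be sketched as a logistic model for treatment assignment followed by greedy 1:1 nearest-neighbor matching. The caliper value and covariates here are illustrative, not those of the study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)

# Covariates (e.g. age, stage) and a treatment flag whose probability
# depends on the first covariate, as in observational registry data.
X = rng.normal(size=(300, 2))
treated = (rng.random(300) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(int)

# Propensity score: estimated P(treatment | covariates).
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

def match_1to1(ps, treated, caliper=0.05):
    """Greedy nearest-neighbor 1:1 matching on the propensity score."""
    controls = {i for i in range(len(ps)) if treated[i] == 0}
    pairs = []
    for i in np.where(treated == 1)[0]:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        if abs(ps[j] - ps[i]) <= caliper:  # only accept close matches
            pairs.append((int(i), j))
            controls.remove(j)  # each control is used at most once
    return pairs

pairs = match_1to1(ps, treated)
```

Outcomes (here, survival) would then be compared within the matched pairs, which is what makes the treated and control groups comparable on the measured covariates.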


Assuntos
Bases de Dados como Assunto , Histerectomia , Oncologia , Pontuação de Propensão , Terapia de Salvação , Neoplasias do Colo do Útero/radioterapia , Neoplasias do Colo do Útero/cirurgia , Adulto , Feminino , Humanos , Estimativa de Kaplan-Meier , Pessoa de Meia-Idade , Estadiamento de Neoplasias , Razão de Chances , Neoplasias do Colo do Útero/patologia
13.
PLoS One ; 16(1): e0245253, 2021.
Article in English | MEDLINE | ID: mdl-33444340

ABSTRACT

The main goal of the current paper is to contribute to the existing literature on probability distributions. A new probability distribution is generated using the Alpha Power family of distributions, with the aim of modeling data with non-monotonic failure rates and providing a better fit. The proposed distribution is called the Alpha Power Exponentiated Inverse Rayleigh (APEIR) distribution. Various statistical properties are investigated, including the order statistics, moments, residual life function, mean waiting time, quantiles, entropy, and the stress-strength parameter. The maximum likelihood method is employed to estimate the parameters of the proposed distribution. It is shown theoretically that the proposed distribution provides a better fit to data with monotonic as well as non-monotonic hazard rate shapes. Moreover, two real data sets are used to evaluate the significance and flexibility of the proposed distribution in comparison with other probability distributions.
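The construction can be written compactly. A sketch assuming the standard Alpha Power transform and one common parameterization of the exponentiated inverse Rayleigh baseline; the paper's exact parameterization may differ:

```latex
% Baseline: inverse Rayleigh CDF with scale \theta > 0
G(x) = e^{-\theta/x^{2}}, \qquad x > 0
% Exponentiation with shape \beta > 0
G_{E}(x) = \left(e^{-\theta/x^{2}}\right)^{\beta} = e^{-\beta\theta/x^{2}}
% Alpha Power transform, \alpha > 0,\ \alpha \neq 1
F_{\mathrm{APEIR}}(x) = \frac{\alpha^{G_{E}(x)} - 1}{\alpha - 1}
```

The extra shape parameters \(\alpha\) and \(\beta\) are what give the family the flexibility to capture non-monotonic hazard shapes that the plain inverse Rayleigh cannot.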


Assuntos
Simulação por Computador , Probabilidade , Analgésicos/farmacologia , Bases de Dados como Assunto , Humanos , Modelos Teóricos , Chuva
14.
Interdiscip Sci ; 13(1): 103-117, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33387306

ABSTRACT

Coronavirus disease (COVID-19) has been acknowledged as a pandemic by the WHO, and people all over the world are vulnerable to this virus. Alternative tools are needed that can help in the diagnosis of the coronavirus. This article investigates the potential of machine learning methods for automatic diagnosis of coronavirus with high accuracy from X-ray images. Two of the most commonly used classifiers were selected, logistic regression (LR) and convolutional neural networks (CNN), chiefly to make the system fast and efficient. Moreover, a dimensionality reduction approach based on principal component analysis (PCA) was investigated to further speed up the learning process and improve classification accuracy by selecting the most discriminative features. Deep learning-based methods demand a large amount of training samples compared to conventional approaches, yet an adequate amount of labelled training samples was not available for COVID-19 X-ray images. Therefore, a data augmentation technique using a generative adversarial network (GAN) was employed to increase the number of training samples and reduce overfitting. We used the publicly available dataset and incorporated GAN-generated images to obtain 500 X-ray images in total for this study. Both CNN and LR showed encouraging results for COVID-19 patient identification: the LR and CNN models achieved 95.2-97.6% overall accuracy without PCA and 97.6-100% with PCA for positive case identification, respectively.
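The PCA-then-classifier step described above can be sketched with scikit-learn. The random matrix below stands in for flattened X-ray images, and the component count is an illustrative choice:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)

# Stand-in for flattened X-ray images: 64 "pixels" per sample, two classes
# whose means differ along the first five features.
X = rng.normal(size=(120, 64))
y = np.repeat([0, 1], 60)
X[y == 1, :5] += 2.0

# PCA compresses the images to a low-dimensional subspace before the
# classifier, mirroring the dimensionality-reduction step in the paper.
model = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
model.fit(X, y)
train_acc = model.score(X, y)
```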


Assuntos
/diagnóstico por imagem , Imageamento Tridimensional , Aprendizado de Máquina , Tórax/diagnóstico por imagem , Algoritmos , Bases de Dados como Assunto , Humanos , Modelos Logísticos , Redes Neurais de Computação , Raios X
15.
Nucleic Acids Res ; 49(D1): D809-D816, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33313778

ABSTRACT

VIrus Particle ExploreR data base (VIPERdb) (http://viperdb.scripps.edu) is a curated repository of virus capsid structures and a database of structure-derived data along with various virus-specific information. VIPERdb has been continuously improved for over 20 years and contains a number of virus structure analysis tools. The release of VIPERdb v3.0 contains new structure-based data analytics tools like Multiple Structure-based and Sequence Alignment (MSSA) to identify hot-spot residues within a selected group of structures and an anomaly detection application to analyze and curate the structure-derived data within individual virus families. At the time of this writing, there are 931 virus structures from 62 different virus families in the database. Significantly, the new release also contains a standalone database called 'Virus World database' (VWdb) that comprises all the characterized viruses (∼181 000) known to date, gathered from ICTVdb and NCBI, and their capsid protein sequences, organized according to their virus taxonomy with links to known structures in VIPERdb and PDB. Moreover, the new release of VIPERdb includes a service-oriented data engine to handle all the data access requests and provides an interface for future data analytics using machine learning applications.


Assuntos
Capsídeo/química , Ciência de Dados , Bases de Dados como Assunto , Vírus/química , Curadoria de Dados , Alinhamento de Sequência
17.
Mayo Clin Proc ; 96(1): 105-119, 2021 01.
Article in English | MEDLINE | ID: mdl-33309181

ABSTRACT

OBJECTIVE: To examine the combined and stratified associations of physical activity and adiposity measures, modelled as body mass index (BMI), abdominal adiposity (waist circumference), and body fat percentage (BF) with all-cause mortality. PATIENTS AND METHODS: Using the UK Biobank cohort, we extracted quintiles of self-reported weekly physical activity. Categories of measured BMI, waist circumference, and BF were generated. Joint associations between physical activity-adiposity categories and mortality were examined using Cox proportional hazards models adjusted for demographic, behavioral, and clinical covariates. Physical activity-mortality associations were also examined within adiposity strata. Participants were followed from baseline (2006 to 2010) through January 31, 2018. RESULTS: A total of 295,917 participants (median follow-up, 8.9 years, during which 6684 deaths occurred) were included. High physical activity was associated with lower risk of premature mortality in all strata of adiposity except for those with BMI ≥35 kg/m2. Highest risk (HR, 1.54; 95% CI, 1.33 to 1.79) was observed in individuals with low physical activity and high BF as compared with the high physical activity-low BF referent. High physical activity attenuated the risk of high adiposity when using BF (HR, 1.24; 95% CI, 1.04 to 1.49), but the association was weaker with BMI (HR, 1.45; 95% CI, 1.21 to 1.73). Physical activity also attenuated the association between mortality and high waist circumference. CONCLUSION: Low physical activity and adiposity were both associated with a higher risk of premature mortality, but high physical activity attenuated the increased risk with adiposity irrespective of adiposity metric, except in those with a BMI ≥35 kg/m2.
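The joint-category analysis above can be illustrated in miniature: form activity-by-adiposity categories and compute a crude death proportion per cell. The real study uses Cox proportional hazards models with covariate adjustment, which this toy deliberately omits, and the records below are invented:

```python
from collections import defaultdict

# Toy records: (physical activity level, adiposity level, died 0/1).
records = [
    ("high", "low", 0), ("high", "low", 0), ("high", "high", 0),
    ("high", "high", 1), ("low", "low", 0), ("low", "low", 1),
    ("low", "high", 1), ("low", "high", 1), ("low", "high", 0),
]

# Crude death proportion per joint activity-adiposity category.
counts = defaultdict(lambda: [0, 0])  # category -> [deaths, total]
for activity, adiposity, died in records:
    counts[(activity, adiposity)][0] += died
    counts[(activity, adiposity)][1] += 1

rates = {cat: deaths / total for cat, (deaths, total) in counts.items()}
```

In the study, each joint category instead enters a Cox model as an exposure level, with the high-activity/low-adiposity cell as the referent.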


Assuntos
Adiposidade , Exercício Físico , Mortalidade , Adolescente , Adulto , Idoso , Idoso de 80 Anos ou mais , Índice de Massa Corporal , Bases de Dados como Assunto , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Modelos de Riscos Proporcionais , Estudos Prospectivos , Fatores de Risco , Reino Unido/epidemiologia , Circunferência da Cintura , Adulto Jovem
18.
PLoS One ; 15(12): e0243843, 2020.
Article in English | MEDLINE | ID: mdl-33320878

ABSTRACT

BACKGROUND: National birth cohorts derived from administrative health databases constitute unique resources for child health research due to whole country coverage, ongoing follow-up and linkage to other data sources. In England, a national birth cohort can be developed using Hospital Episode Statistics (HES), an administrative database covering details of all publicly funded hospital activity, including 97% of births, with longitudinal follow-up via linkage to hospital and mortality records. We present methods for developing a national birth cohort using HES and assess the impact of changes to data collection over time on coverage and completeness of linked follow-up records for children. METHODS: We developed a national cohort of singleton live births in 1998-2015, with information on key risk factors at birth (birth weight, gestational age, maternal age, ethnicity, area-level deprivation). We identified three changes to data collection, which could affect linkage of births to follow-up records: (1) the introduction of the "NHS Numbers for Babies (NN4B)", an on-line system which enabled maternity staff to request a unique healthcare patient identifier (NHS number) immediately at birth rather than at civil registration, in Q4 2002; (2) the introduction of additional data quality checks at civil registration in Q3 2009; and (3) correcting a postcode extraction error for births by the data provider in Q2 2013. We evaluated the impact of these changes on trends in two outcomes in infancy: hospital readmissions after birth (using interrupted time series analyses) and mortality rates (compared to published national statistics). RESULTS: The cohort covered 10,653,998 babies, accounting for 96% of singleton live births in England in 1998-2015. Overall, 2,077,929 infants (19.5%) had at least one hospital readmission after birth. 
Readmission rates declined by 0.2 percentage points per quarter from Q1 1998 to Q3 2002, shifted up by 6.1 percentage points (relative to the value expected from the pre-Q4 2002 trend) to 17.7% in Q4 2002 when NN4B was introduced, and increased by 0.1 percentage points per quarter thereafter. Infant mortality rates were under-reported by 16% for births in 1998-2002 and similar to published national mortality statistics for births in 2003-2015. The trends in infant readmission were not affected by the changes to data collection practices in Q3 2009 and Q2 2013, but the proportion of unlinked mortality records in HES and in ONS declined further after 2009. DISCUSSION: HES can be used to develop a national birth cohort for child health research with follow-up via linkage to hospital and mortality records for children born from 2003 onwards. Re-linking births before 2003 to their follow-up records would maximise the potential benefits of this rich resource, enabling studies of outcomes in adolescents with over 20 years of follow-up.
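The interrupted time series analysis described above can be sketched as a segmented regression with a level-shift and slope-change term at a known break point. The data here are simulated to loosely mimic the reported pre-trend, step, and post-trend, not taken from HES, and the break quarter index is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)

# Quarterly readmission rates (%) with an intervention at quarter 19:
# pre-trend -0.2/quarter, a level shift of +6, and a slope change that
# turns the post-intervention trend positive.
t = np.arange(40, dtype=float)
after = (t >= 19).astype(float)
y = (12.0 - 0.2 * t + 6.0 * after + 0.3 * (t - 19) * after
     + rng.normal(scale=0.1, size=40))

# Segmented (interrupted time series) regression: intercept, pre-trend,
# level shift at the break, and change in slope after the break.
X = np.column_stack([np.ones_like(t), t, after, (t - 19) * after])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, pre_trend, level_shift, slope_change = coef
```

The fitted level shift estimates the immediate jump at the intervention, while pre_trend + slope_change gives the post-intervention quarterly trend.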


Assuntos
Saúde da Criança , Coleta de Dados , Bases de Dados como Assunto , Hospitalização , Parto , Pesquisa , Algoritmos , Estudos de Coortes , Inglaterra/epidemiologia , Humanos , Lactente , Mortalidade Infantil , Recém-Nascido , Fatores de Risco
19.
PLoS One ; 15(12): e0243852, 2020.
Article in English | MEDLINE | ID: mdl-33332398

ABSTRACT

Software developers need to cope with a massive amount of knowledge throughout the typical life cycle of modern projects. This knowledge includes expertise related to the software development phases (e.g., programming, testing) using a wide variety of methods and tools, including development methodologies (e.g., waterfall, agile), software tools (e.g., Eclipse), programming languages (e.g., Java, SQL), and deployment strategies (e.g., Docker, Jenkins). However, there is no explicit integration of these various types of knowledge with software development projects so that developers can avoid having to search over and over for similar and recurrent solutions to tasks and reuse this knowledge. Specifically, Q&A sites such as Stack Overflow are used by developers to share software development knowledge through posts published in several categories, but there is no link between these posts and the tasks developers perform. In this paper, we present an approach that (i) allows developers to associate project tasks with Stack Overflow posts, and (ii) recommends which Stack Overflow posts might be reused based on task similarity. We analyze an industry dataset, which contains project tasks associated with Stack Overflow posts, looking for the similarity of project tasks that reuse a Stack Overflow post. The approach indicates that when a software developer is performing a task, and this task is similar to another task that has been associated with a post, the same post can be recommended to the developer and possibly reused. We believe that this approach can significantly advance the state of the art of software knowledge reuse by supporting novel knowledge-project associations.
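The task-to-post recommendation idea above can be sketched with a bag-of-words cosine similarity. The task texts, post ids, and similarity threshold below are hypothetical, and a real system would likely use TF-IDF weighting or embeddings:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical tasks already associated with a Stack Overflow post id.
known_tasks = {
    101: "parse json response from rest api in java",
    202: "configure docker container for jenkins build",
}

def recommend(new_task, threshold=0.3):
    """Post ids whose associated task is similar enough to the new task."""
    query = Counter(new_task.lower().split())
    scored = [(cosine(query, Counter(text.lower().split())), post)
              for post, text in known_tasks.items()]
    return [post for score, post in sorted(scored, reverse=True)
            if score >= threshold]
```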


Assuntos
Conhecimento , Software , Inquéritos e Questionários , Análise e Desempenho de Tarefas , Algoritmos , Bases de Dados como Assunto , Modelos Teóricos , Publicações
20.
BMC Bioinformatics ; 21(Suppl 23): 579, 2020 Dec 29.
Article in English | MEDLINE | ID: mdl-33372606

ABSTRACT

BACKGROUND: Entity normalization is an important information extraction task which has gained renewed attention in the last decade, particularly in the biomedical and life science domains. In these domains, and more generally in all specialized domains, this task is still challenging for the latest machine learning-based approaches, which have difficulty handling highly multi-class and few-shot learning problems. To address this issue, we propose C-Norm, a new neural approach which synergistically combines standard and weak supervision, ontological knowledge integration and distributional semantics. RESULTS: Our approach greatly outperforms all methods evaluated on the Bacteria Biotope datasets of BioNLP Open Shared Tasks 2019, without integrating any manually-designed domain-specific rules. CONCLUSIONS: Our results show that relatively shallow neural network methods can perform well in domains that present highly multi-class and few-shot learning problems.


Assuntos
Algoritmos , Redes Neurais de Computação , Bactérias/metabolismo , Intervalos de Confiança , Bases de Dados como Assunto , Ecossistema , Humanos , Conhecimento , Aprendizado de Máquina , Fenótipo