Results 1 - 20 of 42
1.
Behav Res Methods ; 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39112741

ABSTRACT

Story recall is an episodic memory paradigm that is popular among researchers interested in the effects of aging, disease, and/or injury on memory functioning; it is less popular among individual-differences researchers studying neurotypical young adults. One reason differential psychologists may favor other episodic memory paradigms is that the prospect of scoring story recall is daunting, as it typically requires manually scoring hundreds or thousands of freely recalled narratives. In this study, I investigated two questions related to scoring story recall for individual differences research. First, I investigated whether there is anything to gain by scoring story recall for memory of central and peripheral details or whether a single score is sufficient. Second, I investigated whether scoring can be automated using computational methods, namely BERTScore and GPT-4. A total of 235 individuals participated in this study. At the latent variable level, central and peripheral factors were highly correlated (r = .99), and the two factors correlated with external factors (viz., fluid intelligence, crystallized intelligence, and working memory capacity) similarly. Regarding automated scoring, both BERTScore- and GPT-4-derived scores were strongly correlated with manually derived scores (r ≥ .97); additionally, factors estimated from the various scoring methods all showed a similar pattern of correlations with the external factors. Thus, differential psychologists may be able to streamline scoring by disregarding detail type and by using automated approaches. Further research is needed, particularly of the automated approaches, as both BERTScore- and GPT-4-derived scores were occasionally leptokurtic while manual scores were not.

2.
J Clin Med ; 13(14)2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39064244

ABSTRACT

Background: Obstructive sleep apnea (OSA) affects a significant proportion of the global population, with many having moderate or severe forms of the disease. Home Sleep Apnea Testing (HSAT) has become the most common method of diagnosing OSA, replacing in-lab polysomnography. Polysmith software Version 11 by Nihon Kohden allows for the automatic scoring of respiratory events. This study aimed to assess the validity of this technology. Study Objectives: The objective was to assess the accuracy of the Polysmith Software Automatic Scoring Algorithm of HSATs in comparison to that of sleep technicians. Methods: One hundred twenty HSATs were scored by both sleep technicians and Polysmith software. The measured values were the respiratory event index (REI), apneic events, and hypopneic events. Agreement between the two methods was assessed using the Kruskal-Wallis test, Pearson correlation coefficient, and Bland-Altman plot, as well as sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Results: The correlation between the REI calculated by the software and technicians proved to be strong overall (r = 0.96, p < 0.0001). The mild OSA group had a moderate correlation (r = 0.45, p = 0.0129). The primary snoring, moderate OSA, and severe OSA groups showed stronger correlations (r = 0.69, p < 0.0001; r = 0.56, p = 0.012; r = 0.71, p < 0.0001). The analysis conducted across all groups demonstrated an average sensitivity of 81%, specificity of 94%, PPV of 82%, and NPV of 94%, with an overall accuracy of 81%. When combining the moderate and severe OSA groups into a single category, the sensitivity was 90%, specificity was 100%, PPV was 100%, and NPV was 91%. Conclusions: OSA can be reliably diagnosed from HSATs with the automated Polysmith software across all OSA disease severity groups, with higher levels of accuracy in moderate/severe OSA and lower levels of accuracy in mild OSA.
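
The Bland-Altman comparison named in the Methods can be illustrated with a minimal sketch. It assumes paired REI values from the two scoring methods and the conventional 95% limits of agreement (bias ± 1.96 SD of the differences); the function name and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def bland_altman(manual, automated):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(manual) != len(automated) or len(manual) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    # Difference between the two scoring methods, pair by pair.
    diffs = [a - m for m, a in zip(manual, automated)]
    bias = mean(diffs)
    # Sample standard deviation of the differences.
    sd = stdev(diffs)
    # Conventional 95% limits of agreement: bias +/- 1.96 * SD.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical REI values (events/hour), not data from the study.
manual_rei = [4.0, 12.0, 20.0, 35.0]
automated_rei = [5.0, 11.0, 23.0, 36.0]
bias, lower, upper = bland_altman(manual_rei, automated_rei)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A positive bias here would mean the automated method scores higher on average; wide limits indicate that individual recordings can disagree substantially even when the correlation is strong.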

3.
Psychometrika ; 89(1): 64-83, 2024 03.
Article in English | MEDLINE | ID: mdl-38565794

ABSTRACT

Rapid advances in psychology and technology open opportunities and present challenges beyond familiar forms of educational assessment and measurement. Viewing assessment through the perspectives of complex adaptive sociocognitive systems and argumentation helps us extend the concepts and methods of educational measurement to new forms of assessment, such as those involving interaction in simulation environments and automated evaluation of performances. I summarize key ideas for doing so and point to the roles of measurement models and their relation to sociocognitive systems and assessment arguments. A game-based learning assessment, SimCityEDU: Pollution Challenge!, is used to illustrate ideas.


Subjects
Educational Measurement, Psychometrics, Psychometrics/methods, Humans, Educational Measurement/methods, Statistical Models
4.
eNeuro ; 11(3)2024 Mar.
Article in English | MEDLINE | ID: mdl-38351132

ABSTRACT

In the field of behavioral neuroscience, the classification and scoring of animal behavior play pivotal roles in the quantification and interpretation of complex behaviors displayed by animals. Traditional methods have relied on video examination by investigators, which is labor-intensive and susceptible to bias. To address these challenges, research efforts have focused on computational methods and image-processing algorithms for automated behavioral classification. Two primary approaches have emerged: marker- and markerless-based tracking systems. In this study, we showcase the utility of "Augmented Reality University of Cordoba" (ArUco) markers as a marker-based tracking approach for assessing rat engagement during a nose-poking go/no-go behavioral task. In addition, we introduce a two-state engagement model based on ArUco marker tracking data that can be analyzed with a rectangular kernel convolution to identify critical transition points between states of engagement and distraction. In this study, we hypothesized that ArUco markers could be utilized to accurately estimate animal engagement in a nose-poking go/no-go behavioral task, enabling the computation of optimal task durations for behavioral testing. Here, we present the performance of our ArUco tracking program, demonstrating a classification accuracy of 98% that was validated against the manual curation of video data. Furthermore, our convolution analysis revealed that, on average, our animals became disengaged with the behavioral task at ∼75 min, providing a quantitative basis for limiting experimental session durations. Overall, our approach offers a scalable, efficient, and accessible solution for automated scoring of rodent engagement during behavioral data collection.


Subjects
Animal Behavior, Rodents, Rats, Animals, Algorithms, Computer-Assisted Image Processing
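
The rectangular-kernel convolution mentioned in the abstract above can be sketched as a moving average over a binary engagement signal. This is only an illustration under stated assumptions: the per-minute 0/1 encoding, the window width, the 0.5 threshold, and both function names are hypothetical, and the abstract does not say how the authors defined a transition point.

```python
def box_smooth(signal, width):
    """Convolve a sequence with a rectangular (boxcar) kernel of given width.

    Only fully overlapping windows are kept ("valid" mode), so the output
    has len(signal) - width + 1 values, each the mean of one window.
    """
    if width < 1 or width > len(signal):
        raise ValueError("width must be between 1 and len(signal)")
    window_sum = sum(signal[:width])
    out = [window_sum / width]
    for i in range(width, len(signal)):
        # Slide the window one step: add the new sample, drop the oldest.
        window_sum += signal[i] - signal[i - width]
        out.append(window_sum / width)
    return out

def first_disengagement(signal, width, threshold=0.5):
    """Index of the first window whose mean engagement is below threshold."""
    for start, value in enumerate(box_smooth(signal, width)):
        if value < threshold:
            return start
    return None  # the smoothed signal never drops below the threshold

# Hypothetical per-minute engagement flags (1 = engaged, 0 = distracted).
engaged = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
smoothed = box_smooth(engaged, width=4)
transition = first_disengagement(engaged, width=4)
print(smoothed, transition)
```

Smoothing before thresholding keeps a single distracted minute from being read as a state change; a wider window trades temporal resolution for robustness.
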
5.
Int J Neural Syst ; 34(3): 2450009, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38318751

ABSTRACT

Large-scale benchmark datasets are crucial in advancing research within the computer science communities. They enable the development of more sophisticated AI models and serve as "golden" benchmarks for evaluating their performance. Thus, ensuring the quality of these datasets is of utmost importance for academic research and the progress of AI systems. For the emerging vision-language tasks, some datasets have been created and frequently used, such as Flickr30k, COCO, and NoCaps, which typically contain a large number of images paired with their ground-truth textual descriptions. In this paper, an automatic method is proposed to assess the quality of large-scale benchmark datasets designed for vision-language tasks. In particular, a new cross-modal matching model is developed, which is capable of automatically scoring the textual descriptions of visual images. Subsequently, this model is employed to evaluate the quality of vision-language datasets by automatically assigning a score to each 'ground-truth' description for every image. With a good agreement between manual and automated scoring results on the datasets, our findings reveal significant disparities in the quality of the ground-truth descriptions included in the benchmark datasets. Even more surprising, it is evident that a small portion of the descriptions are unsuitable for serving as reliable ground-truth references. These discoveries emphasize the need for careful utilization of these publicly accessible benchmark databases.


Subjects
Benchmarking, Factual Databases
6.
Cureus ; 16(1): e52654, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38380197

ABSTRACT

Objective Automated scoring of respiratory events could allow a swifter obstructive sleep apnea (OSA) identification. We assessed the accuracy of the Alice PDx device with the Somnolyzer automated scoring algorithm, compared to the manually reviewed scoring by a trained sleep technician, for the diagnosis of OSA. Methods A prospective study was conducted between March 2021 and March 2022 in Centro Hospitalar do Baixo Vouga, a level 2 hospital in Aveiro, Portugal. Patients with high pre-test probability for OSA underwent type III home sleep apnea testing with the Alice PDx device. Data were scored automatically by the Sleepware G3 with the Somnolyzer digital system and manually by a trained sleep technician. Correlation and dependent t-tests were used. Sensitivity, specificity, positive predictive values (PPVs), negative predictive values (NPVs), and area under the receiver operating characteristic curve (AUROC) of automated scoring were calculated. Data were analyzed using the Stata Statistical Software (Release 17, StataCorp., 2023, College Station, TX: StataCorp LLC). Results In 150 participants (mean age 57.8 ± 13.9 years), the mean apnea-hypopnea index (AHI) was 21.9 ± 21.8 events/hour by manual scoring and 25.4 ± 21.6 events/hour by automated scoring. The mean difference was 3.4 ± 4.4 events/hour, and a strong, positive, linear correlation was found between the two scores (r = 0.98). For altered AHI (AHI ≥ 5 events/hour) and for mild, moderate, and severe OSA, the automated scoring sensitivity/specificity values were 91.2%/100.0%, 80.0%/68.6%, 91.6%/41.9%, and 98.1%/80.9%, respectively. The PPVs/NPVs for the same categories were 100.0%/69.4%, 89.3%/51.1%, 79.7%/66.7%, and 91.8%/95.0%, respectively. Finally, the AUROC was 0.85, 0.70, 0.73, and 0.93, respectively.
Conclusion The automated scoring obtained from the Alice PDx portable device, using Sleepware G3 with the Somnolyzer digital system, seems accurate enough to diagnose OSA and validate the initiation of PAP therapy in the correct clinical setting. Nevertheless, it does not replace manual reviewing by a trained sleep technician in the case of mild and moderate OSA, to obtain a correct severity classification. With this valuable time-saving tool, we expect to hasten OSA diagnosis and treatment and thus tackle the underdiagnosis problem.
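
The sensitivity, specificity, PPV, and NPV reported above all follow from a 2x2 table built at one AHI cut-off. The sketch below assumes manual scoring is the reference standard and uses the AHI ≥ 5 events/hour threshold quoted in the abstract; the function name and the sample values are hypothetical, not study data.

```python
def diagnostic_metrics(manual_ahi, automated_ahi, threshold):
    """Compare automated against manual (reference) scoring at one cut-off."""
    tp = fp = tn = fn = 0
    for ref, test in zip(manual_ahi, automated_ahi):
        ref_pos = ref >= threshold
        test_pos = test >= threshold
        if ref_pos and test_pos:
            tp += 1          # both methods call the case positive
        elif test_pos:
            fp += 1          # automated positive, manual negative
        elif ref_pos:
            fn += 1          # automated negative, manual positive
        else:
            tn += 1          # both methods call the case negative

    def ratio(num, den):
        # Undefined when a row or column of the 2x2 table is empty.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Hypothetical AHI values (events/hour), not data from the study.
manual = [2, 4, 6, 12, 20, 35, 3, 8]
automated = [3, 6, 7, 15, 22, 40, 4, 4]
metrics = diagnostic_metrics(manual, automated, threshold=5)
print(metrics)
```

Because PPV and NPV depend on how common the condition is in the sample, values from a high pre-test-probability cohort need not carry over to a screening population.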

7.
Behav Res Methods ; 56(3): 2243-2259, 2024 03.
Article in English | MEDLINE | ID: mdl-38233632

ABSTRACT

The autobiographical interview has been used in more than 200 studies to assess the content of autobiographical memories. In a typical experiment, participants recall memories, which are then scored manually for internal details (episodic details from the central event) and external details (largely non-episodic details). Scoring these narratives requires a significant amount of time. As a result, large studies with this procedure are often impractical, and even conducting small studies is time-consuming. To reduce scoring burden and enable larger studies, we developed an approach to automatically score responses with natural language processing. We fine-tuned an existing language model (distilBERT) to identify the amount of internal and external content in each sentence. These predictions were aggregated to obtain internal and external content estimates for each narrative. We evaluated our model by comparing manual scores with automated scores in five datasets. We found that our model performed well across datasets. In four datasets, we found a strong correlation between internal detail counts and the amount of predicted internal content. In these datasets, manual and automated external scores were also strongly correlated, and we found minimal misclassification of content. In a fifth dataset, our model performed well after additional preprocessing. To make automated scoring available to other researchers, we provide a Colab notebook that is intended to be used without additional coding.


Subjects
Episodic Memory, Natural Language Processing, Humans, Language, Mental Recall/physiology, Narration
8.
J Sport Rehabil ; 33(3): 220-224, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38295786

ABSTRACT

CONTEXT: The Balance Error Scoring System (BESS) is a commonly used clinical tool to evaluate postural control that is traditionally performed through visual assessment and subjective evaluation of balance errors. The purpose of this study was to evaluate an automated computer-based scoring system using an instrumented pressure mat compared to the traditional human-based manual assessment. DESIGN: A descriptive cross-sectional study design was used to evaluate the performance of the automated versus human BESS scoring methodology in healthy individuals. METHODS: Fifty-one healthy active participants performed BESS trials following standard BESS procedures on an instrumented pressure mat (MobileMat, Tekscan Inc). Trained evaluators manually scored balance errors from frontal and sagittal plane video recordings for comparison to errors scored using center of force measurements and an automated scoring software (SportsAT, version 2.0.2, Tekscan Inc). A linear mixed model was used to determine measurement discrepancies across the 2 methods. Bland-Altman analyses were conducted to determine limits of agreement for the automated and manual scoring methods. RESULTS: Significant differences between the automated and manual errors scored were observed across all conditions (P < .05), excluding bilateral firm stance. The greatest discrepancy between scoring methods was during the tandem foam stance, while the smallest discrepancy was during the tandem firm stance. CONCLUSION: The 2 methods of BESS scoring are different with wide limits of agreement. The benefits and risks of each approach to error scoring should be considered when selecting the most appropriate metric for clinical use or research studies.


Subjects
Postural Balance, Research Design, Humans, Cross-Sectional Studies
9.
Front Plant Sci ; 14: 1151911, 2023.
Article in English | MEDLINE | ID: mdl-37484468

ABSTRACT

Seed physiology is related to functional and metabolic traits of the seed-seedling transition. In this sense, modeling the kinetics, uniformity and capacity of a seed sample plays a central role in designing strategies for trade, food, and environmental security. Thus, POMONA is presented as an easy-to-use multiplatform software designed to bring several logistic and linearized models into a single package, allowing for convenient and fast assessment of seed germination and/or longevity, even if the data have a non-normal distribution. POMONA is implemented in JavaScript using the Quasar framework and can run in the Microsoft Windows operating system, GNU/Linux, and Android-powered mobile hardware or on a web server as a service. The capabilities of POMONA are showcased through a series of examples with diaspores of corn and soybean, evidencing its robustness, accuracy, and performance. POMONA can be the first step for the creation of an automatic multiplatform that will benefit laboratory users, including those focused on image analysis.

10.
Educ Psychol Meas ; 83(3): 556-585, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37187689

ABSTRACT

Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We compare the classification accuracy of convolutional and feed-forward approaches. Our results show that convolutional neural networks (CNNs) outperform feed-forward neural networks in both loss and accuracy. The CNN models classified up to 97.53% of the image responses into the appropriate scoring category, which is comparable to, if not more accurate than, typical human raters. These findings were further strengthened by the observation that the most accurate CNN models correctly classified some image responses that had been incorrectly scored by the human raters. As an additional innovation, we outline a method to select human-rated responses for the training sample based on an application of the expected response function derived from item response theory. This paper argues that CNN-based automated scoring of image responses is a highly accurate procedure that could potentially replace the workload and cost of second human raters for international large-scale assessments (ILSAs), while improving the validity and comparability of scoring complex constructed-response items.

11.
Radiat Environ Biophys ; 62(3): 349-356, 2023 08.
Article in English | MEDLINE | ID: mdl-37195317

ABSTRACT

Radiation dose estimations performed by automated counting of micronuclei (MN) have been studied for their utility for triage following large-scale radiological incidents; although speed is essential, it also is essential to estimate radiation doses as accurately as possible for long-term epidemiological follow-up. Our goal in this study was to evaluate and improve the performance of automated MN counting for biodosimetry using the cytokinesis-block micronucleus (CBMN) assay. We measured false detection rates and used them to improve the accuracy of dosimetry. The average false-positive rate for binucleated cells was 1.14%; average false-positive and -negative MN rates were 1.03% and 3.50%, respectively. Detection errors seemed to be correlated with radiation dose. Correction of errors by visual inspection of images used for automated counting, called the semi-automated and manual scoring method, increased accuracy of dose estimation. Our findings suggest that dose assessment of the automated MN scoring system can be improved by subsequent error correction, which could be useful for performing biodosimetry on large numbers of people rapidly, accurately, and efficiently.


Subjects
Cell Nucleus, Radiometry, Humans, Radiation Dose-Response Relationship, Radiometry/methods, Micronucleus Tests/methods, Cytokinesis, Lymphocytes
12.
Neurophysiol Clin ; 53(1): 102856, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36966728

ABSTRACT

OBJECTIVES: Due to the noisy environment, a very large number of patients admitted to intensive care units (ICUs) suffer from severe sleep disruption. These sleep alterations have been associated with a prolonged need for assisted ventilation or even with death. Sleep scoring in the critically ill is very challenging and requires sleep experts, limiting relevant studies to a few experienced teams. In this context, an automated scoring system would be of interest for researchers. In addition, real-time scoring could be used by nurses to protect patients' sleep. We devised a sleep scoring algorithm working in real time and compared this automated scoring against visual scoring. METHODS: We retrospectively analyzed 45 polysomnographies previously recorded in non-sedated and conscious ICU patients during their weaning phase. For each patient, one EEG channel was processed, providing automated sleep scoring. We compared total sleep time obtained with visual scoring versus automated scoring. The proportion of sleep episodes correctly identified was calculated. RESULTS: Automated total sleep time and visual sleep time were correlated; the automatic system overestimated total sleep time. The median [25th-75th] percentage of sleep episodes lasting more than 10 min detected by the algorithm was 100% [73.2 - 100.0]. Median sensitivity was 97.9% [92.5 - 99.9]. CONCLUSION: An automated sleep scoring system can identify nearly all long sleep episodes. Since these episodes are restorative, this real-time automated system opens the way for EEG-guided sleep protection strategies. Nurses could cluster their non-urgent care procedures and reduce ambient noise so as to minimize patients' sleep disruptions.


Subjects
Critical Illness, Artificial Respiration, Humans, Retrospective Studies, Artificial Respiration/methods, Sleep, Intensive Care Units, Algorithms
13.
Psychometrika ; 88(1): 76-97, 2023 03.
Article in English | MEDLINE | ID: mdl-35962849

ABSTRACT

Accurate assessment of a student's ability is the key task of a test. Assessments based on final responses are the standard. As the infrastructure advances, substantially more information is observed. One such instance is the process data that are collected by computer-based interactive items and contain a student's detailed interactive processes. In this paper, we show both theoretically and with simulated and empirical data that appropriately including such information in the assessment will substantially improve relevant assessment precision.


Subjects
Academic Success, Psychometrics, Humans
14.
J Intell ; 10(4)2022 Dec 13.
Article in English | MEDLINE | ID: mdl-36547511

ABSTRACT

Existing assessment methods of writing originality have been criticized for depending heavily on subjective scoring methods. This study attempted to investigate the use of topic analysis and semantic networks in assessing writing originality. Written material was collected from a Chinese language test administered to eighth-grade students. Two steps were performed: 1. Latent topics of essays in each writing task were identified, and essays on the same topic were treated as a refined reference group, within which an essay was to be evaluated; 2. A group of features was developed, including four categories, i.e., path distance, semantic differences, centrality, and similarity of the network drawn from each text response, which were used to quantify the differences among essays. The results show that writing originality scoring is not only related to the intrinsic characteristics of the text, but is also affected by the reference group in which it is to be evaluated. This study proves that computational linguistic features can be a predictor of originality in Chinese writing. Each feature type of the four categories can predict originality, although the effect varies across various topics. Furthermore, the feature analysis provided evidence and insights to human raters for originality scoring.

15.
Front Public Health ; 10: 1002501, 2022.
Article in English | MEDLINE | ID: mdl-36339161

ABSTRACT

The dicentric chromosome assay (DCA) is considered the gold standard for radiation biodosimetry, but it is limited by its long dicentric scoring time and need for skilled scorers. The automation of scoring dicentrics has been considered a strategy to overcome the constraints of DCA. However, the studies on automated scoring methods are limited compared to those on conventional manual DCA. Our study aims to assess the performance of a semi-automated scoring method for DCA using ex vivo and in vivo irradiated samples. Doses for 39 blind samples irradiated ex vivo and 35 industrial radiographers occupationally exposed in vivo were estimated using the manual and semi-automated scoring methods and subsequently compared. The semi-automated scoring method, which removed the false positives of automated scoring using the dicentric chromosome (DC) scoring algorithm, had an accuracy of 94.9% in the ex vivo irradiated samples. It also had more than 90% accuracy, sensitivity, and specificity to distinguish binary dose categories reflecting clinical, diagnostic, and epidemiological significance. These data were comparable to those of manual DCA. Moreover, Cohen's kappa statistic and McNemar's test showed a substantial agreement between the two methods for categorizing in vivo samples into never and ever radiation exposure. There was also a significant correlation between the two methods. Despite the comparable results of the two methods, the lower sensitivity of the semi-automated scoring method could limit its use in assessing various radiation exposures. Taken together, our findings show the semi-automated scoring method can provide accurate dose estimation rapidly, and can be useful as an alternative to manual DCA for biodosimetry in large-scale accidents or for monitoring the radiation exposure of radiation workers.


Subjects
Radiation Exposure, Triage, Humans, Radiation Dose-Response Relationship, Radiation Dosage, Human Chromosomes, Chromosome Aberrations
16.
Bioengineering (Basel) ; 9(8)2022 Aug 01.
Article in English | MEDLINE | ID: mdl-36004884

ABSTRACT

Bronchiectasis is defined as a permanent dilation of the bronchi that can cause pulmonary ventilation dysfunction. CT examination is an important means of diagnosing bronchiectasis. It can also be used in severity scoring. Current studies on bronchiectasis have focused on high-resolution CT (HRCT), ignoring the more common low-dose CT (LDCT). Methodologically, existing studies have not adopted an authoritative standard to classify the severity of bronchiectasis. In effect, the accuracy of detection and classification needs to be improved for practical application. In this paper, the ACER image enhancement method, RDU-Net lung lobe segmentation method and HDC Mask R-CNN model were proposed to detect and classify bronchiectasis. Moreover, a Python-based system was developed: after an LDCT image of a patient's lung is input, it automatically performs a series of processing steps, then calls the trained deep learning model for detection and classification, and automatically obtains the patient's final bronchiectasis score according to the Reiff and BRICS scoring criteria. In this paper, the mapping relationship between original lung CT image data and the bronchiectasis scoring system was established. The accuracy of the method proposed in this paper was 91.4%; the IOU, sensitivity and specificity were 88.8%, 88.6% and 85.4%, respectively; and the recognition speed of one picture was about 1 s. Compared to a human doctor, the system can process large amounts of data simultaneously, quickly and efficiently, with the same judgment accuracy as a human doctor. Doctors only need to judge the uncertain cases, which significantly reduces the burden on doctors and provides a useful reference for doctors to diagnose the disease.

17.
Front Immunol ; 13: 893198, 2022.
Article in English | MEDLINE | ID: mdl-35844508

ABSTRACT

Programmed cell death ligand 1 (PD-L1) is a critical biomarker for predicting the response to immunotherapy. However, traditional quantitative evaluation of PD-L1 expression using immunohistochemistry staining remains challenging for pathologists. Here we developed a deep learning (DL)-based artificial intelligence (AI) model to automatically analyze the immunohistochemical expression of PD-L1 in lung cancer patients. A total of 1,288 patients with lung cancer were included in the study. The diagnostic ability of three different AI models (M1, M2, and M3) was assessed in both PD-L1 (22C3) and PD-L1 (SP263) assays. M2 and M3 showed improved performance in the evaluation of PD-L1 expression in the PD-L1 (22C3) assay, especially at 1% cutoff. Highly accurate performance in the PD-L1 (SP263) was also achieved, with accuracy and specificity of 96.4 and 96.8% in both M2 and M3, respectively. Moreover, the diagnostic results of these three AI-assisted models were highly consistent with those from the pathologist. Similar performances of M1, M2, and M3 in the 22C3 dataset were also obtained in lung adenocarcinoma and lung squamous cell carcinoma in both sampling methods. In conclusion, these results suggest that AI-assisted diagnostic models in PD-L1 expression are a promising tool for improving the efficiency of clinical pathologists.


Subjects
B7-H1 Antigen, Lung Neoplasms, Artificial Intelligence, B7-H1 Antigen/metabolism, Biomarkers, Humans, Immunotherapy, Lung Neoplasms/drug therapy, Lung Neoplasms/therapy
18.
Biomed Signal Process Control ; 75: 103561, 2022 May.
Article in English | MEDLINE | ID: mdl-35154355

ABSTRACT

Coronavirus disease 2019 (COVID-19) pneumonia has erupted worldwide, causing massive population deaths and huge economic losses. In the clinic, lung ultrasound (LUS) plays an important role in the auxiliary diagnosis of COVID-19 pneumonia. However, the lack of medical resources leads to inefficient use of LUS. To address this problem, a novel automated LUS scoring system for evaluating COVID-19 pneumonia based on a two-stage cascaded deep learning model was proposed in this paper. To train the model, 18,330 LUS images collected from 26 COVID-19 pneumonia patients were assigned scores by two experienced doctors according to the designed four-level scoring standard. At the first stage, we made a secondary selection of these scored images through five ResNet-50 models and five-fold cross validation to obtain the available 12,949 LUS images which were highly relevant to the initial scoring results. At the second stage, three deep learning models (ResNet-50, Vgg-19, and GoogLeNet) formed the cascaded scoring model and were trained using the new dataset; the predictive result was obtained by a voting mechanism. In addition, 1000 LUS images collected from another 5 COVID-19 pneumonia patients were employed to test the model. Experimental results showed that the accuracy, sensitivity, specificity, and F1-score of the automated LUS scoring model were 96.1%, 96.3%, 98.8%, and 96.1%, respectively. These results showed that the proposed two-stage cascaded deep learning model could automatically score an LUS image, which has great potential for application to the clinics on various occasions.
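
The voting step over the three second-stage models can be illustrated with a minimal majority-vote sketch. The abstract does not describe how ties are resolved, so the fallback to the first model's label is an assumption, as are the function name and the example labels.

```python
from collections import Counter

def vote(predictions):
    """Majority vote over per-model class labels for one image.

    Ties (for example, three different labels) fall back to the first
    model's label; this tie-break is an assumption, not from the paper.
    """
    if not predictions:
        raise ValueError("no predictions to vote on")
    counts = Counter(predictions)
    label, top = counts.most_common(1)[0]
    # Check whether more than one label shares the highest count.
    if sum(1 for c in counts.values() if c == top) > 1:
        return predictions[0]
    return label

# Hypothetical four-level scores (0-3) from three models for one image.
print(vote([2, 2, 3]))  # two of three models agree on level 2
print(vote([0, 1, 2]))  # no majority, falls back to the first model
```

With an odd number of models and only two candidate labels a tie cannot occur, but with four score levels and three models it can, which is why some tie-break has to be chosen.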

19.
Front Psychol ; 12: 668401, 2021.
Article in English | MEDLINE | ID: mdl-34366987

ABSTRACT

Speech and language impairments are common pediatric conditions, with as many as 10% of children experiencing one or both at some point during development. Expressive language disorders in particular often go undiagnosed, underscoring the immediate need for assessments of expressive language that can be administered and scored reliably and objectively. In this paper, we present a set of highly accurate computational models for automatically scoring several common expressive language tasks. In our assessment framework, instructions and stimuli are presented to the child on a tablet computer, which records the child's responses in real time, while a clinician controls the pace and presentation of the tasks using a second tablet. The recorded responses for four distinct expressive language tasks (expressive vocabulary, word structure, recalling sentences, and formulated sentences) are then scored using traditional paper-and-pencil scoring and using machine learning methods relying on a deep neural network-based language representation model. All four tasks can be scored automatically from both clean and verbatim speech transcripts with very high accuracy at the item level (83-99%). In addition, these automated scores correlate strongly and significantly (ρ = 0.76-0.99, p < 0.001) with manual item-level, raw, and scaled scores. These results point to the utility and potential of automated computationally-driven methods of both administering and scoring expressive language tasks for pediatric developmental language evaluation.

20.
Brain Sci ; 11(7)2021 Jun 29.
Article in English | MEDLINE | ID: mdl-34209754

ABSTRACT

Ultrasonic vocalizations (USVs) are known to reflect emotional processing, brain neurochemistry, and brain function. Collecting and processing USV data is manual, time-intensive, and costly, creating a significant bottleneck by limiting researchers' ability to employ fully effective and nuanced experimental designs and serving as a barrier to entry for other researchers. In this report, we provide a snapshot of the current development and testing of Acoustilytix™, a web-based automated USV scoring tool. Acoustilytix implements machine learning methodology in the USV detection and classification process and is recording-environment-agnostic. We summarize the user features identified as desirable by USV researchers and how these were implemented. These include the ability to easily upload USV files, output a list of detected USVs with associated parameters in csv format, and the ability to manually verify or modify an automatically detected call. With no user intervention or tuning, Acoustilytix achieves 93% sensitivity (a measure of how accurately Acoustilytix detects true calls) and 73% precision (a measure of how accurately Acoustilytix avoids false positives) in call detection across four unique recording environments and was superior to the popular DeepSqueak algorithm (sensitivity = 88%; precision = 41%). Future work will include integration and implementation of machine-learning-based call type classification prediction that will recommend a call type to the user for each detected call. Call classification accuracy is currently in the 71-79% accuracy range, which will continue to improve as more USV files are scored by expert scorers, providing more training data for the classification model. We also describe a recently developed feature of Acoustilytix that offers a fast and effective way to train hand-scorers using automated learning principles without the need for an expert hand-scorer to be present and is built upon a foundation of learning science. 
The key is that trainees are given practice classifying hundreds of calls with immediate corrective feedback based on an expert's USV classification. We showed that this approach is highly effective with inter-rater reliability (i.e., kappa statistics) between trainees and the expert ranging from 0.30-0.75 (average = 0.55) after only 1000-2000 calls of training. We conclude with a brief discussion of future improvements to the Acoustilytix platform.
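
The inter-rater reliability figures above are kappa statistics. A minimal sketch of Cohen's kappa for two raters follows; the function name, the call-type labels, and the ratings are hypothetical and are not drawn from the Acoustilytix data.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty, equal-length label lists")
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used the same single label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical call-type labels from an expert and a trainee.
expert = ["flat", "flat", "trill", "trill", "step", "flat", "trill", "step"]
trainee = ["flat", "trill", "trill", "trill", "step", "flat", "flat", "step"]
kappa = cohen_kappa(expert, trainee)
print(round(kappa, 3))
```

In this toy example the raters agree on 6 of 8 calls (75%), yet kappa is lower than 0.75 because part of that agreement would be expected by chance alone.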
