ABSTRACT
Filipino students' performance in global assessments of science literacy has been consistently low, and this was confirmed again in PISA 2018, where Filipino learners' average science literacy scores ranked second to last among 78 countries. In this study, machine learning approaches were used to analyze PISA data from the student questionnaire to test models that best identify the poorest-performing Filipino students. The goal was to explore factors that could help identify the students who are vulnerable to very low achievement in science and that could indicate possible targets for reform in science education in the Philippines. The random forest classifier model was found to be the most accurate and most precise, and Shapley Additive Explanations indicated 15 variables that were most important in identifying the low-proficiency science students. These variables relate to metacognitive awareness of reading strategies, social experiences in school, aspirations and pride about achievements, and family/home factors, including parents' characteristics and access to ICT with internet connections. The results highlight the importance of considering personal and contextual factors beyond the typical instructional and curricular factors that are the foci of science education reform in the Philippines, and some implications for programs and policies for science education reform are suggested.
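To make the idea behind Shapley Additive Explanations concrete, the following is a minimal sketch that computes exact Shapley values for a toy three-feature risk model by enumerating every feature coalition. It is not the study's pipeline: the function `toy_risk`, its coefficients, and the feature names are hypothetical, and libraries used in practice approximate these values for tree ensembles rather than enumerating coalitions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this only suits very small feature sets.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model (an assumption for illustration only): predicted
# risk of low proficiency, lowered by three made-up binary indicators.
def toy_risk(z):
    reading_strategies, internet_at_home, school_belonging = z
    return (0.6 - 0.3 * reading_strategies - 0.2 * internet_at_home
            - 0.1 * reading_strategies * school_belonging)


student = [1, 1, 1]    # all three indicators present
reference = [0, 0, 0]  # baseline student with none of them
phi = shapley_values(toy_risk, student, reference)
print([round(p, 3) for p in phi])
```

The attributions sum to the difference between the student's prediction and the baseline prediction, which is the additivity property that allows per-variable contributions to be ranked.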
ABSTRACT
Filipino students performed poorly in the 2018 Programme for International Student Assessment (PISA) mathematics assessment, with more than 50% obtaining scores below the lowest proficiency level. Students from public schools also performed worse than their private school counterparts. We used machine learning approaches, specifically binary classification methods, to model the variables that best distinguished poor-performing students (below Level 1) from better-performing students (Levels 1 to 6), using PISA data from a nationally representative sample of 15-year-old Filipino students. We analyzed data from students in private and public schools separately. Several binary classification methods were applied, and the best classification model for both the private and public school groups was the Random Forest classifier. The ten variables with the highest impact on the model were identified for each school group. Five variables were important in both the private and public school models. However, other variables, relating to students' motivations and their family and school experiences, were important in identifying the poor-performing students in only one school type. The results are discussed in relation to the social and social cognitive experiences of students, which are tied to socioeconomic contexts that differ between public and private schools.
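The binary target described above can be sketched as follows: each student is labeled 1 if the score falls below a proficiency cutoff and 0 otherwise, and a candidate classifier's predictions are then scored against those labels. This is an illustrative sketch only. The scores, the predictions, and the cutoff value are made up (the cutoff is a placeholder, not the official PISA threshold), and accuracy and precision are assumed here as comparison metrics because the abstract does not state how the best model was chosen.

```python
def label_low_performers(scores, cutoff):
    """Return 1 for scores below the cutoff (poor performing), else 0."""
    return [1 if s < cutoff else 0 for s in scores]


def accuracy_and_precision(y_true, y_pred):
    """Accuracy and precision for the positive (poor-performing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Precision is undefined when nothing is predicted positive; report 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Made-up scores and an illustrative placeholder cutoff (assumptions).
scores = [300, 350, 400, 450, 500]
cutoff = 358
labels = label_low_performers(scores, cutoff)

# Hypothetical predictions from some candidate classifier.
predictions = [1, 0, 0, 1, 0]
accuracy, precision = accuracy_and_precision(labels, predictions)
print(labels, accuracy, precision)
```

Running the same scoring separately on the private and public school subsets would mirror the separate analyses described in the abstract.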