Table 1: Comparison of digital self-efficacy items in ICILS and PISA, mapped to DigComp competence areas.

| Competence area | Competence | Type of DSE | ICILS label | PISA label |
|---|---|---|---|---|
| 1. Information and data literacy | 1.1 Navigate, search and filter data, information | General | Search for relevant information for a school project on the internet | Search for and find relevant information online |
| 1. Information and data literacy | 1.1 Navigate, search and filter data, information | General | Find the original sources of information referred to in an article on the internet | |
| 1. Information and data literacy | 1.2 Evaluate data, information and digital content | General | Judge whether you can trust information you find on the internet | Assess the quality of information you found online |
| 1. Information and data literacy | 1.3 Manage data, information and digital content | General | Collect and record data (e.g., using data loggers, Google Forms, spreadsheets) | |
| 1. Information and data literacy | 1.3 Manage data, information and digital content | General | Install a program or app | |
| 2. Communication and collaboration | 2.1 Interact through digital technologies/2.2 Share info | General | Explain to other students how to share digital content online or on a school platform | |
| 2. Communication and collaboration | 2.1 Interact through digital technologies/2.2 Share info | General | Upload text, images, or video to an online profile | Share practical information with a group of students |
| 2. Communication and collaboration | 2.4 Collaborate through digital technologies | General | Collaborate with other students on a group assignment | |
| 3. Digital content creation | 3.1 Develop digital content | General | Write or edit text for a school assignment | Write or edit text for a school assignment |
| 3. Digital content creation | 3.1 Develop digital content / 3.2 Integrate content | General | Create a multi-media presentation (with sound, pictures, or video) | Create a multi-media presentation (with sound, pictures, or video) |
| 3. Digital content creation | 3.1 Develop digital content / 3.2 Integrate content | General | Insert an image into a document or message | |
| 3. Digital content creation | 3.2 Integrate and re-elaborate digital content | General | Edit digital photographs or other graphic images | |
| 3. Digital content creation | 3.1 Develop digital content | Specialized | Build or edit a webpage | Create, update and maintain a webpage or a blog |
| 3. Digital content creation | 3.4 Programming | Specialized | Use a text-based programming language (e.g., Python, JavaScript...) | Create a computer program (e.g., in Scratch, Python, Java) |
| 3. Digital content creation | 3.4 Programming | Specialized | Use visual coding (e.g., Scratch, Lego Mindstorms) to develop a simple computer program | |
| 4. Safety | 4.1 Protect devices/ 4.2 Protect personal data and privacy | General | Change the settings on a device to suit your needs and preferences | Change the settings of a device or App in order to protect my data and privacy |
| 5. Problem solving | 5.1 Solve technical problems | Specialized | Identify the source of an error in a software after considering a list of potential causes | |
| 5. Problem solving | 5.2 Identify needs and technological responses | Specialized | Select the most efficient programme or App that allows me to carry out a specific task | |
| 5. Problem solving | 5.3 Use digital technologies creatively | Specialized | Break down a problem and represent a solution as a series of logical steps, such as an algorithm | |
Digital Self-Efficacy and Gender in Large-Scale Assessments
Daniel Miranda¹, Juan Carlos Castillo², Nicolás Tobar³, Tomás Urzúa⁴ and Ismael Aguayo⁵
Abstract
…
1 Introduction
The mastery of digital technologies today requires more than physical access to devices or procedural knowledge of software; it increasingly involves attitudinal dispositions such as confidence, persistence, and the capacity to adapt in complex environments (europeancommissionjointresearchcentre_digcomp_2022?). Research in educational psychology has consistently highlighted self-efficacy as a key determinant of how individuals engage with technology. Defined as the belief in one’s ability to accomplish tasks and overcome obstacles in digital contexts, digital self-efficacy (DSE) has emerged as a central construct in understanding the development of digital competences (bandura_selfefficacy_1995?; ulfert-blank_assessing_2022?). Unlike purely skill-based measures, DSE captures both perceived capability and self-regulatory attitudes, making it particularly useful for explaining why some learners successfully navigate digital transformations while others struggle (schererRelationStudentsSocioeconomic2019?; rohatgiRoleICTSelfefficacy2016?).
Within this attitudinal dimension, evidence has consistently revealed gender-related patterns. Historically, women have reported lower levels of self-confidence and motivation in digital environments, especially in tasks associated with STEM or advanced technical applications (hargittaiDifferencesActualPerceived2006?; caiGenderAttitudesTechnology2017a?). However, more recent studies suggest a nuanced picture: while women continue to show lower self-efficacy in specialized digital domains such as programming or data analysis, they now frequently outperform or equal men in general DSE tasks related to information navigation, communication, and content creation (gebhardtGenderDifferencesComputer2019?). These shifts indicate that the relationship between gender and DSE is dynamic, contingent upon how digital tasks are conceptualized and measured.
Such attitudinal differences are also shaped by broader sociocultural and educational contexts. Cross-national studies show that the magnitude of the gender gap in digital self-efficacy varies considerably depending on how societies adopt and integrate technologies into everyday life (camposDigitalGenderGaps2024?; hatlevikStudentsICTSelfefficacy2018?). This contextual variability underscores the importance of measurement models that are not only valid within single cultural settings but also comparable across diverse populations. Without rigorous tests of measurement invariance, it is difficult to determine whether observed differences reflect substantive disparities or methodological artifacts (leitgob_measurement_2023?).
International large-scale assessments (ILSAs) provide an unparalleled opportunity to examine these issues. Studies such as the International Computer and Information Literacy Study (ICILS) and the Programme for International Student Assessment (PISA) include DSE measures, but they differ in important ways. PISA has traditionally adopted a unidimensional approach, aggregating digital self-confidence into a single scale (oecdPISA2022ICT2023?), whereas ICILS uses a bidimensional framework, distinguishing between general and specialized DSE (fraillon_preparing_2020?). Although ICILS and PISA word their items differently, a close reading shows that the underlying tasks are largely similar, which calls their divergent measurement strategies into question.
These differences are more than technical: they affect how governments interpret digital readiness and how gender disparities are identified. Against this background, the present study contributes by (i) testing whether a two-dimensional model of DSE (general and specialized) can be identified in both PISA and ICILS, (ii) evaluating its measurement invariance across gender and countries, and (iii) exploring country-level determinants of gender differences in both studies. By doing so, we aim to clarify whether differences in DSE are consistent across contexts or are instead a product of how assessments operationalize the construct.
2 Self-efficacy and Digital Technologies
Self-efficacy is an orchestration, or continued improvisation, of multiple skills to manage ever-changing situations in the course of mastering an activity. As part of the forethought element of human agency, self-efficacy is not a simple self-perception of the ability to execute an action; it concerns judgments of how well one can execute the courses of action required to deal with prospective situations. Self-efficacy is thus a form of self-assessment, an expectation of producing particular outcomes, crucial for reinforcing mastery of the activity to which one is dedicated (banduraSelfefficacyMechanismHuman1982?; banduraSelfefficacyUnifyingTheory1977?).
This concept occupies a pivotal role in the causal structure of adaptation and change in learning contexts through its effect on other cognitive determinants. First, efficacy plays a central role in the self-regulation of motivation through goal challenges and outcome expectations. Second, efficacy determines which challenges to undertake, how much effort to expend in the endeavor, how long to persevere in the face of obstacles and failure, and whether failures are motivating or demoralizing. Third, efficacy beliefs shape the courses lives take by influencing the types of activities and environments people choose to enter. In sum, efficacy shapes mindsets, life-path choices, and changes of environment, all of which are crucial for setting fulfilling standards and attaining performance accomplishments across learning activities (bandura_selfefficacy_1995?; steele_stereotype_1995?).
Self-efficacy has proven particularly relevant to learning achievements with digital technologies. A whole thematic research agenda has opened up around the role of self-efficacy in the adoption of, and competence with, digital technologies, although efforts to measure the construct remain scattered.
The first antecedents of self-efficacy applied to digital issues drew on 'computer self-efficacy'. (compeau_computer_1995?) proposed this early instrument, focused on general computer domains and specific software application tasks. It was defined as an individual's perception of his or her ability to use computers to accomplish a task (e.g., using a software package for data analysis, or writing a mail-merge letter with a word processor), rather than as a reflection of simple component skills (e.g., formatting diskettes, booting up a computer, or using a specific software feature such as bolding text or changing margins). The computer self-efficacy construct was criticized, and eventually superseded, for neglecting the changing dynamics of digital systems, which extended the digital environment beyond computers; the items of these scales tend to become outdated rapidly (durndellComputerSelfEfficacy2002?; weigel_technical_2014?).
With the increasing importance of interconnected technologies, the focus turned to Internet self-efficacy, defined as one's general judgment of confidence in accomplishing different tasks related to the online use of technologies. Internet self-efficacy does not refer to simple operational tasks such as writing HTML, using a browser, or transferring files. Instead, it assesses a person's judgment of their ability to apply Internet skills in a more encompassing mode, such as finding information, communicating socially, or troubleshooting search problems. Internet self-efficacy can be distinguished from computer self-efficacy as the belief that one can successfully perform the distinct set of behaviors required to establish, maintain, and effectively utilize the Internet, over and above basic personal computer skills (eastin_internet_2000a?; hsuInternetSelfefficacyElectronic2004?).
Although this new construct partially addressed the obsolescence of technologies, the set of digital activities was reduced to a particular domain, namely the Internet and online interactions. An ICT self-efficacy scale was therefore proposed to combine computer and Internet tasks in a single construct. The ICT self-efficacy construct considers digital information processing and content creation (aesaert_exploring_2014?; hatlevik_students_2018a?), and more advanced skills, such as programming or web development, also began to be considered (rohatgi_role_2016?). Despite these newer measures, ICT self-efficacy is usually conceived unidimensionally or focused on specific application domains (using ICT for work, school, or leisure) rather than on competences applicable across general life domains (ulfert-blank_assessing_2022?).
The measures presented so far share several limitations. First, they often do not consider more recent frameworks of digital competences, such as DigComp, regarding their level of generality, the competences included, and their multidimensionality. DigComp describes digital competences in terms of general actions (i.e., tasks, functions), such as protecting devices or managing data, that apply to heterogeneous groups of individuals and are independent of specific digital systems. Most DSE scales are still system-specific (e.g., particular computer software) or technology-specific (e.g., storage media such as floppy disks) and may thus become outdated. Second, critical competence areas, such as safety and problem-solving, are often disregarded: most scales focus on the informational, communicative, and creative aspects of technologies without exhaustively capturing the dimensions of their mastery. Third, the term DSE has been used interchangeably for measuring general competence beliefs (i.e., including items assessing self-concept, a different competence belief) or actual proficiency, leading to inconsistencies in the representation of the DSE construct in the literature. This is despite the self-efficacy literature offering clear definitions of how measures should be constructed and a well-defined differentiation from related constructs (ulfert-blank_assessing_2022?).
(ulfert-blank_assessing_2022?) suggests working with a unified construct, digital self-efficacy (hereinafter DSE), to raise the standard of research on this issue. Considering the gaps and inconsistencies of previous measurements, (ulfert-blank_assessing_2022?) points out that DSE measures have to (1) be theoretically grounded and multidimensional, encompassing diverse digital competence areas; (2) cover different functions and tasks of digital systems; (3) be independent of any specific digital system (e.g., Word); and (4) be oriented not only to education but also to labor and the economy.
3 Measuring Digital Self-efficacy
Self-efficacy has generally been studied in two ways: as perceived capability for task achievement and as a self-regulatory attitude. Task self-efficacy involves the belief that one can or cannot perform a single instance of a circumscribed behavior at different levels of performance. Self-regulatory self-efficacy is the confidence that one can (or could) achieve tasks in the face of potential barriers. Studies focusing on capabilities usually emphasize the magnitude of the task, i.e., its degree of difficulty or complexity, and the linear achievement of the mastery process. By contrast, studies focused on attitudinal aspects give greater importance to persistence, or resistance in the face of adversities in the environment surrounding an activity (marlatt_selfefficacy_1995?; schwarzer_socialcognitive_2000?; williams_confounded_2016?).
Digital self-efficacy measurements are generally based on the linear achievement of tasks rather than on the development of a confident self-attitude towards technologies. Some studies define DSE as a unidimensional construct, measuring individuals' task achievement in using digital technologies without distinguishing between types of tools and/or levels of complexity (hatlevik_digital_2015?; rohatgiRoleICTSelfefficacy2016?). Another range of studies adopts a bidimensional approach: a first level of achievement, labelled 'general or basic DSE', covers widespread skills around handling information (browsing the web, evaluating the accuracy of information on the internet, installing programmes), communicating or collaborating through technologies (uploading content to social media, creating online profiles), and creating digital content (designing images, videos, and text). A second level of achievement, hierarchically above the first, covers high technical expertise in computational tasks such as using programming languages, developing applications, or managing websites, and is generally labelled 'specialized or advanced DSE' (pashaCrossNationalVariationsSelfEfficacy2024?; gebhardtGenderDifferencesComputer2019?).
The unidimensional and bidimensional approaches not only shape measurement instruments but also lead to different research findings: unidimensional models may underestimate the predictive power of DSE in complex digital environments, while bidimensional models offer greater explanatory precision but can introduce challenges such as construct overlap or reduced generalizability, limiting the transfer of findings across educational systems and cultural contexts (aesaertAccuracyBiasICT2017?; siddiqThereGenderGap2019?; scherer_gender_2023?; camposDigitalGenderGaps2024?). Thus, the choice between unidimensional and bidimensional models of DSE is not merely theoretical, as it has significant implications for the validity, reliability, and utility of research on digital competence development.
However, although evidence shows that improvement in technological self-efficacy does not follow a positive linear relationship with performance, it is noteworthy that DSE studies focus closely on task achievement rather than on developing attitudes for dealing with the difficult situations encountered in digital learning processes (bandura_selfefficacy_1995?; fabia_students_2024?; mekheimer_technological_2025?).
4 The Different Approaches of PISA and ICILS
International large-scale assessments (ILSAs) have become one of the most important types of studies in the field of education, characterized by the deployment of their surveys throughout the world, which enables cross-country analysis with large volumes of data. Recently, several ILSA studies have included sections on digital topics, given their relevance in contemporary social life (Fraillon et al. 2013), and digital self-efficacy has earned its place among them.
Two main studies include digital topics in their surveys. The first is the International Computer and Information Literacy Study (ICILS), a study belonging to the International Association for the Evaluation of Educational Achievement (IEA). ICILS focuses on digital competences and seeks to answer the question: how well are students prepared to study, work, and live in a digital world? To this end, the study measures computer and information literacy achievement through a standardized test. The second study, organized and executed by the OECD, is the Programme for International Student Assessment (PISA). PISA is known for measuring the abilities of 15-year-old adolescents to use their knowledge in reading, mathematics, and science to face real-life challenges. This ILSA stands out for the great thematic versatility of its questionnaires, which include digital themes in the ICT familiarity module.
Each of these studies understands, and consequently measures, DSE differently. PISA takes a one-dimensional approach, as can be inferred from its single battery of 10 items. These items ask about different skills, from editing text in digital services to identifying the source of an error in software. In contrast, ICILS proposes a two-dimensional perspective on DSE, distinguishing between general and specialized DSE. The general dimension encompasses tasks that do not require in-depth knowledge of how digital devices work, such as searching for and finding information on the internet. Specialized DSE tasks require greater mastery of digital skills, as they are more complex to perform, such as creating a computer program.
These different treatments of the DSE dimensions probably stem from the distinct approaches the two studies take to the self-efficacy concept. PISA contextualizes its items more strongly, positing obstacles in some of them and a procedural way of overcoming them. ICILS opts for a battery focused on the operational side of task completion, with items that emphasize completing a specific task rather than evaluating how to solve a problem. In this sense, PISA places greater emphasis on regulatory self-efficacy than ICILS and, therefore, does not attempt to differentiate tasks by their level of technical difficulty.
Despite their distinct approaches, the two studies contain tasks that can be categorized into a more general dimension and a specialized one. PISA and ICILS share items that focus on tasks of low technical complexity, such as searching for information online or editing text for a school subject, but both studies also include items that refer to creating and maintaining web pages or programming software.
Considering the previous statements, the first hypothesis of this study is H1: It is possible to identify two latent dimensions of digital self-efficacy (general and specialized) based on related batteries and indicators included in large-scale assessments such as PISA and ICILS (bi-dimensional hypothesis).
The possibility of understanding PISA from a two-dimensional perspective poses a challenge in terms of scale stability, i.e., whether these two dimensions are consistent with the data collected. Furthermore, the fact that the studies conceive self-efficacy under models with different dimensionality hinders the comparability of the batteries. Therefore, with the intention of validating these constructs in the different countries where the instruments are applied, the following hypothesis is proposed:
The second hypothesis is H2: The bi-dimensional measurement model of digital self-efficacy is equivalent across countries. Furthermore, the literature has shown that there are significant gender differences depending on the type of dimension (gebhardtGenderDifferencesComputer2019?), so it is also necessary to test the stability of the scales by gender. Hence the third hypothesis, H3: The bi-dimensional measurement model of digital self-efficacy is equivalent between girls and boys.
5 Methods
5.1 Data
ICILS 2023 employed a two-stage cluster sampling approach across 34 countries, covering more than 130,000 students worldwide. In the first stage, schools with students in the target grade were randomly selected using a probability proportional to size (PPS) method. In the second stage, one intact class was randomly selected within each sampled school, and all students in that class were invited to participate (Fraillon, 2024).
The PISA 2022 target population consists of 15-year-old students attending educational establishments, sampled through a two-stage stratified design. In the first stage, schools were systematically selected with PPS from national sampling frames. In the second stage, a target number of students (typically 42) were randomly selected with equal probability within each sampled school. The official data subset reaches around 393,000 students nested in 52 countries (mainly OECD members).
To ensure comparability, both datasets were filtered to include only those countries that participated in both the PISA ICT questionnaire and the ICILS survey. Furthermore, to maximize the overlap between the samples, specific adjustments were made. In the ICILS data, Germany is reported as two separate regions reflecting its distinct educational systems; for this analysis, these regions were aggregated into a single national entity. Similarly, the ICILS data for Belgium were collected exclusively from the Flemish Community (Flanders) in this wave; to align the datasets, the data from Flanders were treated as representative of the entire country.
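A minimal dplyr sketch of this harmonization step, assuming hypothetical identifiers (a `cnt` country column, `DEU1`/`DEU2` for the German regions, `BFL` for Flanders; the released files use their own codes):

```r
library(dplyr)

icils <- icils |>
  mutate(
    cnt = case_when(
      cnt %in% c("DEU1", "DEU2") ~ "DEU",  # merge the two German regions
      cnt == "BFL"               ~ "BEL",  # treat Flanders as Belgium
      TRUE                       ~ cnt
    )
  )

# Keep only countries present in both studies
common <- intersect(unique(icils$cnt), unique(pisa$cnt))
icils  <- filter(icils, cnt %in% common)
pisa   <- filter(pisa,  cnt %in% common)
```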
The final country-level sample includes 22 countries, with 91,132 students nested in ICILS and 183,902 in PISA.
5.2 Variables
Both studies include a module on digital self-efficacy in their questionnaires. In ICILS, the items are introduced with the phrase "How well can you do:" and the recoded categories are 1 = I do not think I could do this; 2 = I have never done this, but I could work out how to do it; 3 = Moderately well; 4 = Very well. The PISA module prefaces the items with the question "To what extent are you able to do the following tasks when using <digital resources>?", where <digital resources> is a placeholder for a term adapted to the target culture. The PISA recoded categories are 1 = I cannot do this; 2 = I struggle to do this on my own; 3 = I can do this with a bit of effort; 4 = I can easily do this. "I don't know" responses were treated as missing values. A complete comparison of the items from both surveys is presented in Table 1, highlighting the concordances and distinctions between the studies and the type of DSE we assume each item to match.
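A sketch of this recoding step, assuming a hypothetical `dse_` prefix for the item columns (the released files use study-specific item codes) and that "I don't know" is stored as a category outside 1-4:

```r
library(dplyr)
library(haven)

icils_raw <- read_sav("icils_student_file.sav")  # hypothetical file name

icils <- icils_raw |>
  mutate(across(
    starts_with("dse_"),
    ~ if_else(as.numeric(.x) %in% 1:4,   # keep the four substantive categories
              as.numeric(.x), NA_real_)  # "I don't know" and the rest -> NA
  ))
```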
We also consider the three-digit ISO country code and the student’s sex (recoded as 1 = male and 2 = female) from both datasets.
5.3 Analytical Strategy
All data management and analyses were conducted in the R statistical environment. Data wrangling was carried out with the R packages dplyr and haven.
The main analyses were performed within a Confirmatory Factor Analysis (CFA) framework to test the hypothesized two-factor structure of DSE (Brown, 2015). The CFA models were estimated with the lavaan package (Rosseel, 2012). Given the ordinal nature of the Likert-scale items, they were treated as ordered categorical variables, and models were estimated using the Weighted Least Squares Mean and Variance Adjusted (WLSMV) estimator. Missing data were handled with pairwise deletion, the standard approach for this estimator in lavaan (as opposed to the default listwise deletion).
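A minimal lavaan sketch of this estimation setup, using illustrative indicator names that mirror Table 1 rather than the actual variable codes in the released files:

```r
library(lavaan)

# Hypothesized two-factor structure of DSE (illustrative indicator names)
dse_model <- '
  general     =~ search_info + assess_info + share_info +
                 edit_text + create_media + collaborate
  specialized =~ develop_webpage + write_program + identify_error
'

fit_pooled <- cfa(
  dse_model,
  data      = pisa_data,    # hypothetical analysis data frame
  ordered   = TRUE,         # Likert items as ordered categorical
  estimator = "WLSMV",      # mean- and variance-adjusted weighted least squares
  missing   = "pairwise"    # pairwise deletion for the polychoric correlations
)

summary(fit_pooled, fit.measures = TRUE, standardized = TRUE)
```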
To evaluate the goodness of fit of the models, the chi-square (χ²) test was used. However, given the test's sensitivity to large sample sizes, model evaluation relied primarily on a complementary set of fit indices: the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), and the Root Mean Square Error of Approximation (RMSEA). Following common guidelines, CFI and TLI values of 0.95 or higher are considered indicative of excellent fit, while values between 0.90 and 0.95 are considered acceptable. For RMSEA, values of 0.06 or less indicate close fit, while values up to 0.08 are considered adequate (Brown, 2015; Wang & Wang, n.d.).
First, a pooled model was estimated to test the overall fit of the hypothesized model in each study. Where the initial model failed to achieve acceptable fit, a principled procedure of model re-specification was undertaken. Sources of misfit were diagnosed by examining multiple indicators, including standardized factor loadings and modification indices. Items identified as primary sources of misspecification were considered for removal. Any modification was contingent upon a strong theoretical justification under the (ulfert-blank_assessing_2022?) framework: items with dimensional ambiguity or content overlap were prioritized for removal, ensuring that the construct's content validity was not compromised. This process was performed judiciously to arrive at a final, well-fitting model that remained theoretically coherent. A factor was required to retain a minimum of three indicators.
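The diagnostics described above can be sketched with standard lavaan calls on the pooled fit object from the previous example:

```r
# Standardized factor loadings: weak indicators show low std.all values
standardizedSolution(fit_pooled) |>
  subset(op == "=~")

# Largest modification indices: big MIs on "=~" rows flag cross-loadings,
# marking candidate items for removal
modindices(fit_pooled, sort. = TRUE, maximum.number = 10)
```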
In the next step, to test the comparability of this measurement model across countries and genders, a series of Multi-Group Confirmatory Factor Analyses (MGCFA) was conducted to assess measurement equivalence/invariance (ME/I) (e.g., Beaujean, 2014; Davidov et al., 2014). This procedure tests three sequential levels of invariance by imposing progressive restrictions: 1) configural invariance, which tests whether the same factor structure holds for all groups; 2) metric invariance, which adds the constraint that factor loadings are equal across groups; and 3) scalar invariance, which further constrains item intercepts (or thresholds, for ordinal items) to be equal (Cheung & Rensvold, 2002; Milfont & Fischer, 2010). Invariance between these nested models was evaluated by examining the change in CFI (ΔCFI) and RMSEA (ΔRMSEA). Specifically, for metric (weak) invariance, a change in CFI of ≥ -0.004 and a change in RMSEA of ≤ 0.050 were considered acceptable; for scalar (strong) invariance, a ΔCFI of ≥ -0.004 and a ΔRMSEA of ≤ 0.010 were considered adequate (Rutkowski & Svetina, 2017). Metric invariance is a prerequisite for analyzing associations between the DSE dimensions and other factors; scalar invariance is additionally required for comparing latent means across countries and across gender.
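A sketch of the sequential MGCFA, again with the hypothetical model and data names from above; the gender analysis simply swaps the grouping variable:

```r
fit_configural <- cfa(dse_model, data = pisa_data, ordered = TRUE,
                      estimator = "WLSMV", group = "country")

fit_metric <- cfa(dse_model, data = pisa_data, ordered = TRUE,
                  estimator = "WLSMV", group = "country",
                  group.equal = "loadings")

fit_scalar <- cfa(dse_model, data = pisa_data, ordered = TRUE,
                  estimator = "WLSMV", group = "country",
                  group.equal = c("loadings", "thresholds"))  # thresholds, not intercepts, for ordinal items

# Compare CFI and RMSEA across the nested models to compute the deltas
sapply(list(configural = fit_configural,
            metric     = fit_metric,
            scalar     = fit_scalar),
       fitMeasures, fit.measures = c("cfi", "rmsea"))
```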
Upon establishing scalar invariance, the country-level distributions of the latent means of general and specialized DSE are explored, and the magnitude of the DSE gender gap is then visualized for each country.
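Once scalar invariance holds, group latent means can be read off the scalar model; a sketch, assuming lavaan's default identification (the first group's latent means fixed at zero and, with thresholds constrained, those of the remaining groups freed):

```r
# Model-implied latent means per country: a list with one named vector
# (general, specialized) per group, reference group fixed at zero
latent_means <- lavInspect(fit_scalar, what = "mean.lv")
do.call(rbind, latent_means)  # countries in rows, the two DSE factors in columns
```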
6 Results
6.1 Pooled models
The models originally hypothesized for the two studies did not achieve acceptable initial fit, thus requiring several modifications to meet the established criteria. For PISA, the pooled model yielded the following fit indices: CFI = 0.98; TLI = 0.97; RMSEA = 0.15, exceeding the maximum acceptable RMSEA threshold of 0.08. Given this result, a model re-specification process was undertaken, removing items from the model. The first item dropped was 'identify app', which, according to the modification indices, cross-loaded on the general sub-dimension of the latent variable (MI = 113,132). Unlike the other items in the specialized sub-dimension, this one does not refer to tasks specifically related to programming, which may explain why it did not fit well within the factor. The next item removed was 'develop webpage', due to its cross-loading on the general DSE factor (MI = 61,606); it is possible that, with the spread and popularization of 'user-friendly' platforms for website creation, this skill is no longer considered advanced by young people. The next item removed was 'change settings', due to its cross-loading on the specialized DSE factor (MI = 9,168); this item is not directly related to the basic 'digital literacy' dimensions recognized in the DigComp framework (information, communication, and creation), but rather to the dimension related to privacy. Next, the 'collect data' item was dropped, as modification indices indicated it cross-loaded on the specialized DSE factor (MI = 11,187); although this item requires deeper analysis, the basic data collection and analysis skills it assesses appear to be perceived as more complex by students. Finally, the 'create media' item was removed, as modification indices suggested it cross-loaded on the specialized sub-dimension (MI = 7,927); although we lack substantive evidence to affirm this, it is plausible that the perceived threshold for proficiency in multimedia skills has risen, given the wide availability of high-quality multimedia content on the internet.
In the case of ICILS, the pooled model also failed to fit due to a very high RMSEA; its fit indices were CFI = 0.97; TLI = 0.96; RMSEA = 0.1. Again, a re-specification of the hypothesized model followed. First, 'source info' was removed, as it cross-loaded on the specialized DSE factor (MI = 18,633): identifying an information source has become very complex today, and tracing the origin of a specific piece of data apparently requires, in the students' view, greater specialization, so much so that this item shows high shared variance with visual coding and programming. Then, as in the previous study, 'change settings' was removed because it is not related to the basic dimensions of the DigComp framework. Finally, two items related to multimedia skills were removed due to cross-loading on the specialized DSE factor: 'create media' (MI = 6,174) and 'edit image' (MI = 6,510). The fact that in both studies these items lean towards the specialized self-efficacy factor suggests that, rather than a measurement issue specific to one study, students' perceived difficulty of editing and managing audio, video, and images is genuinely changing.
Given the above, the following re-specified two-dimensional DSE models are proposed for both studies.
6.2 Re-specified models
Figure 1: PISA fitted model

Figure 2: ICILS fitted model

In both studies, the general factor retains six items and the specialized factor three. In PISA, the fit indices fall within the excellent-fit range (>0.95) for CFI and TLI and the acceptable range (<0.08) for RMSEA. All factor loadings exceed 0.85, indicating that the proposed factors explain a substantial portion of the variance of their items (at least 72.3%). The correlation between the two factors is moderate, at 0.4.
Regarding ICILS, the model shows a slightly looser fit than PISA's. Although the CFI and TLI also indicate excellent fit, the RMSEA sits right at the borderline of acceptability. In addition, the factor loadings are somewhat lower, though still of high magnitude, ranging from 0.68 to 0.84, meaning they explain, in the worst case, 46.2% of the item variance. The correlation between the two factors is very similar to the PISA model, at 0.43.
6.3 Multigroup CFA and invariance analysis

After establishing the fit of the general models for each study, multigroup confirmatory analysis and invariance testing proceeded. Starting with cross-country invariance, PISA passed all three tested levels (configural, metric, and scalar), with changes in the key fit indices (ΔCFI and ΔRMSEA) within acceptable ranges. This establishes cross-country comparability and permits the subsequent analyses. ICILS, by contrast, achieved configural and metric invariance, but scalar invariance could not be established because the change in CFI (-0.007) exceeded the cut-off criterion. This allows us to compare factor loadings across countries but not latent means, as groups may interpret the items differently (i.e., the intercepts are not equivalent).

Regarding gender invariance, both studies achieved scalar invariance: the changes between the nested models (ΔCFI and ΔRMSEA) were within acceptable ranges (i.e., below the problematic cut-offs). This means that the latent constructs are comparable across genders in both studies, permitting comparative analyses of both factor loadings and latent means.
6.4 Descriptives by gender and countries
To do…
Footnotes
1. Psychology Department, University of Chile. Associate researcher at the Centre for Social Conflict and Cohesion Studies (COES). Adjunct researcher at NUDOS. damiran1@uc.cl
2. Sociology Department, University of Chile. Principal researcher at the Millennium Nucleus on Digital Inequalities and Opportunities (NUDOS) and at the Centre for Social Conflict and Cohesion Studies (COES). juancastillov@uchile.cl - jc-castillo.com
3. Research assistant at NUDOS. MSc (c) Political Science, Faculty of Government, University of Chile. nicolas.tobar@ug.uchile.cl
4. Research assistant at NUDOS. MSc (c) Social Sciences, Sociology Department, University of Chile. tomas.urzua@ug.uchile.cl
5. Research assistant at NUDOS. BSc (c) Sociology, Sociology Department, University of Chile. ismael.aguayo@ug.uchile.cl