DOI: 10.47460/athenea.v2i4.17
Structural equation models - PLS in engineering sciences: a brief guide for researchers through a case applied to the industry
Villalva A. Juan Enrique
ORCID:
Juanev99@gmail.com
UNEXPO Puerto Ordaz - CVG Ferrominera Orinoco
Edo. Bolívar, Venezuela
Received (08/04/21), Accepted (18/05/21)
Abstract: Structural equation modeling is a second-generation statistical data analysis technique that has become one of the methodological options most used by researchers in various fields of science. The best-known method is covariance-based, but it has some limitations that restrict its application in certain cases. An alternative method is based on the variance structure, through partial least squares analysis, which is a suitable option when the research involves latent variables (for example, composite indicators) elaborated by the researcher and when complex models must be explained and predicted. This article presents a brief summary of the structural equation modeling technique, with an example on the relationship between the constructs sustainability and competitiveness in iron mining, and is intended as a brief guide for future researchers in the engineering sciences.
Keywords: Competitiveness, Structural equations, Iron mining, Sustainability.
Villalva et al., Structural equation models - PLS engineering sciences
ISSN
I. INTRODUCTION
Latent variables, or constructs, are present in everyday life more than we realize, and we use them daily; examples are happiness, intelligence and poverty. Constructs are also present in the engineering disciplines, for example sustainability, environmental performance, competitiveness, corporate social responsibility, quality of service and capacity for innovation, among others, and they need to be measured and evaluated for problem diagnosis and decision making.
Structural equation modeling (SEM) is a multivariate method that makes it possible to evaluate simultaneously the dependency relationships between observable and unobservable variables (constructs). With this technique, research models are built by transforming theoretical concepts into unobservable variables and empirical concepts into indicators; both are related through hypotheses expressed graphically in path diagrams. SEM can be applied through two alternatives: SEM based on the covariance structure (BC) or SEM based on the variance structure, through partial least squares analysis (PLS).
The origin of BC SEM dates back to 1973, when Karl Jöreskog introduced a maximum likelihood algorithm for estimating covariance structure models [1]. The Swedish professor Herman Wold criticized its dependence on distributional assumptions, which affects the validity of the empirical results, and proposed an alternative approach, Partial Least Squares (PLS); in 1977 he developed the NIPALS (Nonlinear Iterative Partial Least Squares) algorithm [2]. BC SEM assumes that the variables are normally distributed, uses maximum likelihood estimation, generally requires a large sample, and focuses on "reproducing" the structure of relationships between the variables.
This article is based on PLS SEM, which offers several advantages for application. This modeling method is more flexible because it does not require rigorous parametric assumptions: PLS SEM does not assume normality, is estimated by iterative least squares, is applicable with small samples, and is focused on prediction. The mathematical and statistical procedures underlying PLS SEM are rigorous and robust [2].
PLS SEM iteratively combines principal component analysis, path analysis and Ordinary Least Squares (OLS) regression. Principal component analysis links the observable variables with the constructs, path analysis builds the structure of the system of variables, and OLS regression estimates the parameters. It is important to highlight that PLS SEM can be used for both explanatory (confirmatory) and predictive (exploratory) research [1], [3].
In the engineering sciences, a relatively small number of researchers have begun to exploit the potential of PLS SEM successfully to obtain relevant results in their analyses. In this article, after presenting the theoretical definitions and the procedure, the application of PLS SEM is illustrated through a multidisciplinary case combining mining, industrial and environmental engineering: the relationship between the constructs Sustainability and Competitiveness in the iron mining industry. The case demonstrates the high applicability of this novel technique for developing models in the field of engineering.
II. THEORETICAL ASPECTS
The general structural equation model consists of a measurement model, also called the outer model, and a structural model, or inner model. The measurement model specifies the relationships between the observable variables and the latent variables that underlie them. In contrast, the structural model specifies the relationships between the latent variables, which consist of exogenous variables or constructs (ξ) and endogenous variables or constructs (η). Figure 1 presents a schematic of the general PLS SEM model. In the context of PLS SEM, two types of measurement models can be used: (1) the reflective model and (2) the formative model.
The measurement model is governed by two equations. The first specifies the relationships between the endogenous latent variables and their observable variables:
$$ y = \Lambda_y \eta + \varepsilon $$
Where:
y is the vector of p observable variables (p×1).
Λy is the matrix of coefficients that show the relationships between the latent and observed variables (p×m), also called the loading matrix (λ).
ε is the error vector (p×1).
The second equation of the measurement model governs the relationships between the exogenous latent variables and their observable variables:
$$ x = \Lambda_x \xi + \delta $$
Where:
x is the vector of q observable variables (q×1).
Λx is the corresponding loading matrix (q×n).
δ is the error vector (q×1).
The structural model is defined by the equation:
$$ \eta = \beta \eta + \Gamma \xi + \zeta $$
Where:
η represents the vector of endogenous latent random variables, of dimension m×1.
ξ represents the vector of exogenous latent random variables, of dimension n×1.
β represents the m×m matrix of coefficients that govern the relationships among the endogenous variables.
Γ represents the matrix of coefficients that govern the relationships between the exogenous variables and each of the endogenous ones, in other words, the effects of ξ on η. Its dimension is m×n.
ζ represents the vector of disturbances or errors (m×1).
Fig. 1. General model schematic of a PLS SEM
Source: Adapted from Cepeda and Roldán [4]
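As a numerical illustration of the structural equation, the following minimal sketch draws latent scores for a hypothetical recursive model with two exogenous and two endogenous constructs. The helper name and the coefficient values in `B` (standing for β) and `Gamma` are invented for illustration. Because η appears on both sides of the equation, the scores are generated from the reduced form η = (I − β)⁻¹(Γξ + ζ), which assumes that I − β is invertible.

```python
import numpy as np


def simulate_structural_model(B, Gamma, n_obs, zeta_sd=0.5, seed=0):
    """Draw latent scores satisfying eta = B @ eta + Gamma @ xi + zeta.

    B is the m x m matrix of effects among endogenous constructs (beta),
    Gamma is the m x n matrix of effects of xi on eta; columns are observations.
    (Hypothetical helper written for this illustration.)
    """
    rng = np.random.default_rng(seed)
    m, n = Gamma.shape
    xi = rng.standard_normal((n, n_obs))              # exogenous constructs, n x N
    zeta = zeta_sd * rng.standard_normal((m, n_obs))  # structural disturbances, m x N
    # Reduced form: (I - B) eta = Gamma xi + zeta, solved without forming the inverse
    eta = np.linalg.solve(np.eye(m) - B, Gamma @ xi + zeta)
    return xi, eta, zeta


# Hypothetical recursive model: xi1, xi2 -> eta1; xi2 -> eta2; eta1 -> eta2
B = np.array([[0.0, 0.0],
              [0.4, 0.0]])
Gamma = np.array([[0.5, 0.3],
                  [0.0, 0.2]])
xi, eta, zeta = simulate_structural_model(B, Gamma, n_obs=1000)
print(np.allclose(eta, B @ eta + Gamma @ xi + zeta))  # True
```

Keeping β lower-triangular with a zero diagonal, as here, corresponds to a recursive model, for which I − β is always invertible.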
III. PROCEDURE TO APPLY PLS SEM
The procedure to apply PLS SEM is illustrated in Figure 2 and consists of six steps:
1. Specify the measurement and structural models.
2. Collect and examine the data.
3. Estimate the PLS parameters.
4. Evaluate the results of the measurement and structural models.
5.
6. Interpret the results and draw conclusions.
Fig. 2. Scheme of the procedure to apply PLS SEM.
A. Specify the measurement and structural models.
The researcher applies theoretical knowledge of the phenomena studied to formulate mathematical expressions for the relationships between the latent variables and for their relationships with their indicators, or observable variables.
B. Collect and examine data.
The data collection and examination stage is very important in applying SEM and can avoid delays, especially when a careful examination rids the data of outliers and identifies missing values. The first step in dealing with outliers is to identify them; standard statistical software packages offer a multitude of tools for this purpose. In general, each PLS SEM software package offers ways to handle missing data; the most common are substituting the mean of the valid values of the affected indicator, or eliminating the cases that contain missing values.
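The screening step can be sketched as follows. This is a minimal illustration, not the routine of any particular PLS package: the helper name, the |z| > 3 default for flagging outliers and the sample data are assumptions. Missing values are coded as `np.nan` and treated with either of the two options mentioned above, mean substitution or casewise deletion; flagged outliers are only reported, leaving the decision to the researcher.

```python
import numpy as np


def screen_indicators(data, z_limit=3.0, strategy="mean"):
    """Flag outlying cases and treat missing values in an indicator matrix.

    data: cases x indicators, with np.nan marking missing values.
    Returns the treated matrix and the indices of cases with |z| > z_limit
    on any indicator (flagged for inspection, not deleted).
    (Hypothetical helper written for this illustration.)
    """
    data = np.asarray(data, dtype=float)
    mean = np.nanmean(data, axis=0)   # mean of the valid values of each indicator
    sd = np.nanstd(data, axis=0)
    z = np.abs((data - mean) / sd)
    # Missing cells give NaN z-scores; treat them as non-outlying.
    outliers = np.flatnonzero(np.any(np.nan_to_num(z) > z_limit, axis=1))
    if strategy == "mean":
        treated = np.where(np.isnan(data), mean, data)  # mean substitution
    elif strategy == "casewise":
        treated = data[~np.isnan(data).any(axis=1)]     # drop incomplete cases
    else:
        raise ValueError("strategy must be 'mean' or 'casewise'")
    return treated, outliers


# Hypothetical data: 6 cases, 2 indicators, one missing value and one extreme case
data = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, np.nan], [10.0, 2.0], [1.05, 2.05]]
treated, outliers = screen_indicators(data, z_limit=2.0)
print(outliers)  # [4]
```

The example uses `z_limit=2.0` because, with n cases, no |z| computed this way can exceed √(n − 1); with only six cases a limit of 3 could never be reached.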
C. Estimate parameters (PLS).
Once specified, the structural and measurement parameters of a PLS SEM model are estimated iteratively by the software (in our case, SmartPLS) using simple and multiple Ordinary Least Squares (OLS) regressions. In summarized form, the sequence of iterations is as follows:
1. In the first PLS iteration, an initial value for η is obtained (by adding the values y1, ..., yq).
2. The regression weights π1, ..., πp are estimated (by regressing the value of η on x1, ..., xp).
3. The estimates π1, ..., πp are combined linearly with x1, ..., xp, giving an initial value for ξ.
4. The loadings λ1, ..., λq are estimated through a series of simple regressions of y1, ..., yq on ξ.
5. The estimated loadings λ1, ..., λq, combined linearly with y1, ..., yq, yield a new estimate of the value of η.
This procedure continues until the difference between consecutive iterations is extremely small, according to the criterion selected by the researcher [4].
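The five steps above can be sketched for the simplest case of one block X (x1, ..., xp) forming ξ and one block Y (y1, ..., yq) forming η. This is a minimal NumPy illustration written for this guide, not SmartPLS's implementation; the helper names, the convergence tolerance and the standardization with population variance are assumptions.

```python
import numpy as np


def standardize(v):
    """Centre and scale columns (or a single vector) to mean 0 and variance 1."""
    return (v - v.mean(axis=0)) / v.std(axis=0)


def two_block_pls(X, Y, tol=1e-7, max_iter=300):
    """Sketch of the iteration: weights pi by multiple regression of eta on X,
    loadings lambda by simple regressions of each y on xi. (Hypothetical helper.)"""
    X, Y = standardize(X), standardize(Y)
    n = len(Y)
    eta = standardize(Y.sum(axis=1))                  # step 1: sum of y1..yq
    for iteration in range(1, max_iter + 1):
        pi, *_ = np.linalg.lstsq(X, eta, rcond=None)  # step 2: regress eta on x1..xp
        xi = standardize(X @ pi)                      # step 3: linear combination -> xi
        lam = Y.T @ xi / n                            # step 4: slope of each y_j on xi (xi has unit variance)
        new_eta = standardize(Y @ lam)                # step 5: new estimate of eta
        converged = np.max(np.abs(new_eta - eta)) < tol
        eta = new_eta
        if converged:                                 # change between iterations below tol
            break
    path = np.mean(xi * eta)  # standardized path xi -> eta (correlation of the two score vectors)
    return pi, lam, path, iteration


# Hypothetical simulated data: 3 indicators per construct, 200 cases
rng = np.random.default_rng(0)
xi_true = rng.standard_normal(200)
X = np.column_stack([xi_true + 0.5 * rng.standard_normal(200) for _ in range(3)])
eta_true = 0.6 * xi_true + 0.8 * rng.standard_normal(200)
Y = np.column_stack([eta_true + 0.5 * rng.standard_normal(200) for _ in range(3)])
pi, lam, path, iterations = two_block_pls(X, Y)
print(iterations, np.round(lam, 3), round(path, 3))
```

Each pass multiplies the current η by the same fixed matrix and rescales it, so the loop behaves like a power iteration and usually settles within a few dozen passes; `tol` plays the role of the convergence criterion chosen by the researcher.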
D. Evaluate the results of the measurement and structural models.
To evaluate the results, the goodness of fit of the models must be verified and validated. No global fit coefficient is available in PLS SEM, and not all measures are appropriate for assessing all types of fit [5]. The validation of the SEM model through the PLS statistical tool requires a series of parameters that are estimated in two stages: the measurement model and the structural model [6], [7].
Validation of the Measurement Model. This is done with respect to the validity and reliability attributes of the model. This implies verifying: i) individual item reliability, ii) internal consistency, iii) convergent validity, and iv) discriminant validity [7], [8].
Item Reliability. The criterion for an item to be included in the composition of the variables is that it must have a loading of at least 0.5 on its construct [9]. In this sense, the individual reliability of an item is assessed by examining its loadings (λ), or simple correlations, with its construct. A more demanding criterion for accepting an indicator is a loading equal to or greater than 0.707, since then λ² ≈ 0.5, that is, the construct explains 50% of the indicator's variance [4].
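The two cut-offs can be applied mechanically to a set of standardized loadings. The sketch below is illustrative only: the function name and the three loadings are hypothetical, and the thresholds are the 0.5 and 0.707 values quoted above.

```python
def screen_loadings(loadings, minimum=0.5, strict=0.707):
    """Classify standardized loadings against the two cut-offs (hypothetical helper)."""
    report = {}
    for item, lam in loadings.items():
        report[item] = {
            "explained_variance": round(lam ** 2, 3),  # share of item variance explained by the construct
            "meets_minimum": lam >= minimum,            # criterion lambda >= 0.5 [9]
            "meets_strict": lam >= strict,              # criterion lambda >= 0.707 [4]
        }
    return report


# Hypothetical loadings for three indicators of one construct
report = screen_loadings({"x1": 0.82, "x2": 0.61, "x3": 0.42})
for item, row in report.items():
    print(item, row)
```

An item such as x2 passes the 0.5 rule but not the stricter 0.707 rule, so whether it is kept depends on which criterion the researcher adopts.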
Internal Consistency (Construct Reliability). The reliability of a construct makes it possible to check the internal consistency of all the indicators measuring the concept; that is, it evaluates how rigorously the observable variables measure the same latent variable (Roldán, 2004). Construct reliability can be verified using composite reliability and Cronbach's alpha. Composite reliability is preferred to Cronbach's alpha as a test of convergent validity in a reflective model, because Cronbach's alpha may overestimate or underestimate the reliability of the scale. For a model intended for exploratory purposes, the composite reliabilities should be equal to or greater than .60 [10], [11]; for confirmatory purposes, equal to or greater than .70 [12]; and values equal to or greater than .80 are considered good for confirmatory research [13]. The composite reliability index (ICC) is given by the following expression:
$$ \mathrm{ICC} = \frac{\left(\sum_{i} \lambda_i\right)^2}{\left(\sum_{i} \lambda_i\right)^2 + \sum_{i} \operatorname{var}(\varepsilon_i)} $$
where λi is the standardized loading of indicator i, εi is the measurement error of indicator i, and var(εi) = 1 − λi² [14].
Convergent Validity. It determines whether the different items intended to measure a concept or construct really measure the same thing; if they do, the fit of these items will be significant and they will be highly correlated [6]. Convergent validity is assessed by means of the measure developed by Fornell and Larcker (1981) called the average variance extracted (AVE) [4]. AVE measures the amount of variance that a construct obtains from its indicators relative to the amount of variance due to measurement error [14]:
$$ \mathrm{AVE} = \frac{\sum_{i} \lambda_i^2}{\sum_{i} \lambda_i^2 + \sum_{i} \operatorname{var}(\varepsilon_i)} $$
This statistic can be interpreted both as a measure of construct reliability and as a measure for evaluating discriminant validity [15]. The average variance extracted is recommended to be greater than 0.50, which establishes that more than 50% of the construct's variance is due to its indicators [14].
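Both indices follow directly from the standardized loadings. The sketch below implements the ICC and AVE expressions with var(εi) = 1 − λi²; the function names and the loadings are hypothetical. Note that under this error definition the AVE denominator equals the number of indicators, so AVE reduces to the mean of the squared loadings.

```python
import numpy as np


def composite_reliability(loadings):
    """ICC = (sum lambda)^2 / ((sum lambda)^2 + sum var(eps)), var(eps) = 1 - lambda^2."""
    lam = np.asarray(loadings, dtype=float)
    error_var = 1.0 - lam ** 2
    return lam.sum() ** 2 / (lam.sum() ** 2 + error_var.sum())


def average_variance_extracted(loadings):
    """AVE = sum lambda^2 / (sum lambda^2 + sum var(eps))."""
    lam = np.asarray(loadings, dtype=float)
    error_var = 1.0 - lam ** 2
    return (lam ** 2).sum() / ((lam ** 2).sum() + error_var.sum())


loadings = [0.8, 0.8, 0.8]  # hypothetical standardized loadings of one construct
print(round(composite_reliability(loadings), 3))       # 0.842
print(round(average_variance_extracted(loadings), 3))  # 0.64
```

With these values the construct would pass both the .80 composite reliability and the 0.50 AVE thresholds.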
Discriminant validity. It indicates to what extent a given construct differs from the others in a research model [6]. Establishing discriminant validity therefore implies that a construct is unique and captures phenomena not represented by other constructs in the model [1]. For a construct to have discriminant validity, its correlations with the other latent variables, which measure different phenomena, must be weak [7].
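Traditionally, researchers have relied on two measures of discriminant validity: the Fornell-Larcker criterion and the cross loadings. Under the Fornell-Larcker criterion, the square root of each construct's AVE should be greater than its correlations with the other constructs [14]. Under the cross-loading criterion, each indicator should load more highly on its own construct than on any other construct, and cross loads must be below .3 (some use .4) [5].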
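Both discriminant validity checks can be expressed in a few lines. In this sketch the function names and the example matrices are hypothetical; the Fornell-Larcker check compares √AVE with each construct's largest absolute correlation with the other constructs, and the cross-loading check applies the .3 limit quoted above (pass `limit=0.4` for the looser rule).

```python
import numpy as np


def fornell_larcker(ave, corr):
    """True for each construct whose sqrt(AVE) exceeds its largest absolute
    correlation with any other construct. (Hypothetical helper.)"""
    root_ave = np.sqrt(np.asarray(ave, dtype=float))
    off_diag = np.abs(np.asarray(corr, dtype=float))
    np.fill_diagonal(off_diag, 0.0)  # ignore each construct's correlation with itself
    return root_ave > off_diag.max(axis=1)


def cross_loadings_ok(loadings, own, limit=0.3):
    """loadings: items x constructs matrix; own[i] is the construct of item i.
    True for items that load highest on their own construct and whose
    loadings on every other construct stay below `limit` (absolute values)."""
    L = np.abs(np.asarray(loadings, dtype=float))
    own = np.asarray(own)
    rows = np.arange(L.shape[0])
    own_load = L[rows, own]
    others = L.copy()
    others[rows, own] = -np.inf  # exclude the item's own construct
    return (own_load > others.max(axis=1)) & (others.max(axis=1) < limit)


# Hypothetical two-construct example
print(fornell_larcker([0.64, 0.55], [[1.0, 0.5], [0.5, 1.0]]))
print(cross_loadings_ok([[0.8, 0.2], [0.75, 0.25], [0.1, 0.7]], own=[0, 0, 1]))
```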
Validation of the Structural Model. In this phase, it must be verified whether the variance of each endogenous variable is explained by the constructs that predict it. The appropriate model fit criteria are summarized in the following aspects:
Predictive relevance
The
To assess the validity of the structural model, changes in R² can also be explored to determine whether the influence of a particular latent variable on a dependent construct has a substantive impact [10]. The effect size f² can be calculated with the following expression:
$$ f^2 = \frac{R^2_{\text{included}} - R^2_{\text{excluded}}}{1 - R^2_{\text{included}}} $$
where R²included and R²excluded represent the R² of the dependent latent variable when the predictor variable is included in or omitted from the structural equation, respectively [10]. f² values of 0.02, 0.15 and 0.35 represent small, medium and large effects, respectively.
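The effect size can be illustrated by fitting the structural equation twice, with and without the predictor of interest. In this minimal sketch, ordinary least squares on simulated construct scores stands in for the PLS estimates; the helper names and the simulated coefficients are assumptions, and the labels use the 0.02, 0.15 and 0.35 thresholds quoted above.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS regression of y on the columns of X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def f_squared(X, y, j):
    """Effect size of predictor column j: (R2_included - R2_excluded) / (1 - R2_included)."""
    r2_inc = r_squared(X, y)
    r2_exc = r_squared(np.delete(X, j, axis=1), y)
    return (r2_inc - r2_exc) / (1.0 - r2_inc)


def effect_label(f2):
    """Thresholds from the text: 0.02 small, 0.15 medium, 0.35 large."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "below small"


# Hypothetical construct scores: three predictors of one endogenous construct
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
y = 0.7 * X[:, 0] + 0.3 * X[:, 1] + 0.05 * X[:, 2] + 0.5 * rng.standard_normal(500)
f2 = [f_squared(X, y, j) for j in range(3)]
print([(round(v, 3), effect_label(v)) for v in f2])
```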
For standardized data, the structural path coefficients generally range from −1 to +1, and they must be significant. Their significance level is determined from the Student t value derived from the resampling, or bootstrapping, process.
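The bootstrap t value can be sketched for the simplest case of a single standardized path x → y, where the path coefficient equals the correlation of the two construct scores. This is only an illustration: the function name, the 1,000 resamples, the simulated data and the 1.96 critical value (two-tailed 5%, large-sample approximation) are assumptions, and PLS software resamples cases and re-estimates the whole model rather than a single correlation.

```python
import numpy as np


def bootstrap_path_t(x, y, n_boot=1000, seed=0):
    """Bootstrap t value for a standardized single-predictor path x -> y (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimate = np.corrcoef(x, y)[0, 1]   # standardized path = correlation in this case
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    se = boot.std(ddof=1)                 # bootstrap standard error
    return estimate, se, estimate / se


# Hypothetical construct scores for 100 cases
rng = np.random.default_rng(3)
x = rng.standard_normal(100)
y = 0.5 * x + 0.85 * rng.standard_normal(100)
estimate, se, t = bootstrap_path_t(x, y)
print(round(estimate, 3), round(se, 3), round(t, 2), "significant" if abs(t) > 1.96 else "not significant")
```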
The predictive relevance check is performed using a procedure called "blindfolding", which yields the Q² coefficients. This procedure omits part of the data when estimating a dependent latent variable from other independent latent variables, and then attempts to estimate the omitted data using the previously estimated parameters. The process is repeated until every omitted data point has been estimated.
The Q²
To calculate Q², an omission distance "D" is chosen that is not a divisor of the sample size. In each blindfolding round, every D-th data point is omitted and must be estimated. Generally in existing PLS software packages, the default distance is between
A good model demonstrates predictive relevance when Q² is greater than zero [10]. Values close to .02 represent a "small" relevance size, .15 a "medium" relevance size and .35 a "high" relevance size [19]. Q² is calculated as:
$$ Q^2 = 1 - \frac{\sum_{D} \mathrm{SSE}_D}{\sum_{D} \mathrm{SSO}_D} $$
Where:
SSE = sum of squares of prediction error
SSO = sum of squares of observations
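The Q² formula can be illustrated with a simplified, case-wise analogue of blindfolding. In round d, the cases whose index i satisfies i mod D = d are omitted, the regression is re-estimated on the remaining cases, and the omitted values are predicted. This is an assumption-laden sketch: real PLS blindfolding omits individual data points of the indicator matrix and re-estimates the full model, and the helper name, D = 7 and the simulated data are hypothetical.

```python
import numpy as np


def blindfold_q2(X, y, D=7):
    """Simplified case-wise analogue of blindfolding: Q^2 = 1 - sum SSE / sum SSO.
    (Hypothetical helper written for this illustration.)"""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n = len(y)
    if n % D == 0:
        raise ValueError("the omission distance D must not be a divisor of n")
    y = (y - y.mean()) / y.std()             # standardized, so SSO is a plain sum of squares
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    sse = sso = 0.0
    for d in range(D):
        omit = np.arange(n) % D == d         # every D-th case, starting at d
        coef, *_ = np.linalg.lstsq(X[~omit], y[~omit], rcond=None)
        sse += np.sum((y[omit] - X[omit] @ coef) ** 2)  # squared prediction errors
        sso += np.sum(y[omit] ** 2)                     # squared observations (mean 0)
    return 1.0 - sse / sso


# Hypothetical construct scores: two predictors, 100 cases (100 is not a multiple of 7)
rng = np.random.default_rng(4)
Xq = rng.standard_normal((100, 2))
yq = 0.8 * Xq[:, 0] + 0.3 * Xq[:, 1] + 0.3 * rng.standard_normal(100)
print(round(blindfold_q2(Xq, yq), 3))
```

A positive result indicates predictive relevance in the sense described above; a value at or below zero means the model predicts the omitted data no better than their mean.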
Multicollinearity is a problem in reflective or formative measurement models, as well as in the structural model, for the same reason that it is in OLS regression models. It is evaluated through the variance inflation factor (VIF) and/or the tolerance, which is equal to 1.0 minus R²; the VIF is the reciprocal of the tolerance.
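Tolerance and VIF come from regressing each predictor on all the others. The sketch below is a minimal NumPy illustration; the function name and the simulated near-duplicate predictor are hypothetical. Its output can be compared with the thresholds used in the case study (R² < 0.8, tolerance > 0.2, VIF < 4.0).

```python
import numpy as np


def collinearity_diagnostics(X):
    """For each column j: R^2 of x_j on the other columns (with intercept),
    tolerance = 1 - R^2 and VIF = 1 / tolerance. (Hypothetical helper.)"""
    X = np.asarray(X, dtype=float)
    results = []
    for j in range(X.shape[1]):
        target = X[:, j]
        A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        centred = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centred @ centred)
        tolerance = 1.0 - r2
        results.append({"r2": r2, "tolerance": tolerance, "vif": 1.0 / tolerance})
    return results


# Hypothetical scores: column 2 is almost a copy of column 0
rng = np.random.default_rng(5)
Z = rng.standard_normal((500, 3))
Z[:, 2] = Z[:, 0] + 0.1 * rng.standard_normal(500)
for j, d in enumerate(collinearity_diagnostics(Z)):
    print(j, {k: round(v, 3) for k, v in d.items()})
```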
E.
On rare occasions, the proposed model is the one that best fits initially in the first run, so it is very common to
F. Interpret results and draw conclusions.
As a last step, the simple regression coefficients between the component scores of ξ and η are analyzed: the magnitude and statistical significance of the relationships between the latent variables that make up the hypotheses are examined in order to determine whether each hypothesis is accepted.
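For directional hypotheses such as H1 to H3 of the case study ("positive and significant effect"), the decision rule can be written explicitly. This sketch is illustrative: the function name, the example values and the 1.96 critical value (two-tailed 5%, large-sample approximation) are assumptions. It also shows that a significant path with the opposite sign does not support a positive hypothesis.

```python
def hypothesis_supported(path, t_value, expected_sign=1, t_critical=1.96):
    """A directional hypothesis is supported only if the path coefficient has the
    expected sign and |t| exceeds t_critical. (Hypothetical helper.)"""
    significant = abs(t_value) > t_critical
    right_sign = path > 0 if expected_sign > 0 else path < 0
    return significant and right_sign


# Hypothetical results for three paths
print(hypothesis_supported(0.45, 3.2))   # positive and significant
print(hypothesis_supported(-0.30, 2.5))  # significant, but negative
print(hypothesis_supported(0.10, 1.2))   # positive, but not significant
```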
IV. Case Study: Evaluation of the Effect of Sustainability Dimensions on Competitiveness in Iron Mining
A. Measurement and structural models.
To study the effect of the dimensions of sustainability on competitiveness in iron mining, the dimensions specified in the model proposed by the ICMM ("International Council on Mining and Metals") and the GRI ("Global Reporting Initiative") were used. This model establishes three dimensions of sustainability: economic, environmental and social. The economic dimension refers to the organization's impacts on the economic conditions of its stakeholders and on economic systems at the local, national and global levels. The environmental dimension refers to the organization's impact on natural systems, including land, air, water and ecosystems; the Environmental Category covers impacts related to energy, water, emissions and waste. The social dimension refers to the impacts that the organization has on the social systems in which it operates [20]. Figure 3 illustrates the indicators and dimensions of the sustainability and competitiveness of iron mining; the variables commonly reported by mining companies in their sustainability reports were selected as indicators.
To define the competitiveness construct of iron mining, the variables used by prominent authors in the area of business competitiveness and mining competitiveness were taken.
Fig. 3. Scheme of indicators of the dimensions of sustainability and competitiveness of iron mining.
Table 1 shows the variables and their units. Figure 4 shows the diagram of the measurement and structural models. The hypotheses were as follows:
- H1: EconomPerf has a positive and significant effect on CompetPerf.
- H2: EnvironmPerf has a positive and significant effect on CompetPerf.
- H3: SocialPerf has a positive and significant effect on CompetPerf.
Table 1. Description of the observable variables
Fig. 4. Diagram of the measurement and structural models
B. Data collected from the indicators.
For data collection, annual data were taken from the reports of eight mining companies and of national and international organizations. The mining companies were Assmang (South Africa), CAP (Chile), IOC (Canada), Kumba (South Africa), LKAB (Sweden), Rio Tinto (Australia), Vale (Brazil) and Ferrexpo (Ukraine). These companies represented approximately 45% of the world market in 2019. Table 2 presents the descriptive statistics of the data collected.
Table 2. Descriptive statistics of the data of the observable variables
C. Estimated PLS parameters of the models.
SmartPLS version 3.2.7 was used for the PLS calculations. Figure 5 presents the model with the loadings of the measurement model, the path coefficients of the structural model and the composite reliability values of the latent variables.
Fig. 5. Models with the estimated parameters: loadings, path coefficients and composite reliability indices.
D. Evaluation of the results
Validity of the Measurement Model. The individual reliability of each item is assessed by examining its loading (λ). As can be seen in Figure 5, all the loadings are greater than 0.5, which satisfies the minimum required loading criterion λ > 0.5. Table 3 shows that all loadings (λ) are statistically significant; therefore, all items are accepted as valid.
Internal Consistency and Convergent Validity. Table 4 shows the composite reliability index and the average variance extracted (AVE) for each latent variable. The measurement model is considered to have internal consistency and convergent validity, since the composite reliabilities are greater than .80 and the AVEs are greater than 0.5.
Table 4. Internal consistency and convergent validity
Discriminant validity. There are different criteria for determining discriminant validity, among them the analysis of the average variance extracted (AVE) and the cross loadings. Table 5 shows the correlation matrix between constructs, where the diagonal shows that the square root of the extracted variance is greater than the shared variance between constructs; therefore, according to the
Table 5. Discriminant validity
Validity of the Structural Model. For the evaluation of the structural model, the collinearity coefficients, the magnitude and statistical significance of the path coefficients, the effect sizes f² and the predictive relevance Q² were verified.
The R² values < 0.8 and tolerance values > 0.2 shown in Table 6 indicate the absence of multicollinearity; this is also corroborated by the VIF values shown in Appendix A2, which satisfy the criterion VIF < 4.0.
Table 6. Reliability and construct validity
Table 7 shows the standardized path coefficients, the t statistics and the corresponding statistical significance; all the coefficients are significant.
Table 7. Path coefficients (standardized regression coefficients)
Table 8 presents the f² values, which measure the change in R² when a given exogenous construct is omitted from the model. EconomPerf has a large effect on CompetPerf, EnvironmPerf has a medium effect on CompetPerf, and SocialPerf has a small effect on CompetPerf.
Table 8. F square
Table 9 presents the Q² values. According to the criterion of Cohen (1988), the model has a high degree of predictive relevance with respect to the endogenous CompetPerf and SocialPerf factors, and a medium degree of predictive relevance for EconomPerf.
Table 9. Q squared
E. Interpret results and draw conclusions.
The results of the path coefficients in Table 7 indicate that EconomPerf
Table 10. Summary of results
The results obtained reveal that the economic and environmental sustainability dimensions of iron mining have a negative influence on competitiveness. This result, far from being a conflict of interest between latent variables, reflects the effect of the observable variables. In the case of the environmental dimension, CO2 emissions, the waste generated, water use and energy use have a negative effect on competitiveness. In the case of the economic dimension, the operating costs of acquiring goods and services locally, which generate indirect jobs for the mining activity, and community investment benefit the community when they increase, but in turn reduce profitability because they are expenditures.
It is then a reality, in which the mining companies must give responsible treatment to the
V. CONCLUSIONS
The following conclusions emerge from the research carried out:
1. It has been shown that PLS SEM is a technique that facilitates the development of research models from theoretical concepts and latent variables with a limited number of observations.
2. Through the present multidisciplinary case, which combines mining, industrial and environmental engineering with statistical science and is applied to the mining industry using indicators prepared from objective data of the observed reality, we can affirm that PLS SEM is an excellent support tool for research in the engineering sciences.
3. The ability of PLS SEM to model the relationships between latent variables flexibly, without being subject to rigorous parametric assumptions, suggests many future applications of this recent technique in the engineering sciences.
4. Finally, as future research work, PLS SEM could be applied to the study of important aspects of industry such as productivity, efficiency, innovation, quality, corporate social responsibility, operation of industrial plants, organizational climate, ergonomics and industrial safety, among others.
A2. VIF values of the structural model
REFERENCES
[1] J. Hair, G. Hult, C. Ringle and M. Sarstedt. A Primer on Partial Least Squares Structural Equation Modeling
[2] H. Wold. Model Construction and Evaluation when Theoretical Knowledge Is Scarce: An Example of the Use of Partial Least Squares. Genève: Faculté des Sciences Économiques et Sociales, Université de Genève, 1979.
[3] J. Henseler, G. Hubona and P. Ray. "Using PLS path modeling in new technology research: updated guidelines". Industrial Management & Data Systems, 116(1),
[4] G. Cepeda and J. Roldán. "Aplicando en la Práctica la Técnica PLS en la Administración de Empresas". Congreso de la ACEDE, Murcia, España, 2004.
[5] D. Garson. Partial Least Squares: Regression and Structural Equation Models. USA: Statistical Associates Publishing, 2016.
[6] D. Barclay, C. Higgins and R. Thompson. "The Partial Least Squares (PLS) Approach to Causal Modeling: Personal Computer Adoption and Use as an Illustration". Technology Studies, Special Issue on Research Methodology, (2:2), pp.
[7] J. Medina, N. Pedraza and M. Guerrero. "Modelado de Ecuaciones Estructurales. Un Enfoque de Partial Least Square Aplicado en las Ciencias Sociales y Administrativas". XIV Congreso Internacional de la Academia de Ciencias Administrativas A.C. (ACACIA). EGADE – ITESM. Monterrey, México, 2010.
[8] J. Medina and J. Chaparro. "The Impact of the Human Element in the Information Systems Quality for Decision Making and User Satisfaction". Journal of Computer Information Systems, (48:2), pp.
[9] D. Leidner, S. Carlsson, J. Elam and M. Corrales. "Mexican and Swedish Managers' Perceptions of the Impact of EIS on Organizational Intelligence, Decision Making, and Structure". Decision Sciences, (30:3), pp.
[10] W. Chin. "The partial least squares approach for structural equation modeling". Chapter Ten, pp.
[11] M. Höck and C. M. Ringle. "Strategic networks in the software industry: An empirical analysis of the value continuum". IFSAM VIIIth World Congress, Berlin, 2006.
[12] J. Henseler, C. Ringle and M. Sarstedt. Handbook of partial least squares: Concepts, methods and applications in marketing and related fields. Berlin: Springer, 2012.
[13] S. Daskalakis and J. Mantas. "Evaluating the impact of a
[14] C. Fornell and D. Larcker. "Evaluating Structural Equation Models with Unobservable Variables and Measurement Error". Journal of Marketing Research, vol. 18, pp.
[15] C. Fornell. A Second Generation of Multivariate Analysis: An Overview. Vol. 1. New York, U.S.A.: Praeger Publishers, 1982.
[16] R. Falk and N. Miller. A Primer for Soft Modeling. Ohio: The University of Akron, 1992.
[17] M. Martínez. Aplicación de la técnica
[18] S. Geisser. "A predictive approach to the random effects model". Biometrika, vol. 61(1), pp.
CURRICULUM SUMMARY
Juan E. Villalva. Doctor of Engineering Sciences, MSc in Electronic Engineering, Specialist in Operations and Production, Specialist in Automation, and Electrical Engineer. Researcher and teacher with field experience in the mining and metal processing industries.