Structural equation models-PLS in engineering sciences: a brief guide for researchers through a case applied to the industry

Structural equation modeling is a second-generation statistical data analysis technique that has become one of the methodological options most used by researchers in various fields of science. The best-known method is the covariance-based approach, but it presents some limitations for its application in certain cases. An alternative method is based on the variance structure, through partial least squares analysis, which is an appropriate option when the research involves latent variables (for example, composite indicators) prepared by the researcher, and where it is necessary to explain and predict complex models. This article presents a brief summary of the structural equation modeling technique, with an example on the relationship between the constructs sustainability and competitiveness in iron mining, and is intended to be a brief guide for future researchers in the engineering sciences.

y is the vector of p observable variables (p×1). Λy is the matrix of coefficients that show the relationships between the latent and observed variables (p×m), also called the loading matrix (λ). ε is the error vector (p×1). The second equation of the measurement model is the one that governs the relationships between the exogenous latent variables and their observable variables:

x = Λx ξ + δ   (2)

Where: x is the vector of q observable variables (q×1); Λx is the matrix of coefficients that show the relationships between the latent and observed variables (q×n), also called the matrix of weights (π); δ is the error vector (q×1). The structural model is defined by the equation:

η = Bη + Γξ + ζ   (3)

Where: η represents the vector of endogenous latent random variables, of dimension m×1; ξ represents the vector of exogenous latent random variables, of dimension n×1; B represents the matrix of coefficients that govern the relationships among the endogenous variables (m×m); Γ represents the matrix of coefficients that govern the relationships between the exogenous variables and each of the endogenous ones, in other words, the effects of ξ on η, of dimension m×n; ζ represents the vector of disturbances or errors (m×1).

III. PROCEDURE TO APPLY PLS SEM
The procedure to apply PLS SEM is illustrated in Figure 2; there are six steps:
1. Specify the measurement and structural models.
2. Collect and examine the data.
3. Estimate the PLS parameters.
4. Evaluate the results of the measurement and structural models.
5. Re-specify the model.
6. Interpret the results and draw conclusions.

Fig. 2. Scheme of the procedure to apply the PLS SEM.
A. Specify the measurement and structural models.
The researcher applies theoretical knowledge of the phenomena under study to formulate mathematical expressions for the relationships between the latent variables, and for the relationships between the latent variables and their indicators or observable variables.

B. Collect and examine data.
The data collection and examination stage is very important in the SEM application and can avoid delays, especially when a careful examination of the data removes outliers and identifies missing values. The first step in dealing with outliers is to identify them; standard statistical software packages offer a multitude of tools for this purpose. In general, each PLS SEM software offers ways to handle missing data; the most common are substituting the mean of the valid values of that indicator or eliminating the cases that include missing values.

C. Estimate parameters (PLS).
Once specified, the structural and measurement parameters of a PLS SEM model are estimated by the software (in our case SmartPLS) iteratively, using simple and multiple Ordinary Least Squares (OLS) regressions. In summary, the sequence is as follows: in the first iteration, an initial value for each latent variable score is obtained by adding the values of its indicators y1, ..., yq; the inner and outer weights are then alternately re-estimated, and this procedure continues until the difference between consecutive iterations is extremely small, according to the criterion selected by the researcher [4].
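The iterative sequence described above can be sketched for the simplest possible case: two reflective blocks with synthetic data, Mode A outer estimation and a centroid inner scheme. This is an illustrative toy under those stated assumptions, not the full algorithm implemented by SmartPLS:

```python
# Minimal sketch of PLS iterative estimation for two reflective blocks
# (Mode A, centroid inner scheme). Synthetic data; illustrative only.
import numpy as np

def standardize(a):
    return (a - a.mean(axis=0)) / a.std(axis=0)

rng = np.random.default_rng(0)
n = 200
# Synthetic data: one common latent drives both indicator blocks
latent = rng.normal(size=(n, 1))
X = standardize(latent @ np.ones((1, 3)) + 0.5 * rng.normal(size=(n, 3)))  # block 1
Y = standardize(latent @ np.ones((1, 3)) + 0.5 * rng.normal(size=(n, 3)))  # block 2

# Step 1: arbitrary initial outer weights (here, simple sums of indicators)
w_x = np.ones(3)
w_y = np.ones(3)
for _ in range(100):
    # Outer estimation: latent scores as standardized weighted sums
    lx = standardize(X @ w_x)
    ly = standardize(Y @ w_y)
    # Inner estimation (centroid scheme): sign-weighted neighbor score
    s = np.sign(np.corrcoef(lx, ly)[0, 1])
    zx, zy = s * ly, s * lx
    # Outer update (Mode A): new weights = indicator-proxy covariances
    w_x_new = X.T @ zx / n
    w_y_new = Y.T @ zy / n
    done = max(np.max(np.abs(w_x_new - w_x)),
               np.max(np.abs(w_y_new - w_y))) < 1e-8
    w_x, w_y = w_x_new, w_y_new
    if done:
        break

# Final scores and path coefficient (OLS between standardized scores)
lx = standardize(X @ w_x)
ly = standardize(Y @ w_y)
path = np.corrcoef(lx, ly)[0, 1]
```

Because both blocks share the same underlying latent variable, the estimated path coefficient between the two scores comes out strongly positive.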
D. Evaluate the results of the measurement and structural models.
To evaluate the results, it is necessary to verify and validate the goodness of fit of the models. There is no global fit coefficient available in PLS SEM and not all measures are appropriate to assess all types of fit [5]. The validation of the SEM model through the PLS statistical tool requires a series of parameters that are estimated in two stages: the measurement model and the structural model [6], [7].
Validation of the Measurement Model. This is done with respect to the validity and reliability attributes of the model. This implies verifying: i) individual item reliability, ii) internal consistency, iii) convergent validity, and iv) discriminant validity [7], [8].
Item Reliability. The criterion for an item to be retained in the composition of the variables is that it must load at least 0.5 on the factor [9]. In this sense, the individual reliability of the item is assessed by examining the loads (λ) or simple correlations. Another, more demanding criterion to accept an indicator is that it has a load equal to or greater than 0.707 (λ² ≈ 0.5, so 50% of the variance is explained) [4].
Internal Consistency (Construct Reliability). The reliability of a construct makes it possible to check the internal consistency of all the indicators when measuring the concept; that is, it evaluates how rigorously the observable variables are measuring the same latent variable (Roldán, 2004). Construct reliability can be verified using composite reliability and Cronbach's alpha. Composite reliability is the preferred alternative to Cronbach's alpha as a test of convergent validity in a reflective model, since Cronbach's alpha may overestimate or underestimate the reliability of the scale. In a model suitable for exploratory purposes, the composite reliabilities should be equal to or greater than .6 [10], [11]; equal to or greater than .70 for a model suitable for confirmatory purposes [12]; and equal to or greater than .80 is considered good for confirmatory research [13]. The composite reliability index is given by the following mathematical expression:

ρc = (Σλi)² / [(Σλi)² + Σ var(εi)]   (4)

where λi = standardized load of indicator i, εi = measurement error of indicator i, and var(εi) = 1 − λi² [14].

Convergent Validation. It determines whether the different items destined to measure a concept or construct really measure the same thing; if so, the adjustment of these items will be significant and they will be highly correlated [6]. The assessment of convergent validity is carried out by means of the measure developed by Fornell and Larcker (1981) called the Average Variance Extracted (AVE) [4]. AVE measures the amount of variance that a construct obtains from its indicators relative to the amount of variance due to measurement error, its formula being the following:

AVE = Σλi² / (Σλi² + Σ var(εi))   (5)

where λi = standardized load of indicator i, εi = measurement error of indicator i, and var(εi) = 1 − λi² [14].
This statistic can be interpreted as a measure of construct reliability and as a measure of the evaluation of discriminant validity [15]. The mean extracted variance is recommended to be greater than 0.50, which establishes that more than 50% of the variance of the construct is due to its indicators [14].
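As a sketch, equations (4) and (5) translate directly into code; the loadings used below are hypothetical illustration values, not those of the article's model:

```python
# Composite reliability (eq. 4) and AVE (eq. 5) from standardized loadings
# of one reflective construct. Loadings are hypothetical examples.
import numpy as np

def composite_reliability(loadings):
    lam = np.asarray(loadings)
    err_var = 1.0 - lam**2            # var(eps_i) = 1 - lambda_i^2
    return lam.sum()**2 / (lam.sum()**2 + err_var.sum())

def ave(loadings):
    lam = np.asarray(loadings)
    err_var = 1.0 - lam**2
    return (lam**2).sum() / ((lam**2).sum() + err_var.sum())

lams = [0.82, 0.75, 0.88]             # hypothetical standardized loadings
cr = composite_reliability(lams)      # should exceed the .70 cutoff
v = ave(lams)                         # should exceed the .50 cutoff
```

Note that with standardized loadings the AVE denominator reduces to the number of indicators, so AVE is simply the mean of the squared loadings.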

Discriminant validity.
It indicates to what extent a given construct is different from others in a research model [6]. Therefore, establishing discriminant validity implies that a construct is unique and captures phenomena not represented by other constructs in the model [1]. For there to be discriminant validity in a construct, there must be weak correlations between it and other latent variables that measure different phenomena [7].
Traditionally, researchers have relied on two measures of discriminant validity: the Fornell-Larcker criterion and cross loadings. According to the Fornell-Larcker criterion, for any latent variable, the square root of the AVE must be greater than its correlation with any other latent variable. In a good model, the indicators load well on their expected factors, and the cross loadings on other factors that they should not measure should be low; as a general rule, the expected loads should be greater than .7, and cross loads must be below .3 (some use .4) [5].
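The Fornell-Larcker check can be sketched as follows; the AVEs and the inter-construct correlation matrix are illustrative values, not the article's results:

```python
# Fornell-Larcker criterion: sqrt(AVE) of each construct must exceed its
# correlations with every other construct. Values are hypothetical.
import numpy as np

ave = np.array([0.67, 0.58, 0.71])        # hypothetical AVEs per construct
corr = np.array([[1.00, 0.41, 0.35],      # hypothetical latent-variable
                 [0.41, 1.00, 0.48],      # correlation matrix
                 [0.35, 0.48, 1.00]])

sqrt_ave = np.sqrt(ave)
ok = True
for i in range(len(ave)):
    others = np.delete(corr[i], i)        # correlations with other constructs
    if not np.all(sqrt_ave[i] > np.abs(others)):
        ok = False
# ok == True: discriminant validity holds under the Fornell-Larcker criterion
```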
Validation of the Structural Model. In this phase, it must be verified whether the amount of variance of the endogenous variable is explained by the constructs that predict it. The appropriate model fit criteria are summarized in the following aspects: i) the R-square; ii) the R-squared change and the f-squared effect of exogenous factors; iii) the structural path coefficients; iv) the predictive relevance (Q-square); and v) multicollinearity. The R-square, also called the coefficient of determination, is the measure of the overall effect size for the structural model; it indicates the % of the variance in the endogenous variable that is explained by the model. The explained variance of the endogenous variables (R²) should be greater than or equal to 0.1 [16].
To assess the validity of the structural model, changes in R² can also be explored to determine whether the influence of a particular latent variable on a dependent construct has a substantive impact [10]. The importance of the effect f² can be calculated with the following expression:

f² = (R²included − R²excluded) / (1 − R²included)   (6)

where R²included and R²excluded represent the R² of the dependent latent variable when the predictor variable is included in or omitted from the structural equation, respectively [10]. f² levels of 0.02, 0.15 and 0.35 represent a small, medium or large effect, respectively.
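A minimal sketch of equation (6), with hypothetical R² values rather than those of the article:

```python
# f-squared effect size (eq. 6) for one exogenous construct.
def f_squared(r2_included, r2_excluded):
    return (r2_included - r2_excluded) / (1.0 - r2_included)

# Hypothetical R² with and without the predictor in the structural equation
f2 = f_squared(0.62, 0.55)
# Thresholds: 0.02 small, 0.15 medium, 0.35 large effect
```

With these values f² falls in the medium-effect band.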
The structural path coefficients (standardized regression weights) vary from −1 to +1 for standardized data. These coefficients must be significant. The significance level is determined from the Student t value derived from the resampling or bootstrapping process, which is a non-parametric technique (no distributional parameters are assumed; it is tested whether the paths between variables are plausible) [17].
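The bootstrap idea can be sketched for a single path between two already-computed latent variable scores; this uses synthetic data and, unlike SmartPLS, does not re-estimate the full model in each resample:

```python
# Bootstrap significance sketch for one path coefficient between two
# standardized latent-variable scores. Synthetic data; simplified.
import numpy as np

rng = np.random.default_rng(1)
n = 150
exog = rng.normal(size=n)                       # exogenous score
endog = 0.5 * exog + rng.normal(scale=0.8, size=n)

def path_coef(x, y):
    # OLS slope on standardized variables equals the Pearson correlation
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    return np.mean(xs * ys)

estimate = path_coef(exog, endog)
boot = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)            # resample cases w/ replacement
    boot.append(path_coef(exog[idx], endog[idx]))
t_value = estimate / np.std(boot)               # bootstrap standard error
# |t| > 1.96 indicates significance at the 5% level (two-tailed)
```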
The predictive relevance check is performed using a procedure called "blindfolding", to determine the Q² coefficients. This procedure omits part of the data when estimating a dependent latent variable from other independent latent variables, and then attempts to estimate the omitted data using the previously estimated parameters. The process is repeated until every omitted data point has been estimated.
Q² (the Stone-Geisser cross-validated redundancy measure) predicts the indicators of the endogenous reflective measurement models and constructs (Q² does not apply to endogenous formative constructs). This criterion refers to the fact that the model must have the ability to predict the reflective indicators of the endogenous latent variables [18].
To calculate Q², an omission distance D is chosen that is not a divisor of the sample size; every D-th case in the sample is omitted and must be estimated. Generally, in existing PLS software packages, the default omission distance is between 5 and 10.
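A simplified blindfolding round can be sketched as follows. This omits every D-th value of one endogenous indicator and predicts it from one exogenous score via OLS fitted on the remaining cases; real blindfolding operates inside the full PLS model, and the data here are synthetic:

```python
# Simplified blindfolding sketch: Q² = 1 - SSE/SSO accumulated over the
# D omission rounds. Synthetic one-predictor example.
import numpy as np

rng = np.random.default_rng(2)
n, D = 120, 7                      # D = 7 is not a divisor of n = 120
x = rng.normal(size=n)
y = 1.2 * x + rng.normal(scale=0.5, size=n)

sse = sso = 0.0
for g in range(D):                 # one round per omission group
    omit = np.arange(n) % D == g
    keep = ~omit
    # OLS fit on the kept cases only
    b1 = np.cov(x[keep], y[keep])[0, 1] / np.var(x[keep], ddof=1)
    b0 = y[keep].mean() - b1 * x[keep].mean()
    pred = b0 + b1 * x[omit]
    sse += np.sum((y[omit] - pred) ** 2)       # squared prediction errors
    sso += np.sum((y[omit] - y[keep].mean()) ** 2)  # trivial-prediction errors

q2 = 1.0 - sse / sso               # Q² > 0 indicates predictive relevance
```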
A good model demonstrates predictive relevance when Q² is greater than zero [10]. A value close to .02 represents a "small" relevance size, .15 a "medium" relevance size and .35 a "high" relevance size [19]. The Stone-Geisser Q² measure is computed as Q² = 1 − (Σ ED)/(Σ OD), where, summing over the omission rounds D, ED is the sum of squared prediction errors and OD the sum of squared deviations of the omitted data from their mean.

Multicollinearity is a problem in the reflective or formative measurement models, as well as in the structural model, for the same reason that it is in OLS regression models. Multicollinearity is evaluated through the variance inflation factor (VIF) coefficients and/or the tolerance, which is equal to 1.0 minus R². In a well-fitted model, the structural VIF coefficients should not exceed 4.0 (some use the more lenient criterion of 5.0), and tolerance < .20 indicates possible multicollinearity [1]. This is equivalent to saying that R² > .80 suggests a possible multicollinearity problem [5].
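The VIF and tolerance checks can be sketched directly from their definitions; the predictor matrix is synthetic, with one deliberately collinear column to show a high VIF:

```python
# VIF and tolerance per predictor: regress each predictor on the others,
# then VIF = 1/(1 - R²) and tolerance = 1 - R². Synthetic data.
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = rng.normal(size=(n, 3))
X[:, 2] = 0.95 * X[:, 0] + 0.1 * rng.normal(size=n)   # near-collinear column

def vif(X, j):
    y = X[:, j]
    Z = np.delete(X, j, axis=1)
    Z1 = np.column_stack([np.ones(len(Z)), Z])        # add intercept
    beta, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    resid = y - Z1 @ beta
    r2 = 1.0 - resid.var() / y.var()                  # R² of j on the rest
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
tols = [1.0 / v for v in vifs]                        # tolerance = 1 - R²
# vifs[2] far exceeds the 4.0 cutoff; vifs[1] stays near 1
```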
E. Re-specify the model. The initially proposed model is rarely the best-fitting one in the first run, so it is very common to re-specify it, which consists of adding or removing parameters from the model. These modifications must respond to theoretical justifications, not merely to desirable empirical results.

F. Interpret results and draw conclusions.
As a last step, the simple regression coefficients between the scores of the components of ξ and η are analyzed; the results and statistical significance of the relationships between the latent variables that make up the hypotheses are examined, in order to check whether the hypotheses were accepted or not in the study.

A. Measurement and structural models.
To study the effect of the dimensions of sustainability on competitiveness in iron mining, the dimensions specified in the model proposed by the ICMM ("International Council on Mining and Metals") and the GRI ("Global Reporting Initiative") were adopted. This model establishes three dimensions of sustainability: economic, environmental and social. The economic dimension refers to the impacts of the organization on the economic conditions of its stakeholders and on economic systems at the local, national and global levels. The environmental dimension refers to the impact of the organization on natural systems, including land, air, water and ecosystems; the environmental category covers impacts related to energy, water, emissions and waste. The social dimension refers to the impacts that the organization has on the social systems in which it operates [20]. In Figure 3, the indicators and dimensions of the sustainability and competitiveness of iron mining are illustrated; the variables commonly reported by mining companies in their sustainability reports were selected as indicators.
To define the competitiveness construct of iron mining, the variables used by prominent authors in the area of business competitiveness and mining competitiveness were taken. Table 1 shows the variables and their units. Figure 4 shows the diagram of the measurement and structural models. The hypotheses were as follows:
- H1: EconomPerf has a positive and significant effect on CompetPerf.
- H2: EnvironmPerf has a positive and significant effect on CompetPerf.
- H3: SocialPerf has a positive and significant effect on CompetPerf.
In Figure 5, the model is presented with the values of the loads of the measurement model, the path coefficients of the structural model and the composite reliability values of the latent variables.

D. Evaluation of the results
Validity of the Measurement Model. The individual reliability of each of the items is assessed by examining the loads (λ). As can be seen in Figure 5, all the loads are greater than 0.5, which satisfies the criterion of the minimum required load value (λ > 0.5). Table 3 shows the statistical significance of all the loads (λ); it is observed that they are all significant, therefore all items are accepted as valid.
Internal Consistency and Convergent Validity. Table 4 shows the results of the composite reliability index and the average variance extracted (AVE) for each latent variable. The measurement model is considered to have internal consistency and convergent validity, since the composite reliabilities are greater than .80 and the AVEs are greater than 0.5.

Table 3. Statistical significance of the loads of the observable variables
Table 4. Internal consistency and convergent validity

Discriminant validity. There are different criteria for determining discriminant validity, among which are the analysis of the average variance extracted (AVE) and the cross loadings. Table 5 shows the correlation matrix between constructs, where the diagonal shows that the square root of the extracted variance is greater than the shared variance between constructs; therefore, according to the Fornell-Larcker criterion, we can affirm that there is discriminant validity. In Appendix A1, the cross-loading matrix is presented, which also confirms the discriminant validity of the measurement model.

Table 5. Discriminant validity (Fornell-Larcker criterion).
Validity of the Structural Model. For the evaluation of the structural model, the collinearity coefficients, the magnitude and statistical significance of the path coefficients, the effect sizes f² and the predictive relevance Q² were verified. The R² values < 0.8 and tolerance > 0.2, shown in Table 6, indicate the absence of multicollinearity; this is also corroborated by the VIF values shown in Appendix A2, which satisfy the criterion VIF < 4.0. Table 7 shows the standardized path coefficients, the t statistics and the corresponding statistical significance; it is observed that the coefficients are significant.

Table 7. Path coefficients (standardized regression coefficients)
In Table 8, the values of f² are presented, which measure the change in R² when a certain exogenous construct is omitted from the model. As can be seen, EconomPerf has a large effect on CompetPerf, EnvironmPerf has a medium effect on CompetPerf, and SocialPerf has a small effect on CompetPerf. In Table 9, the Q² values are presented. According to the criterion of Cohen (1988), we can affirm that the model has a high degree of predictive relevance with respect to the endogenous CompetPerf and SocialPerf factors; in the case of EconomPerf, it presents a medium degree of predictive relevance. The results in Table 7 indicate that the paths EconomPerf -> CompetPerf and EnvironmPerf -> CompetPerf, even though they are statistically significant, have signs contrary to those postulated in hypotheses H1 and H2, which indicates that these hypotheses are rejected. On the other hand, the path SocialPerf -> CompetPerf, corresponding to hypothesis H3, has statistical significance and a positive sign; therefore, this hypothesis is accepted (see Table 10).

Table 10. Summary of results
The results obtained reveal that the economic and environmental sustainability dimensions of iron mining have a negative influence on competitiveness. This result, far from being a conflict of interest between latent variables, represents the effect of the observable variables. In the case of the environmental dimension, CO2 emissions, the waste generated, water use and energy use result in a negative effect on competitiveness. In the case of the economic dimension, the operating costs involved in acquiring goods and services in the localities, the generation of indirect jobs from the mining activity, and the community investment, if increased, benefit the community but in turn affect profitability, since they are expenditures.
It is therefore a reality in which mining companies must give responsible treatment to the socio-environmental systems affected by their operations, and where part of the income and benefits generated will necessarily have to be allocated to their remediation.

V. CONCLUSIONS
The following conclusions emerge from the research carried out:
1. It has been shown that PLS SEM is a technique that facilitates the development of research models from theoretical concepts and latent variables, with a limited number of observations.
2. In the present case, of a multidisciplinary nature spanning mining, industrial and environmental engineering and statistical science, applied to the mining industry, where indicators prepared from objective data of the observed reality were used, we can affirm that the PLS SEM technique constitutes an excellent support tool for research in the field of engineering sciences.
3. The ability of PLS SEM to model the relationships between latent variables in a flexible way, without being subject to rigorous parametric assumptions, allows us to foresee many applications of this recent technique in the field of engineering sciences.
4. Finally, as future research work, there are possible applications of PLS SEM to the study of important aspects of industry such as productivity, efficiency, innovation, quality, corporate social responsibility, the operation of industrial plants, organizational climate, ergonomics and industrial safety, among others.

H1: EconomPerf has a positive and significant effect on CompetPerf. Rejected.
H2: EnvironmPerf has a positive and significant effect on CompetPerf. Rejected.
H3: SocialPerf has a positive and significant effect on CompetPerf. Accepted.