Statistical methods contribute to scientific research in two ways: they propose statistical models that formalize observed phenomena mathematically, and they provide mathematical tools for assessing hypotheses, so that decisions can be made on the basis of experimental observations. The latter are random by nature, since they are produced by processes that do not yield exact measurements but measurements subject to experimental error. Statistical models take these experimental errors into account, and statistical methods aim to estimate their size so that hypotheses can then be assessed while keeping the size of these errors under control.
Hence, a statistical model contains a so-called systematic part, which represents the expected output of the process under investigation (the one that would be obtained in the absence of experimental error), and a random part, which can be controlled only through the specification of a probability distribution. For example, in chip manufacturing, quality control (the process) would involve a binary measurement of defects (say 0 for no defect and 1 for a defect) on a given sample of chips. A statistical model for this process would contain a random part, governed by the probability of a defect, and a systematic part linking this probability to manufacturing conditions such as temperature, humidity, etc., called explanatory variables. A suitable probability distribution for this example is the Bernoulli distribution, and the explanatory variables enter the model in either a linear or a nonlinear fashion. In any case, parameters (unknown quantities) are needed to formalize the relationships between the different variables; in the linear case with one explanatory variable, one would need one parameter for the intercept and another for the slope.
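The chip example can be made concrete with a minimal sketch. It assumes one standardized explanatory variable and a logistic link between the linear predictor (intercept plus slope times the variable) and the probability of a faulty chip; the logistic link, the simulated data, the "true" parameter values and all function names are illustrative assumptions, not part of the text above. The two parameters are estimated by plain gradient ascent on the Bernoulli log-likelihood.

```python
import math
import random


def fault_probability(x, intercept, slope):
    """Systematic part: logistic link from the explanatory variable to a probability."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def mean_log_likelihood(xs, ys, intercept, slope):
    """Average Bernoulli log-likelihood of the binary outcomes ys given xs."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = fault_probability(x, intercept, slope)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def fit(xs, ys, step=0.1, iterations=500):
    """Estimate intercept and slope by gradient ascent on the mean log-likelihood."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad_intercept = 0.0
        grad_slope = 0.0
        for x, y in zip(xs, ys):
            residual = y - fault_probability(x, intercept, slope)
            grad_intercept += residual
            grad_slope += residual * x
        intercept += step * grad_intercept / n
        slope += step * grad_slope / n
    return intercept, slope


# Simulated sample (hypothetical values): x is a standardized manufacturing
# condition, y is 1 for a faulty chip and 0 otherwise (random part: Bernoulli).
rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(200)]
ys = [1 if rng.random() < fault_probability(x, -1.0, 1.5) else 0 for x in xs]

intercept_hat, slope_hat = fit(xs, ys)
print("estimated intercept:", round(intercept_hat, 3))
print("estimated slope:", round(slope_hat, 3))
```

The estimated model, with the parameters replaced by `intercept_hat` and `slope_hat`, can then be used to predict the probability of a faulty chip for a new value of the explanatory variable through `fault_probability`.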
Then, given a statistical model including variables and parameters, and given a sample of observations for all or most of the variables, statistical methods make it possible to estimate the parameters of the proposed model, check the validity of the model, test the parameters of the model, etc. The estimated models, i.e. the models with the parameters replaced by their estimated values, are then useful tools not only for understanding the phenomenon under investigation, but also for making predictions given new values of the variables, or even for simplifying complex variable structures.
Latent variable models are a class of complex statistical models that are widely used in the social and economic sciences. They formalize the relationships between (observed) manifest variables by means of a smaller number of (unobserved) latent variables. They are useful in situations where the aim is to measure a process that is not directly observable, but only observable through proxy variables. In educational testing, for example, a quantity of interest is the ability of a student, which can only be measured through the scores obtained by the student on different tests. In marketing, the consumer's appreciation of a product can only be measured through quantities that describe the product, such as sweetness, freshness, smoothness, etc. Given such measurements, latent variable models are able to reconstruct the latent variables (e.g. ability or appreciation) from the observed ones.
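The educational testing example can be illustrated with a small simulation. This is only a sketch under strong assumptions: a single latent ability per student, five test scores that each equal the ability plus independent Gaussian noise, and a plain average of the scores as a crude stand-in for the reconstruction that a fitted latent variable model would provide. All names and numerical values are hypothetical.

```python
import math
import random


def pearson_correlation(a, b):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(1)
n_students = 500
n_tests = 5

# Latent variable: the ability of each student, never observed in practice.
abilities = [rng.gauss(0.0, 1.0) for _ in range(n_students)]

# Manifest variables: test scores, each a noisy measurement of the ability.
scores = [
    [ability + rng.gauss(0.0, 1.0) for _ in range(n_tests)]
    for ability in abilities
]

# Crude reconstruction of the latent variable: the average score per student.
reconstructed = [sum(row) / n_tests for row in scores]

print("correlation with true ability:",
      round(pearson_correlation(abilities, reconstructed), 3))
```

In a real application the abilities are unknown, so the comparison in the last line is only possible because the data are simulated; a latent variable model would in addition estimate how strongly each test relates to the ability instead of weighting all tests equally.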
The power of latent variable models to unveil unobserved quantities comes at the cost of computationally very intensive estimation methods. This is due, among other things, to the objective function that must be optimized to obtain the parameter estimates: it is of large dimension (large number of parameters), highly nonlinear, and involves integrals that cannot be solved explicitly. Moreover, for a proper statistical analysis, the parameters should be tested for significance, in other words for whether their inclusion in the model is appropriate. This can be done using so-called resampling techniques (such as the bootstrap), which in turn require the estimation of several models, possibly in parallel. Finally, it is also important to check how well the model fits the data at hand. This process also involves the estimation of several competing models.
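The resampling idea can be sketched as follows. To keep the example short, the latent variable model is replaced here by a simple linear regression whose slope has a closed-form estimate; the nonparametric bootstrap, the percentile interval, the simulated data and all names are illustrative assumptions. The point is that the model is re-estimated once per resampled data set, which is what makes the procedure costly when a single estimation is already expensive.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (one 'model estimation')."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, replicates, rng):
    """Re-estimate the slope on data sets resampled with replacement."""
    n = len(xs)
    slopes = []
    for _ in range(replicates):
        indices = [rng.randrange(n) for _ in range(n)]
        resampled_x = [xs[i] for i in indices]
        resampled_y = [ys[i] for i in indices]
        slopes.append(ols_slope(resampled_x, resampled_y))
    return slopes


def percentile_interval(values, level=0.95):
    """Percentile interval: empirical quantiles of the bootstrap estimates."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lower, upper


# Simulated sample (hypothetical values) with a nonzero slope.
rng = random.Random(2)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [2.0 + 0.8 * x + rng.gauss(0.0, 1.0) for x in xs]

slopes = bootstrap_slopes(xs, ys, replicates=1000, rng=rng)
lower, upper = percentile_interval(slopes)
print("slope estimate:", round(ols_slope(xs, ys), 3))
print("95% bootstrap interval:", round(lower, 3), round(upper, 3))
```

An interval that excludes zero suggests that the parameter is significant. The replicates are independent of one another, so the loop in `bootstrap_slopes` could be distributed over several workers; it is kept serial here for simplicity.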