gsa-module Package¶
gsa-module is a Python3 package implementing several global sensitivity
analysis methods for computer/simulation experiments.
The implementation is based on a black-box approach in which the computer model
(or any generic function) is implemented externally to the module itself.
The module accepts the model outputs and the design of experiment (optional,
only for certain methods) and computes the associated sensitivity measures.
The package also includes routines to generate a normalized design of experiment
file to be used in the simulation experiment, based on several algorithms (such
as simple random sampling or latin hypercube), as well as simple routines to
post-process multivariate raw code output (e.g., computing its maximum, minimum, or
average).
The general calculation flow involved in using gsa-module can be seen in the figure below.

gsa-module Documentation¶
gsa-module Basics¶
User’s Guide¶
General Purpose Design of Experiment¶
Morris Screening Method¶
A successful installation of gsa-module gives access to two
executables on the path, used to carry out Morris screening analysis
on model outputs:
- gsa_morris_generate: executable to generate a Morris (One-at-a-Time, OAT) design
- gsa_morris_analyze: executable to compute the statistics of the elementary effects given model inputs/outputs as files
A more theoretical background of the method can be found in the implementation section of this documentation.
Generating Morris Design (Sample)¶
The first step in conducting sensitivity analysis by Morris screening method is to generate One-at-a-Time design. The Morris design generator driver script can be invoked from the terminal using the following command:
> gsa_morris_generate -r <number of blocks/replications> \
-d <number of input dimensions> \
-o <design matrix output filename> \
-sep <delimiter for the output file {csv, tsv, txt}, default = csv> \
-ss <sampling scheme {trajectory or radial}> \
-p <trajectory scheme only, number of levels> \
-s <trajectory scheme only, random seed number> \
-sobol <radial scheme only, the fullpath to Sobol' sequence generator executable> \
-dirnum <radial scheme only, the fullpath to Sobol' sequence generator direction numbers file>
A brief explanation of these parameters can be displayed using the following command:
> gsa_morris_generate --help
By default (if not explicitly specified), the naming convention of the output file is:
<sampling scheme>_<number of replications>_<number of input dimensions>_<number of levels, trajectory only>.csv
In general, the larger the number of replications, the more accurate the sensitivity measures. A larger number of levels (in the trajectory design), on the other hand, increases the granularity of the input parameter space exploration. This only makes sense, however, with a large number of replications; otherwise there is a high risk of using an unbalanced design (biased toward some part of the input parameter space).
Example
As an example, consider the following command:
> gsa_morris_generate -r 10 -d 4 -p 6
The above command generates a csv file named trajectory_10_4_6.csv containing 50 rows and 4 columns.
The 50 rows follow from the \(replicates \times (inputs + 1)\) formula
and correspond to the number of model evaluations, while the number of columns
corresponds to the number of input dimensions.
In an OAT design only one parameter is changed between perturbations, and in the
trajectory scheme there is no base point per se, as the perturbations are carried out
one parameter at a time starting from the last perturbed point.
In the example above the size of the grid jump (\(\Delta = p / (2(p-1)) = 0.6\)) is locked to
the number of levels.
Another example with more explicit specification of arguments:
> gsa_morris_generate -r 10 -d 6 -ss radial \
-o test_radial \
-sep txt \
-sobol ./path_to_sobol_gen/sobol_gen.x \
-dirnum ./path_to_sobol_gen/dirnum.txt
The above command generates a space-separated file test_radial.txt
containing 70 rows and 6 columns.
The radial OAT design is generated using the specified Sobol’ sequence generator.
In the radial design, a different base point is generated for each replication.
The perturbation of each parameter in each replication is relative to that base point.
Furthermore, the number of levels does not need to be specified, as the size of the grid
jump differs from parameter to parameter and from replication to replication.
Executing the Model¶
Following the design philosophy of gsa-module, the model executions are
implemented outside the module itself. The most important thing to remember is that
the OAT design generated using gsa_morris_generate is normalized (each value lies in [0,1]).
If the actual model parameters have a different scale or a different
probability distribution, the proper transformation of each design point has to
be carried out prior to the model evaluation.
Note also that the results of the execution should be saved in a text file
whose rows correspond to the results of each model execution.
In general, the number of model evaluations, for both the trajectory and radial schemes, is related to the number of replications (r) and the number of input dimensions (k):
\(n_{runs} = r \times (k + 1)\)
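As an illustration of this workflow (a minimal sketch, not part of gsa-module itself; the model function, bounds, and file names are hypothetical), the normalized design can be rescaled and evaluated as follows:
import numpy as np

def my_model(x):
    """Hypothetical 4-parameter model; replace with the actual computer code."""
    return x[0] + 2.0 * x[1] - x[2] * x[3]

# Load the normalized OAT design (all values in [0, 1])
design = np.loadtxt("trajectory_10_4_6.csv", delimiter=",")

# Rescale each column (parameter) to an assumed uniform range [lb, ub]
lb = np.array([0.0, 10.0, -1.0, 0.5])
ub = np.array([1.0, 100.0, 1.0, 1.5])
rescaled = lb + design * (ub - lb)

# Evaluate the model once per design point; save one result per row
outputs = np.array([my_model(row) for row in rescaled])
np.savetxt("4paramsFunction.csv", outputs)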
Analyzing the I/O of Morris Experimental Runs¶
The last step in conducting the Morris screening analysis is to compute the statistics of the elementary effects for each input. The minimum requirements for this computation are the design file and its corresponding model output. If necessary, the rescaled design file can also be specified to compute the standardized version of the elementary effects. The driver script to analyze the inputs/outputs of Morris experimental run can be invoked from the terminal using the following command:
> gsa_morris_analyze -in <the normalized inputs file> \
-ir <the rescaled inputs file> \
-o <the model/function outputs file> \
-output <the results of the analysis output file> \
-mc <verbose model specification checking>
A brief explanation of these parameters can be displayed using the following command:
> gsa_morris_analyze --help
By default, the naming convention of the analysis output file is:
<normalized inputs filename>-<model outputs file>.csv
The -mc flag verbosely reports on the consistency of the model specification. This includes:
- Number of input dimensions in the design file
- Number of blocks/replications in the design file
- Total number of runs
- Type of design
- Number of levels and grid jump size (trajectory scheme only)
- Rescaled inputs if specified
This information (except the rescaled inputs, the last item above) is directly inferred from the content of the normalized design file.
Example
As an example, consider that a 4-parameter model was evaluated according to
the OAT design in the file trajectory_10_4_10.csv.
The output of the model was saved inside the file 4paramsFunction.csv.
To compute the statistics of the elementary effects of this I/O pair, invoke the following command:
> gsa_morris_analyze -in ./trajectory_10_4_10.csv -o ./4paramsFunction.csv -mc
The flag -mc results in a verbose report of the model specification:
Number of Input Dimensions = 4
Number of Blocks/Replications = 10
Total Number of Runs = 50
Type of Design = trajectory
Number of levels (trajectory) = 10 (Delta = 0.5556)
Rescaled Inputs = None
The results of the analysis are saved inside the file
trajectory_10_4_10-4paramsFunction.csv
with the following contents:
# mu, mu_star, std_dev, std_mu, std_mu_star, std_std_dev
9.738333e+01,9.738333e+01,3.452392e+01,0.000000e+00,0.000000e+00,0.000000e+00
6.596656e+01,6.596656e+01,3.181203e+01,0.000000e+00,0.000000e+00,0.000000e+00
3.814122e+01,3.814122e+01,2.275404e+01,0.000000e+00,0.000000e+00,0.000000e+00
2.529044e+01,2.529044e+01,1.261223e+01,0.000000e+00,0.000000e+00,0.000000e+00
Each column corresponds to the sensitivity measure indicated in the header above. Note that the standardized versions of the elementary effects are set to zero because the rescaled inputs file was not specified. The parameters are ordered according to the design matrix file (the first row of the results corresponds to the first column of the design, etc.).
Sobol’ Sensitivity Indices¶
Theory and Implementation¶
Sensitivity Analysis: Local vs. Global¶
Designs of Experiment¶
Morris Screening Method¶
Screening methods are used to rank the importance of the model parameters using a relatively small number of model executions [1]. However, they tend to give only qualitative measures; meaningful information resides in the rank itself but not in the exact importance of the parameters with respect to the output. These methods are particularly valuable in the early phase of a sensitivity analysis (SA) to identify the non-influential parameters of a model, which can then be safely excluded from further detailed analysis. This exclusion step is important to reduce the size of the problem, especially if a more expensive method is to be applied next. In this work, attention was paid to a particular screening method proposed by Morris [2].
Elementary Effects¶
Consider a model with \(K\) parameters, where \(\vec{x} = (x_1, x_2, \ldots, x_K)\) is a vector of parameter values mapped onto the unit hypercube and \(y(\vec{x})\) is the model output evaluated at point \(\vec{x}\). The elementary effect of parameter \(k\) is defined as follows [2]:
\(EE_k = \frac{Y(x_1, x_2, \ldots, x_k + \Delta, \ldots, x_K) - Y(x_1, x_2, \ldots, x_K)}{\Delta}\)
where \(\Delta\), the grid jump, is chosen such that \(\vec{x} + \Delta\) is still in the specified domain of the parameter space; \(\Delta\) is a value in \(\{\frac{1}{p-1}, \ldots, 1 - \frac{1}{p-1}\}\), where \(p\) is the number of levels that partitions the model parameter space into a uniform grid of points at which the model can be evaluated. The grid constructs a finite distribution of \(p^{K-1} [p - \Delta(p-1)]\) elementary effects per input parameter.
The key idea of the original Morris method is in initiating the model evaluations from various “nominal” points \(\vec{x}\) randomly selected over the grid and then gradually advancing one grid jump at a time between each model evaluation (one at a time), along a different dimension of the parameter space selected randomly.
Statistics of Elementary Effects and Sensitivity Measure¶
Consider now that \(n_R\) elementary effects (one per replication) associated with the k’th parameter have been sampled from the finite distribution of \(EE_k\). The statistical summary of \(EE_k\) from the sampled trajectories can be calculated. The first is the arithmetic mean, defined as
\(\mu_k = \frac{1}{n_R} \sum^{n_R}_{r=1} EE^{r}_{k}\)
The second statistical summary of interest is the standard deviation of the elementary effect associated with the k’th parameter from all the trajectories,
\(\sigma_k = \sqrt{\frac{1}{n_R}\sum^{n_R}_{r=1} (EE^{r}_{k} - \mu_k)^2}\)
The standard deviation gives an indication of the presence of nonlinearity and/or interactions between the k’th parameter and other parameters. As changes in a parameter value might affect the output with different signs and thus result in a cancellation effect (as can be the case for a nonmonotonic function), Campolongo et al. [4] proposed the use of the mean of the absolute elementary effects to circumvent this issue. It is defined as
\(\mu^{*}_k = \frac{1}{n_R} \sum^{n_R}_{r=1} |EE^{r}_{k}|\)
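As a sketch, assuming the elementary effects have already been computed and arranged in a replications-by-parameters array, the three summaries can be obtained with NumPy:
import numpy as np

# ee: hypothetical elementary effects, shape (n_R, K)
ee = np.random.default_rng(42).normal(size=(10, 4))

mu = ee.mean(axis=0)               # arithmetic mean of the EEs
mu_star = np.abs(ee).mean(axis=0)  # mean of the absolute EEs
sigma = ee.std(axis=0)             # standard deviation of the EEs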
The three aforementioned statistical summaries, when evaluated over a large number of trajectories \(n_R\), can provide global sensitivity measures of the importance of the k’th parameter. As indicated by Morris [2] there are three possible categories of parameter importance:
- Parameters with noninfluential effects, i.e., the parameters that have relatively small values of both \(\mu_k\) (or \(\mu^{*}_k\)) and \(\sigma_k\). The small values of both indicate that the parameter has a negligible overall effect on the model output.
- Parameters with linear and/or additive effects, i.e., the parameters that have a relatively large value of \(\mu_k\) (or \(\mu^{*}_k\)) and relatively small value of \(\sigma_k\). The small value of \(\sigma_k\) and the large value of \(\mu_k\) (or \(\mu^{*}_k\)) indicate that the variation of elementary effects is small while the magnitude of the effect itself is consistently large for the perturbations in the parameter space.
- Parameters with nonlinear and/or interaction effects, i.e., the parameters that have a relatively small value of \(\mu_k\) (or \(\mu^{*}_k\)) and a relatively large value of \(\sigma_k\). Opposite to the previous case, a small value of \(\mu_k\) (or \(\mu^{*}_k\)) indicates that the aggregate effect of perturbations is seemingly small while a large value of \(\sigma_k\) indicates that the variation of the effect is large; the effect can be large or negligibly small depending on the other values of parameters at which the model is evaluated. Such large variation is a symptom of nonlinear effects and/or parameter interaction.
Such classification makes parameter importance ranking and, in turn, screening of non-influential parameters possible. However, the procedure is done rather qualitatively, and this is illustrated in the figure below, which depicts a typical parameter classification derived from visual inspection of the elementary effect statistics on the \(\sigma_k\) versus \(\mu^{*}_k\) plane.

The notions of influential and non-influential parameters are based on the relative locations of those statistics in the plane. Typically, the non-influential ones are clustered closer to the origin (relative to the more influential ones) with a pronounced boundary, as depicted in the figure. Admittedly, if these statistics were spread uniformly across the plane, the distinction would be more ambiguous (in this case, a more advanced classification, such as one based on clustering techniques, might be helpful). Furthermore, for a parameter with large \(\mu_k\) and \(\sigma_k\), the method cannot distinguish nonlinearity effects from parameter interaction effects on the output.
Design of Experiment for Screening Analysis¶
There are two available experimental designs to carry out the Morris
screening method in gsa-module: the trajectory design and the radial OAT
design.
Trajectory Design (Winding Stairs)¶
The trajectory design is the original Morris implementation of the design of experiment for screening [2]. Essentially, it is a randomized one-at-a-time design where each parameter is perturbed once, similar to the winding stairs design proposed by Jansen et al. [3]. Its most important feature is that it does not return to the original base point after a perturbation, but continues perturbing another dimension from the last perturbed point. This ensures a more efficient parameter space exploration, although it requires an additional user-defined parameter called the number of levels [4].
A trajectory design is defined by the number of trajectories (r), the number of levels (p), and the number of model parameters (k). Each trajectory evaluates the model (k + 1) times, so the economy of computing the elementary effect statistics is r * (k + 1) code runs.
A randomized trajectory design matrix \(b^*\) is given by ([2], [5])
\(b^* = \left( x^* + \frac{\Delta}{2} \left[ \left( 2 b - j_k \right) d^* + j_k \right] \right) p^*\)
where:
- \(b\): a strictly lower triangular matrix of 1s, with dimension (k + 1)-by-k
- \(x^*\): a random starting point in the parameter space, with dimension (k + 1)-by-k; each row is the same
- \(d^*\): a k-dimensional diagonal matrix in which each element is either +1 or -1 with equal probability; this matrix determines whether a parameter value will decrease or increase
- \(p^*\): a k-by-k random permutation matrix in which each row contains one element equal to 1, all others are 0, and no two columns have 1s in the same position; this matrix determines the order in which parameters are perturbed
- \(j_k\): a (k + 1)-by-k matrix of 1s
- \(\Delta\): the factorial increment, as a diagonal matrix of dimension (k + 1)-by-(k + 1)
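A minimal NumPy sketch of this construction for a single trajectory, assuming the scalar grid jump \(\Delta = p/(2(p-1))\) used in the examples above (illustrative only, not the gsa-module internals):
import numpy as np

def trajectory(k: int, p: int, rng: np.random.Generator) -> np.ndarray:
    """Sketch: one (k+1)-by-k Morris trajectory on the unit hypercube."""
    delta = p / (2 * (p - 1))
    b = np.tril(np.ones((k + 1, k)), k=-1)      # strictly lower triangular 1s
    grid = np.arange(p // 2) / (p - 1)          # base values keeping x + delta <= 1
    x_star = np.tile(rng.choice(grid, size=k), (k + 1, 1))
    d_star = np.diag(rng.choice([-1.0, 1.0], size=k))
    p_star = np.eye(k)[rng.permutation(k)]      # random permutation matrix
    j_k = np.ones((k + 1, k))
    return (x_star + (delta / 2) * ((2 * b - j_k) @ d_star + j_k)) @ p_star

rng = np.random.default_rng(0)
design = np.vstack([trajectory(k=4, p=6, rng=rng) for _ in range(10)])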
The following is an example of a trajectory design in 2-dimensional input space with 4 trajectories (or replicates). The input parameter space is uniformly divided into 6 levels. The filled circles are the random base (nominal) points from which the random perturbation of the same size (i.e., the grid jump) is carried out one-at-a-time.

Radial Design¶
Radial design is a design for screening analysis proposed in [4]. Like the trajectory design, it is an extension of the one-at-a-time design. In the implementation of [4], a Sobol’ quasi-random sequence is used as the basis. Its main advantage over the trajectory design is that the user no longer needs to specify the input discretization level. Furthermore, the grid jump varies from one input dimension to another and from replicate to replicate, incorporating additional possible sources of variation in the method.
The procedure to generate a radial design of r replicates is as follows (a code sketch is given after the list):
- Generate a Sobol’ sequence of dimension (r+R, 2*k), where R is a shift to avoid repetition in the sequence. The value of R is recommended to be fixed at 4 following [4], but see Choosing Shifting Value below for additional comments.
- The first half of the matrix, up to the r-th row, serves as the base points: \(a_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,k}); \; i = 1, \ldots, r\). The second half of the matrix, starting from the (R+1)-th row, serves as the auxiliary points, from which the perturbed states of each base point are created: \(b_i = (x_{R+i,k+1}, x_{R+i,k+2}, \ldots, x_{R+i,2k}); \; i = 1, \ldots, r\).
- For each base point, create a set of perturbed states by substituting the value at each dimension with the value from the auxiliary point at the same dimension, one at a time. Each base point thus yields k additional perturbed points. For instance, the 1st perturbed point of the i-th base point \(a_i\) is \(a^{*,1}_i = (x_{R+i,k+1}, x_{i,2}, \ldots, x_{i,k})\), while the second is \(a^{*,2}_i = (x_{i,1}, x_{R+i,k+2}, \ldots, x_{i,k})\). In general, the j-th perturbed point of the i-th base point is \(a^{*,j}_i = (x_{i,1}, \ldots, x_{R+i,k+j}, \ldots, x_{i,k})\).
- A single elementary effect for each input dimension can be computed on the basis of function evaluations at k+1 points: 1 base point and k perturbed points.
- Repeat the process until the requested r replications have been constructed.
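The sketch below illustrates the procedure, with SciPy’s Sobol’ generator standing in for the external Sobol’ sequence executable that gsa-module actually calls:
import numpy as np
from scipy.stats import qmc

def radial_design(r: int, k: int, shift: int = 4) -> np.ndarray:
    """Sketch: radial OAT design of r blocks, each of k+1 points."""
    sobol = qmc.Sobol(d=2 * k, scramble=False).random(r + shift)
    base = sobol[:r, :k]                  # base points a_i
    aux = sobol[shift:shift + r, k:]      # auxiliary points b_i
    blocks = []
    for a, b in zip(base, aux):
        block = np.tile(a, (k + 1, 1))
        for j in range(k):
            block[j + 1, j] = b[j]        # perturb dimension j only
        blocks.append(block)
    return np.vstack(blocks)              # shape (r * (k+1), k)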
An illustration of radial OAT design generation based on Sobol’ sequence can be seen in the figure below.

As such, the radial design has the same economy as the trajectory design, that is, r * (k+1) computations for a k-dimensional model with r replications. The computation of the elementary effect, however, is slightly different because the grid jump now differs for each input dimension at each replication:
\(EE^{i}_{j} = \frac{\left| y(a^{*,j}_i) - y(a_i) \right|}{\left| x_{R+i,k+j} - x_{i,j} \right|}\)
where:
- \(y(a^{*,j}_i)\): function value at j-th perturbed point of the i-th replicate.
- \(y(a_i)\): function value at the base point of the i-th replicate.
- \(x_{R+i,k+j}\): the perturbed input at dimension j of the i-th replicate.
- \(x_{i,j}\): the base input at dimension j of the i-th replicate.
As can be seen, averaging the elementary effect defined above over many replications automatically yields \(\mu^*\).
The following is an example of a radial design in a 2-dimensional input space with 4 base points (filled circles), not necessarily located on a specific grid. The perturbations (crosses) are carried out from these base points. The size of the perturbation differs from input dimension to input dimension and from replicate to replicate.

Choosing Shifting Value¶
As mentioned, the recommendation given by [4] for the value of R is 4. This value reflects the fact that a sample of the Sobol’ sequence across dimensions tends to repeat values, especially in the first several rows. For example, the first two Sobol’ samples used here have the values of 0.0 and 0.5 in all dimensions. If such a repetition in value happens, one or more rows of the \(\Delta\) matrix will be zero (and so will the \(\Delta Y\) vector), causing the system of linear equations to be under-determined.
However, except for the obvious repetitions of values across dimensions in the first several samples, other repetitions cannot be excluded further down the sequence. As such, the value of R has to be picked carefully, and from our experience this value is highly dependent on the number of samples and/or dimensions. Yet one of the main points of using the radial design in the first place was to avoid specifying the number of levels p; choosing R for different numbers of samples and/or dimensions defeats the purpose of using the radial design.
A pragmatic solution to this problem, adopted here, is to check whether a given auxiliary point has the same value as the base point in one or more dimensions every time a block of one-at-a-time design is generated. If it has, the next auxiliary point is used instead. Finally, to replace the skipped auxiliary point, an additional point is generated from the Sobol’ sequence.
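A sketch of this check (the helper below is hypothetical, not the exact gsa-module routine):
import numpy as np

def pick_auxiliary(base: np.ndarray, candidates: list) -> np.ndarray:
    """Sketch: return the first candidate sharing no coordinate with base."""
    for idx, aux in enumerate(candidates):
        if not np.any(np.isclose(aux, base)):
            return candidates.pop(idx)
    raise RuntimeError("Candidates exhausted; generate more Sobol' points.")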
Miscellaneous Topics¶
Computation of the Elementary Effect¶
In gsa-module, computing the elementary effects for each replication is
achieved using matrix algebra, similar to the implementation in
[6]. There is a slight difference between the computation of elementary effects
for the trajectory design and for the radial design.
The following figure illustrates the computation of all the elementary effects
of a single replicate for a 3-parameter model using the trajectory design with
4 levels.

The following figure illustrates the same computation of a single replicate for a 3-parameter model using the radial design (no number-of-levels specification needed).

The statistics of the elementary effects are eventually computed after the same procedure is repeated for many replications.
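A sketch of this matrix-based computation for a single replicate (a generic linear solve; the actual gsa-module internals may differ in detail):
import numpy as np

def elementary_effects(block: np.ndarray, y: np.ndarray, scheme: str = "trajectory") -> np.ndarray:
    """Sketch: EEs of one OAT block ((k+1)-by-k design points, k+1 outputs)."""
    if scheme == "trajectory":
        dx = np.diff(block, axis=0)   # perturbations between consecutive points
        dy = np.diff(y)
    else:                             # radial: perturbations from the base point
        dx = block[1:] - block[0]
        dy = y[1:] - y[0]
    # Each row of dx has a single nonzero entry, so solving dx @ ee = dy
    # amounts to dividing each output difference by its grid jump.
    return np.linalg.solve(dx, dy)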
Presenting the Results of the Analysis¶
Standardized Elementary Effect¶
In the original implementation of the Morris method [2], the input parameters are normalized; that is, all parameter values lie between 0 and 1. Furthermore, following the suggestion by Saltelli et al. [5], the grid jump size is kept constant for a given number of levels for all parameters. As such, the method is prone to misranking the important parameters if there is a vast difference in the original scales of the parameters (e.g., [0,1] for one parameter, [10,100] for another, etc.). The normalized scale of [0,1] would then bias the measures toward the parameter with the largest scale of variation. To compare the elementary effects on a common ground, taking into account the original scale of variation of each parameter, it is advised in [7] to scale the elementary effect with the standard deviation of the input \(\sigma_{x_i}\) and of the output \(\sigma_y\),
\(SEE_i = \frac{\sigma_{x_i}}{\sigma_y} EE_i\)
In gsa-module, the standardized elementary effect is automatically computed
if the rescaled input parameter values are specified. The rescaled inputs are
used to compute the standard deviation of each parameter, taking into account
its original scale of variation.
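A sketch of this standardization, assuming the elementary effects and the rescaled inputs/outputs are available as arrays:
import numpy as np

def standardized_ee(ee: np.ndarray, x_rescaled: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Sketch: scale each parameter's EEs by sigma_x / sigma_y."""
    return ee * x_rescaled.std(axis=0) / y.std()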
Optimized Trajectory Design¶
References¶
[1] A. Saltelli et al., “Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models,” John Wiley & Sons, Ltd., United Kingdom, 2004.
[2] Max D. Morris, “Factorial Sampling Plans for Preliminary Computational Experiments,” Technometrics, vol. 33, no. 2, pp. 161-174, 1991.
[3] Michiel J.W. Jansen, Walter A.H. Rossing, and Richard A. Daamen, “Monte Carlo Estimation of Uncertainty Contributions from Several Independent Multivariate Sources,” in Predictability and Nonlinear Modelling in Natural Sciences and Economics, Dordrecht, Germany, Kluwer Publishing, 1994, pp. 334-343.
[4] F. Campolongo, A. Saltelli, and J. Cariboni, “From Screening to Quantitative Sensitivity Analysis. A Unified Approach,” Computer Physics Communications, vol. 182, no. 4, pp. 978-988, 2011.
[5] A. Saltelli et al., “Global Sensitivity Analysis. The Primer,” West Sussex, John Wiley & Sons, 2008, p. 114.
[6] Jon D. Herman, SALib [Source Code], March 2014, https://github.com/jdherman/SALib
[7] G. Sin and K. V. Gernaey, “Improving the Morris Method for Sensitivity Analysis by Scaling the Elementary Effects,” in Proc. 19th European Symposium on Computer Aided Process Engineering, 2009.
Sobol’ Sensitivity Indices¶
Variance-based methods for global sensitivity analysis use variance as the basis to define a measure of input parameter influence on the overall output variation [1]. In a statistical framework of sensitivity and uncertainty analysis, this choice is natural because variance (or standard deviation, a related concept) is often used as a measure of dispersion or variability in the model prediction [2]. A measure of such dispersion, in turn, indicates the precision of the prediction due to variations in the input parameters.
This section of the documentation discusses the origin of Sobol’ sensitivity indices and the method to estimate them by Monte Carlo simulation.
High-Dimensional Model Representation¶
Consider once more a mathematical model \(f: \mathbf{x} \in [0,1]^D \mapsto y = f(\mathbf{x}) \in \mathbb{R}\). The high-dimensional model representation (HDMR) of \(f(\mathbf{x})\) is a linear combination of functions with increasing dimensionality [3],
\(f(\mathbf{x}) = f_o + \sum_{d=1}^{D} f_d(x_d) + \sum_{1 \leq d < e \leq D} f_{d,e}(x_d, x_e) + \cdots + f_{1,2,\ldots,D}(x_1, x_2, \ldots, x_D) \quad (1)\)
where \(f_o\) is a constant.
The representation in the above equation is unique given the following condition [4]:
\(\int_0^1 f_{i_1, i_2, \ldots, i_s}(x_{i_1}, x_{i_2}, \ldots, x_{i_s}) \, dx_{i_k} = 0; \quad k = 1, 2, \ldots, s\)
Assume now that \(\mathbf{X}\) is a random vector of independent and uniform random variables over a unit hypercube \(\Omega = \{\mathbf{x} \,|\, 0 \leq x_i \leq 1; i = 1, \ldots, D\}\) such that
\(Y = f(\mathbf{X}) \quad (2)\)
where \(Y\) is a random variable, resulting from the transformation of random vector \(\mathbf{X}\) by function \(f\). Using Eq. (2) to express each term in Eq. (1), it follows that
\(f_o = \mathbb{E}[Y]; \quad f_d(x_d) = \mathbb{E}_{\sim d}[Y|x_d] - f_o; \quad f_{d,e}(x_d, x_e) = \mathbb{E}_{\sim d,e}[Y|x_d, x_e] - f_d(x_d) - f_e(x_e) - f_o \quad (3)\)
The same follows for higher-order terms in the decomposition. In Eq. (3), \(\mathbb{E}_{\sim e} [Y|X_e]\) corresponds to the conditional expectation operator, and the \(\sim\circ\) in the subscript means that the integration over the parameter space is carried out over all parameters except the specified parameter in the subscript. For instance, \(\mathbb{E}_{\sim 1} [Y|X_1]\) refers to the conditional mean of \(Y\) given \(X_1\), and the integration is carried out for all possible values of parameters in \(\mathbf{x}\) except \(x_1\). Note that because \(X_1\) is a random variable, the expectation conditioned on it is also a random variable.
Assuming that \(f\) is a square-integrable function, applying the variance operator on \(Y\) results in
\(\mathbb{V}[Y] = \sum_{d=1}^{D} \mathbb{V}[f_d(X_d)] + \sum_{1 \leq d < e \leq D} \mathbb{V}[f_{d,e}(X_d, X_e)] + \cdots + \mathbb{V}[f_{1,2,\ldots,D}(X_1, \ldots, X_D)] \quad (4)\)
Sobol’ Sensitivity Indices¶
Division by \(\mathbb{V}[Y]\) aptly normalizes Eq. (4).
The Sobol’ main-effect sensitivity index \(S_d\) is defined as
\(S_d = \frac{\mathbb{V}_d[\mathbb{E}_{\sim d}[Y|X_d]]}{\mathbb{V}[Y]} \quad (5)\)
The numerator is the variance of the conditional expectation, and the index is a global sensitivity measure interpreted as the amount of variance reduction in the model output if the parameter \(X_d\) is fixed (i.e., its variance is reduced to zero).
A closely related sensitivity index proposed by Homma and Saltelli [5] is the Sobol’ total-effect index, defined as
\(ST_d = \frac{\mathbb{E}_{\sim d}[\mathbb{V}_{d}[Y|\mathbf{X}_{\sim d}]]}{\mathbb{V}[Y]} \quad (6)\)
The index, also a global sensitivity measure, can be interpreted as the amount of variance left in the output if the values of all input parameters except \(x_d\) can be fixed.
These two sensitivity measures can be related to the objectives of global SA for model assessment as proposed by Saltelli et al. ([2] [6]). The main-effect index is relevant to parameter prioritization in the context of identifying the most influential parameter since fixing a parameter with the highest index value would, on average, lead to the greatest reduction in the output variation.
The total-effect index, on the other hand, is relevant to parameter fixing (or screening) in the context of identifying the least influential set of parameters, since fixing any parameter that has a very small total-effect index value would not lead to a significant reduction in the output variation. The use of the total-effect index to identify which parameters can be fixed or excluded is similar to that of the elementary effect statistics of the Morris method, albeit more exact and also more computationally expensive. Finally, the difference between the two indices of a given parameter (Eqs. (6) and (5)) quantifies the amount of all interactions involving that parameter in the model output.
The Sobol’-Saltelli Method¶
Monte Carlo Integration¶
In principle, the estimation of the Sobol’ indices defined by Eqs. (5) and (6) can be carried out directly using Monte Carlo (MC) simulation. The most straightforward, though rather naive, implementation uses two nested loops for the computation of the conditional variance and expectation appearing in both equations.
In the estimation of the main-effect index of parameter \(x_d\), for instance, the outer loop samples values of \(X_d\) while the inner loop samples values of \(\mathbf{X}_{\sim d}\) (everything other than \(x_d\)). These samples, in turn, are used to evaluate the model output. In the inner loop, the mean of the model output (for a given value of \(X_d\) but over many values of \(\mathbf{X}_{\sim d}\)) is taken. Afterward, in the outer loop, the variance of the model output (over many values of \(X_d\)) is taken. This approach can easily become prohibitively expensive, as the nested structure requires \(N^2\) model evaluations per input dimension for each of the main-effect and total-effect indices, while \(N\) (the size of the MC samples) is typically in the range of \(10^2 - 10^4\) for a reliable estimate.
Sobol’ [4] and Saltelli [7] proposed an alternative approach that circumvents the nested structure of the MC simulation to estimate the indices. The formulation starts by expressing the expectation and variance operators in their integral form. As the following formulation is defined on a unit hypercube of \(D\)-dimensional parameter space where each parameter is a uniform and independent random variable, explicit writing of the distribution within the integration as well as of the integration range is excluded for conciseness.
First, the variance operator shown in the numerator of Eq. (5) is written as
\(\mathbb{V}_d[\mathbb{E}_{\sim d}[Y|X_d]] = \mathbb{E}_d[\mathbb{E}^2_{\sim d}[Y|X_d]] - \mathbb{E}^2[Y] \quad (7)\)
The notation \(\mathbb{E}_{\sim \circ}[\circ | \circ]\) was already explained in the previous subsection, while \(\mathbb{E}_{\circ} [\circ]\) corresponds to the marginal expectation operator where the integration is carried out over the range of parameters specified in the subscript.
Next, consider the conditional expectation that appears in Eq. (7), which per definition reads
\(\mathbb{E}_{\sim d}[Y|x_d] = \int f(\mathbf{x}_{\sim d}, x_d) \, d\mathbf{x}_{\sim d} \quad (8)\)
Note that \(\mathbf{x} = \{\mathbf{x}_{\sim d}, x_d\}\).
Following the first term of Eq. (7), by squaring Eq. (8) and defining a dummy vector variable \(\mathbf{x}^{\prime}_{\sim d}\), the product of the two integrals can be written as a single multiple integral
\(\mathbb{E}_d[\mathbb{E}^2_{\sim d}[Y|X_d]] = \int \int f(\mathbf{x}^{\prime}_{\sim d}, x_d) f(\mathbf{x}) \, d\mathbf{x}^{\prime}_{\sim d} \, d\mathbf{x} \quad (9)\)
Returning to the full definition of the variance of the conditional expectation in Eq. (7),
\(\mathbb{V}_d[\mathbb{E}_{\sim d}[Y|X_d]] = \int \int f(\mathbf{x}^{\prime}_{\sim d}, x_d) f(\mathbf{x}) \, d\mathbf{x}^{\prime}_{\sim d} \, d\mathbf{x} - \mathbb{E}^2[Y] \quad (10)\)
Finally, the main-effect sensitivity index can be written in integral form as
\(S_d = \frac{\int \int f(\mathbf{x}^{\prime}_{\sim d}, x_d) f(\mathbf{x}) \, d\mathbf{x}^{\prime}_{\sim d} \, d\mathbf{x} - \mathbb{E}^2[Y]}{\mathbb{V}[Y]} \quad (11)\)
The integral form given above dispenses with the nested structure of multiple integrals in the original definition of the main-effect index. The multidimensional integration is over \(2 \times D - 1\) dimensions, and it is the basis of estimating the sensitivity index using MC simulation in this implementation, hereinafter referred to as the Sobol’-Saltelli method. The same procedure applies to derive the total-effect index, which yields
\(ST_d = 1 - \frac{\int \int f(\mathbf{x}_{\sim d}, x^{\prime}_d) f(\mathbf{x}) \, dx^{\prime}_d \, d\mathbf{x} - \mathbb{E}^2[Y]}{\mathbb{V}[Y]}\)
For \(N\) MC samples and \(D\) model parameters, the MC simulation procedure to estimate the sensitivity indices follows the sampling and resampling approach adopted in [4], [5], [7], described in the following.
Procedures¶
First, generate two \(N \times D\) matrices of independent random samples from a uniform distribution in \(D\) dimensions, \([0,1]^D\): the matrices \(A\) and \(B\).
Second, construct \(D\) additional design-of-experiment matrices \(A_B^d; \; d = 1, \ldots, D\), where each matrix is matrix \(A\) with its \(d\)-th column substituted by the \(d\)-th column of matrix \(B\).
Third, rescale each element in the matrices of samples to the actual values of model parameters according to their actual range of variation through iso-probabilistic transformation.
Fourth, evaluate the model multiple times using input vectors that correspond to each row of \(A\), \(B\), and all the \(A_B^d\).
Fifth and finally, extract the quantities of interest (QoIs) from all the outputs and recast them as vectors. The main-effect and total-effect indices are then estimated using the estimators described below.
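A compact sketch of steps one, two, and four (uniform inputs, with a placeholder model standing in for the external code):
import numpy as np

def sobol_saltelli_design(n: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """Sketch: stack A, B, and the D matrices A_B^d row-wise."""
    a = rng.random((n, d))
    b = rng.random((n, d))
    blocks = [a, b]
    for col in range(d):
        a_b = a.copy()
        a_b[:, col] = b[:, col]   # substitute the d-th column of B into A
        blocks.append(a_b)
    return np.vstack(blocks)      # shape (n * (d + 2), d)

rng = np.random.default_rng(0)
design = sobol_saltelli_design(n=1000, d=3, rng=rng)
outputs = design.sum(axis=1)      # placeholder model evaluation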
Monte Carlo Estimators¶
For the main-effect sensitivity index, two estimators are considered. One is proposed by Saltelli [7], and the other, as an alternative, is proposed by Janon et al. [8]. The latter proved to be more efficient, especially for a large variation around a parameter estimate [8].
The first term in the numerator of Eq. (11) is the same for both estimators and is given by
\(\frac{1}{N} \sum_{n=1}^{N} f(B)_n \, f(A_B^d)_n\)
where the subscript \(n\) corresponds to the row of the sampled model parameters, such that \(f(B)_n\) is the model output evaluated using inputs taken from the \(n\)-th row of matrix \(B\), and \(f(A_B^d)_n\) is the model output evaluated using inputs taken from the \(n\)-th row of matrix \(A_B^d\). The MC estimators for the second term in the numerator and for the denominator differ for the two considered estimators and are given in the table below.
Estimator | \(\mathbb{E}^2[Y] = \left( \int f \, d\mathbf{x}\right)^2\) | \(\mathbb{V}[Y] = \int f^2 \, d\mathbf{x} - \left( \int f \, d\mathbf{x}\right)^2\) |
---|---|---|
Saltelli [7] | \(\frac{1}{N} \sum f(A)_n \cdot f(B)_n\) | \(\frac{1}{N}\sum f(A)_n^2-\left(\frac{1}{N}\sum f(A)_n\right)^2\) |
Janon et al. [8] | \(\left(\frac{1}{N} \sum \frac{f(B)_n + f(A_B^d)_n}{2}\right)^2\) | \(\frac{1}{N} \sum \frac{f(B)_n^2 + f(A_B^d)_n^2}{2} - \left(\frac{1}{N} \sum \frac{f(B)_n + f(A_B^d)_n}{2}\right)^2\) |
The general formula of the main-effect sensitivity index estimator is
\(\hat{S}_d = \frac{\frac{1}{N} \sum_{n=1}^{N} f(B)_n \, f(A_B^d)_n - \mathbb{E}^2[Y]}{\mathbb{V}[Y]}\)
where \(\mathbb{E}^2[Y]\) and \(\mathbb{V}[Y]\) are as prescribed in the table above.
The general formula of the total-effect sensitivity index estimator follows its definition in Eq. (6). The numerator \(\mathbb{E}_{\sim d}[\mathbb{V}_{d}[Y|\mathbf{X}_{\sim d}]]\) is estimated using the estimators listed in the table below, while the denominator \(\mathbb{V}[Y]\) is estimated by the Saltelli estimator in the table above.
Estimator | \(\mathbb{E}_{\sim d}[\mathbb{V}_{d}[Y|\mathbf{X}_{\sim d}]]\) |
---|---|
Sobol-Homma [5] | \(\frac{1}{N} \sum f(A)_n^2 - \frac{1}{N} \sum f(A)_n \, f(A_B^d)_n\) |
Jansen [9] | \(\frac{1}{2N} \sum \left(f(A)_n - f(A_B^d)_n\right)^2\) |
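A sketch of these estimators operating on the stacked outputs of the previous sketch (Saltelli for the main effect, Jansen for the total effect):
import numpy as np

def estimate_indices(y: np.ndarray, n: int, d: int):
    """Sketch: Saltelli main-effect and Jansen total-effect estimators."""
    f_a, f_b = y[:n], y[n:2 * n]
    var_y = f_a.var()                        # Saltelli variance estimator
    s_main, s_total = np.empty(d), np.empty(d)
    for j in range(d):
        f_ab = y[(2 + j) * n:(3 + j) * n]    # outputs for A_B^j
        s_main[j] = (np.mean(f_b * f_ab) - np.mean(f_a * f_b)) / var_y
        s_total[j] = 0.5 * np.mean((f_a - f_ab) ** 2) / var_y
    return s_main, s_total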
The computational cost associated with the estimation of all the main-effect and total-effect indices is \(N \times (D + 2)\) code runs, where \(N\) is the number of MC samples and \(D\) is the number of parameters. Compare this to the cost of brute force Monte Carlo of \(2 \times D \times N^2\) code runs to estimate all the main-effect and total-effect sensitivity indices.
As an additional comparison, the cost of the Morris method to compute the statistics of the elementary effects is \(N_R \times (D + 1)\) code runs, where \(N_R\) is the number of OAT design replications. In either method, the number of samples \(N\) (in the case of the Sobol’-Saltelli method) and replications \(N_R\) (in the case of the Morris method) determines the precision of the estimates. A larger number of samples (and replications) increases the precision. Note, however, that in practice the typical number of Morris replications is between \(10^1 - 10^2\) [10], while the number of MC samples for the Sobol’ indices estimation amounts to \(10^2 - 10^4\) [4].
References¶
[1] Dan G. Cacuci and Mihaela Ionescu-Bujor, “A Comparative Review of Sensitivity and Uncertainty Analysis of Large-Scale Systems - II: Statistical Methods,” Nuclear Science and Engineering, vol. 147, no. 3, pp. 204-217, 2004.
[2] A. Saltelli et al., “Global Sensitivity Analysis. The Primer,” West Sussex, John Wiley & Sons, 2008.
[3] Genyuan Li, Carey Rosenthal, and Herschel Rabitz, “High Dimensional Model Representations,” The Journal of Physical Chemistry A, vol. 105, no. 33, pp. 7765-7777, 2001.
[4] I. M. Sobol, “Global Sensitivity Analysis for nonlinear mathematical models and their Monte Carlo estimates,” Mathematics and Computers in Simulation, vol. 55, no. 1-3, pp. 271-280, 2001.
[5] Toshimitsu Homma and Andrea Saltelli, “Importance Measures in Global Sensitivity Analysis of Nonlinear Models,” Reliability Engineering and System Safety, vol. 52, no. 1, pp. 1-17, 1996.
[6] A. Saltelli et al., “Sensitivity Analysis in Practice: a Guide to Assessing Scientific Models,” West Sussex, John Wiley & Sons, 2004.
[7] A. Saltelli, “Making best use of model evaluations to compute sensitivity indices,” Computer Physics Communications, vol. 145, no. 2, pp. 280-297, 2002.
[8] A. Janon et al., “Asymptotic normality and efficiency of two Sobol’ index estimators,” ESAIM: Probability and Statistics, vol. 18, pp. 342-364, 2014.
[9] M. J. W. Jansen, “Analysis of variance designs for model output,” Computer Physics Communications, vol. 117, pp. 35-43, 1999.
[10] F. Campolongo, A. Saltelli, and J. Cariboni, “From Screening to Quantitative Sensitivity Analysis. A Unified Approach,” Computer Physics Communications, vol. 182, no. 4, pp. 978-988, 2011.
Developer’s Guide¶
About gsa-module¶
Contributors¶
License¶
gsa-module is licensed to you under the MIT License
Copyright (c) [2016] [Damar Wicaksono]
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Funding¶
Gallery of Applications to Test Functions¶
Ishigami Function¶
The Ishigami function is a 3-dimensional function introduced by Ishigami and Homma [1],
\(y = \sin x_1 + a \sin^2 x_2 + b x_3^4 \sin x_1\)
The parameters \(a\) and \(b\) can be adjusted but have default values of 7 and 0.1, respectively.
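A direct implementation sketch:
import numpy as np

def ishigami(x, a=7.0, b=0.1):
    """Ishigami function; x holds three values in [-pi, pi]."""
    return np.sin(x[0]) + a * np.sin(x[1]) ** 2 + b * x[2] ** 4 * np.sin(x[0])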
Analytical Solution¶
The analytical formulas for the variance terms of the Ishigami function for \(\mathbf{X}_d \sim \mathcal{U}[-\pi,\pi]; \, d = 1, 2, 3\) and the given parameters \(a\) and \(b\) are the following:
Marginal Variance
\(\mathbb{V}[Y] = \frac{a^2}{8} + \frac{b \pi^4}{5} + \frac{b^2 \pi^8}{18} + \frac{1}{2}\)
Top Marginal Variance
\(V_1 = \frac{1}{2} \left(1 + \frac{b \pi^4}{5}\right)^2; \quad V_2 = \frac{a^2}{8}; \quad V_3 = 0\)
Bottom Marginal Variance
\(VT_1 = \frac{1}{2} \left(1 + \frac{b \pi^4}{5}\right)^2 + \frac{8 b^2 \pi^8}{225}; \quad VT_2 = \frac{a^2}{8}; \quad VT_3 = \frac{8 b^2 \pi^8}{225}\)
The analytical main- and total-effect sensitivity indices can be computed using their respective definition in relation to the variance terms given above.
Morris Screening Results¶
The function was used to test the implementation of the Morris screening, more precisely that of the two designs of experiment: the trajectory and radial designs (see Morris Screening Method).
Trajectory sampling design¶
The trajectory design is the original design proposed by Morris. The design matrix was generated with:
- the number of trajectories (r) equal to 10, 100 and 1000 times the number of parameters (k=3),
- and the number of levels (p) equal to 4, 8, 12 and 20.
Each generated design was used to evaluate the Ishigami function, and the associated elementary effects were calculated.
The following figures show the \(\sigma\) vs. \(\mu^*\) plot for the three parameters of the Ishigami function for different sets of r and p values. Each set of (r, p) values was repeated 1000 times, and a histogram of the results is presented for each parameter in the figures.




Contrary to the Modified Morris Function, the elementary effect values obtained with a number of levels p=4 are quite different from those obtained with the other level values. This remains true even for a very large number of trajectories (i.e., 3000). The predictions with a level of 8 or higher are consistent.
We also observe that a number of trajectories equal to 10 times the number of parameters (i.e., 30) is not sufficient to fully classify the parameters (as the histograms of parameters 0 and 2 overlap). A minimum number of trajectories of about 100 times the number of parameters is necessary for the Ishigami function to separate the parameters.
Radial sampling design¶
The radial sampling design was proposed by Campolongo et al. and is described in more detail in Morris Screening Method. Only the number of trajectories, here called blocks to differentiate them from the previous design, is required. For testing purposes we investigated, as previously, numbers of blocks (r) equal to 10, 100 and 1000 times the number of parameters (k=3).
Each generated design was used to evaluate the Ishigami function and the associated elementary effects were calculated.
The following figures show the \(\sigma\) vs. \(\mu^*\) plot for the three parameters of the Ishigami function and for the three sets of r values. Contrary to the trajectory design, the radial design uses the Sobol’ generator, which is deterministic. As such, no repetitions were performed to investigate the dispersion of the (\(\sigma\), \(\mu^*\)) values.



The values are found to be quite stable, even for a block value of r=30. They differ significantly, however, from those obtained with the trajectory design (Section Trajectory sampling design).
Exact values for the elementary effects were not found in the literature. However, the sensitivity indices are \(S_1=0.3138\), \(S_2=0.4424\) and \(S_3=0\) [1]. Because \(\mu^*\) and S quantify similar information, we expect them to be ordered in the same way. Therefore the results obtained with the radial sampling design appear preferable.
Sobol Sensitivity Indices Results¶
The function was used to test the implementation of the Sobol sensitivity indices. The main-effect (first order) and total-effect (total order) sensitivity indices are both computed. Both the sampling scheme type and the estimator for the sensitivity indices were tested. The tested sampling schemes are simple random sampling (srs), latin-hypercube sampling (lhs) and the sobol sampling (sobol). The tested estimators are janon and saltelli for the main-effect SI and jansen and sobol for the total-effect SI (see Sobol’ Sensitivity Indices).
The following figure shows the convergence of the main-effect SI (first order) with the number of samples for the first parameter (param0) of the Ishigami’s function. Each panel shows the janon and saltelli estimators, with their \(1-\sigma\) uncertainties, for a given sampling scheme. The dotted red line is the analytical solution (i.e. the target value).

A similar figure is shown below for the total-effect SI (tot-order) for the jansen and sobol estimators.

All estimators for the main- and total-effect SI converge to the analytical solution with a sufficient number of samples (i.e., \(10^4\) in the worst case). As expected, the sobol and lhs sampling schemes for the design matrix are clearly superior to the simple random scheme (srs), as the calculated main- and total-effect SI converge faster and with lower uncertainties; the sobol sampling scheme appears to be only slightly better than lhs. Finally, comparing the estimators, the janon and jansen estimators show slightly better properties than the saltelli and sobol estimators. These conclusions remain the same for all input parameters of the Ishigami function.
From a practical point of view, we advise using the sobol or lhs sampling scheme with at least 1000 points. The choice of estimator does not play a significant role.
Modified Morris Function¶
A modified version of the test function appearing in Morris’ original article [1] is used as a test function in this module. Instead of the original 20-dimensional function, the modified version is a 4-dimensional truncation of it.
The function accepts inputs \(\underline{x}\) whose components satisfy \(0 < x_i \leq 1\).
Morris Screening Results¶
The function was used to test the implementation of the Morris screening, more precisely that of the two designs of experiment: the trajectory and radial designs (see Morris Screening Method).
Trajectory sampling design¶
The trajectory design is the original design proposed by Morris. The design matrix was generated with:
- the number of trajectories (r) equal to 10, 100 and 1000 times the number of parameters (k=4),
- and the number of levels (p) equal to 4, 8, 12 and 20.
Each generated design was used to evaluate the Morris modified function and the associated elementary effects were calculated.
The following figures show the \(\sigma\) vs. \(\mu^*\) plot for the four parameters of the Morris modified function for different sets of r and p values. Each set of (r, p) values was repeated 1000 times, and a histogram of the results is presented for each parameter in the figures.




The elementary effects of the function are similar for all combinations of r and p values and consistent with the expected values. As expected, the histogram dispersion is reduced when the number of trajectories increases. The classification of the parameters is already stable for a number of trajectories equal to 10 times the number of dimensions.
Radial sampling design¶
The radial sampling design was proposed by Campolongo et al. and is described in more detail in Morris Screening Method. Only the number of trajectories, here called blocks to differentiate them from the previous design, is required. For testing purposes we investigated, as previously, numbers of blocks (r) equal to 10, 100 and 1000 times the number of parameters (k=4).
Each generated design was used to evaluate the Morris modified function and the associated elementary effects were calculated.
The following figures show the \(\sigma\) vs. \(\mu^*\) plot for the four parameters of the Morris modified function and for the three sets of r values. Contrary to the trajectory design, the radial design uses the Sobol’ generator, which is deterministic. As such, no repetitions were performed to investigate the dispersion of the (\(\sigma\), \(\mu^*\)) values.



The values are found to be quite stable, even for a block value of r=40, and are in agreement with those obtained with the trajectory design (Section Trajectory sampling design). A better quantification of the convergence to the reference values of \(\sigma\) and \(\mu^*\) is shown in the table below. The only possible exception is \(\sigma\) of parameter 1 (\(x_2\)).
Parameter | Statistics | Ref | r=40 | r=400 | r=4000 |
---|---|---|---|---|---|
Param0 | \(\mu^*\) | 90.05 | 87.11 | 89.97 | 90.02 |
Param0 | \(\sigma\) | 31.08 | 31.99 | 31.05 | 31.09 |
Param1 | \(\mu^*\) | 71.05 | 69.36 | 71.11 | 71.06 |
Param1 | \(\sigma\) | 28.86 | 24.10 | 27.09 | 26.18 |
Param2 | \(\mu^*\) | 41.47 | 40.50 | 41.38 | 41.47 |
Param2 | \(\sigma\) | 17.75 | 17.31 | 17.33 | 17.33 |
Param3 | \(\mu^*\) | 20.83 | 20.19 | 20.76 | 20.82 |
Param3 | \(\sigma\) | 11.54 | 11.53 | 11.55 | 11.55 |
Sobol-\(G^*\) Function¶
Sobol-\(G^*\) is a modified version of the Sobol-G function proposed in [1], initially to avoid the singular point at \(\mathbf{x}=\{0.5\}\). The function reads
\(G^*(\mathbf{x}) = \prod_{d=1}^{D} g^*_d; \quad g^*_d = \frac{(1 + \alpha_d) \left| 2 (x_d + \delta_d - I[x_d + \delta_d]) - 1 \right|^{\alpha_d} + a_d}{1 + a_d}\)
where \(\mathbf{x} \in [0,1]^D\); \(a_d \in \mathbb{R}^+\) determines the first-order importance of parameter \(d\); \(\delta \in [0,1]^D\) is the shift parameter; \(\boldsymbol{\alpha} \in \mathbb{R}^{D+}\) is the curvature parameter; and \(I[\circ]\) is the integer part of the number.
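An implementation sketch of \(G^*\) (the parameter values below are placeholders, not the actual \(G_2^*\) settings):
import numpy as np

def g_star(x, a, alpha, delta):
    """Sketch of the Sobol'-G* function for one point x in [0,1]^D."""
    shifted = np.abs(2 * (x + delta - np.floor(x + delta)) - 1)
    return np.prod(((1 + alpha) * shifted ** alpha + a) / (1 + a))

# Placeholder parameters for a 3-dimensional illustration
x = np.array([0.1, 0.5, 0.9])
print(g_star(x, a=np.ones(3), alpha=np.ones(3), delta=np.zeros(3)))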
The parameters \(\mathbf{a}\) and \(\boldsymbol{\alpha}\) can be adjusted. The particular test function used in this module is \(G_2^*\) (see p. 265 in [1]).
Analytical Solution¶
The analytical formulas for the variance terms of the Sobol-\(G^*\) function for \(\mathbf{X}_d \sim \mathcal{U}[0,1]; \, d = 1, 2, \cdots, 10\) are the following:
Marginal Variance
\(\mathbb{V}[Y] = \prod_{d=1}^{D} (1 + V_d) - 1\)
where \(V_d\) is the top marginal variance given below,
Top Marginal Variance
\(V_d = \frac{\alpha_d^2}{(1 + 2 \alpha_d)(1 + a_d)^2}\)
Bottom Marginal Variance
\(VT_d = V_d \prod_{e \neq d} (1 + V_e)\)
The analytical main- and total-effect sensitivity indices can be computed using their respective definition in relation to the variance terms given above.
Sobol Sensitivity Indices Results¶
The function was used to test the implementation of the Sobol sensitivity indices. The main-effect (first order) and total-effect (total order) sensitivity indices are both computed. Both the sampling scheme type and the estimator for the sensitivity indices were tested. The tested sampling schemes are simple random sampling (srs), latin-hypercube sampling (lhs) and the sobol sampling (sobol). The tested estimators are janon and saltelli for the main-effect SI and jansen and sobol for the total-effect SI (see Sobol’ Sensitivity Indices).
The following figure shows the convergence of the main-effect SI (first order) with the number of samples for the second parameter (param1) of the \(G_2^*\) function. Each panel shows the janon and saltelli estimators, with their \(1-\sigma\) uncertainties, for a given sampling scheme. The dotted red line is the analytical solution (i.e., the target value).

A similar figure is shown below for the total-effect SI (tot-order) for the jansen and sobol estimators.

All estimators for the main- and total-effect SI converge to the analytical solution with a sufficient number of samples (i.e., \(10^4\) in the worst case). As expected, the sobol and lhs sampling schemes for the design matrix are clearly superior to the simple random scheme (srs), as the calculated main- and total-effect SI converge faster and with lower uncertainties; the sobol sampling scheme appears to be slightly better than lhs. Finally, comparing the estimators, the janon and jansen estimators show slightly better properties than the saltelli and sobol estimators. The relative merits of the estimators and sampling schemes illustrated for param1 can vary slightly from parameter to parameter. The convergence of the calculation, however, remains. This confirms the correct implementation of the SI calculation routines.
gsa-module Modules reference documentation¶
Samples Package¶
Routines to generate general-purpose designs of experiment, including simple random sampling (srs), latin hypercube (lhs), optimized latin hypercube (lhs-opt), and Sobol’ sequence (sobol).
gsa_module.samples.srs¶
gsa_module.samples.lhs¶
gsa_module.samples.lhs_opt¶
gsa_module.samples.sobol¶
Randomization of the quasi-MC samples can be achieved most easily by a random shift (also known as the Cranley-Patterson rotation).
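A sketch of the random shift applied to a Sobol’ sample (using SciPy’s generator as a stand-in):
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
points = qmc.Sobol(d=2, scramble=False).random(128)

# Cranley-Patterson rotation: one uniform shift per dimension, modulo 1
randomized = (points + rng.random(2)) % 1.0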