cube — Uniform sampling from the unit hypercube¶
Functions for (super-uniform) sampling from the unit hypercube.
Sampling algorithms¶
Stratified sampling¶
-
diversipy.cube.stratify_conventional(num_strata, dimension)¶ Stratification of the unit hypercube.
This algorithm divides the hypercube into num_points subcells and draws a random uniform point from each cell. Thus, the result is stochastic, but more uniform than a random uniform sample. For further information see [McKay1979].
Parameters: - num_strata (int) – The number of strata to generate.
num_strata ** (1/dimension)must be integer. - dimension (int) – The dimension of the space.
Returns: strata – As the strata are axis-aligned boxes in this case, each tuple in the returned list contains the lower and upper corner of a stratum.
Return type: list of tuple
- num_strata (int) – The number of strata to generate.
-
diversipy.cube.stratify_generalized(num_strata, dimension, cuboid=None, detect_special_case=True, avoid_odd_numbers=True)¶ Generalized stratification of the unit hypercube.
The adjective “generalized” pertains to the fact that the number of strata can be chosen arbitrarily, which is not possible with
stratify_conventional(). It is guaranteed that all strata have volumevolume(cuboid) / num_strata, apart from rounding error.Parameters: - num_strata (int) – The number of strata to generate. An arbitrary number of points is possible for this algorithm.
- dimension (int) – The dimension of the search space.
- cuboid (tuple of list, optional) – Optionally specify the hypercube to be sampled in. If None, the unit hypercube is chosen.
- detect_special_case (bool, optional) – If True,
num_points ** (1/dimension)is integer, and we are sampling the unit cube, use the original stratified sampling instratify_conventional(). - avoid_odd_numbers (bool, optional) – If this value is True, splits are chosen so that the resulting numbers are even, whenever possible. E.g., if a stratum with six points is splitted, it is not split into three and three, but two and four points. For more information on this option, see [Wessing2018].
Returns: strata – As the strata are axis-aligned boxes in this case, each tuple in the returned list contains the lower and upper corner of a stratum.
Return type: list of tuple
References
[Wessing2018] (1, 2) Wessing, Simon (2018). Experimental Analysis of a Generalized Stratified Sampling Algorithm for Hypercubes. arXiv eprint 1705.03809. https://arxiv.org/abs/1705.03809
-
diversipy.cube.sample_from_strata(strata, bates_param=1, latin='none', matching_init='approx', full_output=False)¶ Stratified sampling with given strata.
Parameters: - strata (list of tuple) – The strata to be sampled. Each tuple in the list must contain the lower and upper corner of a stratum.
- bates_param (int, optional) – Each coordinate of a point sampled in a stratum is determined as the mean of this number of independent random uniform variables. Thus, the coordinates follow the Bates distribution.
- latin (str, optional) – Indicates if and how the point set should be latinized. “none” is the fastest option, sampling the points in linear time without latinization. “approx” uses a heuristic with runtime \(O(nN \log N)\) that may produce a slightly imperfect latinization. “matching” produces an error-free latinization using a maximum cardinality matching algorithm with runtime \(O(nN^2)\). The union of the strata must be the unit hypercube for the second and third option to work.
- matching_init (str, optional) – This is an additional option to the setting
latin == "matching". Firstly, the approximative latinization fromlatin == "approx"can be used as initialization for the matching algorithm, with the matching acting as a repair method if necessary. This is the recommended choice due to its favorable runtime behavior. The other two options use a problem-agnostic greedy initialization. “greedy-rand” randomly shuffles the edges in the bipartite graph data structure. This order indirectly influences the distribution of the point sample, via the deterministic matching algorithm. “greedy-det” is the deterministic variant without extras, which may produce patterns due to the deterministic nature of the algorithm. - full_output (bool, optional) – Indicates if the indices of points with latin hypercube violations are returned in case of latinized sampling.
Returns: - points (numpy array) – The sampled points, in corresponding order to strata.
- error_indices (set) – Indices of points with violations of the latin hypercube property
in some dimension. Can only be non-empty for
latin == "approx".
-
diversipy.cube.strata_from_points(points, cuboid=None)¶ Partitions the cuboid so that each point has its own hyperbox.
This partitioning is stochastic (ties are broken randomly). The obtained strata will have different volumes. This function can be used to calculate an upper bound for the covering radius of arbitrary point sets via
covering_radius_upper_bound(). The idea for this approach was introduced in [Wessing2018].Parameters: - points (sequence of sequence) – The sampled points.
- cuboid (tuple of list, optional) – Optionally specify the outer bounds of the partitioning. If None, the unit hypercube is chosen.
Returns: strata – As the strata are axis-aligned boxes in this case, each tuple in the returned list contains the lower and upper corner of a stratum. The order corresponds to the order of points.
Return type: list of tuple
Designs¶
-
diversipy.cube.latin_design(num_points, dimension)¶ Generate a random latin hypercube design matrix.
Latin hypercube designs sometimes give an advantage over random uniform samples due to their perfect uniformity of one-dimensional projections. For further information see [McKay1979]. This algorithm has linear run time.
Parameters: - num_points (int) – The number of points to generate.
- dimension (int) – The dimension of the space.
Returns: design – Matrix with integers corresponding to the bins of a virtual grid. Each column consists of a permutation of {0, …, num_points - 1}.
Return type: (num_points, dimension) numpy array
References
[McKay1979] (1, 2, 3) McKay, M.D.; Beckman, R.J.; Conover, W.J. (May 1979). A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code. Technometrics 21(2): 239-245.
-
diversipy.cube.improved_latin_design(num_points, dimension, num_candidates=100, target_value=None, dist_args={'max_dist': 1, 'norm': 1})¶ Generate an ‘improved’ latin hypercube design matrix.
This implementation uses an algorithm with quadratic run time. It is a greedy construction heuristic starting with a randomly chosen point. In each iteration, a number of random candidates is evaluated by a criterion that considers a candidate’s distance to the previously chosen points. The best point according to this criterion is included in the LHD. The concept originally stems from [Beachkofski2002]. The algorithm implemented here was proposed in [Wessing2015].
Parameters: - num_points (int) – The number of points to generate.
- dimension (int) – The dimension of the space.
- num_candidates (int, optional) – The number of random candidates considered for every point to be added to the LHD.
- target_value (float, optional) – The distance a candidate should ideally have to the already chosen points of the LHD.
- dist_args (dict, optional) – Defines the distance measure. Default is Manhattan distance on a torus.
Returns: design – Matrix with integers corresponding to the bins of a virtual grid. Each column consists of a permutation of {0, …, num_points - 1}.
Return type: (num_points, dimension) numpy array
References
[Beachkofski2002] Beachkofski, B.; Grandhi, R. (2002). Improved Distributed Hypercube Sampling. American Institute of Aeronautics and Astronautics Paper 1274.
-
diversipy.cube.is_latin(design)¶ Check if design matrix is latin, i.e. each column is a random permutation of all integers between 0 and n-1.
-
diversipy.cube.rank1_design(num_points, dimension, g=None)¶ Design matrix for a rank-1 lattice.
This algorithm is deterministic and has linear run time.
Parameters: - num_points (int) – The number of points to generate.
- dimension (int) – The dimension of the space.
- g (array_like, optional) – Generator vector of length dimension.
Returns: design – Matrix with integers corresponding to the bins of a virtual grid.
Return type: (num_points, dimension) numpy array
Other¶
-
diversipy.cube.sample_halton(num_points, dimension, skip=20)¶ Generate a Halton point set.
Low discrepency quasi-random sequence using the Van der Corput sequence with the first dimension prime numbers as base.
Parameters: - num_points (int) – The number of points to generate.
- dimension (int) – The dimension of the space.
- skip (int, optional) – The first skip points of the sequence will be left out.
Returns: points
Return type: (num_points, dimension) numpy array
-
diversipy.cube.sample_maximin(num_points, dimension, num_steps=None, initial_points=None, existing_points=None, use_reflection_edge_correction=False, dist_args={'max_dist': 1, 'norm': 1}, callback=None)¶ Maximize the minimal distance in the unit hypercube with extensions.
This algorithm carries out a user-specified number of iterations to maximize the minimal distance of a point in the set to 1) other points in the set, 2) existing (fixed) points, and 3) the boundary of the hypercube. Details can be found in [Wessing2015].
Parameters: - num_points (int) – The number of points to generate.
- dimension (int) – The dimension of the space.
- num_steps (int, optional) – The number of iterations to carry out. Default is
100 * num_points. - initial_points (array_like, optional) – The point set to improve (if None, a sample is drawn with
stratified_sampling()). - existing_points (array_like, optional) – Points that cannot be modified anymore, but should be considered in the distance computations.
- use_reflection_edge_correction (bool, optional) – If True, selection pressure in boundary regions will be increased by considering additional distances to virtual points, which are created by mirroring the real points at the boundary.
- dist_args (dict, optional) – Arguments for the distance calculation. Default is L1 distance on a torus.
- callback (callable, optional) – If provided, it is called in each iteration with the current point set as argument for monitoring progress.
Returns: points
Return type: (num_points, dimension) numpy array
References
[Wessing2015] (1, 2) Wessing, Simon (2015). Two-stage Methods for Multimodal Optimization. PhD Thesis, Technische Universität Dortmund. http://hdl.handle.net/2003/34148
-
diversipy.cube.sample_k_means(num_points, dimension, num_steps=None, initial_points=None, dist_args={}, callback=None)¶ MacQueen’s method.
In its default setup, this algorithm converges to a centroidal Voronoi tesselation of the unit hypercube. Further information is given in [MacQueen1967].
Parameters: - num_points (int) – The number of points to generate.
- dimension (int) – The dimension of the space.
- num_steps (int, optional) – The number of iterations to carry out. Default is
100 * num_points. - initial_points (array_like, optional) – The point set to improve (if None, a sample is drawn with
stratified_sampling()). - dist_args (dict, optional) – Arguments for the distance calculation.
- callback (callable, optional) – If provided, it is called in each iteration with the current point set as argument for monitoring progress.
Returns: cluster_centers
Return type: (num_points, dimension) numpy array
References
[MacQueen1967] MacQueen, J. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281–297, University of California Press, Berkeley, Calif., 1967. http://projecteuclid.org/euclid.bsmsp/1200512992.
-
diversipy.cube.grid(n_levels, dimension, sukharev=False)¶ Create conventional grid in the unit hypercube.
Also related to full factorial designs.
Parameters: - n_levels (int) – The number of levels in each dimension.
- dimension (int) – The dimension of the space.
- sukharev (bool, optional) – Switch for creating a Sukharev grid where the points are located at the centroids of the subcells, e.g. [0.25, 0.75] instead of [0, 1].
Returns: points
Return type: (n_levels ** dimension, dimension) numpy array
Helper functions¶
-
diversipy.cube.unitcube(dimension)¶ Shortcut to generate a tuple of bounds of the unit hypercube.
-
diversipy.cube.design_to_unitcube(design, transform='perturbed')¶ Transform a design matrix into samples in the unit hypercube.
Parameters: - design (2d-array (n_points, dimension)) – A design matrix
- transform (str) – “centered”: Each point is placed at the centroid of its cell. “spread”: Maximize the minimal distance between points. “perturbed”: Apply individual perturbations so that each point is uniformly distributed in its cell. This variant was proposed in [McKay1979]. “shifted”: Apply a joint perturbation so that each point is uniformly distributed in its cell. This variant was proposed in [Cranley1976].
Returns: Samples in the unit hypercube.
Return type: 2d-array (n_points, dimension)
References
[Cranley1976] R. Cranley; T. N. L. Patterson (1976). Randomization of Number Theoretic Methods for Multiple Integration. SIAM Journal on Numerical Analysis, vol. 13, no. 6, pp. 904-914.
-
diversipy.cube.is_latin(design) Check if design matrix is latin, i.e. each column is a random permutation of all integers between 0 and n-1.