bss package
Subpackages
Submodules
bss.data module
Functions to load the feature matrix X, the response vector y, and covariance matrix sigma from data files.
-
bss.data.load_cor_file(path, delimiter=', ')
Load the correlation matrix from the file specified by path.
Parameters: path : str
The full path to the ‘cor’ file containing correlation values of pairs of SNPs.
delimiter : str, optional
The delimiter used in the CSV file.
Returns: ndarray
A DxD symmetric matrix of correlation values between pairs of SNPs
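As a rough illustration of the kind of file this function reads, the sketch below parses a delimiter-separated text file into a square matrix and checks that it is symmetric. The helper name `read_square_matrix`, the `.cor` suffix, and the file layout (one matrix row per line) are assumptions made for illustration; this is not the bss implementation.

```python
import os
import tempfile

import numpy as np


def read_square_matrix(path, delimiter=", "):
    """Hypothetical helper: parse a delimited text file into a square ndarray."""
    with open(path) as fh:
        rows = [
            [float(value) for value in line.strip().split(delimiter)]
            for line in fh
            if line.strip()  # skip blank lines
        ]
    matrix = np.array(rows)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError("expected a DxD matrix, got shape %s" % (matrix.shape,))
    return matrix


# Write a small 3x3 correlation matrix to a temporary file and read it back.
demo = np.array([[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]])
with tempfile.NamedTemporaryFile("w", suffix=".cor", delete=False) as tmp:
    for row in demo:
        tmp.write(", ".join(str(v) for v in row) + "\n")
cor = read_square_matrix(tmp.name)
os.remove(tmp.name)

# A correlation matrix should equal its own transpose.
is_symmetric = bool(np.allclose(cor, cor.T))
```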
-
bss.data.load_data(pattern)
Load the ‘main’ data and the correlation data for a given file pattern.
Parameters: pattern : str
A file pattern (with directory path), consumable by the glob module, for the xy files to process. For example: ‘/some/path/data/real0_yx_*.*’
Returns: tuple
A 3-tuple of numpy arrays:
X: A numpy feature matrix (mxn) of genotype values, for m phenotypes and n SNPs
Y: A numpy vector (mx1) of phenotype values - the response variable
cor: A numpy matrix (nxn) of correlation values for each pair of n SNPs
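To make the pattern argument concrete, the sketch below shows how the standard-library glob module expands a pattern of the form shown above. The file names are hypothetical and the files are empty; the sketch only shows which paths such a pattern selects, not how bss pairs xy files with correlation files.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Create a few empty, hypothetically named files to match against.
    for name in ("real0_yx_1.csv", "real0_yx_2.csv", "real1_yx_1.csv"):
        open(os.path.join(data_dir, name), "w").close()

    # Same shape as the example pattern: a directory plus a wildcard file name.
    pattern = os.path.join(data_dir, "real0_yx_*.*")

    # glob.glob returns paths in arbitrary order, so sort for reproducibility.
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))
```

Only the two `real0_yx_` files match; `real1_yx_1.csv` is excluded by the prefix.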
-
bss.data.load_xy_file(path, delimiter=', ')
Load the feature matrix X and response vector y from the file specified by path.
Parameters: path : str
The full path to the ‘xy’ file containing SNP expression data.
delimiter : str, optional
The delimiter used in the CSV file.
Returns: tuple
A 2-tuple of values: (an NxD feature matrix, an Nx1 response vector)
bss.mvn module
A multivariate normal module inspired by the scipy.stats._multivariate module.
-
class bss.mvn.Mvn(mean=None, cov=None, min_eigenval=None, jitter=None, check_finite=True)
A multivariate normal random variable.
Parameters: mean : ndarray, optional
The dx1 mean vector of the normal distribution. Assumed to be 0 if not specified.
cov : ndarray, optional
The dxd covariance matrix. Assumed to be the identity matrix if not specified.
min_eigenval : float, optional
The minimum eigenvalue of the covariance matrix we’re willing to accept. All eigenvalues below this threshold are set to this value. If None (the default), no eigenvalues are adjusted. A useful value is 0, which removes negative eigenvalues and so makes the covariance matrix positive semi-definite.
jitter : float, optional
A small amount of noise to add to the diagonals of the covariance matrix to make the covariance matrix invertible. If None (the default), then no jitter is applied.
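The two conditioning options can be sketched as follows. This is a minimal illustration, assuming the eigenvalues are clipped first and the jitter is added to the diagonal afterwards; the function name `condition_covariance` and the order of the two steps are assumptions, not taken from the bss source.

```python
import numpy as np


def condition_covariance(cov, min_eigenval=None, jitter=None):
    """Hypothetical helper: clip small eigenvalues, then add diagonal jitter."""
    cov = np.asarray(cov, dtype=float)
    if min_eigenval is not None:
        # eigh is appropriate for symmetric matrices.
        eigvals, eigvecs = np.linalg.eigh(cov)
        eigvals = np.maximum(eigvals, min_eigenval)
        # Rebuild the matrix as V diag(w) V^T with the clipped eigenvalues.
        cov = (eigvecs * eigvals) @ eigvecs.T
    if jitter is not None:
        cov = cov + jitter * np.eye(cov.shape[0])
    return cov


# A symmetric matrix with eigenvalues -1 and 3, so it is not a valid covariance.
cov = np.array([[1.0, 2.0], [2.0, 1.0]])
fixed = condition_covariance(cov, min_eigenval=0.0, jitter=1e-6)

# After clipping and jitter the Cholesky factorization succeeds.
chol = np.linalg.cholesky(fixed)
```

Clipping at 0 alone leaves a zero eigenvalue in this example, which is why the sketch also applies a small jitter before factorizing.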
Attributes
Methods
-
chol
ndarray: The Cholesky decomposition of the covariance matrix of this distribution. Computed after the covariance matrix is made positive definite inside the constructor.
-
correlate(x)
Transform a random variate x into a variate correlated according to this multivariate normal distribution and centered on this distribution’s mean. This is done by applying an affine transform to the given data vector or data matrix.
Parameters: x : ndarray
The dx1 vector that we wish to transform, or more generally, the dxN data matrix that we wish to transform
Returns: ndarray
The dx1 transformed vector, or more generally, the dxN transformed data matrix
Notes
Uncorrelated random variables normally distributed with mean 0 and variance 1,
    Z ~ N(0, I),
can be transformed to correlated random variables X with mean A and covariance Sigma,
    X ~ N(A, Sigma),
by selecting an affine transform
    X = A + B Z,
where
    B B’ = Sigma.
We choose B to be the Cholesky factor of Sigma, since it is already computed for this class.
For a detailed explanation of why this may be useful, see [MA10]
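The affine transform described in these notes can be sketched with plain NumPy. The mean, covariance, and sample count below are arbitrary illustrative values, and the code works on raw arrays rather than on an Mvn instance.

```python
import numpy as np

mean = np.array([[1.0], [-2.0]])              # A, a d x 1 column vector
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])    # target covariance Sigma

# Lower-triangular Cholesky factor B, so that B @ B.T == sigma.
B = np.linalg.cholesky(sigma)

# Z ~ N(0, I): a d x N matrix of uncorrelated standard normal draws.
rng = np.random.default_rng(0)
Z = rng.standard_normal((2, 50000))

# X = A + B Z; the mean column broadcasts across all N samples.
X = mean + B @ Z

# The sample statistics should be close to the requested mean and covariance.
sample_mean = X.mean(axis=1, keepdims=True)
sample_cov = np.cov(X)
```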
-
logpdf(x, precision_multiplier=1)
Calculate the log probability density function value of a given variate, optionally applying a multiplicative factor to the precision matrix (or equivalently, dividing the covariance matrix by a factor).
Parameters: x : ndarray
The vector for which we wish to calculate the log PDF value
precision_multiplier : float, optional
Optional multiplier applied to the precision matrix (the inverse of the covariance matrix); 1 by default.
Returns: float
The Log PDF value of the vector x
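For reference, the sketch below evaluates the standard multivariate normal log density with the covariance divided by the multiplier, as the description above states. The function name `scaled_logpdf` is hypothetical, and the code uses a generic determinant and linear solve rather than whatever factorization bss uses internally.

```python
import numpy as np


def scaled_logpdf(x, mean, cov, precision_multiplier=1.0):
    """Hypothetical helper: log N(x; mean, cov / precision_multiplier)."""
    x = np.asarray(x, dtype=float).ravel()
    mean = np.asarray(mean, dtype=float).ravel()
    d = x.shape[0]

    # Multiplying the precision by k is the same as dividing the covariance by k.
    scaled_cov = np.asarray(cov, dtype=float) / precision_multiplier

    diff = x - mean
    # slogdet returns (sign, log|det|); the sign is 1 for a positive definite matrix.
    _, logdet = np.linalg.slogdet(scaled_cov)
    # Quadratic form diff' inv(scaled_cov) diff, computed without an explicit inverse.
    quad = diff @ np.linalg.solve(scaled_cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)


# Standard bivariate normal evaluated at its mean: the log density is -log(2*pi).
value = scaled_logpdf([0.0, 0.0], [0.0, 0.0], np.eye(2))

# With the precision multiplied by 4 the density at the mean grows by a factor of 4.
value_k4 = scaled_logpdf([0.0, 0.0], [0.0, 0.0], np.eye(2), precision_multiplier=4.0)
```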
-
maha(x, precision_multiplier=1)
Calculate the Mahalanobis distance between a given vector x and the mean of this distribution, optionally applying a multiplicative factor to the precision matrix (or equivalently, dividing the covariance matrix by a factor).
Parameters: x : ndarray
The vector for which we wish to calculate the distance
precision_multiplier : float, optional
Optional multiplier applied to the precision matrix (the inverse of the covariance matrix); 1 by default.
Returns: float
A scalar distance value between x and the mean of this distribution
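A minimal sketch of the distance follows, assuming the conventional definition with a square root; the documentation above does not say whether bss returns the distance or its square. The function name `mahalanobis` is hypothetical.

```python
import numpy as np


def mahalanobis(x, mean, cov, precision_multiplier=1.0):
    """Hypothetical helper: sqrt((x - mean)' (k * inv(cov)) (x - mean))."""
    diff = np.asarray(x, dtype=float).ravel() - np.asarray(mean, dtype=float).ravel()
    # Scale the precision matrix (inverse covariance) by the multiplier.
    precision = precision_multiplier * np.linalg.inv(np.asarray(cov, dtype=float))
    return float(np.sqrt(diff @ precision @ diff))


# With an identity covariance the distance reduces to the Euclidean norm.
dist = mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2))

# Multiplying the precision by 4 doubles the distance.
scaled = mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2), precision_multiplier=4.0)
```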
-
rvs(precision_multiplier=1)
Generate a random variate for this normal distribution, optionally applying a multiplicative factor to the precision matrix (or equivalently, dividing the covariance matrix by a factor).
Parameters: precision_multiplier : float, optional
Optional multiplier applied to the precision matrix (the inverse of the covariance matrix); 1 by default.
Returns: ndarray
A single dx1 random variate from this normal distribution
-
whiten(x)
Transform the dx1 random variate x into a whitened random vector with identity covariance.
Parameters: x : ndarray
The dx1 vector that we wish to whiten, or more generally, the dxN data matrix that we wish to whiten
Returns: ndarray
The dx1 whitened vector, or more generally, the dxN whitened data matrix
Notes
For a detailed explanation of why this may be useful, see [MA10]
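Whitening is the inverse of the correlating transform: subtract the mean, then undo the Cholesky factor. The sketch below checks the round trip on raw arrays. It assumes whitening is done with the Cholesky factor, which the documentation above does not confirm, and it uses a general linear solve where a triangular solver would also work.

```python
import numpy as np

mean = np.array([[1.0], [-2.0]])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(sigma)        # L @ L.T == sigma

rng = np.random.default_rng(0)
z = rng.standard_normal((2, 5))      # d x N uncorrelated draws

# Correlate: x = mean + L z
x = mean + L @ z

# Whiten: z_back = inv(L) (x - mean), computed with a linear solve
# instead of forming the inverse explicitly.
z_back = np.linalg.solve(L, x - mean)
```

Up to floating-point error, `z_back` equals the original `z`, so the two transforms are inverses of each other.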
-