bss package

Subpackages

Submodules

bss.data module

Functions to load the feature matrix X, the response vector y, and covariance matrix sigma from data files.

bss.data.load_cor_file(path, delimiter=', ')

Load a correlation matrix from the file specified by path.

Parameters:

path : str

The full path to the ‘cor’ file containing correlation values of pairs of SNPs.

delimiter : str, optional

The delimiter used in the CSV file.

Returns:

ndarray

A DxD symmetric matrix of correlation values between pairs of SNPs
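As a rough illustration only, the sketch below parses a correlation matrix from delimited text. It assumes the ‘cor’ file is a plain D x D grid of numbers with one row per line, values separated by the delimiter, and no header; the actual file layout is not specified here. The helper name parse_cor_sketch is hypothetical and is not part of bss. It accepts any iterable of lines (an open file or io.StringIO), so it can be tried without a file on disk.

```python
import io

import numpy as np


def parse_cor_sketch(lines, delimiter=", "):
    """Hypothetical helper: parse a DxD correlation matrix from text lines.

    ASSUMPTION: one matrix row per non-empty line, values separated by
    `delimiter`, and no header row.
    """
    rows = [
        [float(value) for value in line.strip().split(delimiter)]
        for line in lines
        if line.strip()
    ]
    cor = np.array(rows, dtype=float)

    # A correlation matrix should be square and symmetric.
    if cor.ndim != 2 or cor.shape[0] != cor.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {cor.shape}")
    if not np.allclose(cor, cor.T):
        raise ValueError("correlation matrix is not symmetric")
    return cor


cor = parse_cor_sketch(io.StringIO("1.0, 0.3\n0.3, 1.0\n"))
print(cor.shape)  # (2, 2)
```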

bss.data.load_data(pattern)

Load the ‘main’ (xy) data and the correlation data for a given file pattern.

Parameters:

pattern : str

A file pattern (including the directory path) for the xy files to process, consumable by the glob module. For example: ‘/some/path/data/real0_yx_*.*’

Returns:

tuple

A 3-tuple of numpy arrays:

X: A numpy feature matrix (mxn) of genotype values, for m samples and n SNPs

Y: A numpy vector (mx1) of phenotype values - the response variable

cor: A numpy matrix (nxn) of correlation values for each pair of n SNPs
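The pattern argument uses standard glob module syntax. The sketch below exercises only glob (not bss) against a temporary directory with made-up file names, to show which files a pattern shaped like the documented example would select.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Made-up file names, for illustration only.
    for name in ["real0_yx_1.csv", "real0_yx_2.csv", "real1_yx_1.csv", "notes.txt"]:
        open(os.path.join(data_dir, name), "w").close()

    # Same shape as the documented example: '<dir>/real0_yx_*.*'
    pattern = os.path.join(data_dir, "real0_yx_*.*")
    matched = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matched)  # ['real0_yx_1.csv', 'real0_yx_2.csv']
```

With real data, the documented call would be X, Y, cor = bss.data.load_data('/some/path/data/real0_yx_*.*'), unpacking the 3-tuple described above.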

bss.data.load_xy_file(path, delimiter=', ')

Load the feature matrix X and response vector y from a given file specified by path.

Parameters:

path : str

The full path to the ‘xy’ file containing SNP expression data.

delimiter : str, optional

The delimiter used in the CSV file.

Returns:

tuple

A 2-tuple (X, y): an NxD feature matrix and an Nx1 response vector.
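A minimal sketch of splitting a loaded xy table into the documented (X, y) pair. The column layout of the ‘xy’ file is not stated here; the sketch assumes, hypothetically, that the response sits in the first column and the D features follow (the ‘yx’ in the example file name above hints at this, but it remains an assumption). split_xy_sketch is not a bss function.

```python
import numpy as np


def split_xy_sketch(table):
    """Hypothetical helper: split an N x (D+1) array into X (N x D) and y (N x 1).

    ASSUMPTION: column 0 holds the response, columns 1..D hold the features.
    """
    table = np.asarray(table, dtype=float)
    y = table[:, :1]   # slicing (not indexing) keeps y as an N x 1 column
    X = table[:, 1:]   # the remaining D columns form the feature matrix
    return X, y


X, y = split_xy_sketch([[0.5, 0, 1, 2],
                        [1.5, 2, 1, 0]])
print(X.shape, y.shape)  # (2, 3) (2, 1)
```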

bss.mvn module

A multivariate normal module inspired by the scipy.stats._multivariate module.

class bss.mvn.Mvn(mean=None, cov=None, min_eigenval=None, jitter=None, check_finite=True)

A multivariate normal random variable.

Parameters:

mean : ndarray, optional

The dx1 mean vector of the normal distribution. Defaults to the zero vector if not specified.

cov : ndarray, optional

The dxd covariance matrix. Defaults to the identity matrix if not specified.

min_eigenval : float, optional

The minimum eigenvalue of the covariance matrix we’re willing to accept. All eigenvalues below this threshold are set to this value. If None (the default), no eigenvalues are adjusted. A useful value is 0, which clips negative eigenvalues and makes the covariance matrix positive semi-definite.

jitter : float, optional

A small amount of noise to add to the diagonal of the covariance matrix to make it invertible. If None (the default), no jitter is applied.
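One plausible way to implement the min_eigenval and jitter adjustments is an eigendecomposition with eigenvalue clipping, followed by adding jitter to the diagonal. This is a sketch assuming that approach; the actual constructor may differ, and regularize_cov_sketch is a hypothetical name. Clipping at 0 alone gives a positive semi-definite matrix, which is why a positive jitter is added before the Cholesky factorization here.

```python
import numpy as np


def regularize_cov_sketch(cov, min_eigenval=None, jitter=None):
    """Hypothetical helper mirroring the documented min_eigenval/jitter options."""
    cov = np.asarray(cov, dtype=float)
    if min_eigenval is not None:
        # eigh is meant for symmetric matrices and returns real eigenvalues.
        eigvals, eigvecs = np.linalg.eigh(cov)
        eigvals = np.maximum(eigvals, min_eigenval)
        cov = eigvecs @ np.diag(eigvals) @ eigvecs.T
    if jitter is not None:
        # Adding jitter to the diagonal shifts every eigenvalue up by `jitter`.
        cov = cov + jitter * np.eye(cov.shape[0])
    return cov


bad = np.array([[1.0, 2.0],
                [2.0, 1.0]])  # eigenvalues 3 and -1: not a valid covariance
fixed = regularize_cov_sketch(bad, min_eigenval=0.0, jitter=1e-6)
chol = np.linalg.cholesky(fixed)  # succeeds here, whereas it fails on `bad`
```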

Attributes

chol : ndarray

The Cholesky decomposition of the covariance matrix of this distribution. Computed after the covariance matrix is made positive definite inside the constructor.

Methods

correlate(x)

Transform a random variate x into a variate that is correlated according to this multivariate normal distribution and centered on its mean. This is done by applying an affine transformation to the given data vector or data matrix.

Parameters:

x : ndarray

The dx1 vector to transform or, more generally, the dxN data matrix to transform.

Returns:

ndarray

The dx1 transformed vector, or more generally, the dxN transformed data matrix

Notes

Uncorrelated random variables that are normally distributed with mean 0 and variance 1,

$$Z \sim \mathcal{N}(0, I),$$

can be transformed into correlated random variables $X$ with mean $A$ and covariance $\Sigma$,

$$X \sim \mathcal{N}(A, \Sigma),$$

by choosing the affine transform

$$X = A + BZ, \quad \text{where } BB^\top = \Sigma.$$

We choose $B$ to be the Cholesky factor of $\Sigma$ (the chol attribute), since this class already computes it.

For a detailed explanation of why this may be useful, see [MA10].
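The affine transform in these notes can be sketched with NumPy as follows. It assumes $B$ is the lower-triangular factor returned by numpy.linalg.cholesky and that the mean is stored as a d x 1 column, so it broadcasts across the N columns of a data matrix. correlate_sketch is a hypothetical stand-in for Mvn.correlate, not its actual implementation.

```python
import numpy as np


def correlate_sketch(z, mean, chol):
    """Map standard-normal variates z (d x 1 or d x N) to X = A + B Z."""
    return mean + chol @ z


rng = np.random.default_rng(0)
mean = np.array([[1.0], [-2.0]])          # A, as a d x 1 column
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])            # Sigma
chol = np.linalg.cholesky(sigma)          # B, lower triangular, B @ B.T == Sigma

z = rng.standard_normal((2, 100_000))     # Z ~ N(0, I), one variate per column
x = correlate_sketch(z, mean, chol)       # columns are approximately N(A, Sigma)
```

The sample mean of the columns of x should be close to A, and np.cov(x) close to Sigma.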

logpdf(x, precision_multiplier=1)

Calculate the log probability density function (log PDF) value of a given variate, optionally applying a multiplicative factor to the precision matrix (or equivalently, dividing the covariance matrix by that factor).

Parameters:

x : ndarray

The vector for which we wish to calculate the log PDF value

precision_multiplier : float, optional

Optional multiplier applied to the precision matrix (the inverse of the covariance matrix); 1 by default.

Returns:

float

The Log PDF value of the vector x
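Under the description above, a precision multiplier $k$ turns the covariance into $\Sigma / k$, so the log density is $\log p(x) = -\tfrac{1}{2}\left[d \log 2\pi + \log|\Sigma| - d \log k + k\,(x - A)^\top \Sigma^{-1} (x - A)\right]$. The following sketch computes this through a Cholesky factor $B$ with $BB^\top = \Sigma$. It is an illustration under that assumption, not the bss implementation, and logpdf_sketch and its arguments are hypothetical.

```python
import math

import numpy as np


def logpdf_sketch(x, mean, chol, precision_multiplier=1.0):
    """Log density of N(mean, Sigma / k) at x, where Sigma = chol @ chol.T and k = precision_multiplier."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    mean = np.asarray(mean, dtype=float).reshape(-1, 1)
    d = x.shape[0]
    k = float(precision_multiplier)

    # Solve chol @ z = x - mean; then sum(z**2) is the squared Mahalanobis distance under Sigma.
    z = np.linalg.solve(chol, x - mean)
    maha_sq = float(np.sum(z ** 2))

    # log|Sigma| from the Cholesky factor: twice the sum of the log of its diagonal.
    log_det_sigma = 2.0 * float(np.sum(np.log(np.diag(chol))))

    return -0.5 * (d * math.log(2.0 * math.pi) + log_det_sigma - d * math.log(k) + k * maha_sq)


print(logpdf_sketch([0.0], [0.0], np.array([[1.0]])))  # -0.5 * log(2 * pi), about -0.919
```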

maha(x, precision_multiplier=1)

Calculate the Mahalanobis distance between a given vector x and the mean of this distribution, optionally applying a multiplicative factor to the precision matrix (or equivalently, dividing the covariance matrix by a factor).

Parameters:

x : ndarray

The vector for which we wish to calculate the distance

precision_multiplier : float, optional

Optional multiplier applied to the precision matrix (the inverse of the covariance matrix); 1 by default.

Returns:

float

A scalar distance value between x and the mean of this distribution
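A sketch of the distance computation, assuming the conventional non-squared Mahalanobis distance $\sqrt{k\,(x - A)^\top \Sigma^{-1} (x - A)}$ with precision multiplier $k$ (bss may instead return the squared value). Solving $Bz = x - A$ with the Cholesky factor $B$ turns the distance into $\sqrt{k}\,\lVert z \rVert$. The name maha_sketch is hypothetical.

```python
import math

import numpy as np


def maha_sketch(x, mean, chol, precision_multiplier=1.0):
    """Mahalanobis distance between x and mean under Sigma / k, with Sigma = chol @ chol.T.

    ASSUMPTION: returns the non-squared distance.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    mean = np.asarray(mean, dtype=float).reshape(-1, 1)
    # Whitening by the Cholesky factor turns the Mahalanobis distance into a Euclidean norm.
    z = np.linalg.solve(chol, x - mean)
    return math.sqrt(precision_multiplier) * float(np.linalg.norm(z))


chol = np.diag([2.0, 3.0])                      # Sigma = diag(4, 9)
print(maha_sketch([2.0, 3.0], [0.0, 0.0], chol))  # sqrt(2), about 1.414
```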

rvs(precision_multiplier=1)

Generate a random variate for this normal distribution, optionally applying a multiplicative factor to the precision matrix (or equivalently, dividing the covariance matrix by a factor).

Parameters:

precision_multiplier : float, optional

Optional multiplier applied to the precision matrix (the inverse of the covariance matrix); 1 by default.

Returns:

ndarray

A single dx1 random variate from this normal distribution.
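Because the Cholesky factor of $\Sigma / k$ is $B / \sqrt{k}$, a single draw can be sketched as $A + BZ / \sqrt{k}$ with $Z \sim \mathcal{N}(0, I)$. The function rvs_sketch and its rng parameter are hypothetical additions for illustration; the documented rvs takes only precision_multiplier.

```python
import numpy as np


def rvs_sketch(mean, chol, precision_multiplier=1.0, rng=None):
    """Draw one d x 1 variate from N(mean, Sigma / k), where Sigma = chol @ chol.T."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float).reshape(-1, 1)
    z = rng.standard_normal((mean.shape[0], 1))
    # Sigma / k has Cholesky factor chol / sqrt(k), so scale the correlated draw accordingly.
    return mean + (chol @ z) / np.sqrt(precision_multiplier)


rng = np.random.default_rng(42)
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
chol = np.linalg.cholesky(sigma)
draw = rvs_sketch([0.0, 0.0], chol, precision_multiplier=4.0, rng=rng)
print(draw.shape)  # (2, 1)
```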

whiten(x)

Transform the dx1 random variate x into a whitened random vector whose covariance is the identity matrix.

Parameters:

x : ndarray

The dx1 vector to whiten or, more generally, the dxN data matrix to whiten.

Returns:

ndarray

The dx1 whitened vector, or more generally, the dxN whitened data matrix

Notes

For a detailed explanation of why this may be useful, see [MA10].
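Whitening can be sketched as the inverse of the affine transform used by correlate: solve $BZ = X - A$ for $Z$, where $B$ is the Cholesky factor and $A$ the mean. This assumes whiten also subtracts the mean, which the description above does not state explicitly; whiten_sketch is a hypothetical name.

```python
import numpy as np


def whiten_sketch(x, mean, chol):
    """Map x (d x 1 or d x N) to Z with chol @ Z = x - mean.

    ASSUMPTION: the mean is removed before whitening.
    """
    mean = np.asarray(mean, dtype=float).reshape(-1, 1)
    # Solving the linear system avoids forming the inverse of chol explicitly.
    return np.linalg.solve(chol, np.asarray(x, dtype=float) - mean)


rng = np.random.default_rng(1)
mean = np.array([[1.0], [-2.0]])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
chol = np.linalg.cholesky(sigma)

x = mean + chol @ rng.standard_normal((2, 50_000))  # correlated data, d x N
z = whiten_sketch(x, mean, chol)                     # np.cov(z) should be close to the identity
```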

Module contents

bss.setup_logging(path='logging.yml')