eggplant package
eggplant.methods module
- class eggplant.methods.PoissonDiscSampler(crd: ndarray, min_dist: float, seed: Optional[int] = None)[source]
Bases:
object
Poisson Disc Sampler Designed according to the principles outlined in Bridson [Bri07] but for d=2.
- add_k_in_annulus(point: Union[ndarray, Tuple[float, float]], k: int = 5) ndarray [source]
adds k points randomly in annulus around point
- coord_to_cell(point: Union[ndarray, Tuple[float, float]]) Tuple[int, int] [source]
helper function, transforms coordinates to cell id (in grid)
- Parameters
point¶ (Union[np.ndarray, Tuple[float, float]]) – coordinates of point to get cell id for
- Returns
a tuple (i,j) where i is the grid row index and j is the column index.
- Return type
Tuple[int,int]
- get_neighbors(idx: Tuple[int, int]) List[Tuple[int, int]] [source]
get neighbors in grid
Note: includes self.
- Parameters
idx¶ (Tuple[int,int]) – index of cell to get neighbors of
- Returns
list of neighbor indices
- Return type
List[Tuple[int,int]]
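The grid helpers documented above can be illustrated with a small stand-alone sketch. It is not eggplant's implementation: the function names mirror the documented methods, but the `origin`, `cell_size`, `n_rows`, `n_cols`, `reach`, and `rng` arguments are assumptions made for illustration, as is the choice to map y to the row index and x to the column index. The cell size `min_dist / sqrt(2)` follows Bridson's construction for d=2.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def coord_to_cell(point: Point, origin: Point, cell_size: float) -> Cell:
    """Map (x, y) coordinates to a (row, column) grid index."""
    x, y = point
    row = int((y - origin[1]) // cell_size)  # assumption: y selects the row
    col = int((x - origin[0]) // cell_size)  # assumption: x selects the column
    return row, col


def get_neighbors(idx: Cell, n_rows: int, n_cols: int, reach: int = 1) -> List[Cell]:
    """Return every in-bounds cell within `reach` of idx, including idx itself."""
    i, j = idx
    return [
        (r, c)
        for r in range(max(0, i - reach), min(n_rows, i + reach + 1))
        for c in range(max(0, j - reach), min(n_cols, j + reach + 1))
    ]


def sample_k_in_annulus(
    point: Point, min_dist: float, k: int = 5, rng: Optional[random.Random] = None
) -> List[Point]:
    """Draw k candidates whose distance to point lies in [min_dist, 2 * min_dist]."""
    rng = rng or random.Random()
    candidates = []
    for _ in range(k):
        # The radius is drawn uniformly, which is simple but not area-uniform.
        radius = rng.uniform(min_dist, 2 * min_dist)
        angle = rng.uniform(0.0, 2 * math.pi)
        candidates.append(
            (point[0] + radius * math.cos(angle), point[1] + radius * math.sin(angle))
        )
    return candidates


min_dist = 1.0
cell_size = min_dist / math.sqrt(2)  # Bridson's cell size for d=2
cell = coord_to_cell((2.5, 0.5), origin=(0.0, 0.0), cell_size=cell_size)
```

With this cell size each cell can hold at most one accepted sample, so a distance check only needs to inspect the cells returned by the neighbor lookup.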
- eggplant.methods.estimate_n_landmarks(adatas: Union[AnnData, List[AnnData], Dict[str, AnnData]], n_max_lmks: int = 50, n_min_lmks: Optional[int] = 1, n_evals: int = 10, n_reps: int = 3, feature: Optional[str] = None, layer: Optional[str] = None, device: Literal['cpu', 'gpu'] = 'cpu', n_epochs: int = 1000, learning_rate: float = 0.01, subsample: Optional[Union[float, int]] = None, verbose: bool = False, spatial_key: str = 'spatial', max_cg_iterations: int = 1000, tail_length: int = 200, seed: int = 1, spread_distance: Optional[float] = None, diameter_multiplier: float = 10) Tuple[ndarray, Union[Dict[str, List[float]], List[float]], Optional[Union[List[float], Dict[str, float]]]] [source]
Estimate how landmark numbers influence outcome
- Parameters
adatas¶ (Union[ad.AnnData, List[ad.AnnData], Dict[str, ad.AnnData]]) – Single AnnData file or list or dictionary with AnnDatas to be analyzed.
n_max_lmks¶ (int) – max number of landmarks to include in the analysis, defaults to 50.
n_evals¶ (int) – number of evaluations. The number of landmarks tested will be equally spaced in the interval [1,n_max_lmks], defaults to 10.
layer¶ (Optional[str]) – which layer to use
device¶ (Literal["cpu", "gpu"]) – which device to perform computations on, defaults to “cpu”
n_epochs¶ (int) – number of epochs to use when learning the relationship between landmark distance and feature values, defaults to 1000.
learning_rate¶ (float) – learning rate to use in optimization, defaults to 0.01.
subsample¶ (Optional[Union[float, int]]) – whether to subsample the data or not. If a value less than 1 is given, it is interpreted as a fraction of the total number of observations; if larger than 1, as the absolute number of observations to keep. If exactly 1 or None, no subsampling will occur. Note, landmarks are selected before subsampling. Defaults to None.
verbose¶ (bool) – set to True to use verbose mode, defaults to False.
spatial_key¶ (str) – key to use to extract spatial coordinates from the obsm attribute. Defaults to “spatial”.
max_cg_iterations¶ (int) – The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance (from GPyTorch documentation), defaults to 1000.
tail_length¶ (int) – the last tail_length observations will be used to compute an average MLL value. If n_epochs is less than tail_length, all epochs will be used instead. Defaults to 200.
seed¶ (int) – value of random seed, defaults to 1.
spread_distance¶ (Optional[float]) – distance between points in Poisson Disc Sampling. Equivalent to min_dist.
diameter_multiplier¶ (float) – applicable to assays where the
- Returns
A tuple with a vector listing the number of landmarks used in each evaluation as first element and as second the corresponding average MLL values.
- Return type
Tuple[np.ndarray, Union[Dict[str, List[float]], List[float]]]
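Two of the parameters above, `n_evals` and `tail_length`, can be made concrete with a short sketch. The helper names `landmark_grid` and `tail_average` are hypothetical and not part of eggplant; rounding the equally spaced values to integers is an assumption.

```python
from typing import List, Sequence


def landmark_grid(n_min: int, n_max: int, n_evals: int) -> List[int]:
    """Equally spaced landmark counts in [n_min, n_max], rounded to integers."""
    if n_evals < 2:
        return [n_max]
    step = (n_max - n_min) / (n_evals - 1)
    values = {int(round(n_min + i * step)) for i in range(n_evals)}
    return sorted(values)  # duplicates collapse if the interval is narrow


def tail_average(mll_values: Sequence[float], tail_length: int) -> float:
    """Average the last tail_length values, or all of them if there are fewer."""
    tail = mll_values[-tail_length:] if len(mll_values) > tail_length else mll_values
    return sum(tail) / len(tail)


counts = landmark_grid(n_min=1, n_max=50, n_evals=10)
average = tail_average([4.0, 3.0, 2.0, 1.0], tail_length=2)
```

Averaging over the tail rather than taking the final value reduces the influence of epoch-to-epoch noise on the reported MLL.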
- eggplant.methods.fa_transfer_to_reference(adatas: Union[AnnData, List[AnnData], Dict[str, AnnData]], reference: Reference, variance_threshold: float = 0.3, n_components: Optional[int] = None, use_highly_variable: bool = False, layer: Optional[str] = None, device: Literal['cpu', 'gpu', 'cuda'] = 'cpu', n_epochs: int = 1000, learning_rate: float = 0.01, subsample: Optional[Union[float, int]] = None, verbose: bool = False, return_models: bool = False, return_losses: bool = True, max_cg_iterations: int = 1000, meta_key: str = 'meta', inference_method: Literal['exact', 'variational'] = 'exact', **kwargs) Dict[str, Union[List[Union[GPModelExact, GPModelApprox]], List[ndarray]]] [source]
fast approximate transfer of observed data to a reference
similar to transfer_to_reference, but designed for fast approximate transfer of the full set of features. To speed up the transfer process we project the data into a low-dimensional space, transfer this representation, and then reconstruct the original data. This significantly reduces the number of features that need to be transferred to the reference, but comes at the cost of only an approximate representation, whose quality depends on whichever of the variance_threshold or n_components parameters is specified.
- Parameters
adatas¶ (Union[ad.AnnData, List[ad.AnnData], Dict[str, ad.AnnData]]) – AnnData objects holding data to transfer
reference¶ (m.Reference) – reference to transfer data to
variance_threshold¶ (float) – fraction of variance that principal components should explain
n_components¶ (Optional[int]) – use instead of variance_threshold. If specified, exactly n_components will be used.
use_highly_variable¶ (bool) – only use highly_variable_genes to compute the principal components. Default is False.
layer¶ (Optional[str]) – which layer to extract data from, defaults to raw
device¶ (Literal["cpu","gpu","cuda"]) – device to use for computations, defaults to “cpu”
n_epochs¶ (int) – number of epochs to use, defaults to 1000
learning_rate¶ (float) – learning rate, defaults to 0.01
subsample¶ (Optional[Union[float, int]]) – if <= 1 then interpreted as the fraction of observations to keep. If > 1, interpreted as the number of observations to keep in subsampling, defaults to None (no subsampling)
verbose¶ (bool) – set to True to use verbose mode, defaults to False
return_models¶ (bool) – set to True to return fitted models, defaults to False
return_losses¶ (bool) – return loss history of each model, defaults to True
max_cg_iterations¶ (int) – The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance (from GPyTorch documentation), defaults to 1000.
meta_key¶ (str) – key in uns slot that holds additional meta info
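The interplay between `variance_threshold` and `n_components` can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper `select_n_components` is hypothetical, and it assumes the threshold is applied to the cumulative explained-variance ratio of the components, sorted in decreasing order.

```python
from typing import Optional, Sequence


def select_n_components(
    explained_variance_ratio: Sequence[float],
    variance_threshold: float = 0.3,
    n_components: Optional[int] = None,
) -> int:
    """Pick the number of components to transfer."""
    if n_components is not None:
        # An explicit n_components takes precedence over the threshold.
        return n_components
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative >= variance_threshold:
            return k
    # Threshold never reached: keep every available component.
    return len(explained_variance_ratio)


ratios = [0.20, 0.15, 0.10, 0.05]
n_keep = select_n_components(ratios, variance_threshold=0.3)
```

A lower threshold keeps fewer components and therefore transfers faster, at the cost of a coarser reconstruction.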
- eggplant.methods.fit(model: Union[GPModelExact, GPModelApprox], n_epochs: int, optimizer: Optional[Optimizer] = None, fast_computation: bool = True, learning_rate: float = 0.01, verbose: bool = False, seed: int = 0, batch_size: Optional[int] = None, progress_message: Optional[str] = None, **kwargs) None [source]
fit GP Model
- Parameters
model¶ (m.GPModel) – Model to fit
n_epochs¶ (int) – number of epochs
optimizer¶ (Optional[t.optim.Optimizer]) – optimizer to use during fitting, defaults to Adam
fast_computation¶ (bool) – whether to use fast approximations to functions, defaults to True
learning_rate¶ (float) – learning rate, defaults to 0.01
verbose¶ (bool) – set to True for verbose mode (prints progress), defaults to False
seed¶ (int) – random seed, defaults to 0
batch_size¶ (int) – Batch size. Defaults to None.
progress_message¶ (str) – message to include in progress bar
- eggplant.methods.landmark_lower_bound(n_lmks: ndarray, losses: Union[List[ndarray], ndarray, Dict[str, ndarray]], kneedle_s_param: float = 1) float [source]
automatic identification of lower bound based on result from estimate_n_landmarks().
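The idea of locating a lower bound on a loss curve can be sketched with a simplified knee heuristic: normalize the curve and pick the point with the largest gap below the chord joining its end points. This is not the Kneedle algorithm that the `kneedle_s_param` argument refers to and it has no sensitivity parameter; the helper name `knee_point` is hypothetical, and the sketch assumes a decreasing, roughly convex loss curve.

```python
from typing import Sequence


def knee_point(n_lmks: Sequence[float], losses: Sequence[float]) -> float:
    """Return the landmark count where the loss curve bends the most."""
    x_lo, x_hi = n_lmks[0], n_lmks[-1]
    y_lo, y_hi = min(losses), max(losses)
    # Rescale both axes to [0, 1] so that the gap is unit independent.
    xs = [(x - x_lo) / (x_hi - x_lo) for x in n_lmks]
    ys = [(y - y_lo) / (y_hi - y_lo) for y in losses]
    # Vertical gap between the end-to-end chord and the curve at each point.
    gaps = [ys[0] + (ys[-1] - ys[0]) * x - y for x, y in zip(xs, ys)]
    best = max(range(len(gaps)), key=gaps.__getitem__)
    return n_lmks[best]


lower_bound = knee_point([1, 2, 3, 4, 5, 6], [10.0, 4.0, 2.0, 1.5, 1.2, 1.0])
```

Beyond the returned value, adding landmarks yields diminishing improvements in the loss for this toy curve.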
- eggplant.methods.transfer_to_reference(adatas: Union[AnnData, List[AnnData], Dict[str, AnnData]], features: Union[str, List[str]], reference: Reference, layer: Optional[str] = None, device: Literal['cpu', 'gpu', 'cuda'] = 'cpu', n_epochs: int = 1000, learning_rate: float = 0.01, subsample: Optional[Union[float, int]] = None, verbose: bool = False, return_models: bool = False, return_losses: bool = True, max_cg_iterations: int = 1000, meta_key: str = 'meta', inference_method: Union[Literal['exact', 'variational'], List[Literal['exact', 'variational']], Dict[str, Literal['exact', 'variational']]] = 'exact', n_inducing_points: Optional[int] = None, batch_size: Optional[int] = None, **kwargs) Dict[str, Union[List[Union[GPModelExact, GPModelApprox]], List[ndarray]]] [source]
transfer observed data to a reference
- Parameters
adatas¶ (Union[ad.AnnData, List[ad.AnnData], Dict[str, ad.AnnData]]) – AnnData objects holding data to transfer
features¶ (Union[str, List[str]]) – name of feature(s) to transfer
reference¶ (m.Reference) – reference to transfer data to
layer¶ (Optional[str]) – which layer to extract data from, defaults to raw
device¶ (Literal["cpu","gpu","cuda"]) – device to use for computations, defaults to “cpu”
n_epochs¶ (int) – number of epochs to use, defaults to 1000
learning_rate¶ (float) – learning rate, defaults to 0.01
subsample¶ (Optional[Union[float, int]]) – if <= 1 then interpreted as the fraction of observations to keep. If > 1, interpreted as the number of observations to keep in subsampling, defaults to None (no subsampling)
verbose¶ (bool) – set to True to use verbose mode, defaults to False
return_models¶ (bool) – set to True to return fitted models, defaults to False
return_losses¶ (bool) – return loss history of each model, defaults to True
max_cg_iterations¶ (int) – The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance (from GPyTorch documentation), defaults to 1000.
meta_key¶ (str) – key in uns slot that holds additional meta info
inference_method¶ (Union[Literal["exact", "variational"], List[Literal["exact", "variational"]], Dict[str, Literal["exact", "variational"]]]) – which inference method to use, the options are “exact” and “variational” - use “variational” for large data sets. If a single string is given then this method is applied to all objects in the data set, otherwise a list or dict specifying the inference method for each object can be given. Defaults to “exact”.
n_inducing_points¶ (int) – number of inducing points to use in variational inference. Defaults to None, which uses 10% of the observations in the observed data if n_obs < 1e5, otherwise 10000.
batch_size¶ (int) – batch_size to use in approximate (variational) inference. Must be less than or equal to the number of observations in a sample.
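The documented rules for `subsample` and for the default of `n_inducing_points` can be written out as a short sketch. The helper names are hypothetical, and the rounding and clamping choices are assumptions rather than eggplant's behavior.

```python
from typing import Optional, Union


def resolve_subsample(n_obs: int, subsample: Optional[Union[float, int]]) -> int:
    """Translate the subsample argument into a number of observations to keep."""
    if subsample is None:
        return n_obs  # no subsampling
    if subsample <= 0:
        raise ValueError("subsample must be positive")
    if subsample <= 1:
        # Values in (0, 1] are read as a fraction of the observations.
        return max(1, int(round(n_obs * subsample)))
    # Values above 1 are read as an absolute count, capped at n_obs (assumption).
    return min(n_obs, int(subsample))


def default_n_inducing_points(n_obs: int) -> int:
    """10% of the observations below 1e5 observations, otherwise 10000."""
    return n_obs // 10 if n_obs < 100_000 else 10_000


n_keep = resolve_subsample(1000, 0.25)
n_inducing = default_n_inducing_points(2000)
```

Note that under this reading a value of exactly 1 keeps every observation, since it is treated as a fraction.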
eggplant.models module
- class eggplant.models.BaseGP(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu')[source]
Bases:
object
BaseModel for GP Regression
should be combined with one of gpytorch.models models, e.g., ApproximateGP or ExactGP
- __init__(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu') None [source]
Constructor method
- Parameters
landmark_distances¶ (Union[t.Tensor, pd.DataFrame, np.ndarray]) – n_obs x n_landmarks array with distance to each landmark for every observation
feature_values¶ (t.Tensor) – n_obs x n_feature array with feature values for each observation
mean_fun¶ (gp.means.Mean, optional) – mean function
kernel_fun¶ (gp.kernels.Kernel, optional) – Kernel to use in covariance function
device¶ (Literal["cpu","gpu"]) – device to execute operations on, defaults to “cpu”
- property loss_history: List[float]
Loss history record
- class eggplant.models.GPModelApprox(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, inducing_points: Union[Tensor, DataFrame, ndarray], landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu', learn_inducing_points: bool = True)[source]
Bases:
BaseGP, ApproximateGP
- __init__(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, inducing_points: Union[Tensor, DataFrame, ndarray], landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu', learn_inducing_points: bool = True) None [source]
Constructor method for approximate (variational) inference using inducing points
- Parameters
landmark_distances¶ (Union[t.Tensor, pd.DataFrame, np.ndarray]) – n_obs x n_landmarks array with distance to each landmark for every observation
feature_values¶ (t.Tensor) – n_obs x n_feature array with feature values for each observation
inducing_points¶ (t.Tensor) – points to use as inducing points, if learn_inducing_points = True these act as initialization of inducing_points.
likelihood¶ (gp.likelihoods.Likelihood, optional) – likelihood function
mean_fun¶ (gp.means.Mean, optional) – mean function
kernel_fun¶ (gp.kernels.Kernel, optional) – Kernel to use in covariance function
device¶ (Literal["cpu","gpu"]) – device to execute operations on, defaults to “cpu”
learn_inducing_points¶ (bool) – whether or not to treat inducing points as parameters to be learnt. Default is True.
- training: bool
- class eggplant.models.GPModelExact(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu')[source]
Bases:
BaseGP, ExactGP
- __init__(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu') None [source]
Constructor method for exact inference
- Parameters
landmark_distances¶ (Union[t.Tensor, pd.DataFrame, np.ndarray]) – n_obs x n_landmarks array with distance to each landmark for every observation
feature_values¶ (t.Tensor) – n_obs x n_feature array with feature values for each observation
likelihood¶ (gp.likelihoods.Likelihood, optional) – likelihood function
mean_fun¶ (gp.means.Mean, optional) – mean function
kernel_fun¶ (gp.kernels.Kernel, optional) – Kernel to use in covariance function
device¶ (Literal["cpu","gpu"]) – device to execute operations on, defaults to “cpu”
- training: bool
- class eggplant.models.Reference(domain: Union[tensor, ndarray], landmarks: Union[tensor, ndarray, DataFrame], meta: Optional[Union[DataFrame, dict]] = None)[source]
Bases:
object
Reference Container
- __init__(domain: Union[tensor, ndarray], landmarks: Union[tensor, ndarray, DataFrame], meta: Optional[Union[DataFrame, dict]] = None) None [source]
Constructor function
- composite_representation(by: str = 'feature')[source]
produce composite representation
- Parameters
by¶ (str) – consensus representation with respect to this meta data feature, defaults to "feature"
- plot(models: Optional[Union[List[str], str]] = None, *args, **kwargs) None [source]
quick plot function
eggplant.plot module
- class eggplant.plot.ColorMapper(cmap: Dict[T, str])[source]
Bases:
object
helper class for colormaps
makes it easier to get color values for arrays and lists.
- eggplant.plot.distplot_transfer(ref: Reference, inside: Dict[str, str], outside: Optional[Dict[str, str]] = None, n_cols: Optional[int] = None, n_rows: Optional[int] = None, side_size: float = 4.0, swarm_marker_style: Optional[Dict[str, Any]] = None, mean_marker_style: Optional[Dict[str, Any]] = None, display_grid: bool = True, title_fontsize: float = 25, label_fontsize: float = 20, ticks_fontsize: float = 15, return_figures: bool = True) Optional[Tuple[Figure, Axes]] [source]
Swarmplot-like visualization of enrichment
- Parameters
ref¶ (m.Reference) – Reference holding transferred data
outside¶ (Optional[Dict[str, str]]) – attribute to compare features within. If None all inside features will be compared together.
inside¶ (Dict[str, str]) – feature to compare within outer attribute
n_cols¶ (Optional[int]) – number of columns, defaults to None
n_rows¶ (Optional[int]) – number of rows, defaults to None
side_size¶ (float) – size of each outer panel, defaults to 4
swarm_marker_style¶ (Optional[Dict[str, Any]]) – data marker style, defaults to None
mean_marker_style¶ (Optional[Dict[str, Any]]) – marker style of mean indicator, defaults to None
display_grid¶ (bool) – set to True if grid shall be displayed in background, defaults to True
title_fontsize¶ (float) – fontsize of title, defaults to 25
label_fontsize¶ (float) – fontsize of x- and y-labels, defaults to 20
ticks_fontsize¶ (float) – fontsize of x- and y-ticks, defaults to 15
return_figures¶ (bool) – set to True if Figure and Axes objects should be returned, defaults to True
- Returns
None or Figure and Axes objects, depending on return_figures value.
- Return type
Union[None,Tuple[plt.Figure,plt.Axes]]
- eggplant.plot.landmark_diagnostics(lmk_eval_res: Tuple[List[List[float]], Union[List[float], Dict[str, float]]], side_size: Optional[Union[Tuple[float, float], float]] = None, title_fontsize: float = 20, lower_bound: Optional[Union[List[float], Dict[str, float]]] = None, line_style_dict: Optional[Dict[str, Any]] = None, label_style_dict: Optional[Dict[str, Any]] = None, ticks_style_dict: Optional[Dict[str, Any]] = None, return_figures: bool = False, savgol_params: Optional[Dict[str, Any]] = None) Optional[Tuple[Figure, Axes]] [source]
- Parameters
lmk_eval_res¶ (Tuple[List[List[float]], Union[List[float], Dict[str, float]]]) – output from estimate_n_landmarks()
side_size¶ (Optional[Union[Tuple[float, float], float]]) – side size of plot, if None then based on data, defaults to None
title_fontsize¶ (float) – fontsize of plot title, default 20
lower_bound¶ (Optional[Union[Dict[str, float], List[float]]]) – lower_bound value (for number of landmarks), will be indicated by a black dashed line
line_style_dict¶ (Optional[Dict[str, Any]]) – dictionary with style parameters for matplotlib.pyplot.plot(), default None
label_style_dict¶ (Optional[Dict[str, Any]]) – dictionary with style parameters for matplotlib.pyplot.xlabel() and ylabel, default None
ticks_style_dict¶ (Optional[Dict[str, Any]]) – dictionary with style parameters for xaxis.set_tick_params and yaxis.set_tick_params
return_figures¶ (bool) – set to true to return figure, otherwise only show, default False
savgol_params¶ (Dict[str, Any]) – parameters to scipy.signal.savgol_filter, default parameters polyorder=4 and window_length=5
- Returns
plot of nMLL loss against the number of landmarks
- Return type
Optional[Tuple[plt.Figure, plt.Axes]]
- eggplant.plot.model_diagnostics(models: Optional[Union[Dict[str, m.GPModel], m.GPModel]] = None, losses: Optional[Union[Dict[str, ndarray], ndarray]] = None, n_cols: int = 5, width: float = 5, height: float = 3, return_figures: bool = False) Optional[Tuple[Figure, axes]] [source]
plot loss history for models
can take either a set of models or losses.
- Parameters
models¶ (Optional[Union[Dict[str, "m.GPModel"], "m.GPModel"]]) – models to investigate
losses¶ (Optional[Union[Dict[str, np.ndarray], np.ndarray]]) – losses to visualize
n_cols¶ (int) – number of columns, defaults to 5
width¶ (float) – width of each subplot panel (visualizing one model’s loss over time), defaults to 5
height¶ (float) – height of each subplot panel (visualizing one model’s loss over time), defaults to 3
return_figures¶ (bool) – set to True if Figure and Axes objects should be returned, defaults to False
- Returns
None or Figure and Axes objects, depending on return_figures value.
- Return type
Optional[Tuple[plt.Figure,plt.Axes]]
- eggplant.plot.visualize_landmark_spread(adata: AnnData, feature: Optional[str] = None, layer: Optional[str] = None, spread_distance: Optional[float] = None, diameter_multiplier: float = 1, side_size: float = 5, marker_size: float = 20, landmark_marker_size: Optional[float] = None, label_fontsize: float = 10, title_fontsize: float = 20, seed: int = 1, text_h_offset: Optional[float] = None, bbox_style: Optional[Dict[str, Any]] = None, return_figures: bool = False) Optional[Tuple[Figure, Axes]] [source]
function to visualize landmark distribution with a given set of parameters. Here Poisson Disc Sampling (PDS) is used to randomly distribute the landmarks within the spatial domain.
- Parameters
adata¶ (ad.AnnData) – data to visualize
feature¶ (Optional[str]) – feature to visualize, if None then (normalized) sum across all features, defaults to None
layer¶ (Optional[str]) – layer to use for feature extraction, defaults to None
spread_distance¶ (Optional[float]) – see spread_distance in estimate_n_landmarks()
diameter_multiplier¶ (float) – see diameter_multiplier in estimate_n_landmarks()
side_size¶ (float) – side size of plot, defaults to 5
marker_size¶ (float) – marker size of spatial locations, defaults to 20
landmark_marker_size¶ (Optional[float]) – size of landmark indicators, if None then twice that of marker_size, defaults to None
label_fontsize¶ (float) – fontsize of landmark enumeration labels, defaults to 10
title_fontsize¶ (float) – fontsize of title, defaults to 20
seed¶ (int) – random seed (numpy)
text_h_offset¶ (Optional[float]) – horizontal distance between landmark enumeration label and landmark marker, if None then spread_distance / 5, defaults to None
bbox_style¶ (Optional[Dict[str, Any]]) – see bbox argument for matplotlib.pyplot.text()
return_figures¶ (bool) – return figure and axes objects, defaults to False
- Returns
Figure and Axes object if return_figures=True, else plots the landmarks in the spatial domain.
- Return type
Optional[Tuple[plt.Figure, plt.Axes]]
- eggplant.plot.visualize_observed(adatas: Union[Dict[str, AnnData], List[AnnData]], features: Optional[Union[List[str], str]], layer: Optional[str] = None, **kwargs) None [source]
Visualize observed data to be transferred
- Parameters
adatas¶ (Union[Dict[str,ad.AnnData],List[ad.AnnData]]) – List or dictionary of AnnData objects holding the data to be transferred
features¶ (Union[str,List[str]]) – Name of feature to be visualized
n_cols¶ (Optional[int]) – number of desired columns
n_rows¶ (Optional[int]) – number of desired rows
marker_size¶ (float) – scatter plot marker size
show_landmarks¶ (bool) – show landmarks in plot
landmark_marker_size¶ – size of landmarks
side_size¶ (float) – side size for each figure subplot
landmark_cmap¶ (Optional[Dict[int,str]], optional) – colormap for landmarks
share_colorscale¶ (bool) – set to true if subplots should all have the same colorscale
return_figures¶ (bool) – set to true if figure and axes objects should be returned
include_colorbar¶ (bool) – set to true to include colorbar
separate_colorbar¶ (bool) – set to true if colorbar should be plotted in separate figure, only possible when share_colorscale = True
colorbar_orientation¶ (str) – choose between ‘horizontal’ and ‘vertical’ for orientation of colorbar
include_title¶ (bool) – set to true to include title
fontsize¶ (str) – font size of title
hspace¶ (Optional[float]) – height space between subplots. If none then default matplotlib settings are used.
wspace¶ (Optional[float]) – width space between subplots. If none then default matplotlib settings are used.
quantile_scaling¶ (bool) – set to true to use quantile scaling. Can help to minimize quenching effect of outliers
flip_y¶ (bool) – set to true if y-axis should be flipped
colorbar_fontsize¶ (float) – fontsize of colorbar ticks
show_image¶ (bool) – show tissue image in background.
- Returns
None or Figure and Axes objects, depending on return_figures value.
- Return type
Optional[Tuple[plt.Figure,plt.Axes]]
- eggplant.plot.visualize_sdea_results(ref: m.GPModel, dge_res: Dict[str, ndarray], cmap: str = 'RdBu_r', n_cols: int = 4, marker_size: float = 10, side_size: float = 8, title_fontsize: float = 20, colorbar_fontsize: float = 20, colorbar_orientation: Literal['horizontal', 'vertical'] = 'horizontal', no_sig_color: str = 'lightgray', reorder_axes: Optional[List[int]] = None, return_figures: bool = False) Optional[Tuple[Figure, Axes]] [source]
Visualize result from spatial differential expression analysis
- Parameters
ref¶ (m.GPModel) – reference object of type Reference, information should have been transferred to the object.
cmap¶ (str) – colormap to use (choose from matplotlib), defaults to RdBu_r
n_cols¶ (int) – number of columns, defaults to 4
marker_size¶ (float) – size of marker, defaults to 10
title_fontsize¶ (float) – fontsize of title, defaults to 20
colorbar_fontsize¶ (float) – fontsize of colorbar ticks, defaults to 20
colorbar_orientation¶ (Literal["horizontal", "vertical"]) – orientation of colorbar, defaults to horizontal
no_sig_color¶ (str) – color of locations with non-significant difference in expression, defaults to lightgray
reorder_axes¶ (Optional[List[int]]) – new order of axes, original order is [0, 1, 2,..], give new order as [1, 0, 2,…] to switch place of subplot 1 and 0.
return_figures¶ (bool) – return figure and axes objects. Default value is False.
- Returns
Figure and Axes objects
- Return type
Tuple[plt.Figure,plt.Axes]
- eggplant.plot.visualize_transfer(reference: Union[Reference, AnnData], attributes: Optional[Union[List[str], str]] = None, layer: Optional[str] = None, **kwargs) Optional[Tuple[Figure, Axes]] [source]
Visualize results after transfer to reference
- Parameters
reference¶ (Union[m.Reference,ad.AnnData]) – reference object or AnnData holding transferred values
attributes¶ (Optional[Union[List[str],str]]) – visualize transferred models with these attributes. Must be found in .var slot. If none specified, all transfers will be visualized.
layer¶ (str) – name of layer to use
n_cols¶ (Optional[int]) – number of desired columns
n_rows¶ (Optional[int]) – number of desired rows
marker_size¶ (float) – scatter plot marker size
show_landmarks¶ (bool) – show landmarks in plot
landmark_marker_size¶ – size of landmarks
side_size¶ (float) – side size for each figure subplot
landmark_cmap¶ (Optional[Dict[int,str]], optional) – colormap for landmarks
share_colorscale¶ (bool) – set to true if subplots should all have the same colorscale
return_figures¶ (bool) – set to true if figure and axes objects should be returned
include_colorbar¶ (bool) – set to true to include colorbar
separate_colorbar¶ (bool) – set to true if colorbar should be plotted in separate figure, only possible when share_colorscale = True
colorbar_orientation¶ (str) – choose between ‘horizontal’ and ‘vertical’ for orientation of colorbar
include_title¶ (bool) – set to true to include title
fontsize¶ (str) – font size of title
hspace¶ (Optional[float]) – height space between subplots. If none then default matplotlib settings are used.
wspace¶ (Optional[float]) – width space between subplots. If none then default matplotlib settings are used.
quantile_scaling¶ (bool) – set to true to use quantile scaling. Can help to minimize quenching effect of outliers
flip_y¶ (bool) – set to true if y-axis should be flipped
colorbar_fontsize¶ (float) – fontsize of colorbar ticks
show_image¶ (bool) – show tissue image in background.
- Returns
None or Figure and Axes objects, depending on return_figures value.
- Return type
Optional[Tuple[plt.Figure,plt.Axes]]
eggplant.preprocess module
- eggplant.preprocess.default_normalization(adata: AnnData, min_cells: float = 0.1, total_counts: float = 10000.0, exclude_highly_expressed: bool = False, compute_highly_variable_genes: bool = False, n_top_genes: int = 2000) None [source]
default normalization recipe
the normalization strategy that was applied for the majority of the analyses presented in the original manuscript. We abstain from calling it a recommended strategy, as the best strategy depends on your data. However, this strategy has worked well with several data types.
The recipe is based on preprocessing functions from the scanpy.preprocessing module and is given as follows:
    sc.pp.filter_genes(adata, min_cells=min_cells)
    sc.pp.normalize_total(adata, total_counts, exclude_highly_expressed=exclude_highly_expressed)
    sc.pp.log1p(adata)
    sc.pp.scale(adata)
- Parameters
adata¶ (ad.AnnData) – anndata object to normalize
min_cells¶ (float) – argument to scanpy.preprocessing.filter_genes(), default is 0.1
total_counts¶ (float) – argument to scanpy.preprocessing.normalize_total(), default is 1e4
exclude_highly_expressed¶ (bool) – argument to scanpy.preprocessing.normalize_total(), default False
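To show what the middle two steps of the recipe do numerically, here is a pure-Python stand-in operating on a small count matrix (rows are observations, columns are genes). It does not call scanpy and is only an approximation of the corresponding steps; the function names and the `target_sum` argument are chosen for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize_total(counts: Matrix, target_sum: float = 1e4) -> Matrix:
    """Rescale every row so that its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0  # leave empty rows at zero
        normalized.append([value * scale for value in row])
    return normalized


def log1p_transform(matrix: Matrix) -> Matrix:
    """Apply log(1 + x) element-wise."""
    return [[math.log1p(value) for value in row] for row in matrix]


raw = [[1.0, 3.0], [2.0, 2.0]]
normalized = normalize_total(raw, target_sum=100.0)
logged = log1p_transform(normalized)
```

After total-count normalization the two observations are comparable regardless of sequencing depth, and the log transform compresses the dynamic range before scaling.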
- eggplant.preprocess.get_landmark_distance(adata: AnnData, landmark_position_key: str = 'curated_landmarks', landmark_distance_key: str = 'landmark_distances', reference: Optional[Union[Reference, ndarray]] = None, **kwargs) None [source]
compute landmark distances
- Parameters
adata¶ (ad.AnnData) – AnnData object where distance between landmarks and observations should be measured
landmark_position_key¶ (str) – key of landmark coordinates, defaults to “curated_landmarks”
landmark_distance_key¶ (str) – key to use for landmark distances in .obsm, defaults to “landmark_distances”
reference¶ (Optional[Union[m.Reference, np.ndarray]]) – provide reference if non-homogeneous distortions should be corrected for using TPS (thin plate splines)
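The quantity being computed, an n_obs x n_landmarks matrix of distances, can be sketched without any dependencies. The helper name `landmark_distance_matrix` is hypothetical, plain Euclidean distance is assumed, and the optional TPS correction is not covered.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def landmark_distance_matrix(
    observations: Sequence[Point], landmarks: Sequence[Point]
) -> List[List[float]]:
    """Row i holds the distance from observation i to every landmark."""
    return [[math.dist(obs, lmk) for lmk in landmarks] for obs in observations]


observations = [(0.0, 0.0), (3.0, 4.0)]
landmarks = [(0.0, 0.0), (3.0, 0.0)]
distances = landmark_distance_matrix(observations, landmarks)
```

Each row of the result describes one observation purely by its position relative to the landmarks, which is what makes samples with different coordinate systems comparable.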
- eggplant.preprocess.intersect_features(adatas: Union[List[AnnData], Dict[str, AnnData]]) None [source]
- eggplant.preprocess.join_adatas(adatas: List[AnnData], **kwargs) None [source]
join together a set of AnnData objects
- Parameters
adatas¶ (List[ad.AnnData]) – AnnData objects to be merged
- eggplant.preprocess.joint_highly_variable_genes(adatas: Union[List[AnnData], Dict[str, AnnData]], **kwargs) None [source]
- eggplant.preprocess.match_scales(adata: AnnData, reference: Union[ndarray, Reference]) None [source]
match scale between observed and spatial domains
Simple scaling with a single value based on the distances between landmarks.
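One way to obtain a single scaling value from landmark distances is the ratio of mean pairwise landmark distances in the two domains. Whether eggplant uses exactly this statistic is an assumption; the helper names below are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mean_pairwise_distance(points: Sequence[Point]) -> float:
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def scale_factor(obs_landmarks: Sequence[Point], ref_landmarks: Sequence[Point]) -> float:
    """Factor that brings observed landmark distances to the reference scale."""
    return mean_pairwise_distance(ref_landmarks) / mean_pairwise_distance(obs_landmarks)


def rescale(coords: Sequence[Point], factor: float) -> List[Point]:
    """Multiply every coordinate by the same factor."""
    return [(x * factor, y * factor) for x, y in coords]


obs_landmarks = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
ref_landmarks = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]
factor = scale_factor(obs_landmarks, ref_landmarks)
scaled = rescale([(1.0, 2.0)], factor)
```

Because only one value is used, this corrects for a uniform difference in size but not for rotation or local distortion.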
- eggplant.preprocess.reference_to_grid(ref_img: Union[Image, str], n_approx_points: int = 1000.0, background_color: Union[str, ndarray, tuple] = 'white', n_regions: int = 1) Tuple[ndarray, ndarray] [source]
convert image to grid of observations
when creating a reference we will discretize the domain into fixed locations where feature values will be predicted
- Parameters
ref_img¶ (Union[Image.Image, str]) – PIL.Image or path of/to reference image
n_approx_points¶ (int) – approximate number of points to include in the discretized grid. The number of grid points will be of the same order of magnitude as the provided number, defaults to 1000.
background_color¶ (Union[str, np.ndarray, tuple]) – background color of reference image, all elements with this color will be excluded. Can be either an array/tuple of RGB values or a matplotlib color string. Defaults to “white”.
n_regions¶ (int) – number of regions (indicated by different colors) contained in the reference, defaults to 1.
- Returns
A tuple where the first element is an n_obs x 2 array holding the coordinates of each grid point. The second element is a numeric vector of length n_obs where the i:th element indicates the region that the i:th observation belongs to.
- Return type
Tuple[np.ndarray,np.ndarray]
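The discretization described above can be sketched as follows. This is not eggplant's implementation. It assumes the image is already an (H, W, 3) RGB array, places grid points on a regular lattice, and labels regions by their distinct non-background colors rather than by a user-supplied n_regions. The function name image_to_grid is hypothetical.

```python
import numpy as np


def image_to_grid(img, n_approx_points=1000, background=(255, 255, 255)):
    """Hypothetical sketch: img is an (H, W, 3) uint8 RGB array."""
    h, w, _ = img.shape
    # lattice spacing chosen so that roughly n_approx_points cover the full image
    step = max(1, int(np.sqrt(h * w / n_approx_points)))
    rows, cols = np.meshgrid(
        np.arange(0, h, step), np.arange(0, w, step), indexing="ij"
    )
    rows, cols = rows.ravel(), cols.ravel()
    colors = img[rows, cols]  # color at every lattice point
    # drop lattice points whose color equals the background color
    keep = np.any(colors != np.asarray(background), axis=1)
    # one integer label per distinct remaining color
    _, regions = np.unique(colors[keep], axis=0, return_inverse=True)
    crd = np.stack([cols[keep], rows[keep]], axis=1).astype(float)
    return crd, regions.ravel()


# synthetic reference: white background, a red and a blue region side by side
img = np.full((20, 20, 3), 255, dtype=np.uint8)
img[5:15, 2:10] = (255, 0, 0)
img[5:15, 10:18] = (0, 0, 255)
crd, regions = image_to_grid(img, n_approx_points=100)
```

Because background points are removed after the lattice is laid out, the number of returned points is only approximately the requested number, in line with the description of n_approx_points.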
- eggplant.preprocess.spatial_smoothing(adata: AnnData, distance_key: str = 'spatial', n_neigh: int = 4, coord_type: Union[str, CoordType] = 'generic', sigma: float = 50, **kwargs) None [source]
spatial smoothing function
- Parameters
adata¶ (ad.AnnData) – AnnData object holding data to be smoothed
distance_key¶ (str) – key holding spatial coordinates in .obsm, defaults to “spatial”
n_neigh¶ (int) – number of neighbors to use for smoothing, defaults to 4
coord_type¶ (Union[str, CoordType]) – type of coordinates, see squidpy documentation for more information, defaults to “generic”.
sigma¶ (float) – sigma value to use in smoothing; higher values give points far away from a given grid point more influence on it, defaults to 50.
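The roles of n_neigh and sigma can be illustrated with a small NumPy sketch. It assumes a Gaussian kernel exp(-d^2 / (2 sigma^2)) over the n_neigh nearest neighbors plus the point itself. The kernel actually used by eggplant is not stated here, so treat this as an illustration; the function name smooth is hypothetical.

```python
import numpy as np


def smooth(values, crd, n_neigh=4, sigma=50.0):
    """Hypothetical sketch: Gaussian-weighted average over nearest neighbors.

    values: (n_obs,) feature values, crd: (n_obs, 2) coordinates
    """
    diff = crd[:, None, :] - crd[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    # indices of the point itself (distance 0) and its n_neigh closest neighbors
    nbr = np.argsort(dist, axis=1)[:, : n_neigh + 1]
    d = np.take_along_axis(dist, nbr, axis=1)
    # assumed kernel: larger sigma gives distant neighbors relatively more weight
    w = np.exp(-(d**2) / (2 * sigma**2))
    w /= w.sum(axis=1, keepdims=True)
    return (w * values[nbr]).sum(axis=1)


# 3 x 3 grid with a single spike in the center
xx, yy = np.meshgrid(np.arange(3.0), np.arange(3.0))
crd = np.stack([xx.ravel(), yy.ravel()], axis=1)
values = np.zeros(9)
values[4] = 1.0
smoothed = smooth(values, crd, n_neigh=4, sigma=1.0)
```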
eggplant.sdea module
- eggplant.sdea.get_sde_features(data: Union[AnnData, Reference], group_by: str = 'model', compare: str = 'feature', labels: Optional[str] = None, n_features: Optional[int] = None) Dict[str, Series] [source]
Get spatially differentially expressed (SDE) genes
Identifies genes that exhibit different spatial distributions between two different conditions.
- Parameters
data¶ (Union[ad.AnnData, "m.Reference"]) –
group_by¶ (str) –
compare¶ (str) –
labels¶ (Optional[str]) –
n_features¶ (Optional[int]) –
- eggplant.sdea.mixed_normal(mus: ndarray, vrs: ndarray, ws: Optional[ndarray] = None) Tuple[ndarray, ndarray] [source]
Mean and variance for a weighted mixed normal.
For n distributions \(N_i(\mu_i,\sigma^2_i)\) we compute the mean and variance for the new weighted mix:
\(\mu_{new} = \sum_{i=1}^n w_i\mu_i\)
\(\sigma^2_{new} = \sum_{i=1}^n w_i^2\sigma^2_i\)
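- Parameters
mus¶ (np.ndarray) – means \(\mu_i\) of the distributions
vrs¶ (np.ndarray) – variances \(\sigma^2_i\) of the distributions
ws¶ (Optional[np.ndarray]) – weights \(w_i\)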
- Returns
a tuple being \((\mu_{new},\sigma^2_{new})\)
- Return type
Tuple[np.ndarray,np.ndarray]
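The two formulas above translate directly into NumPy. The sketch below assumes that mus and vrs are stacked as (n_distributions, n_points) arrays and that missing weights fall back to uniform 1/n; both are assumptions for illustration. The function name mixed_normal_sketch is hypothetical.

```python
import numpy as np


def mixed_normal_sketch(mus, vrs, ws=None):
    """Hypothetical sketch of the weighted mean and variance formulas.

    mus, vrs: (n_distributions, n_points) means and variances (assumed layout)
    ws: (n_distributions,) weights; uniform 1/n if omitted (an assumption)
    """
    n = mus.shape[0]
    if ws is None:
        ws = np.full(n, 1.0 / n)
    ws = np.asarray(ws, dtype=float).reshape(-1, 1)
    mu_new = (ws * mus).sum(axis=0)  # sum_i w_i * mu_i
    var_new = (ws**2 * vrs).sum(axis=0)  # sum_i w_i^2 * sigma_i^2
    return mu_new, var_new


mus = np.array([[1.0, 2.0], [3.0, 6.0]])
vrs = np.array([[4.0, 4.0], [4.0, 8.0]])
mu_new, var_new = mixed_normal_sketch(mus, vrs)
```

With two equally weighted distributions, each weight is 0.5, so the variances are scaled by 0.25 before summing.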
- eggplant.sdea.sdea(data: Union[AnnData, Reference], group_col: str, n_std: int = 2, subset: Optional[Dict[str, Any]] = None, weights: Optional[ndarray] = None) Dict[str, Dict[str, ndarray]] [source]
spatial differential expression analysis
conduct spatial differential expression analysis (sDEA)
- Parameters
data¶ (Union[ad.AnnData, "m.Reference"]) – object (either anndata or reference) containing the spatial profiles to be compared.
group_col¶ (str) – column to make comparison with respect to
n_std¶ (int) – number of standard deviations that should be used when testing for differential expression. If the interval mean_1 +/- n_std*std_1 overlaps with the interval mean_2 +/- n_std*std_2, the features are considered non-differentially expressed, defaults to 2
subset¶ (Optional[Dict[str, Any]]) – subset groups in the contrastive analysis, for example subset={feature:value} will only compare those profiles where the value of feature is value, defaults to no subsetting
weights¶ (Optional[np.ndarray]) – n_samples vector of weights, where the i:th value of the vector indicates the weight that should be assigned to the i:th sample in the sdea analysis, defaults to 1/n_samples.
- Returns
a dictionary where each analyzed feature is an entry, and each entry is a dictionary with two values: diff being the spot-wise difference between the samples, and sig being an indicator of whether the difference is significant or not.
- Return type
Dict[str, Dict[str, np.ndarray]]
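The overlap criterion described for n_std can be sketched spot by spot as follows. This is a minimal illustration rather than the library's code; the sign convention for diff (group 1 minus group 2) and the function name interval_test are assumptions.

```python
import numpy as np


def interval_test(mu_1, var_1, mu_2, var_2, n_std=2):
    """Hypothetical sketch of the interval-overlap criterion, per spot."""
    std_1, std_2 = np.sqrt(var_1), np.sqrt(var_2)
    lo_1, hi_1 = mu_1 - n_std * std_1, mu_1 + n_std * std_1
    lo_2, hi_2 = mu_2 - n_std * std_2, mu_2 + n_std * std_2
    # two intervals overlap when each one starts before the other ends
    overlap = (lo_1 <= hi_2) & (lo_2 <= hi_1)
    # diff direction (group 1 minus group 2) is an assumption
    return {"diff": mu_1 - mu_2, "sig": ~overlap}


mu_1, var_1 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
mu_2, var_2 = np.array([1.0, 10.0]), np.array([1.0, 1.0])
res = interval_test(mu_1, var_1, mu_2, var_2, n_std=2)
# spot 0: [-2, 2] and [-1, 3] overlap, so it is not flagged
# spot 1: [-2, 2] and [8, 12] do not overlap, so it is flagged
```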
- eggplant.sdea.test_region_wise_enrichment(data: Union[AnnData, Reference], feature: str, region_1: Union[str, int], region_2: Union[str, int], include_models: Union[List[str], str] = 'composite', col_name: str = 'region', feature_col: str = 'feature', alpha: float = 0.05, n_permutations: Optional[int] = None) Dict[str, Dict[str, Union[float, bool, str]]] [source]
region-wise enrichment test
This function uses a permutation test to determine whether feature is more highly expressed in region_1 than in region_2.
- Parameters
data¶ (Union[ad.AnnData, "m.Reference"]) – object containing feature data
feature¶ (str,) – feature to inspect
region_1¶ (Union[str, int]) – label of region 1
region_2¶ (Union[str, int]) – label of region 2
include_models¶ (Union[List[str], str]) – models to include, defaults to composite
col_name¶ (str) – column name on adata.obs that indicates region label, defaults to region
feature_col¶ (str) – column name in adata.var that indicates feature, defaults to feature
alpha¶ (float) – significance level, defaults to 0.05
n_permutations¶ (Optional[int]) – number of permutations to perform. 1/alpha must be larger than n_permutations, otherwise an exception will be raised. Defaults to 1 / alpha.
- Returns
Dictionary with result of permutation test. The keys are:
- pvalue : calculated p-value
- is_sig : whether the result is considered significant or not
- feature : name of feature that was examined
- Return type
Dict[str, Dict[str, Union[float, bool, str]]]
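A one-sided permutation test of this kind can be sketched with NumPy alone. The sketch assumes the test statistic is the difference in mean expression between the two regions and uses a fixed number of permutations for illustration rather than the function's default. It returns a flat dictionary with the documented keys instead of the nested per-model structure. The function name permutation_test is hypothetical.

```python
import numpy as np


def permutation_test(x_1, x_2, feature, alpha=0.05, n_permutations=1000, seed=0):
    """Hypothetical sketch: is the mean of x_1 larger than the mean of x_2?"""
    rng = np.random.default_rng(seed)
    observed = x_1.mean() - x_2.mean()
    pooled = np.concatenate([x_1, x_2])
    n_1 = len(x_1)
    n_extreme = 0
    for _ in range(n_permutations):
        perm = rng.permutation(pooled)  # shuffle region labels
        stat = perm[:n_1].mean() - perm[n_1:].mean()
        n_extreme += stat >= observed
    # add-one correction keeps the p-value strictly positive
    pvalue = (n_extreme + 1) / (n_permutations + 1)
    return {"pvalue": float(pvalue), "is_sig": bool(pvalue < alpha), "feature": feature}


# toy data: every value in region 1 exceeds every value in region 2
x_1 = np.arange(10.0, 20.0)
x_2 = np.arange(0.0, 10.0)
result = permutation_test(x_1, x_2, feature="gene_a")
```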