eggplant package

eggplant.methods module

class eggplant.methods.PoissonDiscSampler(crd: ndarray, min_dist: float, seed: Optional[int] = None)[source]

Bases: object

Poisson Disc Sampler designed according to the principles outlined in Bridson [Bri07], but for d=2.

add_k_in_annulus(point: Union[ndarray, Tuple[float, float]], k: int = 5) ndarray[source]

adds k points randomly in annulus around point

Parameters
  • point (Union[np.ndarray, Tuple[float, float]],) – point to create annulus around

  • k (int) – number of points to add to annulus, default 5

Returns

array of coordinates with added points

Return type

np.ndarray

coord_to_cell(point: Union[ndarray, Tuple[float, float]]) Tuple[int, int][source]
helper function, transforms coordinates to cell id (in grid)

Parameters

point (Union[np.ndarray, Tuple[float, float]]) – coordinates of point to get cell id for

Returns

a tuple (i,j) where i is the grid row index and j is the column index.

Return type

Tuple[int,int]

get_neighbors(idx: Tuple[int, int]) List[Tuple[int, int]][source]

get neighbors in grid

Note: includes self.

Parameters

idx (Tuple[int,int]) – index of cell to get neighbors of

Returns

list of neighbor indices

Return type

List[Tuple[int,int]]

sample(max_points: Optional[int] = None, k: Optional[int] = 4) ndarray[source]

sample from domain

Parameters
  • max_points (Optional[int]) – max number of points to sample from domain.

  • k (Optional[int]) – number of points to position in annulus, default 4

Returns

array of samples from domain

Return type

np.ndarray
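
Example (a minimal usage sketch; the coordinate array and parameter values below are placeholders, only the constructor and sample() signatures above come from this module):

import numpy as np
from eggplant.methods import PoissonDiscSampler

# hypothetical 2D coordinates defining the sampling domain (n_obs x 2)
rng = np.random.default_rng(0)
crd = rng.uniform(0, 100, size=(500, 2))

# request points that are at least 5 units apart
sampler = PoissonDiscSampler(crd, min_dist=5.0, seed=1)

# sample at most 50 points, placing 4 candidate points in each annulus
points = sampler.sample(max_points=50, k=4)
print(points.shape)  # (n_sampled, 2)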

eggplant.methods.estimate_n_landmarks(adatas: Union[AnnData, List[AnnData], Dict[str, AnnData]], n_max_lmks: int = 50, n_min_lmks: Optional[int] = 1, n_evals: int = 10, n_reps: int = 3, feature: Optional[str] = None, layer: Optional[str] = None, device: Literal['cpu', 'gpu'] = 'cpu', n_epochs: int = 1000, learning_rate: float = 0.01, subsample: Optional[Union[float, int]] = None, verbose: bool = False, spatial_key: str = 'spatial', max_cg_iterations: int = 1000, tail_length: int = 200, seed: int = 1, spread_distance: Optional[float] = None, diameter_multiplier: float = 10) Tuple[ndarray, Union[Dict[str, List[float]], List[float]], Optional[Union[List[float], Dict[str, float]]]][source]

Estimate how the number of landmarks influences the outcome

Parameters
  • adatas (Union[ad.AnnData, List[ad.AnnData], Dict[str, ad.AnnData]]) – a single AnnData object, or a list or dictionary of AnnData objects to be analyzed.

  • n_max_lmks (int) – max number of landmarks to include in the analysis, defaults to 50.

  • n_evals (int) – number of evaluations. The number of landmarks tested will be equally spaced in the interval [1, n_max_lmks], defaults to 10.

  • layer (Optional[str]) – which layer to use

  • device (Literal["cpu", "gpu"]) – which device to perform computations on, defaults to “cpu”

  • n_epochs (int) – number of epochs to use when learning the relationship between landmark distance and feature values, defaults to 1000.

  • learning_rate (float) – learning rate to use in optimization, defaults to 0.01.

  • subsample (Optional[Union[float, int]]) – whether to subsample the data or not. If a value less than 1 is given, it is interpreted as a fraction of the total number of observations; if larger than 1, as the absolute number of observations to keep. If exactly 1 or None, no subsampling will occur. Note, landmarks are selected before subsampling. Defaults to None.

  • verbose (bool) – set to True to use verbose mode, defaults to False.

  • spatial_key (str) – key to use to extract spatial coordinates from the obsm attribute. Defaults to “spatial”.

  • max_cg_iterations (int) – The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance (from GPyTorch documentation), defaults to 1000.

  • tail_length (int) – the last tail_length observations will be used to compute an average MLL value. If n_epochs is less than tail_length, all epochs will be used instead. Defaults to 200.

  • seed (int) – value of random seed, defaults to 1.

  • spread_distance (Optional[float]) – distance between points in Poisson Disc Sampling. Equivalent to min_dist.

  • diameter_multiplier (float) – applicable to assays where the

Returns

A tuple with a vector listing the number of landmarks used in each evaluation as first element and as second the corresponding average MLL values.

Return type

Tuple[np.ndarray, Union[Dict[str, List[float]], List[float]]]
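
Example (a hedged sketch of pairing this function with landmark_lower_bound(); the input file and feature name are hypothetical, only the call signatures come from this module):

import anndata as ad
from eggplant.methods import estimate_n_landmarks, landmark_lower_bound

adata = ad.read_h5ad("sample.h5ad")  # hypothetical input file

# evaluate 10 equally spaced landmark numbers between 1 and 30
n_lmks, losses, _ = estimate_n_landmarks(
    adata,
    n_max_lmks=30,
    n_evals=10,
    feature="GeneA",  # hypothetical feature name
    n_epochs=1000,
    device="cpu",
    seed=1,
)

# automatic lower bound on the number of landmarks (see landmark_lower_bound below)
lower = landmark_lower_bound(n_lmks, losses)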

eggplant.methods.fa_transfer_to_reference(adatas: Union[AnnData, List[AnnData], Dict[str, AnnData]], reference: Reference, variance_threshold: float = 0.3, n_components: Optional[int] = None, use_highly_variable: bool = False, layer: Optional[str] = None, device: Literal['cpu', 'gpu', 'cuda'] = 'cpu', n_epochs: int = 1000, learning_rate: float = 0.01, subsample: Optional[Union[float, int]] = None, verbose: bool = False, return_models: bool = False, return_losses: bool = True, max_cg_iterations: int = 1000, meta_key: str = 'meta', inference_method: Literal['exact', 'variational'] = 'exact', **kwargs) Dict[str, Union[List[Union[GPModelExact, GPModelApprox]], List[ndarray]]][source]

fast approximate transfer of observed data to a reference

similar to transfer_to_reference(), but designed for fast approximate transfer of the full set of features. To speed up the transfer process we project the data into a low-dimensional space, transfer this representation, and then reconstruct the original data. This significantly reduces the number of features that need to be transferred to the reference, but comes at the cost of an approximate representation, whose fidelity depends on the specified variance_threshold or n_components parameter.

Parameters
  • adatas (Union[ad.AnnData, List[ad.AnnData], Dict[str, ad.AnnData]]) – AnnData objects holding data to transfer

  • reference (m.Reference) – reference to transfer data to

  • variance_threshold (float) – fraction of variance that principal components should explain

  • n_components (Optional[int]) – use instead of variance_threshold. If specified, exactly n_components principal components will be used.

  • use_highly_variable (bool) – only use highly_variable_genes to compute the principal components. Default is False.

  • layer (Optional[str]) – which layer to extract data from, defaults to raw

  • device (Literal["cpu","gpu","cuda"]) – device to use for computations, defaults to “cpu”

  • n_epochs (int) – number of epochs to use, defaults to 1000

  • learning_rate (float) – learning rate, defaults to 0.01

  • subsample (Optional[Union[float, int]]) – if <= 1, interpreted as the fraction of observations to keep; if > 1, as the number of observations to keep in subsampling, defaults to None (no subsampling)

  • verbose (bool) – set to True to use verbose mode, defaults to False

  • return_models (bool) – set to True to return fitted models, defaults to False

  • return_losses (bool) – return loss history of each model, defaults to True

  • max_cg_iterations (int) – The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance (from GPyTorch documentation), defaults to 1000.

  • meta_key (str) – key in uns slot that holds additional meta info
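
Example (a minimal sketch; adatas is assumed to be a dict of preprocessed AnnData objects with landmark distances, and ref an already constructed Reference, see the models module below):

from eggplant.methods import fa_transfer_to_reference

results = fa_transfer_to_reference(
    adatas,                  # dict of AnnData objects (assumed to exist)
    reference=ref,           # eggplant.models.Reference (assumed to exist)
    variance_threshold=0.3,  # components should explain 30% of the variance
    n_epochs=1000,
    device="cpu",
    return_losses=True,
)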

eggplant.methods.fit(model: Union[GPModelExact, GPModelApprox], n_epochs: int, optimizer: Optional[Optimizer] = None, fast_computation: bool = True, learning_rate: float = 0.01, verbose: bool = False, seed: int = 0, batch_size: Optional[int] = None, progress_message: Optional[str] = None, **kwargs) None[source]

fit GP Model

Parameters
  • model (m.GPModel,) – Model to fit

  • n_epochs (int) – number of epochs

  • optimizer (Optional[t.optim.Optimizer]) – optimizer to use during fitting, defaults to Adam

  • fast_computation (bool = True,) – whether to use fast approximations to functions, defaults to True

  • learning_rate (float) – learning rate, defaults to 0.01

  • verbose (bool = False,) – set to True for verbose mode (prints progress), defaults to False

  • seed (int) – random seed, defaults to 0

  • batch_size (int) – Batch size. Defaults to None.

  • progress_message (str) – message to include in progress bar
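
Example (a hedged sketch of constructing and fitting an exact GP model by hand; transfer_to_reference() below wraps these steps, and the tensors here are random placeholders rather than real landmark distances):

import torch as t
from eggplant.methods import fit
from eggplant.models import GPModelExact

# placeholder data: 200 observations, 6 landmarks, a single feature
landmark_distances = t.rand(200, 6)
feature_values = t.rand(200)

model = GPModelExact(
    landmark_distances=landmark_distances,
    feature_values=feature_values,
    device="cpu",
)

fit(model, n_epochs=500, learning_rate=0.01, verbose=True)
print(model.loss_history[-1])  # final recorded loss value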

eggplant.methods.landmark_lower_bound(n_lmks: ndarray, losses: Union[List[ndarray], ndarray, Dict[str, ndarray]], kneedle_s_param: float = 1) float[source]

automatic identification of a lower bound for the number of landmarks, based on the result from estimate_n_landmarks().

eggplant.methods.transfer_to_reference(adatas: Union[AnnData, List[AnnData], Dict[str, AnnData]], features: Union[str, List[str]], reference: Reference, layer: Optional[str] = None, device: Literal['cpu', 'gpu', 'cuda'] = 'cpu', n_epochs: int = 1000, learning_rate: float = 0.01, subsample: Optional[Union[float, int]] = None, verbose: bool = False, return_models: bool = False, return_losses: bool = True, max_cg_iterations: int = 1000, meta_key: str = 'meta', inference_method: Union[Literal['exact', 'variational'], List[Literal['exact', 'variational']], Dict[str, Literal['exact', 'variational']]] = 'exact', n_inducing_points: Optional[int] = None, batch_size: Optional[int] = None, **kwargs) Dict[str, Union[List[Union[GPModelExact, GPModelApprox]], List[ndarray]]][source]

transfer observed data to a reference

Parameters
  • adatas (Union[ad.AnnData, List[ad.AnnData], Dict[str, ad.AnnData]]) – AnnData objects holding data to transfer

  • features (Union[str, List[str]]) – name of feature(s) to transfer

  • reference (m.Reference) – reference to transfer data to

  • layer (Optional[str]) – which layer to extract data from, defaults to raw

  • device (Literal["cpu","gpu","cuda"]) – device to use for computations, defaults to “cpu”

  • n_epochs (int) – number of epochs to use, defaults to 1000

  • learning_rate (float) – learning rate, defaults to 0.01

  • subsample (Optional[Union[float, int]] = None,) – if <= 1, interpreted as the fraction of observations to keep; if > 1, as the number of observations to keep in subsampling, defaults to None (no subsampling)

  • verbose (bool) – set to True to use verbose mode, defaults to False

  • return_models (bool) – set to True to return fitted models, defaults to False

  • return_losses (bool) – return loss history of each model, defaults to True

  • max_cg_iterations (int = 1000,) – The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance (from GPyTorch documentation), defaults to 1000.

  • meta_key (str) – key in uns slot that holds additional meta info

  • inference_method (Union[Literal["exact", "variational"], List[Literal["exact", "variational"]], Dict[str, Literal["exact", "variational"]]]) – which inference method to use, the options are “exact” and “variational” - use “variational” for large data sets. If a single string is given then this method is applied to all objects in the data set, otherwise a list or dict specifying the inference method for each object can be given. Defaults to “exact”.

  • n_inducing_points (int) – number of inducing points to use in variational inference. Defaults to None, which uses 10% of the observations in the observed data if n_obs < 1e5, otherwise 10000.

  • batch_size (int) – batch_size to use in approximate (variational) inference. Must be less than or equal to the number of observations in a sample.
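
Example (an end-to-end sketch; adatas and ref are assumed to exist as in the preprocessing workflow, and the feature names are hypothetical):

from eggplant.methods import transfer_to_reference

res = transfer_to_reference(
    adatas,                       # list or dict of AnnData objects (assumed to exist)
    features=["GeneA", "GeneB"],  # hypothetical feature names
    reference=ref,                # eggplant.models.Reference (assumed to exist)
    n_epochs=1000,
    device="cpu",
    inference_method="exact",
    return_losses=True,
)

With return_losses=True the returned dictionary holds the loss history of each fitted model, which can be inspected with model_diagnostics() in the plot module.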

eggplant.models module

class eggplant.models.BaseGP(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu')[source]

Bases: object

BaseModel for GP Regression

should be combined with one of the gpytorch.models classes, e.g., ApproximateGP or ExactGP

__init__(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu') None[source]

Constructor method

Parameters
  • landmark_distances (Union[t.Tensor, pd.DataFrame, np.ndarray]) – n_obs x n_landmarks array with distance to each landmark for every observation

  • feature_values (t.Tensor) – n_obs x n_feature array with feature values for each observation

  • mean_fun (gp.means.Mean, optional) – mean function

  • kernel_fun (gp.kernels.Kernel, optional) – kernel to use in the covariance function

  • device (Literal["cpu","gpu"]) – device to execute operations on, defaults to “cpu”

property loss_history: List[float]

Loss history record

class eggplant.models.GPModelApprox(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, inducing_points: Union[Tensor, DataFrame, ndarray], landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu', learn_inducing_points: bool = True)[source]

Bases: BaseGP, ApproximateGP

__init__(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, inducing_points: Union[Tensor, DataFrame, ndarray], landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu', learn_inducing_points: bool = True) None[source]

Constructor method for approximate (variational) inference using inducing points

Parameters
  • landmark_distances (Union[t.Tensor, pd.DataFrame, np.ndarray]) – n_obs x n_landmarks array with distance to each landmark for every observation

  • feature_values (t.Tensor) – n_obs x n_feature array with feature values for each observation

  • inducing_points (t.Tensor) – points to use as inducing points; if learn_inducing_points = True these act as the initialization of the inducing points.

  • likelihood (gp.likelihoods.Likelihood, optional) – likelihood function

  • mean_fun (gp.means.Mean, optional) – mean function

  • kernel_fun (gp.kernels.Kernel, optional) – kernel to use in the covariance function

  • device (Literal["cpu","gpu"]) – device to execute operations on, defaults to “cpu”

  • learn_inducing_points (bool) – whether or not to treat inducing points as parameters to be learnt. Default is True.

forward(x: tensor) tensor[source]

forward step in prediction

training: bool

class eggplant.models.GPModelExact(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu')[source]

Bases: BaseGP, ExactGP

__init__(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu') None[source]

Constructor method for exact inference

Parameters
  • landmark_distances (Union[t.Tensor, pd.DataFrame, np.ndarray]) – n_obs x n_landmarks array with distance to each landmark for every observation

  • feature_values (t.Tensor) – n_obs x n_feature array with feature values for each observation

  • likelihood (gp.likelihoods.Likelihood, optional) – likelihood function

  • mean_fun (gp.means.Mean, optional) – mean function

  • kernel_fun (gp.kernels.Kernel, optional) – kernel to use in the covariance function

  • device (Literal["cpu","gpu"]) – device to execute operations on, defaults to “cpu”

forward(x: tensor) tensor[source]

forward step in prediction

training: bool

class eggplant.models.Reference(domain: Union[tensor, ndarray], landmarks: Union[tensor, ndarray, DataFrame], meta: Optional[Union[DataFrame, dict]] = None)[source]

Bases: object

Reference Container

__init__(domain: Union[tensor, ndarray], landmarks: Union[tensor, ndarray, DataFrame], meta: Optional[Union[DataFrame, dict]] = None) None[source]

Constructor function

Parameters
  • domain (Union[t.tensor, np.ndarray]) – n_obs x n_dims spatial coordinates of observations

  • landmarks (Union[t.tensor, np.ndarray, pd.DataFrame]) – n_landmarks x n_dims spatial coordinates of landmarks

  • meta (Optional[Union[pd.DataFrame, dict]], optional) – n_obs x n_categories meta data

clean() None[source]

clean reference from transferred data

composite_representation(by: str = 'feature')[source]

produce composite representation

Parameters

by (str, defaults to "feature") – consensus representation with respect to this meta data feature

plot(models: Optional[Union[List[str], str]] = None, *args, **kwargs) None[source]

quick plot function

Parameters
  • models (Union[List[str], str], optional) – models to be visualized, if None then all are displayed

  • *args – args to sc.pl.spatial

  • **kwargs – kwargs to sc.pl.spatial

transfer(models: Union[BaseGP, List[BaseGP]], meta: Optional[DataFrame] = None, names: Optional[Union[List[str], str]] = None) None[source]

transfer fitted models to reference

Parameters
  • models (Union[GPModel, List[GPModel]]) – Models to be transferred

  • meta (Optional[pd.DataFrame], optional) – model meta data, e.g., sample

  • names (Optional[Union[List[str], str]], optional) – name of models
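
Example (a minimal construction sketch using random placeholder coordinates; reference_to_grid() in the preprocess module is the usual way to obtain the domain):

import numpy as np
from eggplant.models import Reference

rng = np.random.default_rng(1)
domain = rng.uniform(0, 10, size=(1000, 2))  # n_obs x 2 grid coordinates
landmarks = rng.uniform(0, 10, size=(6, 2))  # n_landmarks x 2 landmark coordinates

ref = Reference(domain=domain, landmarks=landmarks)
# after transfer_to_reference(...) or ref.transfer(models),
# transferred features can be inspected with ref.plot()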

eggplant.plot module

class eggplant.plot.ColorMapper(cmap: Dict[T, str])[source]

Bases: object

helper class for colormaps

makes it easier to get color values for arrays and lists.

eggplant.plot.distplot_transfer(ref: Reference, inside: Dict[str, str], outside: Optional[Dict[str, str]] = None, n_cols: Optional[int] = None, n_rows: Optional[int] = None, side_size: float = 4.0, swarm_marker_style: Optional[Dict[str, Any]] = None, mean_marker_style: Optional[Dict[str, Any]] = None, display_grid: bool = True, title_fontsize: float = 25, label_fontsize: float = 20, ticks_fontsize: float = 15, return_figures: bool = True) Optional[Tuple[Figure, Axes]][source]

Swarmplot-like visualization of enrichment

Parameters
  • ref ("m.Reference",) – Reference holding transferred data

  • outside (Optional[Dict[str, str]]) – attribute to compare features within. If None all inside features will be compared together.

  • inside (Dict[str, str],) – feature to compare within outer attribute

  • n_cols (Optional[int]) – number of columns, defaults to None

  • n_rows (Optional[int] = None,) – number of rows, defaults to None

  • side_size (float) – size of each outer panel, defaults to 4

  • swarm_marker_style (Optional[Dict[str, Any]]) – data marker style, defaults to None

  • mean_marker_style (Optional[Dict[str, Any]]) – marker style of mean indicator, defaults to None

  • display_grid (bool) – set to True if grid shall be displayed in background, defaults to True

  • title_fontsize (float) – fontsize of title, defaults to 25

  • label_fontsize (float) – fontsize of x- and y-labels, defaults to 20

  • ticks_fontsize (float) – fontsize of x- and y-ticks, defaults to 15

  • return_figures (bool) – set to True if Figure and Axes objects should be returned, defaults to True

Returns

None or Figure and Axes objects, depending on return_figures value.

Return type

Union[None,Tuple[plt.Figure,plt.Axes]]

eggplant.plot.landmark_diagnostics(lmk_eval_res: Tuple[List[List[float]], Union[List[float], Dict[str, float]]], side_size: Optional[Union[Tuple[float, float], float]] = None, title_fontsize: float = 20, lower_bound: Optional[Union[List[float], Dict[str, float]]] = None, line_style_dict: Optional[Dict[str, Any]] = None, label_style_dict: Optional[Dict[str, Any]] = None, ticks_style_dict: Optional[Dict[str, Any]] = None, return_figures: bool = False, savgol_params: Optional[Dict[str, Any]] = None) Optional[Tuple[Figure, Axes]][source]
Parameters
  • lmk_eval_res (Tuple[List[List[float]], Union[List[float], Dict[str, float]]]) – output from estimate_n_landmarks()

  • side_size (Optional[Union[Tuple[float, float], float]] = None,) – side size of plot, if None then based on data, defaults to None

  • title_fontsize (float) – fontsize of plot title, default 20

  • lower_bound (Optional[Union[Dict[str, float], List[float]]]) – lower_bound value (for number of landmarks), will be indicated by a black dashed line

  • line_style_dict (Optional[Dict[str, Any]]) – dictionary with style parameters for matplotlib.pyplot.plot(), default None

  • label_style_dict (Optional[Dict[str, Any]]) – dictionary with style parameters for matplotlib.pyplot.xlabel() and matplotlib.pyplot.ylabel(), default None

  • ticks_style_dict (Optional[Dict[str, Any]]) – dictionary with style parameters for xaxis.set_tick_params and yaxis.set_tick_params

  • return_figures (bool) – set to true to return figure, otherwise only show, default False

  • savgol_params (Dict[str, Any]) – parameters to scipy.signal.savgol_filter, default parameters polyorder=4 and window_length=5

Returns

plot of nMLL loss against the number of landmarks

Return type

Optional[Tuple[plt.Figure, plt.Axes]]

eggplant.plot.model_diagnostics(models: Optional[Union[Dict[str, m.GPModel], m.GPModel]] = None, losses: Optional[Union[Dict[str, ndarray], ndarray]] = None, n_cols: int = 5, width: float = 5, height: float = 3, return_figures: bool = False) Optional[Tuple[Figure, axes]][source]

plot loss history for models

can take either a set of models or losses.

Parameters
  • models (Optional[Union[Dict[str, "m.GPModel"], "m.GPModel"]] = None,) – models to investigate

  • losses (Optional[Union[Dict[str, np.ndarray], np.ndarray]] = None,) – losses to visualize

  • n_cols (int) – number of columns, defaults to 5

  • width (float) – width of each subplot panel (visualizing one model’s loss over time), defaults to 5

  • height (float = 3,) – height of each subplot panel (visualizing one model’s loss over time), defaults to 3

  • return_figures (bool = False,) – set to True if Figure and Axes objects should be returned, defaults to False

Returns

None or Figure and Axes objects, depending on return_figure value.

Return type

Optional[Tuple[plt.Figure,plt.Axes]]

eggplant.plot.visualize_landmark_spread(adata: AnnData, feature: Optional[str] = None, layer: Optional[str] = None, spread_distance: Optional[float] = None, diameter_multiplier: float = 1, side_size: float = 5, marker_size: float = 20, landmark_marker_size: Optional[float] = None, label_fontsize: float = 10, title_fontsize: float = 20, seed: int = 1, text_h_offset: Optional[float] = None, bbox_style: Optional[Dict[str, Any]] = None, return_figures: bool = False) Optional[Tuple[Figure, Axes]][source]

function to visualize the landmark distribution for a given set of parameters. Here Poisson Disc Sampling (PDS) is used to randomly distribute the landmarks within the spatial domain.

Parameters
  • adata (ad.AnnData) – data to visualize

  • feature (Optional[str]) – feature to visualize, if None then (normalized) sum across all features, defaults to None

  • layer (Optional[str]) – layer to use for feature extraction, defaults to None

  • spread_distance (Optional[float]) – see spread_distance in estimate_n_landmarks()

  • diameter_multiplier (float) – see diameter_multiplier in estimate_n_landmarks()

  • side_size (float) – side size of plot, defaults to 5

  • marker_size (float) – marker size of spatial locations, defaults to 20

  • landmark_marker_size (Optional[float]) – size of landmark indicators, if None then twice that of marker_size, defaults to None

  • label_fontsize (float) – fontsize of landmark enumeration labels, defaults to 10

  • title_fontsize (float) – fontsize of title, defaults to 20

  • seed (int) – random seed (numpy)

  • text_h_offset (Optional[float]) – horizontal distance between landmark enumeration label and landmark marker, if None then spread_distance / 5, defaults to None

  • bbox_style (Optional[Dict[str, Any]]) – see bbox argument for matplotlib.pyplot.text()

  • return_figures (bool) – return figure and axes objects, defaults to False

Returns

Figure and Axes object if return_figures=True, else plots the landmarks in the spatial domain.

Return type

Optional[Tuple[plt.Figure, plt.Axes]]

eggplant.plot.visualize_observed(adatas: Union[Dict[str, AnnData], List[AnnData]], features: Optional[Union[List[str], str]], layer: Optional[str] = None, **kwargs) None[source]

Visualize observed data to be transferred

Parameters
  • adatas (Union[Dict[str,ad.AnnData],List[ad.AnnData]]) – List or dictionary of AnnData objects holding the data to be transferred

  • features (Union[str,List[str]]) – Name of feature to be visualized

  • n_cols (Optional[int]) – number of desired columns

  • n_rows (Optional[int]) – number of desired rows

  • marker_size (float) – scatter plot marker size

  • show_landmarks (bool) – show landmarks in plot

  • landmark_marker_size (float) – size of landmarks

  • side_size (float) – side size for each figure subplot

  • landmark_cmap (Optional[Dict[int,str]], optional) – colormap for landmarks

  • share_colorscale (bool) – set to true if subplots should all have the same colorscale

  • return_figures (bool) – set to true if figure and axes objects should be returned

  • include_colorbar (bool) – set to true to include colorbar

  • separate_colorbar (bool) – set to true if colorbar should be plotted in separate figure, only possible when share_colorscale = True

  • colorbar_orientation (str) – choose between ‘horizontal’ and ‘vertical’ for orientation of colorbar

  • include_title (bool) – set to true to include title

  • fontsize (str) – font size of title

  • hspace (Optional[float]) – height space between subplots. If none then default matplotlib settings are used.

  • wspace (Optional[float]) – width space between subplots. If none then default matplotlib settings are used.

  • quantile_scaling (bool) – set to true to use quantile scaling. Can help to minimize quenching effect of outliers

  • flip_y (bool) – set to true if y-axis should be flipped

  • colorbar_fontsize (float) – fontsize of colorbar ticks

  • show_image (bool) – show tissue image in background.

Returns

None or Figure and Axes objects, depending on return_figure value.

Return type

Optional[Tuple[plt.Figure,plt.Axes]]

eggplant.plot.visualize_sdea_results(ref: m.GPModel, dge_res: Dict[str, ndarray], cmap: str = 'RdBu_r', n_cols: int = 4, marker_size: float = 10, side_size: float = 8, title_fontsize: float = 20, colorbar_fontsize: float = 20, colorbar_orientation: Literal['horizontal', 'vertical'] = 'horizontal', no_sig_color: str = 'lightgray', reorder_axes: Optional[List[int]] = None, return_figures: bool = False) Optional[Tuple[Figure, Axes]][source]

Visualize result from spatial differential expression analysis

Parameters
  • ref (m.GPModel) – reference object of type Reference, information should have been transferred to the object.

  • dge_res (Dict[str, np.ndarray],) – result from sdea()

  • cmap (str) – colormap to use (choose from matplotlib), defaults to RdBu_r

  • n_cols (int) – number of columns, defaults to 4

  • marker_size (float) – size of marker, defaults to 10

  • title_fontsize (float) – fontsize of title, defaults to 20

  • colorbar_fontsize (float) – fontsize of colorbar ticks, defaults to 20

  • colorbar_orientation (Literal["horizontal", "vertical"]) – orientation of colorbar, defaults to horizontal

  • no_sig_color (str) – color of locations with a non-significant difference in expression, defaults to lightgray

  • reorder_axes (Optional[List[int]]) – new order of axes; original order is [0, 1, 2, ...], give the new order as [1, 0, 2, ...] to switch the places of subplots 1 and 0.

  • return_figures (bool) – return figure and axes objects. Default value is False.

Returns

Figure and Axes objects

Return type

Tuple[plt.Figure,plt.Axes]

eggplant.plot.visualize_transfer(reference: Union[Reference, AnnData], attributes: Optional[Union[List[str], str]] = None, layer: Optional[str] = None, **kwargs) Optional[Tuple[Figure, Axes]][source]

Visualize results after transfer to reference

Parameters
  • reference (Union[m.Reference, ad.AnnData]) – reference object or AnnData holding transferred values

  • attributes (Optional[Union[List[str],str]]) – visualize transferred models with these attributes. Must be found in .var slot. If none specified, all transfers will be visualized.

  • layer (str) – name of layer to use

  • n_cols (Optional[int]) – number of desired columns

  • n_rows (Optional[int]) – number of desired rows

  • marker_size (float) – scatter plot marker size

  • show_landmarks (bool) – show landmarks in plot

  • landmark_marker_size (float) – size of landmarks

  • side_size (float) – side size for each figure subplot

  • landmark_cmap (Optional[Dict[int,str]], optional) – colormap for landmarks

  • share_colorscale (bool) – set to true if subplots should all have the same colorscale

  • return_figures (bool) – set to true if figure and axes objects should be returned

  • include_colorbar (bool) – set to true to include colorbar

  • separate_colorbar (bool) – set to true if colorbar should be plotted in separate figure, only possible when share_colorscale = True

  • colorbar_orientation (str) – choose between ‘horizontal’ and ‘vertical’ for orientation of colorbar

  • include_title (bool) – set to true to include title

  • fontsize (str) – font size of title

  • hspace (Optional[float]) – height space between subplots. If none then default matplotlib settings are used.

  • wspace (Optional[float]) – width space between subplots. If none then default matplotlib settings are used.

  • quantile_scaling (bool) – set to true to use quantile scaling. Can help to minimize quenching effect of outliers

  • flip_y (bool) – set to true if y-axis should be flipped

  • colorbar_fontsize (float) – fontsize of colorbar ticks

  • show_image (bool) – show tissue image in background.

Returns

None or Figure and Axes objects, depending on return_figure value.

Return type

Optional[Tuple[plt.Figure,plt.Axes]]

eggplant.preprocess module

eggplant.preprocess.default_normalization(adata: AnnData, min_cells: float = 0.1, total_counts: float = 10000.0, exclude_highly_expressed: bool = False, compute_highly_variable_genes: bool = False, n_top_genes: int = 2000) None[source]

default normalization recipe

the normalization strategy that was applied for the majority of the analyses presented in the original manuscript. We abstain from calling it a recommended strategy, as the best strategy depends on your data. However, this strategy has worked well with several data types.

The recipe is based on preprocessing functions from the scanpy preprocessing module (sc.pp) and is given as follows:

sc.pp.filter_genes(adata, min_cells=min_cells)
sc.pp.normalize_total(adata, total_counts,
                      exclude_highly_expressed=exclude_highly_expressed)
sc.pp.log1p(adata)
sc.pp.scale(adata)

Parameters
  • adata (ad.AnnData,) – anndata object to normalize

  • min_cells (float = 0.1,) – argument to sc.pp.filter_genes()

  • total_counts (float) – argument to sc.pp.normalize_total(), default is 1e4

  • exclude_highly_expressed (bool) – argument to sc.pp.normalize_total(), default False
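
Example (usage is a single in-place call on a raw AnnData object; the file name is a placeholder):

import anndata as ad
from eggplant.preprocess import default_normalization

adata = ad.read_h5ad("visium_sample.h5ad")  # hypothetical input
default_normalization(adata, min_cells=0.1, total_counts=1e4)
# adata is now filtered, depth-normalized, log1p-transformed, and scaled in place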

eggplant.preprocess.get_landmark_distance(adata: AnnData, landmark_position_key: str = 'curated_landmarks', landmark_distance_key: str = 'landmark_distances', reference: Optional[Union[Reference, ndarray]] = None, **kwargs) None[source]

compute landmark distances

Parameters
  • adata (ad.AnnData) – AnnData object where distance between landmarks and observations should be measured

  • landmark_position_key (str) – key of landmark coordinates, defaults to "curated_landmarks"

  • landmark_distance_key (str) – key to use for landmark distances in .obsm, defaults to "landmark_distances"

  • reference (Optional[Union[m.Reference, np.ndarray]]) – provide reference if non-homogeneous distortions should be corrected for using TPS (thin plate splines)
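
Example (a hedged sketch of one plausible preprocessing order; adata is assumed to carry curated landmark coordinates under "curated_landmarks" and ref is an existing Reference):

from eggplant.preprocess import get_landmark_distance, match_scales

match_scales(adata, reference=ref)  # align scales between observed data and reference
get_landmark_distance(
    adata,
    landmark_position_key="curated_landmarks",
    reference=ref,
)
# distances are stored in adata.obsm["landmark_distances"]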

eggplant.preprocess.intersect_features(adatas: Union[List[AnnData], Dict[str, AnnData]]) None[source]

eggplant.preprocess.join_adatas(adatas: List[AnnData], **kwargs) None[source]

join together a set of AnnData objects

Parameters

adatas (List[ad.AnnData]) – AnnData objects to be merged

eggplant.preprocess.joint_highly_variable_genes(adatas: Union[List[AnnData], Dict[str, AnnData]], **kwargs) None[source]

eggplant.preprocess.match_scales(adata: AnnData, reference: Union[ndarray, Reference]) None[source]

match scale between observed and spatial domains

Simple scaling with a single value based on the distances between landmarks.

Parameters
  • adata (ad.AnnData) – AnnData object holding observed data

  • reference (Union[np.ndarray, "m.Reference"]) – Reference to which observed data will be transferred

eggplant.preprocess.reference_to_grid(ref_img: Union[Image, str], n_approx_points: int = 1000.0, background_color: Union[str, ndarray, tuple] = 'white', n_regions: int = 1) Tuple[ndarray, ndarray][source]

convert image to grid of observations

when creating a reference we will discretize the domain into fixed locations where feature values will be predicted

Parameters
  • ref_img (Union[Image.Image, str]) – PIL.Image or path of/to reference image

  • n_approx_points (int = 1e3,) – approximate number of points to include in the discretized grid. The number of grid points will be of the same order of magnitude as the provided number, defaults to 1000.

  • background_color (Union[str, np.ndarray, tuple]) – background color of the reference image, all elements with this color will be excluded. Can be either an array/tuple of RGB values or a matplotlib color string. Defaults to “white”.

  • n_regions (int = 1,) – number of regions (indicated by different colors) contained in the reference.

Returns

A tuple where the first element is an n_obs x 2 array representing the coordinates of each grid point. The second element is an n_obs numeric vector where the i:th element indicates the region that the i:th observation belongs to.

Return type

Tuple[np.ndarray,np.ndarray]
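
Example (a sketch of building a Reference from a tissue outline image; the image path and landmark coordinates are placeholders):

import numpy as np
from eggplant.preprocess import reference_to_grid
from eggplant.models import Reference

crd, region_labels = reference_to_grid(
    "reference_outline.png",  # hypothetical image path
    n_approx_points=1000,
    background_color="white",
    n_regions=1,
)

# placeholder landmark coordinates in the reference coordinate system
landmarks = np.array([[120.0, 80.0], [240.0, 95.0], [180.0, 200.0]])
ref = Reference(domain=crd, landmarks=landmarks)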

eggplant.preprocess.spatial_smoothing(adata: AnnData, distance_key: str = 'spatial', n_neigh: int = 4, coord_type: Union[str, CoordType] = 'generic', sigma: float = 50, **kwargs) None[source]

spatial smoothing function

Parameters
  • adata (ad.AnnData,) – AnnData object holding data to be smoothed

  • distance_key (str) – key holding spatial coordinates in .obsm, defaults to spatial

  • n_neigh (int) – number of neighbors to use for smoothing, defaults to 4

  • coord_type (Union[str, CoordType],) – type of coordinates, see squidpy documentation for more information, defaults to “generic”.

  • sigma (float = 50,) – sigma value to use in smoothing; higher values give far-away points more influence on a given grid point.

eggplant.sdea module

eggplant.sdea.get_sde_features(data: Union[AnnData, Reference], group_by: str = 'model', compare: str = 'feature', labels: Optional[str] = None, n_features: Optional[int] = None) Dict[str, Series][source]

Get spatially differentially expressed (SDE) genes

will identify genes that exhibit different spatial distributions between two different conditions.

Parameters
  • data (Union[ad.AnnData, "m.Reference"])

  • group_by (str)

  • compare (str)

  • labels (Optional[str])

  • n_features (Optional[int])

eggplant.sdea.mixed_normal(mus: ndarray, vrs: ndarray, ws: Optional[ndarray] = None) Tuple[ndarray, ndarray][source]

mean and var for weighted mixed normal.

For n distributions \(N_i(\mu_i,\sigma^2_i)\) we compute the mean and variance for the new weighted mix:

\(\mu_{new} = \sum_{i=1}^n w_i\mu_i\)

\(\sigma^2_{new} = \sum_{i=1}^n w_i^2\sigma^2_i\)

Parameters
  • mus (np.ndarray) – mean values for each sample

  • vrs (np.ndarray) – variance values for each sample

  • ws (Optional[np.ndarray]) – weights to use when computing the new mean for the mixed distribution.

Returns

a tuple being \((\mu_{new},\sigma^2_{new})\)

Return type

Tuple[np.ndarray,np.ndarray]
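
A small numerical illustration of the formulas above, using numpy directly (this mirrors what mixed_normal() computes for the given weights):

import numpy as np

mus = np.array([1.0, 3.0])   # means of the two component distributions
vrs = np.array([0.5, 0.2])   # variances of the two components
ws = np.array([0.25, 0.75])  # weights

mu_new = np.sum(ws * mus)      # 0.25*1.0 + 0.75*3.0 = 2.5
var_new = np.sum(ws**2 * vrs)  # 0.0625*0.5 + 0.5625*0.2 = 0.14375
print(mu_new, var_new)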

eggplant.sdea.sdea(data: Union[AnnData, Reference], group_col: str, n_std: int = 2, subset: Optional[Dict[str, Any]] = None, weights: Optional[ndarray] = None) Dict[str, Dict[str, ndarray]][source]

spatial differential expression analysis

conduct spatial differential expression analysis (sDEA)

Parameters
  • data (Union[ad.AnnData, "m.Reference"]) – object (either anndata or reference) containing the spatial profiles to be compared.

  • group_col (str) – column to make comparison with respect to

  • n_std (int) – number of standard deviations that should be used when testing for differential expression. If the interval mean_1 +/- n_std*std_1 overlaps with the interval mean_2 +/- n_std*std_2 the features are considered as non-differentially expressed, defaults to 2

  • subset (Optional[Dict[str, Any]]) – subset groups in the contrastive analysis, for example subset={feature: value} will only compare those profiles where the value of feature is value, defaults to no subsetting

  • weights (Optional[np.ndarray]) – n_samples vector of weights, where the i:th value indicates the weight assigned to the i:th sample in the sdea analysis, defaults to 1/n_samples.

Returns

a dictionary where each analyzed feature is an entry, and each entry is a dictionary with two values: diff being the spot-wise difference between the samples, and sig being an indicator of whether the difference is significant or not.

Return type

Dict[str, Dict[str, np.ndarray]]
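
Example (a hedged sketch; ref is assumed to be a Reference with transferred features, and the group column name is hypothetical):

from eggplant.sdea import sdea
from eggplant.plot import visualize_sdea_results

dge_res = sdea(ref, group_col="condition", n_std=2)  # "condition" is a hypothetical column
visualize_sdea_results(ref, dge_res, cmap="RdBu_r", n_cols=2)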

eggplant.sdea.test_region_wise_enrichment(data: Union[AnnData, Reference], feature: str, region_1: Union[str, int], region_2: Union[str, int], include_models: Union[List[str], str] = 'composite', col_name: str = 'region', feature_col: str = 'feature', alpha: float = 0.05, n_permutations: Optional[int] = None) Dict[str, Dict[str, Union[float, bool, str]]][source]

region-wise enrichment test

This function tests whether feature is more highly expressed in region_1 compared to region_2, using a permutation test.

Parameters
  • data (Union[ad.AnnData, "m.Reference"]) – object containing feature data

  • feature (str,) – feature to inspect

  • region_1 (Union[str, int]) – label of region 1

  • region_2 (Union[str, int]) – label of region 2

  • include_models (Union[List[str], str]) – models to include, defaults to composite

  • col_name (str) – column name on adata.obs that indicates region label, defaults to region

  • feature_col (str) – column name in adata.var that indicates feature, defaults to feature

  • alpha (float) – significance level, defaults to 0.05

  • n_permutations (Optional[int]) – number of permutations to perform. n_permutations must be at least 1/alpha, otherwise an exception will be raised. Defaults to 1 / alpha.

Returns

Dictionary with the result of the permutation test. The keys are:

  • pvalue – calculated p-value

  • is_sig – whether the result is considered significant or not

  • feature – name of the feature that was examined

Return type

Dict[str, Dict[str, Union[float, bool, str]]]
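
Example (a hedged sketch; ref is assumed to be a Reference with a region annotation under col_name, and the feature and region labels are placeholders):

from eggplant.sdea import test_region_wise_enrichment

res = test_region_wise_enrichment(
    ref,
    feature="GeneA",    # hypothetical feature
    region_1="tumor",   # hypothetical region labels
    region_2="stroma",
    alpha=0.05,
)
print(res)  # per-model dict with pvalue, is_sig, and feature entries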