eggplant package

eggplant.methods module

class eggplant.methods.PoissonDiscSampler(crd: ndarray, min_dist: float, seed: Optional[int] = None)[source]

Bases: object

Poisson Disc Sampler designed according to the principles outlined in Bridson [Bri07], but for d=2.

add_k_in_annulus(point: Union[ndarray, Tuple[float, float]], k: int = 5) ndarray[source]

adds k points randomly in annulus around point

Parameters
  • point (Union[np.ndarray, Tuple[float, float]],) – point to create annulus around

  • k (int) – number of points to add to annulus, default 5

Returns

array of coordinates with added points

Return type

np.ndarray

coord_to_cell(point: Union[ndarray, Tuple[float, float]]) Tuple[int, int][source]
helper function, transforms coordinates to cell id (in grid)

Parameters

point (Union[np.ndarray, Tuple[float, float]]) – coordinates of point to get cell id for

Returns

a tuple (i,j) where i is the grid row index and j is the column index.

Return type

Tuple[int,int]

get_neighbors(idx: Tuple[int, int]) List[Tuple[int, int]][source]

get neighbors in grid

Note: includes self.

Parameters

idx (Tuple[int,int]) – index of cell to get neighbors of

Returns

list of neighbor indices

Return type

List[Tuple[int,int]]

sample(max_points: Optional[int] = None, k: Optional[int] = 4) ndarray[source]

sample from domain

Parameters
  • max_points (Optional[int]) – max number of points to sample from domain.

  • k (Optional[int]) – number of points to position in annulus, default 4

Returns

array of samples from domain

Return type

np.ndarray
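
Example (a minimal usage sketch; the coordinate array and parameter values below are placeholders, only the constructor and sample() signatures above come from this module):

import numpy as np
from eggplant.methods import PoissonDiscSampler

# hypothetical 2D coordinates defining the sampling domain (n_obs x 2)
rng = np.random.default_rng(0)
crd = rng.uniform(0, 100, size=(500, 2))

# request points that are at least 5 units apart
sampler = PoissonDiscSampler(crd, min_dist=5.0, seed=1)

# sample at most 50 points, placing 4 candidate points in each annulus
points = sampler.sample(max_points=50, k=4)
print(points.shape)  # (n_sampled, 2)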

eggplant.methods.estimate_n_landmarks(adatas: Union[AnnData, List[AnnData], Dict[str, AnnData]], n_max_lmks: int = 50, n_min_lmks: Optional[int] = 1, n_evals: int = 10, n_reps: int = 3, feature: Optional[str] = None, layer: Optional[str] = None, device: Literal['cpu', 'gpu'] = 'cpu', n_epochs: int = 1000, learning_rate: float = 0.01, subsample: Optional[Union[float, int]] = None, verbose: bool = False, spatial_key: str = 'spatial', max_cg_iterations: int = 1000, tail_length: int = 200, seed: int = 1, spread_distance: Optional[float] = None, diameter_multiplier: float = 10) Tuple[ndarray, Union[Dict[str, List[float]], List[float]], Optional[Union[List[float], Dict[str, float]]]][source]

Estimate how the number of landmarks influences the outcome

Parameters
  • adatas (Union[ad.AnnData, List[ad.AnnData], Dict[str, ad.AnnData]]) – a single AnnData object, or a list or dictionary of AnnData objects to be analyzed.

  • n_max_lmks (int) – max number of landmarks to include in the analysis, defaults to 50.

  • n_evals (int) – number of evaluations. The number of landmarks tested will be equally spaced in the interval [1, n_max_lmks], defaults to 10.

  • layer (Optional[str]) – which layer to use

  • device (Literal["cpu", "gpu"]) – which device to perform computations on, defaults to “cpu”

  • n_epochs (int) – number of epochs to use when learning the relationship between landmark distance and feature values, defaults to 1000.

  • learning_rate (float) – learning rate to use in optimization, defaults to 0.01.

  • subsample (Optional[Union[float, int]]) – whether to subsample the data or not. If a value less than 1 is given, it is interpreted as a fraction of the total number of observations; if larger than 1, as the absolute number of observations to keep. If exactly 1 or None, no subsampling will occur. Note, landmarks are selected before subsampling. Defaults to None.

  • verbose (bool) – set to True to use verbose mode, defaults to False.

  • spatial_key (str) – key to use to extract spatial coordinates from the obsm attribute. Defaults to “spatial”.

  • max_cg_iterations (int) – The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance (from GPyTorch documentation), defaults to 1000.

  • tail_length (int) – the last tail_length observations will be used to compute an average MLL value. If n_epochs is less than tail_length, all epochs will be used instead. Defaults to 200.

  • seed (int) – value of random seed, defaults to 1.

  • spread_distance (Optional[float]) – distance between points in Poisson Disc Sampling. Equivalent to min_dist.

  • diameter_multiplier (float) – applicable to assays where the

Returns

A tuple with a vector listing the number of landmarks used in each evaluation as first element and as second the corresponding average MLL values.

Return type

Tuple[np.ndarray, Union[Dict[str, List[float]], List[float]]]
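
Example (a hedged sketch of pairing this function with landmark_lower_bound(); the input file and feature name are hypothetical, only the call signatures come from this module):

import anndata as ad
from eggplant.methods import estimate_n_landmarks, landmark_lower_bound

adata = ad.read_h5ad("sample.h5ad")  # hypothetical input file

# evaluate 10 equally spaced landmark numbers between 1 and 30
n_lmks, losses, _ = estimate_n_landmarks(
    adata,
    n_max_lmks=30,
    n_evals=10,
    feature="GeneA",  # hypothetical feature name
    n_epochs=1000,
    device="cpu",
    seed=1,
)

# automatic lower bound on the number of landmarks (see landmark_lower_bound below)
lower = landmark_lower_bound(n_lmks, losses)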

eggplant.methods.fa_transfer_to_reference(adatas: Union[AnnData, List[AnnData], Dict[str, AnnData]], reference: Reference, variance_threshold: float = 0.3, n_components: Optional[int] = None, use_highly_variable: bool = False, layer: Optional[str] = None, device: Literal['cpu', 'gpu', 'cuda'] = 'cpu', n_epochs: int = 1000, learning_rate: float = 0.01, subsample: Optional[Union[float, int]] = None, verbose: bool = False, return_models: bool = False, return_losses: bool = True, max_cg_iterations: int = 1000, meta_key: str = 'meta', inference_method: Literal['exact', 'variational'] = 'exact', **kwargs) Dict[str, Union[List[Union[GPModelExact, GPModelApprox]], List[ndarray]]][source]

fast approximate transfer of observed data to a reference

similar to transfer_to_reference(), but designed for fast approximate transfer of the full set of features. To speed up the transfer process we project the data into a low-dimensional space, transfer this representation, and then reconstruct the original data. This significantly reduces the number of features that need to be transferred to the reference, but comes at the cost of an approximate representation, whose fidelity depends on the specified variance_threshold or n_components parameter.

Parameters
  • adatas (Union[ad.AnnData, List[ad.AnnData], Dict[str, ad.AnnData]]) – AnnData objects holding data to transfer

  • reference (m.Reference) – reference to transfer data to

  • variance_threshold (float) – fraction of variance that principal components should explain

  • n_components (Optional[int]) – use instead of variance_threshold. If specified, exactly n_components principal components will be used.

  • use_highly_variable (bool) – only use highly_variable_genes to compute the principal components. Default is False.

  • layer (Optional[str]) – which layer to extract data from, defaults to raw

  • device (Literal["cpu","gpu","cuda"]) – device to use for computations, defaults to “cpu”

  • n_epochs (int) – number of epochs to use, defaults to 1000

  • learning_rate (float) – learning rate, defaults to 0.01

  • subsample (Optional[Union[float, int]]) – if <= 1, interpreted as the fraction of observations to keep; if > 1, as the number of observations to keep in subsampling, defaults to None (no subsampling)

  • verbose (bool) – set to True to use verbose mode, defaults to False

  • return_models (bool) – set to True to return fitted models, defaults to False

  • return_losses (bool) – return loss history of each model, defaults to True

  • max_cg_iterations (int) – The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance (from GPyTorch documentation), defaults to 1000.

  • meta_key (str) – key in uns slot that holds additional meta info
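
Example (a minimal sketch; adatas is assumed to be a dict of preprocessed AnnData objects with landmark distances, and ref an already constructed Reference, see the models module below):

from eggplant.methods import fa_transfer_to_reference

results = fa_transfer_to_reference(
    adatas,                  # dict of AnnData objects (assumed to exist)
    reference=ref,           # eggplant.models.Reference (assumed to exist)
    variance_threshold=0.3,  # components should explain 30% of the variance
    n_epochs=1000,
    device="cpu",
    return_losses=True,
)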

eggplant.methods.fit(model: Union[GPModelExact, GPModelApprox], n_epochs: int, optimizer: Optional[Optimizer] = None, fast_computation: bool = True, learning_rate: float = 0.01, verbose: bool = False, seed: int = 0, batch_size: Optional[int] = None, progress_message: Optional[str] = None, **kwargs) None[source]

fit GP Model

Parameters
  • model (m.GPModel,) – Model to fit

  • n_epochs (int) – number of epochs

  • optimizer (Optional[t.optim.Optimizer]) – optimizer to use during fitting, defaults to Adam

  • fast_computation (bool = True,) – whether to use fast approximations to functions, defaults to True

  • learning_rate (float) – learning rate, defaults to 0.01

  • verbose (bool = False,) – set to True for verbose mode (prints progress), defaults to False

  • seed (int) – random seed, defaults to 0

  • batch_size (int) – Batch size. Defaults to None.

  • progress_message (str) – message to include in progress bar
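
Example (a hedged sketch of constructing and fitting an exact GP model by hand; transfer_to_reference() below wraps these steps, and the tensors here are random placeholders rather than real landmark distances):

import torch as t
from eggplant.methods import fit
from eggplant.models import GPModelExact

# placeholder data: 200 observations, 6 landmarks, a single feature
landmark_distances = t.rand(200, 6)
feature_values = t.rand(200)

model = GPModelExact(
    landmark_distances=landmark_distances,
    feature_values=feature_values,
    device="cpu",
)

fit(model, n_epochs=500, learning_rate=0.01, verbose=True)
print(model.loss_history[-1])  # final recorded loss value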

eggplant.methods.landmark_lower_bound(n_lmks: ndarray, losses: Union[List[ndarray], ndarray, Dict[str, ndarray]], kneedle_s_param: float = 1) float[source]

automatic identification of a lower bound for the number of landmarks, based on the result from estimate_n_landmarks().

eggplant.methods.transfer_to_reference(adatas: Union[AnnData, List[AnnData], Dict[str, AnnData]], features: Union[str, List[str]], reference: Reference, layer: Optional[str] = None, device: Literal['cpu', 'gpu', 'cuda'] = 'cpu', n_epochs: int = 1000, learning_rate: float = 0.01, subsample: Optional[Union[float, int]] = None, verbose: bool = False, return_models: bool = False, return_losses: bool = True, max_cg_iterations: int = 1000, meta_key: str = 'meta', inference_method: Union[Literal['exact', 'variational'], List[Literal['exact', 'variational']], Dict[str, Literal['exact', 'variational']]] = 'exact', n_inducing_points: Optional[int] = None, batch_size: Optional[int] = None, **kwargs) Dict[str, Union[List[Union[GPModelExact, GPModelApprox]], List[ndarray]]][source]

transfer observed data to a reference

Parameters
  • adatas (Union[ad.AnnData, List[ad.AnnData], Dict[str, ad.AnnData]]) – AnnData objects holding data to transfer

  • features (Union[str, List[str]]) – name of feature(s) to transfer

  • reference (m.Reference) – reference to transfer data to

  • layer (Optional[str]) – which layer to extract data from, defaults to raw

  • device (Literal["cpu","gpu","cuda"]) – device to use for computations, defaults to “cpu”

  • n_epochs (int) – number of epochs to use, defaults to 1000

  • learning_rate (float) – learning rate, defaults to 0.01

  • subsample (Optional[Union[float, int]] = None,) – if <= 1, interpreted as the fraction of observations to keep; if > 1, as the number of observations to keep in subsampling, defaults to None (no subsampling)

  • verbose (bool) – set to True to use verbose mode, defaults to False

  • return_models (bool) – set to True to return fitted models, defaults to False

  • return_losses (bool) – return loss history of each model, defaults to True

  • max_cg_iterations (int = 1000,) – The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance (from GPyTorch documentation), defaults to 1000.

  • meta_key (str) – key in uns slot that holds additional meta info

  • inference_method (Union[Literal["exact", "variational"], List[Literal["exact", "variational"]], Dict[str, Literal["exact", "variational"]]]) – which inference method to use, the options are “exact” and “variational” - use “variational” for large data sets. If a single string is given then this method is applied to all objects in the data set, otherwise a list or dict specifying the inference method for each object can be given. Defaults to “exact”.

  • n_inducing_points (int) – number of inducing points to use in variational inference. Defaults to None, which uses 10% of the observations in the observed data if n_obs < 1e5, otherwise 10000.

  • batch_size (int) – batch_size to use in approximate (variational) inference. Must be less than or equal to the number of observations in a sample.
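
Example (an end-to-end sketch; adatas and ref are assumed to exist as in the preprocessing workflow, and the feature names are hypothetical):

from eggplant.methods import transfer_to_reference

res = transfer_to_reference(
    adatas,                       # list or dict of AnnData objects (assumed to exist)
    features=["GeneA", "GeneB"],  # hypothetical feature names
    reference=ref,                # eggplant.models.Reference (assumed to exist)
    n_epochs=1000,
    device="cpu",
    inference_method="exact",
    return_losses=True,
)

With return_losses=True the returned dictionary holds the loss history of each fitted model, which can be inspected with model_diagnostics() in the plot module.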

eggplant.models module

class eggplant.models.BaseGP(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu')[source]

Bases: object

BaseModel for GP Regression

should be combined with one of the gpytorch.models classes, e.g., ApproximateGP or ExactGP

__init__(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu') None[source]

Constructor method

Parameters
  • landmark_distances (Union[t.Tensor, pd.DataFrame, np.ndarray]) – n_obs x n_landmarks array with distance to each landmark for every observation

  • feature_values (t.Tensor) – n_obs x n_feature array with feature values for each observation

  • mean_fun (gp.means.Mean, optional) – mean function

  • kernel_fun (gp.kernels.Kernel, optional) – kernel to use in the covariance function

  • device (Literal["cpu","gpu"]) – device to execute operations on, defaults to “cpu”

property loss_history: List[float]

Loss history record

class eggplant.models.GPModelApprox(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, inducing_points: Union[Tensor, DataFrame, ndarray], landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu', learn_inducing_points: bool = True)[source]

Bases: BaseGP, ApproximateGP

__init__(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, inducing_points: Union[Tensor, DataFrame, ndarray], landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu', learn_inducing_points: bool = True) None[source]

Constructor method for approximate (variational) inference using inducing points

Parameters
  • landmark_distances (Union[t.Tensor, pd.DataFrame, np.ndarray]) – n_obs x n_landmarks array with distance to each landmark for every observation

  • feature_values (t.Tensor) – n_obs x n_feature array with feature values for each observation

  • inducing_points (t.Tensor) – points to use as inducing points; if learn_inducing_points = True these act as the initialization of the inducing points.

  • likelihood (gp.likelihoods.Likelihood, optional) – likelihood function

  • mean_fun (gp.means.Mean, optional) – mean function

  • kernel_fun (gp.kernels.Kernel, optional) – kernel to use in the covariance function

  • device (Literal["cpu","gpu"]) – device to execute operations on, defaults to “cpu”

  • learn_inducing_points (bool) – whether or not to treat inducing points as parameters to be learnt. Default is True.

forward(x: tensor) tensor[source]

forward step in prediction

training: bool

class eggplant.models.GPModelExact(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu')[source]

Bases: BaseGP, ExactGP

__init__(landmark_distances: Union[Tensor, DataFrame, ndarray], feature_values: Tensor, landmark_names: Optional[List[str]] = None, likelihood: Optional[Likelihood] = None, mean_fun: Optional[Mean] = None, kernel_fun: Optional[Kernel] = None, device: Literal['cpu', 'gpu'] = 'cpu') None[source]

Constructor method for exact inference

Parameters
  • landmark_distances (Union[t.Tensor, pd.DataFrame, np.ndarray]) – n_obs x n_landmarks array with distance to each landmark for every observation

  • feature_values (t.Tensor) – n_obs x n_feature array with feature values for each observation

  • likelihood (gp.likelihoods.Likelihood, optional) – likelihood function

  • mean_fun (gp.means.Mean, optional) – mean function

  • kernel_fun (gp.kernels.Kernel, optional) – kernel to use in the covariance function

  • device (Literal["cpu","gpu"]) – device to execute operations on, defaults to “cpu”

forward(x: tensor) tensor[source]

forward step in prediction

training: bool

class eggplant.models.Reference(domain: Union[tensor, ndarray], landmarks: Union[tensor, ndarray, DataFrame], meta: Optional[Union[DataFrame, dict]] = None)[source]

Bases: object

Reference Container

__init__(domain: Union[tensor, ndarray], landmarks: Union[tensor, ndarray, DataFrame], meta: Optional[Union[DataFrame, dict]] = None) None[source]

Constructor function

Parameters
  • domain (Union[t.tensor, np.ndarray]) – n_obs x n_dims spatial coordinates of observations

  • landmarks (Union[t.tensor, np.ndarray, pd.DataFrame]) – n_landmarks x n_dims spatial coordinates of landmarks

  • meta (Optional[Union[pd.DataFrame, dict]], optional) – n_obs x n_categories meta data

clean() None[source]

clean reference from transferred data

composite_representation(by: str = 'feature')[source]

produce composite representation

Parameters

by (str, defaults to "feature") – consensus representation with respect to this meta data feature

plot(models: Optional[Union[List[str], str]] = None, *args, **kwargs) None[source]

quick plot function

Parameters
  • models (Union[List[str], str], optional) – models to be visualized, if None then all are displayed

  • *args – args to sc.pl.spatial

  • **kwargs – kwargs to sc.pl.spatial

transfer(models: Union[BaseGP, List[BaseGP]], meta: Optional[DataFrame] = None, names: Optional[Union[List[str], str]] = None) None[source]

transfer fitted models to reference

Parameters
  • models (Union[GPModel, List[GPModel]]) – Models to be transferred

  • meta (Optional[pd.DataFrame], optional) – model meta data, e.g., sample

  • names (Optional[Union[List[str], str]], optional) – name of models
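
Example (a minimal construction sketch using random placeholder coordinates; reference_to_grid() in the preprocess module is the usual way to obtain the domain):

import numpy as np
from eggplant.models import Reference

rng = np.random.default_rng(1)
domain = rng.uniform(0, 10, size=(1000, 2))  # n_obs x 2 grid coordinates
landmarks = rng.uniform(0, 10, size=(6, 2))  # n_landmarks x 2 landmark coordinates

ref = Reference(domain=domain, landmarks=landmarks)
# after transfer_to_reference(...) or ref.transfer(models),
# transferred features can be inspected with ref.plot()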

eggplant.plot module

class eggplant.plot.ColorMapper(cmap: Dict[T, str])[source]

Bases: object

helper class for colormaps

makes it easier to get color values for arrays and lists.

eggplant.plot.distplot_transfer(ref: Reference, inside: Dict[str, str], outside: Optional[Dict[str, str]] = None, n_cols: Optional[int] = None, n_rows: Optional[int] = None, side_size: float = 4.0, swarm_marker_style: Optional[Dict[str, Any]] = None, mean_marker_style: Optional[Dict[str, Any]] = None, display_grid: bool = True, title_fontsize: float = 25, label_fontsize: float = 20, ticks_fontsize: float = 15, return_figures: bool = True) Optional[Tuple[Figure, Axes]][source]

Swarmplot-like visualization of enrichment

Parameters
  • ref ("m.Reference",) – Reference holding transferred data

  • outside (Optional[Dict[str, str]]) – attribute to compare features within. If None all inside features will be compared together.

  • inside (Dict[str, str],) – feature to compare within outer attribute

  • n_cols (Optional[int]) – number of columns, defaults to None

  • n_rows (Optional[int] = None,) – number of rows, defaults to None

  • side_size (float) – size of each outer panel, defaults to 4

  • swarm_marker_style (Optional[Dict[str, Any]]) – data marker style, defaults to None

  • mean_marker_style (Optional[Dict[str, Any]]) – marker style of mean indicator, defaults to None

  • display_grid (bool) – set to True if grid shall be displayed in background, defaults to True

  • title_fontsize (float) – fontsize of title, defaults to 25

  • label_fontsize (float) – fontsize of x- and y-labels, defaults to 20

  • ticks_fontsize (float) – fontsize of x- and y-ticks, defaults to 15

  • return_figures (bool) – set to True if Figure and Axes objects should be returned, defaults to True

Returns

None or Figure and Axes objects, depending on return_figures value.

Return type

Union[None,Tuple[plt.Figure,plt.Axes]]

eggplant.plot.landmark_diagnostics(lmk_eval_res: Tuple[List[List[float]], Union[List[float], Dict[str, float]]], side_size: Optional[Union[Tuple[float, float], float]] = None, title_fontsize: float = 20, lower_bound: Optional[Union[List[float], Dict[str, float]]] = None, line_style_dict: Optional[Dict[str, Any]] = None, label_style_dict: Optional[Dict[str, Any]] = None, ticks_style_dict: Optional[Dict[str, Any]] = None, return_figures: bool = False, savgol_params: Optional[Dict[str, Any]] = None) Optional[Tuple[Figure, Axes]][source]
Parameters
  • lmk_eval_res (Tuple[List[List[float]], Union[List[float], Dict[str, float]]]) – output from estimate_n_landmarks()

  • side_size (Optional[Union[Tuple[float, float], float]] = None,) – side size of plot, if None then based on data, defaults to None

  • title_fontsize (float) – fontsize of plot title, default 20

  • lower_bound (Optional[Union[Dict[str, float], List[float]]]) – lower_bound value (for number of landmarks), will be indicated by a black dashed line

  • line_style_dict (Optional[Dict[str, Any]]) – dictionary with style parameters for matplotlib.pyplot.plot(), default None

  • label_style_dict (Optional[Dict[str, Any]]) – dictionary with style parameters for matplotlib.pyplot.xlabel() and matplotlib.pyplot.ylabel(), default None

  • ticks_style_dict (Optional[Dict[str, Any]]) – dictionary with style parameters for xaxis.set_tick_params and yaxis.set_tick_params

  • return_figures (bool) – set to true to return figure, otherwise only show, default False

  • savgol_params (Dict[str, Any]) – parameters to scipy.signal.savgol_filter, default parameters polyorder=4 and window_length=5

Returns

plot of nMLL loss against the number of landmarks

Return type

Optional[Tuple[plt.Figure, plt.Axes]]

eggplant.plot.model_diagnostics(models: Optional[Union[Dict[str, m.GPModel], m.GPModel]] = None, losses: Optional[Union[Dict[str, ndarray], ndarray]] = None, n_cols: int = 5, width: float = 5, height: float = 3, return_figures: bool = False) Optional[Tuple[Figure, axes]][source]

plot loss history for models

can take either a set of models or losses.

Parameters
  • models (Optional[Union[Dict[str, "m.GPModel"], "m.GPModel"]] = None,) – models to investigate

  • losses (Optional[Union[Dict[str, np.ndarray], np.ndarray]] = None,) – losses to visualize

  • n_cols (int) – number of columns, defaults to 5

  • width (float) – width of each subplot panel (visualizing one model’s loss over time), defaults to 5

  • height (float = 3,) – height of each subplot panel (visualizing one model’s loss over time), defaults to 3

  • return_figures (bool = False,) – set to True if Figure and Axes objects should be returned, defaults to False

Returns

None or Figure and Axes objects, depending on return_figure value.

Return type

Optional[Tuple[plt.Figure,plt.Axes]]

eggplant.plot.visualize_landmark_spread(adata: AnnData, feature: Optional[str] = None, layer: Optional[str] = None, spread_distance: Optional[float] = None, diameter_multiplier: float = 1, side_size: float = 5, marker_size: float = 20, landmark_marker_size: Optional[float] = None, label_fontsize: float = 10, title_fontsize: float = 20, seed: int = 1, text_h_offset: Optional[float] = None, bbox_style: Optional[Dict[str, Any]] = None, return_figures: bool = False) Optional[Tuple[Figure, Axes]][source]

function to visualize the landmark distribution for a given set of parameters. Here Poisson Disc Sampling (PDS) is used to randomly distribute the landmarks within the spatial domain.

Parameters
  • adata (ad.AnnData) – data to visualize

  • feature (Optional[str]) – feature to visualize, if None then (normalized) sum across all features, defaults to None

  • layer (Optional[str]) – layer to use for feature extraction, defaults to None

  • spread_distance (Optional[float]) – see spread_distance in estimate_n_landmarks()

  • diameter_multiplier (float) – see diameter_multiplier in estimate_n_landmarks()

  • side_size (float) – side size of plot, defaults to 5

  • marker_size (float) – marker size of spatial locations, defaults to 20

  • landmark_marker_size (Optional[float]) – size of landmark indicators, if None then twice that of marker_size, defaults to None

  • label_fontsize (float) – fontsize of landmark enumeration labels, defaults to 10

  • title_fontsize (float) – fontsize of title, defaults to 20

  • seed (int) – random seed (numpy)

  • text_h_offset (Optional[float]) – horizontal distance between landmark enumeration label and landmark marker, if None then spread_distance / 5, defaults to None

  • bbox_style (Optional[Dict[str, Any]]) – see bbox argument for matplotlib.pyplot.text()

  • return_figures (bool) – return figure and axes objects, defaults to False

Returns

Figure and Axes object if return_figures=True, else plots the landmarks in the spatial domain.

Return type

Optional[Tuple[plt.Figure, plt.Axes]]

eggplant.plot.visualize_observed(adatas: Union[Dict[str, AnnData], List[AnnData]], features: Optional[Union[List[str], str]], layer: Optional[str] = None, **kwargs) None[source]

Visualize observed data to be transferred

Parameters
  • adatas (Union[Dict[str,ad.AnnData],List[ad.AnnData]]) – List or dictionary of AnnData objects holding the data to be transferred

  • features (Union[str,List[str]]) – Name of feature to be visualized

  • n_cols (Optional[int]) – number of desired columns

  • n_rows (Optional[int]) – number of desired rows

  • marker_size (float) – scatter plot marker size

  • show_landmarks (bool) – show landmarks in plot

  • landmark_marker_size (float) – size of landmarks

  • side_size (float) – side size for each figure subplot

  • landmark_cmap (Optional[Dict[int,str]], optional) – colormap for landmarks

  • share_colorscale (bool) – set to true if subplots should all have the same colorscale

  • return_figures (bool) – set to true if figure and axes objects should be returned

  • include_colorbar (bool) – set to true to include colorbar

  • separate_colorbar (bool) – set to true if colorbar should be plotted in separate figure, only possible when share_colorscale = True

  • colorbar_orientation (str) – choose between ‘horizontal’ and ‘vertical’ for orientation of colorbar

  • include_title (bool) – set to true to include title

  • fontsize (str) – font size of title

  • hspace (Optional[float]) – height space between subplots. If none then default matplotlib settings are used.

  • wspace (Optional[float]) – width space between subplots. If none then default matplotlib settings are used.

  • quantile_scaling (bool) – set to true to use quantile scaling. Can help to minimize quenching effect of outliers

  • flip_y (bool) – set to true if y-axis should be flipped

  • colorbar_fontsize (float) – fontsize of colorbar ticks

  • show_image (bool) – show tissue image in background.

Returns

None or Figure and Axes objects, depending on return_figure value.

Return type

Optional[Tuple[plt.Figure,plt.Axes]]

eggplant.plot.visualize_sdea_results(ref: m.GPModel, dge_res: Dict[str, ndarray], cmap: str = 'RdBu_r', n_cols: int = 4, marker_size: float = 10, side_size: float = 8, title_fontsize: float = 20, colorbar_fontsize: float = 20, colorbar_orientation: Literal['horizontal', 'vertical'] = 'horizontal', no_sig_color: str = 'lightgray', reorder_axes: Optional[List[int]] = None, return_figures: bool = False) Optional[Tuple[Figure, Axes]][source]

Visualize result from spatial differential expression analysis

Parameters
  • ref (m.GPModel) – reference object of type Reference, information should have been transferred to the object.

  • dge_res (Dict[str, np.ndarray],) – result from sdea()

  • cmap (str) – colormap to use (choose from matplotlib), defaults to RdBu_r

  • n_cols (int) – number of columns, defaults to 4

  • marker_size (float) – size of marker, defaults to 10

  • title_fontsize (float) – fontsize of title, defaults to 20

  • colorbar_fontsize (float) – fontsize of colorbar ticks, defaults to 20

  • colorbar_orientation (Literal["horizontal", "vertical"]) – orientation of colorbar, defaults to horizontal

  • no_sig_color (str) – color of locations with a non-significant difference in expression, defaults to lightgray

  • reorder_axes (Optional[List[int]]) – new order of axes; original order is [0, 1, 2, ...], give the new order as [1, 0, 2, ...] to switch the places of subplots 1 and 0.

  • return_figures (bool) – return figure and axes objects. Default value is False.

Returns

Figure and Axes objects

Return type

Tuple[plt.Figure,plt.Axes]

eggplant.plot.visualize_transfer(reference: Union[Reference, AnnData], attributes: Optional[Union[List[str], str]] = None, layer: Optional[str] = None, **kwargs) Optional[Tuple[Figure, Axes]][source]

Visualize results after transfer to reference

Parameters
  • reference (Union[m.Reference, ad.AnnData]) – reference object or AnnData holding transferred values

  • attributes (Optional[Union[List[str],str]]) – visualize transferred models with these attributes. Must be found in .var slot. If none specified, all transfers will be visualized.

  • layer (str) – name of layer to use

  • n_cols (Optional[int]) – number of desired columns

  • n_rows (Optional[int]) – number of desired rows

  • marker_size (float) – scatter plot marker size

  • show_landmarks (bool) – show landmarks in plot

  • landmark_marker_size (float) – size of landmarks

  • side_size (float) – side size for each figure subplot

  • landmark_cmap (Optional[Dict[int,str]], optional) – colormap for landmarks

  • share_colorscale (bool) – set to true if subplots should all have the same colorscale

  • return_figures (bool) – set to true if figure and axes objects should be returned

  • include_colorbar (bool) – set to true to include colorbar

  • separate_colorbar (bool) – set to true if colorbar should be plotted in separate figure, only possible when share_colorscale = True

  • colorbar_orientation (str) – choose between ‘horizontal’ and ‘vertical’ for orientation of colorbar

  • include_title (bool) – set to true to include title

  • fontsize (str) – font size of title

  • hspace (Optional[float]) – height space between subplots. If none then default matplotlib settings are used.

  • wspace (Optional[float]) – width space between subplots. If none then default matplotlib settings are used.

  • quantile_scaling (bool) – set to true to use quantile scaling. Can help to minimize quenching effect of outliers

  • flip_y (bool) – set to true if y-axis should be flipped

  • colorbar_fontsize (float) – fontsize of colorbar ticks

  • show_image (bool) – show tissue image in background.

Returns

None or Figure and Axes objects, depending on return_figure value.

Return type

Optional[Tuple[plt.Figure,plt.Axes]]

eggplant.preprocess module

eggplant.preprocess.default_normalization(adata: AnnData, min_cells: float = 0.1, total_counts: float = 10000.0, exclude_highly_expressed: bool = False, compute_highly_variable_genes: bool = False, n_top_genes: int = 2000) None[source]

default normalization recipe

the normalization strategy that was applied for the majority of the analyses presented in the original manuscript. We abstain from calling it a recommended strategy, as the best strategy depends on your data. However, this strategy has worked well with several data types.

The recipe is based on preprocessing functions from the scanpy preprocessing module (sc.pp) and is given as follows:

sc.pp.filter_genes(adata, min_cells=min_cells)
sc.pp.normalize_total(adata, total_counts,
                      exclude_highly_expressed=exclude_highly_expressed)
sc.pp.log1p(adata)
sc.pp.scale(adata)

Parameters
  • adata (ad.AnnData,) – anndata object to normalize

  • min_cells (float = 0.1,) – argument to sc.pp.filter_genes()

  • total_counts (float) – argument to sc.pp.normalize_total(), default is 1e4

  • exclude_highly_expressed (bool) – argument to sc.pp.normalize_total(), default False
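
Example (usage is a single in-place call on a raw AnnData object; the file name is a placeholder):

import anndata as ad
from eggplant.preprocess import default_normalization

adata = ad.read_h5ad("visium_sample.h5ad")  # hypothetical input
default_normalization(adata, min_cells=0.1, total_counts=1e4)
# adata is now filtered, depth-normalized, log1p-transformed, and scaled in place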

eggplant.preprocess.get_landmark_distance(adata: AnnData, landmark_position_key: str = 'curated_landmarks', landmark_distance_key: str = 'landmark_distances', reference: Optional[Union[Reference, ndarray]] = None, **kwargs) None[source]

compute landmark distances

Parameters
  • adata (ad.AnnData) – AnnData object where distance between landmarks and observations should be measured

  • landmark_position_key (str) – key of landmark coordinates, defaults to "curated_landmarks"

  • landmark_distance_key (str) – key to use for landmark distances in .obsm, defaults to "landmark_distances"

  • reference (Optional[Union[m.Reference, np.ndarray]]) – provide reference if non-homogeneous distortions should be corrected for using TPS (thin plate splines)
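
Example (a hedged sketch of one plausible preprocessing order; adata is assumed to carry curated landmark coordinates under "curated_landmarks" and ref is an existing Reference):

from eggplant.preprocess import get_landmark_distance, match_scales

match_scales(adata, reference=ref)  # align scales between observed data and reference
get_landmark_distance(
    adata,
    landmark_position_key="curated_landmarks",
    reference=ref,
)
# distances are stored in adata.obsm["landmark_distances"]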

eggplant.preprocess.intersect_features(adatas: Union[List[AnnData], Dict[str, AnnData]]) None[source]

eggplant.preprocess.join_adatas(adatas: List[AnnData], **kwargs) None[source]

join together a set of AnnData objects

Parameters

adatas (List[ad.AnnData]) – AnnData objects to be merged

eggplant.preprocess.joint_highly_variable_genes(adatas: Union[List[AnnData], Dict[str, AnnData]], **kwargs) None[source]

eggplant.preprocess.match_scales(adata: AnnData, reference: Union[ndarray, Reference]) None[source]

match scale between observed and spatial domains

Simple scaling with a single value based on the distances between landmarks.

Parameters
  • adata (ad.AnnData) – AnnData object holding observed data

  • reference (Union[np.ndarray, "m.Reference"]) – Reference to which observed data will be transferred

eggplant.preprocess.reference_to_grid(ref_img: Union[Image, str], n_approx_points: int = 1000.0, background_color: Union[str, ndarray, tuple] = 'white', n_regions: int = 1) Tuple[ndarray, ndarray][source]

convert image to grid of observations

when creating a reference we will discretize the domain into fixed locations where feature values will be predicted

Parameters
  • ref_img (Union[Image.Image, str]) – PIL.Image or path of/to reference image

  • n_approx_points (int = 1e3,) – approximate number of points to include in the discretized grid. The number of grid points will be of the same order of magnitude as the provided number, defaults to 1000.

  • background_color (Union[str, np.ndarray, tuple]) – background color of the reference image, all elements with this color will be excluded. Can be either an array/tuple of RGB values or a matplotlib color string. Defaults to “white”.

  • n_regions (int = 1,) – number of regions (indicated by different colors) contained in the reference.

Returns

A tuple where the first element is an n_obs x 2 array representing the coordinates of each grid point. The second element is an n_obs numeric vector where the i:th element indicates the region that the i:th observation belongs to.

Return type

Tuple[np.ndarray,np.ndarray]
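
Example (a sketch of building a Reference from a tissue outline image; the image path and landmark coordinates are placeholders):

import numpy as np
from eggplant.preprocess import reference_to_grid
from eggplant.models import Reference

crd, region_labels = reference_to_grid(
    "reference_outline.png",  # hypothetical image path
    n_approx_points=1000,
    background_color="white",
    n_regions=1,
)

# placeholder landmark coordinates in the reference coordinate system
landmarks = np.array([[120.0, 80.0], [240.0, 95.0], [180.0, 200.0]])
ref = Reference(domain=crd, landmarks=landmarks)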

eggplant.preprocess.spatial_smoothing(adata: AnnData, distance_key: str = 'spatial', n_neigh: int = 4, coord_type: Union[str, CoordType] = 'generic', sigma: float = 50, **kwargs) None[source]

spatial smoothing function

Parameters
  • adata (ad.AnnData,) – AnnData object holding data to be smoothed

  • distance_key (str) – key holding spatial coordinates in .obsm, defaults to spatial

  • n_neigh (int) – number of neighbors to use for smoothing, defaults to 4

  • coord_type (Union[str, CoordType],) – type of coordinates, see squidpy documentation for more information, defaults to “generic”.

  • sigma (float = 50,) – sigma value to use in smoothing; higher values give far-away points more influence on a given grid point.

eggplant.sdea module

eggplant.sdea.get_sde_features(data: Union[AnnData, Reference], group_by: str = 'model', compare: str = 'feature', labels: Optional[str] = None, n_features: Optional[int] = None) Dict[str, Series][source]

Get spatially differentially expressed (SDE) genes

will identify genes that exhibit different spatial distributions between two different conditions.

Parameters
  • data (Union[ad.AnnData, "m.Reference"])

  • group_by (str)

  • compare (str)

  • labels (Optional[str])

  • n_features (Optional[int])

eggplant.sdea.mixed_normal(mus: ndarray, vrs: ndarray, ws: Optional[ndarray] = None) Tuple[ndarray, ndarray][source]

mean and var for weighted mixed normal.

For n distributions \(N_i(\mu_i,\sigma^2_i)\) we compute the mean and variance for the new weighted mix:

\(\mu_{new} = \sum_{i=1}^n w_i\mu_i\)

\(\sigma^2_{new} = \sum_{i=1}^n w_i^2\sigma^2_i\)

Parameters
  • mus (np.ndarray) – mean values for each sample

  • vrs (np.ndarray) – variance values for each sample

  • ws (Optional[np.ndarray]) – weights to use when computing the new mean for the mixed distribution.

Returns

a tuple being \((\mu_{new},\sigma^2_{new})\)

Return type

Tuple[np.ndarray,np.ndarray]
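
A small numerical illustration of the formulas above, using numpy directly (this mirrors what mixed_normal() computes for the given weights):

import numpy as np

mus = np.array([1.0, 3.0])   # means of the two component distributions
vrs = np.array([0.5, 0.2])   # variances of the two components
ws = np.array([0.25, 0.75])  # weights

mu_new = np.sum(ws * mus)      # 0.25*1.0 + 0.75*3.0 = 2.5
var_new = np.sum(ws**2 * vrs)  # 0.0625*0.5 + 0.5625*0.2 = 0.14375
print(mu_new, var_new)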

eggplant.sdea.sdea(data: Union[AnnData, Reference], group_col: str, n_std: int = 2, subset: Optional[Dict[str, Any]] = None, weights: Optional[ndarray] = None) Dict[str, Dict[str, ndarray]][source]

spatial differential expression analysis

conduct spatial differential expression analysis (sDEA)

Parameters
  • data (Union[ad.AnnData, "m.Reference"]) – object (either anndata or reference) containing the spatial profiles to be compared.

  • group_col (str) – column to make comparison with respect to

  • n_std (int) – number of standard deviations that should be used when testing for differential expression. If the interval mean_1 +/- n_std*std_1 overlaps with the interval mean_2 +/- n_std*std_2 the features are considered as non-differentially expressed, defaults to 2

  • subset (Optional[Dict[str, Any]]) – subset groups in the contrastive analysis, for example subset={feature: value} will only compare those profiles where the value of feature is value, defaults to no subsetting

  • weights (Optional[np.ndarray]) – n_samples vector of weights, where the i:th value indicates the weight assigned to the i:th sample in the sdea analysis, defaults to 1/n_samples.

Returns

a dictionary where each analyzed feature is an entry, and each entry is a dictionary with two values: diff being the spot-wise difference between the samples, and sig being an indicator of whether the difference is significant or not.

Return type

Dict[str, Dict[str, np.ndarray]]
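
Example (a hedged sketch; ref is assumed to be a Reference with transferred features, and the group column name is hypothetical):

from eggplant.sdea import sdea
from eggplant.plot import visualize_sdea_results

dge_res = sdea(ref, group_col="condition", n_std=2)  # "condition" is a hypothetical column
visualize_sdea_results(ref, dge_res, cmap="RdBu_r", n_cols=2)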

eggplant.sdea.test_region_wise_enrichment(data: Union[AnnData, Reference], feature: str, region_1: Union[str, int], region_2: Union[str, int], include_models: Union[List[str], str] = 'composite', col_name: str = 'region', feature_col: str = 'feature', alpha: float = 0.05, n_permutations: Optional[int] = None) Dict[str, Dict[str, Union[float, bool, str]]][source]

region-wise enrichment test

This function tests whether feature is more highly expressed in region_1 compared to region_2, using a permutation test.

Parameters
  • data (Union[ad.AnnData, "m.Reference"]) – object containing feature data

  • feature (str,) – feature to inspect

  • region_1 (Union[str, int]) – label of region 1

  • region_2 (Union[str, int]) – label of region 2

  • include_models (Union[List[str], str]) – models to include, defaults to composite

  • col_name (str) – column name on adata.obs that indicates region label, defaults to region

  • feature_col (str) – column name in adata.var that indicates feature, defaults to feature

  • alpha (float) – significance level, defaults to 0.05

  • n_permutations (Optional[int]) – number of permutations to perform. n_permutations must be at least 1/alpha, otherwise an exception will be raised. Defaults to 1 / alpha.

Returns

Dictionary with the result of the permutation test. The keys are:

  • pvalue – calculated p-value

  • is_sig – whether the result is considered significant or not

  • feature – name of the feature that was examined

Return type

Dict[str, Dict[str, Union[float, bool, str]]]
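
Example (a hedged sketch; ref is assumed to be a Reference with a region annotation under col_name, and the feature and region labels are placeholders):

from eggplant.sdea import test_region_wise_enrichment

res = test_region_wise_enrichment(
    ref,
    feature="GeneA",    # hypothetical feature
    region_1="tumor",   # hypothetical region labels
    region_2="stroma",
    alpha=0.05,
)
print(res)  # per-model dict with pvalue, is_sig, and feature entries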