UMAP Dimensionality Reduction

Typing name : TASK.gws_design_of_experiments.UMAPTask

Brick : gws_design_of_experiments

UMAP for dimensionality reduction and visualization

Performs UMAP (Uniform Manifold Approximation and Projection) dimensionality reduction.

This task reduces high-dimensional data to 2D and 3D embeddings for visualization and can optionally cluster the embedding to identify groups in the data.

The task performs the following steps:

1. Optionally scales the data using StandardScaler
2. Applies UMAP dimensionality reduction
3. Optionally performs K-Means clustering on the UMAP embedding
4. Generates interactive visualizations
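
The first three steps above can be sketched in Python as follows. This is a minimal illustration, not the task's actual source: it assumes the scikit-learn and umap-learn packages, and the function name `umap_pipeline` and the `random_state` argument are hypothetical. The plotting step is left out.

```python
def umap_pipeline(features, n_components=2, n_neighbors=15, min_dist=0.1,
                  metric="euclidean", scale_data=True, n_clusters=None,
                  random_state=0):
    """Scale, embed with UMAP, then optionally cluster the embedding.

    Returns (embedding, labels); labels is None when n_clusters is None.
    """
    # Imported lazily so this sketch can be loaded without the heavy
    # dependencies installed. `umap` comes from the umap-learn package.
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler
    import umap

    X = features
    if scale_data:
        # Step 1: standardize every feature to zero mean and unit variance.
        X = StandardScaler().fit_transform(X)

    # Step 2: project the data to n_components dimensions (2 or 3).
    reducer = umap.UMAP(n_components=n_components, n_neighbors=n_neighbors,
                        min_dist=min_dist, metric=metric,
                        random_state=random_state)
    embedding = reducer.fit_transform(X)

    labels = None
    if n_clusters is not None:
        # Step 3: K-Means runs on the embedding, not on the original features.
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=random_state).fit_predict(embedding)
    return embedding, labels
```

Under these assumptions, calling `umap_pipeline(values, n_components=3, n_clusters=4)` on a numeric array `values` would return a three-column embedding together with one cluster label per row.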

Inputs:
- data: Table containing the features to reduce

Outputs:
- UMAP 2D Plot and UMAP 3D Plot: Interactive plots of the UMAP embedding with optional clusters
- UMAP 2D Table and UMAP 3D Table: Tables containing UMAP coordinates and cluster assignments

Configuration:
- n_neighbors: Number of neighbors for UMAP (controls local vs global structure)
- min_dist: Minimum distance between points in low-dimensional space
- metric: Distance metric to use
- scale_data: Whether to standardize features before UMAP
- n_clusters: Number of clusters for K-Means (optional)
- color_by: Column name to color points by (optional)
- columns_to_exclude: List of column names to exclude from UMAP analysis (optional)
- hover_data_columns: List of column names to display as metadata on hover (optional)

Input

Data

Input data for UMAP

Table

Output

UMAP 2D Plot

Interactive UMAP 2D embedding visualization

Plotly resource

UMAP 3D Plot

Interactive UMAP 3D embedding visualization

Plotly resource

UMAP 2D Table

Table with UMAP 2D coordinates and cluster assignments

Table

UMAP 3D Table

Table with UMAP 3D coordinates and cluster assignments

Table

Configuration

n_neighbors

Optional

Number of neighboring points UMAP considers for each sample. Smaller values emphasize local structure; larger values emphasize global structure.

Type : int

Default value : 15

min_dist

Optional

Minimum distance between points in the embedding. Smaller values pack similar points more tightly; larger values spread them out.

Type : float

Default value : 0.1

metric

Optional

Distance metric to use (euclidean, manhattan, cosine, etc.)

Type : string

Allowed values :

euclidean
manhattan
chebyshev
minkowski
canberra
braycurtis
mahalanobis
wminkowski
seuclidean
cosine
correlation
haversine
hamming
jaccard
dice
russelrao
kulsinski
ll_dirichlet
hellinger
rogerstanimoto
sokalmichener
sokalsneath
yule

Default value : euclidean
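
To give a feel for how the metric changes which points count as close, here is a small pure-Python sketch of three of the allowed values. The function names are illustrative only; this is not the code that the task or the UMAP library runs.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the magnitude

print(round(euclidean(a, b), 4))  # far apart in straight-line terms
print(manhattan(a, b))            # 6.0
print(round(cosine(a, b), 4))     # close to 0: the direction is identical
```

The two vectors are well separated under euclidean and manhattan but nearly identical under cosine, which is why cosine is often preferred when only the profile of the features matters and not their magnitude.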

scale_data

Optional

Whether to scale the data before applying UMAP

Type : bool

Default value : true
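
As a rough illustration of what scaling does, the sketch below standardizes two columns by hand, using the same zero-mean, unit-variance convention as StandardScaler (population standard deviation). The column names and values are made up for the example.

```python
from statistics import fmean, pstdev

def standardize(column):
    """Rescale a column to mean 0 and population standard deviation 1."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        # Constant column: every value maps to 0 after centering.
        return [0.0] * len(column)
    return [(value - mean) / std for value in column]

# Hypothetical features measured on very different scales.
income_usd = [30_000.0, 45_000.0, 60_000.0, 90_000.0]
age_years = [25.0, 60.0, 30.0, 55.0]

income_scaled = standardize(income_usd)
age_scaled = standardize(age_years)

# After scaling, both columns have comparable spread, so neither one
# dominates a distance computed across them.
print([round(v, 2) for v in income_scaled])
print([round(v, 2) for v in age_scaled])
```

Without scaling, a difference of a few thousand in `income_usd` would swamp any difference in `age_years` under a metric such as euclidean, so the embedding would mostly reflect the larger-valued feature.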

n_clusters

Optional

Number of clusters for K-Means clustering on the UMAP embedding. Leave unset to skip clustering.

Type : int

color_by

Optional

Name of the column used to color the points

Type : string

columns_to_exclude

Optional

List of column names to exclude from UMAP analysis

Type : list
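
A typical use is to drop identifier or label columns so that only the measurements feed into UMAP. The sketch below shows the idea on a list of column names; the helper `select_feature_columns` and the column names are hypothetical and do not come from the task's source.

```python
def select_feature_columns(all_columns, columns_to_exclude=None):
    """Return the columns that remain after removing the excluded names."""
    excluded = set(columns_to_exclude or [])
    # Keep the original column order for the remaining features.
    return [name for name in all_columns if name not in excluded]

# Hypothetical table header: an identifier, a label, and three measurements.
columns = ["sample_id", "batch", "ph", "temperature", "yield"]
features = select_feature_columns(columns, columns_to_exclude=["sample_id", "batch"])
print(features)  # ['ph', 'temperature', 'yield']
```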

hover_data_columns

Optional

List of column names to display as metadata on hover

Type : list
