Need to reduce your Table's dimension? Here is the solution!
If you need to reshape a wide Table into a long one with fewer columns, the Melt task is the solution!
Melt is a function used to reshape a dataframe from a wide format to a long format. Here is an example.
As you can see, after melting your DataFrame you obtain 3 columns:
- the unique identifier (ID) column,
- the variable column, with the former column names as values,
- the value column, with all the values from the former columns.
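The three columns described above can be reproduced with a minimal sketch in pandas. This assumes the Melt task behaves like `pandas.melt` (its parameters carry the same names); the small table and its column names are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per sample, one column per measurement.
wide = pd.DataFrame({
    "id": ["a", "b"],
    "height": [1.0, 2.0],
    "weight": [10.0, 20.0],
})

# Keep "id" as the identifier and melt every other column.
long = pd.melt(wide, id_vars="id")

# The result has one row per (id, former column) pair.
print(long.columns.tolist())
print(long)
```

Each of the 2 original rows appears once per melted column, so the long table has 4 rows.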
This task can be useful in some cases for analyses or visualisations that require the long format.
For example, melt can be used alongside a Principal Component Analysis (PCA). A PCA aims to reduce the number of variables in a dataset while preserving as much information as possible. It may be easier to plot the PCA if the data are in a long format rather than a wide format.
The Melt task available on Constellab is quite simple, but it has some optional parameters:
- id_vars: column(s) to use as identifiers; these columns are not melted.
- value_vars: column(s) to melt. You can melt as many as you want; if empty, all columns not used as id_vars are melted.
- var_name: name to use for the 'variable' column.
- value_name: name to use for the 'value' column.
- col_level: if the columns are a MultiIndex, the level to use for melting.
- ignore_index: if True, the original index is ignored. If False, the original index is retained and index labels are repeated as necessary.
For more information, click here.
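As a rough illustration of how these parameters interact, here is a sketch using `pandas.melt`, on the assumption that the Constellab task exposes the same options. The table, the column names, and the index labels are made up for the example.

```python
import pandas as pd

# Hypothetical wide table with a custom index.
wide = pd.DataFrame(
    {
        "sample": ["s1", "s2"],
        "gene_a": [0.5, 1.5],
        "gene_b": [2.0, 3.0],
        "gene_c": [7.0, 8.0],
    },
    index=["row1", "row2"],
)

melted = pd.melt(
    wide,
    id_vars=["sample"],               # kept as identifier, not melted
    value_vars=["gene_a", "gene_b"],  # only these columns are melted; gene_c is dropped
    var_name="gene",                  # replaces the default name 'variable'
    value_name="expression",          # replaces the default name 'value'
    ignore_index=False,               # keep the original index, repeated as needed
)

print(melted)
```

With `ignore_index=False`, the labels `row1` and `row2` each appear twice, once per melted column. Note that `ignore_index` requires pandas 1.1 or later.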
To perform this tutorial, you need:
- Access to Constellab and a digital lab
- The brick "gws_core" (version > 0.5.16)
- A dataset in the wide format. Here we will use the Iris dataset (available in the resources of this story).
Steps to follow
- Upload your dataset and convert it as a Table
- Link this to the Melt task
- Check the parameters: "variety" for id_vars was the only parameter set for this tutorial
- Save your parameters
- Run your experiment
- Get your transformed Table!
As an example, we will use the Iris dataset. It contains 50 samples from each of 3 species of iris, with the sepal width and length and the petal width and length of each sample.
After running the experiment you will obtain this Table:
You can see that the dataset is now in long format, ready for further analysis!
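For readers who want to check the result outside Constellab, the same transformation can be sketched with `pandas.melt`. Only three rows in the style of the Iris dataset are used here, and the column names (`sepal.length`, `variety`, and so on) are an assumption about how the file is laid out; adapt them to your own copy.

```python
import pandas as pd

# Three illustrative rows, one per species; column names are assumed.
iris_sample = pd.DataFrame({
    "sepal.length": [5.1, 7.0, 6.3],
    "sepal.width": [3.5, 3.2, 3.3],
    "petal.length": [1.4, 4.7, 6.0],
    "petal.width": [0.2, 1.4, 2.5],
    "variety": ["Setosa", "Versicolor", "Virginica"],
})

# Same setting as in the tutorial: "variety" is the only id_vars entry,
# so the four measurement columns are melted.
iris_long = pd.melt(iris_sample, id_vars="variety")

# 3 rows x 4 measurement columns give 12 rows and 3 columns.
print(iris_long.shape)
print(iris_long.head())
```

On the full dataset of 150 samples, the same call would yield 600 rows, one per sample and measurement.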