Data visualization in Constellab
This story is under construction
Introduction
Here we briefly show how to use data charts to visualize data with Constellab.
Generalities on Plotly within Constellab
In Constellab, the Plotly-based 2D line plot, scatter plot, violin plot, box plot, bar plot and histogram share common features.
They all have the same input and output: each takes a Table as input and produces a PlotlyResource as output.
They also have parameters in common. The most used ones are:
- x-axis and y-axis: Names of the columns of the input data to plot on each axis.
- title: Title of the plot.
- x_axis_name: Title for the x-axis.
- y_axis_name: Title for the y-axis.
- color: Column name or variable used to color the lines or markers on the plot.
- hover_data: List of columns to display as hover text when hovering over data points on the plot.
- facet_col: Column name or variable used for creating column facets (subplots).
- facet_row: Column name or variable used for creating row facets (subplots).
- log_x: Specifies whether the x-axis should be displayed on a logarithmic scale (True/False).
- log_y: Specifies whether the y-axis should be displayed on a logarithmic scale (True/False).
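As a rough illustration of how these shared parameters can be used, the sketch below separates the two axis titles from the other parameters and forwards the rest to a plotting function such as `plotly.express.scatter`, then sets the axis titles on the resulting figure. `split_common_params` and `build_figure` are hypothetical helper names, not part of gws_core or Plotly, and the parameter values are only examples.

```python
# Hypothetical helpers for illustration; they are not part of gws_core or Plotly.
AXIS_TITLE_KEYS = ("x_axis_name", "y_axis_name")


def split_common_params(params):
    """Split parameters into plot keyword arguments and axis titles."""
    plot_kwargs = {
        key: value
        for key, value in params.items()
        if key not in AXIS_TITLE_KEYS and value is not None
    }
    axis_titles = {key: params.get(key) for key in AXIS_TITLE_KEYS}
    return plot_kwargs, axis_titles


def build_figure(plot_func, data_frame, params):
    """Call a plotly.express-style function, then set the axis titles."""
    plot_kwargs, axis_titles = split_common_params(params)
    fig = plot_func(data_frame=data_frame, **plot_kwargs)
    if axis_titles["x_axis_name"]:
        fig.update_xaxes(title_text=axis_titles["x_axis_name"])
    if axis_titles["y_axis_name"]:
        fig.update_yaxes(title_text=axis_titles["y_axis_name"])
    return fig


# Example values only; the column names follow the penguins dataset.
params = {
    "x": "bill_length_mm",
    "y": "bill_depth_mm",
    "title": "Bill depth versus bill length",
    "x_axis_name": "bill length",
    "y_axis_name": "bill depth",
    "color": "island",
    "hover_data": ["species"],
    "facet_col": None,  # unset parameters are dropped
    "facet_row": None,
    "log_x": False,
    "log_y": False,
}

plot_kwargs, axis_titles = split_common_params(params)
print(sorted(plot_kwargs))
print(axis_titles)
```

With Plotly installed and the data held in a pandas DataFrame `df`, calling `build_figure(px.scatter, df, params)` after `import plotly.express as px` would return the figure.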
Prerequisites
To plot your data, you need to put it into a Table.
Several datasets will be used in this story:
- The IRIS dataset: The Iris dataset consists of 50 samples from each of three species of Iris flower (Iris setosa, Iris virginica and Iris versicolor), in which four features are measured from each sample: the length and the width of the sepals and petals, in centimeters.
- The Titanic dataset: The Titanic dataset describes the passengers of the Titanic, with information such as their age, sex, embarkation town, fare, passenger class, whether they were travelling alone or not, and their deck.
- The Gapminder dataset: The Gapminder dataset collects data from a handful of sources. Its unique identifier is the country, and it has multiple variables such as the life expectancy, the employment rate, the urban rate, etc. For more information: https://www.kaggle.com/datasets/sansuthi/gapminder-dataset
- The penguins dataset: The Palmer Archipelago penguins dataset consists of 344 samples of three species of penguins from three islands. It gives the island, the species, the sex, the body mass, the bill length and depth, and the flipper length of each individual.
Notice
Originally, Constellab used its own chart engine for visualizations. This engine is being replaced by the Plotly engine to provide users with a broader range of charts.
2D line plots
To learn more about all the parameters, please refer to our technical documentation and the Plotly documentation.
The 2D line plot has some specific parameters. The most useful ones are:
- line_shape: Whether consecutive points are joined by a straight line or a smooth curve.
- markers: Whether to show a marker for each point.
Live code
The code used to render the plot is as follows:
# parameters
x = 'year'
y = 'population'
title = 'Population of Country per year'
y_axis_name = 'population'
x_axis_name = 'Time (year)'
line_group = 'continent'
color = 'country'

# live code
from gws_core import Table, PlotlyResource
import plotly.express as px

# Build the line plot from the first input resource
fig = px.line(data_frame=sources[0], x=x, y=y, color=color,
              title=title, line_group=line_group)

# Update axis titles (pass the variables, not their names as strings)
fig.update_xaxes(title_text=x_axis_name)
fig.update_yaxes(title_text=y_axis_name)

outputs = [PlotlyResource(fig)]
Rendering
The plot renders as follows:
2D scatter plots
The 2D scatter plot has some specific parameters. The most useful ones are:
- size: Size of the markers.
- opacity: Opacity of the markers.
- symbol: Column used to assign a different marker symbol to each value.
For this example, we will use the penguins dataset and the following parameters:
- x-axis: bill_length_mm
- y-axis: bill_depth_mm
- color: island
- title: Bill length depending of the depth
- y-axis name: bill depth
- x-axis name: bill length
- symbol: species
For more information about each parameter, please click here.
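To make the role of the symbol parameter concrete, here is a minimal sketch of how one marker symbol can be assigned to each distinct value of a column. It only illustrates the idea: `assign_symbols` is a hypothetical helper, and the symbol sequence used here is an assumption, not necessarily the one Plotly or Constellab applies.

```python
from itertools import cycle


def assign_symbols(values, symbol_sequence=("circle", "diamond", "square")):
    """Map each distinct value to a symbol, cycling through the sequence."""
    symbols = cycle(symbol_sequence)
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = next(symbols)
    return mapping


# Species names as found in the penguins dataset; the order is arbitrary.
species_column = ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Gentoo"]
symbol_by_species = assign_symbols(species_column)
print(symbol_by_species)
```

If there are more distinct values than symbols, the sequence starts over, so two values can end up sharing a symbol.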
Bar plots
The bar plot has one particularly useful specific parameter:
- bar_mode: How the bars are displayed (stacked, side by side, etc.).
For this example, we used the penguins dataset and the following parameters:
- x-axis: species
- color: sex
- title: gender of penguins per species
- y-axis name: count
- x-axis name: species
- bar_mode: stack
For more information about each parameter, please click here.
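The counts behind such a stacked bar plot can be sketched in plain Python: each bar segment is the number of rows for one (species, sex) pair, and the height of a stacked bar is the sum of its segments. The rows below are made up for illustration and are not taken from the real penguins dataset.

```python
from collections import Counter

# Made-up (species, sex) rows for illustration only.
rows = [
    ("Adelie", "male"), ("Adelie", "female"), ("Adelie", "female"),
    ("Gentoo", "male"), ("Gentoo", "male"),
    ("Chinstrap", "female"),
]

# One bar segment per (species, sex) pair.
segment_counts = Counter(rows)

# Height of each stacked bar: all segments of a species added up.
stack_heights = Counter(species for species, _sex in rows)

for species in sorted(stack_heights):
    segments = {
        sex: count
        for (name, sex), count in sorted(segment_counts.items())
        if name == species
    }
    print(species, segments, "total:", stack_heights[species])
```

With a side-by-side display, the same segment counts would be drawn as separate bars instead of being summed into one.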
Box plots
The box plot has some specific parameters. The most useful ones are:
- boxmode: Places the boxes beside or on top of each other.
- points: Shows outliers, suspected outliers or all points.
- notched: Draws boxes with (or without) notches.
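To give an idea of what a box and its outlier points represent, the sketch below computes the quartiles of a small list of values and flags the points lying more than 1.5 times the interquartile range away from the box, which is a common convention. `box_summary` is a hypothetical helper and the values are made up. Plotly computes its quartiles with its own method, so the figures it displays may differ slightly.

```python
import statistics


def box_summary(values):
    """Return the quartiles of the values and the points outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Made-up values for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = box_summary(values)
print(summary)
```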
For this example, the penguins dataset and the following parameters were used:
- x-axis: island
- y-axis: bill_length_mm
- color: species
- title: Penguins bill length per species and island
- y-axis name: Bill Length
- x-axis name: Island
For more information about each parameter, please click here.
Violin plots
The violin plot has some specific parameters. The most useful ones are:
- violinmode: Places the violins beside or on top of each other.
- points: Shows outliers, suspected outliers or all points.
- box: Draws (or not) a box inside each violin.
For this example, we will use the penguins dataset and the following parameters:
- x-axis: island
- y-axis: bill_length_mm
- color: species
- title: Bill length per species and island
- x-axis name: island
- y-axis name: bill length
For more information about each parameter, please click here.
Histograms
The histogram has some specific parameters. The most useful ones are:
- bar_norm: Shows relative values.
- histnorm: Histogram normalisation.
- histfunc: Function used to aggregate the values, among count, sum, avg, min and max.
- cumulative: Whether to draw a cumulative histogram.
- nbins: Number of bins.
- opacity: Opacity of the bars.
- barmode: For stacked or grouped histograms.
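The effect of the number of bins, of a percent normalisation and of the cumulative option can be sketched in plain Python. This is only an illustration under simple assumptions: `bin_counts` is a hypothetical helper using equal-width bins between the minimum and maximum values, the flipper lengths are made up, and Plotly chooses its own bin edges, so its bars may not match these numbers.

```python
from itertools import accumulate


def bin_counts(values, nbins):
    """Count values in nbins equal-width bins spanning min(values) to max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / nbins  # assumes the values are not all identical
    counts = [0] * nbins
    for value in values:
        # The maximum value is placed in the last bin.
        index = min(int((value - low) / width), nbins - 1)
        counts[index] += 1
    return counts


# Made-up flipper lengths (in mm) for illustration only.
flipper_lengths = [180, 182, 185, 190, 191, 195, 197, 200, 210, 220]

counts = bin_counts(flipper_lengths, nbins=4)                # plain counts
percent = [100 * c / len(flipper_lengths) for c in counts]   # percent normalisation
cumulative = list(accumulate(counts))                        # cumulative histogram

print(counts)
print(percent)
print(cumulative)
```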
For this example, the penguins dataset and the following parameters were used:
- x-axis: flipper_length_mm
- color: species
- title: Penguins flipper length per species
- y-axis name: count
- x-axis name: Flipper length (in mm)
- histogram function: count
For more information about each parameter, please click here.
If you need a particular plot that is not presented above, you should try the smart plot task.
Hint on the choice of charts!
The choice of a chart depends on what you want to show: a change over time, a part-to-whole composition, how data is distributed, a comparison of values between groups, a relationship between variables, etc.
For example, line charts are well suited to showing change over time, whereas box plots can be used to look at how data is distributed. A scatter plot is a standard way of showing the relationship between two variables.
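The three suggestions above can be written down as a small lookup table. This is only a sketch: `suggest_chart` is a hypothetical helper, and the mapping is deliberately limited to the goals for which this story names a chart.

```python
# Mapping limited to the suggestions given in this story.
CHART_FOR_GOAL = {
    "change over time": "line plot",
    "distribution": "box plot",
    "relationship between two variables": "scatter plot",
}


def suggest_chart(goal):
    """Return the suggested chart for a goal, or None if no suggestion is listed."""
    return CHART_FOR_GOAL.get(goal.strip().lower())


print(suggest_chart("Change over time"))
print(suggest_chart("part-to-whole composition"))
```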
Smart Plot
After importing the Iris dataset into the Databox of the lab, click on 'views' and select 'Smart interactive plot'. You can now write your request for the figure you want to plot from the data. For example, you can ask for a 2D scatter plot of the data with a different color for each species, and specify the axis labels and the figure title.
Other chart types can be used to extract other kinds of information, for example a box plot, as shown below. Here we write a request for a box plot of the values of each feature of the Iris dataset. With this type of chart, we get a synthetic overview of the distribution of the samples (median, 1st and 3rd quartiles, etc.) for each feature.