Basic Statistics

The Statistics tab displays statistical information for the selected columns.

See also the "View Basic Statistics" section in Getting Started.

Measurement Scales and Statistics

MIDAS displays only statistically meaningful items based on the column's measurement scale (Nominal, Ordinal, Interval, Ratio).

Statistics Displayed for Each Scale

Statistic	Nominal	Ordinal	Interval	Ratio
Valid values	o	o	o	o
Missing values	o	o	o	o
Mode	o	o	o	o
Min / Max		o	o	o
Median		o	o	o
Mean			o	o
Std Dev			o	o
Coef. of Variation				o
Geometric Mean				o
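
The table follows a cumulative pattern: each scale shows everything the scale before it shows, plus a few more items. A minimal sketch of that rule is given below. The names SCALE_ORDER, MIN_SCALE, and statistics_for are hypothetical and chosen for illustration; they are not part of MIDAS.

```python
# Hypothetical names for illustration only; this is not MIDAS's own code.
SCALE_ORDER = ["nominal", "ordinal", "interval", "ratio"]

# Lowest measurement scale at which each statistic appears in the table above.
MIN_SCALE = {
    "Valid values": "nominal",
    "Missing values": "nominal",
    "Mode": "nominal",
    "Min / Max": "ordinal",
    "Median": "ordinal",
    "Mean": "interval",
    "Std Dev": "interval",
    "Coef. of Variation": "ratio",
    "Geometric Mean": "ratio",
}


def statistics_for(scale: str) -> list[str]:
    """Return the statistics the table marks as shown for the given scale."""
    level = SCALE_ORDER.index(scale.lower())
    return [
        name
        for name, minimum in MIN_SCALE.items()
        if SCALE_ORDER.index(minimum) <= level
    ]


print(statistics_for("Ordinal"))
# ['Valid values', 'Missing values', 'Mode', 'Min / Max', 'Median']
```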

Description of Each Statistic

Valid values: Number of non-missing (valid) data points in the column
Missing values: Number of missing (null) values in the column
Mode: The most frequently occurring value
Min / Max: Minimum and maximum values
Median: The middle value when data is sorted in ascending order
Mean: Arithmetic mean (sum of the values divided by their count)
Std Dev: Standard deviation (how widely the values are spread around the mean)
Coef. of Variation: Coefficient of variation ((standard deviation / mean) x 100%). Expresses the size of the variation relative to the mean
Geometric Mean: The n-th root of the product of n values. Suited to averaging ratio data such as growth rates (only computable when all values are positive)
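
As a rough illustration of these definitions, the sketch below computes each statistic with Python's standard statistics module on a small made-up column. It is not MIDAS's implementation. In particular, the use of the sample standard deviation (dividing by n - 1) is an assumption, since this page does not say which variant MIDAS uses.

```python
import statistics

# Made-up column; None stands in for a missing (null) value.
values = [2.0, 4.0, 4.0, 8.0, None]

valid = [v for v in values if v is not None]

summary = {
    "valid": len(valid),
    "missing": len(values) - len(valid),
    "mode": statistics.mode(valid),
    "min": min(valid),
    "max": max(valid),
    "median": statistics.median(valid),
    "mean": statistics.mean(valid),
    # Assumption: sample standard deviation (n - 1 in the denominator).
    "std_dev": statistics.stdev(valid),
    # Raises an error unless every value is positive.
    "geometric_mean": statistics.geometric_mean(valid),
}

# Coefficient of variation: standard deviation relative to the mean, in percent.
summary["coef_of_variation"] = summary["std_dev"] / summary["mean"] * 100

for name, value in summary.items():
    print(f"{name}: {value}")
```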

Example: Measurement Scales and Statistics

For example, postal codes should be treated as Nominal scale. When a column is treated as nominal, mean and standard deviation are not displayed, because numerical magnitude has no meaning on a nominal scale: postal code 100-0001 being "smaller" than 150-0001 has no significance.

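In contrast, temperature data (in Celsius or Fahrenheit) should be treated as Interval scale. When a column is treated as interval, mean and standard deviation are calculated, but the coefficient of variation and geometric mean are not displayed, because an interval scale has no true zero point.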
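
One way to see why the coefficient of variation is reserved for the Ratio scale is to compute it for the same temperatures in two units. The sketch below uses made-up readings and a hypothetical helper, coef_of_variation. The spread stays the same, but the result changes with the choice of zero point, so the value carries no stable meaning for Celsius data.

```python
import statistics

# Made-up temperature readings, for illustration only.
celsius = [10.0, 20.0, 30.0]
kelvin = [c + 273.15 for c in celsius]  # same readings, different zero point


def coef_of_variation(xs):
    """Standard deviation relative to the mean, in percent."""
    return statistics.stdev(xs) / statistics.mean(xs) * 100


# Shifting the zero point leaves the standard deviation unchanged...
print(statistics.stdev(celsius), statistics.stdev(kelvin))

# ...but the coefficient of variation differs widely between the two units.
print(coef_of_variation(celsius), coef_of_variation(kelvin))
```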

See Data Preparation and Import for how to change measurement scales.

Grouping Feature

Use the Show stats by option to group data by a categorical column and view statistics for each group.

How to Use

Open the Settings section in the Statistics tab
Select a column to use for grouping from the Show stats by dropdown (e.g., species)
Statistics are displayed for each value in the selected column

Usage Example

When you select the sepal_length column in the Iris dataset and group by species, the following are displayed separately, enabling comparison between species:

Statistics for sepal_length of setosa
Statistics for sepal_length of versicolor
Statistics for sepal_length of virginica
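
Conceptually, grouping splits the selected column by the value of the grouping column and then summarizes each part on its own. The sketch below shows the idea in plain Python on a few Iris rows (three per species, far fewer than the full dataset). The variable names are illustrative, and this is not how MIDAS is implemented internally.

```python
import statistics
from collections import defaultdict

# A few (species, sepal_length) rows for illustration; not the full dataset.
rows = [
    ("setosa", 5.1), ("setosa", 4.9), ("setosa", 4.7),
    ("versicolor", 7.0), ("versicolor", 6.4), ("versicolor", 6.9),
    ("virginica", 6.3), ("virginica", 5.8), ("virginica", 7.1),
]

# Split sepal_length values by species.
groups = defaultdict(list)
for species, sepal_length in rows:
    groups[species].append(sepal_length)

# Summarize each group separately.
by_species = {
    species: {
        "mean": statistics.mean(vals),
        "median": statistics.median(vals),
        "min": min(vals),
        "max": max(vals),
    }
    for species, vals in groups.items()
}

for species, stats in by_species.items():
    print(species, stats)
```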

Statistics by Data Type

String Type

When selecting a string column, the following are displayed:

Unique values: Number of unique values
Most frequent (top 10): Top 10 most frequent values and their counts
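
The two string statistics correspond to a simple frequency count. A minimal sketch with collections.Counter follows, using made-up values. How MIDAS orders values with equal counts is not stated here, so the tie order shown by this code is only Counter's behavior.

```python
from collections import Counter

# Made-up string column, for illustration only.
values = ["red", "blue", "red", "green", "blue", "red"]

counts = Counter(values)

unique_values = len(counts)             # number of distinct values
most_frequent = counts.most_common(10)  # at most 10 (value, count) pairs

print(unique_values)
print(most_frequent)
```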

Boolean Type

When selecting a True/False column, the following are displayed:

True: Count and percentage of True values
False: Count and percentage of False values
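
The counts and percentages can be sketched as follows on a made-up column. Whether MIDAS computes the percentage over all rows or only over non-missing rows is not stated on this page; the sketch assumes a column without missing values.

```python
# Made-up True/False column with no missing values (assumption).
values = [True, False, True, True]

total = len(values)
true_count = sum(values)          # True counts as 1, False as 0
false_count = total - true_count

true_pct = true_count / total * 100
false_pct = false_count / total * 100

print(f"True: {true_count} ({true_pct:.1f}%)")
print(f"False: {false_count} ({false_pct:.1f}%)")
```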

Datetime Type

When selecting a datetime column, the following are displayed:

Earliest: The oldest datetime
Latest: The most recent datetime
Time span: Duration between the earliest and latest datetime (e.g., "5 days, 3 hours")
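
As an illustration, the three datetime statistics can be derived with Python's datetime module. The timestamps below are made up, and the label formatting only imitates the example above; it is not taken from MIDAS.

```python
from datetime import datetime

# Made-up datetime column, for illustration only.
stamps = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 6, 12, 0),
    datetime(2024, 1, 3, 18, 30),
]

earliest = min(stamps)
latest = max(stamps)
span = latest - earliest  # a timedelta

# Break the span into whole days and remaining whole hours.
days = span.days
hours = span.seconds // 3600
label = f"{days} days, {hours} hours"

print(earliest, latest, label)
```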

Row Selection Integration

You can select data rows from histograms and scatter plots in the Statistics tab.

Selection from Histogram

Click a bar in the histogram
Rows within that bin (range) are selected
Selected rows can be viewed in the Selected Rows tab

Adding to selection: Hold Ctrl (Mac: Cmd) while clicking to add the rows to the existing selection.
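
The selection behavior can be modeled as a set of row indices: a plain click replaces the set with the rows in the clicked bin, and a Ctrl/Cmd click merges the new rows into it. The sketch below uses made-up values and a hypothetical helper, rows_in_bin. Treating each bin as a half-open range (lower bound included, upper bound excluded) is an assumption; this page does not specify how MIDAS handles bin edges.

```python
# Made-up numeric column; the row index is the position in the list.
values = [4.3, 5.1, 5.8, 6.4, 7.0, 7.9]


def rows_in_bin(values, low, high):
    """Return the indices of rows whose value lies in [low, high)."""
    return {i for i, v in enumerate(values) if low <= v < high}


# Plain click on the 5.0 to 6.0 bar: the selection becomes that bin's rows.
selection = rows_in_bin(values, 5.0, 6.0)
print(sorted(selection))  # [1, 2]

# Ctrl/Cmd click on the 7.0 to 8.0 bar: rows are added to the selection.
selection |= rows_in_bin(values, 7.0, 8.0)
print(sorted(selection))  # [1, 2, 4, 5]
```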

Selection from Scatter Plot

The correlation scatter plot displayed when multiple numeric columns are selected also supports row selection:

Click a point on the scatter plot
The corresponding row is selected