Basic Statistics

The Statistics tab displays statistical information for the selected columns.

See also the "View Basic Statistics" section in Getting Started.

Measurement Scales and Statistics

MIDAS displays only the statistics that are meaningful for a column's measurement scale (Nominal, Ordinal, Interval, or Ratio).

Statistics Displayed for Each Scale

Statistic           Nominal  Ordinal  Interval  Ratio
Valid values        o        o        o         o
Missing values      o        o        o         o
Mode                o        o        o         o
Min / Max           -        o        o         o
Median              -        o        o         o
Mean                -        -        o         o
Std Dev             -        -        o         o
Coef. of Variation  -        -        -         o
Geometric Mean      -        -        -         o

(o = displayed, - = not displayed)
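
The table above can be read as "each statistic becomes available at a certain scale and stays available at every higher scale". The following Python sketch encodes that reading. The names SCALES, MIN_SCALE, and stats_for_scale are hypothetical and only illustrate the table; they are not MIDAS internals.

```python
# Hypothetical names: this mirrors the table, not the MIDAS implementation.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Lowest measurement scale at which each statistic is displayed.
MIN_SCALE = {
    "Valid values": "nominal",
    "Missing values": "nominal",
    "Mode": "nominal",
    "Min / Max": "ordinal",
    "Median": "ordinal",
    "Mean": "interval",
    "Std Dev": "interval",
    "Coef. of Variation": "ratio",
    "Geometric Mean": "ratio",
}


def stats_for_scale(scale):
    """Return the statistics displayed for a column of the given scale."""
    level = SCALES.index(scale)
    return [name for name, lowest in MIN_SCALE.items()
            if SCALES.index(lowest) <= level]


print(stats_for_scale("ordinal"))
```

For an Ordinal column this lists the three Nominal statistics plus Min / Max and Median, and leaves out Mean and everything after it.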

Description of Each Statistic

  • Valid values: Number of non-missing (valid) data points in the column
  • Missing values: Number of missing (null) values in the column
  • Mode: The most frequently occurring value
  • Min / Max: Minimum and maximum values
  • Median: The middle value when data is sorted in ascending order
  • Mean: Arithmetic mean (average) of the values
  • Std Dev: Standard deviation (a measure of how widely the data is spread around the mean)
  • Coef. of Variation: Coefficient of variation (standard deviation / mean × 100, expressed as a percentage). Represents the size of the variation relative to the mean
  • Geometric Mean: The n-th root of the product of n values. Suited to averaging ratio-scale data and computable only when all values are positive
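
As a rough illustration of the definitions above, the sketch below computes each statistic for a small Ratio-scale column using Python's standard statistics module. The function name describe, the use of None for missing values, and the choice of the sample (n - 1) standard deviation are assumptions made for this example; the source does not say which standard deviation variant MIDAS uses.

```python
import statistics


def describe(values):
    """Summarise a ratio-scale column; None marks a missing value (assumption)."""
    valid = [v for v in values if v is not None]
    mean = statistics.mean(valid)
    std = statistics.stdev(valid)  # sample standard deviation (assumption)
    all_positive = all(v > 0 for v in valid)
    return {
        "valid": len(valid),
        "missing": len(values) - len(valid),
        "mode": statistics.mode(valid),
        "min": min(valid),
        "max": max(valid),
        "median": statistics.median(valid),
        "mean": mean,
        "std_dev": std,
        # Coefficient of variation as a percentage; assumes a non-zero mean.
        "cv_percent": std / mean * 100,
        # The geometric mean is only defined here when every value is positive.
        "geometric_mean": statistics.geometric_mean(valid) if all_positive else None,
    }


summary = describe([2.0, 4.0, 4.0, None, 8.0])
for name, value in summary.items():
    print(name, value)
```

In this sample the arithmetic mean is 4.5 while the geometric mean is approximately 4.0, which shows how the two averages can differ on skewed positive data.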

Example: Measurement Scales and Statistics

Postal codes should be treated as Nominal scale. For a Nominal column, the mean and standard deviation are not displayed, because numerical magnitude has no meaning on a nominal scale (postal code 100-0001 being "smaller" than 150-0001 has no significance).

Temperature data, by contrast, should be treated as Interval scale. Differences between values are meaningful on an interval scale, so the mean and standard deviation are calculated.

See Data Preparation and Import for how to change measurement scales.

Grouping Feature

Use the Show stats by option to group data by a categorical column and view statistics for each group.

How to Use

  1. Open the Settings section in the Statistics tab
  2. Select a column to use for grouping from the Show stats by dropdown (e.g., species)
  3. Statistics are displayed for each value in the selected column

Usage Example

Selecting the sepal_length column in the Iris dataset and grouping by species displays the following separately, enabling comparison between species:

  • Statistics for sepal_length of setosa
  • Statistics for sepal_length of versicolor
  • Statistics for sepal_length of virginica
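
Conceptually, grouping splits the selected column by the values of the grouping column and computes the same statistics for each part. The sketch below shows that idea in plain Python. The function stats_by_group and the handful of sample rows are illustrative assumptions and do not represent the full Iris dataset or the MIDAS implementation.

```python
from collections import defaultdict
import statistics

# A few sample rows in the shape of the Iris dataset (for illustration only).
rows = [
    {"species": "setosa", "sepal_length": 5.1},
    {"species": "setosa", "sepal_length": 4.9},
    {"species": "versicolor", "sepal_length": 7.0},
    {"species": "versicolor", "sepal_length": 6.4},
    {"species": "virginica", "sepal_length": 6.3},
    {"species": "virginica", "sepal_length": 5.8},
]


def stats_by_group(rows, value_col, group_col):
    """Compute basic statistics of value_col for each value of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {
        key: {
            "count": len(vals),
            "mean": statistics.mean(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for key, vals in groups.items()
    }


by_species = stats_by_group(rows, "sepal_length", "species")
for species, stats in by_species.items():
    print(species, stats)
```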

Statistics by Data Type

String Type

When you select a string column, the following are displayed:

  • Unique values: Number of unique values
  • Most frequent (top 10): Top 10 most frequent values and their counts
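
The two string statistics correspond to a simple frequency count. A minimal sketch using collections.Counter follows. The function name string_stats and the decision to skip missing (None) values are assumptions for this example.

```python
from collections import Counter


def string_stats(values, top_n=10):
    """Count unique values and list the top_n most frequent ones."""
    valid = [v for v in values if v is not None]  # skip missing values (assumption)
    counts = Counter(valid)
    return {
        "unique": len(counts),
        "most_frequent": counts.most_common(top_n),
    }


result = string_stats(["red", "blue", "red", None, "green", "red", "blue"])
print(result["unique"])
print(result["most_frequent"])
```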

Boolean Type

When you select a True/False column, the following are displayed:

  • True: Count and percentage of True values
  • False: Count and percentage of False values
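
The counts and percentages can be sketched as follows. The function name boolean_stats is hypothetical, and computing percentages relative to the non-missing values is an assumption; the source does not state which denominator MIDAS uses.

```python
def boolean_stats(values):
    """Return count and percentage of True and False values."""
    valid = [v for v in values if v is not None]
    total = len(valid)
    true_count = sum(1 for v in valid if v)
    false_count = total - true_count

    def percent(n):
        # Percentage of non-missing values (assumption); 0.0 for an empty column.
        return n / total * 100 if total else 0.0

    return {
        "True": (true_count, percent(true_count)),
        "False": (false_count, percent(false_count)),
    }


flags = boolean_stats([True, False, True, True])
print(flags)
```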

Datetime Type

When you select a datetime column, the following are displayed:

  • Earliest: The oldest datetime
  • Latest: The most recent datetime
  • Time span: Duration between the earliest and latest datetimes (e.g., "5 days, 3 hours")
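
The time span is the difference between the latest and earliest values. The sketch below derives all three items with Python's datetime module. The function name datetime_stats and the exact output format are assumptions modelled on the "5 days, 3 hours" example.

```python
from datetime import datetime


def datetime_stats(values):
    """Return the earliest and latest datetimes and the span between them."""
    valid = [v for v in values if v is not None]
    earliest, latest = min(valid), max(valid)
    span = latest - earliest  # a datetime.timedelta
    hours = span.seconds // 3600  # whole hours beyond the full days
    return {
        "earliest": earliest,
        "latest": latest,
        "time_span": f"{span.days} days, {hours} hours",
    }


info = datetime_stats([
    datetime(2024, 1, 3, 18, 30),
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 6, 12, 0),
])
print(info["time_span"])
```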

Row Selection Integration

You can select data rows from histograms and scatter plots in the Statistics tab.

Selection from Histogram

  1. Click a bar in the histogram
  2. Rows within that bin (range) are selected
  3. Selected rows can be viewed in the Selected Rows tab

Adding to selection: Hold Ctrl (Mac: Cmd) while clicking to add to the existing selection.
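
The selection behavior described above can be modelled as set operations on row indices. In the sketch below, a plain click replaces the selection and a Ctrl/Cmd click merges into it. The function names, the additive parameter, and the half-open bin range [low, high) are assumptions for illustration; how MIDAS treats bin edges is not stated in the source.

```python
def rows_in_bin(values, low, high):
    """Indices of rows whose value falls in the bin [low, high) (assumption)."""
    return {i for i, v in enumerate(values)
            if v is not None and low <= v < high}


def click_bar(selection, values, low, high, additive=False):
    """Return the new selection after clicking the bar covering [low, high).

    additive=True models holding Ctrl (Mac: Cmd) while clicking.
    """
    clicked = rows_in_bin(values, low, high)
    return selection | clicked if additive else clicked


values = [4.9, 5.1, 5.8, 6.3, 6.4, 7.0]
selected = click_bar(set(), values, 5.0, 6.0)                    # plain click
selected = click_bar(selected, values, 6.0, 7.0, additive=True)  # Ctrl/Cmd click
print(sorted(selected))
```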

Selection from Scatter Plot

The correlation scatter plot displayed when multiple numeric columns are selected also supports row selection:

  1. Click a point on the scatter plot
  2. The corresponding row is selected