Data Preparation and Import
To analyze data in MIDAS, you need to load a data file. This page explains supported file formats, data types, and measurement scales.
Supported File Formats
MIDAS supports the following text-based data file formats:
CSV (Comma-Separated Values)
The most common data format. Columns are separated by commas (,). File extension is typically .csv.
TSV (Tab-Separated Values)
A file format where columns are separated by tab characters. File extension is typically .tsv or .txt.
Character Encoding
UTF-8 encoding is supported. When saving a CSV file from Excel, select the "CSV UTF-8 (Comma delimited)" format.
File Structure
MIDAS assumes data files have the following structure:
- Row 1: Column names (header row)
- Row 2 onwards: Data rows
Example:
Name,Age,Country
Alice,25,USA
Bob,30,Japan
Charlie,28,UK
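As a minimal sketch, a file with this structure can be written and read back with Python's standard csv module. The helper names write_table and read_table and the sample rows are illustrative assumptions and are not part of MIDAS.

```python
import csv

# Illustrative sample data: the first row is the header, the rest are data rows.
rows = [
    ["Name", "Age", "Country"],
    ["Alice", 25, "USA"],
    ["Bob", 30, "Japan"],
    ["Charlie", 28, "UK"],
]

def write_table(path, rows, delimiter=","):
    """Write rows as UTF-8 text. Pass a tab character as delimiter for TSV."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f, delimiter=delimiter).writerows(rows)

def read_table(path, delimiter=","):
    """Return (header, data_rows) from a delimited UTF-8 text file."""
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)          # Row 1: column names
        return header, list(reader)    # Row 2 onwards: data rows
```

Note that the csv module returns every field as text, so the Age values come back as strings such as "25"; deciding on a data type is a separate step.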
Data Types
MIDAS automatically determines data types when loading. The following data types are supported:
boolean
Boolean values represented by true/false, 1/0, yes/no, etc.
int64 (integer)
Numbers without decimal points (e.g., 1, 42, -10).
float64 (floating point)
Numbers with decimal points (e.g., 3.14, 0.5, -2.71).
date
Date data (e.g., 2025-11-17, 2025/11/17).
datetime
Data including both date and time (e.g., 2025-11-17 14:30:00).
timespan
Time of day data (e.g., 14:30:00, 09:15).
duration
Duration data (e.g., 1h 30m, 2d 3h).
string
Text data that does not match any of the above types.
Data types are displayed in parentheses in column headers (e.g., Age (int64)). If a data type is detected incorrectly, right-click the column in the data table and choose "Convert Column Type" to convert it.
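The exact detection rules MIDAS uses are not described on this page. As a rough illustration of how automatic type detection can work in general, the sketch below tries a few of the types above in a fixed order and falls back to string. The function name infer_type, the accepted boolean spellings, the date formats, and the order of checks are all assumptions made for this example; timespan and duration are left out for brevity.

```python
import re
from datetime import datetime

# Assumed spellings; in this sketch a column of 1/0 is treated as int64.
BOOL_WORDS = {"true", "false", "yes", "no"}
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d")
DATETIME_FORMATS = ("%Y-%m-%d %H:%M:%S",)

def _parses(value, formats):
    """Return True if value matches at least one strptime format."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False

def infer_type(values):
    """Guess a type name for a column given as a list of strings."""
    checks = [
        ("boolean", lambda v: v.lower() in BOOL_WORDS),
        ("int64", lambda v: INT_RE.fullmatch(v) is not None),
        ("float64", lambda v: FLOAT_RE.fullmatch(v) is not None),
        ("date", lambda v: _parses(v, DATE_FORMATS)),
        ("datetime", lambda v: _parses(v, DATETIME_FORMATS)),
    ]
    if values:
        for name, matches in checks:
            # A type is chosen only if every value in the column fits it.
            if all(matches(v) for v in values):
                return name
    return "string"  # fallback when no other type fits every value
```

Because a type is accepted only when every value matches, a single stray entry such as "N/A" in a numeric column makes this sketch return string, which is one common reason a column ends up with an unexpected type.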
Measurement Scales
Each column is automatically assigned a statistical "measurement scale", which indicates what kind of statistical processing is appropriate for the data. The scale is initially determined from the data type, but you may need to change it to match the actual meaning of the data.
Nominal Scale
Data representing categories with no meaningful order.
Examples: Gender (male/female), colors (red/blue/green), country names
Ordinal Scale
Data representing categories with a meaningful order.
Examples: Satisfaction (low/medium/high), grade level (1st/2nd/3rd year), grades (A/B/C/D)
Interval Scale
Equally spaced numeric data where differences between values are meaningful, but ratios ("how many times") are not.
Examples: Temperature (Celsius), year (AD)
- The difference between 20 degrees and 10 degrees is a meaningful 10 degrees
- However, 20 degrees is not "twice as warm" as 10 degrees
Ratio Scale
Equally spaced numeric data where both differences and ratios ("how many times") are meaningful.
Examples: Height, weight, price, age
- The difference between 20kg and 10kg is a meaningful 10kg
- Furthermore, 20kg is "twice as heavy" as 10kg
Measurement scales affect graph type selection and statistical analysis. To change a column's measurement scale, right-click the column in the data table.
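The distinction between interval and ratio scales can be checked numerically: a ratio is only meaningful if it stays the same when the unit changes. The sketch below uses the standard Celsius to Fahrenheit and kilogram to pound conversions; the helper names are illustrative.

```python
KG_PER_LB = 0.45359237  # definition of the international pound

def c_to_f(celsius):
    """Convert Celsius to Fahrenheit. The zero point differs between the two."""
    return celsius * 9 / 5 + 32

def kg_to_lb(kg):
    """Convert kilograms to pounds. Zero means 'no weight' in both units."""
    return kg / KG_PER_LB

# Interval scale: the ratio changes with the unit, so "twice as warm" has no meaning.
ratio_c = 20 / 10                         # 2.0
ratio_f = c_to_f(20) / c_to_f(10)         # 68 / 50 = 1.36

# Differences on an interval scale keep their proportions in any unit.
diff_ratio_c = (30 - 20) / (20 - 10)
diff_ratio_f = (c_to_f(30) - c_to_f(20)) / (c_to_f(20) - c_to_f(10))

# Ratio scale: the ratio is the same in any unit, so "twice as heavy" is meaningful.
ratio_kg = 20 / 10                        # 2.0
ratio_lb = kg_to_lb(20) / kg_to_lb(10)    # 2.0 up to floating-point rounding
```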
Common Issues and Solutions
Character Encoding Issues
If text appears garbled after loading, the file's character encoding may not be UTF-8. Open the file in Excel and re-save it in the "CSV UTF-8 (Comma delimited)" format.
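If re-saving from Excel is not convenient, a file can also be re-encoded with a short script. This is a minimal sketch that assumes you already know the file's current encoding, since Python does not detect it for you. The function name convert_to_utf8 and the cp1252 default (a common Windows encoding) are hypothetical choices for this example.

```python
def convert_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Re-encode a text file as UTF-8.

    src_encoding must be the file's real encoding (an assumption the caller
    supplies). A wrong value may raise UnicodeDecodeError or silently
    produce garbled text.
    """
    # newline="" keeps the original line endings unchanged in both directions.
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Writing to a separate destination path keeps the original file intact in case the assumed source encoding turns out to be wrong.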
Dates Not Recognized Correctly
The dates may not be in a common format such as YYYY-MM-DD. Change the date column's format in Excel, or load the column as string and convert it afterward.
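When changing the format in Excel is impractical, the date values can also be rewritten before loading. The sketch below converts a date string from a known source format to YYYY-MM-DD using Python's datetime module. The function name to_iso_date and the DD/MM/YYYY default are assumptions chosen only as an example.

```python
from datetime import datetime

def to_iso_date(text, source_format="%d/%m/%Y"):
    """Convert a date string in source_format to YYYY-MM-DD."""
    # strptime raises ValueError when text does not match source_format,
    # which helps catch rows that use a different layout.
    return datetime.strptime(text, source_format).date().isoformat()
```

For example, to_iso_date("17/11/2025") returns "2025-11-17". The source format has to be stated explicitly because a value like 03/04/2025 is ambiguous between day-first and month-first layouts.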
Loading Excel Files
MIDAS cannot directly load Excel files (.xlsx). In Excel, use "Save As" and select "CSV UTF-8 (Comma delimited)" format, then load that file.
Related Pages
- Creating Graphs - Visualizing data
- Advanced Graph Creation - Flexible visualization with Grammar of Graphics