Sample Datasets
MIDAS includes sample data that you can use to learn data analysis and visualization.
How to Open Sample Data
- Open MIDAS to see the launcher screen
- Click the dataset you want from the "Sample Data" section in the left sidebar
- The data loads and the project screen opens
Palmer Penguins
Measurement data of three penguin species observed in Antarctica (344 rows, 8 columns). Suitable for classification and visualization practice.
Columns
- species: Penguin species (Adelie, Chinstrap, Gentoo)
- island: Island name
- bill_length_mm: Bill length
- bill_depth_mm: Bill depth
- flipper_length_mm: Flipper length
- body_mass_g: Body mass
- sex: Sex
- year: Survey year
Contains some missing values, making it useful for data cleaning practice.
Data source: https://allisonhorst.github.io/palmerpenguins/
License: CC0 (Public Domain)
Gapminder
Country-level data from 1952 to 2007 (1,704 rows, 6 columns). Analyze trends in life expectancy, population, and GDP.
Columns
- country: Country name
- continent: Continent
- year: Year
- lifeExp: Life expectancy
- pop: Population
- gdpPercap: GDP per capita
Useful for time series visualization and examining relationships between economic development and life expectancy.
Data source: https://www.gapminder.org/data/
License: CC BY 4.0
Attribution: "Data from Gapminder Foundation, https://www.gapminder.org/data/, CC BY 4.0"
Auto MPG
Automobile fuel efficiency data from 1970 to 1982 (398 rows, 9 columns). Suitable for regression analysis practice.
Columns
- mpg: Fuel efficiency (miles per gallon)
- cylinders: Number of cylinders (4, 6, 8)
- displacement: Engine displacement (cubic inches)
- horsepower: Horsepower
- weight: Vehicle weight (pounds)
- acceleration: Acceleration (0-60 mph time in seconds)
- model_year: Model year (70 = 1970, 82 = 1982)
- origin: Country of origin (usa, europe, japan)
- name: Vehicle model name
Analyze relationships between fuel efficiency and vehicle characteristics. Typical patterns include: heavier vehicles tend to have lower fuel efficiency, vehicles with fewer cylinders tend to be more efficient, and efficiency improves in later model years. Contains some missing values.
Data source: https://archive.ics.uci.edu/dataset/9/auto+mpg
License: Public Domain
World Bank
Development indicators for 50 major countries (50 rows, 10 columns, 2021-2022 data). Includes GDP, population, life expectancy, and internet penetration. Suitable for bar charts and cross tabulation practice.
Columns
- country: Country name
- country_code: Country code
- region: Region
- income_group: Income group
- population_2022: Population (2022)
- gdp_usd_billions_2022: GDP (billions USD, 2022)
- gdp_per_capita_2022: GDP per capita (2022)
- life_expectancy_2021: Life expectancy (2021)
- urban_population_percent_2022: Urban population percentage (2022)
- internet_users_percent_2021: Internet usage rate (2021)
Suitable for comparing economic development and social indicators across countries.
Data source: https://data.worldbank.org/
License: CC BY 4.0
Attribution: "Data from World Bank Open Data, https://data.worldbank.org/, CC BY 4.0"
Bike Sharing
Washington D.C. bike sharing data (2011-2012). Available in two versions: daily (731 rows) and hourly (17,379 rows). Useful for analyzing usage patterns by weather and season, and for count data analysis with generalized linear models (GLMs).
Time Variables
- instant: Record ID
- dteday: Date (YYYY-MM-DD)
- season: Season (1: Spring, 2: Summer, 3: Fall, 4: Winter)
- yr: Year (0: 2011, 1: 2012)
- mnth: Month (1-12)
- hr: Hour (0-23, hourly data only)
- weekday: Day of week (0: Sunday, 6: Saturday)
- holiday: Holiday flag (0: Regular day, 1: Holiday)
- workingday: Working day flag (1: Weekday, 0: Weekend or holiday)
Weather Variables
- weathersit: Weather condition
  - 1: Clear, few clouds, partly cloudy
  - 2: Mist + cloudy, mist + broken clouds
  - 3: Light snow, light rain + thunderstorm + scattered clouds
  - 4: Heavy rain + ice pellets + thunderstorm + mist
- temp: Normalized temperature (Celsius divided by 41)
- atemp: Normalized feels-like temperature (Celsius divided by 50)
- hum: Normalized humidity (humidity divided by 100)
- windspeed: Normalized wind speed (wind speed divided by 67)
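If the normalized weather columns need to be read in their original units, the divisors listed above can be reversed by multiplication. The following Python sketch assumes the values are plain quotients as described; the function name, the output keys, and the sample record are hypothetical and not part of MIDAS or the dataset.

```python
# Divisors taken from the column descriptions above.
TEMP_SCALE = 41.0        # temp = Celsius / 41
ATEMP_SCALE = 50.0       # atemp = Celsius / 50
HUM_SCALE = 100.0        # hum = humidity / 100
WINDSPEED_SCALE = 67.0   # windspeed = wind speed / 67


def denormalize(row):
    """Return the weather values of one record in their original units."""
    return {
        "temp_c": row["temp"] * TEMP_SCALE,
        "atemp_c": row["atemp"] * ATEMP_SCALE,
        "humidity_pct": row["hum"] * HUM_SCALE,
        "windspeed_raw": row["windspeed"] * WINDSPEED_SCALE,
    }


# Illustrative values only, not taken from the dataset.
sample = {"temp": 0.5, "atemp": 0.5, "hum": 0.8, "windspeed": 0.25}
raw = denormalize(sample)
print(raw)
```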
Usage Counts
- casual: Casual user count
- registered: Registered user count
- cnt: Total count (casual + registered)
Suitable for predicting usage with Poisson regression, analyzing time series patterns, and evaluating the impact of weather and seasonal factors. Because this is count data, expect overdispersion (variance > mean).
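A Poisson model assumes that the variance of the counts equals their mean, so a quick way to check for overdispersion is to compare the two. The Python sketch below uses only the standard library; the function name and the sample counts are hypothetical, chosen for illustration rather than taken from the dataset.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by the mean.

    A ratio close to 1 is consistent with a Poisson model;
    a ratio well above 1 suggests overdispersion.
    """
    return variance(counts) / mean(counts)


# Made-up counts for illustration, not taken from the dataset.
counts = [2, 3, 50, 120, 7, 300, 15, 80]
ratio = dispersion_ratio(counts)
print(f"variance / mean = {ratio:.1f}")
```

When the ratio is far above 1, a quasi-Poisson or negative binomial model is a common alternative to plain Poisson regression.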
Data source: https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset
License: CC0 (Public Domain)
Earthquakes
Worldwide earthquake data from September 2024 (1,041 rows, 7 columns, magnitude 4.0+). Suitable for datetime data visualization.
Columns
- time: Occurrence datetime
- latitude, longitude: Location
- depth: Depth
- mag: Magnitude
- place: Location description
Explore distributions by date, time, and day of week.
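Outside MIDAS, a day-of-week distribution can be sketched in a few lines of Python. This example assumes the time column holds ISO 8601 UTC timestamps ending in "Z", which may not match the file exactly; the function name and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_counts(timestamps):
    """Count events per day of week from ISO 8601 timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        # Rewrite a trailing "Z" as an explicit UTC offset so that
        # older Python versions can parse the string.
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        counts[DAY_NAMES[dt.weekday()]] += 1
    return counts


# Illustrative timestamps only, not taken from the dataset.
times = ["2024-09-01T03:15:00Z", "2024-09-02T18:40:00Z", "2024-09-08T09:05:00Z"]
by_day = weekday_counts(times)
print(by_day)
```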
Data source: https://www.usgs.gov/programs/earthquake-hazards
License: Public Domain (USGS Data)
Iris
Measurement data of three iris species, a classic classification dataset (150 rows, 5 columns).
Columns
- sepal_length, sepal_width: Sepal dimensions
- petal_length, petal_width: Petal dimensions
- species: Species
Data source: https://archive.ics.uci.edu/dataset/53/iris
License: Public Domain