Sample Datasets

MIDAS includes sample data that you can use to learn data analysis and visualization.

How to Open Sample Data

  1. Open MIDAS to see the launcher screen
  2. Click the dataset you want from the "Sample Data" section in the left sidebar
  3. The data loads and the project screen opens

Palmer Penguins

Measurement data of three penguin species observed in Antarctica (344 rows, 8 columns). Suitable for classification and visualization practice.

Columns

  • species: Penguin species (Adelie, Chinstrap, Gentoo)
  • island: Island name
  • bill_length_mm: Bill length
  • bill_depth_mm: Bill depth
  • flipper_length_mm: Flipper length
  • body_mass_g: Body mass
  • sex: Sex
  • year: Survey year

Contains some missing values, making it useful for data cleaning practice.
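As a rough sketch of the cleaning practice mentioned above, the following counts missing cells per column and drops incomplete rows. It assumes the data has been exported to CSV and that missing values appear as empty cells or the string "NA"; both are assumptions, not documented MIDAS behavior. The four rows are illustrative only, and the helper names are hypothetical.

```python
import csv
import io

# Illustrative rows only, not the full dataset. The missing-value markers are
# an assumption: empty cells or the literal string "NA".
SAMPLE_CSV = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,NA,NA,NA,NA,NA,2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
Gentoo,Biscoe,44.5,14.3,216,4100,,2007
"""

MISSING_MARKERS = {"", "NA"}


def is_missing(value):
    """True if a CSV cell should be treated as missing."""
    return value is None or value.strip() in MISSING_MARKERS


def count_missing(rows):
    """Return {column: number of missing cells} for a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if is_missing(value) else 0)
    return counts


def drop_incomplete(rows):
    """Keep only rows in which every column has a value."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing_per_column = count_missing(rows)
complete_rows = drop_incomplete(rows)

print(missing_per_column)
print(len(rows), "rows,", len(complete_rows), "complete")
```

Dropping rows is the simplest strategy; whether it is appropriate depends on how many rows are affected and on the analysis you plan to run.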

Data source: https://allisonhorst.github.io/palmerpenguins/

License: CC0 (Public Domain)

Gapminder

Country-level data from 1952 to 2007 at five-year intervals (1,704 rows, 6 columns). Use it to analyze trends in life expectancy, population, and GDP per capita.

Columns

  • country: Country name
  • continent: Continent
  • year: Year
  • lifeExp: Life expectancy
  • pop: Population
  • gdpPercap: GDP per capita

Useful for time series visualization and examining relationships between economic development and life expectancy.

Data source: https://www.gapminder.org/data/

License: CC BY 4.0

Attribution: "Data from Gapminder Foundation, https://www.gapminder.org/data/, CC BY 4.0"

Auto MPG

Automobile fuel efficiency data from 1970 to 1982 (398 rows, 9 columns). Suitable for regression analysis practice.

Columns

  • mpg: Fuel efficiency (miles per gallon)
  • cylinders: Number of cylinders (mostly 4, 6, or 8)
  • displacement: Engine displacement (cubic inches)
  • horsepower: Horsepower
  • weight: Vehicle weight (pounds)
  • acceleration: Acceleration (0-60 mph time in seconds)
  • model_year: Model year (70 = 1970, 82 = 1982)
  • origin: Country of origin (usa, europe, japan)
  • name: Vehicle model name

Analyze relationships between fuel efficiency and vehicle characteristics. Typical patterns to look for: heavier vehicles have lower fuel efficiency, vehicles with fewer cylinders have higher fuel efficiency, and fuel efficiency improves in later model years. Contains some missing values.
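The two-digit model_year encoding in the column list can be expanded to a four-digit year before plotting, and mpg can be converted to liters per 100 km if metric units are preferred. This is a minimal sketch: the function names are hypothetical, and the conversion assumes mpg is measured in miles per US gallon.

```python
# Conversion constants (exact by definition).
LITERS_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344


def full_model_year(model_year):
    """Expand a two-digit model year (70 to 82 in this dataset) to four digits.

    Assumption: every value refers to the 1900s, as the column description
    (70 = 1970, 82 = 1982) indicates.
    """
    if not 0 <= model_year <= 99:
        raise ValueError(f"expected a two-digit year, got {model_year}")
    return 1900 + model_year


def mpg_to_l_per_100km(mpg):
    """Convert miles per US gallon to liters per 100 km (lower is better)."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    return 100 * LITERS_PER_US_GALLON / (mpg * KM_PER_MILE)


print(full_model_year(70))
print(round(mpg_to_l_per_100km(18.0), 2))
```

Note that the direction flips after conversion: a higher mpg is better, while a lower liters per 100 km value is better.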

Data source: https://archive.ics.uci.edu/dataset/9/auto+mpg

License: Public Domain

World Bank

Development indicators for 50 major countries (50 rows, 10 columns, 2021-2022 data). Includes GDP, population, life expectancy, and internet penetration. Suitable for bar charts and cross tabulation practice.

Columns

  • country: Country name
  • country_code: Country code
  • region: Region
  • income_group: Income group
  • population_2022: Population (2022)
  • gdp_usd_billions_2022: GDP (billions USD, 2022)
  • gdp_per_capita_2022: GDP per capita (2022)
  • life_expectancy_2021: Life expectancy (2021)
  • urban_population_percent_2022: Urban population percentage (2022)
  • internet_users_percent_2021: Internet usage rate (2021)

Suitable for comparing economic development and social indicators across countries.

Data source: https://data.worldbank.org/

License: CC BY 4.0

Attribution: "Data from World Bank Open Data, https://data.worldbank.org/, CC BY 4.0"

Bike Sharing

Washington D.C. bike sharing data (2011-2012). Available in two versions: daily (731 rows) and hourly (17,379 rows). Useful for analyzing usage patterns by weather and season, and for count data analysis with generalized linear models (GLMs).

Time Variables

  • instant: Record ID
  • dteday: Date (YYYY-MM-DD)
  • season: Season (1: Spring, 2: Summer, 3: Fall, 4: Winter)
  • yr: Year (0: 2011, 1: 2012)
  • mnth: Month (1-12)
  • hr: Hour (0-23, hourly data only)
  • weekday: Day of week (0: Sunday, 6: Saturday)
  • holiday: Holiday flag (0: Regular day, 1: Holiday)
  • workingday: Working day flag (1: Neither weekend nor holiday, 0: Weekend or holiday)

Weather Variables

  • weathersit: Weather condition
    • 1: Clear, few clouds, partly cloudy
    • 2: Mist + cloudy, mist + broken clouds
    • 3: Light snow, light rain + thunderstorm + scattered clouds
    • 4: Heavy rain + ice pellets + thunderstorm + mist
  • temp: Normalized temperature (Celsius divided by 41)
  • atemp: Normalized feels-like temperature (Celsius divided by 50)
  • hum: Normalized humidity (humidity divided by 100)
  • windspeed: Normalized wind speed (wind speed divided by 67)
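Because the weather variables are stored as normalized values, it can help to convert them back to their original scale for axis labels and interpretation. The sketch below simply multiplies each value by the divisor listed above. The function and output key names are hypothetical, and the wind speed unit is left unspecified because it is not stated here.

```python
# Divisors taken from the column descriptions above.
SCALE = {
    "temp": 41.0,       # degrees Celsius
    "atemp": 50.0,      # degrees Celsius
    "hum": 100.0,       # percent
    "windspeed": 67.0,  # original wind speed unit (not stated in this document)
}


def denormalize(record):
    """Return a copy of `record` with the weather variables on their original scale.

    Keys that are not weather variables are passed through unchanged.
    """
    restored = dict(record)
    for column, factor in SCALE.items():
        if column in restored:
            restored[column] = float(restored[column]) * factor
    return restored


# Illustrative values, not an actual row from the dataset.
example = {"temp": 0.5, "atemp": 0.5, "hum": 0.8, "windspeed": 0.2, "cnt": 120}
print(denormalize(example))
```

For modeling, the normalized values can usually be used as they are; the conversion mainly makes charts and coefficients easier to read.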

Usage Counts

  • casual: Casual user count
  • registered: Registered user count
  • cnt: Total count (casual + registered)

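Suitable for predicting usage with Poisson regression, analyzing time series patterns, and evaluating the effects of weather and season. Because the counts are expected to be overdispersed (variance greater than the mean), a quasi-Poisson or negative binomial model may fit better than a plain Poisson model.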
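A quick way to check for the overdispersion mentioned above is to compare the sample variance of the counts with their mean. Under a Poisson model the two are equal, so a ratio well above 1 suggests overdispersion. This is a rough, unconditional check on the raw counts rather than a formal test on model residuals; the function name is hypothetical and the counts are illustrative values, not the full cnt column.

```python
import statistics


def dispersion_ratio(counts):
    """Return sample variance divided by mean for a sequence of counts.

    A Poisson variable has variance equal to its mean, so values far above 1
    point to overdispersion.
    """
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return statistics.variance(counts) / mean


# Illustrative hourly counts only.
sample_cnt = [16, 40, 32, 13, 1, 1, 2, 3, 8, 14]
ratio = dispersion_ratio(sample_cnt)
print(f"variance / mean = {ratio:.2f}")
```

Covariates such as hour of day and weather explain part of this spread, so the ratio computed on raw counts usually overstates the dispersion that remains after fitting a model.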

Data source: https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset

License: CC0 (Public Domain)

Earthquakes

Worldwide earthquake data from September 2024 (1,041 rows, 7 columns, magnitude 4.0+). Suitable for datetime data visualization.

Columns

  • time: Occurrence datetime
  • latitude, longitude: Location
  • depth: Depth (km)
  • mag: Magnitude
  • place: Location description

Explore distributions by date, time, and day of week.
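The day-of-week and hour-of-day breakdowns suggested above can be derived from the time column as in the following sketch. It assumes the timestamps are ISO 8601 strings in UTC with a trailing "Z", which is an assumption about the export format. The sample timestamps are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def parse_time(timestamp):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    if timestamp.endswith("Z"):
        # Older Python versions do not accept 'Z' in fromisoformat().
        timestamp = timestamp[:-1] + "+00:00"
    return datetime.fromisoformat(timestamp)


def weekday_name(timestamp):
    """Return the English weekday name for a timestamp string."""
    return WEEKDAY_NAMES[parse_time(timestamp).weekday()]


# Made-up timestamps in the assumed format, not rows from the dataset.
sample_times = [
    "2024-09-01T03:15:00.000Z",
    "2024-09-01T22:40:10.500Z",
    "2024-09-02T08:05:30.250Z",
]

by_weekday = Counter(weekday_name(t) for t in sample_times)
by_hour = Counter(parse_time(t).hour for t in sample_times)
print(by_weekday)
print(by_hour)
```

The hours here are in UTC; converting to local time at each epicenter would require a time zone lookup, which this sketch does not attempt.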

Data source: https://www.usgs.gov/programs/earthquake-hazards

License: Public Domain (USGS Data)

Iris

A classic classification dataset containing measurements of three iris species (150 rows, 5 columns).

Columns

  • sepal_length, sepal_width: Sepal dimensions
  • petal_length, petal_width: Petal dimensions
  • species: Species

Data source: https://archive.ics.uci.edu/dataset/53/iris

License: Public Domain