Advanced Graph Creation

Graph Builder provides multiple Graph Types such as Bar Chart, Histogram, and Scatter Plot. When you have a well-defined graph format in mind and only need minor adjustments, these are convenient and easy to use.

However, as you perform more in-depth analysis, you may want more flexible control over visualizations to create custom graphs. Custom Graph provides this capability.

Custom Graph is based on the Grammar of Graphics theory. By decomposing graphs into components like "data," "statistical transformations," "geometric objects," and "coordinate systems," and freely combining these as layers, you can achieve advanced visualizations such as:

  • Overlaying multiple graph types in one graph (e.g., scatter plot + regression line + confidence interval)
  • Visualizing data after statistical transformation (e.g., histogram, kernel density estimation)
  • Exploring multidimensional data with facets (small multiples)
  • Flexibly controlling axis scales and directions

If other Graph Types are like "ready-made furniture," Custom Graph is like an "assembly kit of materials." Once you understand the basic components, you can create virtually unlimited visualization patterns depending on how you use them.

The 7 Components of Grammar of Graphics

Custom Graph builds graphs by combining these 7 elements:

  1. Data: The dataset to visualize
  2. Aesthetics: The mapping between variables (dataset columns) and visual attributes (position, color, size, etc.)
  3. Layers: Individual visual elements (points, lines, bars, etc.) that are overlaid in one graph
  4. Statistics: Statistical transformations of data (binning, smoothing, etc.)
  5. Scales: How to convert data values to visual values
  6. Coordinates: Coordinate system (Cartesian coordinates, axis swapping, etc.)
  7. Facets: How the data is split into multiple small panels arranged in one graph

By combining these, you can visualize diverse aspects of your data.
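To make the decomposition concrete, here is a minimal Python sketch of how such a graph description could be held as plain data. The `Layer` and `GraphSpec` classes and their field names are hypothetical and exist only for illustration; Custom Graph itself is configured through its own settings, not through this code.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One visual element; all names here are illustrative assumptions."""
    geometry: str                                   # e.g. "point", "line", "bar"
    aesthetics: dict = field(default_factory=dict)  # visual attribute -> column name
    statistic: str = "identity"                     # e.g. "bin", "density", "smooth"


@dataclass
class GraphSpec:
    """A whole graph: data plus layers, scales, coordinates, and facets."""
    data: str
    layers: list
    scales: dict = field(default_factory=dict)
    coordinates: str = "cartesian"
    facets: dict = field(default_factory=dict)


spec = GraphSpec(
    data="Auto MPG",
    layers=[
        Layer("point", {"x": "weight", "y": "mpg", "color": "origin"}),
        Layer("line", {"x": "weight", "y": "mpg"}, statistic="smooth"),
    ],
    facets={"type": "wrap", "variable": "cylinders"},
)


def describe(graph):
    """Return one summary string per layer."""
    return [f"{l.geometry} [{l.statistic}] {l.aesthetics}" for l in graph.layers]


for line in describe(spec):
    print(line)
```

Each section that follows corresponds to changing one field of a description like this while leaving the others untouched.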

Data - Selecting Data

First, select the dataset to visualize. Here we use the Auto MPG dataset (fuel efficiency data for 398 cars from 1970-1982).

Aesthetics - Mapping Visual Elements

Map data columns to visual attributes. The most basic mapping assigns two continuous variables to the x and y axes.

Data: Auto MPG
Aesthetics: x = weight, y = mpg
Geometry: Point

Custom Graph basic scatter plot: the relationship between weight and mpg (fuel efficiency) in the Auto MPG dataset

A clear negative correlation is visible: heavier cars have worse fuel efficiency.

Mapping to Color

You can add more information by mapping a third variable to color. For example, let's color-code by origin (USA, Europe, Japan).

Aesthetics: x = weight, y = mpg, color = origin

Custom Graph scatter plot with color coding by origin: displaying USA, Europe, and Japan in different colors to visualize regional characteristics

Mapping to Size

Map horsepower to point size.

Aesthetics: x = weight, y = mpg, color = origin, size = horsepower

Custom Graph scatter plot combining color and size: color-coded by origin with point size varying by horsepower. Expresses 4 variables in one graph

Larger points indicate higher horsepower. You can visually understand the relationship: heavy, high-horsepower cars have poor fuel efficiency.
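Behind a color or size mapping, a scale converts each data value to a visual value. The following Python sketch shows one plausible way to do this: one palette color per category, and a linear mapping from horsepower to point size. The function names, the size range, and the three sample rows are assumptions made for illustration; the rows are not actual Auto MPG records.

```python
def categorical_color_scale(values, palette):
    """Assign one palette color to each distinct category, in sorted order."""
    categories = sorted(set(values))
    return {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}


def linear_size_scale(values, min_size=2.0, max_size=10.0):
    """Map numbers linearly onto point sizes between min_size and max_size."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: use the middle of the size range.
        return [(min_size + max_size) / 2 for _ in values]
    return [min_size + (v - lo) / (hi - lo) * (max_size - min_size) for v in values]


# Illustrative rows only, not taken from the real dataset.
rows = [
    {"weight": 2100, "mpg": 32.0, "origin": "japan", "horsepower": 70},
    {"weight": 2600, "mpg": 25.0, "origin": "europe", "horsepower": 95},
    {"weight": 4300, "mpg": 14.0, "origin": "usa", "horsepower": 170},
]

colors = categorical_color_scale([r["origin"] for r in rows], ["red", "green", "blue"])
sizes = linear_size_scale([r["horsepower"] for r in rows])

for row, size in zip(rows, sizes):
    print(row["origin"], colors[row["origin"]], size)
```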

Layers - Overlaying Layers

Layers allow you to overlay multiple graphs. For example, let's add a regression line (the Smooth statistic with method = lm) on top of a scatter plot.

Layer 1: Point (x = weight, y = mpg)
Layer 2: Line + Smooth statistic (method = lm)

Custom Graph layer feature: regression line overlaid on scatter plot. Layers combine multiple graph types into one graph

The blue line is the fitted regression line (method = lm). It shows the average trend of mpg as weight increases.
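For reference, a fit with method = lm conventionally means an ordinary least-squares straight line. The sketch below computes such a line in plain Python. It assumes that this is what the Smooth statistic does for method = lm, and the sample values are invented so that they lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented values that fall exactly on a straight line.
weights = [2000, 2500, 3000, 3500, 4000]
mpgs = [34.0, 29.0, 24.0, 19.0, 14.0]

slope, intercept = fit_line(weights, mpgs)
print(f"mpg ~ {slope:.4f} * weight + {intercept:.1f}")
```

The smoothing layer draws the fitted line over the point layer; the points themselves are not changed.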

Statistics - Statistical Transformations

You can display data not just as-is, but after statistical transformation.

Histogram (Binning)

To see the distribution of fuel efficiency, we divide the data into bins (intervals) and count them.

Aesthetics: x = mpg
Geometry: Bar
Statistics: Bin (bins = 20)

Custom Graph histogram with Bin statistical transformation: dividing mpg (fuel efficiency) data into 20 bins and counting. Visualizes data distribution and skewness

You can see that most cars are concentrated in the 15-30 mpg range. The distribution is slightly right-skewed: its tail extends toward high mpg values, so highly fuel-efficient cars are a minority.
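The Bin statistic can be pictured as the following computation: split the value range into equal-width intervals and count the values in each. This is a minimal sketch under the assumption of equal-width bins spanning the minimum to the maximum; the function name and sample values are illustrative, and the tool may place bin edges differently.

```python
def bin_counts(values, bins):
    """Count values in `bins` equal-width intervals from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        raise ValueError("all values are identical; cannot form bins")
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not to a bin past the end.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Illustrative mpg-like values, not the real dataset.
sample = [10, 12, 15, 18, 18, 21, 24, 30, 38, 46]
edges, counts = bin_counts(sample, bins=4)

for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count}")
```

The Bar geometry then draws one bar per interval, with the count as its height.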

Density Estimation

Instead of bins, you can express the distribution as a smooth density curve.

Aesthetics: x = mpg
Show Density Curve: Yes

Custom Graph Density statistical transformation: the mpg distribution expressed as a smooth density curve

This expresses the same information as the histogram with a continuous curve.
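A density curve of this kind is typically produced by kernel density estimation. The sketch below uses a Gaussian kernel with a fixed bandwidth; both the kernel choice and the bandwidth of 2.0 are assumptions for illustration, as is the sample data, and the tool may select these differently.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


# Illustrative values only.
mpg = [15.0, 18.0, 18.0, 22.0, 24.0, 31.0]
density = gaussian_kde(mpg, bandwidth=2.0)

# Numerically integrate over 0..50 to check that the total area is about 1.
step = 0.1
grid = [i * step for i in range(501)]
area = sum(density(x) * step for x in grid)
print(f"area under curve: {area:.4f}")
```

Because the area under the curve is 1, the y axis of a plain density curve is not a count, which matters when it is overlaid on a histogram.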

Comparing Densities Across Multiple Groups

By overlaying density curves for each category, you can compare distribution differences. Let's draw density curves on top of histograms for each origin.

Layer 1 (Bar):
  Aesthetics: x = mpg, fill = origin
  Geometry: Bar
  Statistics: Bin (bins = 30)

Layer 2 (Line):
  Aesthetics: x = mpg, color = origin
  Geometry: Line
  Statistics: Density (Y Scale = Count)

Custom Graph utilizing multiple layers: histogram color-coded by origin (Layer 1) with density curves (Layer 2) overlaid. Compares distribution differences between groups

Key points:

  • Layer 1 (Bar): fill = origin for color-coded bar graph
  • Layer 2 (Line): color = origin for color-coded density curves, Y Scale = Count to match scale with histogram
  • Since Bar and Line use the same color scale, colors match

It's clear that Japanese cars peak on the high fuel efficiency side, while US cars peak on the low fuel efficiency side.
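The reason Y Scale = Count is needed can be shown numerically. A density integrates to 1, while histogram bars have a total area of (number of values) × (bin width). One common convention, assumed here and not confirmed for this tool, is to multiply the density by both factors so the curve sits at the height of the bars. The function name and values below are illustrative.

```python
import math


def count_scaled_density(values, bandwidth, bin_width):
    """Gaussian density estimate rescaled to histogram count units."""
    n = len(values)

    def curve(x):
        kernel_sum = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        density = kernel_sum / (n * bandwidth * math.sqrt(2 * math.pi))
        # Assumed convention: area under the curve = n * bin_width.
        return density * n * bin_width

    return curve


values = [15.0, 18.0, 18.0, 22.0, 24.0, 31.0]  # illustrative
bin_width = 2.0
curve = count_scaled_density(values, bandwidth=2.0, bin_width=bin_width)

step = 0.1
scaled_area = sum(curve(i * step) * step for i in range(501))
histogram_area = len(values) * bin_width
print(f"curve area: {scaled_area:.2f}, histogram area: {histogram_area:.2f}")
```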

Position - Position Adjustment

Position adjustment becomes important when comparing multiple categories in bar charts.

Stacked Bar Chart

Aesthetics: x = model_year, fill = cylinders
Geometry: Bar
Statistics: Count
Position: Stack

Custom Graph Position adjustment: stacked bar chart using Stack. Shows breakdown of cylinders by model_year as stacked bars

The breakdown of cylinder counts for each model year is shown as stacked bars. 8-cylinder cars were common in the early 1970s, and the share of 4-cylinder cars increased toward the end of the period.

Grouped Bar Chart

Changing Position to Dodge displays the bars side by side.

Position: Dodge

Custom Graph Position adjustment: grouped bar chart using Dodge. Bars for each cylinder count placed side by side for easy comparison of trends

This makes it easier to compare trends for each cylinder count.
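The difference between Stack and Dodge comes down to where each bar's rectangle is placed. The following sketch computes both layouts for one x position. The function names, the bar width of 0.9, and the counts are assumptions for illustration, not values from the dataset or the tool.

```python
def stack(counts_by_group, x_center, width=0.9):
    """Place bars on top of each other: each bar starts where the previous ended."""
    rects, bottom = [], 0
    for group, count in counts_by_group.items():
        rects.append({
            "group": group,
            "xmin": x_center - width / 2, "xmax": x_center + width / 2,
            "ymin": bottom, "ymax": bottom + count,
        })
        bottom += count
    return rects


def dodge(counts_by_group, x_center, width=0.9):
    """Place bars side by side: the total width is shared equally among groups."""
    slot = width / len(counts_by_group)
    left = x_center - width / 2
    return [
        {
            "group": group,
            "xmin": left + i * slot, "xmax": left + (i + 1) * slot,
            "ymin": 0, "ymax": count,
        }
        for i, (group, count) in enumerate(counts_by_group.items())
    ]


# Illustrative counts of cars per cylinder count in one model year.
counts = {"4": 5, "6": 3, "8": 10}
stacked = stack(counts, x_center=1)
dodged = dodge(counts, x_center=1)

print("stack total height:", stacked[-1]["ymax"])
print("dodge heights:", [r["ymax"] for r in dodged])
```

Stack makes the total per year easy to read, while Dodge gives every bar a common baseline, which is why individual groups are easier to compare.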

Coordinates - Coordinate System

Flipping Axes

Flipping histograms and bar charts horizontally makes long labels easier to read.

Aesthetics: x = mpg
Geometry: Bar
Statistics: Bin
Coordinates: Flip

Custom Graph Coordinates adjustment: histogram using Flip (axis swap). Swaps vertical and horizontal axes, optimal for displaying long labels or utilizing vertical space

The vertical and horizontal axes are swapped, displaying the histogram horizontally. Useful when category names are long or when you want to effectively use vertical space.
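Conceptually, Flip changes nothing about the data or the statistics; it only exchanges the roles of the two axes when drawing. A minimal sketch, using a hypothetical rectangle representation for one histogram bar:

```python
def flip(rect):
    """Swap the x and y extents of a rectangle (axis swap)."""
    return {
        "xmin": rect["ymin"], "xmax": rect["ymax"],
        "ymin": rect["xmin"], "ymax": rect["xmax"],
    }


# A vertical bar covering mpg 10 to 19 with a count of 5 (illustrative).
bar = {"xmin": 10.0, "xmax": 19.0, "ymin": 0, "ymax": 5}
flipped = flip(bar)
print(flipped)
```

After flipping, the bar extends horizontally to the count while the mpg interval runs along the vertical axis. Applying `flip` twice returns the original rectangle.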

Facets - Facet Division

Splitting and arranging graphs by category makes subgroup comparison easier. The Facets section has two types: Facet Wrap (division by single variable) and Facet Grid (matrix division by two variables).

Facet Wrap - Division by Single Variable

Facet Wrap divides data by one categorical variable and arranges multiple panels in a grid. Note: A variable can be used as the division criterion only if its scale is Nominal or Ordinal. Here we've changed the cylinders variable to Ordinal Scale.

Aesthetics: x = weight, y = mpg
Geometry: Point
Facets: Type = Facet Wrap (Single Variable)
  Variable = cylinders

Custom Graph Facet Wrap feature: scatter plots divided and arranged by cylinders. Panel division by single variable makes subgroup comparison easy

You can compare the weight-fuel efficiency relationship side by side for 4-cylinder, 6-cylinder, and 8-cylinder cars. 8-cylinder cars are generally heavier and concentrated in the poor fuel efficiency range.

As another example, you can also divide by origin:

Aesthetics: x = weight, y = mpg
Geometry: Point
Facets: Type = Facet Wrap (Single Variable)
  Variable = origin

Custom Graph Facet feature divided by origin: three panels for europe, japan, and usa arranged horizontally to compare weight-fuel efficiency relationships by region

Panels are arranged horizontally for each origin (europe, japan, usa).

Facet Wrap has options to control panel arrangement:

  • Variable: Categorical variable to use for division
  • Columns: Number of panels per row (optional)
  • Rows: Number of panel rows (optional)

If only Columns is specified, row count is calculated automatically. If only Rows is specified, column count is calculated automatically. If both are omitted, optimal arrangement is calculated based on panel count.
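The arrangement rule can be sketched as a small function. The automatic case is an assumption: the tool's actual rule for an "optimal" arrangement is not described here, so this sketch uses a single row for up to three panels (consistent with the three origin panels shown side by side) and a roughly square grid otherwise. The function name `wrap_dims` is hypothetical.

```python
import math


def wrap_dims(n_panels, columns=None, rows=None):
    """Return (rows, columns) for a wrapped panel layout."""
    if n_panels < 1:
        raise ValueError("need at least one panel")
    if columns is None and rows is None:
        # Assumed default: one row for few panels, else a roughly square grid.
        columns = n_panels if n_panels <= 3 else math.ceil(math.sqrt(n_panels))
    if columns is None:
        columns = math.ceil(n_panels / rows)
    elif rows is None:
        rows = math.ceil(n_panels / columns)
    if rows * columns < n_panels:
        raise ValueError("rows * columns is too small for the number of panels")
    return rows, columns


print(wrap_dims(5, columns=3))  # only Columns given
print(wrap_dims(5, rows=1))     # only Rows given
print(wrap_dims(3))             # both omitted
```

When the panel count does not fill the grid exactly, the last row is left partly empty.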

Facet Grid - Matrix Division by Two Variables

Facet Grid uses two categorical variables to define rows and columns, enabling more complex comparisons.

Facets: Type = Facet Grid (Two Variables)
  Rows = cylinders
  Columns = origin

Custom Graph Facet Grid feature: 2D grid created with cylinders (rows) and origin (columns)

Graphs are arranged for each combination of cylinder count and origin.
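In data terms, Facet Grid groups the records by the pair of row and column values, and each group is drawn in its own panel. The sketch below assumes that every row/column combination gets a panel even when no record falls into it; the function name and the three sample records are illustrative.

```python
from collections import defaultdict


def facet_grid(records, row_var, col_var):
    """Group records into panels keyed by (row value, column value)."""
    panels = defaultdict(list)
    for record in records:
        panels[(record[row_var], record[col_var])].append(record)
    row_levels = sorted({r[row_var] for r in records})
    col_levels = sorted({r[col_var] for r in records})
    # Assumed behavior: combinations without data still get an (empty) panel.
    return {(r, c): panels.get((r, c), []) for r in row_levels for c in col_levels}


# Illustrative records only.
cars = [
    {"cylinders": 4, "origin": "japan", "weight": 2100, "mpg": 32.0},
    {"cylinders": 8, "origin": "usa", "weight": 4300, "mpg": 14.0},
    {"cylinders": 4, "origin": "usa", "weight": 2600, "mpg": 25.0},
]

grid = facet_grid(cars, row_var="cylinders", col_var="origin")
for key, members in grid.items():
    print(key, len(members))
```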

Scales - Scale Control

Logarithmic Scale

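When the data span a wide range, for example several orders of magnitude, a logarithmic scale is effective: equal distances on the axis then represent equal ratios rather than equal differences.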

Scales: x = log

Custom Graph Scales adjustment: scatter plot with logarithmic scale applied to X-axis. Effective when data range is wide, displaying both small and large values clearly
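The effect of a logarithmic scale can be seen in a few lines. This sketch maps values to axis positions between 0 and 1 using base-10 logarithms; the base, the output range, and the function name are assumptions for illustration.

```python
import math


def log_scale(values, out_min=0.0, out_max=1.0):
    """Map positive values to positions proportional to their base-10 logarithm."""
    if any(v <= 0 for v in values):
        raise ValueError("a logarithmic scale requires positive values")
    logs = [math.log10(v) for v in values]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [(out_min + out_max) / 2 for _ in logs]
    return [out_min + (l - lo) / (hi - lo) * (out_max - out_min) for l in logs]


positions = log_scale([1, 10, 100, 1000])
print(positions)
```

On a linear scale the first three values would be crowded into the leftmost tenth of the axis; on the logarithmic scale they are evenly spaced. Note that zero and negative values cannot be placed on a logarithmic axis.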

Color Scale

You can specify which colors to use with color scales.

Different palettes are available for continuous and categorical variables.

Aesthetics: x = weight, y = mpg, color = origin
Scales: Palette = Viridis (Discrete)

Custom Graph Scales - color palette: scatter plot with Viridis (Discrete) palette applied. Perceptually uniform, color vision friendly colors that are distinguishable in print and grayscale

Viridis (Discrete) is a palette that is perceptually uniform and accommodates color vision diversity. Optimized for categorical data, it's distinguishable even when printed or displayed in grayscale. Discrete versions of Plasma, Inferno, and Magma are also available.
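One way to think about a discrete version of a continuous palette is that it picks evenly spaced colors along the continuous color ramp, one per category. The sketch below does this with five anchor colors. The hex values are approximate samples of the viridis ramp given from memory, and picking the nearest anchor instead of interpolating is a simplification, so treat both as assumptions and not as the tool's exact colors.

```python
# Approximate viridis colors at 0%, 25%, 50%, 75%, and 100% of the ramp (assumed).
VIRIDIS_ANCHORS = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]


def discrete_palette(anchors, n):
    """Pick n evenly spaced colors from a list of ramp anchors."""
    if not 1 <= n <= len(anchors):
        raise ValueError("n must be between 1 and the number of anchors")
    if n == 1:
        return [anchors[0]]
    last = len(anchors) - 1
    return [anchors[round(i * last / (n - 1))] for i in range(n)]


origins = ["europe", "japan", "usa"]
palette = discrete_palette(VIRIDIS_ANCHORS, len(origins))
color_by_origin = dict(zip(origins, palette))
print(color_by_origin)
```

Spreading the picks across the whole ramp keeps the categories as far apart as possible in both hue and lightness, which is what makes them distinguishable in grayscale.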

Summary

Custom Graph achieves complex visualizations by freely combining the 7 elements. It has more configuration options than the other Graph Types and takes longer to learn, but in return it can accommodate a wide range of requirements.