This project compares and classifies gravel bike frames.

Notes on the project

  1. The project uses 2D scatterplots of frame measures to compare all bikes in the database, both across all sizes and among the frames that are spec’d to my size (generally M or 56, but this varies among makes and models). Size charts are noisy, which adds some arbitrariness to the precise location of a model, especially for the length measures that vary across frame sizes.
  2. The project uses hierarchical trees to give a sense of frame geometry similarity. Tree building involves many subjective decisions, so do not take these trees as objective or even fixed. The clustering is pretty stable but not rigid – adding a bike can occasionally move a frame from one cluster to another!
  3. The project uses the hierarchical trees to divide the frames spec’d to my size into three classes: “gravel-race”, “gravel-endurance”, and “gravel-trail”. There are multiple subjective decisions in the workflow leading to this classification.
  4. The 2D scatterplots show the bikes on the boundary between different classes and why new data or different tree algorithms can create slightly different classifications.

Some more notes

  1. This page is disorganized and some measures may be left undefined and analyses unexplained. This is a spare-time project – it will get cleaned and polished.
  2. I don’t have a permanent web location for this. Hoping to get one.

Notes on data

  1. All bike data were taken from manufacturers’ websites. Some missing data were computed from other measures or taken from online reviews.
  2., and are invaluable sites. This project offers a different way of comparing frames.

Some bike geometry links:

The bike geometry Bible - Everything you need to know about the shape of your bike

Frame Geometry Masterclass: Does The Evil Chamois Hagar Make ANY Sense?

MATTER of FACT: How to Understand Gravel Bike Geometry

Advanced Bicycle Frame Geometry: Steering Speed, Weight Distribution, Tipping Angles (YouTube)

# my_fit: use 176 cm (I am 175.5)
# add Breezer small to my_fit
# geobike[model == "Breezer Radar X Pro" & frame_size == "48cm (S)", my_fit := TRUE]
# add Boone 54 to my_fit
# geobike[model == "Trek Boone 6" & frame_size == "54 cm", my_fit := TRUE]

# add column of shape id for plots
shape_list <- c(15,17,19,0,2)
n_shapes <- length(shape_list)
n_models <- length(unique(geobike[, model]))
# recycle the shape list so that every model gets a shape id
model_2_shape_map <- rep_len(shape_list, n_models)
geobike[, shape_id := model_2_shape_map[as.integer(as.factor(model))]]

1.6 Center landmarks at bottom bracket

y_cols <- c("rear_x", "rear_y",
            "seat_x", "seat_y",
            "head_x", "head_y",
            "crown_x", "crown_y",
            "front_x", "front_y",
            "bottom_x", "bottom_y",
            "seattube_x", "seattube_y")

# center X at bottom bracket
geobike[, rear_x := rear_x - bottom_x]
geobike[, seat_x := seat_x - bottom_x]
geobike[, head_x := head_x - bottom_x]
geobike[, crown_x := crown_x - bottom_x]
geobike[, front_x := front_x - bottom_x]
geobike[, seattube_x := seattube_x - bottom_x]
# center bottom_x last: once it is zero, any later subtraction is a no-op
geobike[, bottom_x := bottom_x - bottom_x]

2 Frame size classification – Initial

The goal here was to create an objective measure of frame size relevant to a rider, based on measures of the virtual front triangle (with a horizontal top tube). This turned out to be a fool’s errand, because bikes with more progressive geometry have extended head tubes and/or top tubes to increase stack and/or reach. So the frequent advice to size by stack and reach is of little use if you carry your road bike numbers over when purchasing many gravel bike frames.

Three measures of frame size are computed:

  1. \(\texttt{stack_reach_size_geomean}\) is the geometric mean of stack and reach.
  2. \(\texttt{rider_size}\) is the geometric mean of \(\texttt{seat_tube_effective_length}\) and \(\texttt{top_tube_effective_length}\). \(\texttt{seat_tube_effective_length}\) is the size component related to the rider’s leg length. \(\texttt{top_tube_effective_length}\) is the size component related to the rider’s torso and arm length.
  3. \(\texttt{centroid_size}\) is the centroid size (the square root of the summed squared distances from the centroid) of the three vertices of the front triangle formed by the top of the virtual seat tube, the top of the head tube, and the bottom bracket.
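Written out, the three measures look roughly like this. The symbols are assumptions introduced only for this sketch: \(S\) is stack, \(R\) is reach, \(L_{st}\) and \(L_{tt}\) are the effective seat tube and top tube lengths, and \((x_i, y_i)\) are the three front-triangle vertices.

```latex
\begin{aligned}
\texttt{stack\_reach\_size\_geomean} &= \sqrt{S \cdot R} \\
\texttt{rider\_size} &= \sqrt{L_{st} \cdot L_{tt}} \\
\texttt{centroid\_size} &= \sqrt{\sum_{i=1}^{3} \left[ (x_i - \bar{x})^2 + (y_i - \bar{y})^2 \right]},
\qquad \bar{x} = \tfrac{1}{3}\sum_{i=1}^{3} x_i, \quad \bar{y} = \tfrac{1}{3}\sum_{i=1}^{3} y_i
\end{aligned}
```

A geometric mean keeps the result in millimetres, so all three measures are on a length scale.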
# stack + reach size
geobike[, stack_reach_size_euclid := sqrt(stack^2 + reach^2)]
geobike[, stack_reach_size_geomean := sqrt(stack * reach)]

# effective seat tube + effective top tube size
geobike[, seat_tube_effective_length :=
          sqrt((seat_x - bottom_x)^2 + (seat_y - bottom_y)^2)]
geobike[, rider_size := sqrt(seat_tube_effective_length * 
                               top_tube_effective_length)]

# upper triangle centroid size
geobike[, centroid_x := (seat_x + bottom_x + head_x)/3]
geobike[, centroid_y := (seat_y + bottom_y + head_y)/3]
geobike[, centroid_size := 
          sqrt((seat_x - centroid_x)^2 +
          (seat_y - centroid_y)^2 +
          (bottom_x - centroid_x)^2 +
          (bottom_y - centroid_y)^2 +
          (head_x - centroid_x)^2 +
          (head_y - centroid_y)^2)]

# bike centroid size
# the centroid is the mean of the six landmarks
geobike[, bike_centroid_x := (rear_x + seat_x + head_x + crown_x + front_x + bottom_x)/6]
geobike[, bike_centroid_y := (rear_y + seat_y + head_y + crown_y + front_y + bottom_y)/6]
# centroid size: square root of the summed squared distances to the centroid
geobike[, bike_centroid_size := 
          sqrt((rear_x - bike_centroid_x)^2 +
          (rear_y - bike_centroid_y)^2 +
          (seat_x - bike_centroid_x)^2 +
          (seat_y - bike_centroid_y)^2 +
          (head_x - bike_centroid_x)^2 +
          (head_y - bike_centroid_y)^2 +
          (crown_x - bike_centroid_x)^2 +
          (crown_y - bike_centroid_y)^2 +
          (front_x - bike_centroid_x)^2 +
          (front_y - bike_centroid_y)^2 +
          (bottom_x - bike_centroid_x)^2 +
          (bottom_y - bike_centroid_y)^2)]
size <- "bike_centroid_size"
size <- geobike[, get(size)]
c.x <- geobike[, bike_centroid_x]
c.y <- geobike[, bike_centroid_y]

# do not scale
# size <- 1
# c.x <- 0
# c.y <- 0

# center at the bike centroid and scale by bike centroid size
geobike[, rear_xs := (rear_x - c.x)/size]
geobike[, rear_ys := (rear_y - c.y)/size]
geobike[, seat_xs := (seat_x - c.x)/size]
geobike[, seat_ys := (seat_y - c.y)/size]
geobike[, head_xs := (head_x - c.x)/size]
geobike[, head_ys := (head_y - c.y)/size]
geobike[, crown_xs := (crown_x - c.x)/size]
geobike[, crown_ys := (crown_y - c.y)/size]
geobike[, front_xs := (front_x - c.x)/size]
geobike[, front_ys := (front_y - c.y)/size]
geobike[, bottom_xs := (bottom_x - c.x)/size]
geobike[, bottom_ys := (bottom_y - c.y)/size]
geobike[, seattube_xs := (seattube_x - c.x)/size]
geobike[, seattube_ys := (seattube_y - c.y)/size]
my_fit <- geobike[my_fit == TRUE,]

shape_map <- setNames(geobike$shape_id, geobike$model)

nudge_percent <- 0.01
gg1 <- ggplot(data = geobike,
             aes(x = centroid_size,
                 y = rider_size,
                 color = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size,
                             shape = model),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map)

nudge_pos <- nudge_percent*(max(my_fit$stack_reach_size_geomean) - min(my_fit$stack_reach_size_geomean))

gg2 <- ggplot(data = geobike,
             aes(x = stack_reach_size_geomean,
                 y = centroid_size,
                 color = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size,
                             shape = model),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map)

nudge_pos <- nudge_percent*(max(my_fit$stack_reach_size_geomean) - min(my_fit$stack_reach_size_geomean))

gg3 <- ggplot(data = my_fit,
             aes(x = stack_reach_size_geomean,
                 y = rider_size,
                 color = model,
                 label = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE)
girafe(ggobj = gg1)

girafe(ggobj = gg2)

girafe(ggobj = gg3)

The goal here is to use the frame size measures to classify the bikes into size classes. First, here is the number of bike models that offer a specific number of frame sizes.

frame_sizes_per_model <- geobike[, .(n_sizes = .N), by = .(model)]
size_dist <- frame_sizes_per_model[, .(n_models = .N), by = .(n_sizes)]
ggplot(data = size_dist,
       aes(x = n_sizes,
           y = n_models)) +
  geom_col() +
  ylab("Number of models") +
  xlab("Number of frame sizes")

Figure 2.4: Distribution of bike models that offer a specific number of frame sizes

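Use k-means clustering to classify the frames into five size classes and, separately, seven size classes. The three frame size variables are the candidate inputs; as the code stands, \(\texttt{y_cols}\) is overwritten so that only \(\texttt{centroid_size}\) is used. Clusters are then named in order of their mean \(\texttt{stack_reach_size_geomean}\).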
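Because k-means cluster ids are arbitrary, the clusters have to be ranked before size names are attached. Here is a minimal sketch of that step on made-up data. The `toy` data frame, its values, and the three size names are assumptions for illustration, not the bike database.

```r
# Toy data: one size variable with three well-separated groups (made-up values)
set.seed(1)  # k-means starts from random centers, so fix the seed for repeatability
toy <- data.frame(centroid_size = c(rnorm(10, 600, 5),
                                    rnorm(10, 650, 5),
                                    rnorm(10, 700, 5)))
sizes <- c("small", "medium", "large")

km <- kmeans(toy$centroid_size, centers = length(sizes), nstart = 10)

# Rank the clusters by their center, then map rank 1 -> "small",
# rank 2 -> "medium", rank 3 -> "large"
size_rank <- rank(km$centers[, 1])
toy$frame_size <- factor(sizes[size_rank[km$cluster]], levels = sizes)

table(toy$frame_size)
```

Without a fixed seed, the cluster ids (and occasionally the cluster memberships) can change between runs, which is one reason a frame near a boundary may switch size class.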

y_cols <- c("stack_reach_size_geomean", "rider_size", "centroid_size")

y_cols <- "centroid_size"

# 5 sizes
sizes <- c("extra-small", "small", "medium", "large", "extra-large")
n_sizes <- length(sizes)
size_groups <- kmeans(x = geobike[, .SD, .SDcols = y_cols],
                                  centers = n_sizes)
sizing <- size_groups$cluster
geobike[, size_cluster_5 := sizing]
cluster_means <- geobike[, .(cluster_mean = mean(stack_reach_size_geomean)),
                         by = .(size_cluster_5)][order(cluster_mean)]
# name the clusters from smallest to largest mean size
cluster_means[, sizes := sizes]
cluster_means <- dplyr::arrange(cluster_means, size_cluster_5)
geobike[, frame_size_5 := cluster_means$sizes[size_cluster_5]]
geobike[, frame_size_5 := factor(frame_size_5,
                                 levels = sizes)]

# 7 sizes
sizes <- c("extra-small", "small", "small-medium", "medium", "medium-large", "large", "extra-large")
n_sizes <- length(sizes)
size_groups <- kmeans(x = geobike[, .SD, .SDcols = y_cols],
                                  centers = n_sizes) 
sizing <- size_groups$cluster
geobike[, size_cluster_7 := sizing]
cluster_means <- geobike[, .(cluster_mean = mean(stack_reach_size_geomean)),
                         by = .(size_cluster_7)][order(cluster_mean)]
# name the clusters from smallest to largest mean size
cluster_means[, sizes := sizes]
cluster_means <- dplyr::arrange(cluster_means, size_cluster_7)
geobike[, frame_size_7 := cluster_means$sizes[size_cluster_7]]
geobike[, frame_size_7 := factor(frame_size_7,
                                 levels = sizes)]
y_cols <- c("model", "frame_size", "frame_size_5", "frame_size_7")
#y_cols <- c("model", "frame_size", "frame_size_7")
# View(geobike[, .SD, .SDcols = y_cols])
gg1 <- ggplot(data = geobike,
              aes(x = frame_size_5,
                  y = top_tube_effective_length,
                  color = model,
                  shape = model)) + 
  geom_jitter_interactive(aes(tooltip = model_size,
                              data_id = model_size),
                          width = 0.2,
                          show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  ylab("Top Tube, Effective Length (mm)")

gg2 <- ggplot(data = geobike,
              aes(x = frame_size_7,
                  y = top_tube_effective_length,
                  color = model,
                  shape = model)) + 
  geom_jitter_interactive(aes(tooltip = model_size,
                              data_id = model_size),
                          width = 0.2,
                          show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  ylab("Top Tube, Effective Length (mm)")

girafe(ggobj = gg1)

Figure 2.5: Hover over points to identify model and frame size

# girafe(ggobj = gg2)


  1. Because the bikepacking/off-road bikes have extra-high stack and/or extra-long reach, the only extra-large bikes are bikepacking/off-road models, and all of the all-road/race gravel bikes are classified into smaller bins than their specified size.
  2. This suggests re-classifying within style classifications.

3 Style classification

3.1 Geometric frame shape

var_labels <- c("Rear wheel X", "Rear wheel Y",
                "Seat at stack height, X",
                "Head tube X", "Head tube Y",
                "Fork crown X", "Fork crown Y",
                "Front wheel X", "Front wheel Y",
                "Bottom bracket X", "Bottom bracket Y")
data.table(
  Coordinates = var_labels
) %>%
  kable() %>%
  kable_styling(full_width = FALSE)
Coordinates
Rear wheel X
Rear wheel Y
Seat at stack height, X
Head tube X
Head tube Y
Fork crown X
Fork crown Y
Front wheel X
Front wheel Y
Bottom bracket X
Bottom bracket Y
y_cols <- c("rear_xs", "rear_ys",
            "seat_xs",
            # seat_ys is redundant with head_ys
            "head_xs", "head_ys",
            "crown_xs", "crown_ys",
            "front_xs", "front_ys",
            "bottom_xs", "bottom_ys")
geobike_subset <- geobike[my_fit == TRUE,]
scale_it <- FALSE
center_it <- FALSE
tree_geom <- get_tree(geobike_subset,
                y_cols,
                scale_it,
                center_it,
                hclust_method = "average") %>%
  as.dendrogram()
gg <- ggdendrogram(tree_geom, rotate = TRUE)
gg



  1. Method – UPGMA method using landmark coordinates centered at the frame centroid and scaled by frame centroid size, for frames spec’d to my size.
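The `get_tree()` helper is not shown on this page. As a rough sketch of what such a helper might do, the hypothetical `get_tree_sketch()` below optionally centers and scales the columns, computes Euclidean distances, and builds the tree with base R's `hclust()`. The function name, its body, and the toy coordinates are all assumptions.

```r
# Hypothetical stand-in for get_tree(); the real helper may differ.
# Written for a plain data.frame (a data.table would need different indexing).
get_tree_sketch <- function(dt, y_cols, scale_it = FALSE, center_it = FALSE,
                            hclust_method = "average") {
  X <- scale(as.matrix(dt[, y_cols, drop = FALSE]),
             center = center_it, scale = scale_it)
  rownames(X) <- dt$model
  hclust(dist(X), method = hclust_method)  # dist() is Euclidean by default
}

# Made-up scaled landmark coordinates for four frames
toy <- data.frame(model = c("A", "B", "C", "D"),
                  head_xs = c(0.30, 0.31, 0.40, 0.41),
                  head_ys = c(0.50, 0.51, 0.45, 0.44))

# method = "average" is UPGMA
tree <- get_tree_sketch(toy, c("head_xs", "head_ys"))
cutree(tree, k = 2)
```

The landmark coordinates are already centered and scaled by centroid size, which is why no further scaling is applied for this tree.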

3.2 Traditional measures

y_cols <- c("stack", "reach", "front_center", "rear_center", "bottom_bracket_drop", "fork_offset_rake", "head_tube_angle", "seat_tube_angle")
var_labels <- c("Stack", "Reach",
                "Front-center horizontal",
                "Rear-center horizontal",
                "Bottom bracket drop",
                "Fork offset",
                "Head tube angle",
                "Seat tube angle")
data.table(
  Variables = var_labels
) %>%
  kable() %>%
  kable_styling(full_width = FALSE)
Variables
Stack
Reach
Front-center horizontal
Rear-center horizontal
Bottom bracket drop
Fork offset
Head tube angle
Seat tube angle
y_cols <- c("stack", "reach", "front_center", "rear_center", "head_tube_angle", "seat_tube_angle", "bottom_bracket_drop", "fork_offset_rake")

geobike_subset <- geobike[my_fit == TRUE,]
scale_it <- TRUE
center_it <- TRUE

# old code
# tree_v1 <- get_tree(geobike_subset,
#                 y_cols,
#                 scale_it,
#                 center_it,
#                 hclust_method = "ward.D2") %>%
#   as.dendrogram()
# gg <- ggdendrogram(tree_v1, rotate = TRUE)
# gg

tree_v1 <- get_tree(geobike_subset,
                y_cols,
                scale_it,
                center_it,
                hclust_method = "ward.D2")
tree_v1_color <- dendro_data_k(tree_v1, k = 3)
gg <- plot_ggdendro(tree_v1_color,
                      direction   = "rl",
                      expand.y    = 0.2,
                      scale.color = pal_okabe_ito)
gg


  1. Method – Ward’s method using centered/scaled measures of frames spec’d for my height
  2. Three major clusters, from left to right
  • trail: drop-bar mtn bikes and flat-bar gravel bikes
  • all-road and race gravel
  • bikepacking
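The traditional measures mix millimetres and degrees, so they are centered and scaled before clustering. The sketch below illustrates the idea with base R only; the six toy bikes and their values are made up, and `hclust()` plus `cutree()` stand in for the page's own helpers.

```r
# Made-up traditional measures for six frames (mm and degrees)
toy <- data.frame(
  model = paste0("bike_", 1:6),
  stack = c(560, 565, 600, 605, 640, 645),
  reach = c(390, 388, 400, 402, 430, 428),
  head_tube_angle = c(72, 71.8, 70.5, 70.3, 68, 67.8)
)

# Center and scale each column so that measures in mm and in degrees
# contribute on a comparable footing
X <- scale(as.matrix(toy[, -1]), center = TRUE, scale = TRUE)
rownames(X) <- toy$model

tree <- hclust(dist(X), method = "ward.D2")  # Ward's method
clusters <- cutree(tree, k = 3)              # cut the tree into three groups
split(names(clusters), clusters)
```

Cutting at three groups is itself one of the subjective decisions; a different `k` or linkage method would draw the boundaries elsewhere.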

3.3 Style classification table

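Using the traditional-measures tree above, the frames spec’d to my size can be classified into three styles. The code labels them Trail, Endurance, and All-Road, anchoring each cluster on a reference model; Endurance corresponds to the bikepacking cluster.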

options(knitr.kable.NA = '')

style_class <- tree_v1_color$labels %>%
  data.table()
style_class[, model := tstrsplit(label, ",", keep = 1)]

cluster_labels <- character(3)
trail <- "Breezer Radar X Pro"
cluster_labels[style_class[model == trail, clust]] <- "Trail"
all_road <- "OPEN U.P."
cluster_labels[style_class[model == all_road, clust]] <- "All-Road"
endurance <- "Mason InSearchOf"
cluster_labels[style_class[model == endurance, clust]] <- "Endurance"

style_class[, style := cluster_labels[clust]]
style_class[, style := factor(style,
                              levels = cluster_labels)]

# add style to geobike
geobike <- plyr::join(geobike,
                      style_class[, .SD, .SDcols = c("model", "style")],
                      by = "model")
my_fit <- geobike[my_fit == TRUE,]

# dcast(setDT(DF), rowid(ID) ~ ID, value.var = "total")
# cluster_labels <- c("All-road", "Bikepacking", "Trail")

style_table <- dcast(setDT(style_class), rowid(style) ~ style, value.var = "model")[, .SD, .SDcols = cluster_labels]

style_table %>%
  kable() %>%
  kable_styling(full_width = FALSE)
Trail Endurance All-Road
Surly Ghost Grappler Tout Terrain Scrambler 28 Canyon Grizl 7 1by
Otso Fenrir Ritchey Outback frameset Lauf Siegla
Nordest Kutxo Tumbleweed Stargazer Why R+ V4
Cotic Cascade Reeb Sams Pants Wilier Rave SLR
Chumba Yaupon Genesis Vagabond Trek Checkpoint SL5
Amigo Bug Out Noble GX 5 Wilier Jena
Rondo MYLC CF Hi BlackMtnCy Monstercross V5 Trek Boone 6
Evil Chamois Hagar GRX Bearclaw Beaux Jaxon Santa Cruz Stigmata
Specialized Diverge Evo Salsa Vaya Niner RLT 9 RDO
Hudski Doggler Gravel Bombtrack Beyond 2 Otso Warakin Stainless
Breezer Radar X Pro Light Blue Darwin All-City Cosmic Stallion
Bombtrack Beyond+ Adv Salsa Fargo rear dropout Cervelo Aspero
Enigma Escape Flat-bar BlackMtnCy La Cabra Rose Backroad XPLR
Revel Rover Salsa Fargo front dropout Obed Boundary
Rondo MYLC CF Lo Cinelli Hobootleg Geo Ribble Gravel SL
Fustle Causway GR1 Panorama Taiga EXP Chumba Terlingua steel fdo
BMC URS One Kona Sutra ULTD All-City Gorilla Monsoon
BMC URS AL Mosaic GT-1X Shand Stooshie
Fiftyone Assassin long-low Salsa Cutthroat Bombtrack Hook
BMC URS AL SUS Mason InSearchOf Solace OM-3 Short
Whyte Friston Gravel Moots Routt ESC Squid Gravtron
Knolly Cache Steel Chiru Kegeti Thesis OB1
Merida Silex Open WI.DE
Fiftyone Assassin short-hi OPEN U.P.
Sonder Camino AL Blackheart All Road TI
Fezzari Shafer Pinarello Grevil F
Marin DSX 2 Cannondale SuperSix Evo
Kanzo Adventure New Salsa Warbird
Alchemy Rogue
Devinci Hatchet
No22 Drifter X
Scott Addict Gravel 10
Canyon Grail 7 1by
Specialized Diverge

4 Pairwise

4.1 Stack and Reach


  1. Stack and reach are the most common quick-and-dirty measures of frame size. They are imperfect because both are confounded by bike style: bikepacking and mountain-bike-inspired (“trail”) gravel bikes have higher stack, longer reach, or both, for their specified size class relative to all-road gravel bikes of the same size class.
gg1 <- ggplot(data = geobike,
             aes(x = reach,
                 y = stack,
                 color = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size,
                             shape = model),
                         show.legend = FALSE) +
   scale_shape_manual(values = shape_map)

nudge_pos <- nudge_percent*(max(my_fit$reach) - min(my_fit$reach))

gg2 <- ggplot(data = my_fit,
             aes(x = reach,
                 y = stack,
                 color = model,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

gg3 <- ggplot(data = my_fit,
             aes(x = reach,
                 y = stack,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)
girafe(ggobj = gg1)

Figure 4.1: Hover over points to identify model and frame size

girafe(ggobj = gg3)

Figure 4.2: Hover over points to identify model and frame size

4.2 Rear-center and Front-center


  1. Rear-center and front-center here are the horizontal components. Combined, the two sum to the wheelbase.
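As a worked relation, assuming both wheels have the same diameter so that the axles sit at the same height: the symbols below are introduced only for this sketch, with \(RC\) and \(FC\) the axle-to-bottom-bracket lengths, \(d_{BB}\) the bottom bracket drop, and the subscript \(h\) marking the horizontal component.

```latex
\begin{aligned}
RC_h &= \sqrt{RC^2 - d_{BB}^2} \\
FC_h &= \sqrt{FC^2 - d_{BB}^2} \\
\text{wheelbase} &= RC_h + FC_h
\end{aligned}
```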
gg1 <- ggplot(data = geobike,
             aes(x = front_center,
                 y = rear_center,
                 color = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size,
                             shape = model),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map)

nudge_pos <- nudge_percent * (max(my_fit$front_center) -
                                min(my_fit$front_center))
gg2 <- ggplot(data = my_fit,
             aes(x = front_center,
                 y = rear_center,
                 color = model,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

gg3 <- ggplot(data = my_fit,
             aes(x = front_center,
                 y = rear_center,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)
girafe(ggobj = gg1)

Figure 4.3: Hover over points to identify model and frame size

girafe(ggobj = gg3)

Figure 4.4: Hover over points to identify model and frame size

nudge_pos <- nudge_percent * (max(my_fit$front_wheelbase) -
                                min(my_fit$front_wheelbase))
gg4 <- ggplot(data = my_fit,
             aes(x = front_wheelbase,
                 y = stack_reach,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2,
            show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)
girafe(ggobj = gg4)

4.3 Seat Tube Angle and Head Tube Angle

y_cols <- c("seat_tube_angle", "stack", "reach", "rear_center", "front_center", "head_tube_angle")

ggpairs(geobike[, .SD, .SDcols = y_cols])

gghistogram(data = my_fit,
            x = "seat_tube_angle",
            color = "style",
            fill = "style")
gg1 <- ggplot(data = geobike,
             aes(x = head_tube_angle,
                 y = seat_tube_angle,
                 color = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size,
                             shape = model),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map)

nudge_pos <- nudge_percent * (max(my_fit$head_tube_angle) -
                                min(my_fit$head_tube_angle))
gg2 <- ggplot(data = my_fit,
             aes(x = head_tube_angle,
                 y = seat_tube_angle,
                 color = model,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2,
            show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

gg3 <- ggplot(data = my_fit,
             aes(x = head_tube_angle,
                 y = seat_tube_angle,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2,
            show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

nudge_pos <- nudge_percent * (max(my_fit$rear_wheelbase) -
                                min(my_fit$rear_wheelbase))
gg4 <- ggplot(data = my_fit,
             aes(x = rear_wheelbase,
                 y = seat_tube_angle,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2,
            show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)
girafe(ggobj = gg1)

Figure 4.5: Hover over points to identify model and frame size

girafe(ggobj = gg3)

Figure 4.6: Hover over points to identify model and frame size

girafe(ggobj = gg4)

Figure 4.7: Hover over points to identify model and frame size

4.4 Rear Center vs. Trail

gg1 <- ggplot(data = geobike,
             aes(x = rear_center,
                 y = trail,
                 color = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size,
                             shape = model),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map)

nudge_pos <- nudge_percent * (max(my_fit$rear_center) -
                                min(my_fit$rear_center))
gg2 <- ggplot(data = my_fit,
             aes(x = rear_center,
                 y = trail,
                 color = model,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

gg3 <- ggplot(data = my_fit,
             aes(x = rear_center,
                 y = trail,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)
girafe(ggobj = gg1)

Figure 4.8: Hover over points to identify model and frame size

girafe(ggobj = gg3)

Figure 4.9: Hover over points to identify model and frame size

4.5 Ratios

# nudge labels by a fraction of the x-axis range
nudge_pos <- nudge_percent * (max(my_fit$front_wheelbase) -
                                min(my_fit$front_wheelbase))
gg1 <- ggplot(data = my_fit,
             aes(x = front_wheelbase,
                 y = stack_reach,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2,
            show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

# nudge labels by a fraction of the x-axis range
nudge_pos <- nudge_percent * (max(my_fit$front_wheelbase) -
                                min(my_fit$front_wheelbase))
gg2 <- ggplot(data = my_fit,
             aes(x = front_wheelbase,
                 y = seat_tube_angle/head_tube_angle,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2,
            show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

nudge_pos <- nudge_percent * (max(my_fit$stack_reach) -
                                min(my_fit$stack_reach))
gg3 <- ggplot(data = my_fit,
             aes(x = stack_reach,
                 y = sta_hta,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2,
            show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)
girafe(ggobj = gg1)
girafe(ggobj = gg2)
girafe(ggobj = gg3)

4.6 Head Tube Angle vs. Fork Offset


  1. Head tube angle, fork offset, and head tube length are the frame geometry contributions to trail, and they also affect toe overlap on small bikes, especially with wide tires. I didn’t include trail in these analyses because it also depends on wheel plus tire diameter. I could use the spec’d wheel and tire to add it.
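To make the dependence on wheel and tire size concrete, here is a small sketch of the standard trail formula, trail = (r cos(a) - offset) / sin(a), where a is the head tube angle from horizontal and r is the wheel plus tire radius. The function name `trail_mm` and the example numbers are assumptions for illustration, not values from the database.

```r
# Ground trail from head tube angle, fork offset and wheel radius.
# head_tube_angle is in degrees from horizontal; lengths are in mm.
trail_mm <- function(head_tube_angle, fork_offset, wheel_radius) {
  hta <- head_tube_angle * pi / 180
  (wheel_radius * cos(hta) - fork_offset) / sin(hta)
}

# Made-up example: 71 degree head angle, 50 mm offset, 350 mm wheel + tire radius
trail_mm(71, 50, 350)

# Slackening the head angle by one degree increases trail
trail_mm(70, 50, 350) > trail_mm(71, 50, 350)

# A larger wheel + tire radius also increases trail on the same frame and fork
trail_mm(71, 50, 360) > trail_mm(71, 50, 350)
```

The same frame and fork therefore give different trail with different tires, which is why trail needs an assumed wheel and tire before it can be compared across frames.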
gg1 <- ggplot(data = geobike,
             aes(x = head_tube_angle,
                 y = fork_offset_rake,
                 color = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size,
                             shape = model),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map)

nudge_pos <- nudge_percent * (max(my_fit$head_tube_angle) -
                                min(my_fit$head_tube_angle))
gg2 <- ggplot(data = my_fit,
             aes(x = head_tube_angle,
                 y = fork_offset_rake,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2,
            show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

gg3 <- ggplot(data = my_fit,
             aes(x = head_tube_angle,
                 y = fork_offset_rake,
                 color = style,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2,
            show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

girafe(ggobj = gg1)

girafe(ggobj = gg2)

girafe(ggobj = gg3)

Figure 4.10: Hover over points to identify model and frame size


Principal Component Analysis is a cheap way of exploring similarity of bike frames through different 2D views of a multidimensional space.
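The PCA here is done by hand, from an eigen-decomposition of the covariance matrix. As a sanity check on that approach, this sketch compares it with base R's `prcomp()` on made-up data. The variables `a`, `b`, and `c` are placeholders, not frame measures.

```r
set.seed(42)
# Made-up data: 20 "frames" measured on three variables, two of them correlated
z <- rnorm(20)
X <- cbind(a = z + rnorm(20, sd = 0.3),
           b = 2 * z + rnorm(20, sd = 0.3),
           c = rnorm(20))
Xc <- scale(X, center = TRUE, scale = FALSE)

# PCA by hand: eigen-decomposition of the covariance matrix
eig <- eigen(cov(Xc))
scores <- Xc %*% eig$vectors

# Same analysis with prcomp (centers by default, does not scale by default)
pca <- prcomp(X)

# Eigenvalues are the PC variances; scores agree up to the sign of each axis
all.equal(eig$values, pca$sdev^2)
all.equal(abs(unname(scores)), abs(unname(pca$x)))
```

The sign of each PC is arbitrary, so statements such as "high PC1" only have meaning alongside the loadings table.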

5.1 Coordinates

Coordinates are unscaled and centered at the intersection of the bottom bracket chord and the wheelbase chord.

y_cols <- c("rear_xs", "rear_ys",
            "seat_xs",
            # seat_ys is redundant with head_ys
            "head_xs", "head_ys",
            "crown_xs", "crown_ys",
            "front_xs",
            # front_ys is redundant with rear_ys
            "bottom_xs", "bottom_ys")
y_labs <- c("Rear wheel X", "Rear wheel Y",
            "Seat X",
            "Head tube X", "Head tube Y",
            "Fork Crown X", "Fork Crown Y",
            "Front wheel X",
            "Bottom Bracket X", "Bottom Bracket Y")

y_cols <- c("rear_x",
            "seat_x",
            "head_x", "head_y",
            "crown_x", "crown_y",
            "front_x",
            "bottom_y")
y_labs <- c("Rear wheel X",
            "Seat X",
            "Head tube X", "Head tube Y",
            "Fork Crown X", "Fork Crown Y",
            "Front wheel X",
            "Bottom Bracket Y")

geobike_subset <- geobike[my_fit == TRUE]
# center each column; scale() returns a numeric matrix
X <- geobike_subset[, .SD, .SDcols = y_cols] %>%
  scale(center = TRUE, scale = FALSE)

S <- cov(X)

geo_eigen <- eigen(S)

L <- geo_eigen$values
E <- geo_eigen$vectors
scores <- X %*% E
pc1 <- scores[, 1]
pc2 <- scores[, 2]
pc3 <- scores[, 3]
geobike_subset[, pc1 := pc1]
geobike_subset[, pc2 := pc2]
geobike_subset[, pc3 := pc3]

coord_loadings <- cor(cbind(scores[,1:3], X))[-(1:3), 1:3]
row.names(coord_loadings) <- y_labs
table_cap <- "Correlations (or loadings) between PCs and coordinates centered at the bottom bracket with bike facing in positive X direction (right)."
coord_loadings %>%
  kable(digits = 2,
        caption = table_cap) %>%
  kable_styling(full_width = FALSE)
Table 5.1: Correlations (or loadings) between PCs and coordinates centered at the bottom bracket with bike facing in positive X direction (right).
Coordinate PC1 PC2 PC3
Rear wheel X 0.47 -0.38 -0.15
Seat X 0.59 -0.38 -0.49
Head tube X -0.38 -0.88 -0.13
Head tube Y -0.87 0.33 0.35
Fork Crown X -0.38 -0.91 0.15
Fork Crown Y -0.83 0.49 -0.25
Front wheel X -0.85 -0.50 -0.08
Bottom Bracket Y -0.18 0.28 0.06
gg1 <- ggplot(data = geobike_subset,
              aes(x = pc1,
                  y = pc2,
                  color = model,
                  shape = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  coord_fixed()

gg1b <- ggplot(data = geobike_subset,
              aes(x = pc1,
                  y = pc2,
                  color = style,
                  shape = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  coord_fixed() +
  scale_color_manual(values = pal_okabe_ito)

gg2 <- ggplot(data = geobike_subset,
              aes(x = pc1,
                  y = pc3,
                  color = model,
                  shape = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  coord_fixed()

gg2b <- ggplot(data = geobike_subset,
              aes(x = pc1,
                  y = pc3,
                  color = style,
                  shape = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  coord_fixed() +
  scale_color_manual(values = pal_okabe_ito)

gg3 <- ggplot(data = geobike_subset,
              aes(x = pc2,
                  y = pc3,
                  color = model,
                  shape = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  coord_fixed()

gg3b <- ggplot(data = geobike_subset,
              aes(x = pc2,
                  y = pc3,
                  color = style,
                  shape = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  coord_fixed() +
  scale_color_manual(values = pal_okabe_ito)


  1. High PC1 describes a frame with a short front center and short stack.
  2. High PC2 describes a frame with short reach.
girafe(ggobj = gg1b)

Figure 5.1: Hover over points to identify model and frame size


  1. High PC1 describes a frame with a short front center and short stack.
  2. PC3 describes noise.
girafe(ggobj = gg2b)

Figure 5.2: Hover over points to identify model and frame size

girafe(ggobj = gg3b)

Figure 5.3: Hover over points to identify model and frame size
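
One way to check how much weight a PC deserves is the share of total variance it carries, which comes straight from the eigenvalues (the `L` vector computed above). This is a sketch on made-up data, not the bike database: `signal1`, `signal2`, and the columns of `raw` are hypothetical. A small share for the last PC would be consistent with reading it as noise, though it does not prove it.

```r
set.seed(2)
n <- 50
# Hypothetical data: two strong signals plus unit-variance noise on each column
signal1 <- rnorm(n, sd = 10)
signal2 <- rnorm(n, sd = 5)
raw <- cbind(v1 = signal1 + rnorm(n),
             v2 = signal1 - signal2 + rnorm(n),
             v3 = signal2 + rnorm(n))

# Center only, then take the eigenvalues of the covariance matrix
X <- scale(raw, center = TRUE, scale = FALSE)
L <- eigen(cov(X))$values

# Proportion and cumulative proportion of variance per PC
prop_var <- L / sum(L)
cum_var <- cumsum(prop_var)
round(rbind(prop_var, cum_var), 3)
```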

5.2 Traditional measures and angles


  1. PCA using the centered and scaled measures used to compute the dendrogram above and the classification.
y_cols <- c("stack", "reach", "front_center", "rear_center", "bottom_bracket_drop", "fork_offset_rake", "head_tube_angle", "seat_tube_angle")
y_labs <- c("stack", "reach", "front center", "rear center", "bottom bracket drop", "fork offset", "head tube angle", "seat tube angle")

geobike_subset <- geobike[my_fit == TRUE]
X <- geobike_subset[, .SD, .SDcols = y_cols] %>%
  scale(center = TRUE, scale = TRUE) # center and scale; returns a numeric matrix

S <- cov(X)

geo_eigen <- eigen(S)

L <- geo_eigen$values
E <- geo_eigen$vectors
scores <- X %*% E
geobike_subset[, pc1 := scores[, 1]]
geobike_subset[, pc2 := scores[, 2]]
geobike_subset[, pc3 := scores[, 3]]

coord_loadings <- cor(cbind(scores[,1:3], X))[-(1:3), 1:3]
row.names(coord_loadings) <- y_labs
table_cap <- "Correlations (or loadings) between PCs and traditional frame measures."

coord_loadings %>%
  kable(digits = 2,
        caption = table_cap) %>%
  kable_styling(full_width = FALSE)

Table 5.2: Correlations (or loadings) between PCs and traditional frame measures.

                       PC1    PC2    PC3
stack                -0.65  -0.53   0.13
reach                -0.64   0.64   0.07
front center         -0.97   0.12   0.04
rear center          -0.39  -0.70  -0.02
bottom bracket drop  -0.03   0.31  -0.75
fork offset          -0.08  -0.42  -0.72
head tube angle       0.90   0.02   0.07
seat tube angle      -0.23   0.55  -0.13

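
The traditional measures mix millimeters and degrees, which is why this PCA uses scaled columns. A minimal sketch of what scaling does, using made-up numbers (the means and standard deviations below are hypothetical, not estimates from the database): after centering and scaling, the covariance matrix equals the correlation matrix of the raw data, so each measure contributes equally to the total variance.

```r
set.seed(3)
# Hypothetical measures on very different scales (mm and degrees)
raw <- cbind(stack = rnorm(30, mean = 580, sd = 15),
             reach = rnorm(30, mean = 385, sd = 10),
             head_tube_angle = rnorm(30, mean = 71, sd = 1))

# Center and scale each column to mean 0 and variance 1
Z <- scale(raw, center = TRUE, scale = TRUE)

# After scaling, cov(Z) is the correlation matrix of the raw data
cov_equals_cor <- all.equal(cov(Z), cor(raw))

# Share of total variance per measure, before and after scaling
share_unscaled <- diag(cov(raw)) / sum(diag(cov(raw)))
share_scaled <- diag(cov(Z)) / sum(diag(cov(Z)))
round(rbind(share_unscaled, share_scaled), 3)
```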
gg1 <- ggplot(data = geobike_subset,
              aes(x = pc1,
                  y = pc2,
                  color = style,
                  shape = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  coord_fixed() +
  scale_color_manual(values = pal_okabe_ito)

gg2 <- ggplot(data = geobike_subset,
              aes(x = pc1,
                  y = pc3,
                  color = style,
                  shape = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  coord_fixed() +
  scale_color_manual(values = pal_okabe_ito)

gg3 <- ggplot(data = geobike_subset,
              aes(x = pc2,
                  y = pc3,
                  color = style,
                  shape = model)) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_shape_manual(values = shape_map) +
  coord_fixed() +
  scale_color_manual(values = pal_okabe_ito)
girafe(ggobj = gg1)

Figure 5.4: Hover over points to identify model and frame size


  1. High PC1 describes a frame with a short front-center, steep head-angle, and short stack and reach.
  2. High PC2 describes a frame with long reach, a small rear-center, and high seat tube angle.
girafe(ggobj = gg2)

Figure 5.5: Hover over points to identify model and frame size


  1. High PC1 describes a frame with a short front-center, steep head-angle, and short stack and reach.
  2. High PC3 describes a frame with a small bottom bracket drop (a high bottom bracket).
girafe(ggobj = gg3)

Figure 5.6: Hover over points to identify model and frame size


  1. High PC2 describes a frame with long reach, a small rear-center, and high seat tube angle.
  2. High PC3 describes a frame with a small bottom bracket drop (a high bottom bracket).
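
The "High PC..." readings depend on the signs in the loadings table, and the sign of an eigenvector is arbitrary: flipping it gives an equally valid PC with mirrored scores and loadings. A short sketch on made-up data (the matrix `raw` is hypothetical) shows that the variance of the PC is unchanged while the loadings change sign, so "high" and "low" should always be read off the current table.

```r
set.seed(5)
raw <- matrix(rnorm(40), nrow = 20, ncol = 2)
X <- scale(raw, center = TRUE, scale = FALSE)

E <- eigen(cov(X))$vectors
scores <- X %*% E

# Flip the sign of the first eigenvector: still a unit-length eigenvector
E_flipped <- E
E_flipped[, 1] <- -E[, 1]
scores_flipped <- X %*% E_flipped

# Variance along PC1 is unchanged by the flip
var_same <- all.equal(var(scores[, 1]), var(scores_flipped[, 1]))

# Loadings on PC1 are mirrored; loadings on PC2 are untouched
loadings <- cor(X, scores)
loadings_flipped <- cor(X, scores_flipped)
round(cbind(loadings, loadings_flipped), 2)
```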

6 Style Re-classification

6.1 Traditional measures – reduced set

y_cols <- c("stack", "reach", "front_center", "rear_center", "head_tube_angle", "seat_tube_angle")
var_labels <- c("Stack", "Reach",
                "Front-center horizontal",
                "Rear-center horizontal",
                "Head tube angle", "Seat tube angle")
data.table(
  Variables = var_labels
) %>%
  kable() %>%
  kable_styling(full_width = FALSE)
Variables
Stack
Reach
Front-center horizontal
Rear-center horizontal
Head tube angle
Seat tube angle
y_cols <- c("stack", "reach", "front_center", "rear_center", "head_tube_angle", "seat_tube_angle")

geobike_subset <- geobike[my_fit == TRUE,]
scale_it <- TRUE
center_it <- TRUE

tree_v2 <- get_tree(geobike_subset,
                hclust_method = "ward.D2")
tree_v2_color <- dendro_data_k(tree_v2, k = 3)
gg <- plot_ggdendro(tree_v2_color,
                      direction   = "lr",
                      expand.y    = 0.2,
                      scale.color = pal_okabe_ito)


  1. Method: Ward’s method (ward.D2) using centered and scaled measures of frames spec’d for my height.
  2. Three major clusters, from left to right:
     • trail: drop-bar mountain bikes and flat-bar gravel bikes
     • all-road and race gravel
     • bikepacking
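
The helper functions used above (`get_tree`, `dendro_data_k`, `plot_ggdendro`) are not shown on this page, so here is a base R stand-in for the same idea: scale the measures, compute Euclidean distances, build a Ward tree, and cut it into three clusters. The data are made up (three loose groups with hypothetical centers), and this is a sketch of the technique, not the code that produced the dendrogram.

```r
set.seed(4)
# Hypothetical frames: three loose groups in (stack, reach, head tube angle)
centers <- rbind(c(560, 395, 72), c(600, 385, 70.5), c(620, 410, 69))
raw <- do.call(rbind, lapply(1:3, function(i) {
  cbind(stack = rnorm(8, centers[i, 1], 5),
        reach = rnorm(8, centers[i, 2], 4),
        head_tube_angle = rnorm(8, centers[i, 3], 0.3))
}))
rownames(raw) <- paste0("frame_", seq_len(nrow(raw)))

Z <- scale(raw)                         # center and scale each measure
d <- dist(Z)                            # Euclidean distances between frames
tree <- hclust(d, method = "ward.D2")   # Ward's method on the distances
clust <- cutree(tree, k = 3)            # cut the tree into three clusters
table(clust)
```

The cluster ids returned by `cutree()` are arbitrary integers, so they need names attached afterwards.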

6.2 Style re-classification table

options(knitr.kable.NA = '')

style_class <- tree_v2_color$labels %>%
  as.data.table()
style_class[, model := tstrsplit(label, ",", keep = 1)]

cluster_labels <- character(3) # filled with class names below, indexed by cluster id
trail <- "Breezer Radar X Pro"
cluster_labels[style_class[model == trail, clust]] <- "Trail"
all_road <- "OPEN U.P."
cluster_labels[style_class[model == all_road, clust]] <- "All-Road"
endurance <- "Mason InSearchOf"
cluster_labels[style_class[model == endurance, clust]] <- "Endurance"

style_class[, restyle := cluster_labels[clust]]
style_class[, restyle := factor(restyle,
                              levels = cluster_labels)]

# add style to geobike
geobike <- plyr::join(geobike,
                      style_class[, .SD, .SDcols = c("model", "restyle")],
                      by = "model")
my_fit <- geobike[my_fit == TRUE,]

style_table <- dcast(setDT(style_class),
                     rowid(restyle) ~ restyle,
                     value.var = "model")[, .SD, .SDcols = cluster_labels]

style_table %>%
  kable() %>%
  kable_styling(full_width = FALSE)
Trail Endurance All-Road
Enigma Escape Flat-bar Tout Terrain Scrambler 28 Bombtrack Hook
Revel Rover Ritchey Outback frameset Rose Backroad XPLR
Merida Silex Bombtrack Beyond 2 All-City Cosmic Stallion
Knolly Cache Steel Light Blue Darwin Ribble Gravel SL
Whyte Friston Gravel Bearclaw Beaux Jaxon All-City Gorilla Monsoon
Fiftyone Assassin short-hi Salsa Vaya Cervelo Aspero
BMC URS One Genesis Vagabond Cannondale SuperSix Evo
Marin DSX 2 Otso Warakin Stainless No22 Drifter X
BMC URS AL Salsa Cutthroat Chumba Terlingua steel fdo
Fiftyone Assassin long-low Mason InSearchOf Shand Stooshie
BMC URS AL SUS Moots Routt ESC Pinarello Grevil F
Sonder Camino AL Chiru Kegeti Open WI.DE
Fezzari Shafer Salsa Fargo rear dropout Solace OM-3 Short
Kanzo Adventure New Salsa Fargo front dropout Squid Gravtron
Mosaic GT-1X Cinelli Hobootleg Geo OPEN U.P.
Cotic Cascade Panorama Taiga EXP Thesis OB1
Chumba Yaupon Tumbleweed Stargazer Blackheart All Road TI
Bombtrack Beyond+ Adv Reeb Sams Pants Wilier Jena
Breezer Radar X Pro Kona Sutra ULTD Specialized Diverge
Surly Ghost Grappler BlackMtnCy La Cabra Trek Boone 6
Specialized Diverge Evo Santa Cruz Stigmata
Fustle Causway GR1 Noble GX 5
Otso Fenrir Obed Boundary
Amigo Bug Out Salsa Warbird
Nordest Kutxo Niner RLT 9 RDO
Rondo MYLC CF Lo BlackMtnCy Monstercross V5
Rondo MYLC CF Hi Trek Checkpoint SL5
Evil Chamois Hagar GRX Canyon Grail 7 1by
Hudski Doggler Gravel Canyon Grizl 7 1by
Wilier Rave SLR
Scott Addict Gravel 10
Alchemy Rogue
Devinci Hatchet
Lauf Siegla
Why R+ V4
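
The class names are attached to clusters through one anchor model per class, so the names follow the anchors even if the tree renumbers its clusters. Here is a minimal base R sketch of that step with made-up models and cluster ids (everything named "Model ..." is hypothetical, and a plain data frame stands in for the data.table used above).

```r
# Hypothetical cluster assignments: cluster ids are arbitrary integers
style_class <- data.frame(
  model = c("Model A", "Model B", "Model C", "Model D", "Model E", "Model F"),
  clust = c(2, 2, 3, 1, 1, 3)
)

# One anchor model per class, chosen by hand (hypothetical choices)
anchors <- c(Trail = "Model D", Endurance = "Model A", "All-Road" = "Model F")

# Put each class name at the position of its anchor's cluster id
cluster_labels <- character(3)
for (label in names(anchors)) {
  id <- style_class$clust[style_class$model == anchors[[label]]]
  cluster_labels[id] <- label
}

# Name every model's cluster through the lookup vector
style_class$restyle <- factor(cluster_labels[style_class$clust],
                              levels = cluster_labels)
style_class
```

If two anchors ever land in the same cluster, one name overwrites the other and a slot in `cluster_labels` stays empty, which is worth checking after adding bikes.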

6.3 Pairwise V2

nudge_pos <- nudge_percent * (max(my_fit$reach) -
                                min(my_fit$reach))
gg1 <- ggplot(data = my_fit,
             aes(x = reach,
                 y = stack,
                 color = restyle,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

gg2 <- ggplot(data = my_fit,
             aes(x = front_center,
                 y = rear_center,
                 color = restyle,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

nudge_pos <- nudge_percent * (max(my_fit$head_tube_angle) -
                                min(my_fit$head_tube_angle))
gg3 <- ggplot(data = my_fit,
             aes(x = head_tube_angle,
                 y = seat_tube_angle,
                 color = restyle,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

nudge_pos <- nudge_percent * (max(my_fit$rear_center) -
                                min(my_fit$rear_center))
gg4 <- ggplot(data = my_fit,
             aes(x = rear_center,
                 y = trail,
                 color = restyle,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)
girafe(ggobj = gg1)
girafe(ggobj = gg2)
girafe(ggobj = gg3)
girafe(ggobj = gg4)

6.4 Ratio measures

y_cols <- c("stack_reach", "front_wheelbase", "sta_hta")
var_labels <- c("Stack:Reach",
                "Front-center:Wheelbase",
                "Seat tube angle:Head tube angle")
data.table(
  Variables = var_labels
) %>%
  kable() %>%
  kable_styling(full_width = FALSE)
y_cols <- c("stack_reach", "front_wheelbase", "sta_hta")

geobike_subset <- geobike[my_fit == TRUE,]
scale_it <- TRUE
center_it <- TRUE
dendro_v2_ratios <- get_tree(geobike_subset,
                hclust_method = "ward.D2")
dendro_v2_ratios_color <- dendro_data_k(dendro_v2_ratios, k = 3)
gg <- plot_ggdendro(dendro_v2_ratios_color,
                      direction   = "lr",
                      expand.y    = 0.2,
                      scale.color = pal_okabe_ito)

6.5 Ratios V2

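front_wheelbase is the ratio \(\frac{\text{frontcenter}}{\text{wheelbase}}\), where frontcenter is the horizontal component of the bottom-bracket to front-wheel-axle chord. stack_reach is stack divided by reach, and sta_hta is seat tube angle divided by head tube angle.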
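
As a worked example of the ratio measures, here is a sketch that computes them for two made-up frames. The numbers are hypothetical and not taken from the database, and the wheelbase is assumed to be the sum of the horizontal front-center and rear-center, which holds when both axles sit at the same height.

```r
# Hypothetical measures for two frames (mm and degrees)
frames <- data.frame(
  model = c("Frame A", "Frame B"),
  stack = c(580, 610),
  reach = c(390, 380),
  front_center = c(620, 650),   # horizontal BB-to-front-axle distance
  rear_center = c(420, 430),    # horizontal BB-to-rear-axle distance
  head_tube_angle = c(71.5, 70),
  seat_tube_angle = c(73.5, 73)
)

# Assumption: wheelbase = horizontal front-center + horizontal rear-center
frames$wheelbase <- frames$front_center + frames$rear_center

# The three ratio measures; all are dimensionless
frames$stack_reach <- frames$stack / frames$reach
frames$front_wheelbase <- frames$front_center / frames$wheelbase
frames$sta_hta <- frames$seat_tube_angle / frames$head_tube_angle

frames[, c("model", "stack_reach", "front_wheelbase", "sta_hta")]
```

Because the ratios have no units, they are less sensitive to overall frame size than the raw lengths.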

nudge_pos <- nudge_percent * (max(my_fit$front_wheelbase) -
                                min(my_fit$front_wheelbase))
gg1 <- ggplot(data = my_fit,
             aes(x = front_wheelbase,
                 y = stack_reach,
                 color = restyle,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

nudge_pos <- nudge_percent * (max(my_fit$front_wheelbase) -
                                min(my_fit$front_wheelbase))
gg2 <- ggplot(data = my_fit,
             aes(x = front_wheelbase,
                 y = seat_tube_angle/head_tube_angle,
                 color = restyle,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)

nudge_pos <- nudge_percent * (max(my_fit$stack_reach) -
                                min(my_fit$stack_reach))
gg3 <- ggplot(data = my_fit,
             aes(x = stack_reach,
                 y = sta_hta,
                 color = restyle,
                 label = model)) +
  geom_text(hjust = 0, nudge_x = nudge_pos, size = 2, show.legend = FALSE) +
  geom_point_interactive(aes(tooltip = model_size,
                             data_id = model_size),
                         show.legend = FALSE) +
  scale_color_manual(values = pal_okabe_ito)
girafe(ggobj = gg1)
girafe(ggobj = gg2)
girafe(ggobj = gg3)