Using characteristics of rail-trails to predict how they are rated

Written by: Josh Rosenberg

Primary Source:  Joshua M. Rosenberg – August 2, 2017

Catching up

I wrote a blog post (one that, to be honest, I liked a lot) on what the best rail-trails in Michigan are. A friend and colleague at MSU, Andy, noticed that paved trails seemed to be rated higher. That observation, along with my friend and colleague Kristy’s comment about how we could use the output of the previous post, sparked my curiosity about which characteristics predict how highly (or not highly) trails are rated.

Let’s start the same way we did before.

library(tidyverse)
library(lme4)
library(stringr)
library(sjPlot)
library(forcats)

df <- read_rds("/Users/joshuarosenberg/Dropbox/1_Research/railtrail/data/mi.rds")

df <- df %>% 
    unnest(raw_reviews) %>% 
    filter(!is.na(raw_reviews)) %>% 
    rename(raw_review = raw_reviews,
           trail_name = name)

We’ll process the data a bit.

df <- df %>% 
    mutate(category = as.factor(category),
           category = fct_recode(category, "Greenway/Non-RT" = "Canal"),
           mean_review = ifelse(mean_review == 0, NA, mean_review))

df <- mutate(df,
             surface_rc = case_when(
                 surface == "Asphalt" ~ "Paved",
                 surface == "Asphalt, Concrete" ~ "Paved",
                 surface == "Concrete" ~ "Paved",
                 surface == "Asphalt, Boardwalk" ~ "Paved",
                 str_detect(surface, "Stone") ~ "Crushed Stone",
                 str_detect(surface, "Ballast") ~ "Crushed Stone",
                 str_detect(surface, "Gravel") ~ "Crushed Stone",
                 TRUE ~ "Other"
             )
)

df$surface_rc <- as.factor(df$surface_rc)

df$surface_rc <- fct_relevel(df$surface_rc,
                             "Crushed Stone")

Note that the "Other" category includes surfaces like dirt and grass.
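To see what the recoding rules do, here is a minimal sketch on a few made-up surface strings (the `toy` data frame and its values are hypothetical, not taken from the trail data). It uses a condensed version of the same rules: four exact strings become "Paved", anything mentioning stone, ballast, or gravel becomes "Crushed Stone", and everything else, including missing values, falls through to "Other".

```r
library(dplyr)
library(stringr)

# Hypothetical surface strings, made up for illustration only
toy <- tibble(surface = c("Asphalt", "Asphalt, Crushed Stone",
                          "Dirt, Grass", "Gravel", NA))

toy <- toy %>%
    mutate(surface_rc = case_when(
        # Exact matches only, as in the rules above
        surface %in% c("Asphalt", "Asphalt, Concrete",
                       "Concrete", "Asphalt, Boardwalk") ~ "Paved",
        # Partial matches on any of the three keywords
        str_detect(surface, "Stone|Ballast|Gravel") ~ "Crushed Stone",
        # Everything else, including NA
        TRUE ~ "Other"
    ))

toy
```

One thing this makes visible: a mixed surface such as "Asphalt, Crushed Stone" would land in "Crushed Stone" rather than "Paved", because only the four exact strings count as paved.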

The model we built (take one)

Previously we fit a model like this:

m1 <- lmer(raw_review ~ 1 + (1|trail_name), data = df)

This model basically estimated the rating for each trail, taking into account not only that trail's 1 to 5 ratings but also how different they are from the “average” review across every trail. In short, it estimates ratings that are less biased by how many reviews there are.
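To make the "less biased by how many reviews" idea concrete, here is a rough sketch of the textbook shrinkage formula for a random-intercept model. It is not how lmer() computes its estimates internally (lmer() also has to estimate the variances from the data), and every number below is hypothetical, chosen only for illustration.

```r
# Hypothetical values, not estimated from the trail data
grand_mean <- 4.0   # average rating across all trails
tau2       <- 0.5   # variance between trails
sigma2     <- 1.0   # variance of reviews within a trail

# Pull a trail's raw mean toward the grand mean.
# The weight approaches 1 as the number of reviews grows.
shrunken_mean <- function(trail_mean, n_reviews) {
    weight <- tau2 / (tau2 + sigma2 / n_reviews)
    grand_mean + weight * (trail_mean - grand_mean)
}

shrunken_mean(5, 1)    # a trail with a single 5-star review
shrunken_mean(5, 50)   # a trail averaging 5 stars over 50 reviews
```

Under these assumed values, the trail with a single review is pulled most of the way back toward the overall average, while the trail with fifty reviews stays close to its own mean.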

Building a model (take two)

That model is a bit boring. To extend it, we can add the variables for surface (paved, crushed stone, or other), category (rail-trail or greenway), and distance. You can focus on the B column in the table below. The intercept represents the estimated rating for a trail in the reference categories (a crushed stone greenway) at a distance of zero, which is 3.50. The B for distance represents the increase in rating for each 1-mile increase in distance (0.00!). This suggests longer trails are not necessarily more highly rated, and the p (0.895), which by convention marks a B as statistically significant when it is below 0.05, supports this interpretation.

Similarly, the B for rail-trails compared to greenways is small (and the p is far greater than 0.05), as is the case for other surfaces compared to crushed stone (B = -0.35, p = 0.318), but paved surfaces do seem different. They are associated with a rating almost half a point (B = 0.37, p = 0.037) higher than crushed stone trails, and about three-quarters of a point (0.37 + 0.35 = 0.72) higher than other surfaces.

m2 <- lmer(raw_review ~ 1 + distance + category + surface_rc + (1|trail_name), data = df)

sjt.lmer(m2, p.kr = FALSE, show.re.var = FALSE, show.ci = FALSE, show.se = TRUE)
Dependent variable: raw_review
Columns: term, B, std. error, p
Fixed Parts
(Intercept) 3.30 1.38 .017
distance (0.4 miles) 1.29 1.64 .431
distance (1.5 miles) 0.00 1.30 1.00
distance (1.6 miles) 0.23 1.51 .881
distance (1.7 miles) 0.29 1.64 .859
distance (1.8 miles) 1.07 1.56 .494
distance (1.9 miles) 1.32 1.69 .434
distance (10.1 miles) 0.66 1.49 .659
distance (10.2 miles) 1.85 1.25 .140
distance (10.4 miles) 1.02 1.56 .513
distance (10.7 miles) 0.52 1.67 .754
distance (11 miles) 0.58 1.45 .688
distance (12 miles) 0.71 1.23 .563
distance (12.2 miles) 0.79 1.56 .612
distance (13 miles) 0.29 1.58 .854
distance (14 miles) -0.24 1.22 .845
distance (15.5 miles) 1.17 1.54 .449
distance (16 miles) -0.06 1.53 .968
distance (16.3 miles) 1.21 1.66 .465
distance (16.6 miles) 1.18 1.56 .450
distance (16.9 miles) -1.68 1.73 .333
distance (17 miles) -0.39 1.66 .814
distance (17.8 miles) 0.50 1.26 .693
distance (19 miles) 1.29 1.64 .431
distance (19.5 miles) 0.48 1.66 .773
distance (19.7 miles) 0.33 1.53 .828
distance (2 miles) 0.07 1.39 .961
distance (2.2 miles) 0.87 1.41 .536
distance (2.3 miles) 0.29 1.64 .859
distance (2.4 miles) 1.70 1.66 .307
distance (2.5 miles) 1.13 1.44 .433
distance (2.7 miles) 0.30 1.56 .846
distance (2.8 miles) -0.71 1.58 .654
distance (20 miles) -0.43 1.66 .797
distance (20.9 miles) 0.79 1.56 .613
distance (21.2 miles) 1.33 1.53 .385
distance (21.7 miles) -1.33 1.57 .395
distance (22 miles) -2.33 1.61 .147
distance (22.1 miles) 1.00 1.53 .515
distance (22.2 miles) 0.29 1.57 .853
distance (22.7 miles) 0.83 1.56 .594
distance (23.5 miles) 0.99 1.56 .526
distance (23.7 miles) 0.57 1.56 .713
distance (25 miles) 0.69 1.56 .658
distance (25.3 miles) 0.22 1.56 .888
distance (25.9 miles) -0.21 1.44 .886
distance (26 miles) 0.42 1.53 .786
distance (3.2 miles) 1.29 1.64 .431
distance (3.4 miles) 0.29 1.60 .855
distance (3.5 miles) 0.31 1.50 .837
distance (3.8 miles) 1.70 1.70 .319
distance (30 miles) 0.91 1.56 .559
distance (32 miles) -0.10 1.42 .945
distance (33.2 miles) -0.01 1.66 .995
distance (34 miles) 1.33 1.53 .384
distance (34.2 miles) -0.10 1.53 .948
distance (37.5 miles) 0.29 1.57 .853
distance (38 miles) -2.33 1.61 .147
distance (4 miles) 0.97 1.48 .515
distance (4.1 miles) 1.32 1.73 .445
distance (4.2 miles) -0.19 1.49 .897
distance (4.3 miles) -0.01 1.66 .995
distance (4.5 miles) 1.26 1.66 .448
distance (4.9 miles) 0.32 1.73 .852
distance (41.4 miles) 0.24 1.54 .878
distance (41.5 miles) 0.67 1.61 .680
distance (41.8 miles) 0.97 1.56 .532
distance (42 miles) 0.50 1.54 .746
distance (47 miles) 0.67 1.61 .680
distance (5.3 miles) 0.70 1.66 .675
distance (5.4 miles) 1.02 1.47 .486
distance (53 miles) 0.87 1.54 .575
distance (6 miles) 1.29 1.60 .419
distance (6.2 miles) 0.82 1.56 .598
distance (6.5 miles) 0.81 1.52 .593
distance (6.6 miles) -0.43 1.67 .799
distance (6.8 miles) 0.59 1.41 .674
distance (60 miles) 0.67 1.61 .680
distance (62 miles) 1.02 1.53 .507
distance (7 miles) 0.29 1.64 .859
distance (71 miles) 0.83 1.54 .589
distance (8 miles) 0.69 1.48 .640
distance (8.2 miles) -0.99 1.57 .525
distance (8.3 miles) 1.18 1.66 .478
distance (8.6 miles) -1.00 1.55 .519
distance (8.9 miles) 1.45 1.53 .343
distance (9 miles) 0.17 1.48 .910
distance (9.5 miles) 1.12 1.67 .500
distance (9.8 miles) 0.83 1.33 .530
distance (92.6 miles) 0.92 1.53 .546
category (Rail-Trail) 0.03 0.37 .931
surface_rc (Other) -0.33 0.87 .701
surface_rc (Paved) 0.37 0.44 .399
Random Parts
N (trail_name) 116
ICC (trail_name) 0.469
Observations 2649
R² / Ω₀² .347 / .347

Note that the arguments to sjt.lmer() only have to do with what output is produced.
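One thing worth checking: the coefficient labels in the table above, such as distance (0.4 miles), suggest that distance was stored as text and so was treated as a set of categories rather than as a number. If that is the case (it is an assumption, since we cannot see the column itself here), a single per-mile B would need a numeric version of the column. A minimal sketch using parse_number() from readr, with made-up values and a made-up column name, distance_miles:

```r
library(readr)

# Hypothetical values in the same style as the coefficient labels
distance_text <- c("0.4 miles", "10.1 miles", "92.6 miles")

# parse_number() drops the surrounding text and returns numbers
distance_miles <- parse_number(distance_text)
distance_miles

# With a numeric column (here called distance_miles, a made-up name),
# the model formula would estimate one slope per mile:
# raw_review ~ 1 + distance_miles + category + surface_rc + (1|trail_name)
```

The same conversion could be done inside a mutate() call on df before fitting the model.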

So, where are we (really) riding next?

This suggests that if you want to ride a good rail-trail, first and foremost pick one that is paved. Whether a trail is technically a rail-trail or a greenway, and whether it is short or long, do not seem to matter, although we will explore this more in later posts.

Joshua M. Rosenberg is a Ph.D. student in the Educational Psychology and Educational Technology program at Michigan State University. In his research, Joshua focuses on how social and cultural factors affect teaching and learning with technologies, in order to better understand and design learning environments that support learning for all students. Joshua currently serves as the associate chair for the Technological Pedagogical Content Knowledge (TPACK) Special Interest Group in the Society for Information Technology and Teacher Education. Joshua was previously a high school science teacher, and holds degrees in education (M.A.) and biology (B.S.).
