Let's confirm, though, that the relationship between weather and ridership is in fact nonlinear. This graph makes it pretty clear that there's a nonlinear relationship between rides and max daily temperature.

If station A starts the day with lots of bikes, but people take them out to other stations and nobody returns any bikes to A, then A will run out of bikes, and that's bad.

As mentioned earlier, part of that is because Google Maps estimates don't account for time spent transacting at Citi Bike stations, and we can't guarantee that every rider in our dataset was even trying to get from point A to B quickly.

Every blue dot on the map represents a single Citi Bike trip, and the small orange dots represent the 493 Citi Bike stations scattered throughout the city. If you stare at the animation for a bit, you start to see some trends. Weekend riders, on the other hand, prefer a more leisurely schedule, with most weekend rides occurring in the mid-afternoon hours.

Age, Gender, and the Accuracy of Google Maps Cycling Time Estimates

The age and gender demographic data can. The latter group will almost certainly not follow a direct route, and so we'll end up calculating a very slow average speed for these trips, even if the riders were pedaling hard the entire time.

Predicted, aggregating to monthly totals in order to reduce noise. I make no claim that it's a perfect model (it uses imperfect data, has some smelly features and omissions, and all of the usual correlation/causation caveats apply), but. Since April 2015, though, that rate has decreased to about.

Let's look at rainy days and snowy days. Rain and snow are, not surprisingly, both correlated with lower ridership.

The difference between no rain and .1 inches of rain is more significant than the difference between .5 and .6 inches.

In the evening there's more volume heading south along 2nd Avenue, as workers head home to the residential neighborhoods.

For the nonlinear regression, we first need to specify the form of the model, which I chose to look like this: The d variables are known values for a given date d, variables are calibrated parameters, and the capitalized functions.

Bikes dropped off in Manhattan's East Village have a much higher chance of being transported if they're dropped off in the evening, while transfers are more likely in Fort Greene, Brooklyn, for bikes dropped off in the morning.

This got me wondering: how well is Citi Bike's daily ridership predicted by the weather?

Both avenues head northbound, but only 8th Avenue has a protected bike path.

Twenty-Two Million Citi Bike Rides: Analyzing the NYC Bike Share System

In the conclusion of my post analyzing NYC taxi and Uber trips, I noted that Citi Bike, New York City's bike share system, also releases public data, totaling.



The number of trips ramps up quickly between 30 and 60 degrees, but above 60 degrees or so there's a much weaker relationship between ridership and temperature. Similarly, biking is probably less popular on rainy and snowy days.

That means you can find out where and when the rider dropped off the bike, which might be sensitive information.

Yes, it means that for every year you get older, you're liable to be .2 seconds slower on your regular Citi Bike commute route!

The data lets us measure how often bikes magically transport from one station to another, even though no one took a ride.

The map shows a handful of primary routes for cyclists: 8th and 9th avenues heading uptown and downtown, respectively, on the west side, and 1st and 2nd avenues heading uptown and downtown, respectively, on the east side.

There were no stations added or removed around then either, so that seems like an unlikely explanation. Maybe the operator developed a smarter system to allocate bikes, which resulted in a lower transfer percentage?

The Weather(d) function uses every mortgage prepayment modeler's favorite formula: the s-curve.
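To make the s-curve idea concrete, here is a minimal sketch of a generic logistic curve that rises steeply through a middle range of temperatures and flattens at the extremes, matching the shape described above (a quick ramp between roughly 30 and 60 degrees, then a weak relationship). The function name `s_curve` and the `center`, `width`, `low`, and `high` parameters are my own illustrative assumptions, and the numbers are uncalibrated placeholders, not the fitted values from the model.

```python
import math


def s_curve(x, center, width, low=0.0, high=1.0):
    """Generic logistic s-curve rising from `low` to `high` around `center`.

    `width` controls how gradual the transition is (larger = flatter).
    """
    return low + (high - low) / (1.0 + math.exp(-(x - center) / width))


# Hypothetical, uncalibrated parameters chosen only for illustration.
for temp in [20, 30, 45, 60, 75, 90]:
    print(temp, round(s_curve(temp, center=45.0, width=7.0), 3))
```

With these placeholder parameters, a 15-degree change near the center moves the output far more than the same change at the warm end, which is the nonlinearity a straight-line fit can't capture.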


Accordingly, for an analysis of bike speed, I restricted to the following subset of trips, which I at least weakly claim is more likely to include riders who are trying to get from point A to point B quickly: Weekdays.

The Data

Unlike taxis, Citi Bikes cannot pick up and drop off at any arbitrary point in the city.

Let's step through the model specification, one line at a time: trips_d is the number of Citi Bike trips on date d, the dependent variable in our model.

For what it's worth, most Citi Bike trips start and end in Manhattan.

I wonder about modeling ridership at the individual station level, especially as stations are added in the future.
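The "magically transported" bikes mentioned earlier can be counted with a simple pass over the trip records: for each bike, order its trips by start time and check whether each trip begins at the station where the previous one ended. Here is a rough sketch of that idea. The tuple layout (bike id, start time, start station, end station) and the function name `count_transfers` are assumptions made for illustration, and the sample rows are invented, not real Citi Bike data.

```python
from collections import defaultdict


def count_transfers(trips):
    """Count consecutive trip pairs where a bike moved without a recorded ride.

    `trips` is an iterable of (bike_id, start_time, start_station, end_station)
    tuples; this layout is assumed for the example. Returns
    (transfers, consecutive_pairs).
    """
    by_bike = defaultdict(list)
    for bike_id, start_time, start_station, end_station in trips:
        by_bike[bike_id].append((start_time, start_station, end_station))

    transfers = 0
    pairs = 0
    for rides in by_bike.values():
        rides.sort()  # chronological order, since start_time is the first field
        for prev, nxt in zip(rides, rides[1:]):
            pairs += 1
            # The bike was dropped at prev's end station but next picked up elsewhere.
            if prev[2] != nxt[1]:
                transfers += 1
    return transfers, pairs


# Invented sample rows: bike 1 ends at station C, then reappears at station D.
sample = [
    (1, "2015-06-01T08:00", "A", "B"),
    (1, "2015-06-01T09:00", "B", "C"),
    (1, "2015-06-01T18:00", "D", "A"),
    (2, "2015-06-01T08:30", "C", "A"),
    (2, "2015-06-01T12:00", "A", "B"),
]
transfers, pairs = count_transfers(sample)
print(transfers, pairs)
```

Dividing the first number by the second gives a transfer percentage; grouping the same comparison by drop-off station and time of day would give the kind of neighborhood breakdown described for the East Village and Fort Greene.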

