
Achieving stable pay for drivers on Uber, Lyft, and other ride-sharing services is a challenge, owing to the conflicting interests of drivers, passengers, and the platforms themselves. But all three could benefit from improved response times and supply utilization across a city. In a paper, researchers at Boston University describe a ride-sharing approach that pairs a coordination framework with a reinforcement learning environment. In experiments, they demonstrate it’s possible to fairly maximize driver earnings without compromising the rider’s experience.

To achieve welfare objectives, like a living wage for gig workers, economists tend to rely on dynamic pricing mechanisms that stifle excess demand while resolving supply-demand imbalances in specific neighborhoods. When applied to ride-sharing, however, these mechanisms can have discriminatory effects on drivers and riders. The coauthors’ alternative — a “need-based” coordination framework — provides “envy-free” recommendations, meaning drivers working at the same location and time have no reason to envy each other’s earnings.

The framework sprang from the observation that drivers typically act independently and that supply-demand imbalances in a city are usually restricted to particular neighborhoods. The coupling of supply and demand can help companies learn when and where drivers need to coordinate. To support that goal, the framework combines reinforcement learning algorithms with combinatorial techniques constrained by the number of supply-imbalanced neighborhoods.


Above: A representative illustration of improvement in mean driver earnings during training.

For the sake of simulation, the framework divides cities into sets of non-overlapping zones, with time advancing in steps and drivers traveling between zones to pick up and drop off passengers. For carrying a passenger from one zone to another, drivers earn a reward that includes their earnings minus costs like gas and regular vehicle maintenance. At each time step, drivers who aren’t currently on a trip can choose one of two actions:

  • Waiting, which involves waiting for a passenger in the current zone. If successful, it can lead to a trip to another zone for which the driver earns a reward. But when the number of drivers choosing to wait in a zone exceeds the demand of the zone at a particular time, an unsuccessful wait might occur for which the driver doesn’t earn a reward.
  • Relocating, which involves relocating without a passenger from one zone to another. Drivers incur a cost for this.

The researchers’ model alerts drivers to the best action they can take in every zone at each time step. For example, when two drivers are competing for a single ride from an origination zone, one of them gets the rider, gets paid, and ends at the destination. The other is given an alternate suggestion.
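The two actions and the competing-drivers case can be made concrete with a minimal sketch. The names here (`Ride`, `resolve_waiting`, `relocate`) are hypothetical, and the random tie-break between competing drivers is an assumption; the article does not say how the researchers' model picks which driver gets the rider.

```python
import random
from dataclasses import dataclass


@dataclass
class Ride:
    destination: int  # zone where the trip ends
    fare: float       # what the passenger pays
    cost: float       # gas, maintenance, etc. for this trip


def resolve_waiting(driver_ids, rides, rng):
    """Match drivers waiting in one zone to that zone's rides for one time step.

    Returns {driver_id: (new_zone_or_None, reward)}.
    Assumption: ties are broken by shuffling the drivers at random.
    """
    order = list(driver_ids)
    rng.shuffle(order)
    outcomes = {}
    for position, driver in enumerate(order):
        if position < len(rides):
            ride = rides[position]
            # Successful wait: the driver ends at the destination and is paid.
            outcomes[driver] = (ride.destination, ride.fare - ride.cost)
        else:
            # Unsuccessful wait: more drivers than demand, so no reward.
            outcomes[driver] = (None, 0.0)
    return outcomes


def relocate(target_zone, relocation_cost):
    """Relocating without a passenger moves the driver and costs money."""
    return (target_zone, -relocation_cost)
```

With two waiting drivers and a single ride, exactly one entry in the returned mapping carries a positive reward; the other driver stays unmatched and would need an alternate suggestion at the next time step.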


Above: Top: Demand fulfillment by a trained policy at different times on a representative day. Bottom: Waiting times for demand not immediately fulfilled by the model.

To evaluate the model, the coauthors built a digital environment within Gym, OpenAI’s toolkit for developing and comparing reinforcement learning algorithms. To train it, they allowed virtual drivers to repeatedly interact with the environment, which was driven by a city’s ride demand data:

  • In the first phase of each interaction, drivers behaved exploratorily by choosing a random action.
  • In the second phase, drivers behaved exploitatively using the policy learned up until the previous interaction. The policy recommended actions to drivers based on the time of day and their locations, but it also prevented supply-demand imbalances that might occur if (1) too many drivers relocated to the same zone with insufficient demand or (2) too few drivers relocated to a zone with excess demand. A metric, the degree of coordination, signified the extent to which drivers located in the same zone needed to coordinate their actions. Whenever a zone had a positive degree of coordination, the policy recommended exploitative actions to a fraction of drivers within the zone.
  • In the third phase, actions recommended in the exploratory and exploitative phases resulted in drivers picking up passengers or relocating themselves to different zones.
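The idea of recommending an action to only a fraction of a zone's drivers can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the article does not say how the fraction is derived from the degree of coordination or what the remaining drivers are told, so `fraction` is passed in directly and the `fallback` action of waiting is hypothetical.

```python
import math
import random


def recommend(drivers, policy_action, fraction, rng, fallback="wait"):
    """Recommend policy_action to a fraction of the drivers in one zone.

    Assumptions: `fraction` lies in [0, 1], the chosen drivers are picked
    uniformly at random, and everyone else receives `fallback`.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    count = math.ceil(fraction * len(drivers))
    chosen = set(rng.sample(drivers, count))
    return {d: (policy_action if d in chosen else fallback) for d in drivers}


# Example: send half of four drivers to a hypothetical zone 7.
recommendations = recommend(
    ["d1", "d2", "d3", "d4"], ("relocate", 7), 0.5, random.Random(1)
)
```

Splitting the recommendation this way is what keeps every driver in a zone from relocating to the same destination at once, which is the first imbalance the list above describes.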

In one experiment, the team fed their framework a public corpus of New York City Yellow cab rides containing location, fare, trip distance, and other information about 200,000 rides per day. They took a single day’s worth of data, from the first Monday in September 2015, and divided it into 288 time slices, each five minutes long and indexed by its start time. Then they trained the model on hundreds of interactions among roughly 5,000 drivers.
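The slicing arithmetic is simple to check: a day has 24 × 60 = 1,440 minutes, which gives 288 five-minute slices. A minimal sketch of mapping a pickup timestamp to its slice might look like this; `slice_index` is a hypothetical helper, not something taken from the paper.

```python
from datetime import datetime

SLICE_MINUTES = 5
SLICES_PER_DAY = 24 * 60 // SLICE_MINUTES  # 1,440 minutes / 5 = 288


def slice_index(timestamp: datetime) -> int:
    """Return the zero-based five-minute slice a timestamp falls into."""
    minutes_since_midnight = timestamp.hour * 60 + timestamp.minute
    return minutes_since_midnight // SLICE_MINUTES


# A pickup at 08:30 lands in slice 102, which starts at 08:30.
morning_slice = slice_index(datetime(2015, 9, 7, 8, 30))
```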

The coauthors report that about 95% of the total demand during the day was satisfied with their framework. Moreover, 10% of the unfulfilled demand could be fulfilled by drivers nearby within 10 minutes, while over 70% could be fulfilled within 15 minutes, and drivers earned up to $535 on average by the end of the day.


“Without coordination, we would expect all the drivers in the city to relocate to Manhattan in order to satisfy the extremely high volume of demand during the morning commute,” the team explained. “However, our model recommends a certain proportion of drivers to wait in the outer boroughs of New York City for the early morning commute to Manhattan. Notably, the model is able to learn demand trends in time-dependent hotspots such as the JFK Airport to the south-east of the city. In contrast, during the evening commute to outer boroughs, the model exceedingly recommends that the drivers wait inside Manhattan.”

Of course, it’s worth noting that New York City has far more than 5,000 drivers — the number is estimated to be around 80,000. Another consideration is the framework’s failure to account for things like demographics or trip purposes, which might impact a driver’s decision to linger in a zone.

Still, the paper contributes to a body of work suggesting current ride-sharing algorithms are severely flawed. An October 2016 report by the National Bureau of Economic Research found that in the cities of Boston and Seattle, male riders with African American names were 3 times more likely to have rides canceled and wait as much as 35% longer for rides. Another study coauthored by researchers at Northeastern University indicates users standing only a few meters apart might be charged dramatically different fares. And a preprint paper published by researchers at George Washington University presented evidence of social bias in ride-sharing pricing algorithms.

