Simulating a Bike-Share System


Whether or not you’ve signed out a bike, you’ve probably noticed various bike-sharing stations around downtown Toronto. When setting up such a system, many decisions must be made, such as where to put the stations, how many bikes each station will be capable of storing, how many bikes the overall system needs, and where the bikes should be at the start of a day.

Once a bike-share system is up and running, some of these decisions may need to be revised, based on actual system use. One way to identify potential improvements is to simulate the movement of bikes through the system and gather statistics on how well the system “performed”. For instance, if we see that the average number of rides leaving one particular station is much lower than the average for other stations, then we have identified a low-use station that could perhaps be moved to an area with higher demand for bikes.

In this assignment, you’ll construct a small simulation of a bike-share system using real-world data from Bixi in Montreal. You will read in and process data downloaded directly from the Bixi website, and use this data to run a simulation that first produces a visualization of the bikes moving from station to station across Montreal, and then reports some statistics once the simulation has ended.

Problem specifications

Please read through the following section carefully. Your final Python program must accurately handle all of the details described here.

The bike-share domain

The bike-share simulation must keep track of stations and the bike rides between individual stations. A station has a unique id number, a name, a location (represented as (longitude, latitude) coordinates), and a maximum number of bikes it can store. During the simulation, each station must also keep track of the current number of bikes at that station, and other statistics associated with that station (see below).

A ride represents the fact that a person has taken a bike from one station and then returned it at another station. Each ride has a start and end time, and start and end stations. For the purpose of visualization, we’ll also keep track of the location of each ride in progress, again in (longitude, latitude) coordinates. We’ll assume that a ride goes from its start to end station in a straight line and at a constant speed. (Incorporating things like roads or GPS coordinates of bikes while they’re travelling is more complicated, and beyond the scope of this assignment.)

Since the Bixi data does not provide it, we will not store any information about individual bikes.

Bike-share data files

There are two types of data files that you’ll be using for running this simulation. The station file is in JSON format, and we will read it into our program using the json Python library. From this we’ll extract a list of dictionaries, where each dictionary represents one station, having these keys:

  • ‘n’ corresponds to the id number of the station
  • ‘s’ corresponds to the name of the station
  • ‘la’ and ‘lo’ contain the latitude and longitude of the station, respectively
  • ‘da’ represents the number of bikes stored in the station when the simulation begins
  • ‘ba’ represents the number of unoccupied spots in the station when the simulation begins (Use this and the ‘da’ value to calculate the total capacity of the station.)

You’ll use these keys to extract the information that our simulation needs, and then store that information in instances of a Station class that you will define.

The ride file is in comma-separated values (CSV) format, which we read using the csv Python library. This converts the data into (essentially) a list of lists, where each inner list represents one ride, and the first four elements of the list store the relevant information, in the following order:

startdatetime, start station id, end datetime, end station id

The raw data has some extra fields that you should ignore. You can assume that each ride’s start time is always less than its end time. However, do not assume the rides are sorted in any particular order.

The simulation system

The simulation is initialized with a set of input data (stations and rides), and then is run for a certain time period (e.g., June 1 2017, 8-9am).

The simulation is guided by a variable that tracks the “current” time, advancing it one minute at a time. Every time the clock advances, any events that must happen at the current time are identified and dealt with. For example,

  • a new ride begins
  • an in-progress ride moves closer to its destination
  • an in-progress ride reaches its destination and ends

The simulation system has three major steps in its lifecycle.

  1. First, it loads the station data and ride data for the simulation from the raw data files.
  2. The simulation then runs its main loop. On each iteration, it does the following:
    • It advances the clock by a fixed amount.
    • It updates the objects that store the state of the simulation, based on what the ride data says about events that occurred during the time between the last clock tick and this one. This also updates various statistics being tracked at individual stations.
    • It refreshes the visualization of the simulation, so that the user can see the movement of bikes in real time as the clock ticks forward.
  3. After the main loop is complete, it aggregates the statistics from each station to report a set of overall statistics for the simulation.


Each station in the simulation keeps track of four statistics for the time period of the simulation:

  • The number of rides that started at this station during the simulation.
  • The number of rides that ended at this station during the simulation.
  • The total amount of time during the simulation, in seconds, that the station spent with at most five bikes available (“low availability”).
  • The total amount of time during the simulation, in seconds, that the station spent with at most five unoccupied spots (“low unoccupied”).

When the simulation is done, for each of these four statistics it reports the station with the highest value, giving both the station’s name and value of the statistic.

If there’s a tie, the station whose name comes first alphabetically is reported.

Because of this, you’ll need to pay careful attention to rides that:

  • start before the simulation time period but end during or after the time period
  • start during the time period but end after the time period

If a ride starts before the simulation time period, do not count it when counting the number of rides that started at each station. But if this ride ends during the simulation time period, count this ride ending. Simlarly, if a ride starts during the simulation but ends after it, count the ride start but not the ride end.

Task 0: Starter code, setup, and initial run

  1. Open the file in PyCharm. Read the docstrings for the module (at the top of the file), the Simulation class, and the Simulation initializer and run methods.
  2. Run the file. You should see a new window open up showing a blank map of Montreal. You can close the window by pressing the red X, as you would any other window.

Task 1: Initializing stations and rides

Your first task is to read in station and ride data from the files specified in the simulation.

Open, and read the docstrings for the Station and Ride classes.


  1. Complete the functions create_stations and create_rides according to their docstrings. You should be creating new Station and Ride objects here (which is why you need to know the signatures of their initializers).
  2. After you’re done, use these two helpers to initialize the all_stations and all_rides attributes in the Simulation initializer.

 Task 2: Drawing stations and rides

Open and read the docstring of the render_drawables method of the Visualizer class.

First, we need to make Station and Ride proper subclasses of Drawable so that they can be passed to render_drawables. In, modify these two classes so that they properly call their superclass’ initializer. Use the constants defined at the top of the file to refer to the correct image files.

  1. Then, implement the get_position method in each subclass according to its docstring. Since stations have a fixed location but bikes move around, get_position will be much more interesting for bikes.

 Task 3: Keeping track of time: the main simulation loop

Our next step is to “activate” the simulation by allowing time to pass, and update our visualization to only show rides that are in progress.

  1. First, add new code at the start of the Simulationrun method that uses the start and end arguments to create a loop that repeats at the times start, start + 1 minute, start + 2 minutes, etc. until it reaches end.
  2. If you’re stuck with this, think about the code you’d write to do this if start and end were integers, and you were incrementing by 1 at each iteration.
  3. Add a new instance attribute active_rides to Simulation to keep track of the rides that are in progress at the current time in the simulation.

Then complete the _update_active_rides simulation helper method according to its docstring.

  1. Call _update_active_rides inside your run loop to update the list of active rides, and then use render_drawables appropriately to draw all the stations, but only the active rides, at each loop iteration.

Check your work: when you run the simulation now, you should see bikes travelling from station to station in the visualization window! You should also only see bikes that are currently in motion—once a bike reaches its target station at its specified end time, it should disappear from the simulation.

Task 4: Data validation and tracking statistics

Now that the simulation visualization is working, we will turn our attention to the statistics we report once the simulation has finished.

  1. Modify the Station class so that it is able to report the four statistics described in the Problem specifications when the simulation ends.

You should add new instance attributes to Station to help track this information. Document them appropriately (including type annotations). This is the most open-ended part of the whole assignment! Plan out carefully how you’ll track these statistics, and how the simulation will ensure that these statistics are updated for every station.

  1. Implement the Simulationcalculate_statistics method according to its docstring. Note that this should be simpler than step 1, since it just involves aggregating data from individual stations.

Check your work: now when you run your sample program, you should see the dictionary of the statistics printed out. You can check with your friends the output you got on the sample files and configuration we provided, but the next step is to write tests of your own!

Task 5: Events and the Priority Queue

At this point, you have completed the required functionality of this assignment. To update the list of rides that are active at the current timestep, we currently look at every single ride to see if it is in progress (it has started and hasn’t yet ended).

An event is an object representing some sort of change to the state of the simulation. For this assignment, we only care about two kinds of events: a ride starting, and a ride ending.

  • Processing a “ride starting” event should include adding a ride to the list of active rides and generating a “ride ending” event to happen at the appropriate time.
  • Processing a “ride ending” event should involve remove a ride from the list of active rides. It should not generate any additional events.

We’ll store our events in a priority queue and take them out one at a time for processing. A priority queue is similar to a stack or queue: it is another implementation of the Container ADT, but where items are removed in order of priority rather than based only on when they were added. For our simulation, the priority of an event will be its event time, and earlier events will have higher priority. This will allow us to conveniently remove events from the priority queue in order according to event time. Events that are tied for event time will come out of the priority queue in queue order (first-in-first-out).

  1. Open and read through the documentation for the PriorityQueue class. Note that you can ignore all the “Generic” and “TypeVar” code, which is provided only to help PyCharm with type-checking.

Then, implement the PriorityQueue’sadd method according to its specification. You can run the doctests in this file to check your work.

  1. Open and read through the documentation of the Event class. Then, implement two subclasses of this class to represent the two types of events described above. Don’t forget about updating station statistics!

Each “Ride start” start should generate a corresponding “Ride end” event, while each “ride end” event should not generate any new events. Note that we specify the return type of process to be a list of events to make this code quite general. Even though you won’t see this on this assignment, you could imagine a type of event (like a station closure) that spawns a whole bunch of new events as a result.

  1. Modify the Simulationrun method to add one “ride start” event for each ride that occurs during the simulation time period, before the main loop starts. To keep the priority queue small, do not add corresponding “ride end” events. These will be added individually as the “ride start” events are processed.
  2. (updated) Implement the Simulation_update_active_rides_fast method, which has the same effect as _update_active_rides, but which uses the following algorithm:
    • Remove an event from the priority queue.
    • If the event’s time is after the current time, put the event back into the priority queue and return.
    • Otherwise, process the event, and add any new events generated into the priority queue.
    • Repeat until the current time reaches the simulation end time. Repeat by removing the next item from the priority queue.
  3. Modify your Simulationrun method so that it calls _update_active_rides_fast instead of _update_active_rides. If you’ve done both Task 3 and Task 5 correctly, you should be able to literally switch the method call from one to the other without making any other code changes.


 === Module Description ===

This file contains the Station and Ride classes, which store the data for the

objects in this simulation.

There is also an abstract Drawable class that is the superclass for both

Station and Ride. It enables the simulation to visualize these objects in

a graphical window.


fromdatetime import datetime

from typing import Tuple

# Sprite files

STATION_SPRITE = ‘stationsprite.png’

RIDE_SPRITE = ‘bikesprite.png’


“””A base class for objects that the graphical renderer can be drawn.

=== Public Attributes ===


The filename of the image to be drawn for this object.


sprite: str

def __init__(self, sprite_file: str) -> None:

“””Initialize this drawable object with the given sprite file.


self.sprite = sprite_file

defget_position(self, time: datetime) -> Tuple[float, float]:

“””Return the (long, lat) position of this object at the given time.



class Station(Drawable):

“””A Bixi station.

=== Public Attributes ===


the total number of bikes the station can store


the location of the station in long/lat coordinates

**UPDATED**: make sure the first coordinate is the longitude,

and the second coordinate is the latitude.

name: str

name of the station

num_bikes: int

current number of bikes at the station

start_count: int

the count of rides starting from this station

end_count: int

the count of rides ending in this station

time_low_availability: int

the total amount of time during the simulation, in seconds, that the station spent with at most five bikes available

time_low_unoccupied: int

The total amount of time during the simulation, in seconds, that the station spent with at most five unoccupied spots

=== Representation Invariants ===

– 0 <= num_bikes<= capacity


name: str

location: Tuple[float, float]

capacity: int

num_bikes: int

start_count: int

end_count: int

time_low_availability: int

.time_low_unoccupied: int

def __init__(self, pos: Tuple[float, float], cap: int,

num_bikes: int, name: str) -> None:

“””Initialize a new station.

Precondition: 0 <= num_bikes<= cap



self.location = pos

self.capacity = cap

self.num_bikes = num_bikes = name

self.start_count = 0

self.end_count = 0

self.time_low_availability = 0

self.time_low_unoccupied = 0

defget_position(self, time: datetime) -> Tuple[float, float]:

“””Return the (long, lat) position of this station for the given time.

Note that the station’s location does *not* change over time.

The <time> parameter is included only because we should not change

the header of an overridden method.



class Ride(Drawable):

“””A ride using a Bixi bike.

=== Attributes ===


the station where this ride starts


the station where this ride ends


the time this ride starts


the time this ride ends

=== Representation Invariants ===

– start_time<end_time


start: Station

end: Station

start_time: datetime

end_time: datetime

def __init__(self, start: Station, end: Station,

times: Tuple[datetime, datetime]) -> None:

“””Initialize a ride object with the given start and end information.



self.start, self.end = start, end

self.start_time, self.end_time = times[0], times[1]

defget_position(self, time: datetime) -> Tuple[float, float]:

“””Return the (long, lat) position of this ride for the given time.

A ride travels in a straight line between its start and end stations

at a constant speed.


start_lo, start_la = self.start.get_position(time)

end_lo, end_la = self.end.get_position(time)

curr_lo = start_lo + (end_lo – start_lo) * (time – self.start_time) / (self.end_time – self.start_time)

curr_la = start_la + (end_la – start_la) * (time – self.start_time) / (self.end_time – self.start_time)

returncurr_lo, curr_la

if __name__ == ‘__main__’:



‘allowed-import-modules’: [

‘doctest’, ‘python_ta’, ‘typing’,



‘max-attributes’: 15


 === Module Description ===

This module contains the Container and PriorityQueue classes.

Your only task here is to implement the add method for PriorityQueue,

according to its docstring.


from typing import Generic, List, TypeVar

# Ignore this line; it is only used to facilitate PyCharm’stypechecking.

T = TypeVar(‘T’)

class Container(Generic[T]):

“””A container that holds objects.

This is an abstract class. Only child classes should be instantiated.


def add(self, item: T) -> None:

“””Add <item> to this Container.



def remove(self) -> T:

“””Remove and return a single item from this container.



defis_empty(self) ->bool:

“””Return True iff this Container is empty.




“””A queue of items that operates in FIFO-priority order.

Items are removed from the queue according to priority; the item with the

smallest priority is removed first. In this basic implementation, each

item’s value is its priority, and we compare values simply with ‘<‘.

Ties are resolved in first-in-first-out (FIFO) order, meaning the item

that was inserted *earlier* is the first one to be removed.

All objects in the container must be of the same type.

=== Private Attributes ===

_queue: List

The end of the list represents the *front* of the queue, that is,

the next item to be removed.

=== Representation Invariants ===

– all elements of _queue are of the same type

– the elements of _queue are in non-increasing order


_queue: List[T]

def __init__(self) -> None:

“””Initialize this to an empty PriorityQueue.


self._queue = []

def add(self, item: T) -> None:

“””Add <item> to this PriorityQueue.

NOTE: See the docstring for the ‘remove’ method for a sample doctest.


idx = 0

whileidx<len(self._queue) and item <self._queue[idx]:

idx = idx + 1

self._queue.insert(idx, item)

def remove(self) -> T:

“””Remove and return the next item from this PriorityQueue.

Precondition: this priority queue is non-empty.

>>>pq = PriorityQueue()














return self._queue.pop()


“””Return True iff this PriorityQueue is empty.

>>>pq = PriorityQueue()







return not self._queue

if __name__ == ‘__main__’:

# importdoctest

# doctest.testmod()



‘allowed-import-modules’: [

‘doctest’, ‘python_ta’, ‘typing’



 === Module Description ===

This file contains the Simulation class, which is the main class for your

bike-share simulation.

At the bottom of the file, there is a sample_simulation function that you

can use to try running the simulation at any time.



fromdatetime import datetime, timedelta


from typing import Dict, List, Tuple

frombikeshare import Ride, Station

from container import PriorityQueue

from visualizer import Visualizer

# Datetime format to parse the ride data

DATETIME_FORMAT = ‘%Y-%m-%d %H:%M’

class Simulation:

“””Runs the core of the simulation through time.

=== Attributes ===


A list of all the rides in this simulation.

Note that not all rides might be used, depending on the timeframe

when the simulation is run.


A dictionary containing all the stations in this simulation.


A list of all the rides that are in progress at the current time in the simulation


A helper class for visualizing the simulation.


all_stations: Dict[str, Station]

all_rides: List[Ride]

active_rides: List[Ride]

visualizer: Visualizer

event_queue: PriorityQueue

def __init__(self, station_file: str, ride_file: str) -> None:

“””Initialize this simulation with the given configuration settings.


self.visualizer = Visualizer()

self.all_stations = create_stations(station_file)

self.all_rides = create_rides(ride_file, self.all_stations)

self.active_rides = []

self.event_queue = PriorityQueue()

def run(self, start: datetime, end: datetime) -> None:

“””Run the simulation from <start> to <end>.


step = timedelta(minutes=1)  # Each iteration spans one minute of time

for ride in self.all_rides:

ifride.start_time<= end:

event = RideStartEvent(self, ride.start_time)


time = start

while time <= end:


# update statistics

for ride in self.active_rides:

ifride.start_time == time:

ride.start.start_count += 1

ride.start.num_bikes -= 1

ifride.start.num_bikes<= 5:

ride.start.time_low_unoccupied += 60

ifride.end_time == time:

ride.end.end_count += 1

ride.end.num_bikes += 1

ifride.end.capacity – ride.end.num_bikes<= 5:

ride.end.time_low_availability += 60

all_drawables = list(self.all_stations.values()) + self.active_rides

self.visualizer.render_drawables(all_drawables, time)

time = time + step

# Leave this code at the very bottom of this method.

# It will keep the visualization window open until you close

# it by pressing the ‘X’.

while True:


return  # Stop the simulation

def _update_active_rides(self, time: datetime) -> None:

“””Update this simulation’s list of active rides for the given time.


–   Loop through `self.all_rides` and compare each Ride’s start and

end times with <time>.

If <time> is between the ride’s start and end times (inclusive),

then add the ride to self.active_rides if it isn’t already in

that list.

Otherwise, remove the ride from self.active_rides if it is in

that list.

–   This means that if a ride started before the simulation’s time

period but ends during or after the simulation’s time period,

it should still be added to self.active_rides.


for ride in self.all_rides:

if ride in self.active_rides:

if time >ride.end_time or time <ride.start_time:



if time >= ride.start_time and time <= ride.end_time:


if time >ride.start_time and time <ride.end_time:


defcalculate_statistics(self) ->Dict[str, Tuple[str, float]]:

“””Return a dictionary containing statistics for this simulation.

The returned dictionary has exactly four keys, corresponding

to the four statistics tracked for each station:

– ‘max_start’

– ‘max_end’

– ‘max_time_low_availability’

– ‘max_time_low_unoccupied’

The corresponding value of each key is a tuple of two elements,

where the first element is the name (NOT id) of the station that has

the maximum value of the quantity specified by that key,

and the second element is the value of that quantity.

For example, the value corresponding to key ‘max_start’ should be the

name of the station with the most number of rides started at that

station, and the number of rides that started at that station.


n1 = max(self.all_stations, key = lambda x: self.all_stations[x].start_count)

n2 = max(self.all_stations, key = lambda x: self.all_stations[x].end_count)

n3 = max(self.all_stations, key = lambda x: self.all_stations[x].time_low_availability)

n4 = max(self.all_stations, key = lambda x: self.all_stations[x].time_low_unoccupied)

return {

‘max_start’: (self.all_stations[n1].name, self.all_stations[n1].start_count),

‘max_end’: (self.all_stations[n2].name, self.all_stations[n2].end_count),

‘max_time_low_availability’: (self.all_stations[n3].name, self.all_stations[n3].time_low_availability),

‘max_time_low_unoccupied’: (self.all_stations[n4].name, self.all_stations[n4].time_low_unoccupied)


def _update_active_rides_fast(self, time: datetime) -> None:

“””Update this simulation’s list of active rides for the given time.


–   see Task 5 of the assignment handout


while not self.event_queue.is_empty():

event = self.event_queue.remove()

ifevent.time> time:




for e in event.process():


defcreate_stations(stations_file: str) ->Dict[str, ‘Station’]:

“””Return the stations described in the given JSON data file.

Each key in the returned dictionary is a station id,

and each value is the corresponding Station object.

Note that you need to call Station(…) to create these objects!

Precondition: stations_file matches the format specified in the

assignment handout.

This function should be called *before* _read_rides because the

rides CSV file refers to station ids.


# Read in raw data using the json library.

with open(stations_file) as file:

raw_stations = json.load(file)

stations = {}

for s in raw_stations[‘stations’]:

# Extract the relevant fields from the raw station JSON.

# s is a dictionary with the keys ‘n’, ‘s’, ‘la’, ‘lo’, ‘da’, and ‘ba’

# as described in the assignment handout.

# NOTE: all of the corresponding values are strings, and so you need

# to convert some of them to numbers explicitly using int() or float().

pos = float(s[‘lo’]), float(s[‘la’])

da, ba = int(s[‘da’]), int(s[‘ba’])

stations[s[‘n’]] = Station(pos, da + ba, da, s[‘s’])

return stations

defcreate_rides(rides_file: str,

stations: Dict[str, ‘Station’]) -> List[‘Ride’]:

“””Return the rides described in the given CSV file.

Lookup the station ids contained in the rides file in <stations>

to access the corresponding Station objects.

Ignore any ride whose start or end station is not present in <stations>.

Precondition: rides_file matches the format specified in the

assignment handout.


rides = []

with open(rides_file) as file:

for line in csv.reader(file):

# line is a list of strings, following the format described

# in the assignment handout.


# Convert between a string and a datetime object

# using the function datetime.strptime and the DATETIME_FORMAT

# constant we defined above. Example:

# >>>datetime.strptime(‘2017-06-01 8:00’, DATETIME_FORMAT)

# datetime.datetime(2017, 6, 1, 8, 0)

if line[1] in stations and line[3] in stations:

start = stations[line[1]]

end   = stations[line[3]]

times = datetime.strptime(line[0], DATETIME_FORMAT), datetime.strptime(line[2], DATETIME_FORMAT)

ride  = Ride(start, end, times)


return rides

class Event:

“””An event in the bike share simulation.

Events are ordered by their timestamp.


simulation: ‘Simulation’

time: datetime

def __init__(self, simulation: ‘Simulation’, time: datetime) -> None:

“””Initialize a new event.”””

self.simulation = simulation

self.time = time

def __lt__(self, other: ‘Event’) ->bool:

“””Return whether this event is less than <other>.

Events are ordered by their timestamp.



def process(self) -> List[‘Event’]:

“””Process this event by updating the state of the simulation.

Return a list of new events spawned by this event.




“””An event corresponding to the start of a ride.”””

def process(self) -> List[‘Event’]:

“””Process this event by updating the state of the simulation.

Return a list of new events spawned by this event.


for ride in self.simulation.all_rides:

ifride.start_time == self.time:


return [RideEndEvent(self.simulation, ride.end_time)]


“””An event corresponding to the start of a ride.”””

def process(self) -> List[‘Event’]:

“””Process this event by updating the state of the simulation.

Return a list of new events spawned by this event.


for ride in self.simulation.active_rides:

ifride.end_time == self.time:


return []

defsample_simulation() ->Dict[str, Tuple[str, float]]:

“””Run a sample simulation. For testing purposes only.”””

sim = Simulation(‘stations.json’, ‘sample_rides.csv’), 6, 1, 8, 0, 0),

datetime(2017, 6, 1, 9, 0, 0))


if __name__ == ‘__main__’:

# Uncomment these lines when you want to check your work using python_ta!

# importpython_ta

# python_ta.check_all(config={

#     ‘allowed-io’: [‘create_stations’, ‘create_rides’],

#     ‘allowed-import-modules’: [

#         ‘doctest’, ‘python_ta’, ‘typing’,

#         ‘csv’, ‘datetime’, ‘json’,

#         ‘bikeshare’, ‘container’, ‘visualizer’

#     ]

# })