Many to Many Associations with ActiveRecord

Posted by Niky Morgan on April 15, 2018

I’ve spent the past three months in Ruby-and-Rails-land, teaching dozens of Flatiron students the basics of ActiveRecord. One of the concepts we really focus on early in our curriculum is the many-to-many relationship. We have students use the has many through ActiveRecord association for this, but at some point they stumble upon the has and belongs to many association and wonder what the differences are between the two.

Has Many Through

To properly create a relationship using has many through, users must create a join model and a join table. An example we use for this relationship is a ridesharing service. Passengers are connected to drivers, but in order to store information about a specific ride (fare, distance, etc.) we have a trip model which connects them. This join model can be named anything, but we encourage students to find a real-world word that encompasses the relationship whenever possible. If no word accurately describes the relationship, we teach them the convention of naming it after both models (e.g. DriverPassenger or PassengerDriver). The pluralized table name would then be driver_passengers or passenger_drivers.

Using our Trip class to connect a driver to a passenger, models would have the below associations. Note that a passenger must have a trip in order to connect to a driver: the relationship would not exist without connecting through a trip.

class Driver < ApplicationRecord
  has_many :trips
  has_many :passengers, through: :trips
end
 
class Trip < ApplicationRecord
  belongs_to :driver
  belongs_to :passenger
end
 
class Passenger < ApplicationRecord
  has_many :trips
  has_many :drivers, through: :trips
end

A sample migration for these tables could look like the one below. (Note that a real table might have more columns for data attributes.)

class CreateTrips < ActiveRecord::Migration[5.0]
  def change
    create_table :drivers do |t|
      t.string :name
    end
 
    create_table :passengers do |t|
      t.string :name
    end
 
    create_table :trips do |t|
      t.integer :driver_id
      t.integer :passenger_id
    end
  end
end

Once the has many association is established, ActiveRecord adds association methods to the related models. Executing driver.passengers << passenger automatically creates a join model for the driver and the passenger. Executing driver.passengers = [passenger] not only creates a join model, but also removes the join rows for any passengers previously associated with that driver who are not in the new list.

While join models are automatically created and removed in those cases, deleting the trips when you delete either the passenger or driver requires further configuration. To delete an associated model, you need to specify dependent: :destroy on the parent class. With the setup below, calling driver.destroy or passenger.destroy will delete all the trips connected to that driver or passenger, respectively. Executing driver.delete will not delete any associated instances, because the delete method removes only the row it is called on and skips callbacks. The destroy method runs any before_destroy and after_destroy callbacks and also destroys any dependent instances.

class Driver < ApplicationRecord
  has_many :trips, dependent: :destroy
  has_many :passengers, through: :trips
end

class Trip < ApplicationRecord
  belongs_to :driver
  belongs_to :passenger
end

class Passenger < ApplicationRecord
  has_many :trips, dependent: :destroy
  has_many :drivers, through: :trips
end

Note that there are no changes to the Trip class. You tell the parent models to delete the child, but the child doesn’t need to be informed of anything.

Has and Belongs to Many

Using the same ridesharing model, here is what the relationship would look like using a has and belongs to many association.

class Driver < ApplicationRecord
  has_and_belongs_to_many :passengers, join_table: 'trips'
end
 
class Passenger < ApplicationRecord
  has_and_belongs_to_many :drivers, join_table: 'trips'
end

Note the lack of a join class. We will still have a join table, but there is no need to reference a join class anymore. Before, we needed the Trip class to call the belongs_to method to indicate that its table would hold the foreign keys. Now all necessary relationship information is passed to has_and_belongs_to_many.

In order to give our join table the name we want, we have to pass it through as an option to has_and_belongs_to_many. Without specifying the join table, its name would have defaulted to the pluralizations of the models it joins in lexical order (an alphabetical order that takes other characters into account). In this case our default join table would have been drivers_passengers. (Notice the double pluralization there.)
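The default naming rule can be approximated in a few lines of plain Ruby. This is a simplified sketch, not the code ActiveRecord itself runs: the helper name default_join_table_name is hypothetical, and it assumes it is handed table names that are already pluralized.

```ruby
# Simplified sketch of the default has_and_belongs_to_many join table name:
# sort the two pluralized table names as strings and join them with "_".
# (Hypothetical helper; ActiveRecord's own logic covers more cases.)
def default_join_table_name(first_table, second_table)
  [first_table, second_table].sort.join("_")
end

rideshare_join_table = default_join_table_name("passengers", "drivers")
# => "drivers_passengers"

# Argument order does not matter, because the names are sorted first.
same_join_table = default_join_table_name("drivers", "passengers")
# => "drivers_passengers"
```

Because the comparison is a string comparison rather than a dictionary-style one, names that differ in length or contain underscores can sort in a way that is surprising at first glance, which is one more reason to pass join_table explicitly.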

Speaking of our join table, while our previous migration would still have worked in this situation, there are alternate methods we can use.

def change
  create_table :trips do |t|
    t.references :passenger, foreign_key: true
    t.references :driver, foreign_key: true
  end
end

Using the references method will still add passenger_id and driver_id columns to the trips table, but it will also add an index to each column by default. It will not add a foreign key constraint by default, but that is one of the options that can be passed to it.

Without an index, your database has to search through each row in a table to find column values that match your query. This search takes O(n) time, where n is the number of table rows. Adding an index to a column (and a table can have multiple indexes) will speed up the database searches. Once a column is indexed, that information (along with a primary key or row id) is stored in a tree which is sorted based on the values in that column. Search, insertion and deletion all take O(log n) time in a balanced tree.
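The difference between scanning and using an index can be illustrated in plain Ruby. This is only an analogy: a sorted array with binary search stands in for the tree a real database maintains, and the Row struct and both helper methods are hypothetical names invented for this sketch.

```ruby
# A tiny stand-in for rows in the trips table.
Row = Struct.new(:id, :driver_id)

rows = [Row.new(1, 42), Row.new(2, 7), Row.new(3, 19), Row.new(4, 7)]

# Without an index: look at every row, which is O(n).
def scan_for_driver(rows, driver_id)
  rows.select { |row| row.driver_id == driver_id }.map(&:id)
end

# A stand-in "index": [driver_id, row id] pairs kept sorted by driver_id.
index = rows.map { |row| [row.driver_id, row.id] }.sort

# With the index: binary search finds the first matching entry in O(log n),
# then we read forward while the key still matches.
def index_lookup(index, driver_id)
  start = index.bsearch_index { |(key, _id)| key >= driver_id }
  return [] if start.nil?

  index.drop(start)
       .take_while { |(key, _id)| key == driver_id }
       .map { |(_key, id)| id }
end

scanned = scan_for_driver(rows, 7)   # => [2, 4]
indexed = index_lookup(index, 7)     # => [2, 4]
```

Both approaches return the same row ids. The index simply lets the lookup skip most of the data, at the cost of keeping the sorted structure up to date on every insert and delete.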

Setting the foreign_key value to true when creating a reference adds a foreign key database constraint to that column. The purpose of this is to create an association between the two tables and ensure referential integrity. At the database level (instead of at the ActiveRecord level), you are indicating that changes to the referenced tables should not be executed unless any relevant changes are also made to the table that holds the foreign key. For example, deleting a driver record without first deleting that driver's trips would raise an error indicating that it violates a database constraint. Note that whether these constraints are applied on SQLite depends on your Rails version; older versions of the SQLite adapter did not support them.

Summary

We teach our students has many through because it abstracts away less of the relationship setup. Students have to manually create their join classes and tables. While writing their migrations, they are deciding which tables hold the foreign keys and hopefully drawing the connection between holding a foreign key and belonging to something else. Additionally, if our students decide they need to store information on the join table or add attributes or methods to a join model, it is easier with has many through.

Using has and belongs to many doesn’t require the join class, but users must explicitly tell the classes the join table name or accept a default name. Our students spend enough time troubleshooting name errors caused by mismatched class names and filenames with ActiveRecord. We’d rather they spend their time understanding the relationships instead of memorizing more naming conventions. While has and belongs to many requires a little less work, it requires a lot more understanding of ActiveRecord.