Saving Database Space with Rails Enums

Posted by Niky Morgan on July 11, 2018

I have an app that is running out of database space. It is a shared Rails project with colleagues and we host the app on Heroku, so it has a 10,000 row limit. Since I clean up the database regularly to remove unused data, I decided to focus on other fixes that might decrease the amount of space we are utilizing.

One of the tables is currently focused on adding categories or labels to another model (an Activity model). It is a many to many relationship, so this also requires a join table. After using this feature for a few months, it has become obvious that we only ever use a few different labels, so I decided to deprecate the table. Using a table for Labels means that for every row in our Activities table, we have a row in our Activities_Labels join table. Since Activities is one of our most-used tables, that means we are literally doubling the amount of rows storing an activity requires.

The fix is to store one label (from a finite list of possible labels) as an attribute on the Activities table. True, we are losing the ability to have many labels linked to one activity, but for us using the database space more efficiently is a priority over the edge case of supporting multiple labels.

With the decision made to remove the Labels table and instead store information on the Activities table, the next decision is how to store this information. Our database is PostgreSQL, so that means our data options are limited to:

Boolean
Character types such as char, varchar, and text.
Numeric types such as integer and floating-point number
Temporal types such as date, time, timestamp, and interval
UUID for storing Universally Unique Identifiers
Array for storing array strings, numbers, etc.
JSON stores JSON data
hstore stores key-value pair
Special types such as network address and geometric data

We could store these label names as strings, but that leaves some room for error. If there is a difference in capitalization or pluralization, we could probably validate and correct for that. However there is always the chance for a larger spelling error or difference. Instead we want to hard-code the label choices so the user can only use one of our preset options.

The solution then is to use ActiveRecord Enums. Enums are an interface ActiveRecord provides so developers can store attributes as integers in the database, but query them by name. We provide ActiveRecord with a list of options for this attribute as an array of symbols, and ActiveRecord does the work of translating each value into an integer and storing/retrieving it for us.

How do we implement this Rails magic? In this case, I had four main phases.

  • Set up the Activities table for the new column
  • Migrate information from our to-be-deprecated Activities_Labels table
  • Remove the Labels and Activities_Labels tables
  • Update the views and controller

The Set-Up

First we create a migration for adding the column to our Activities table. Since the enum will be stored as an integer in our table, the migration should specify the data type as an integer. We also need to let our model know what these integers correspond to. We tell it to add an enum and the attribute name, then we pass it an array of the attribute options.

# db/migrate/...
class AddLabelToActivity < ActiveRecord::Migration
  def change
    add_column :activities, :label, :integer
  end
end

# app/models/activity.rb
class Activity < ActiveRecord::Base
# ...
enum label: [
    :knitting,
    :playing_trombone,
    :coding, 
    :mma
  ]
# ...
end

As you can see above, knitting maps to 0 in my database, playing trombone to 1, coding to 2 and mma to 3. If we need to add any options in the future, we can simply add them to the end of this array. As long as we don’t change the order of the existing options, ActiveRecord will still make the correct connections between the integer and label name. If at any point we forget the mapping or want to view the labels, ActiveRecord has written a class method on the model which will display all the enum options as a hash. We also have methods to access all the activities matching a specific label.

Activity.labels # returns {"knitting"=>0, "playing trombone"=>1, "coding"=>2, "mma"=>3}
Activity.knitting # returns an array of activity objects there the label is knitting

Don’t you love it when Rails does your job for you?

Data Migration

To move data from the old table to the new column, we can write a Rake migration to update records and surface any records that needed to be manually updated (e.g. had multiple labels or no labels). Any ‘manual update’ records require us to inspect them, pick a category and then assign it by hand via the console.

Since the label names are remaining constant, we can take the original label and snake case it to get the new label. Since the enums are symbols, all multi-word labels need to be snake-cased.

# Rakefile
task :relabel_activities => [ :environment ] do
  manual_updates = []
  Activity.all.each do |activity|
    if activity.labels.length == 1
      label_name = activity.labels.first.name.downcase.gsub(' ', '_')
      activity.update(label: label_name)
      puts "#{activity.id}. #{activity.name} successfully relabeled"
    else
      manual_updates.push(activity)
    end
  end
  puts "********"
  puts "Please manually update the below activities"
  manual_updates.each do |activity|
    puts "#{activity.id}. #{activity.name}: #{activity.labels.map(&:name).join(", ")}" 
  end
end

After we merge this code into master, we should run the migration from Heroku console, update the manual update records by hand and then remove the migration in a subsequent commit. It will always be in the project’s git history if we need to reference it in the future.

Deprecate Tables

Now that all the data is stored on the new column, we can safely remove the Labels and Activity_Labels tables through a simple drop table migration.

# db/migrate/...
class DropTablesLabelsAndActivityLabels < ActiveRecord::Migration[5.1]
  def change
    drop_table :labels
    drop_table :activity_labels
  end
end

We should only drop the tables when we are confident that the data is safely stored elsewhere.

Update Views and Controller

In the same pull request as the table deprecation, we can make some changes to the activity views and controller. Since an activity now has only one label, we need to change the views accordingly. Wherever we reference labels, we need to now reference label. Also before the labels method returned an array of objects, but now the label method returns a snake-cased string. For readability, we can display the label titleized. This is a Rails method which will remove the snake-casing and capitalize every word. Thus in the views instead of seeing playing_trombone users will see Playing Trombone.

One of the labels requires one extra step of configuration. MMA is an abbreviation for Mixed Martial Arts. While Rails has a fairly large vocabulary, it doesn’t recognize some acronyms. To configure this acronym, we have to update the inflections config file. You can see below that we are telling it to recognize mma as an acronym. Now it will know that a lowercase mma should titleize to MMA instead of Mma. We can also use this file to specify irregular pluralizations like alumnus to alumni.

# config/initializers/inflections.rb
ActiveSupport::Inflector.inflections(:en) do |inflect|
  inflect.acronym 'MMA'
  inflect.irregular 'alumnus', 'alumni'
end

Additionally, since we changed the overall relationship so an activity can only have one label, we must be sure to update the create and edit forms with this change. This also means we must update the strong params methods to accept the correct form data (one label instead of an array of label ids).

# app/controllers/activities_controller.rb
# old params supporting multiple labels
  def activity_params	    
    params.require(:activity).permit(:name, :description, :date, :label_ids => [])	    
  end
  
# new params supporting one label  
  def activity_params
    params.require(:activity).permit(:name, :description, :date, :label)
  end	        

Now that our params are set up to accept just one label, what information are we actually sending over for that label? Originally we had a checkbox collection which sent over ids for the selected labels. If we can only accept one label, it makes sense to change this checkbox collection to radio buttons (which only allow one selection). Which value should we send as params though, the string or the integer?

If we try to add radio button inputs by iterating over our Activity.labels method, we must remember that this method returns a hash. We could try to display the name as a label and keep the integer as the value.

/* app/views/activities/edit.html.erb */
<% Activity.labels.each do |name, id| %>
  <%= f.radio_button :label, id %>
  <%= label_tag name %>
<% end %>

This results in a form that looks right. For each input the integer (id) is the value, and the name is showing as the label. As an added bonus, the name gets titleized for us automatically. Thanks Rails!

If we submit the form with the labels params like this, we get a Rails from error in our controller: '2' is not a valid label. Since our id is sent over as a string in the params, our update method is trying to match the string ‘2’ to one of the enums we specified. We could create a helper method that pulls this param out of the params hash, converts it into an integer and creates an updated params hash with the integer as the label value. This will work as our enums allow us to update the column attribute by passing in a string, symbol or integer referencing the enum as an argument. However, this feels like a lot more param manipulation than we should have to do.

The alternative then is to have the value of the form param be a string representing the label. Instead of iterating over a hash of the labels to create the radio buttons, we can iterate over an array of the enum strings (which happen to be the keys of the Activity.labels hash). This will set up our form so that our string is both the input value and the input label. In the controller our activity_params can use this string as is and immediately update the activity.

/* app/views/activities/edit.html.erb  */
<% Activity.labels.keys.each do |label| %>
    <%= f.radio_button :label, label %>
    <%= label_tag label %>
<% end %>

With the views now able to pass params to the controller and update the database, our new label column is fully implemented. Overall this is definitely a move towards efficiency for our database. Now a label is stored on an activity, so we can get all activity information with one query and no joins. Additionally we aren’t creating extra rows of Activity_Labels data for each activity which cuts down on total database rows used. That means it will take us longer to reach the 10,000 row free Heroku limit. While there is some performance benefit to storing integers instead of strings in a database, for small strings (like our label names) I am not sure it is that much of a performance increase.

For more about Enums, check out the ActiveRecord documentation on Enum or read this awesome article by another Flatiron alumna!