Setting up multiple databases in Rails: the definitive guide

Setting up multiple databases in Rails: the definitive guide

There are different reasons why you might consider having multiple databases in your Ruby on Rails application. In my specific case scenario, I needed to store large quantities of data representing user behavior: clicks, pages visited, historical changes, and so on.

This kind of databases generally are not mission critical, and grow much faster (and larger) than most databases. Their requirements are often different: for instance, they need more storage space, are more tolerant in the face of hardware or software failures, and are write-intensive. For these reasons, sometimes it is interesting to separate them from your application’s primary database. Often, non-RDBMS databases are chosen for these kind of tasks, something which is however beyond the scope of this article.

I googled and read many different solutions, however I couldn’t find one that was able to fully cover how to:

  • Have different and isolated migrations and schemas for every database.
  • Use rails generators to create new migrations for every database, independently.
  • Offer database-specific rake tasks for the most common database operations (i.e. like the ones available for the primary database).
  • Integrate with RSpec’s default spec task.
  • Work with Database Cleaner.
  • Work on Heroku.

This is my take on how to solve all of these – and have a fully working multiple database solution for your Rails application.

Create the custom database files

For the purpose of this tutorial, we’re going to set up a second database called Stats. To do so, we’re going to duplicate how Rails handles the primary database, and stick to conventions.

First of all, create the file config/database_stats.yml and populate it as you do with the primary database’s config file. Your file will look something like this:

Note that I’ve given specific names to the databases, trying to follow as closely as possible Rails’ naming conventions. Also, I’ve set the database production url to an environment variable DATABASE_STATS_URL. This will allow us to easily set this variable to a secondary database when deploying to Heroku.

We’re now going to create a directory that will hold the schema and all the migrations of the Stats database, so that it will have its own files clearly isolated from the primary database. We are basically going to duplicate Rails’ primary database db directory.

Create the directory db_stats in the Rails root and ensure to copy the structure and files of the primary database db directory within it. You will have something like:

The created files schema.rb and seeds.rb, together with the migrate directory, should just be empty.

 

Add Rake tasks

To handle the Stats database, and allow for its creation, migrations, schema dumping and other functionalities we’re going to need custom Rake tasks. These tasks will provide us with the same functionalities that Rails provides us for the primary database.

Create a new file  lib/tasks/db_stats.rake, and paste the following:

This needs a little explanation: let’s break up this file in its main sections. First of all, we simply provide “proxies” to standard Rails database tasks, in a newly created Rake namespace stats:db:

Then, we loop all of these tasks, and ensure the task stats:set_custom_config  is run before and the task stats:revert_to_original_config  after every one of the “proxy” tasks:

We have to do this since, unfortunately, Rails support for multiple databases isn’t that great, hence we need to provide minor hacks to make everything work. For this reason we have to set specific environment and configuration variables to custom values which match our Stats database before we run the “proxy” tasks, and then ensure that the original values are set back once those tasks have been run. The following two tasks do just that:

Notice how the lines 9-13 set values to the files and directories we have created in the previous steps.

Finally, if you’re using RSpec you can add one dependency to the spec task, to ensure that the Stats database is automatically prepared when tests are run:

Once all of this is set up, we can create the Stats database and run its first migration:

This will generate the Stats database schema file in db_stats/schema.rb.

 

Add a custom generator

Unfortunately, we cannot simply use Rails’ generator ActiveRecord::Generators::MigrationGenerator because it hardcodes the parent directory of the migration (notice the path hardcoded to the directory db/migrate in line 4 here below):

Therefore, we need to have a custom generator to create migrations for the Stats database. However, we can still inherit from it and monkey patch this specific function. Create the following generator in lib/generators/stats_migration_generator.rb:

In line 9 we set the directory base to the Stats database directory. Also, in line 4 we initialize the templates directory and point it at the original one used by the generator we’re inheriting from.

With all of this in place, we can now generate migrations for the Stats database:

You’ll notice that the migration file gets created in the Stats database migrate directory db_stats/migrate. You can edit this file and then run your migrations with the Rake task that we’ve set up in the previous steps, just as you normally would do with your primary database:

 

Finalize connection and models

We’re almost done. Add a new initializer file config/initializers/db_stats.rb  and paste the following:

Notice that we reference the Stats database configuration file that we created in the first step here above. By doing this, we initialise a global variable DB_STATS that holds the current environment’s configuration of the Stats database.

Finally, we can set our models’ connection to this configuration. For example, let’s say that we have a Click  model that corresponds to the migration here above. All you have to do is add one extra line that specifies which connection to use:

It’s that easy. Your model will now use the database Stats.

If you have multiple models that need to connect to the Stats database, however, you will need to add an extra step. If you were to have another model establishing its own connection to the Stats database, it would have its own connection pool and you might risk getting out of available connections to your Stats database. Therefore, if you have multiple models it is recommended to inherit from a single model, so that all the models connecting to the Stats database will share the same connection pool.

To do so, create the base model that connects to the Stats database:

You can now inherit in all your models:

 

Heroku

As already anticipated, the last step that you need to make this work on Heroku is to set the environment variable DATABASE_STATS_URL  to the database you want to use as Stats. For example, if you created a second database called HEROKU_POSTGRESQL_TEAL_URL all you have to do is to set this database’s value using the Heroku toolbelt:

And you’re ready to go.

 

Bonus: DatabaseCleaner

If you’re using the DatabaseCleaner gem, you can set it to clean the models that use the Stats database too. For example, your  spec/rails_helper.rb may look something like this:

According to DatabaseCleaner README, it should be possible to set a connection option instead of the model one. Unfortunately, my attempts at this have been unsuccessful. If anyone knows how to do this and avoid specifying a DatabaseCleaner strategy for every model, please let me know.

 

I hope you’ve enjoyed reading this, and that my ramblings can be helpful to someone going down this same path. As usual, any suggestions on how to improve any of this are warmly welcome.

Happy multiple db’ing! :)

50 Comments

  1. BIGmike

    zOMG, you made my day!!!

  2. BIGmike

    I think you need one more level of inheritance to use the connection pool see here: https://blog.pivotal.io/labs/labs/using-activerecord-with-multiple-databases

    Also, please add this to stackoverflow!!!

    • Hey Mike,
      Thank you for your feedback! I’m already not using mixins though, so what do you believe I’m missing from the Pivotal’s post?

      Also, any specific post you are referring to on Stackoverflow?

  3. BIGmike

    I believe if you want every model that touches the second database to share the same connection pool it has to be inherited. That is my very limited understanding. So you may not want to uses establish_connection in Click directly.

    As for Stackoverflow, this is definitely one of the better documentations I’ve seen for this and a beautiful solution; I believe it needs to exist in a way that many can see, that’s all. It’s way too beautiful not to be shared on stackoverflow somehow.

  4. BIGmike

    Love your blog btw!

  5. This is a fantastic guide and it works great with one exception, every time I try to run seeds I get a pending migrations error from the main . I noticed that the application configuration the existing path wasn’t ‘db/seeds’ before running set_custom_config, it was ‘db/seeds.rb’ but changing this didn’t help. In order to solve the problem I made a few changes. First I defined my own stats:db:abort_if_pending_migrations that takes into account the application configuration (which it doesn’t seem to do by default!?). Then I updated stats:db:seed to call it before calling ActiveRecord::Tasks::DatabaseTasks.load_seed.

    Here’s the code, if there’s a better way please let me know.

  6. sohair

    migration file is created without the content and also faced this `binread’: Is a directory @ io_fread (Errno::EISDIR)

    • Having no access to your code, I cannot tell where this is coming from. Just a thought: are you sure that you’ve set up correctly the source_root of your migration generator? It should be something like:

  7. sohair

    private method:

  8. Slava K

    Hi, Roberto

    Thank you for this post! I found it very useful while I was setting up second database in my rails app.

    I was able to setup database cleaner without specifying strategy for each model in this way: http://pastebin.com/upxPzJLE

    I would also suggest to dry lib/tasks/db_stats.rake a little: http://pastebin.com/UMbfxkSL

    Have a nice day!
    Slava

  9. Hareet

    This was priceless. Great post!
    +1 to bookmarking your blog.

    One thing I had to do was add:
    “self.abtract_class = true” under my main model that established_connection to the secondary db.
    Otherwise, the spec tests that tested my models that inherited that connection would fail, because it would look for a table name identical to the main model.

    example:
    Error: Table does not exist:: .

    Rails Way describes .abtract_class as saying “Hey, rails pass all my stuff to subclasses, but there is no table with my name”

  10. Jackson

    Thanks for the awesome tutorial Roberto!

    Just a heads up if anyone is having trouble seeding: for some reason running seed rake tasks didn’t work in my Rails 4.2.6 project. Rather than running the db_stats/seeds.rb file, it would run the normal db/seeds.rb file.

    I had to update this line:
    Rails.application.config.paths[‘db/seeds’] = [“db_stats/seeds.rb”]

    to:
    Rails.application.config.paths[‘db/seeds.rb’] = [“db_stats/seeds.rb”]

    • Hi Jackson,
      Thank you for this feedback. Not sure why you’re experiencing this though, I’m using it too with Rails 4.2.6 currently and did not have to change the seeds as you suggested.

    • Josh

      :+1:

      In Rails 5 I also needed to make this change. It was trying to run the regular db/seeds.rb file otherwise.

    • Mario

      Great article! As already mentioned, the path name for db/seeds changed to db/seeds.rb in newer Rails versions.

      Rails 5.0.1


      > Rails.configuration.paths.keys
      => ["app", "app/assets", "app/controllers", "app/channels", "app/helpers", "app/models", "app/mailers", "app/views", "lib", "lib/assets", "lib/tasks", "config", "config/environments", "config/initializers", "config/locales", "config/routes.rb", "db", "db/migrate", "db/seeds.rb", "vendor", "vendor/assets", "config/database", "config/secrets", "config/environment", "lib/templates", "log", "public", "public/javascripts", "public/stylesheets", "tmp", "config/cable", "app/decorators"]

  11. Alex

    Hello! Thanks for your post. It was very helpful!
    One note to mention: I have to add this two line to the both set_custom_config and revert_to_original_config tasks:


    db_config = Rails.application.config.paths['config/database'].to_a.first
    ActiveRecord::Base.establish_connection YAML.load(ERB.new(File.read(db_config)).result)[Rails.env]

    At least schema dumper will dump original db without this line. It uses ActiveRecord::Base.connection as the source.

  12. Romain

    Hi,

    Thx a lot for the tuto!

    (you have a typo in the definition of StatsBase : abtract_class => abstract_class)

  13. augusto sandim

    I have an error

    rake log:db:create
    rake aborted!
    undefined method `config=’ for

  14. Kevin

    This guide has been hugely helpful for me in trying to figure out how to set up multiple databases, because of the extension of the existing rails framework.

    I saw that you had difficulty getting DatabaseCleaner to work via the ‘connection’ option, and I saw something interesting while reading the docs that might tip you off in the correct direction of avoiding declaring cleaning for every model:

    Active Record: DatabaseCleaner[:active_record] : “Connection specified as :symbol keys, loaded from config/database.yml. You may also pass in the ActiveRecord model under the :model key.”

    I think the key here is that the symbol of which database to connect to seems to be derived from config/database.yml, which probaby means to read the configuration from config/database_stats.yml in your example, the loading of the the symbols in DatabaseCleaner probably needs to be monkey patched in some way to read the configuration from the different configuration file.

    Thank you once again for such a thorough guide! I’ll be sure to let you know if I find some way to solve the last piece of the puzzle here, if it hasn’t been solved by more experienced minds.

  15. robert

    Great post. The my situation is almost exact as this. However, my app is using structure.sql, not schema.rb (config.active_record.schema_format = :sql).

    Your code works great for schema.rb, but I am having trouble making it XXX:schema:dump produce an updated structure.sql.

    I tried adding to :set_custom_config…
    ENV[‘SCHEMA’] = “db_monitor/structure.sql”
    Rails.application.config.active_record.schema_format = :sql

    Which should run before each task, but it has no effect.

    Any suggestions?

    • Dave Hartnoll

      My solution was to leave ENV[‘SCHEMA’] unchanged as it’s actually nil when using :sql format:

      ENV[‘SCHEMA’] = “db_stats/schema.rb” unless Rails.application.config.active_record.schema_format == :sql

      and

      ENV[‘SCHEMA’] = @original_config[:env_schema] unless Rails.application.config.active_record.schema_format == :sql

      (or simply remove those lines)

      (I’m using Rails 5 if that makes a difference)

  16. robert

    Ah, nevermind. I just realized we have “db:structure:dump”!

  17. Sheri

    You are awesome. This was precisely what I needed and it ‘just works’. Thanks so much!

  18. jonas

    Thanks for your job.

    I got a problem when I deploy my project and restart puma.

    NameError: uninitialized constant Rails::Generators::Actions
    /home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/railties-5.0.2/lib/rails/generators/base.rb:17:in
    '
    /home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/railties-5.0.2/lib/rails/generators/base.rb:15:in
    '
    /home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/railties-5.0.2/lib/rails/generators/base.rb:11:in '
    /home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/railties-5.0.2/lib/rails/generators/base.rb:10:in
    '
    /home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/railties-5.0.2/lib/rails/generators/named_base.rb:2:in '
    /home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/activerecord-5.0.2/lib/rails/generators/active_record.rb:1:in
    '
    /home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/activerecord-5.0.2/lib/rails/generators/active_record/migration/migration_generator.rb:1:in '
    /home/deploy/fulmo/releases/5/lib/generators/log_migration_generator.rb:1:in
    '

    log_migration_generator.rb:


    require 'rails/generators/active_record/migration/migration_generator'

    class LogMigrationGenerator < ActiveRecord::Generators::MigrationGenerator
    include Rails::Generators::Actions
    source_root File.join(File.dirname(ActiveRecord::Generators::MigrationGenerator.instance_method(:create_migration_file).source_location.first), "templates")

    def create_migration_file
    set_local_assigns!
    validate_file_name!
    migration_template @migration_template, "db_log/migrate/#{file_name}.rb"
    end
    end

    How to fix it?

  19. jonas

    i fixed it. log_migration_generator.rb need requires :

    require ‘rails/generators/actions/create_migration’
    require ‘rails/generators’
    require ‘rails/generators/active_record/migration/migration_generator’

  20. Dano

    Hi, Roberto

    Thank you for the great post, which tremendously helped me. I really appreciate it.

    I have an issue which I think is related to this multiple database set-up. Only models using the secondary database don’t work with select method of ActiveRecord

    2.3.1 :011 > CouponCampaign.all.select(:id)
    CouponCampaign Load (0.3ms) SELECT 'coupon_campaigns'.'id' FROM 'coupon_campaigns'
    (Object doesn't support #inspect)

    Do you have any idea to solve this one?

  21. Thanks so much for this! I had asked a question on Stack Overflow in October 2016 asking about how to include multiple databases in a single schema.rb file but never got an answer. I asked before I migrated all of my RoR 5.0.2 websites to Heroku a few weeks ago. After moving them to Heroku it makes total sense to keep the migrations separated as you do on this post. Roberto, if you are on Stack Overflow please post your username. I plan on answering my question with all of my code and include your link as my reference point. I would like to tag you on the post. This information will help many, many people.

  22. ben

    This worked great for almost everything — I ran into some difficulties with the seed command, as it insists on looking in the schema_migrations table. I tried a few variations on setting the DatabaseTasks configuration and telling it to run the load_seed command in a custom class without checking for pending migrations, but that raised further problems as I no longer had access to my ActiveRecord objects at all. Ended up just doing the seeds by hand, which is no big deal, but would be nice to have that working seamlessly as the other commands do.

  23. Matt N

    Need to add my thanks here, really clean and concise guide.

  24. Theja

    This article really helped me a lot. When I was working on Sass apps, I came across maintaining separate databases and shared schemas of a database. With this article, I came to know how to maintain multiple databases of a single application instance

  25. TJ

    If I want to setup 10 separate databases, Do I need to create 10 database.yml files. Can’t we accomplish in an optimised way?

    Thanks in advance.

Leave a Reply