There are different reasons why you might consider having multiple databases in your Ruby on Rails application. In my specific case scenario, I needed to store large quantities of data representing user behavior: clicks, pages visited, historical changes, and so on.
This kind of databases generally are not mission critical, and grow much faster (and larger) than most databases. Their requirements are often different: for instance, they need more storage space, are more tolerant in the face of hardware or software failures, and are write-intensive. For these reasons, sometimes it is interesting to separate them from your application’s primary database. Often, non-RDBMS databases are chosen for these kind of tasks, something which is however beyond the scope of this article.
I googled and read many different solutions, however I couldn’t find one that was able to fully cover how to:
- Have different and isolated migrations and schemas for every database.
- Use rails generators to create new migrations for every database, independently.
- Offer database-specific rake tasks for the most common database operations (i.e. like the ones available for the primary database).
- Integrate with RSpec’s default spec task.
- Work with Database Cleaner.
- Work on Heroku.
This is my take on how to solve all of these – and have a fully working multiple database solution for your Rails application.
Create the custom database files
For the purpose of this tutorial, we’re going to set up a second database called Stats. To do so, we’re going to duplicate how Rails handles the primary database, and stick to conventions.
First of all, create the file config/database_stats.yml and populate it as you do with the primary database’s config file. Your file will look something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
development: adapter: postgresql encoding: utf8 host: localhost pool: 10 database: myapp_stats_development username: postgres password: test: adapter: postgresql encoding: utf8 host: localhost pool: 10 database: myapp_stats_test username: postgres password: production: adapter: postgresql encoding: utf8 url: <%= ENV["DATABASE_STATS_URL"] %> pool: <%= ENV["DB_POOL"] || 5 %> |
Note that I’ve given specific names to the databases, trying to follow as closely as possible Rails’ naming conventions. Also, I’ve set the database production url to an environment variable DATABASE_STATS_URL. This will allow us to easily set this variable to a secondary database when deploying to Heroku.
We’re now going to create a directory that will hold the schema and all the migrations of the Stats database, so that it will have its own files clearly isolated from the primary database. We are basically going to duplicate Rails’ primary database db directory.
Create the directory db_stats in the Rails root and ensure to copy the structure and files of the primary database db directory within it. You will have something like:
1 2 3 4 5 6 7 8 |
-- db |-- migrate schema.rb seeds.rb -- db_stats |-- migrate schema.rb seeds.rb |
The created files schema.rb and seeds.rb, together with the migrate directory, should just be empty.
Add Rake tasks
To handle the Stats database, and allow for its creation, migrations, schema dumping and other functionalities we’re going to need custom Rake tasks. These tasks will provide us with the same functionalities that Rails provides us for the primary database.
Create a new file lib/tasks/db_stats.rake, and paste the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 |
task spec: ["stats:db:test:prepare"] namespace :stats do namespace :db do |ns| task :drop do Rake::Task["db:drop"].invoke end task :create do Rake::Task["db:create"].invoke end task :setup do Rake::Task["db:setup"].invoke end task :migrate do Rake::Task["db:migrate"].invoke end task :rollback do Rake::Task["db:rollback"].invoke end task :seed do Rake::Task["db:seed"].invoke end task :version do Rake::Task["db:version"].invoke end namespace :schema do task :load do Rake::Task["db:schema:load"].invoke end task :dump do Rake::Task["db:schema:dump"].invoke end end namespace :test do task :prepare do Rake::Task["db:test:prepare"].invoke end end # append and prepend proper tasks to all the tasks defined here above ns.tasks.each do |task| task.enhance ["stats:set_custom_config"] do Rake::Task["stats:revert_to_original_config"].invoke end end end task :set_custom_config do # save current vars @original_config = { env_schema: ENV['SCHEMA'], config: Rails.application.config.dup } # set config variables for custom database ENV['SCHEMA'] = "db_stats/schema.rb" Rails.application.config.paths['db'] = ["db_stats"] Rails.application.config.paths['db/migrate'] = ["db_stats/migrate"] Rails.application.config.paths['db/seeds'] = ["db_stats/seeds.rb"] Rails.application.config.paths['config/database'] = ["config/database_stats.yml"] end task :revert_to_original_config do # reset config variables to original values ENV['SCHEMA'] = @original_config[:env_schema] Rails.application.config = @original_config[:config] end end |
This needs a little explanation: let’s break up this file in its main sections. First of all, we simply provide “proxies” to standard Rails database tasks, in a newly created Rake namespace stats:db:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
task :drop do Rake::Task["db:drop"].invoke end task :create do Rake::Task["db:create"].invoke end task :setup do Rake::Task["db:setup"].invoke end task :migrate do Rake::Task["db:migrate"].invoke end [...] |
Then, we loop all of these tasks, and ensure the task stats:set_custom_config is run before and the task stats:revert_to_original_config after every one of the “proxy” tasks:
1 2 3 4 5 6 |
# append and prepend proper tasks to all tasks defined in stats:db namespace ns.tasks.each do |task| task.enhance ["stats:set_custom_config"] do Rake::Task["stats:revert_to_original_config"].invoke end end |
We have to do this since, unfortunately, Rails support for multiple databases isn’t that great, hence we need to provide minor hacks to make everything work. For this reason we have to set specific environment and configuration variables to custom values which match our Stats database before we run the “proxy” tasks, and then ensure that the original values are set back once those tasks have been run. The following two tasks do just that:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
task :set_custom_config do # save current vars @original_config = { env_schema: ENV['SCHEMA'], config: Rails.application.config.dup } # set config variables for custom database ENV['SCHEMA'] = "db_stats/schema.rb" Rails.application.config.paths['db'] = ["db_stats"] Rails.application.config.paths['db/migrate'] = ["db_stats/migrate"] Rails.application.config.paths['db/seeds'] = ["db_stats/seeds.rb"] Rails.application.config.paths['config/database'] = ["config/database_stats.yml"] end task :revert_to_original_config do # reset config variables to original values ENV['SCHEMA'] = @original_config[:env_schema] Rails.application.config = @original_config[:config] end |
Notice how the lines 9-13 set values to the files and directories we have created in the previous steps.
Finally, if you’re using RSpec you can add one dependency to the spec task, to ensure that the Stats database is automatically prepared when tests are run:
1 |
task spec: ["stats:db:test:prepare"] |
Once all of this is set up, we can create the Stats database and run its first migration:
1 2 |
$ rake stats:db:create $ rake stats:db:migrate |
This will generate the Stats database schema file in db_stats/schema.rb.
Add a custom generator
Unfortunately, we cannot simply use Rails’ generator ActiveRecord::Generators::MigrationGenerator because it hardcodes the parent directory of the migration (notice the path hardcoded to the directory db/migrate in line 4 here below):
1 2 3 4 5 |
def create_migration_file set_local_assigns! validate_file_name! migration_template @migration_template, "db/migrate/#{file_name}.rb" end |
Therefore, we need to have a custom generator to create migrations for the Stats database. However, we can still inherit from it and monkey patch this specific function. Create the following generator in lib/generators/stats_migration_generator.rb:
1 2 3 4 5 6 7 8 9 10 11 |
require 'rails/generators/active_record/migration/migration_generator' class StatsMigrationGenerator < ActiveRecord::Generators::MigrationGenerator source_root File.join(File.dirname(ActiveRecord::Generators::MigrationGenerator.instance_method(:create_migration_file).source_location.first), "templates") def create_migration_file set_local_assigns! validate_file_name! migration_template @migration_template, "db_stats/migrate/#{file_name}.rb" end end |
In line 9 we set the directory base to the Stats database directory. Also, in line 4 we initialize the templates directory and point it at the original one used by the generator we’re inheriting from.
With all of this in place, we can now generate migrations for the Stats database:
1 2 |
$ rails g stats_migration create_clicks create db_stats/migrate/20151201191642_create_clicks.rb |
You’ll notice that the migration file gets created in the Stats database migrate directory db_stats/migrate. You can edit this file and then run your migrations with the Rake task that we’ve set up in the previous steps, just as you normally would do with your primary database:
1 |
$ rake stats:db:migrate |
Finalize connection and models
We’re almost done. Add a new initializer file config/initializers/db_stats.rb and paste the following:
1 2 |
# save stats database settings in global var DB_STATS = YAML::load(ERB.new(File.read(Rails.root.join("config","database_stats.yml"))).result)[Rails.env] |
Notice that we reference the Stats database configuration file that we created in the first step here above. By doing this, we initialise a global variable DB_STATS that holds the current environment’s configuration of the Stats database.
Finally, we can set our models’ connection to this configuration. For example, let’s say that we have a Click model that corresponds to the migration here above. All you have to do is add one extra line that specifies which connection to use:
1 2 3 4 |
class Click < ActiveRecord::Base establish_connection DB_STATS end |
It’s that easy. Your model will now use the database Stats.
If you have multiple models that need to connect to the Stats database, however, you will need to add an extra step. If you were to have another model establishing its own connection to the Stats database, it would have its own connection pool and you might risk getting out of available connections to your Stats database. Therefore, if you have multiple models it is recommended to inherit from a single model, so that all the models connecting to the Stats database will share the same connection pool.
To do so, create the base model that connects to the Stats database:
1 2 3 4 |
class StatsBase < ActiveRecord::Base establish_connection DB_STATS self.abstract_class = true end |
You can now inherit in all your models:
1 2 3 4 5 |
class Click < StatsBase end class View < StatsBase end |
Heroku
As already anticipated, the last step that you need to make this work on Heroku is to set the environment variable DATABASE_STATS_URL to the database you want to use as Stats. For example, if you created a second database called HEROKU_POSTGRESQL_TEAL_URL all you have to do is to set this database’s value using the Heroku toolbelt:
1 |
$ heroku config:set DATABASE_STATS_URL=postgres://gsdfjrthjsnaew:gry6OJF6drDjththjkSDngldsf@ec2-116-22-114-221.compute-1.amazonaws.com:5432/hmsrthj24dfgks |
And you’re ready to go.
Bonus: DatabaseCleaner
If you’re using the DatabaseCleaner gem, you can set it to clean the models that use the Stats database too. For example, your spec/rails_helper.rb may look something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
ENV["RAILS_ENV"] ||= 'test' require 'spec_helper' require File.expand_path("../../config/environment", __FILE__) require 'rspec/rails' Dir[Rails.root.join("spec/support/**/*.rb")].each { |f| require f } ActiveRecord::Migration.maintain_test_schema! RSpec.configure do |config| config.use_transactional_fixtures = false config.infer_spec_type_from_file_location! config.before(:suite) do DatabaseCleaner.clean_with(:truncation) DatabaseCleaner[:active_record, { model: Click }].clean_with(:truncation) end config.before(:each) do |example| unit_test = ![:feature, :request].include?(example.metadata[:type]) strategy = unit_test ? :transaction : :truncation DatabaseCleaner.strategy = strategy DatabaseCleaner[:active_record, { model: Click }].strategy = strategy DatabaseCleaner.start DatabaseCleaner[:active_record, { model: Click }].start end config.after(:each) do DatabaseCleaner.clean DatabaseCleaner[:active_record, { model: Click }].clean end end |
According to DatabaseCleaner README, it should be possible to set a connection option instead of the model one. Unfortunately, my attempts at this have been unsuccessful. If anyone knows how to do this and avoid specifying a DatabaseCleaner strategy for every model, please let me know.
I hope you’ve enjoyed reading this, and that my ramblings can be helpful to someone going down this same path. As usual, any suggestions on how to improve any of this are warmly welcome.
Happy multiple db’ing! :)
zOMG, you made my day!!!
I think you need one more level of inheritance to use the connection pool see here: https://blog.pivotal.io/labs/labs/using-activerecord-with-multiple-databases
Also, please add this to stackoverflow!!!
Hey Mike,
Thank you for your feedback! I’m already not using mixins though, so what do you believe I’m missing from the Pivotal’s post?
Also, any specific post you are referring to on Stackoverflow?
I believe if you want every model that touches the second database to share the same connection pool it has to be inherited. That is my very limited understanding. So you may not want to uses establish_connection in Click directly.
As for Stackoverflow, this is definitely one of the better documentations I’ve seen for this and a beautiful solution; I believe it needs to exist in a way that many can see, that’s all. It’s way too beautiful not to be shared on stackoverflow somehow.
Thank you Mike, will investigate further based on your comments.
Can’t post on Stackoverflow with only a link to this article, as it’s against their policy.
See the “Connection Management” of this article http://technology.customink.com/blog/2015/06/22/rails-multi-database-best-practices-roundup/
Thank you Luke, I’ve updated the article.
Love your blog btw!
This is a fantastic guide and it works great with one exception, every time I try to run seeds I get a pending migrations error from the main . I noticed that the application configuration the existing path wasn’t ‘db/seeds’ before running set_custom_config, it was ‘db/seeds.rb’ but changing this didn’t help. In order to solve the problem I made a few changes. First I defined my own stats:db:abort_if_pending_migrations that takes into account the application configuration (which it doesn’t seem to do by default!?). Then I updated stats:db:seed to call it before calling ActiveRecord::Tasks::DatabaseTasks.load_seed.
Here’s the code, if there’s a better way please let me know.
Hi Jim, thank you for your input. Unfortunately, I can’t reproduce the error, so I can’t really help :/
migration file is created without the content and also faced this `binread’: Is a directory @ io_fread (Errno::EISDIR)
Having no access to your code, I cannot tell where this is coming from. Just a thought: are you sure that you’ve set up correctly the source_root of your migration generator? It should be something like:
private method:
I’m sorry but I don’t understand what this piece of code refers to.
Hi, Roberto
Thank you for this post! I found it very useful while I was setting up second database in my rails app.
I was able to setup database cleaner without specifying strategy for each model in this way: http://pastebin.com/upxPzJLE
I would also suggest to dry lib/tasks/db_stats.rake a little: http://pastebin.com/UMbfxkSL
Have a nice day!
Slava
Thank you for this input Slava!
This was priceless. Great post!
+1 to bookmarking your blog.
One thing I had to do was add:
“self.abtract_class = true” under my main model that established_connection to the secondary db.
Otherwise, the spec tests that tested my models that inherited that connection would fail, because it would look for a table name identical to the main model.
example:
Error: Table does not exist:: .
Rails Way describes .abtract_class as saying “Hey, rails pass all my stuff to subclasses, but there is no table with my name”
Yes, you are absolutely right! I don’t know why this slipped, but I’ve added it back.
Thank you :)
Thanks for the awesome tutorial Roberto!
Just a heads up if anyone is having trouble seeding: for some reason running seed rake tasks didn’t work in my Rails 4.2.6 project. Rather than running the db_stats/seeds.rb file, it would run the normal db/seeds.rb file.
I had to update this line:
Rails.application.config.paths[‘db/seeds’] = [“db_stats/seeds.rb”]
to:
Rails.application.config.paths[‘db/seeds.rb’] = [“db_stats/seeds.rb”]
Hi Jackson,
Thank you for this feedback. Not sure why you’re experiencing this though, I’m using it too with Rails 4.2.6 currently and did not have to change the seeds as you suggested.
:+1:
In Rails 5 I also needed to make this change. It was trying to run the regular
db/seeds.rb
file otherwise.Great article! As already mentioned, the path name for
db/seeds
changed todb/seeds.rb
in newer Rails versions.Rails 5.0.1
> Rails.configuration.paths.keys
=> ["app", "app/assets", "app/controllers", "app/channels", "app/helpers", "app/models", "app/mailers", "app/views", "lib", "lib/assets", "lib/tasks", "config", "config/environments", "config/initializers", "config/locales", "config/routes.rb", "db", "db/migrate", "db/seeds.rb", "vendor", "vendor/assets", "config/database", "config/secrets", "config/environment", "lib/templates", "log", "public", "public/javascripts", "public/stylesheets", "tmp", "config/cable", "app/decorators"]
Hello! Thanks for your post. It was very helpful!
One note to mention: I have to add this two line to the both
set_custom_config
andrevert_to_original_config
tasks:db_config = Rails.application.config.paths['config/database'].to_a.first
ActiveRecord::Base.establish_connection YAML.load(ERB.new(File.read(db_config)).result)[Rails.env]
At least schema dumper will dump original db without this line. It uses ActiveRecord::Base.connection as the source.
Not experiencing this, leaving this note if anyone has the same need! Thank you.
Had the same problem, I think it occurs when the stats database is in a different server.
Also had the same issue. Adding those lines fixed it for me.
Same here. Thanks for posting your solution :)
Hi,
Thx a lot for the tuto!
(you have a typo in the definition of StatsBase : abtract_class => abstract_class)
Thanks, fixed.
I have an error
rake log:db:create
rake aborted!
undefined method `config=’ for
I have the same error
This guide has been hugely helpful for me in trying to figure out how to set up multiple databases, because of the extension of the existing rails framework.
I saw that you had difficulty getting DatabaseCleaner to work via the ‘connection’ option, and I saw something interesting while reading the docs that might tip you off in the correct direction of avoiding declaring cleaning for every model:
Active Record: DatabaseCleaner[:active_record] : “Connection specified as :symbol keys, loaded from config/database.yml. You may also pass in the ActiveRecord model under the :model key.”
I think the key here is that the symbol of which database to connect to seems to be derived from config/database.yml, which probaby means to read the configuration from
config/database_stats.yml
in your example, the loading of the the symbols in DatabaseCleaner probably needs to be monkey patched in some way to read the configuration from the different configuration file.Thank you once again for such a thorough guide! I’ll be sure to let you know if I find some way to solve the last piece of the puzzle here, if it hasn’t been solved by more experienced minds.
Glad it could help. :) Let me know if you find a viable solution.
Great post. The my situation is almost exact as this. However, my app is using structure.sql, not schema.rb (config.active_record.schema_format = :sql).
Your code works great for schema.rb, but I am having trouble making it XXX:schema:dump produce an updated structure.sql.
I tried adding to :set_custom_config…
ENV[‘SCHEMA’] = “db_monitor/structure.sql”
Rails.application.config.active_record.schema_format = :sql
Which should run before each task, but it has no effect.
Any suggestions?
My solution was to leave ENV[‘SCHEMA’] unchanged as it’s actually nil when using :sql format:
ENV[‘SCHEMA’] = “db_stats/schema.rb” unless Rails.application.config.active_record.schema_format == :sql
and
ENV[‘SCHEMA’] = @original_config[:env_schema] unless Rails.application.config.active_record.schema_format == :sql
(or simply remove those lines)
(I’m using Rails 5 if that makes a difference)
Ah, nevermind. I just realized we have “db:structure:dump”!
You are awesome. This was precisely what I needed and it ‘just works’. Thanks so much!
Thanks for your job.
I got a problem when I deploy my project and restart puma.
NameError: uninitialized constant Rails::Generators::Actions
/home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/railties-5.0.2/lib/rails/generators/base.rb:17:in
'
'/home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/railties-5.0.2/lib/rails/generators/base.rb:15:in
/home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/railties-5.0.2/lib/rails/generators/base.rb:11:in
'
'/home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/railties-5.0.2/lib/rails/generators/base.rb:10:in
/home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/railties-5.0.2/lib/rails/generators/named_base.rb:2:in
'
'/home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/activerecord-5.0.2/lib/rails/generators/active_record.rb:1:in
/home/deploy/fulmo/releases/5/vendor/bundle/ruby/2.3.0/gems/activerecord-5.0.2/lib/rails/generators/active_record/migration/migration_generator.rb:1:in
'
'/home/deploy/fulmo/releases/5/lib/generators/log_migration_generator.rb:1:in
log_migration_generator.rb:
require 'rails/generators/active_record/migration/migration_generator'
class LogMigrationGenerator < ActiveRecord::Generators::MigrationGenerator
include Rails::Generators::Actions
source_root File.join(File.dirname(ActiveRecord::Generators::MigrationGenerator.instance_method(:create_migration_file).source_location.first), "templates")
def create_migration_file
set_local_assigns!
validate_file_name!
migration_template @migration_template, "db_log/migrate/#{file_name}.rb"
end
end
How to fix it?
error log: http://pastebin.com/VQTagFUT
log_migration_generator.rb: http://pastebin.com/ypiSGGRF
i fixed it. log_migration_generator.rb need requires :
require ‘rails/generators/actions/create_migration’
require ‘rails/generators’
require ‘rails/generators/active_record/migration/migration_generator’
Hi, Roberto
Thank you for the great post, which tremendously helped me. I really appreciate it.
I have an issue which I think is related to this multiple database set-up. Only models using the secondary database don’t work with select method of ActiveRecord
2.3.1 :011 > CouponCampaign.all.select(:id)
CouponCampaign Load (0.3ms) SELECT 'coupon_campaigns'.'id' FROM 'coupon_campaigns'
(Object doesn't support #inspect)
Do you have any idea to solve this one?
Thanks so much for this! I had asked a question on Stack Overflow in October 2016 asking about how to include multiple databases in a single schema.rb file but never got an answer. I asked before I migrated all of my RoR 5.0.2 websites to Heroku a few weeks ago. After moving them to Heroku it makes total sense to keep the migrations separated as you do on this post. Roberto, if you are on Stack Overflow please post your username. I plan on answering my question with all of my code and include your link as my reference point. I would like to tag you on the post. This information will help many, many people.
Hello Pamela, glad I could help. I’m @ostinelli there, though not particularly active. :)
I just answered my question on Stack Overflow. I hope I tagged you properly on the answer. I’m in the process of adding a new database to my Rails application. I wanted that information in Stack Overflow since that blog gets so many hits in search engines. It took me a while to find this one which made sense and fully explained what’s going on.
Can you please leave a link here to your reply? Thank you.
Here’s the link to Pamela’s post on StackOverflow:
https://stackoverflow.com/questions/40119912/can-i-rebuild-db-schema-rb-to-include-multiple-databases-without-migrations/43350475#43350475
This worked great for almost everything — I ran into some difficulties with the
seed
command, as it insists on looking in theschema_migrations
table. I tried a few variations on setting theDatabaseTasks
configuration and telling it to run theload_seed
command in a custom class without checking for pending migrations, but that raised further problems as I no longer had access to my ActiveRecord objects at all. Ended up just doing the seeds by hand, which is no big deal, but would be nice to have that working seamlessly as the other commands do.Need to add my thanks here, really clean and concise guide.
This article really helped me a lot. When I was working on Sass apps, I came across maintaining separate databases and shared schemas of a database. With this article, I came to know how to maintain multiple databases of a single application instance
If I want to setup 10 separate databases, Do I need to create 10 database.yml files. Can’t we accomplish in an optimised way?
Thanks in advance.
I just want to say thank you veeeeeeeery much!
Thank you very much .. you make my day
Thanks for the guide. It’s mostly working except that some other migrations from a rails engine are getting called in my new db. I can manually clean that up and not a huge deal.