Rails: Migrating Data in Batches

The features we have shipped recently have all required data migrations. Rails offers three good ways to process data in batches.

`1. find_each`

find_each accepts options such as start, finish and batch_size, and by default loads 1000 records per query.

The block receives one record at a time, which makes it easy to apply different logic depending on the record.

e.g.

Person.find_each do |person|
  person.do_awesome_stuff
end

Person.where("age > 21").find_each do |person|
  person.party_all_night!
  person.drink_much! if person.male?
end

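Note, however, that find_each does not page through the table with LIMIT + OFFSET. It orders rows by the primary key and fetches each batch with a condition on the last id of the previous batch, which in SQL is roughly WHERE id > (last id of the previous batch) ORDER BY id LIMIT 1000.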
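To make this batching strategy concrete, here is a minimal pure-Ruby sketch that mimics it on an in-memory array instead of a database. Row and each_keyset_batch are hypothetical names, not Rails APIs; the sorted array plays the role of the primary-key index, and bsearch_index stands in for the index seek. start and finish bound the id range, in the spirit of the find_each options of the same name.

```ruby
# Hypothetical stand-in for a database row; only :id matters here.
Row = Struct.new(:id)

# Yields arrays of at most batch_size rows in id order. Each batch begins
# strictly after the last id of the previous batch, mimicking
#   WHERE id > last_id ORDER BY id LIMIT batch_size
# (a sketch of the idea, not the Rails implementation).
def each_keyset_batch(rows, batch_size: 1000, start: nil, finish: nil)
  # A real table keeps its primary-key index sorted; here we sort once.
  sorted = rows.sort_by(&:id)
  last_id = nil

  loop do
    # "Seek" to the first row past the previous batch (or at/after start).
    from = if last_id
             sorted.bsearch_index { |r| r.id > last_id }
           elsif start
             sorted.bsearch_index { |r| r.id >= start }
           else
             0
           end
    break if from.nil?

    # Take up to batch_size rows, stopping at finish if one was given.
    batch = sorted[from, batch_size].take_while { |r| finish.nil? || r.id <= finish }
    break if batch.empty?

    yield batch
    last_id = batch.last.id
  end
end
```

Because each batch seeks directly to the first id after the previous one, fetching a batch costs about the same no matter how far into the table you are, whereas an OFFSET query has to step over every skipped row first.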

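Because every batch query is driven by the id column, adding other where conditions can lead MySQL to choose a poor plan on a large table, for example walking the primary key and discarding most rows along the way. In that case you can force find_each to use a specific index in MySQL to speed the queries up.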

The pattern looks like this:

from("#{self.table_name} USE INDEX(#{index})")

e.g.

User.from('users USE INDEX(index_users_on_xxx)').where(xxx).find_each do |user|
  # any logic
end
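To see where the index hint ends up, the following hypothetical helper builds approximately the per-batch query that a relation like the one above sends to MySQL. batch_sql, the status column and index_users_on_status are illustrative assumptions, not part of Rails; real Rails output quotes identifiers and may differ in detail between versions.

```ruby
# Hypothetical helper (not a Rails API): builds roughly the SQL for one
# batch of User.from('users USE INDEX(...)').where(cond).find_each.
def batch_sql(table:, condition:, batch_size:, index: nil, last_id: nil)
  # The from(...) call replaces the plain table name with an index hint.
  source = index ? "#{table} USE INDEX(#{index})" : table
  where = "(#{condition})"
  # After the first batch, resume strictly after the last id seen.
  where += " AND #{table}.id > #{Integer(last_id)}" if last_id
  "SELECT #{table}.* FROM #{source} WHERE #{where} " \
    "ORDER BY #{table}.id ASC LIMIT #{Integer(batch_size)}"
end
```

Each query has to satisfy both the WHERE condition and ORDER BY id, which is why MySQL's choice of index matters, and why a USE INDEX hint can help when its default choice is poor.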

`2. find_in_batches`

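find_in_batches is similar to find_each (in fact, find_each is implemented on top of find_in_batches) and accepts the same start, finish and batch_size options. The difference is that its block receives an array holding a whole batch of records, so it is a good choice when every record gets the same treatment and the batch can be processed in one go.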

e.g.

Person.where("age > 21").find_in_batches do |group|
  sleep(50) # Make sure it doesn't get too crowded in there!
  group.each { |person| person.party_all_night! }
end
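The relationship between the two methods can be sketched without a database. InMemoryScope is a hypothetical stand-in for a relation, not ActiveRecord: its find_in_batches yields whole arrays, and its find_each is written on top of find_in_batches by iterating over each batch, mirroring how Rails layers find_each. Slicing a sorted array replaces the real per-batch queries here.

```ruby
# Hypothetical in-memory stand-in for a relation (not ActiveRecord).
class InMemoryScope
  def initialize(records)
    # Batches are taken in id order, as Rails orders by primary key.
    @records = records.sort_by { |r| r[:id] }
  end

  # Like find_in_batches: the block receives an array of records.
  def find_in_batches(batch_size: 1000)
    @records.each_slice(batch_size) { |batch| yield batch }
  end

  # Like find_each: built on find_in_batches, yielding one record at a time.
  def find_each(batch_size: 1000, &block)
    find_in_batches(batch_size: batch_size) { |batch| batch.each(&block) }
  end
end
```

For example, with batch_size: 2 and three records, find_in_batches calls its block twice (with arrays of 2 and 1 records), while find_each calls its block three times.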

`3. in_batches`

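find_in_batches is in turn implemented on top of in_batches. in_batches accepts options such as of, start, finish and load, and yields each batch as an ActiveRecord::Relation rather than as loaded records, so any method available on a relation (delete_all, update_all and so on) can be applied to a whole batch at once, which is very convenient.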

e.g.

Person.where("age > 21").in_batches do |relation|
  relation.delete_all
  sleep(10) # Throttle the delete queries
end
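A small sketch of why a relation-per-batch API pairs well with delete_all. InMemoryTable and BatchRelation are hypothetical stand-ins (the "table" is just a sorted list of ids), not ActiveRecord classes. in_batches yields a relation-like object for each batch, and the loop resumes from the last id seen rather than from an offset.

```ruby
# Hypothetical in-memory "table" holding only ids (not ActiveRecord).
class InMemoryTable
  attr_reader :rows

  def initialize(ids)
    @rows = ids.sort
  end

  # Relation-like handle over one batch of ids.
  BatchRelation = Struct.new(:table, :ids) do
    # Deletes this batch's rows from the table; returns how many were removed.
    def delete_all
      before = table.rows.size
      table.rows.reject! { |id| ids.include?(id) }
      before - table.rows.size
    end
  end

  # Like in_batches: yields a relation per batch instead of loaded records.
  # Each batch is found via "id > last_id", so deleting it is safe.
  def in_batches(of: 1000)
    last_id = nil
    loop do
      ids = rows.select { |id| last_id.nil? || id > last_id }.first(of)
      break if ids.empty?

      yield BatchRelation.new(self, ids)
      last_id = ids.last
    end
  end
end
```

If batches were located by OFFSET instead, deleting the first batch would shift the remaining rows forward, and the next query would skip rows that were never processed. Resuming from the last id avoids that.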

FYI:

ActiveRecord::Batches