Sunday, October 9, 2016

Using Ruby CSV Library to Export CSV File in Rails Apps - send_data method and 'exec_query' Approach

Using Ruby CSV Library to Export CSV File in Rails Apps - send_data method and 'exec_query' Approach

N+1 Queries Problem

Since there are 4 relations (order, model, color, size) for model Orderdetail, it causes ‘N+1 Query’ problems for this scenario. Let’s take a look at the console log:
Started GET “/orders/1.csv” for ::1 at 2016-10-08 12:54:14 +0800
ActiveRecord::SchemaMigration Load (0.2ms) SELECT “schema_migrations”.* FROM “schema_migrations”
Processing by OrdersController#show as CSV
Parameters: {“id”=>”1”}
Order Load (0.2ms) SELECT “orders”.* FROM “orders” WHERE “orders”.”id” = ? LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Orderdetail Load (0.2ms) SELECT “orderdetails”.* FROM “orderdetails” WHERE “orderdetails”.”order_id” = ? ORDER BY model_id, size_id ASC [[“order_id”, 1]]
Order Load (0.1ms) SELECT “orders”.* FROM “orders” WHERE “orders”.”id” = ? LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Model Load (0.1ms) SELECT “models”.* FROM “models” WHERE “models”.”id” = ? LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Size Load (0.1ms) SELECT “sizes”.* FROM “sizes” WHERE “sizes”.”id” = ? ORDER BY name ASC LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Color Load (0.3ms) SELECT “colors”.* FROM “colors” WHERE “colors”.”id” = ? ORDER BY name ASC LIMIT ? [[“id”, 4], [“LIMIT”, 1]]
Order Load (0.1ms) SELECT “orders”.* FROM “orders” WHERE “orders”.”id” = ? LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Model Load (0.1ms) SELECT “models”.* FROM “models” WHERE “models”.”id” = ? LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Size Load (0.1ms) SELECT “sizes”.* FROM “sizes” WHERE “sizes”.”id” = ? ORDER BY name ASC LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Color Load (0.1ms) SELECT “colors”.* FROM “colors” WHERE “colors”.”id” = ? ORDER BY name ASC LIMIT ? [[“id”, 2], [“LIMIT”, 1]]
Order Load (0.1ms) SELECT “orders”.* FROM “orders” WHERE “orders”.”id” = ? LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Model Load (0.1ms) SELECT “models”.* FROM “models” WHERE “models”.”id” = ? LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Size Load (0.1ms) SELECT “sizes”.* FROM “sizes” WHERE “sizes”.”id” = ? ORDER BY name ASC LIMIT ? [[“id”, 2], [“LIMIT”, 1]]
Color Load (0.2ms) SELECT “colors”.* FROM “colors” WHERE “colors”.”id” = ? ORDER BY name ASC LIMIT ? [[“id”, 4], [“LIMIT”, 1]]
Order Load (0.1ms) SELECT “orders”.* FROM “orders” WHERE “orders”.”id” = ? LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Model Load (0.1ms) SELECT “models”.* FROM “models” WHERE “models”.”id” = ? LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
Size Load (0.1ms) SELECT “sizes”.* FROM “sizes” WHERE “sizes”.”id” = ? ORDER BY name ASC LIMIT ? [[“id”, 2], [“LIMIT”, 1]]
Color Load (0.1ms) SELECT “colors”.* FROM “colors” WHERE “colors”.”id” = ? ORDER BY name ASC LIMIT ? [[“id”, 2], [“LIMIT”, 1]]
Rendering text template
Rendered text template (0.0ms)
Sent data order_1234_2016-10-08 12:54:14.csv (14.0ms)
Completed 200 OK in 149ms (Views: 13.8ms | ActiveRecord: 4.7ms)
There are four order details for order 1, it takes 19 queries for 4.7ms. We can conclude that:
query number = ( number of order details X 4 ) + 3
Thus, the more order details contained in one order, the more query time used.

Direct SQL Query

Is there a way to solve this problem? What if we just use one single query to get all the data we need? Here is my plan. I wrote a single SQL statement which can pull out all the data from the database:
select a.po_number as ‘PO Number’, c.name as ‘Model Name’,
d.name as ‘Color’, e.name as ‘Size’, b.price as ‘Price’, b.quantity as Quantity (b.price*b.quantity) as ‘Total Amount’
from orders a, orderdetails b, models c, colors d, sizes e
where a.id = b.order_id and b.model_id=c.id and b.color_id=d.id and b.size_id=e.id and b.order_id=a.id
I tested it through a standalone database client and it returned the correct result. Then, I called ‘“exec_query” method to do the one-query action in the orders controller. The class used is:
ActiveRecord::Base.connection.exec_query([query string])
The result set is an ActiveRecord::Result object and very easy to handle with. The header and values of the result set can be obtained by using the following methods:
# Get the column names of the result:
result.columns
# => ["id", "model", "color"]

# Get the record values of the result:
result.rows
# => [[1, "model_1", "color_1"],
      [2, "model_2", "color_2"],
      ...
     ]
All we need to do is to save the header and the values to the csv string. All these code was wrapped into a private action query_to_csv(). In action show, the csv string is passed into send_data which outputs the result to the file in csv format.
format.csv { send_data query_to_csv(@order.id), filename: fn }
It’s quite simple. Here is the code in the controller.
app/controllers/orders_controller.rb
class OrdersController < ApplicationController
    :
    :
  def show
    @orderdetails = @order.orderdetails
    add_breadcrumb @order.id, order_path

    respond_to do |format|
      format.html
      format.json
      fn = "order_#{@order.po_number}_#{Time.now.strftime("%Y-%m-%d %H:%M:%S")}.csv"
      # Send_data method and exec_query approach
      format.csv { send_data query_to_csv(@order.id), filename: fn }
    end
  end   

  private

  # Use ActiveRecord::Base.connection.exec_query to 
  # query db directly only once.
  def query_to_csv(order_id)
    query_string = 'select a.po_number as "PO Number", 
        c.name as "Model Name", d.name as Color,
        e.name as Size, b.price as Price, 
        b.quantity as Quantity,
        (b.price*b.quantity) as "Total Amount"
      from orders a, orderdetails b, models c, 
        colors d, sizes e
      where a.id = b.order_id and b.model_id=c.id and
        b.color_id=d.id and b.size_id=e.id and
        b.order_id=' + order_id.to_s
    results = ActiveRecord::Base.connection.exec_query(query_string)       
    CSV.generate(headers: true) do |csv|
      csv << results.columns
      results.rows.each do |row|
        csv << row
      end
    end
  end
end
Note that PostgreSQL Server only allow double quote for the sql statement. So I use single quote for quoting the whole query string and double quote for the item name with space in it. For example:
‘select a.po_number as “PO Number”…………….. ‘
That’s it!

Performance Improvement

Let’s take a look at the console log.
Started GET “/orders/1.csv” for ::1 at 2016-10-08 12:58:50 +0800
ActiveRecord::SchemaMigration Load (0.2ms) SELECT “schema_migrations”.* FROM “schema_migrations”
Processing by OrdersController#show as CSV
Parameters: {“id”=>”1”}
Order Load (0.2ms) SELECT “orders”.* FROM “orders” WHERE “orders”.”id” = ? LIMIT ? [[“id”, 1], [“LIMIT”, 1]]
(0.2ms) select a.po_number as ‘PO Number’, c.name as ‘Model Name’, d.name as ‘Color’, e.name as ‘Size’, b.price as ‘Price’, b.quantity as Quantity, (b.price*b.quantity) as ‘Total Amount’ from orders a, orderdetails b, models c, colors d, sizes ewhere a.id = b.order_id and b.model_id=c.id and b.color_id=d.id and b.size_id=e.id and b.order_id=1
Rendering text template
Rendered text template (0.0ms)
Sent data order_1234_2016-10-08 12:58:50.csv (4.5ms)
Completed 200 OK in 56ms (Views: 4.3ms | ActiveRecord: 1.3ms)
It takes 3 queries for 1.3ms! More than three times faster!
Note: the private action can be extracted as a Service Object. Please refer to this article.

No comments:

Post a Comment