Rader on Rails

Dispatches from my web development journey.

Make Sure You Get Only What You Need From Your Joins

TL;DR: SELECT * FROM ... is not the same as SELECT table_name.* FROM .... SELECT * will select everything returned from the query. SELECT table_name.* will select elements only from the specified table.

I spent some time in frustration with this today. Here’s the setup of the problem: I’ve been building a simple flashcards web application with Sinatra over the weekend. The game is simple enough: A user can log in and select a deck of cards. This begins a game where the user is presented with a question and has to put in the answer. If the answer is right, the number correct in the round increases and that card isn’t shown to the user again for the remainder of the round. If it’s wrong, the number incorrect increases and the card remains in the deck.

I figured four models would be necessary to accomplish this: Deck, Card, Round and Guess. A round belongs_to a deck, has_many guesses, and has_many cards. A card has a question and answer, belongs_to a deck and also has_many guesses. A guess belongs_to a card as well as a round and has one attribute: a boolean called ‘correct’, with it’s default value set to false (a guess isn’t true until it is, right?).

With this schema, I can keep track of the cards that have correct guesses and ensure that they aren’t repeated in the deck. I wasn’t immediately sure how to do this with Active Record queries (though it turned out to be incredibly easy), so I just used a direct SQL statement. Here’s the original code I used to accomplish this:

deck.rb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class Deck < ActiveRecord::Base
  has_many :cards
  has_many :rounds

  def choose_card
    (self.cards - cards_with_correct_guesses).sample
  end

  def cards_with_correct_guesses
    self.cards.find_by_sql("SELECT * FROM cards
                              INNER JOIN guesses ON guesses.card_id = cards.id
                            WHERE guesses.correct = 'true'")
  end

end

In cards_with_correct_guesses I query the deck for all the cards that have correct guesses.

In as good as direct English as I can make it (it’s already pretty good, but the JOIN may mean nothing to readers who haven’t touched SQL), this says: Select everything from cards, looking at the cards inside the guesses table, matching the card IDs on each guess record with the card ID in the cards table, and returning the records of which the correct attribute is ‘true’.

Inside the choose_card method above, I subtract the array of cards with correct guesses from the deck of cards, so that cards that have already been guessed correctly aren’t shown again. I then call .sample on the resulting array to pick a random card to show to the user. As the round progresses, and more cards are guessed right, the cards available to select from shrinks.

This worked just fine when working with one deck of cards. When we put in another deck of cards though, we suddenly noticed that correctly guessed cards in the second deck were being repeated. Not good. Let the debugging process begin.

I jumped into the console and ran the cards_with_correct_guesses method on the current deck being played, which was the second deck. What I saw perplexed me:

1
2
3
4
5
6
7
8
9
irb(main):001:0> Deck.find(2).cards_with_correct_guesses

SELECT "decks".* FROM "decks" WHERE "decks"."id" = $1 LIMIT 1  [["id", 2]]

SELECT * FROM cards
INNER JOIN guesses ON guesses.card_id = cards.id
WHERE (cards.deck_id = 2 AND guesses.correct = 't')

=> [<Card id: 1, deck_id: 2, question: "dictionary", answer: "puke wehewehe ‘ōlelo", created_at: "2014-04-26 17:52:42", updated_at: "2014-04-26 17:52:42">, <Card id: 2, deck_id: 2, question: "San Francisco", answer: "Kapalakiko", created_at: "2014-04-26 17:52:42" ....

You might notice that the first Card id is 1. This was definitely not right. There were already cards in the deck – this was the second deck of cards, all with their own incremented IDs. When I ran Card.all, I could see the true card IDs:

1
2
3
4
5
6
7
irb(main):001:0> Card.all

SELECT "cards".* FROM "cards"
=> [<Card id: 1, deck_id: 1, question: "Alabama", answer: "Montgomery", created_at: "2014-04-26 17:52:41", updated_at: "2014-04-26 17:52:41">, <Card id: 2, deck_id: 1, question: "Alaska", answer: "Juneau",
.
.
<Card id: 85, deck_id: 2, question: "dictionary", answer: "puke wehewehe ‘ōlelo", created_at: "2014-04-26 17:52:42", updated_at: "2014-04-26 17:52:42"> ....

The card with an ID of 1 is actually a card in the first deck (and if you haven’t already guessed, it was a deck of states and capitals). The card with an ID of 85 (in the English-to-Hawaiian deck) is the same one that my cards_with_correct_guesses method was returning with a card ID of 1.

I was pretty stumped. I decided to run the query directly from within Postgres. This was the result:

1
2
3
4
5
6
7
8
9
SELECT * FROM cards JOIN guesses ON cards.id = guesses.card_id WHERE guesses.correct = 'true';

 id | deck_id |               question                |                  answer                  |         created_at         |         updated_at         | id | round_id | card_id | correct
----+---------+---------------------------------------+------------------------------------------+----------------------------+----------------------------+----+----------+---------+---------
 85 |       2 | dictionary                            | puke wehewehe ‘ōlelo                     | 2014-04-26 17:52:42.489341 | 2014-04-26 17:52:42.489341 |  1 |        1 |      85 | t
 66 |       2 | San Francisco                         | Kapalakiko                               | 2014-04-26 17:52:42.414003 | 2014-04-26 17:52:42.414003 |  2 |        1 |      66 | t
.
.
.

I could see the correct card IDs… I could also see the guess IDs. For some reason, it seemed that the query was assigning the card ID to the last appearing ID column, which was the guess column. This should have screamed to me exactly what my problem was, but I’ll just say my brain was a little fried at this point and I didn’t spot it right away. It wasn’t until I translated the SQL directly to an Active Record query that I finally figured it out.

In Active Record, I can write the same (and actually correct) query much more concisely: self.cards.joins(:guesses).where(guesses: { correct: true}).

When I ran this, I was getting the correct card IDs. I also noticed a slight difference in the SQL statement that the method was executing:

1
2
3
4
5
6
7
8
9
irb(main):001:0> Deck.find(2).cards_with_correct_guesses

SELECT "decks".* FROM "decks" WHERE "decks"."id" = $1 LIMIT 1  [["id", 2]]

SELECT cards.* FROM cards
INNER JOIN guesses ON guesses.card_id = cards.id
WHERE (cards.deck_id = 2 AND guesses.correct = 't')

=> [<Card id: 85, deck_id: 2, question: "dictionary", answer: "puke wehewehe ‘ōlelo", created_at: "2014-04-26 17:52:42", updated_at: "2014-04-26 17:52:42">, <Card id: 66, deck_id: 2, question: "San Francisco", answer: "Kapalakiko", created_at: "2014-04-26 17:52:42", updated_at: "2014-04-26 17:52:42">

There was something different… SELECT cards.* .... I’d seen syntax like this in Rails logs before, but naively assumed it just accomplished the same thing as SELECT *.... Apparently not. A quick Google search revealed the obvious difference: selecting all will do just that – SELECT ALL; selecting all from a specific table will return elements just from that table.

So the correct query in Postgres will indeed return only the card records I want, filtered correctly by whether they have a correct guess, but without the ID ambiguity by leaving off the guess IDs:

1
2
3
4
5
6
7
8
9
SELECT cards.* FROM cards JOIN guesses ON cards.id = guesses.card_id WHERE (cards.deck_id = 2 AND guesses.correct = 't');

 id | deck_id |               question                |                  answer                  |         created_at         |         updated_at         |
----+---------+---------------------------------------+------------------------------------------+----------------------------+----------------------------+-
 85 |       2 | dictionary                            | puke wehewehe ‘ōlelo                     | 2014-04-26 17:52:42.489341 | 2014-04-26 17:52:42.489341 |
 66 |       2 | San Francisco                         | Kapalakiko                               | 2014-04-26 17:52:42.414003 | 2014-04-26 17:52:42.414003 |
.
.
.

A lot of pain for such a simple problem. But that’s how you learn.

Comments