Putting aside any opinions on performance, I’ve been trying to test whether a couple of queries would output the same data (ordering doesn’t matter).

SELECT *
FROM articles
WHERE (
    last_updated >= %s
    OR id IN (1, 2, 3)
  )
  AND created_at IS NOT NULL;

SELECT *
FROM articles
WHERE last_updated >= %s
  AND created_at IS NOT NULL
UNION
SELECT *
FROM articles
WHERE id IN (1, 2, 3)
  AND created_at IS NOT NULL;

I think they’re equivalent, but I can’t prove it to myself.
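One way to reason about it is a rough sketch using the distributive law. Here A, B and C are just shorthand introduced for this sketch (they are not in the queries): A is `last_updated >= %s`, B is `id IN (1, 2, 3)`, C is `created_at IS NOT NULL`, and X(r) means "condition X evaluates to TRUE for row r". It assumes no two rows of articles are identical (for example because id is unique), since UNION removes duplicate rows and the first query does not.

```latex
\begin{align*}
Q_1 &= \{\, r \in \text{articles} : (A(r) \lor B(r)) \land C(r) \,\} \\
    &= \{\, r \in \text{articles} : (A(r) \land C(r)) \lor (B(r) \land C(r)) \,\} \\
    &= \{\, r : A(r) \land C(r) \,\} \cup \{\, r : B(r) \land C(r) \,\} \\
    &= Q_2
\end{align*}
```

The middle step is plain distributivity of AND over OR. It also holds in SQL's three-valued logic: a row with a NULL last_updated makes A unknown, and such a row is kept by either query only when B and C are both TRUE.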

Edit: Aye, looking at the replies, I’m becoming aware that I left out a couple of key assumptions I’ve made. Assuming:

a) id is a PRIMARY KEY (or otherwise UNIQUE)

b) I mean equivalent insofar as “the rows returned will contain the same data (though maybe ordered differently)”
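Under those two assumptions, the claim can also be spot-checked empirically. The following is a minimal sketch, not a proof: it uses Python's built-in sqlite3 module with an in-memory table, so the placeholder is `?` rather than `%s`, and the column types, sample rows and cutoff date are all made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (id, last_updated, created_at); None is stored as NULL.
ROWS = [
    (1, "2024-01-01", "2023-01-01"),  # matches the id list only
    (2, "2024-06-01", "2023-01-01"),  # matches both conditions
    (3, "2024-06-01", None),          # excluded: created_at IS NULL
    (4, "2024-06-01", "2023-01-01"),  # matches the date condition only
    (5, "2024-01-01", "2023-01-01"),  # matches neither condition
    (6, None, "2023-01-01"),          # NULL last_updated, id not in the list
]

OR_QUERY = """
SELECT * FROM articles
WHERE (last_updated >= ? OR id IN (1, 2, 3))
  AND created_at IS NOT NULL
"""

UNION_QUERY = """
SELECT * FROM articles
WHERE last_updated >= ? AND created_at IS NOT NULL
UNION
SELECT * FROM articles
WHERE id IN (1, 2, 3) AND created_at IS NOT NULL
"""


def run_both(cutoff):
    """Run both queries against the sample table and return sorted row lists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles ("
        "id INTEGER PRIMARY KEY, last_updated TEXT, created_at TEXT)"
    )
    conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", ROWS)
    # Sort so the comparison ignores the order in which rows come back.
    or_rows = sorted(conn.execute(OR_QUERY, (cutoff,)).fetchall())
    union_rows = sorted(conn.execute(UNION_QUERY, (cutoff,)).fetchall())
    conn.close()
    return or_rows, union_rows


or_rows, union_rows = run_both("2024-03-01")
print(or_rows == union_rows)         # True
print([row[0] for row in or_rows])   # [1, 2, 4]
```

A check like this only covers the rows it is given, so it can falsify the claim but not establish it in general.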

  • boatswain
    6 months ago

    I really don’t think that’s correct, though it’s been a few years since I did SQL regularly.

    SELECT *
    FROM articles
    WHERE last_updated >= %s
      AND created_at IS NOT NULL
    UNION
    SELECT *
    FROM articles
    WHERE id IN (1, 2, 3)
      AND created_at IS NOT NULL
    

    That should give a list of all articles updated on or after whatever date (regardless of ID), appended to a list of all articles where the ID is 1, 2, or 3 (regardless of when they were last updated). I would expect to see extra articles that fit only one criterion or the other, and also duplicate articles.

    I included the join quote because an inner join would be the way to do this, rather than a union, though it would likely be less efficient than just filtering on the required parameters.

    If I’m wrong here, I’d love an explanation of why.

    • Deebster@programming.dev
      6 months ago

      It’s equivalent because UNION removes duplicates; the behaviour you’re describing happens with UNION ALL. Since both halves select articles.*, they have the same columns, so an article matched by both halves produces two identical rows and the dedupe collapses them into one.

      UNION is less efficient than UNION ALL because of this deduplication, but it’s the default since that’s what most people want. If that matters then you’d be correct that a JOIN version will be more efficient (possibly depending on the indexes present and the SQL engine).
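To make the UNION versus UNION ALL difference concrete, here is a small sketch using Python's sqlite3 module. The table contents and the cutoff date are invented for the example, and the placeholder is `?` rather than `%s` because that is what sqlite3 expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, last_updated TEXT, created_at TEXT)"
)
# Hypothetical rows: article 2 satisfies both halves of the compound query.
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "2023-01-01"),  # id list only
        (2, "2024-06-01", "2023-01-01"),  # date condition and id list
        (4, "2024-06-01", "2023-01-01"),  # date condition only
    ],
)

TEMPLATE = """
SELECT * FROM articles
WHERE last_updated >= ? AND created_at IS NOT NULL
{op}
SELECT * FROM articles
WHERE id IN (1, 2, 3) AND created_at IS NOT NULL
"""


def ids_for(op):
    """Return the sorted ids produced by combining the two halves with `op`."""
    rows = conn.execute(TEMPLATE.format(op=op), ("2024-03-01",)).fetchall()
    return sorted(row[0] for row in rows)


union_ids = ids_for("UNION")
union_all_ids = ids_for("UNION ALL")
print(union_ids)      # [1, 2, 4]
print(union_all_ids)  # [1, 2, 2, 4]
```

Article 2 appears twice only in the UNION ALL result, which is the duplicate behaviour described in the parent comment.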