Improve documentation's description of JOIN clauses.
In bug #12000, Andreas Kunert complained that the documentation was
misleading in saying "FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2".
That's correct as far as it goes, but the equivalence doesn't hold when
you consider three or more tables, since JOIN binds more tightly than
comma.  I added a <note> to explain this, and ended up rearranging some
of the existing text so that the note would make sense in context.
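
For illustration, the scoping difference can be modelled in a few lines of Python. This is only a toy sketch, not PostgreSQL's parser: the helper name tables_visible_to_on is hypothetical, and it assumes a FROM clause made up of bare table names, commas, CROSS/INNER JOIN, and one trailing ON. Commas split the FROM list into separate items, and JOINs nest only within one item, so the ON condition sees just the tables of its own item.

```python
import re


def tables_visible_to_on(from_clause):
    """Return the tables the trailing ON condition may reference (toy model)."""
    # Comma binds more loosely than JOIN: only the last comma-separated
    # item contains the join that the trailing ON belongs to.
    last_item = from_clause.split(",")[-1]
    # Drop the ON condition itself, keeping only the chain of joined tables.
    joined = re.split(r"\bON\b", last_item, flags=re.IGNORECASE)[0]
    # Split the chain on the JOIN keywords to recover the table names.
    parts = re.split(r"\b(?:CROSS|INNER)?\s*JOIN\b", joined, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


# All three tables are inside one join tree, so ON can reference T1.
print(tables_visible_to_on("T1 CROSS JOIN T2 INNER JOIN T3 ON cond"))
# -> ['T1', 'T2', 'T3']

# T1 is a separate FROM-list item, so ON can reference only T2 and T3.
print(tables_visible_to_on("T1, T2 INNER JOIN T3 ON cond"))
# -> ['T2', 'T3']
```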

In passing, rewrite the description of JOIN USING, which was unnecessarily
vague, and hadn't been helped any by somebody's reliance on markup as a
substitute for clear writing.  (Mostly this involved reintroducing a
concrete example that was unaccountably removed by commit 032f3b7.)
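
As a rough sketch of the output-column rule that the rewritten JOIN USING text describes (listed columns first, in the listed order, then the remaining columns of T1, then the remaining columns of T2), the following Python models column names only. The function names and the sample column lists are made up for illustration.

```python
def using_output_columns(t1_cols, t2_cols, using):
    """Column order produced by T1 JOIN T2 USING (...), per the described rule."""
    merged = list(using)                                # one column per listed pair
    merged += [c for c in t1_cols if c not in using]    # remaining T1 columns
    merged += [c for c in t2_cols if c not in using]    # remaining T2 columns
    return merged


def on_output_columns(t1_cols, t2_cols):
    """Column order produced by JOIN ON: all of T1, then all of T2."""
    return ["t1." + c for c in t1_cols] + ["t2." + c for c in t2_cols]


t1_cols = ["x", "a", "b"]
t2_cols = ["b", "a", "y"]

print(using_output_columns(t1_cols, t2_cols, ["a", "b"]))
# -> ['a', 'b', 'x', 'y']
print(on_output_columns(t1_cols, t2_cols))
# -> ['t1.x', 't1.a', 't1.b', 't2.b', 't2.a', 't2.y']
```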

Back-patch to all supported branches.
tglsfdc committed Nov 19, 2014
1 parent a9f3280 commit 0632eff
Showing 1 changed file with 98 additions and 56 deletions.
154 changes: 98 additions & 56 deletions doc/src/sgml/queries.sgml
@@ -118,10 +118,12 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
 </synopsis>

 A table reference can be a table name (possibly schema-qualified),
-or a derived table such as a subquery, a table join, or complex
-combinations of these. If more than one table reference is listed
-in the <literal>FROM</> clause they are cross-joined (see below)
-to form the intermediate virtual table that can then be subject to
+or a derived table such as a subquery, a <literal>JOIN</> construct, or
+complex combinations of these. If more than one table reference is
+listed in the <literal>FROM</> clause, the tables are cross-joined
+(that is, the Cartesian product of their rows is formed; see below).
+The result of the <literal>FROM</> list is an intermediate virtual
+table that can then be subject to
 transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
 and <literal>HAVING</> clauses and is finally the result of the
 overall table expression.
@@ -161,6 +163,16 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
 A joined table is a table derived from two other (real or
 derived) tables according to the rules of the particular join
 type. Inner, outer, and cross-joins are available.
+The general syntax of a joined table is
+<synopsis>
+<replaceable>T1</replaceable> <replaceable>join_type</replaceable> <replaceable>T2</replaceable> <optional> <replaceable>join_condition</replaceable> </optional>
+</synopsis>
+Joins of all types can be chained together, or nested: either or
+both <replaceable>T1</replaceable> and
+<replaceable>T2</replaceable> can be joined tables. Parentheses
+can be used around <literal>JOIN</> clauses to control the join
+order. In the absence of parentheses, <literal>JOIN</> clauses
+nest left-to-right.
 </para>

 <variablelist>
@@ -197,10 +209,28 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
 <para>
 <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
 <replaceable>T2</replaceable></literal> is equivalent to
-<literal>FROM <replaceable>T1</replaceable>,
-<replaceable>T2</replaceable></literal>. It is also equivalent to
 <literal>FROM <replaceable>T1</replaceable> INNER JOIN
 <replaceable>T2</replaceable> ON TRUE</literal> (see below).
+It is also equivalent to
+<literal>FROM <replaceable>T1</replaceable>,
+<replaceable>T2</replaceable></literal>.
+<note>
+<para>
+This latter equivalence does not hold exactly when more than two
+tables appear, because <literal>JOIN</> binds more tightly than
+comma. For example
+<literal>FROM <replaceable>T1</replaceable> CROSS JOIN
+<replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
+ON <replaceable>condition</replaceable></literal>
+is not the same as
+<literal>FROM <replaceable>T1</replaceable>,
+<replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
+ON <replaceable>condition</replaceable></literal>
+because the <replaceable>condition</replaceable> can
+reference <replaceable>T1</replaceable> in the first case but not
+the second.
+</para>
+</note>
 </para>
 </listitem>
 </varlistentry>
@@ -240,47 +270,6 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
 <quote>match</quote>, as explained in detail below.
 </para>

-<para>
-The <literal>ON</> clause is the most general kind of join
-condition: it takes a Boolean value expression of the same
-kind as is used in a <literal>WHERE</> clause. A pair of rows
-from <replaceable>T1</> and <replaceable>T2</> match if the
-<literal>ON</> expression evaluates to true for them.
-</para>
-
-<para>
-<literal>USING</> is a shorthand notation: it takes a
-comma-separated list of column names, which the joined tables
-must have in common, and forms a join condition specifying
-equality of each of these pairs of columns. Furthermore, the
-output of <literal>JOIN USING</> has one column for each of
-the equated pairs of input columns, followed by the
-remaining columns from each table. Thus, <literal>USING (a, b,
-c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
-t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
-if <literal>ON</> is used there will be two columns
-<literal>a</>, <literal>b</>, and <literal>c</> in the result,
-whereas with <literal>USING</> there will be only one of each
-(and they will appear first if <command>SELECT *</> is used).
-</para>
-
-<para>
-<indexterm>
-<primary>join</primary>
-<secondary>natural</secondary>
-</indexterm>
-<indexterm>
-<primary>natural join</primary>
-</indexterm>
-Finally, <literal>NATURAL</> is a shorthand form of
-<literal>USING</>: it forms a <literal>USING</> list
-consisting of all column names that appear in both
-input tables. As with <literal>USING</>, these columns appear
-only once in the output table. If there are no common
-columns, <literal>NATURAL</literal> behaves like
-<literal>CROSS JOIN</literal>.
-</para>
-
 <para>
 The possible types of qualified join are:

@@ -358,19 +347,70 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
 </varlistentry>
 </variablelist>
 </para>
+
+<para>
+The <literal>ON</> clause is the most general kind of join
+condition: it takes a Boolean value expression of the same
+kind as is used in a <literal>WHERE</> clause. A pair of rows
+from <replaceable>T1</> and <replaceable>T2</> match if the
+<literal>ON</> expression evaluates to true.
+</para>
+
+<para>
+The <literal>USING</> clause is a shorthand that allows you to take
+advantage of the specific situation where both sides of the join use
+the same name for the joining column(s). It takes a
+comma-separated list of the shared column names
+and forms a join condition that includes an equality comparison
+for each one. For example, joining <replaceable>T1</>
+and <replaceable>T2</> with <literal>USING (a, b)</> produces
+the join condition <literal>ON <replaceable>T1</>.a
+= <replaceable>T2</>.a AND <replaceable>T1</>.b
+= <replaceable>T2</>.b</literal>.
+</para>
+
+<para>
+Furthermore, the output of <literal>JOIN USING</> suppresses
+redundant columns: there is no need to print both of the matched
+columns, since they must have equal values. While <literal>JOIN
+ON</> produces all columns from <replaceable>T1</> followed by all
+columns from <replaceable>T2</>, <literal>JOIN USING</> produces one
+output column for each of the listed column pairs (in the listed
+order), followed by any remaining columns from <replaceable>T1</>,
+followed by any remaining columns from <replaceable>T2</>.
+</para>
+
+<para>
+<indexterm>
+<primary>join</primary>
+<secondary>natural</secondary>
+</indexterm>
+<indexterm>
+<primary>natural join</primary>
+</indexterm>
+Finally, <literal>NATURAL</> is a shorthand form of
+<literal>USING</>: it forms a <literal>USING</> list
+consisting of all column names that appear in both
+input tables. As with <literal>USING</>, these columns appear
+only once in the output table. If there are no common
+column names, <literal>NATURAL</literal> behaves like
+<literal>CROSS JOIN</literal>.
+</para>
+
+<note>
+<para>
+<literal>USING</literal> is reasonably safe from column changes
+in the joined relations since only the listed columns
+are combined. <literal>NATURAL</> is considerably more risky since
+any schema changes to either relation that cause a new matching
+column name to be present will cause the join to combine that new
+column as well.
+</para>
+</note>
 </listitem>
 </varlistentry>
 </variablelist>

-<para>
-Joins of all types can be chained together or nested: either or
-both <replaceable>T1</replaceable> and
-<replaceable>T2</replaceable> can be joined tables. Parentheses
-can be used around <literal>JOIN</> clauses to control the join
-order. In the absence of parentheses, <literal>JOIN</> clauses
-nest left-to-right.
-</para>
-
 <para>
 To put this together, assume we have tables <literal>t1</literal>:
 <programlisting>
@@ -487,6 +527,8 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
 clause is processed <emphasis>before</> the join, while
 a restriction placed in the <literal>WHERE</> clause is processed
 <emphasis>after</> the join.
+That does not matter with inner joins, but it matters a lot with outer
+joins.
 </para>
 </sect3>
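
To make the last hunk's point concrete, here is a minimal Python sketch of a left outer join over lists of dicts. It is a simulation rather than anything run against PostgreSQL; the helper left_join and the sample rows are assumptions chosen for illustration. A restriction folded into the ON condition only decides which rows match, so unmatched left rows survive padded with None. The same restriction applied afterwards, as WHERE would, discards those padded rows.

```python
def left_join(left, right, on):
    """Simulate LEFT JOIN: every left row appears at least once."""
    out = []
    for l in left:
        matches = [r for r in right if on(l, r)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))  # no match: right side is NULL-extended
    return out


# Hypothetical sample data.
t1 = [{"num": 1, "name": "a"}, {"num": 2, "name": "b"}, {"num": 3, "name": "c"}]
t2 = [{"num": 1, "value": "xxx"}, {"num": 3, "value": "yyy"}, {"num": 5, "value": "zzz"}]

# Restriction inside ON: applied before the join decides what matches.
in_on = left_join(
    t1, t2, lambda l, r: l["num"] == r["num"] and r["value"] == "xxx"
)

# Restriction in WHERE: applied after the join, so NULL-extended rows
# fail the test (a comparison with NULL is not true) and are dropped.
joined = left_join(t1, t2, lambda l, r: l["num"] == r["num"])
in_where = [(l, r) for l, r in joined if r is not None and r["value"] == "xxx"]

print(len(in_on))     # -> 3
print(len(in_where))  # -> 1
```

With an inner join the two placements would give the same single row, which is why the distinction only shows up for outer joins.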
