Sync with upstream master #27

Merged
merged 7,865 commits into from
Oct 31, 2018
Conversation

aspekt112

No description provided.

hlinnaka and others added 30 commits September 30, 2018 19:17
They're still not supported for external protocols, resource groups, and
resource queues. Nor for any other GPDB-specific object types I might have
missed. I think that's OK, I don't think anyone particularly cares about
event triggers for those objects, and we can add the support later if
someone asks for it. But external tables seem good to support, to be
consistent with other kinds of tables.
We don't particularly care about event triggers on CREATE PROTOCOL as such,
but we were tripping the assertion in EventTriggerCommonSetup(), because
it was being called for CREATE PROTOCOL, even though it was listed as
not supported. It seems to actually work fine if we just mark it as
supported, so we might as well.

Add a test case, also for the CREATE EXTERNAL TABLE support.
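For illustration, an event trigger of the kind this enables might look like the following; the function and trigger names, and the external table's location, are made up for the example and are not from the commit:

```
-- a minimal sketch: a logging event trigger that should now also fire
-- for CREATE EXTERNAL TABLE (names and location are illustrative)
CREATE FUNCTION log_ddl() RETURNS event_trigger LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'caught command: %', TG_TAG;
END;
$$;

CREATE EVENT TRIGGER log_ddl_commands ON ddl_command_end
    EXECUTE PROCEDURE log_ddl();

CREATE EXTERNAL TABLE ext_t (a int)
    LOCATION ('gpfdist://localhost:8081/data.txt') FORMAT 'TEXT';
```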
The _complex_ops operator class for GIN indexes was incomplete. The
support functions, in pg_amproc, were missing, and the datatype in
pg_opclass was incorrect. (It was 'numeric', instead of 'complex',
a copy-pasto from _numeric_ops). As a result, trying to create a GIN
index on complex[] did not work:

postgres=# CREATE INDEX ON complex_array_tab USING gin (complex_arr);
ERROR:  missing support function 1 for attribute 1 of index "complex_array_tab_complex_arr_idx" (indexam.c:832)

This was not a problem in GPDB 5, because we didn't support GIN indexes
at all. But now that we do, let's fix it. I don't think anyone cares much
about this particular case, but we support GIN indexes on arrays of all
built-in types, so let's support this too, for completeness.
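With the support functions and opclass datatype fixed, the failing command from above succeeds. A hedged sketch of its use (the complex literal format shown is illustrative, not taken from the commit):

```
CREATE INDEX ON complex_array_tab USING gin (complex_arr);
-- array containment queries on the column can then use the index, e.g.:
SELECT * FROM complex_array_tab WHERE complex_arr @> ARRAY['1+2i'::complex];
```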
Also display cache hit rate and other stats after each Travis CI run
The Apache Portable Runtime (APR) is not a direct dependency of the
Greenplum server, but it's depended upon by `gpfdist` and `gpperfmon`.
Judging from the code, we don't seem to depend on any (very) particular
version of APR, as long as it's something recent-ish (say 1.5.0+).

Commit 168f612a73e (#58) started our Travis CI config with directly
downloading APR from Apache releases. This was probably due to the fact
that Travis in 2015 was still running Ubuntu 12.04 Precise, which shipped
too old a version of APR. Over the years, updating APR has been a
recurrent semiannual ritual, see commits 530db5ea71e (#5873),
27adcf92a13, 19251f3ff72, 5a9885f4f84, 2c68384e510.

This patch ends that fun madness by installing `libapr1-dev` from apt.
It turns out -- interestingly enough -- APU (a sister library of APR)
isn't needed in Travis: when we started using Travis (in commit
168f612a73e #58), we were building APU, but APU is really _only_ a
dependency of `gpperfmon`, which we _never_ built in Travis.
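For reference, the usual way to pull a package from apt in a Travis config is the apt addon; a sketch (the commit's actual .travis.yml change may differ):

```
addons:
  apt:
    packages:
      - libapr1-dev
```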
Commit df050f9bfd9 changed the signature of url_curl_fopen but left
behind the stub for when we configure the server without linking to
libcurl. This patch fixes the following compilation error:

```
make[4]: Entering directory '/home/pivotal/src/pvtl/gpdb/src/backend/access/external'
ccache clang-8 -O3 -std=gnu99  -Wall -Wmissing-prototypes
-Wpointer-arith -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -Wno-address
-Wno-unused-command-line-argument -g -ggdb -Werror=uninitialized
-Werror=implicit-function-declaration -I../../../../src/include
-D_GNU_SOURCE   -c -o url_curl.o url_curl.c -MMD -MP -MF
.deps/url_curl.Po
url_curl.c:1821:1: error: conflicting types for 'url_curl_fopen'
url_curl_fopen(char *url, bool forwrite, extvar_t *ev, CopyState pstate, int *response_code, const char **response_string)
^
../../../../src/include/access/url.h:88:18: note: previous declaration is here
extern URL_FILE *url_curl_fopen(char *url, bool forwrite, extvar_t *ev, CopyState pstate);
                 ^
1 error generated.
../../../../src/Makefile.global:768: recipe for target 'url_curl.o'
failed

```
[#160659122]

Co-authored-by: Goutam Tadi <[email protected]>
Co-authored-by: Fei Yang <[email protected]>
Co-authored-by: Xin Zhang <[email protected]>
The unix python module had grown support for a lot of commands that
are no longer used; this removes all the unused functions.
Commenting out a block of code with a multiline string is undoubtedly
a questionable practice. Remove.
The tests, and the test harness code, have seemingly been unused for
quite some time. Remove all the dead code and the invocations of
testUtils functions, which were all no-ops since the global variable
they wrote to had never been initialized.
The ability to inject a pause for the FTS prober was added in 2011
in order to test the resolution to a bug in the FTS prober.  There
is no test that currently uses this, and seemingly there never was.
Further, the comment is incorrect, which adds to the suspicion that
this has been unused for quite some time, so remove it.
This was never actually implemented, and thus never used.
The planned statement for a cached query contains information that is
freed after the first execution of a function. The second execution
uses the cached planned statement to populate the executor state
through a freed pointer and throws a segmentation fault.

To resolve, we copy the contents out of the planned statement rather
than copying a pointer to the planned statement, so that when our
executor state gets freed, it does not also free the cached data.

Note: `numSelectorsPerScanId` is the problematic property, and is only
used for partition tables.

Co-authored-by: David Kimura <[email protected]>
Co-authored-by: Taylor Vesely <[email protected]>
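A minimal sketch of the fix's shape, assuming the field is a node list that copyObject() can deep-copy (variable names are illustrative, not the actual patch):

```
/* Deep-copy the selector list out of the cached PlannedStmt instead of
 * sharing the pointer, so that freeing the executor state cannot free
 * the cached plan's data. */
List *selectors = copyObject(plannedstmt->numSelectorsPerScanId);
```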
* Modify SQL references for replicated tables

* Changes for edits and review comments
…D BY.

Allow specifying multiple UNIQUE constraints in the CREATE TABLE command,
or PRIMARY KEY and UNIQUE constraints, as long as they are all compatible
with the DISTRIBUTED BY clause.

Don't require the DISTRIBUTED BY columns to be a left-subset (prefix) of
the constraint's key columns. They still need to be a subset, but any
subset will do.

Likewise adjust the rules for deriving the distribution key from the
constraints, if no DISTRIBUTED BY was given, to choose the largest common
subset across all constraints.

Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/OJzz8I6WuVI/tMB8GcXdAgAJ
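A hedged example of the newly-allowed shape (table and column names are illustrative):

```
-- both constraints contain the distribution key column "a", so they
-- are compatible with DISTRIBUTED BY (a), even though "a" is not the
-- leading column of either key:
CREATE TABLE t (
    a int,
    b int,
    c int,
    PRIMARY KEY (b, a),
    UNIQUE (c, a)
) DISTRIBUTED BY (a);
```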
The return value from AllocateDir() was not checked. That's not
mandatory, if you immediately pass the return value to ReadDir(), as
explained in the comment in ReadDir(), but you can't rely on that if
you do anything in between the calls that might change errno. We were
requesting a checkpoint, and doing many other things in between.

Also, don't try to open the directory in the mirror until after the
checkpoint. Before the checkpoint, the directory might not exist in
the mirror yet. (I don't think there's any guarantee that it will exist
after requesting the checkpoint either, but we're taking chances with
that here anyway. At least it makes it a lot more likely.)

I think this explains the occasional errors like:

ERROR:  could not open directory "/tmp/build/e18b2f02/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/base/513914": Success (fd.c:2207)

that we've been seeing on the pipeline. Or at least, if this doesn't fix
them, we should now get a more correct error message than "Success".
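A sketch of the safe pattern described above (the error message and the intervening work are illustrative):

```
DIR           *dir;
struct dirent *de;

/* Report a failure to open immediately, while errno is still
 * meaningful, instead of relying on ReadDir() to notice a NULL dir
 * after intervening work may have clobbered errno. */
dir = AllocateDir(path);
if (dir == NULL)
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not open directory \"%s\": %m", path)));

/* ... request checkpoint, do other work ... */

while ((de = ReadDir(dir, path)) != NULL)
{
    /* process entry */
}
FreeDir(dir);
```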
The AoHeaderCheckErrorStr string is a statically allocated buffer,
so we must always use snprintf() when writing to it to reduce the
risk of an out-of-bounds write. This also moves the definition of
the error string to the top of the file to match project style.

Also remove the unnecessary assertions on snprintf(); if we consider
it an error that the message is truncated, then we should be using a
data structure that supports being extended.

Reviewed-by: Jacob Champion <[email protected]>
Reviewed-by: Joao Pereira <[email protected]>
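A minimal sketch of the pattern (the buffer size, function, and message are illustrative):

```
#include <stdio.h>

static char AoHeaderCheckErrorStr[256];    /* size is illustrative */

static void
set_header_check_error(int rowCount)
{
    /* Bound every write with snprintf() so an over-long message is
     * truncated instead of overrunning the static buffer. */
    snprintf(AoHeaderCheckErrorStr, sizeof(AoHeaderCheckErrorStr),
             "bad append-only header: row count %d out of range", rowCount);
}
```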
Distribution key joins on pathkeys where the parser invokes an implicit
cast via a RelabelType node would introduce a redistribution motion due
to the join not being recognized as a colocated equijoin.

As an example, consider the following relation

    create table test (a varchar(10), b varchar(10)) distributed by (a);

The below query will introduce an implicit cast on a.a and b.a to TEXT by
the parser. This means that the pathkey EquivalenceClass is on VARCHAR but
the RestrictInfo ECs are for TEXT.

    select * from test a, test b where a.a=b.a;

Fix by following the EC members to their basetypes to allow for colocated
equijoins in these cases.

Affected test files updated, and one affected test case slightly tweaked
to ignore less and generate less uninteresting output in the .out file.

Reviewed-by: Jesse Zhang <[email protected]>
It's confusing to see whitespace differences after a failure, since they
weren't the diffs that *caused* the failure. Suppress them.
When pg_dump or pg_dumpall dumps a database that has the gp_default_storage_options
GUC set to something other than the default settings, restoring the tables in
that database immediately after connecting to it would possibly create the table
with the wrong storage parameters. To prevent this, we make sure the
gp_default_storage_options GUC is set to built-in default values before
restoring tables.

Co-authored-by: Jamie McAtamney <[email protected]>
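The reset described above amounts to something like the following before restoring table definitions (the exact statement pg_dump emits may differ):

```
SET gp_default_storage_options TO DEFAULT;
```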
`hashDatum` expects the incoming oid for the array types to be
ANYARRAYOID, else it will flag them as unsupported. While performing
the MCV calculation on array-type columns, we were passing the
specific array OID (e.g. the character array OID for a char array
column), due to which `hashDatum` was marking them as not supported.
However, we should pass ANYARRAYOID if the column is an array type.
This commit fixes the issue.

Also, relevant test cases are added.
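A sketch of the fix's shape, detecting array types with get_element_type() from utils/lsyscache.h (variable names are illustrative, not the actual patch):

```
/* get_element_type() returns InvalidOid for non-array types, so this
 * passes ANYARRAYOID for any array-typed column: */
Oid elemType = get_element_type(columnTypeOid);
Oid hashTypeOid = OidIsValid(elemType) ? ANYARRAYOID : columnTypeOid;
```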
Commit 4eb65a53dea added column numsegments to catalog table
gp_distribution_policy but forgot to bump the catalog. We've since made
smaller changes to the catalog in commits e628194e49d, 009218954e4, and
c09cc2d80fa. 🤷‍♀️

Fix that now.
…ion.

The new window function executor doesn't require a combine function
anymore.
…tions.

If an aggregate with transition type 'internal' is missing the serial or
deserial function, it cannot be used in partial aggregation. In principle,
it could still be done within a single process, but GPDB doesn't currently
create such plans, so in practice the combine function is useless
without serial and deserial functions.

There is a check for this in the executor, but we had better catch this at
planning time. All of the built-in aggregates have both combine and
serial/deserial functions, or neither, so this is only a problem for
user-defined aggregates.
bhuvnesh2703 and others added 29 commits October 29, 2018 14:11
This patch converts "(NOT (OR A B))" to "(AND (NOT A) (NOT B))", per
De Morgan's laws. As a result, it enables pull-up for sublinks of type
"NOT OR".

For instance, for the query below:

```
select * from a where NOT
		( a.i in (select b.i from b) OR a.i in (select c.i from c) );
```

Previously, the scans of b and c were two separate SubPlans:
```
                               QUERY PLAN
-------------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)
   ->  Seq Scan on a
         Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)))
         SubPlan 1  (slice3; segments: 3)
           ->  Materialize
                 ->  Broadcast Motion 3:3  (slice1; segments: 3)
                       ->  Seq Scan on b
         SubPlan 2  (slice3; segments: 3)
           ->  Materialize
                 ->  Broadcast Motion 3:3  (slice2; segments: 3)
                       ->  Seq Scan on c
 Optimizer: legacy query optimizer
```

And now the scans of b and c are converted into anti semi joins:
```
                             QUERY PLAN
---------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)
   ->  Hash Left Anti Semi (Not-In) Join
         Hash Cond: (a.i = c.i)
         ->  Hash Left Anti Semi (Not-In) Join
               Hash Cond: (a.i = b.i)
               ->  Seq Scan on a
               ->  Hash
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)
                           ->  Seq Scan on b
         ->  Hash
               ->  Broadcast Motion 3:3  (slice2; segments: 3)
                     ->  Seq Scan on c
 Optimizer: legacy query optimizer
(13 rows)
```

Authored-by: Richard Guo <[email protected]>
Authored-by: Alexandra Wang <[email protected]>
Previous commit 4eb65a5 allowed tables to be distributed
on a subset of all the segments. For the corner case of
tables distributed on only 1 segment, we might generate
1:1 motions. This commit eliminates those 1:1 motions to
improve performance.

Co-authored-by: Zhenghua Lyu <[email protected]>
Co-authored-by: Shujie Zhang <[email protected]>
…lterObjectRename_internal() function (#6078)

Reviewed by Heikki, Daniel.
…th SimpleLruDoesPhysicalPageExist() (#6094)

Reviewed by Heikki.
The --enable-orafce and --enable-gphdfs options were both described as "do
not build X", probably from copy/pasting another PGAC_OPTION_BOOL
which defaults to true and thus becomes --disable-X.

Reviewed-by: Heikki Linnakangas <[email protected]>
Extensions with custom plan nodes might like to use these in their
EXPLAIN output.

Hadi Moshayedi

Discussion: https://postgr.es/m/CA+_kT_dU-rHCN0u6pjA6bN5CZniMfD=-wVqPY4QLrKUY_uJq5w@mail.gmail.com
This adds support for the structured formats in EXPLAIN (format ..) to
the MemAccounting code, which previously only supported string output.
The walker code is updated to take the ExplainState and make decisions
on output format, rather than just taking the buffer and assuming text
output format.

As the PrettyPrinting debug function was using the ToString walker, it
was removed in this commit since there are no callsites actually using
it. The unittest for PrettyPrint was also removed, and the unittest for
ToString updated to match the new code.

Also tidies up the header includes to be in alphabetical order as per
how the code is usually structured.

Co-authored-by: Jimmy Yih <[email protected]>
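For example, with this change the memory accounting output can be requested in a structured format (assuming the output is enabled, e.g. via the explain_memory_verbosity GUC; the query is illustrative):

```
SET explain_memory_verbosity = detail;
EXPLAIN (ANALYZE, FORMAT json) SELECT count(*) FROM pg_class;
```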
The lock TablespaceHashLock was added during 5.0 development in commit
677c3eaca06. With the removal of filespace this lock effectively became
dead code. This is made more obvious by commit fe12fd8c6c6, which
removed the vestigial references to this lock.

I noticed this while glancing over lwlock.h as part of reviewing
#6057.
There used to be an unconditional call to detoast the value earlier in the
function, as hinted by the comments, but that was removed in PostgreSQL 9.4
by commit b006f4ddb9. We need to deal with toasted values here now.

Fixes github issue #6081.
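The general shape of the fix is to detoast explicitly before use (a sketch, not the actual patch):

```
/* The value may be toasted now that the unconditional detoast earlier
 * in the function is gone; detoast it before touching the contents. */
struct varlena *val = PG_DETOAST_DATUM(datum);
```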
…ion.

This has been broken since commit af7914c6627bcf0b0ca614e9ce95d3f8056602bf,
which added the EXPLAIN (TIMING) option.  Although that commit included
updates to auto_explain, they evidently weren't tested very carefully,
because the code failed to print node timings even when it should, due to
failure to set es.timing in the ExplainState struct.  Reported off-list by
Neelakanth Nadgir of Salesforce.

In passing, clean up the documentation for auto_explain's options a
little bit, including re-ordering them into what seems to me a more
logical order.
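The missing assignment described above has roughly this shape (a sketch; see the actual commit for the exact condition):

```
/* In auto_explain's ExecutorEnd hook: */
es.analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
/* This line was missing, so node timings were never printed: */
es.timing = (es.analyze && auto_explain_log_timing);
```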
When running EXPLAIN ANALYZE queries, the showstat context was being
populated twice, which led to an assertion failure. Fix by adding a
flag to the struct to signal that the stats gathering has already
been performed. Also add the Greenplum Executor end statistics to
the printing and properly set CDB instrumentation.

Closes #5906
Co-authored-by: Daniel Gustafsson <[email protected]>
This really should've been included in the last commit, but I missed
adding it before pushing.
I noticed this after committing 7581b5370d7, that there's a gap between
ErrorLogLock and SessionStateLock in the main LWLock array. After a bit
of digging, seems the 7th (missing) lock was intended for
FirstWorkfileMgrLock, which got re-arranged to after the individual
Greenplum locks, but we forgot to squeeze out the gap while resolving
conflicts during the merge with Postgres 9.4.

Eliminate that gap now.
When following the build guide to build GPDB on CentOS 7, I ran into some
problems (illustrative commands for the last two items are sketched below):
 * Remove pysql from develop-dependencies.txt, since it hits the error
"install_script 'pysql_w32_postinst.py' not found in scripts" when
executing "pip install pysql".
 * Use an absolute path to rename cmake3 to cmake.
 * Run ldconfig after building ORCA to avoid the "configure: error: Your
ORCA version is expected to be 2.67.XXX" error.

Signed-off-by: zhanggang <[email protected]>
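The cmake and ldconfig items amount to something like the following (paths are illustrative, not from the commit):

```
# use an absolute path when aliasing cmake3 as cmake
sudo ln -sf /usr/bin/cmake3 /usr/local/bin/cmake

# refresh the shared-library cache after installing ORCA
sudo ldconfig
```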
Two bugs:

1. NULLs were not hashed correctly.

2. The code to fetch the attribute's datatype was wrong. It fetched the
   datatype of the table's nth attribute, instead of the nth distribution
   key column's.

Both of these would lead to false report of wrong distribution, or errors.

Reviewed-by: Paul Guo <[email protected]>
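A sketch of bug 2's corrected lookup, resolving the n-th distribution key column to an attribute number first (names are illustrative; GpPolicy's attrs[] holds the distribution key attribute numbers):

```
/* Fetch the type of the n-th distribution key column, not the
 * table's n-th attribute. */
AttrNumber keyattno = policy->attrs[n];
Oid keytype = tupdesc->attrs[keyattno - 1]->atttypid;
```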
Run pgindent, and some small manual cleanup on top of it.
This code is concerned with partitioning, not with distribution across
segments, so there's no need to rely on the "cdb" hashing. The cdb hashing
only works for a limited set of built-in datatypes, so this allows merging
leaf stats for any datatype that has a hash opclass defined.

One small downside of this is that there are actually two built-in datatypes
that have cdbhash support: money and varbit. So leaf statistics merging
no longer works for them. That's an acceptable loss; those datatypes should
hardly be used in production, anyway.

Another little thing that caused expected output changes was that the new
code makes a different choice on which MCVs to keep, when there is a tie
between multiple values. The choice is arbitrary, and the new one is as
good as the old one.
Use built-in hash_any instead. The "match" function uses equal() to
compare the Consts, which compares the raw bytes instead of doing a
datatype-aware comparison, so we might as well hash the raw bytes in the
hash function, too.

While we're at it, refactor the predicate testing code a bit. Move the
functions related to the "possible values" computation to a separate file,
and clean up and refactor the code for readability.
Now that there are no callers of hashDatum/hashNullDatum left outside
cdbhash.c, we can simplify it.
* topifying data types doc

* Doc: Breaking data type topics into separate files

* topifying data types doc

* Doc: Breaking data type topics into separate files

* Fix ditamap and remove duplicate content
…6057)

Also delete the unittest for set_gp_replication_config
Remove GpReplicationConfigFileLock

Co-authored-by: Jesse Zhang <[email protected]>
Co-authored-by: Shaoqi Bai <[email protected]>
It was incorrectly checking *isNull, which is an output from the function.
As a result, reshuffling failed to move rows that had NULLs in
distribution key columns. Add test case.
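A sketch of the corrected pattern: *isNull is filled in by the call, so it must be inspected only afterwards (attribute details are illustrative):

```
bool    isNull;
Datum   value = heap_getattr(tuple, attnum, tupleDesc, &isNull);

/* check the output only after the call that sets it */
if (isNull)
{
    /* a NULL in a distribution key column; the row must still move */
}
```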
A collection of typo fixes in comments that I found sitting around in
a local patch.
@aspekt112 aspekt112 merged commit fd68ad8 into arenadata:master Oct 31, 2018