forked from greenplum-db/gpdb-archive
Sync with upstream master #27
Merged
Conversation
They're still not supported for external protocols, resource groups, and resource queues. Nor for any other GPDB-specific object types I might have missed. I think that's OK, I don't think anyone particularly cares about event triggers for those objects, and we can add the support later if someone asks for it. But external tables seems good to support, to be consistent with other kinds of tables.
We don't particularly care about event triggers on CREATE PROTOCOL as such, but we were tripping the assertion in EventTriggerCommonSetup(), because it was being called for CREATE PROTOCOL even though it was listed as not supported. It seems to actually work fine if we just mark it as supported, so might as well. Add a test case, also for the CREATE EXTERNAL TABLE support.
The _complex_ops operator class for GIN indexes was incomplete. The support functions, in pg_amproc, were missing, and the datatype in pg_opclass was incorrect. (It was 'numeric', instead of 'complex', a copy-pasto from _numeric_ops.) As a result, trying to create a GIN index on complex[] did not work:

```
postgres=# CREATE INDEX ON complex_array_tab USING gin (complex_arr);
ERROR:  missing support function 1 for attribute 1 of index "complex_array_tab_complex_arr_idx" (indexam.c:832)
```

This was not a problem in GPDB 5, because we didn't support GIN indexes at all. But now that we do, let's fix it. I don't think anyone cares much about this particular case, but we support GIN indexes on arrays of all built-in types, so let's support this too for completeness.
Also display cache hit rate and other stats after each Travis CI run
The Apache Portable Runtime (APR) is not a direct dependency of the Greenplum server, but it's depended upon by `gpfdist` and `gpperfmon`. Judging from the code, we don't seem to depend on any (very) particular version of APR, as long as it's something recent-ish (say 1.5.0+). Commit 168f612a73e (#58) started our Travis CI config with directly downloading APR from Apache releases. This was probably due to the fact that Travis in 2015 was still running Ubuntu 12.04 Precise that shipped too old of a version of APR. Over the years, updating APR has been a recurrent semiannual ritual, see commits 530db5ea71e (#5873), 27adcf92a13, 19251f3ff72, 5a9885f4f84, 2c68384e510. This patch ends that fun madness by installing `libapr1-dev` from apt. It turns out -- interestingly enough -- APU (a sister library of APR) isn't needed in Travis: when we started using Travis (in commit 168f612a73e #58), we were building APU, but APU is really _only_ a dependency of `gpperfmon`, which we _never_ built in Travis.
Commit df050f9bfd9 changed the signature of url_curl_fopen but left behind the stub for when we configure the server without linking to libcurl. This patch fixes the following compilation error:

```
make[4]: Entering directory '/home/pivotal/src/pvtl/gpdb/src/backend/access/external'
ccache clang-8 -O3 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-address -Wno-unused-command-line-argument -g -ggdb -Werror=uninitialized -Werror=implicit-function-declaration -I../../../../src/include -D_GNU_SOURCE -c -o url_curl.o url_curl.c -MMD -MP -MF .deps/url_curl.Po
url_curl.c:1821:1: error: conflicting types for 'url_curl_fopen'
url_curl_fopen(char *url, bool forwrite, extvar_t *ev, CopyState pstate, int *response_code, const char **response_string)
^
../../../../src/include/access/url.h:88:18: note: previous declaration is here
extern URL_FILE *url_curl_fopen(char *url, bool forwrite, extvar_t *ev, CopyState pstate);
^
1 error generated.
../../../../src/Makefile.global:768: recipe for target 'url_curl.o' failed
```
[#160659122] Co-authored-by: Goutam Tadi <[email protected]> Co-authored-by: Fei Yang <[email protected]> Co-authored-by: Xin Zhang <[email protected]>
The unix python module had grown support for a lot of commands which are no longer used; this removes all unused functions.
Commenting out a block of code with a multiline string is undoubtedly a questionable practice. Remove.
The tests, and the test harness code, have seemingly been unused for quite some time. Remove all the dead code, along with the invocations of testUtils functions, which were all no-ops since the global variable they wrote to had never been initialized.
The ability to inject a pause in the FTS prober was added in 2011 in order to test the resolution of a bug in the FTS prober. No test currently uses this, and seemingly none ever did. Further, the comment is incorrect, which adds to the suspicion that this has been unused for quite some time, so remove it.
This was never actually implemented, and thus never used.
A cached query planned statement contains information that is freed after the first execution of a function. The second execution used the cached planned statement to populate the execution state using a freed pointer and throws a segmentation fault. To resolve, we copy the contents out of the planned statement rather than copying a pointer to the planned statement, so that when our executor state gets freed, it does not also free the cached data. Note: `numSelectorsPerScanId` is the problematic property, and is only used for partition tables. Co-authored-by: David Kimura <[email protected]> Co-authored-by: Taylor Vesely <[email protected]>
* Modify SQL references for replicated tables * Changes for edits and review comments
…D BY. Allow specifying multiple UNIQUE constraints in the CREATE TABLE command, or PRIMARY KEY and UNIQUE constraints, as long as they are all compatible with the DISTRIBUTED BY clause. Don't require the constraint's key columns to be a left-subset of the DISTRIBUTED BY clause. It still needs to be a subset, but any subset will do. Likewise adjust the rules for deriving the distribution key from the constraints, if no DISTRIBUTED BY was given, to choose the largest common subset across all constraints. Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/OJzz8I6WuVI/tMB8GcXdAgAJ
The return value from AllocateDir() was not checked. That's not mandatory, if you immediately pass the return value to ReadDir(), as explained in the comment in ReadDir(), but you can't rely on that if you do anything in between the calls that might change errno. We were requesting a checkpoint, and doing many other things in between. Also, don't try to open the directory in the mirror until after the checkpoint. Before the checkpoint, the directory might not exist in the mirror yet. (I don't think there's any guarantee that it will exist after requesting the checkpoint either, but we're taking chances with that here anyway. At least it makes it a lot more likely.) I think this explains the occasional errors like: ERROR: could not open directory "/tmp/build/e18b2f02/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0/base/513914": Success (fd.c:2207) that we've been seeing on the pipeline. Or at least, if this doesn't fix them, we should now get a more correct error message than "Success".
Co-authored-by: Asim R P <[email protected]> Co-authored-by: Jacob Champion <[email protected]>
The AoHeaderCheckErrorStr string is a statically allocated buffer, so we must always use snprintf() when writing to it to reduce the risk of an out-of-bounds write. This also moves the definition of the error string to the top of the file to match project style. Also remove the unnecessary assertions on snprintf(); if we consider it an error for the message to be truncated, then we should be using a data structure that supports being extended. Reviewed-by: Jacob Champion <[email protected]> Reviewed-by: Joao Pereira <[email protected]>
Distribution key joins on pathkeys where the parser invokes an implicit cast via a RelabelType node would introduce a redistribution motion, due to the join not being recognized as a colocated equijoin. As an example, consider the following relation:

```
create table test (a varchar(10), b varchar(10)) distributed by (a);
```

The below query will introduce an implicit cast on a.a and b.a to TEXT by the parser. This means that the pathkey EquivalenceClass is on VARCHAR but the RestrictInfo ECs are for TEXT:

```
select * from test a, test b where a.a=b.a;
```

Fix by following the EC members to their base types to allow for colocated equijoins in these cases. Affected test files updated, and one affected testcase slightly tweaked to ignore less and generate less uninteresting output in the .out file. Reviewed-by: Jesse Zhang <[email protected]>
It's confusing to see whitespace differences after a failure, since they weren't the diffs that *caused* the failure. Suppress them.
When pg_dump or pg_dumpall dumps a database that has gp_default_storage_options GUC set to something other than the default settings, restoring the tables in that database immediately after connecting to it would possibly create the table with the wrong storage parameters. To prevent this, we make sure the gp_default_storage_options GUC is set to built-in default values before restoring tables. Co-authored-by: Jamie McAtamney <[email protected]>
`hashDatum` expects the incoming oid for array types to be ANYARRAYOID, else it will flag them as unsupported. While performing the MCV calculation on array-type columns, we were passing the specific array oid, i.e. the character array oid for a char array column, due to which `hashDatum` was marking them as not supported. However, we should pass ANYARRAYOID if the column is an array type. This commit fixes the issue. Also, relevant test cases are added.
Commit 4eb65a53dea added column numsegments to catalog table gp_distribution_policy but forgot to bump the catalog. We've since made smaller changes to the catalog in commits e628194e49d, 009218954e4, and c09cc2d80fa. 🤷‍♀️ Fix that now.
…ion. The new window function executor doesn't require a combine function anymore.
…tions. If an aggregate with transition type 'internal' is missing the serial or deserial function, it cannot be used in partial aggregation. In principle, it could still be done within a single process, but GPDB doesn't currently create such plans, so in practice the combine function is useless without serial and deserial functions. There is a check for this in the executor, but it's better to catch it at planning time. All of the built-in aggregates have both combine and serial/deserial functions, or neither, so this is only a problem for user-defined aggregates.
This patch converts "(NOT (OR A B))" to "(AND (NOT A) (NOT B))". As a result, it would enable pull-up for sublinks of type "NOT OR". For instance, for the query below:

```
select * from a
where NOT ( a.i in (select b.i from b)
            OR a.i in (select c.i from c) );
```

Previously, the scan of b and c would be two separate SubPlans:

```
                               QUERY PLAN
-------------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)
   ->  Seq Scan on a
         Filter: ((NOT (hashed SubPlan 1)) AND (NOT (hashed SubPlan 2)))
   SubPlan 1  (slice3; segments: 3)
     ->  Materialize
           ->  Broadcast Motion 3:3  (slice1; segments: 3)
                 ->  Seq Scan on b
   SubPlan 2  (slice3; segments: 3)
     ->  Materialize
           ->  Broadcast Motion 3:3  (slice2; segments: 3)
                 ->  Seq Scan on c
 Optimizer: legacy query optimizer
```

And now the scan of b and c would be converted into semi joins:

```
                             QUERY PLAN
---------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)
   ->  Hash Left Anti Semi (Not-In) Join
         Hash Cond: (a.i = c.i)
         ->  Hash Left Anti Semi (Not-In) Join
               Hash Cond: (a.i = b.i)
               ->  Seq Scan on a
               ->  Hash
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)
                           ->  Seq Scan on b
         ->  Hash
               ->  Broadcast Motion 3:3  (slice2; segments: 3)
                     ->  Seq Scan on c
 Optimizer: legacy query optimizer
(13 rows)
```

Authored-by: Richard Guo <[email protected]> Authored-by: Alexandra Wang <[email protected]>
In a previous commit, 4eb65a5, we allowed tables distributed on subsets of all the segments. For the corner case of tables distributed on only 1 segment, we might generate 1:1 motions. This commit reduces 1:1 motions to improve performance. Co-authored-by: Zhenghua Lyu <[email protected]> Co-authored-by: Shujie Zhang <[email protected]>
…lterObjectRename_internal() function (#6078) Reviewed by Heikki, Daniel.
…th SimpleLruDoesPhysicalPageExist() (#6094) Reviewed by Heikki.
The --enable-orafce and --enable-gphdfs options were both described as "do not build X", probably from copy/pasting another PGAC_OPTION_BOOL which defaults to true and thus becomes --disable-X. Reviewed-by: Heikki Linnakangas <[email protected]>
Extensions with custom plan nodes might like to use these in their EXPLAIN output. Hadi Moshayedi Discussion: https://postgr.es/m/CA+_kT_dU-rHCN0u6pjA6bN5CZniMfD=-wVqPY4QLrKUY_uJq5w@mail.gmail.com
This adds support for the structured formats in EXPLAIN (format ..) to the MemAccounting code, which previously only supported string output. The walker code is updated to take the ExplainState and make decisions on output format, rather than just taking the buffer and assuming text output format. As the PrettyPrinting debug function was using the ToString walker, it was removed in this commit since there were no call sites actually using it. The unittest for PrettyPrint was also removed, and the unittest for ToString updated to match the new code. Also tidies up the header includes to be in alphabetical order, as per how the code is usually structured. Co-authored-by: Jimmy Yih <[email protected]>
The lock TablespaceHashLock was added during 5.0 development in commit 677c3eaca06. With the removal of filespace this lock effectively became dead code. This is made more obvious by commit fe12fd8c6c6, which removed the vestigial references to this lock. I noticed this while glancing over lwlock.h as part of reviewing #6057 .
There used to be an unconditional call to detoast the value earlier in the function, as hinted by the comments, but that was removed in PostgreSQL 9.4 by commit b006f4ddb9. We need to deal with toasted values here now. Fixes github issue #6081.
…ion. This has been broken since commit af7914c6627bcf0b0ca614e9ce95d3f8056602bf, which added the EXPLAIN (TIMING) option. Although that commit included updates to auto_explain, they evidently weren't tested very carefully, because the code failed to print node timings even when it should, due to failure to set es.timing in the ExplainState struct. Reported off-list by Neelakanth Nadgir of Salesforce. In passing, clean up the documentation for auto_explain's options a little bit, including re-ordering them into what seems to me a more logical order.
When running EXPLAIN ANALYZE queries, the showstat context was being populated twice which led to an assertion failure. Fix by adding a flag to the struct to signal that the stats gathering has already been performed. Also add the Greenplum Executor end statistics to the printing and properly set CDB instrumentation. Closes #5906 Co-authored-by: Daniel Gustafsson <[email protected]>
This really should've been included in the last commit, but I missed adding it before pushing.
I noticed this after committing 7581b5370d7, that there's a gap between ErrorLogLock and SessionStateLock in the main LWLock array. After a bit of digging, seems the 7th (missing) lock was intended for FirstWorkfileMgrLock, which got re-arranged to after the individual Greenplum locks, but we forgot to squeeze out the gap while resolving conflicts during the merge with Postgres 9.4. Eliminate that gap now.
When following the build guide to build Greenplum on CentOS 7, I ran into some problems. * Remove pysql from develop-dependencies.txt, since "pip install pysql" fails with the error "install_script 'pysql_w32_postinst.py' not found in scripts". * Use an absolute path to rename cmake3 to cmake. * Run ldconfig after building ORCA to avoid the "configure: error: Your ORCA version is expected to be 2.67.XXX" error. Signed-off-by: zhanggang <[email protected]>
Two bugs: 1. NULLs were not hashed correctly. 2. The code to fetch the attribute's datatype was wrong. It fetched the datatype of the table's nth attribute, instead of the nth distribution key column's. Both of these would lead to false reports of wrong distribution, or errors. Reviewed-by: Paul Guo <[email protected]>
Run pgindent, and some small manual cleanup on top of it.
This code is concerned with partitioning, not with distribution across segments, so there's no need to rely on the "cdb" hashing. The cdb hashing only works for a limited set of built-in datatypes, so this allows merging leaf stats for any datatype that has a hash opclass defined. One small downside of this is that there are actually two built-in datatypes that have cdbhash support: money and varbit. So leaf statistics merging no longer works for them. That's an acceptable loss; those datatypes should hardly be used in production, anyway. Another little thing that caused expected output changes was that the new code makes a different choice on which MCVs to keep, when there is a tie between multiple values. The choice is arbitrary, and the new one is as good as the old one.
Use built-in hash_any instead. The "match" function uses equal() to compare the Consts, which compares the raw bytes instead of doing a datatype-aware comparison, so we might as well hash the raw bytes in the hash function, too. While we're at it, refactor the predicate testing code a bit. Move the functions related to the "possible values" computation to a separate file, and clean up and refactor the code for readability.
Now that there are no callers of hashDatum/hashNullDatum left outside cdbhash.c, we can simplify it.
* Topifying data types doc * Doc: Breaking data type topics into separate files * Fix ditamap and remove duplicate content
…6057) Also delete the unittest for set_gp_replication_config. Remove GpReplicationConfigFileLock. Co-authored-by: Jesse Zhang <[email protected]> Co-authored-by: Shaoqi Bai <[email protected]>
It was incorrectly checking *isNull, which is an output from the function. As a result, reshuffling failed to move rows that had NULLs in distribution key columns. Add test case.
A collection of typo fixes in comments that I found sitting around in a local patch.