merge cloudera impala branches #2

yuzzjj · 2017-07-28T01:00:30Z

merge

This commit fixes an issue where dropping a table that is not loaded correctly (throws TableLoadingException) generates an access event that doesn't use a fully qualified table name. Change-Id: Icd63f7e4accc7fda9719e13059fa8d432981618a Reviewed-on: http://gerrit.cloudera.org:8080/6879 Reviewed-by: Alex Behm <[email protected]> Tested-by: Impala Public Jenkins

The following Hadoop metrics have been added to the /metrics page:
hedgedReadOps - the number of hedged reads that have occurred
hedgedReadOpsWin - the number of times the hedged read returned faster than the original read
The metrics will be updated only when --use_hdfs_pread is set to 'true'. This change depends on the following new commit to HDFS: apache/hadoop@8c81a16
Testing: Not adding tests since it requires some custom hadoop configuration. Tested manually by setting the configurations and verifying that the metrics work.
Change-Id: Id4a5d396abb3373d352ad2df8c2272db018114da Reviewed-on: http://gerrit.cloudera.org:8080/6886 Reviewed-by: Matthew Jacobs <[email protected]> Reviewed-by: Lars Volker <[email protected]> Tested-by: Impala Public Jenkins

Allow users to keep a longer history of queries if desired. I personally find it useful to keep a long history of queries to reference and want to bump this up to a very large value, but keep the default reasonable. Also change the config loader to not freak out over unknown parameters so as not to break for users that end up with new options set running on older shells. Testing: Created .impalarc as follows, now getting more history saved. Put broken things in .impalarc and make sure they are logged as warnings. [impala] history_max=1000 Change-Id: Iaf65bbecb8fd7f1105aac62b6745d6125a603d7f Reviewed-on: http://gerrit.cloudera.org:8080/6335 Reviewed-by: Michael Brown <[email protected]> Tested-by: Impala Public Jenkins
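The tolerant config loading described above can be sketched as follows. This is only an illustration in Python using the standard `configparser` module, not the actual impala-shell code; the function name `load_shell_options` and the `KNOWN_OPTIONS` table are hypothetical, and only `history_max` and the `[impala]` section come from the commit message.

```python
import configparser

# Hypothetical table of options this shell version understands.
# Only history_max is taken from the commit message.
KNOWN_OPTIONS = {"history_max": int}


def load_shell_options(text, known=KNOWN_OPTIONS, section="impala"):
    """Parse rc-file text, returning (options, warnings).

    Unknown or malformed options become warnings instead of errors, so an
    rc file written for a newer shell does not break an older one.
    """
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    options, warnings = {}, []
    if not parser.has_section(section):
        return options, warnings
    for name, raw_value in parser.items(section):
        convert = known.get(name)
        if convert is None:
            warnings.append("Ignoring unknown option %r" % name)
            continue
        try:
            options[name] = convert(raw_value)
        except ValueError:
            warnings.append("Ignoring bad value %r for %r" % (raw_value, name))
    return options, warnings


options, warnings = load_shell_options("[impala]\nhistory_max=1000\nbogus_flag=1\n")
for message in warnings:
    print("WARNING:", message)
```

A caller would log the returned warnings and carry on with whatever options parsed cleanly.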

A memory intensive UDF test takes a while to completely finish and for the memory in Impala to be completely freed. This caused a problem in ASAN builds (and potentially in normal builds) because we would start the next test right away, before the memory is freed. We fix the issue by checking that all fragments finish executing before starting the next test. Testing: - Ran a private ASAN build which passed. Change-Id: I0555b5327945c522f70f449caa1214ee0bfd84fe Reviewed-on: http://gerrit.cloudera.org:8080/6893 Reviewed-by: Alex Behm <[email protected]> Reviewed-by: Michael Ho <[email protected]> Tested-by: Impala Public Jenkins

This gflags patch adds DEFINE_int32_hidden() etc. macros, which suppress flags from appearing in /varz, --help and other flag enumerations. Our toolchain glog is statically linked against gflags, and therefore had to be rebuilt, however its version number did not change. You will likely need to do the following: rm -rf ${IMPALA_TOOLCHAIN_DIR}/glog-0.3.4-p2/ before running bin/bootstrap_toolchain.py, otherwise building Impala may fail with a linking error. Change-Id: Ibc09a750879a8eae8b3549b9438241cb7c4448ed Reviewed-on: http://gerrit.cloudera.org:8080/6889 Reviewed-by: Lars Volker <[email protected]> Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

This change switches to a new Breakpad version, which includes fixes for Breakpad bugs #681 and #728. The toolchain change was reviewed here: https://gerrit.cloudera.org/6866 The change also undoes the workaround introduced in IMPALA-3794. In addition to running test_breakpad.py in a loop for a while, I tested Then I verified that the test fails with the old toolchain version (88e5b2) and works with the new one (ffe3e4). To test #728 I added a sleep() call before SendContinueSignalToChild() and then killed the parent process, manually observing that the child would die, too. Change-Id: Ic541ccd565f2bb51f68c085747fc47ae8c905d19 Reviewed-on: http://gerrit.cloudera.org:8080/6883 Reviewed-by: Lars Volker <[email protected]> Tested-by: Impala Public Jenkins

The recent Kudu TIMESTAMP patch (IMPALA-5137) made an inadvertent change [1] to alltypeserror_tmp and alltypeserrornonulls_tmp, changing 'timestamp_col' from STRING to TIMESTAMP. This seems to cause failures on exhaustive jobs which run test_hdfs_scan_node_errors against all file-formats. I haven't been able to reproduce this failure myself, so cannot test whether this fixes the jobs that are failing, but this change to revert these tables seems warranted given they were changed inadvertently. 1: https://gerrit.cloudera.org/#/c/6526/11/testdata/datasets/functional/functional_schema_template.sql Change-Id: I533f1921662802ea6e076eefac973f50c014fcb5 Reviewed-on: http://gerrit.cloudera.org:8080/6891 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Matthew Jacobs <[email protected]>

By default, Kudu assumes it has 80% of system memory which is far too high for the minicluster. This sets a mem limit of 2gb and lowers the limit of the block cache. These values were tested on a gerrit-verify-dryrun job as well as an exhaustive run. This patch also simplifies TestKuduMemLimits which was unnecessarily creating a large table during test execution. Change-Id: I7fd7e1cd9dc781aaa672a2c68c845cb57ec885d5 Reviewed-on: http://gerrit.cloudera.org:8080/6844 Reviewed-by: Todd Lipcon <[email protected]> Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins

This change builds on the support for reading and writing TIMESTAMP columns to Kudu tables (see [1]), adding support for pushing TIMESTAMP predicates to Kudu for scans. Binary predicates and IN list predicates are supported. Testing: Added some planner and EE tests to validate the behavior. 1: https://gerrit.cloudera.org/#/c/6526/ Change-Id: I08b6c8354a408e7beb94c1a135c23722977246ea Reviewed-on: http://gerrit.cloudera.org:8080/6789 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

Adds support in DDL for timestamps in Kudu range partition syntax. For convenience, strings can be specified with or without explicit casts to TIMESTAMP. E.g. create table ts_ranges (ts timestamp primary key, i int) partition by range ( partition '2009-01-02 00:00:00' <= VALUES < '2009-01-03 00:00:00' ) stored as kudu Range bounds are converted to Kudu UNIXTIME_MICROS during analysis. Testing: Adds FE and EE tests. Change-Id: Iae409b6106c073b038940f0413ed9d5859daaeff Reviewed-on: http://gerrit.cloudera.org:8080/6849 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins
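To make the conversion of range bounds concrete, here is a small Python sketch that turns the string bounds from the example into microseconds since the Unix epoch, which is what UNIXTIME_MICROS stores. It assumes the literals are interpreted as UTC and use the `YYYY-MM-DD HH:MM:SS` format; the helper name `to_unixtime_micros` is made up for illustration and is not part of Impala.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unixtime_micros(literal):
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed UTC) to microseconds since epoch."""
    parsed = datetime.strptime(literal, "%Y-%m-%d %H:%M:%S")
    delta = parsed.replace(tzinfo=timezone.utc) - EPOCH
    # Integer arithmetic avoids float rounding at microsecond precision.
    return (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds


lower = to_unixtime_micros("2009-01-02 00:00:00")
upper = to_unixtime_micros("2009-01-03 00:00:00")
print(lower, upper)  # 1230854400000000 1230940800000000
```

The two bounds differ by exactly one day, 86400000000 microseconds, matching the one-day partition in the DDL example.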

Non-deterministic exprs which evaluate as constant should not be used during HDFS partition pruning. We consider Exprs which have no SlotRefs as bound by default, and thus we end up trying to apply them indiscriminately. Constant propagation makes this situation easier to run into and the behavior is rather unexpected. The fix for now is to explicitly disallow non-deterministic Exprs in partition pruning. Change-Id: I91054c6bf017401242259a1eff5e859085285546 Reviewed-on: http://gerrit.cloudera.org:8080/6575 Reviewed-by: Alex Behm <[email protected]> Tested-by: Impala Public Jenkins

Change-Id: I17268bdb480230938f94559fe1eabe34ac2448b7 Reviewed-on: http://gerrit.cloudera.org:8080/5589 Reviewed-by: Jim Apple <[email protected]> Tested-by: Impala Public Jenkins

IMPALA-4166 introduced a bug by duplicating code that adds sort expressions. Upon re-analysis, this code would hit an IndexOutOfBoundsException. Change-Id: Ibebba29509ae7eaa691fe305500cda6bd41a179a Reviewed-on: http://gerrit.cloudera.org:8080/6921 Reviewed-by: Lars Volker <[email protected]> Tested-by: Impala Public Jenkins

Previously, updates to the query state in ClientRequestState were not immediately reflected in the query profile, potentially leading to the profile showing an incorrect state for an extended period during execution. In particular, queries were being shown in the 'CREATED' state long after they had started 'RUNNING'. The fix is to update the profile whenever the state is updated. Testing: - Extended existing hs2 tests and added a beeswax test to check for expected query states in the profile Change-Id: I952319b7308a24d4e2dff924199c0c771bce25b3 Reviewed-on: http://gerrit.cloudera.org:8080/6923 Reviewed-by: Dan Hecht <[email protected]> Reviewed-by: Thomas Tauber-Marshall <[email protected]> Tested-by: Impala Public Jenkins

The sortby() hint is superseded by the SORT BY SQL clause, which was introduced in IMPALA-4166. This change removes the hint. Change-Id: I83e1cd6fa7039035973676322deefbce00d3f594 Reviewed-on: http://gerrit.cloudera.org:8080/6885 Reviewed-by: Lars Volker <[email protected]> Tested-by: Impala Public Jenkins

Without this, buildall.sh -ninja fails to run the backend tests or runs them with the Makefiles that were created when buildall.sh was last run without the -ninja flag. Change-Id: Idb920dd4b08d8ef5fbc0bf1ea1b424a0c544e1db Reviewed-on: http://gerrit.cloudera.org:8080/6942 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins

The buffers contain the Parquet DataPages, which need to be attached to the row batch if the rows point to var-len data stored directly in the page. Otherwise the buffers can be discarded once the values in the page have been materialized. This reduces the amount of memory transferred between threads, which is a known TCMalloc anti-pattern. It also allows us to free memory earlier, which may help reduce memory consumption slightly. Also fix a latent bug I noticed where needs_conversion_ is not always initialised in the constructor. Testing Ran exhaustive build. Most of the Parquet tests use compressed Parquet, which should exercise this code path. Change-Id: I2dbd749f43078b222ff8e1ddcec840986c466de6 Reviewed-on: http://gerrit.cloudera.org:8080/6876 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins

Misc changes to improve usability of the profiles.
* Separate out detailed BufferPool metrics into a "Buffer pool" sub-profile.
* Only create the limit counter if there is a limit.
* Show BufferPool usage in query MemTracker (it was accidentally disabled before because there was no query-level profile).
* Reduce clutter in MemTracker dump by only showing buffer pool reservation, not usage (the usage was misleading anyway because it didn't include child usage).
* Remove TotalUnpinnedBytes, which had limited value - WriteIoBytes and PeakUnpinnedBytes can answer most of the same questions - i.e. did it unpin any pages, and how many did it need to write to disk.
* Add buffer pool metrics to /memz (if buffer pool is enabled) and reorder /memz so more useful information is at the top.
Change-Id: I34b7f4d94c3d396ac89026c7559d6b2c6e02697c Reviewed-on: http://gerrit.cloudera.org:8080/6690 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins

Add CLUSTERED hint. Update hint syntax in INSERT topic. Also modernize the hint syntax as shown under INSERT to include the -- and /* */ formats also. List the [] style last since it is the least-preferred option. Switch to preferring /* */ syntax for hints instead of using the [ ] notation by default. Finally, take out references to the SORTBY hint because it didn't actually make it in. Intent for future is to have a way to get this behavior without using a hint. Change-Id: Id3c1da9a87ace361b096fa73d8504b2f54e75bed Reviewed-on: http://gerrit.cloudera.org:8080/5655 Reviewed-by: John Russell <[email protected]> Tested-by: Impala Public Jenkins

…n Parquet" Reverting IMPALA-2716 as SparkSQL does not agree with the approach taken. More details can be found at: https://issues.apache.org/jira/browse/SPARK-12297 Change-Id: Ic66de277c622748540c1b9969152c2cabed1f3bd Reviewed-on: http://gerrit.cloudera.org:8080/6896 Reviewed-by: Dan Hecht <[email protected]> Tested-by: Impala Public Jenkins

The assertion was incorrect and racy - it is ok if the write error wins the race with the Unpin() calls, causing them to fail. Change-Id: I023193b9ad6c6ac0ee114ad77ddf04d7d7185809 Reviewed-on: http://gerrit.cloudera.org:8080/6953 Reviewed-by: Henry Robinson <[email protected]> Reviewed-by: Dan Hecht <[email protected]> Tested-by: Impala Public Jenkins

We use the new libHDFS API hdfsGetLastExceptionRootCause() to return the last seen HDFS error on that thread. This patch depends on the recent HDFS commit: apache/hadoop@fda86ef Testing: A test has been added which puts HDFS in safe mode and then verifies that we see a 255 error with the root cause. Change-Id: I181e316ed63b70b94d4f7a7557d398a931bb171d Reviewed-on: http://gerrit.cloudera.org:8080/6894 Tested-by: Impala Public Jenkins Reviewed-by: Alex Behm <[email protected]>

Start with placeholder for 2.9 new features topic. Initially just point to the changelog file. Change-Id: I1f6cabc2427daf1243bd69dbed295c6923c4091b Reviewed-on: http://gerrit.cloudera.org:8080/6954 Reviewed-by: Michael Brown <[email protected]> Tested-by: Impala Public Jenkins

UBSan reports "runtime error: load of value 32, which is not a valid value for type 'bool'". Change-Id: I0ddc496019941048b3e0775606fa5e8e3f9c075a Reviewed-on: http://gerrit.cloudera.org:8080/6937 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins

Change-Id: I636a6f2dcd0555ab9b46304e3a7298c598a511da Reviewed-on: http://gerrit.cloudera.org:8080/6964 Reviewed-by: Michael Brown <[email protected]> Tested-by: Impala Public Jenkins

The writeup for sortby() was removed during the gerrit review process. This bullet in the New Features list was left behind, and is now being removed. Change-Id: Ib0c32df2dcfbde47a16e4692e5953b31cb144bcc Reviewed-on: http://gerrit.cloudera.org:8080/6965 Reviewed-by: Alex Behm <[email protected]> Tested-by: Impala Public Jenkins

Syntax: <tableref> TABLESAMPLE SYSTEM(<number>) [REPEATABLE(<number>)]
The first number specifies the percent of table bytes to sample. The second number specifies the random seed to use. The sampling is coarse-grained. Impala keeps randomly adding files to the sample until at least the desired percentage of file bytes has been reached.
Examples:
SELECT * FROM t TABLESAMPLE SYSTEM(10)
SELECT * FROM t TABLESAMPLE SYSTEM(50) REPEATABLE(1234)
Testing: - Added parser, analyser, planner, and end-to-end tests - Private core/hdfs run passed
Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70 Reviewed-on: http://gerrit.cloudera.org:8080/6868 Reviewed-by: Alex Behm <[email protected]> Tested-by: Impala Public Jenkins
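The coarse-grained, file-level sampling described above can be sketched in a few lines of Python. This is an illustration of the idea only, not Impala's planner code: the function name `sample_files`, the dictionary of file sizes, and the use of Python's `random.Random` as the seeded generator are all assumptions.

```python
import random


def sample_files(file_sizes, percent, seed=None):
    """Pick whole files at random until at least `percent` of bytes is covered.

    file_sizes: dict mapping a file name to its size in bytes.
    seed: plays the role of the REPEATABLE(<number>) argument.
    """
    rng = random.Random(seed)
    names = sorted(file_sizes)  # fixed starting order so a seed is repeatable
    rng.shuffle(names)
    target_bytes = sum(file_sizes.values()) * percent / 100.0
    chosen, chosen_bytes = [], 0
    for name in names:
        if chosen_bytes >= target_bytes:
            break
        chosen.append(name)
        chosen_bytes += file_sizes[name]
    return chosen, chosen_bytes


files = {"part-%d" % i: 100 for i in range(10)}
print(sample_files(files, 10, seed=1234))
```

Because whole files are added, the sample can overshoot the requested percentage, which is why the sampling is described as coarse-grained.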

Holding client_request_state_map_lock_ and CRS::lock_ together in certain paths could potentially block the impalad from registering new queries. The most common occurrence of this is while loading the webpage of a query while the query planning is still in progress. Since we hold the CRS::lock_ during planning, it blocks the web page from loading, which in turn blocks incoming queries by holding client_request_state_map_lock_. This patch makes client_request_state_map_lock_ a terminal lock so that we don't have interleaving locking with CRS::lock_. Testing: Tested it locally by adding a long sleep in JniFrontend.createExecRequest() and was still able to refresh the web UI and run parallel queries. Also added a custom cluster test that does the same sequence of actions by injecting a metadata loading pause. Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026 Reviewed-on: http://gerrit.cloudera.org:8080/6707 Reviewed-by: Bharath Vissapragada <[email protected]> Tested-by: Impala Public Jenkins

Previously the default was 0 and verbose was 1; now the default is 1 (named 'standard') and extended is 2. Change-Id: Ib18cfbfa35a4e3b324e6744da62de2fad86c1e67 Reviewed-on: http://gerrit.cloudera.org:8080/6966 Reviewed-by: Alex Behm <[email protected]> Tested-by: Impala Public Jenkins

Kudu tables did not treat some table properties correctly. Change-Id: I69fa661419897f2aab4632015a147b784a6e7009 Reviewed-on: http://gerrit.cloudera.org:8080/7454 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

Disables a test that seemed to get flaky recently, perhaps related to testing with Java 8 or maybe even changes in YARN (which get used by RequestPoolService). Since we're not changing this code right now, let's disable this test to unblock builds. Keeping the JIRA open to track a better solution. Change-Id: I616961457cd48d31d618c8b58f5279b89d3cdcd6 Reviewed-on: http://gerrit.cloudera.org:8080/7466 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

Bugs: - FunctionCallExpr's toSql() doesn't include IGNORE NULLS if present causing view definitions to break and leading to incorrect results. - FunctionCallExpr's clone() implementation doesn't carry forward IGNORE NULLS option if present. One case that breaks with this is querying views containing analytic exprs causing wrong plans. Fixed both the bugs and added a test that can reliably reproduce this. Change-Id: I723897886c95763c3f29d6f24c4d9e7d43898ade Reviewed-on: http://gerrit.cloudera.org:8080/7416 Reviewed-by: Bharath Vissapragada <[email protected]> Tested-by: Impala Public Jenkins

Doing an O(n) consistency check every time the read or write page was advanced results in O(n^2) overall runtime. The fix is to separate the O(1) and O(n) checks and only do the O(n) checks if:
* The function does an O(n) pass over the pages anyway (e.g. PinStream())
* The function is called only once per read or write pass over the stream.
This should make the cost of the checks O(n) (if we make the reasonable assumption that PrepareForWrite(), PrepareForRead(), PinStream() and UnpinStream() are called a bounded number of times per stream). Testing: Ran BufferedTupleStreamV2Test.
Change-Id: I8b380fcd0568cb73b36a490954bcd316db969ede Reviewed-on: http://gerrit.cloudera.org:8080/7459 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins

Fix to populate the non-default query options set by planner in the runtime profile. Added a corresponding test case. Change-Id: I08e9dc2bebb83101976bbbd903ee48c5068dbaab Reviewed-on: http://gerrit.cloudera.org:8080/7419 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

Currently, creation of a Status object (non-OK and non-EXPECTED) prints the stack trace to the log. Fetching the stack trace takes a large chunk of CPU time, close to 130ms, and results in a significant perf hit when encountered on hot paths. Five such hot paths were identified and the following changes were made to fix it:
1. In ImpalaServer::GetExecSummary(), create Status() without holding the query_log_lock_.
2, 3 and 4. In impala::DeserializeThriftMsg<>(), PartitionedAggregationNode::CodegenUpdateTuple() and HdfsScanner::CodegenWriteCompleteTuple, use Status::Expected where appropriate.
5. In Status::MemLimitExceeded(), create Status object without printing stacktrace.
Change-Id: Ief083f558fba587381aa7fe8f99da279da02f1f2 Reviewed-on: http://gerrit.cloudera.org:8080/7449 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

The change is mostly mechanical - added Status returns where needed. In one place I restructured the logic around 'current_encoding_' for Parquet to allow a cleaner solution to the dropped status from the FinalizeCurrentPage() call in ProcessValue(): after the restructuring the call was no longer needed. 'current_encoding_' was overloaded to represent both the encoding of the current page and the preferred encoding for subsequent pages. Testing: Ran exhaustive build. Change-Id: I753d352c640faf5eaef650cd743e53de53761431 Reviewed-on: http://gerrit.cloudera.org:8080/7372 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins

Privileges granted to a role assigned to a db/table whose name contains upper case characters can disappear after a few seconds. A privilege is inserted into the catalogObjectCache using a key that uses the db/table name. The key gets converted to a lower case before inserting. Privilege name returned by sentryProxy is always lower case, which might not match the privilegeName built in the catalog. This triggers an update of the catalog object followed by a removal of the old object. Since they both use the same key in lower case it ends up deleting the newly updated object. This change also adds a new catalogd startup option (sentry_catalog_polling_frequency) to configure the frequency at which catalogd polls the sentry service to update any policy changes. The default value is 60 seconds. Test: Added a test which adds select privileges to 3 tables and dbs specified in lower case, upper case and mixed case. The test verifies that the privileges on the 3 tables do not disappear on a sentry update. Change-Id: Ide3dfa601fcf77f5acc6adce9bea443aea600901 Reviewed-on: http://gerrit.cloudera.org:8080/7332 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

Creating Kudu clients is very expensive as each will fetch metadata from the Kudu master, so we should minimize the number of Kudu clients that get created. This patch stores a map from Kudu master addresses to Kudu clients in KuduUtil to be used across the FE and catalog. Another patch has already addressed the BE. Future work will consider providing a way to invalidate the stored Kudu clients in case something goes wrong (IMPALA-5685). This relies on two changes on the Kudu side: one that clears non-covered range entries from the client's cache on table open (d07ecd6ded01201c912d2e336611a6a941f48d98), and one that automatically refreshes auth tokens when they expire (603c1578c78c0377ffafdd9c427ebfd8a206bda3). This patch disables some tests that no longer work as they relied on Kudu metadata loading operations timing out, but since we're reusing clients the metadata is already loaded when the test is run. Testing: - Ran a stress test on a 10 node cluster: scan of a small Kudu table, 1000 concurrent queries, load on the Kudu master was reduced significantly, from ~50% cpu to ~5% (with the BE changes included). - Ran the Kudu e2e tests. - Manually ran a test with concurrent INSERTs and 'ALTER TABLE ADD PARTITION' (which is affected by the Kudu side change mentioned above) and verified correctness. Change-Id: I9b0b346f37ee43f7f0eefe34a093eddbbdcf2a5e Reviewed-on: http://gerrit.cloudera.org:8080/6898 Reviewed-by: Thomas Tauber-Marshall <[email protected]> Tested-by: Impala Public Jenkins
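The client-reuse idea above amounts to a keyed cache guarded by a lock. The following Python sketch shows that pattern under stated assumptions: the real change lives in the Java KuduUtil class, whereas `ClientCache`, its `get` method, and the `factory` callable here are hypothetical names, and the master address string is used as the key exactly as passed in.

```python
import threading


class ClientCache:
    """Share one expensive client per master-address string."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds a client for an address
        self._clients = {}
        self._lock = threading.Lock()

    def get(self, master_addresses):
        # Holding the lock while creating guarantees one client per key,
        # at the cost of serializing concurrent first-time lookups.
        with self._lock:
            client = self._clients.get(master_addresses)
            if client is None:
                client = self._factory(master_addresses)
                self._clients[master_addresses] = client
            return client


created = []


def fake_factory(address):
    # Stand-in for an expensive client constructor.
    created.append(address)
    return object()


cache = ClientCache(fake_factory)
first = cache.get("master-1:7051")
second = cache.get("master-1:7051")
print(first is second, len(created))
```

A cache like this has no eviction, which matches the note that invalidating stored clients is left to future work.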

Impala currently supports total sorts (the entire set of data is sorted) and top-n sorts (only the highest/lowest n elements are sorted). This patch adds the ability to do partial sorts, where the data is divided up into some number of subsets, each of which is sorted individually. It accomplishes this by adding a new exec node, PartialSortNode. When PartialSortNode::GetNext() is called, it retrieves input up to the query memory limit, uses the existing Sorter class to sort it, and outputs it. This is faster than a total sort with SortNode as it avoids the need to spill if the input is larger than the memory limit. Future work will look into setting a more restrictive memory limit on the PartialSortNode. (IMPALA-5669) In the planner, the SortNode plan node is used, with an enum value indicating if it is a total or partial sort. This also adds a new counter 'RunSize' to the runtime profile which tracks the min, max, and avg size of the generated runs, in tuples. As a first use case, partial sort is used where a total sort was used previously for inserts/upserts into Kudu tables only. Future work can extend this to other table sinks. (IMPALA-5649) Testing: - E2E test with a large INSERT into a Kudu table with a mem limit. Checks that no spills occurred. - Updated planner tests. - Existing E2E tests and stress test verify correctness of INSERT. - Perf tests on the 10 node cluster: inserting tpch_100.lineitem into a Kudu table with mem_limit=3gb: Previously: 5 runs are spilled, sort took 7m33s Now: no spills, sort takes 6m19s, for ~18% speedup Change-Id: Ieec2a15a0cc5240b1c13682067ab64670d1e0a38 Reviewed-on: http://gerrit.cloudera.org:8080/7267 Reviewed-by: Thomas Tauber-Marshall <[email protected]> Tested-by: Impala Public Jenkins
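As a rough illustration of what a partial sort produces, the Python sketch below buffers input up to a fixed capacity, sorts each buffer, and emits it as an independent run. It is a simplification under stated assumptions: the real PartialSortNode bounds runs by the query memory limit rather than a row count, and `partial_sort`, `run_capacity`, and `run_size_stats` are names invented for this example.

```python
def partial_sort(rows, run_capacity, key=None):
    """Split `rows` into runs of at most `run_capacity` rows and sort each run.

    Rows are ordered within a run but not across runs, so no spilling or
    merging is needed.
    """
    if run_capacity < 1:
        raise ValueError("run_capacity must be at least 1")
    runs, buffer = [], []
    for row in rows:
        buffer.append(row)
        if len(buffer) == run_capacity:
            runs.append(sorted(buffer, key=key))
            buffer = []
    if buffer:
        runs.append(sorted(buffer, key=key))
    return runs


def run_size_stats(runs):
    """Return (min, max, avg) run sizes in rows, like the 'RunSize' counter."""
    sizes = [len(run) for run in runs]
    if not sizes:
        return (0, 0, 0.0)
    return (min(sizes), max(sizes), sum(sizes) / len(sizes))


runs = partial_sort([5, 3, 9, 1, 8, 2, 7], run_capacity=3)
print(runs)                  # [[3, 5, 9], [1, 2, 8], [7]]
print(run_size_stats(runs))
```

For a sink such as Kudu, the benefit is that rows arrive in locally sorted batches even though the overall stream is not globally ordered.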

FLAGS_be_service_threads does nothing, and can be removed. Backend Thrift servers do not use a fixed-size thread pool, instead using one thread per connection. Change-Id: I10e48014f24eebd22251bac4734bc3c90dee47c0 Reviewed-on: http://gerrit.cloudera.org:8080/7483 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

Two tests (LongReverse and the base64 tests in StringFunctions) run their tests over all lengths from 0..{{some length}}. Both take several minutes to complete. This adds a lot of runtime for not much more confidence. Pick a set of 'interesting' (including powers-of-two, prime numbers, edge-cases) lengths to run them over instead. Test time is reduced by >150s on my desktop machine in debug mode. Change-Id: I2962115734aff8dcaae0cc405274765105e31572 Reviewed-on: http://gerrit.cloudera.org:8080/7474 Reviewed-by: Henry Robinson <[email protected]> Tested-by: Impala Public Jenkins
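One way to pick such a set of 'interesting' lengths is sketched below in Python. The actual tests are C++ and the commit does not list the lengths it chose, so the function name `interesting_lengths` and the particular primes are assumptions made for illustration.

```python
def interesting_lengths(max_length):
    """Return a small sorted set of lengths worth testing up to max_length.

    Covers the edge cases (0 and max_length), each power of two with its
    neighbours, and a handful of primes.
    """
    lengths = {0, max_length}
    power = 1
    while power <= max_length:
        for candidate in (power - 1, power, power + 1):
            if 0 <= candidate <= max_length:
                lengths.add(candidate)
        power *= 2
    # Hypothetical choice of primes; any spread of primes would do.
    for prime in (3, 5, 7, 13, 31, 61, 127, 251, 509, 1021):
        if prime <= max_length:
            lengths.add(prime)
    return sorted(lengths)


print(interesting_lengths(16))  # [0, 1, 2, 3, 4, 5, 7, 8, 9, 13, 15, 16]
```

The set grows roughly logarithmically with the maximum length, instead of linearly as with testing every length from 0 upward.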

In a recent patch (IMPALA-5036) a bug was introduced where a count(*) query with a group by a string partition column returned incorrect results. Data was being written into the tuple at an incorrect offset. Testing: - Added an end to end test where we are selecting from a table partitioned by string. Change-Id: I225547574c2b2259ca81cb642d082e151f3bed6b Reviewed-on: http://gerrit.cloudera.org:8080/7481 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins

While working on another patch, I noticed that a lot of includes and forward declarations were spurious and possibly the result of bit rot. This patch removes them and hopefully improves compile time a little. Testing: Made sure that Impala and the BE tests compile successfully. Change-Id: Id0afed224fad6a00698701487b51506d414f83ac Reviewed-on: http://gerrit.cloudera.org:8080/7482 Reviewed-by: Sailesh Mukil <[email protected]> Tested-by: Impala Public Jenkins

Change allocation pattern for Codec objects in RowBatch to be stack-allocated. Make c'tors and Init() methods of codec implementations publicly visible in order to do so. Fix bit-rotting bug in row-batch-serialize-benchmark that made it abort on start up. Change-Id: I6641f4a08bd2711c4f4515ab29a6e5418cbd5f51 Reviewed-on: http://gerrit.cloudera.org:8080/7478 Reviewed-by: Henry Robinson <[email protected]> Tested-by: Impala Public Jenkins

Formerly the project used SVN and the instructions were posted on a public page. Now it's at github and the user has to get the doc source from the project to view it. Therefore I'm changing both the URL and the descriptive text of the link. Change-Id: I668dc3739a9c95c788408bfc73480793ae5ba4c3 Reviewed-on: http://gerrit.cloudera.org:8080/7447 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins

For 2.9, I believe the only new reserved keyword is TABLESAMPLE from IMPALA-5309. Based on commit history from: https://github.com/apache/incubator-impala/commits/master/fe/src/main/jflex/sql-scanner.flex Change-Id: If4a340a033ff3f529061e48c4a5558b1ad1637ef Reviewed-on: http://gerrit.cloudera.org:8080/7452 Reviewed-by: Michael Brown <[email protected]> Tested-by: Impala Public Jenkins

Change-Id: I0b9414c21faca00e4a64a35888bd50caac94318f Reviewed-on: http://gerrit.cloudera.org:8080/7486 Reviewed-by: Thomas Tauber-Marshall <[email protected]> Tested-by: Impala Public Jenkins

Change-Id: I127b7806feca810503ba3dd000a8e972835e715a Reviewed-on: http://gerrit.cloudera.org:8080/7487 Reviewed-by: Greg Rahn <[email protected]> Reviewed-by: Sailesh Mukil <[email protected]> Tested-by: Impala Public Jenkins

If the coordinator, in UpdateBackendExecStatus(), receives a report that includes a TInsertExecStatus, it will call UpdateInsertExecStatus() which takes the coordinator-wide lock. Avoid doing this for fragment instances that would only send an empty TInsertExecStatus (including instances that belong to SELECT queries, not DML queries). Future changes should fix the locking around the UpdateBackendExecStatus() path to remove dependencies on Coordinator::lock_, but this fix is simple and addresses one point of needless contention. Change-Id: I314dd8d96922d273c6329266970d249ec8c5c608 Reviewed-on: http://gerrit.cloudera.org:8080/7457 Reviewed-by: Henry Robinson <[email protected]> Tested-by: Impala Public Jenkins

Remove untested / unused mini-impala-cluster binary. Change-Id: I677314fc1a998dffa9120c016bfcf761b4e39f05 Reviewed-on: http://gerrit.cloudera.org:8080/7488 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

The fix for IMPALA-1654 has broken the compute incremental stats child query generation logic for general partition expressions. This commit fixes it and also adds new queries to fix the test gap. These tests fail consistently without the patch. Change-Id: I227fc06f580eb9174f60ad0f515a3641cec19268 Reviewed-on: http://gerrit.cloudera.org:8080/7379 Reviewed-by: Bharath Vissapragada <[email protected]> Tested-by: Impala Public Jenkins

Queries with a null-aware anti-join joining on a large number of NULLs can take a long time to cancel if threads are stuck in PartitionedHashJoinNode::EvaluateNullProbe(). This change adds the RETURN_IF_CANCELLED macro to the function. Testing: Added logs to PartitionedHashJoinNode::EvaluateNullProbe() and made sure that the function returns right away on cancellation. Change-Id: I0800754d4ad31cbadbdfadc630c640963f3f6053 Reviewed-on: http://gerrit.cloudera.org:8080/7393 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong <[email protected]>

IMPALA-4000 added basic authorization support for Kudu tables, but it had several limitations:
* Only the ALL privilege level can be granted to Kudu tables. (Finer-grained levels such as only SELECT or only INSERT are not supported.)
* Column level permissions on Kudu tables are not supported.
* Only users with ALL privileges on SERVER may create external Kudu tables.
This patch relaxes the restrictions to allow:
* Column-level permissions.
* Fine-grained privileges SELECT and INSERT for those statement types.
DELETE/UPDATE/UPSERT now require ALL privileges because Sentry will eventually get fine-grained privilege actions, and at that point Impala should support the more specific actions (IMPALA-3840). The assumption is that the Kudu table authorization support is currently so limited that most users are not using this functionality yet, but this is a behavior change that needs to be clearly stated in the Impala release notes. Testing: Adds FE and EE tests.
Change-Id: Ib12d2b32fa3e142e69bd8b0f24f53f9e5cbf7460 Reviewed-on: http://gerrit.cloudera.org:8080/7307 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

Print the version info of each impalad that's used in a stress test run, sorted by host name. Testing done:
$ tests/stress/concurrent_select.py [redacted cluster options] --tpcds-db null --max-queries 0
Cluster Impalad Version Info:
host2.redacted: impalad version 2.10.0-SNAPSHOT RELEASE (build e862385) Built on Tue Jul 25 07:06:27 PDT 2017
host3.redacted: impalad version 2.10.0-SNAPSHOT RELEASE (build e862385) Built on Tue Jul 25 07:06:27 PDT 2017
host4.redacted: impalad version 2.10.0-SNAPSHOT RELEASE (build e862385) Built on Tue Jul 25 07:06:27 PDT 2017
host5.redacted: impalad version 2.10.0-SNAPSHOT RELEASE (build e862385) Built on Tue Jul 25 07:06:27 PDT 2017
host6.redacted: impalad version 2.10.0-SNAPSHOT RELEASE (build e862385) Built on Tue Jul 25 07:06:27 PDT 2017
2017-07-25 12:38:52,732 12793 Thread-1 INFO:cluster[691]:Finding impalad binary location ...
Change-Id: Ie4b40783ddae6b1bfb2bb4e28c0e3bf97ab944c5 Reviewed-on: http://gerrit.cloudera.org:8080/7501 Reviewed-by: Michael Brown <[email protected]> Tested-by: Michael Brown <[email protected]>

Read the start date and time of the impalad, catalogd and statestored processes for the Debug Web UI. Uses the stat command on the /proc/<pid> directory and formats the date with the date command to local time format. Change-Id: I05ae2f80835b1b0e4bc3b38731778ba0871338a4 Reviewed-on: http://gerrit.cloudera.org:8080/7363 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

I ran the stress test binary search locally and it produced a slightly higher number for Q18 than the hardcoded value. This is enough to move it above one of the thresholds, so may reduce flakiness. Testing: I wasn't able to reproduce the flakiness locally, so can't confirm this fixes it. Change-Id: I1ffa969061a52730c5147d142dcd2e3cb3626590 Reviewed-on: http://gerrit.cloudera.org:8080/7512 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

If $IMPALA_HOME ends with a /, the clean_cmake_files function in distcc_env.sh will emit a find command with a double // at the end for the cmake_modules directory, and since it contains the substring cmake, find will match and delete its contents. Fix is to use a whitelist of locations and filenames to look for, and delete only those. Testing: manually ran enable_distcc, observed that my files were still there. Change-Id: I8a6e34bedf8000aed9e2b0597cfe86f73222c6ed Reviewed-on: http://gerrit.cloudera.org:8080/7493 Reviewed-by: Tim Armstrong <[email protected]> Tested-by: Impala Public Jenkins

yuzzjj · 2017-07-28T01:01:22Z

could commit

dtsirogiannis and others added 30 commits May 12, 2017 19:39

Update thirdparty dependencies

01d9eac

IMPALA-3973: optional 3rd and 4th arguments for instr().

ac209cf

Change-Id: I17268bdb480230938f94559fe1eabe34ac2448b7 Reviewed-on: http://gerrit.cloudera.org:8080/5589 Reviewed-by: Jim Apple <[email protected]> Tested-by: Impala Public Jenkins

IMPALA-4803: [DOCS] 2.9 release notes placeholder

3e32a33

Change-Id: I636a6f2dcd0555ab9b46304e3a7298c598a511da Reviewed-on: http://gerrit.cloudera.org:8080/6964 Reviewed-by: Michael Brown <[email protected]> Tested-by: Impala Public Jenkins

Matthew Jacobs and others added 28 commits July 20, 2017 02:40

Bump Kudu version to 27854fd

bed85e1

Change-Id: I0b9414c21faca00e4a64a35888bd50caac94318f Reviewed-on: http://gerrit.cloudera.org:8080/7486 Reviewed-by: Thomas Tauber-Marshall <[email protected]> Tested-by: Impala Public Jenkins

[DOCS] Take out not-production-ready notice from ADLS page

f8491de

Change-Id: I127b7806feca810503ba3dd000a8e972835e715a Reviewed-on: http://gerrit.cloudera.org:8080/7487 Reviewed-by: Greg Rahn <[email protected]> Reviewed-by: Sailesh Mukil <[email protected]> Tested-by: Impala Public Jenkins

IMPALA-5709: Remove mini-impala-cluster

ac69517

Remove untested / unused mini-impala-cluster binary. Change-Id: I677314fc1a998dffa9120c016bfcf761b4e39f05 Reviewed-on: http://gerrit.cloudera.org:8080/7488 Reviewed-by: Matthew Jacobs <[email protected]> Tested-by: Impala Public Jenkins

yuzzjj merged commit e500350 into yuzzjj:cdh5-trunk Jul 28, 2017
