[SPARK-29286][PYTHON][TESTS] Uses UTF-8 with 'replace' on errors at Python testing script

### What changes were proposed in this pull request?

This PR proposes to have Python 2 use UTF-8 instead of ASCII in the Python testing script, permissively replacing bytes that are not valid UTF-8 with the Unicode replacement character.

### Why are the changes needed?

When Python 2 is used to run the Python testing script, `line.decode()` falls back to ASCII, as if `decode(encoding='ascii')` had been called, so the script fails whenever non-ASCII characters are printed out.
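
A minimal sketch of the failure and the fix, using a hypothetical byte string `raw` that stands in for one line of test output. Python 2's `str.decode()` defaults to ASCII; the example passes `"ascii"` explicitly so that it behaves the same way on Python 3.

```python
# `raw` is a made-up line of test output containing UTF-8 encoded non-ASCII text.
raw = u"caf\u00e9 ok\n".encode("utf-8")

# Strict ASCII decoding (Python 2's default for str.decode()) raises on byte 0xC3.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# UTF-8 decoding recovers the original text.
decoded = raw.decode("utf-8", "replace")

# Bytes that are not valid UTF-8 become U+FFFD instead of raising.
broken = b"bad \xff byte\n".decode("utf-8", "replace")
```

With the `"replace"` error handler the loop over the test output never raises on decoding, at the cost of showing U+FFFD in place of any undecodable byte.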

### Does this PR introduce any user-facing change?

For developers, it enables the testing script to print non-ASCII characters.

### How was this patch tested?

Jenkins will test it against our existing tests. It was also manually tested with UTF-8 output.

Closes apache#26021 from HyukjinKwon/SPARK-29286.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon authored and dongjoon-hyun committed Oct 4, 2019
1 parent eecef75 commit 20ee2f5
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions python/run-tests.py
@@ -117,7 +117,7 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
 log_file.writelines(per_test_output)
 per_test_output.seek(0)
 for line in per_test_output:
-    decoded_line = line.decode()
+    decoded_line = line.decode("utf-8", "replace")
     if not re.match('[0-9]+', decoded_line):
         print(decoded_line, end='')
 per_test_output.close()
@@ -134,7 +134,7 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
 per_test_output.seek(0)
 # Here expects skipped test output from unittest when verbosity level is
 # 2 (or --verbose option is enabled).
-decoded_lines = map(lambda line: line.decode(), iter(per_test_output))
+decoded_lines = map(lambda line: line.decode("utf-8", "replace"), iter(per_test_output))
 skipped_tests = list(filter(
     lambda line: re.search(r'test_.* \(pyspark\..*\) ... (skip|SKIP)', line),
     decoded_lines))
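
The second hunk can be exercised in isolation. The following is a sketch, assuming a hypothetical in-memory buffer `per_test_output_demo` filled with made-up unittest output in place of the real temporary file; the regular expression is the one from the diff.

```python
import io
import re

# Hypothetical stand-in for the temporary file that holds one test's raw output.
per_test_output_demo = io.BytesIO(
    b"test_a (pyspark.tests.test_demo.DemoTests) ... ok\n"
    b"test_b (pyspark.tests.test_demo.DemoTests) ... skipped 'caf\xc3\xa9 missing'\n"
    b"test_c (pyspark.tests.test_demo.DemoTests) ... SKIP \xff\n"
)

# Decode permissively so that non-ASCII or invalid bytes cannot abort the scan.
decoded_lines = map(lambda line: line.decode("utf-8", "replace"), iter(per_test_output_demo))

# Same pattern as in the diff: keep only lines that report a skipped test.
skipped_tests = list(filter(
    lambda line: re.search(r'test_.* \(pyspark\..*\) ... (skip|SKIP)', line),
    decoded_lines))
```

Both the valid UTF-8 line and the line with an invalid byte survive decoding, so the skip report stays complete even when a skip reason contains non-ASCII text.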