[SPARK-12925] Improve HiveInspectors.unwrap for StringObjectInspector.…
The earlier fix did not copy the bytes, and higher-level code may reuse the Text object, which caused issues. This fix copies the bytes out of the Text. It still avoids the expensive encoding/decoding.

Author: Rajesh Balamohan <[email protected]>

Closes apache#11477 from rajeshbalamohan/SPARK-12925.2.
rbalamohan authored and srowen committed Mar 4, 2016
1 parent c04dc27 commit 204b02b
Showing 1 changed file with 3 additions and 2 deletions.
@@ -320,9 +320,10 @@ private[hive] trait HiveInspectors {
       case hvoi: HiveCharObjectInspector =>
         UTF8String.fromString(hvoi.getPrimitiveJavaObject(data).getValue)
       case x: StringObjectInspector if x.preferWritable() =>
-        // Text is in UTF-8 already. No need to convert again via fromString
+        // Text is in UTF-8 already. No need to convert again via fromString. Copy bytes
         val wObj = x.getPrimitiveWritableObject(data)
-        UTF8String.fromBytes(wObj.getBytes, 0, wObj.getLength)
+        val result = wObj.copyBytes()
+        UTF8String.fromBytes(result, 0, result.length)
       case x: StringObjectInspector =>
         UTF8String.fromString(x.getPrimitiveJavaObject(data))
       case x: IntObjectInspector if x.preferWritable() => x.get(data)
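
The reuse problem described in the commit message can be reproduced without Hive or Spark. The sketch below is illustrative only: `ReusableText` and `BytesView` are hypothetical stand-ins (not Hadoop's `Text` or Spark's `UTF8String`), written on the assumption that the writable overwrites one backing array on each update and that the string wrapper keeps a reference to the array it is given rather than copying it.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for a reusable writable: a single backing array
// is overwritten on every set(), as a reader might do between rows.
final class ReusableText(capacity: Int) {
  private val buf = new Array[Byte](capacity)
  private var len = 0

  def set(s: String): Unit = {
    val b = s.getBytes(UTF_8)
    require(b.length <= capacity, "value does not fit in this toy buffer")
    System.arraycopy(b, 0, buf, 0, b.length)
    len = b.length
  }

  def getBytes: Array[Byte] = buf // exposes the shared backing array
  def getLength: Int = len
  def copyBytes(): Array[Byte] = java.util.Arrays.copyOf(buf, len) // fresh array
}

// Hypothetical stand-in for a string type that wraps bytes without copying.
final class BytesView(bytes: Array[Byte], offset: Int, length: Int) {
  override def toString: String = new String(bytes, offset, length, UTF_8)
}

object CopyDemo {
  // Returns (value seen through the aliased view, value seen through the copy)
  // after the writable has been reused for another value.
  def run(): (String, String) = {
    val text = new ReusableText(16)
    text.set("alpha")

    // Old behaviour: wrap the writable's own backing array.
    val aliased = new BytesView(text.getBytes, 0, text.getLength)

    // New behaviour: copy first, then wrap the private copy.
    val result = text.copyBytes()
    val copied = new BytesView(result, 0, result.length)

    text.set("omega") // the caller reuses the writable for the next value
    (aliased.toString, copied.toString)
  }
}
```

Calling `CopyDemo.run()` returns `("omega", "alpha")`: the aliased view silently changes when the buffer is reused, while the copied view keeps the original value. Neither path decodes to a `String` and re-encodes, which is the cost the change still avoids.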