Skip to content

Commit

Permalink
Add BigQuery enrichment bug to changes.md and blogs (apache#33165)
Browse files Browse the repository at this point in the history
* Add BigQuery enrichment bug to changes.md and blogs

* Address comments

---------

Co-authored-by: Claude <[email protected]>
  • Loading branch information
claudevdm and Claude authored Nov 19, 2024
1 parent fd6b133 commit 12a056d
Show file tree
Hide file tree
Showing 5 changed files with 40 additions and 2 deletions.
24 changes: 24 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,7 @@
* (Java) Fixed tearDown not invoked when DoFn throws on Portable Runners ([#18592](https://github.com/apache/beam/issues/18592), [#31381](https://github.com/apache/beam/issues/31381)).
* (Java) Fixed protobuf error with MapState.remove() in Dataflow Streaming Java Legacy Runner without Streaming Engine ([#32892](https://github.com/apache/beam/issues/32892)).
* Adding flag to support conditionally disabling auto-commit in JdbcIO ReadFn ([#31111](https://github.com/apache/beam/issues/31111))
* (Python) Fixed BigQuery Enrichment bug that can lead to multiple conditions returning duplicate rows, batching returning incorrect results and conditions not scoped by row during batching ([#32780](https://github.com/apache/beam/pull/32780)).

## Security Fixes
* Fixed (CVE-YYYY-NNNN)[https://www.cve.org/CVERecord?id=CVE-YYYY-NNNN] (Java/Python/Go) ([#X](https://github.com/apache/beam/issues/X)).
Expand Down Expand Up @@ -187,6 +188,13 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
* (Java, Python, Go) Fixed PeriodicSequence backlog bytes reporting, which was preventing Dataflow Runner autoscaling from functioning properly ([#32506](https://github.com/apache/beam/issues/32506)).
* (Java) Fix improper decoding of rows with schemas containing nullable fields when encoded with a schema with equal encoding positions but modified field order. ([#32388](https://github.com/apache/beam/issues/32388)).

## Known Issues

* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

# [2.59.0] - 2024-09-11

## Highlights
Expand Down Expand Up @@ -230,6 +238,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
* If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.

* Java file-based IOs read or write lots (100k+) files could experience slowness and/or broken metrics visualization on Dataflow UI [#32649](https://github.com/apache/beam/issues/32649).
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

# [2.58.1] - 2024-08-15

Expand All @@ -241,6 +253,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))

* Large Dataflow graphs using runner v2, or pipelines explicitly enabling the `upload_graph` experiment, will fail at construction time ([#32159](https://github.com/apache/beam/issues/32159)).
* Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

# [2.58.0] - 2024-08-06

Expand Down Expand Up @@ -272,6 +288,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))
* Large Dataflow graphs using runner v2, or pipelines explicitly enabling the `upload_graph` experiment, will fail at construction time ([#32159](https://github.com/apache/beam/issues/32159)).
* Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
* [KafkaIO] Records read with `ReadFromKafkaViaSDF` are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with Dataflow v2 runner and xlang pipelines reading from Kafka, ([#32196](https://github.com/apache/beam/issues/32196))
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

# [2.57.0] - 2024-06-26

Expand Down Expand Up @@ -328,6 +348,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))

* Large Dataflow graphs using runner v2, or pipelines explicitly enabling the `upload_graph` experiment, will fail at construction time ([#32159](https://github.com/apache/beam/issues/32159)).
* Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

# [2.56.0] - 2024-05-01

Expand Down
4 changes: 4 additions & 0 deletions website/www/site/content/en/blog/beam-2.57.0.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,10 @@ For more information on changes in 2.57.0, check out the [detailed release notes
## Known Issues

* Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

Expand Down
4 changes: 4 additions & 0 deletions website/www/site/content/en/blog/beam-2.58.0.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,10 @@ For more information about changes in 2.58.0, check out the [detailed release no

* Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue ([#32169](https://github.com/apache/beam/issues/32169)). The issue will be fixed in 2.59.0 ([#32135](https://github.com/apache/beam/pull/32135)). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.
* [KafkaIO] Records read with `ReadFromKafkaViaSDF` are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with Dataflow v2 runner and xlang pipelines reading from Kafka, ([#32196](https://github.com/apache/beam/issues/32196))
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

Expand Down
5 changes: 4 additions & 1 deletion website/www/site/content/en/blog/beam-2.59.0.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,8 +66,11 @@ For more information on changes in 2.59.0, check out the [detailed release notes
* In the 2.59.0 release, Prism passes most runner validations tests with the exceptions of pipelines using the following features:
OrderedListState, OnWindowExpiry (eg. GroupIntoBatches), CustomWindows, MergingWindowFns, Trigger and WindowingStrategy associated features, Bundle Finalization, Looping Timers, and some Coder related issues such as with Python combiner packing, and Java Schema transforms, and heterogenous flatten coders. Processing Time timers do not yet have real time support.
* If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.

* Java file-based IOs read or write lots (100k+) files could experience slowness and/or broken metrics visualization on Dataflow UI [#32649](https://github.com/apache/beam/issues/32649).
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

Expand Down
5 changes: 4 additions & 1 deletion website/www/site/content/en/blog/beam-2.60.0.md
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,10 @@ when running on 3.8. ([#31192](https://github.com/apache/beam/issues/31192))

## Known Issues

N/A
* BigQuery Enrichment (Python): The following issues are present when using the BigQuery enrichment transform ([#32780](https://github.com/apache/beam/pull/32780)):
* Duplicate Rows: Multiple conditions may be applied incorrectly, leading to the duplication of rows in the output.
* Incorrect Results with Batched Requests: Conditions may not be correctly scoped to individual rows within the batch, potentially causing inaccurate results.
* Fixed in 2.61.0.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

Expand Down

0 comments on commit 12a056d

Please sign in to comment.