Fix append without duplicates (#83)
* fix: append without duplicates

* chore: update readme
robertkossendey authored Feb 15, 2023
1 parent 0f8771c commit e5b3f18
Showing 3 changed files with 5 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -311,6 +311,7 @@ Here is data to be appended:
 +----+----+----+
 |   2|   R|   T| # duplicate col1
 |   8|   A|   B|
+|   8|   C|   D| # duplicate col1
 |  10|   X|   Y|
 +----+----+----+
 ```
4 changes: 3 additions & 1 deletion mack/__init__.py
@@ -405,9 +405,11 @@ def append_without_duplicates(

     condition_columns = " AND ".join(condition_columns)

+    deduplicated_append_df = append_df.drop_duplicates(p_keys)
+
     # Insert records without duplicates
     delta_table.alias("old").merge(
-        append_df.alias("new"), condition_columns
+        deduplicated_append_df.alias("new"), condition_columns
     ).whenNotMatchedInsertAll().execute()
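
The reason for the extra `drop_duplicates` step can be seen in a small pure-Python model. This is only a sketch, not mack's or Delta Lake's implementation: it assumes that a merge with `whenNotMatchedInsertAll` compares each incoming row against the existing table only, so two incoming rows sharing a new key are both inserted. The helper names `merge_insert_not_matched` and `drop_duplicates_by_key` and the existing table rows are hypothetical; the appended rows mirror the README example.

```python
def merge_insert_not_matched(table, new_rows, key_index=0):
    """Toy model of merge(...).whenNotMatchedInsertAll().

    Each incoming row is checked only against rows already in the table,
    never against the other incoming rows.
    """
    existing_keys = {row[key_index] for row in table}
    inserted = [row for row in new_rows if row[key_index] not in existing_keys]
    return table + inserted


def drop_duplicates_by_key(rows, key_index=0):
    """Keep one row per key. Keeping the first one seen is a simplification;
    Spark does not promise which duplicate survives."""
    seen = set()
    unique = []
    for row in rows:
        if row[key_index] not in seen:
            seen.add(row[key_index])
            unique.append(row)
    return unique


# Hypothetical existing table; col1 is the key column.
table = [(1, "A", "B"), (2, "C", "D")]

# Rows to append, as in the README example.
append_rows = [(2, "R", "T"), (8, "A", "B"), (8, "C", "D"), (10, "X", "Y")]

# Before the fix: key 8 is new to the table, so both rows with key 8 land in it.
before_fix = merge_insert_not_matched(table, append_rows)

# After the fix: the incoming batch is deduplicated on the key first.
after_fix = merge_insert_not_matched(table, drop_duplicates_by_key(append_rows))

print([row[0] for row in before_fix])
print([row[0] for row in after_fix])
```

In this model the row with key 2 is skipped in both cases because it already exists in the table; only the duplicate inside the incoming batch (key 8) behaves differently before and after the fix.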


1 change: 1 addition & 0 deletions tests/test_public_interface.py
@@ -468,6 +468,7 @@ def test_append_without_duplicates_single_column(tmp_path):
 [
 (2, "R", "T"), # duplicate
 (8, "A", "B"),
+(8, "B", "C"), # duplicate
 (10, "X", "Y"),
 ],
 ["col1", "col2", "col3"],
