Commit

Update data.py
zanussbaum authored Mar 29, 2023
1 parent c5f5882 commit 7e468f2
Showing 1 changed file with 0 additions and 4 deletions.
4 changes: 0 additions & 4 deletions data.py
@@ -70,14 +70,10 @@ def load_data(config, tokenizer):
    else:
        dataset = load_dataset(dataset_path)

    uuids = load_dataset("json", data_files="watermark.jsonl", split="train")
    dataset = dataset.train_test_split(test_size=.05, seed=config["seed"])

    train_dataset, val_dataset = dataset["train"], dataset["test"]

    train_dataset = concatenate_datasets([train_dataset, uuids])
    train_dataset = train_dataset.shuffle(seed=config["seed"])

    if config["streaming"] is False:
        kwargs = {"num_proc": config["num_proc"]}
    else: