Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to update a sheet in workbook without having to read the whole workbook #430

Open
janlimbeck opened this issue May 29, 2023 · 5 comments

Comments

@janlimbeck
Copy link

Is your feature request related to a problem? Please describe.

When I want to update one sheet in an existing and potentially quite big notebook I currently have to do something like the below:

workBook <- loadWorkbook(workbookPath)
writeData(workBook, sheet = sheetName, x = dataFrame)
saveWorkbook(workBook, workbookPath, overwrite = TRUE)

This can take several minutes for the notebooks I am handling.

Describe the solution you'd like
It would be great if e.g. write.xlsx could be adapted such that it can update a sheet in an existing workbook without having to load/parse the complete notebook. Additionally I have observed that parsing/writing back the unchanged sheets can sometime still lead to breaking changes for more complicated sheets. Hence it would be great if those were not touched at all both from a performance and compatibility perspective.

Describe alternatives you've considered

In principle the approach with loadWorkbook, writeData, saveWorkbook works but it is slow and occasionally tends to alter content.

@JanMarvin
Copy link
Collaborator

Hi @janlimbeck , this is a reasonable user requirement, but it is not implemented in this project, nor is it provided for in any other project I know of. openxlsx imports the entire workbook and imports styles, drawings and so on for a reason. To allow changes and access to the whole workbook. Still, what you want might be possible, see below, but if the entire world economy collapses because of you simply following guides from strangers on the internet, just to save yourself a few minutes, I point you to the MIT license: WITHOUT WARRANTY OF ANY KIND.
No, joking aside, make a few copies and check everything extensively (Oh, and use a database if saving takes a long time, databases were invented for a reason ...).

The following should come close to what you want:

  1. It unzips an existing workbook (old)
  2. It loads only a single sheet from old in openxlsx2 and saves the output (new)
  3. New is unzipped and sheet1.xml from old is overwritten with new
  4. The unzipped old with the replaced sheet from new is zipped again

This should work as long as you only have data on the sheet (hyperlinks, drawings, pivot tables and other references might differ, in old and new. Look for any r:id= part in the xml files. The relationship ids on the new sheet need to match the old sheet you are going to replace). Of course references on other sheets, formulas in this workbook etc. will all be broken if the file contains a calcChain.xml file, but at least broken in an interesting way (hint: you have to manually click into every cell to repair the calculation chain).

library(openxlsx2)

### create a directory
tmp <- paste0(tempdir(), "/unziptest")
# unlink(tmp, recursive = TRUE)
dir.create(tmp)
setwd(tmp)

tmp_old <- paste0(tmp, "/old_xlsx")
tmp_new <- paste0(tmp, "/new_xlsx")

xlsx_old <- "readTest.xlsx"
xlsx_new <- "Sheet1.xlsx"

xlsx_fin <- paste0(tmp, "/Updated.xlsx")

# download the old workbook
fl <- "https://github.com/JanMarvin/openxlsx-data/raw/main/readTest.xlsx"
curl::curl_download(fl, xlsx_old)

# extract the old workbook
unlink(tmp_old, recursive = TRUE)
unzip(xlsx_old, exdir = tmp_old)

# modify the new workbook
wb <- wb_load(xlsx_old, sheet = 1, data_only = TRUE)
wb$add_data(sheet = 1, dims = "A14", x = mtcars)
wb$save(xlsx_new)

# extract the new workbook
unlink(tmp_new, recursive = TRUE)
unzip(xlsx_new, exdir = tmp_new)

# replace a sheet from the old workbook with the new workbook
sheet_xml <- "/xl/worksheets/sheet1.xml"
old <- paste0(tmp_old, sheet_xml)
new <- paste0(tmp_new, sheet_xml)

file.copy(new, old, overwrite = TRUE)

# create the output
zip::zip(
  zipfile = xlsx_fin,
  files = list.files(tmp_old, full.names = FALSE),
  recurse = TRUE, 
  compression_level = 6,
  include_directories = FALSE,
  root = "old_xlsx",
  mode = "cherry-pick"
)

xl_open(xlsx_fin)

@janlimbeck
Copy link
Author

janlimbeck commented May 30, 2023

@JanMarvin Many thanks for this! I will take a good look and see if anything breaks in my scenario. I see you are the author of openxlsx2. Any chance a feature like this may find its way - in a less hacky way - into your library?

@JanMarvin
Copy link
Collaborator

No, I'm not really interested in this feature, keeping track of references in a workbook is hard enough. Keeping track of references in the input file and the output file really seems to cause major headaches.
But this is the openxlsx issue tracker and I had a spare moment to explain why it is not possible in openxlsx and provide a small example of what could be done instead.
PS: You should probably copy styles.xml from new (if you're writing anything other than plain text and numbers) and certainly delete calcChain.xml in the old folder and reference it in [Content_Types].xml. Keeping the calculation string really tends to brick the output file.

Copy link

github-actions bot commented Jun 1, 2024

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 1, 2024
@janlimbeck
Copy link
Author

Hi @janlimbeck , this is a reasonable user requirement, but it is not implemented in this project, nor is it provided for in any other project I know of. openxlsx imports the entire workbook and imports styles, drawings and so on for a reason. To allow changes and access to the whole workbook. Still, what you want might be possible, see below, but if the entire world economy collapses because of you simply following guides from strangers on the internet, just to save yourself a few minutes, I point you to the MIT license: WITHOUT WARRANTY OF ANY KIND. No, joking aside, make a few copies and check everything extensively (Oh, and use a database if saving takes a long time, databases were invented for a reason ...).

The following should come close to what you want:

  1. It unzips an existing workbook (old)
  2. It loads only a single sheet from old in openxlsx2 and saves the output (new)
  3. New is unzipped and sheet1.xml from old is overwritten with new
  4. The unzipped old with the replaced sheet from new is zipped again

This should work as long as you only have data on the sheet (hyperlinks, drawings, pivot tables and other references might differ, in old and new. Look for any r:id= part in the xml files. The relationship ids on the new sheet need to match the old sheet you are going to replace). Of course references on other sheets, formulas in this workbook etc. will all be broken if the file contains a calcChain.xml file, but at least broken in an interesting way (hint: you have to manually click into every cell to repair the calculation chain).

library(openxlsx2)

### create a directory
tmp <- paste0(tempdir(), "/unziptest")
# unlink(tmp, recursive = TRUE)
dir.create(tmp)
setwd(tmp)

tmp_old <- paste0(tmp, "/old_xlsx")
tmp_new <- paste0(tmp, "/new_xlsx")

xlsx_old <- "readTest.xlsx"
xlsx_new <- "Sheet1.xlsx"

xlsx_fin <- paste0(tmp, "/Updated.xlsx")

# download the old workbook
fl <- "https://github.com/JanMarvin/openxlsx-data/raw/main/readTest.xlsx"
curl::curl_download(fl, xlsx_old)

# extract the old workbook
unlink(tmp_old, recursive = TRUE)
unzip(xlsx_old, exdir = tmp_old)

# modify the new workbook
wb <- wb_load(xlsx_old, sheet = 1, data_only = TRUE)
wb$add_data(sheet = 1, dims = "A14", x = mtcars)
wb$save(xlsx_new)

# extract the new workbook
unlink(tmp_new, recursive = TRUE)
unzip(xlsx_new, exdir = tmp_new)

# replace a sheet from the old workbook with the new workbook
sheet_xml <- "/xl/worksheets/sheet1.xml"
old <- paste0(tmp_old, sheet_xml)
new <- paste0(tmp_new, sheet_xml)

file.copy(new, old, overwrite = TRUE)

# create the output
zip::zip(
  zipfile = xlsx_fin,
  files = list.files(tmp_old, full.names = FALSE),
  recurse = TRUE, 
  compression_level = 6,
  include_directories = FALSE,
  root = "old_xlsx",
  mode = "cherry-pick"
)

xl_open(xlsx_fin)

I have implemented the workaround described above but would still be curious if somebody has started working on a native solution.

@github-actions github-actions bot removed the Stale label Jun 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants