Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Images #249

Closed
stitam opened this issue May 6, 2020 · 39 comments · Fixed by #253 · May be fixed by #247 or #235
Closed

Images #249

stitam opened this issue May 6, 2020 · 39 comments · Fixed by #253 · May be fixed by #247 or #235
Labels
consistent api Uniformity across functions and data sources

Comments

@stitam
Copy link
Contributor

stitam commented May 6, 2020

Some web services allow us to retrieve images of substances. There are already implementations in PR #235 and PR #247. Also ChemSpider and PubChem both offer such functionality. I thought it would be good to open a small discussion about design considerations to ensure consistency.

  • function name alternatives: *_view(), *_viewer(), _*img(), *_image(). Let's choose one and stick to it. I prefer *_img(), as proposed by @andschar, e.g. actor_img(), pc_img().
  • arguments: query, from, format, width, height, verbose, in this order?
  • outputs: 1. a list of raw character vectors. 2. exported images. Alternatively, we can add two more arguments, one to decide if we want to download, e.g. output = c('image', 'download') as suggested by @gjgetzinger, and another for path.

On first thought, I think I would prefer these functions to return a list of raw character vectors. I tried png, svg, jpeg, these can all be accessed e.g. through httr::content(httr::GET(url), type = "image"). Raw vectors can then be rendered easily with image processing packages like magick, e.g. plot(as.raster(image_read(raw))) and also exported in any format, but we wouldn't have to import them as dependencies, so webchem could remain lightweight.

What do you think?

This was referenced May 6, 2020
@stitam stitam added the consistent api Uniformity across functions and data sources label May 6, 2020
@andschar
Copy link
Contributor

andschar commented May 6, 2020

I was about to open the same issue^^

  • I agree with point one, to choose one and I also prefer img. However, maybe it's better to turn the name around like in get_*() to have such a naming convention img_actor(). What do you think?
  • Ma idea for mandatory arguments:
    • query
    • from
    • dir or directory argument to allow the user to store the image at a specific place
      • default write location: tmepdir() b/c it's OS independant
      • or name it locaiton?
    • format (e.g. svg, png, jpeg)
    • width & height
      • usually provided by the API anyway
    • apart from that any number of arguments can follow the above ones
    • verbose argument at the end

@andschar
Copy link
Contributor

andschar commented May 6, 2020

IUp to now, I haven't thought of the output argument. Why should we want to retrieve only the paths and not directly store the image to disk?

@andschar
Copy link
Contributor

andschar commented May 6, 2020

There is also the CIR service which provides this functionality.
I will include a function for it in PR #247.

@stitam
Copy link
Contributor Author

stitam commented May 6, 2020

I thought one option could be output = "download" and then path would be used to tell the function where to save, and the other could be output = "image" and then path wouldn't be used and the function would keep the images in R to be used for analysis/reporting. Maybe calling the argument dir, as you suggested is less confusing than path. Regardless, I am leaning towards no export functionality and leaving that to other functions. What do you think?

@stitam
Copy link
Contributor Author

stitam commented May 6, 2020

There is also the CIR service which provides this functionality.
I will include a function for it in PR #247.

Though maybe it would be better to separate it into another PR.

@andschar
Copy link
Contributor

andschar commented May 6, 2020

I thought one option could be output = "download" and then path would be used to tell the function where to save, and the other could be output = "image" and then path wouldn't be used and the function would keep the images in R to be used for analysis/reporting. Maybe calling the argument dir, as you suggested is less confusing than path. Regardless, I am leaning towards no export functionality and leaving that to other functions. What do you think?

I would rather include the export functionality b/c the image is anyway created by the web service. We just have to call download.file or curl::download_file() then to store it somewhere to disk.

Hm, not sure about keeping images in R. They can probably also get quite big. I would rather store them to disk, and read them again, when needed, using a non-webchem function. E.g. knitr::include_graphics() in an RMardown for example. This approach would also not include any new dependencies.

@andschar
Copy link
Contributor

andschar commented May 6, 2020

There is also the CIR service which provides this functionality.
I will include a function for it in PR #247.

Though maybe it would be better to separate it into another PR.

Agreed.

@andschar
Copy link
Contributor

andschar commented May 6, 2020

* I agree with point one, to choose one and I also prefer `img`. However, maybe it's better to turn the name around like in `get_*()` to have such a naming convention `img_actor()`. What do you think?

What do you prefer? actor_img() or img_actor()?

@stitam
Copy link
Contributor Author

stitam commented May 6, 2020

* I agree with point one, to choose one and I also prefer `img`. However, maybe it's better to turn the name around like in `get_*()` to have such a naming convention `img_actor()`. What do you think?

What do you prefer? actor_img() or img_actor()?

I prefer actor_img() because it is more consistent e.g. with etox_basic(), pc_prop().

@stitam
Copy link
Contributor Author

stitam commented May 6, 2020

I see your point regarding image size. However, if you wanted to use images of compounds as descriptors for an ML problem (I don't know if this is a reasonable thing to do), then returning a vector of images and keeping them in R would be much simpler than importing them from hard drive.

@andschar andschar mentioned this issue May 6, 2020
4 tasks
@andschar
Copy link
Contributor

andschar commented May 6, 2020

* I agree with point one, to choose one and I also prefer `img`. However, maybe it's better to turn the name around like in `get_*()` to have such a naming convention `img_actor()`. What do you think?

What do you prefer? actor_img() or img_actor()?

I prefer actor_img() because it is more consistent e.g. with etox_basic(), pc_prop().

Ok, sounds also good to me.

@Aariq
Copy link
Collaborator

Aariq commented May 6, 2020

I thought one option could be output = "download" and then path would be used to tell the function where to save, and the other could be output = "image" and then path wouldn't be used and the function would keep the images in R to be used for analysis/reporting. Maybe calling the argument dir, as you suggested is less confusing than path. Regardless, I am leaning towards no export functionality and leaving that to other functions. What do you think?

Some image types can't be represented as R objects and only a download/export option would make sense. E.g. I think in #235 one of the file options is a PDF.

@Aariq
Copy link
Collaborator

Aariq commented May 6, 2020

I think they image functions should either return raster images, or in the case that they download image files, they should return the file paths (silently). That will make integration with things like knitr::include_graphics() or pander::pandoc.image.return() easier so we can make things like this:

library(webchem)
library(pander)
library(knitr)

id <- c("Glyphosate", "Isoproturon", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
cir_img(id, bgcolor = "transparent", antialising = 0, width = 200, height = 200, symbolfontsize = 9, dir = here::here())
#> Querying: Glyphosate
#> https://cactus.nci.nih.gov/chemical/structure/Glyphosate/image?format=png&width=200&height=200&linewidth=2&symbolfontsize=9&bgcolor=transparent&csymbol=special&hsymbol=special
#> Image saved under: /private/var/folders/b_/2vfnxxls5vs401tmhhb3wqdh0000gp/T/RtmpH52OZ2/reprex9d0d4d96caea/Glyphosate.png
#> Querying: Isoproturon
#> https://cactus.nci.nih.gov/chemical/structure/Isoproturon/image?format=png&width=200&height=200&linewidth=2&symbolfontsize=9&bgcolor=transparent&csymbol=special&hsymbol=special
#> Image saved under: /private/var/folders/b_/2vfnxxls5vs401tmhhb3wqdh0000gp/T/RtmpH52OZ2/reprex9d0d4d96caea/Isoproturon.png
#> Querying: BSYNRYMUTXBXSQ-UHFFFAOYSA-N
#> https://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/image?format=png&width=200&height=200&linewidth=2&symbolfontsize=9&bgcolor=transparent&csymbol=special&hsymbol=special
#> Image saved under: /private/var/folders/b_/2vfnxxls5vs401tmhhb3wqdh0000gp/T/RtmpH52OZ2/reprex9d0d4d96caea/BSYNRYMUTXBXSQ-UHFFFAOYSA-N.png
paths <- list.files(here::here(), pattern = "*.png", full.names = TRUE)

kable(data.frame(id = id, picture = pandoc.image.return(paths)))
id picture
Glyphosate
Isoproturon
BSYNRYMUTXBXSQ-UHFFFAOYSA-N

Created on 2020-05-06 by the reprex package (v0.3.0)

(pictures work when code is run locally with PR #250, but they don't show up in correct order. If cir_img() returned file paths, this would work correctly and with less code.)

@andschar
Copy link
Contributor

andschar commented May 7, 2020

I see your point regarding image size. However, if you wanted to use images of compounds as descriptors for an ML problem (I don't know if this is a reasonable thing to do), then returning a vector of images and keeping them in R would be much simpler than importing them from hard drive.

I have never worked with images as objects in R. Can all the common image formats (.png, .jpg, .giv, .svg) be imported as image objects? What class is that? Are these the raster images @Aariq mentioned in the above comment?

Would I have to convert the images for that? Which function?

Sorry, for all the questions, I'm a bit puzzled.

@andschar
Copy link
Contributor

andschar commented May 7, 2020

I think they image functions should either return raster images, or in the case that they download image files, they should return the file paths (silently). That will make integration with things like knitr::include_graphics() or pander::pandoc.image.return() easier so we can make things like this:

library(webchem)
library(pander)
library(knitr)

id <- c("Glyphosate", "Isoproturon", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
cir_img(id, bgcolor = "transparent", antialising = 0, width = 200, height = 200, symbolfontsize = 9, dir = here::here())
#> Querying: Glyphosate
#> https://cactus.nci.nih.gov/chemical/structure/Glyphosate/image?format=png&width=200&height=200&linewidth=2&symbolfontsize=9&bgcolor=transparent&csymbol=special&hsymbol=special
#> Image saved under: /private/var/folders/b_/2vfnxxls5vs401tmhhb3wqdh0000gp/T/RtmpH52OZ2/reprex9d0d4d96caea/Glyphosate.png
#> Querying: Isoproturon
#> https://cactus.nci.nih.gov/chemical/structure/Isoproturon/image?format=png&width=200&height=200&linewidth=2&symbolfontsize=9&bgcolor=transparent&csymbol=special&hsymbol=special
#> Image saved under: /private/var/folders/b_/2vfnxxls5vs401tmhhb3wqdh0000gp/T/RtmpH52OZ2/reprex9d0d4d96caea/Isoproturon.png
#> Querying: BSYNRYMUTXBXSQ-UHFFFAOYSA-N
#> https://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/image?format=png&width=200&height=200&linewidth=2&symbolfontsize=9&bgcolor=transparent&csymbol=special&hsymbol=special
#> Image saved under: /private/var/folders/b_/2vfnxxls5vs401tmhhb3wqdh0000gp/T/RtmpH52OZ2/reprex9d0d4d96caea/BSYNRYMUTXBXSQ-UHFFFAOYSA-N.png
paths <- list.files(here::here(), pattern = "*.png", full.names = TRUE)

kable(data.frame(id = id, picture = pandoc.image.return(paths)))

id picture
Glyphosate
Isoproturon
BSYNRYMUTXBXSQ-UHFFFAOYSA-N

Created on 2020-05-06 by the reprex package (v0.3.0)

(pictures work when code is run locally with PR #250, but they don't show up in correct order. If cir_img() returned file paths, this would work correctly and with less code.)

Thank @Aariq for the comment.
I also like the idea of downloading the images and returning the paths.
Just an idea: What about returning a data.frame with paths to local files and URLs?

@andschar
Copy link
Contributor

andschar commented May 7, 2020

I have updated PR #250 to return (1) a file written to disk and (2) a data.frame containing the local path and the URL. What do you think?

@stitam
Copy link
Contributor Author

stitam commented May 7, 2020

I have never worked with images as objects in R. Can all the common image formats (.png, .jpg, .giv, .svg) be imported as image objects? What class is that? Are these the raster images @Aariq mentioned in the above comment?

Some image types can't be represented as R objects and only a download/export option would make sense. E.g. I think in #235 one of the file options is a PDF.

Working with images as objects:

#works with .png, .svg, .jpeg, .pdf
library(magick)
library(httr)
url <- "https://actorws.epa.gov/actorws/chemical/image?smiles=CC%3DO&w=400&h=400&fmt=png"
cont <- content(GET(url), type = "image") #object of class "raw"
img <- image_read(cont) #object of class "magick-image"
raster <- as.raster(img) #object of class "raster"
plot(img)
plot(raster)

We can get to raw without depending on another package but we cannot get to raster I think. On the other hand, image processing packages work well with images on disk, so maybe it is simpler to just export, I rest my case for keeping image objects in R.

Just an idea: What about returning a data.frame with paths to local files and URLs?

I like the idea of returning a data.frame! I think the query column would also be useful similarly to the get_*() functions. Are relative paths enough for subsequent functions or do we need absoIute paths? I don't know if including the urls in the output has a benefit, other webchem functions only print the url in verbose messages.

@andschar
Copy link
Contributor

andschar commented May 7, 2020

Ok, cool. I really think adding dependencies such as magick and raster (?) is a bit too much for webchem.
Just a side note: In my experience, having everything in memory and not writing things to disc is a very "R" thing. Though I'm also not very proficient in other prog languages and this statement might not be TRUE^^

@stitam
Copy link
Contributor Author

stitam commented May 7, 2020

It's just very very polite:)

@andschar
Copy link
Contributor

andschar commented May 7, 2020

Ok, now that we agree to write to disk, what function should we use? As always in R, there are many possibilities. I prefer httr:: style, just because it is similar to other GET() and POST() functions in webchem. What about you?

qurl = "https://cactus.nci.nih.gov/chemical/structure/Triclosan/image?format=png&width=500&height=500&linewidth=2&symbolfontsize=16&csymbol=special&hsymbol=special"
# httr
require(httr)
h <- try(
  GET(qurl,
      timeout(5),
      write_disk(tempfile(), overwrite = TRUE))
)
# base
download.file(qurl,
              destfile = tempfile())
# curl
curl::curl_download(qurl,
                    destfile = tempfile())

@stitam
Copy link
Contributor Author

stitam commented May 7, 2020

I personally prefer httr, also because of the nice status messages that can be used for verbose output, but as long as it doesn't involve another dependency, I'd rather not search consensus here:)

@andschar
Copy link
Contributor

andschar commented May 7, 2020

I personally prefer httr, also because of the nice status messages that can be used for verbose output, but as long as it doesn't involve another dependency, I'd rather not search consensus here:)

Sounds like consensus to me :)

@Aariq
Copy link
Collaborator

Aariq commented May 12, 2020

I think I've had a bit of change of heart on this issue.

I think to keep things simple, image functions could return png files as R objects (with png::readPNG()) when no directory is specified and plot them by default (e.g. with grid::grid.raster()). If no directory is specified, then you shouldn't be allowed to get other file types. If other file types are specified, then the user must supply a directory. In the case that a directory is specified, the function should return file paths silently. Why? I think saving files to to a temporary directory is less transparent than returning R objects and more useful. If you wanted to use them externally to R, then you'd provide a directory, and if you want to use them internally, then this skips writing them to file and then reading back in. WIth defaults, the image function provides a way to preview the image, and then, once the user is satisfied, they can provide a directory and a file type for download.

@andschar
Copy link
Contributor

I think I've had a bit of change of heart on this issue.

I think to keep things simple, image functions could return png files as R objects (with png::readPNG()) when no directory is specified and plot them by default (e.g. with grid::grid.raster()). If no directory is specified, then you shouldn't be allowed to get other file types. If other file types are specified, then the user must supply a directory. In the case that a directory is specified, the function should return file paths silently. Why? I think saving files to to a temporary directory is less transparent than returning R objects and more useful. If you wanted to use them externally to R, then you'd provide a directory, and if you want to use them internally, then this skips writing them to file and then reading back in. WIth defaults, the image function provides a way to preview the image, and then, once the user is satisfied, they can provide a directory and a file type for download.

Thanks @Aariq for the comment!

Would you store multiple outputs in an R list or directly plot them then?

Agreeing:

  • I think your suggestion has the advantage to retrieve a single image and directly plot it (e.g. in an RMarkdown document)

My concerns:

  • Imho, Your suggestion makes things more complex. We would have to add an extra conditional (e.g. if (type == "png" & is.null(dir))).
    • Ok, technically not so difficult, yet still increased complexity
    • I agree with you though, that saving files to tempdir() might not be obvious to a novel user. One idea to prevent that, would be to throw an error (e.g. No directory supplied. Please provide one), if no directory is supplied.
  • Filling up RAM with pictures
    • In the example below, 40 (700x700px) pngs lead 470MB
  • We would add image processing methods to webchem. Why not just let webchem be the "download vessel" and let then the user decide how to process the images?

Code example for the CIR image service comparing the two approaches

# setup -------------------------------------------------------------------
require(httr)
query = c(
  "25057-89-0", "72-43-5", "640-15-3", "82-68-8", "50471-44-8", 
  "13457-18-6", "94-82-6", "93-76-5", "15972-60-8", "834-12-8", 
  "1912-24-9", "2642-71-9", "314-40-9", "1689-84-5", "470-90-6", 
  "1698-60-8", "1982-47-4", "2921-88-2", "15545-48-9", "21725-46-2", 
  "52315-07-8", "6190-65-4", "30125-63-4", "1007-28-9", "1014-69-3", 
  "333-41-5", "120-36-5", "62-73-7", "115-32-2", "83164-33-4", 
  "60-51-5", "298-04-4", "330-54-1", "38260-54-7", "122-14-5", 
  "93-72-1", "67564-91-4", "55-38-9", "96525-23-4", "51235-04-2"
)

# Option 1: Store as R object ---------------------------------------------
img_l = list()
for (i in query) {
  # url
  qurl = paste("https://cactus.nci.nih.gov/chemical/structure",
               i,
               "image?format=png&width=700&height=700",
               sep = '/')
  # query
  Sys.sleep(1.5)
  res = GET(qurl,
            timeout(5))
  img = png::readPNG(res$content)
  img_l[[i]] = img
  message('Processed: ', i, ' (stored in list).') # ~ 8KB / image: object.size(img)
}

object.size(img_l) # already 470 MB in memory

# Option 2: Save local ----------------------------------------------------
for (i in query) {
  # url
  qurl = paste("https://cactus.nci.nih.gov/chemical/structure",
               i,
               "image?format=png&width=700&height=700",
               sep = '/')
  # query
  Sys.sleep(1.5)
  GET(qurl,
      timeout(5),
      write_disk(file.path(tempdir(), paste0(i, ".png")), overwrite = TRUE))
  # ~ 8KB / image
  message('Processed: ', i, ' (saved to disk).')
}

This reflects just my opinion. If you @Aariq and @stitam prefer to including images as R objects, I can also live with it :)
Feel free to build on the code snippet.

@stitam
Copy link
Contributor Author

stitam commented May 12, 2020

Just some ideas how others manage images: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0405-0. rckd::view.molecule.2d executes a java command which returns the image?

@stitam
Copy link
Contributor Author

stitam commented May 12, 2020

I am ok with removing tempdir(), write.csv() doesn't write to tempdir either. I would love to have images in memory but I am also a concerned about importing image processing methods, I think webchem is quite heavy already. We still have to option of keeping the images in raw format (the natural class for the content of a GET() or POST() request), providing some good examples, but otherwise and leaving it to the user to import image processing packages if they want to work with images.

require(httr)
query = c(
  "25057-89-0", "72-43-5", "640-15-3", "82-68-8", "50471-44-8", 
  "13457-18-6", "94-82-6", "93-76-5", "15972-60-8", "834-12-8", 
  "1912-24-9", "2642-71-9", "314-40-9", "1689-84-5", "470-90-6", 
  "1698-60-8", "1982-47-4", "2921-88-2", "15545-48-9", "21725-46-2", 
  "52315-07-8", "6190-65-4", "30125-63-4", "1007-28-9", "1014-69-3", 
  "333-41-5", "120-36-5", "62-73-7", "115-32-2", "83164-33-4", 
  "60-51-5", "298-04-4", "330-54-1", "38260-54-7", "122-14-5", 
  "93-72-1", "67564-91-4", "55-38-9", "96525-23-4", "51235-04-2"
)

img_l = list()
for (i in query) {
  # url
  qurl = paste("https://cactus.nci.nih.gov/chemical/structure",
               i,
               "image?format=png&width=700&height=700",
               sep = '/')
  # query
  Sys.sleep(1.5)
  res = GET(qurl,
            timeout(5))
  img = content(res, type = "image) ### this is the change.
  img_l[[i]] = img
  message('Processed: ', i, ' (stored in list).') # ~ 8KB / image: object.size(img)
}

object.size(img_l)
332192 bytes

#plot image
library(magick)
img <- image_read(img_l[[1]]) #object of class "magick-image"
plot(img)

@andschar
Copy link
Contributor

Hm, that sounds convincing @stitam and would also greatly reduce the RAM footprint.
So, something like this, where the user would only decide (1) whether to leave directory = NULL (default) and return the image as a raw vector OR (2) to provide a directory = "path" and have the images stored to disk. In the latter case, should we also return a data.frame with links?

require(httr)
require(dplyr)

query = c(
  "25057-89-0", "72-43-5", "640-15-3", "82-68-8", "50471-44-8", 
  "13457-18-6", "94-82-6", "93-76-5", "15972-60-8", "834-12-8", 
  "1912-24-9", "2642-71-9", "314-40-9", "1689-84-5", "470-90-6", 
  "1698-60-8", "1982-47-4", "2921-88-2", "15545-48-9", "21725-46-2", 
  "52315-07-8", "6190-65-4", "30125-63-4", "1007-28-9", "1014-69-3", 
  "333-41-5", "120-36-5", "62-73-7", "115-32-2", "83164-33-4", 
  "60-51-5", "298-04-4", "330-54-1", "38260-54-7", "122-14-5", 
  "93-72-1", "67564-91-4", "55-38-9", "96525-23-4", "51235-04-2"
)
# dummy function
fun_img <- function(query, directory = NULL) {
  foo <- function(query, directory) {
      qurl = paste("https://cactus.nci.nih.gov/chemical/structure",
                   query,
                   "image?format=png&width=700&height=700",
                   sep = '/')
    # query
    Sys.sleep(1.5)
    if (is.null(directory)) {
      message("Retrieving raw image: ", query)
      res <- GET(qurl,
                 timeout(5))
      content(res, type = "image")
    } else {
      message("Storing image: ", query, " to: ", directory)
      GET(qurl,
          timeout(5),
          write_disk(file.path(directory, paste0(query, ".png")), # depending on the webservice, more options here
                     overwrite = TRUE))
      data.frame(query = query,
                 qurl = qurl,
                 stringsAsFactors = FALSE)
    }
  }
  l <- lapply(query, foo, directory = directory)
  names(l) <- query
  if (is.null(directory)) {
    l
  } else {
    dplyr::bind_rows(l) 
  }
}

img_l = fun_img(query[1:2])
lapply(img_l, head)
img_df = fun_img(query[1:2], directory = tempdir())
img_df

@Aariq
Copy link
Collaborator

Aariq commented May 13, 2020

Good points about RAM. It looks like rcdk also uses a temporary directory with tempfile() for downloading molfiles, but it also returns an R object that you can plot with view.molecule.2d(). That's the thing---I don't think there's a point of downloading files to a temporary directory if the user can't easily do something with them in R. I don't think I know enough about raw format images to comment on that.

@stitam
Copy link
Contributor Author

stitam commented May 13, 2020

raw is not as universal as I thought, e.g. while image_read() can convert from raw ACToR images, it doesn't work with raw CACTUS images. png::readPNG() works with both web services, but the outputs are much larger than with image_read(). Maybe it's not a good idea to return raw afterall. BTW if we use png::readPNG() and then image_read() we can shrink the image size considerably;).

@stitam
Copy link
Contributor Author

stitam commented May 14, 2020

I think we should implement the functions as downloaders for now, because that works both with web services that provide urls, and with web services that don't, e.g. ChemSpider. We can always extend that functionality later.

@Aariq
Copy link
Collaborator

Aariq commented May 14, 2020

We should make sure @gjgetzinger is looped in on this decision, since it probably changes some of our suggestions on PR #235

@gjgetzinger
Copy link
Contributor

I've been following the discussion. No objections from my side. I'm happy to make any adjustments to PR #235 as needed. (full disclosure, I am in the middle of a job change and relocation, so will be a minute before I make any changes)

@stitam
Copy link
Contributor Author

stitam commented May 15, 2020

Good luck with your new endeavor! I relocated during last August:)

@andschar
Copy link
Contributor

andschar commented May 18, 2020

I think we should implement the functions as downloaders for now, because that works both with web services that provide urls, and with web services that don't, e.g. ChemSpider. We can always extend that functionality later.

To sum up, we agree to download the images and return a data.frame with the paths to the downloaded images. If no directory is supplied they will be downloaded to tempdir(), the default. No raw images for now. Ok?

Argument examples (* mandatory ones):

  • query = *
  • from = *
  • dir = *
  • format =
  • width =
  • height =
  • any_other_argument =
  • verbose = *

Should we name the directory argument directory =, dir = or path?
I prefer dir = .

@Aariq
Copy link
Collaborator

Aariq commented May 18, 2020

If the output is just file paths, I'd say return it as a vector, silently.

@stitam
Copy link
Contributor Author

stitam commented May 19, 2020

I'd also prefer data.frame because then we have query and result, similarly to get_(). I like the idea of silent return, but I have no idea how that is implemented so I'll peek at your PRs:) Didn't we agree that tempdir() is not good practice afterall? We can just leave dir as a compulsory argument. I like dir because of its brevity, also I think path is ambiguous as it can refer to a directory or a file.

@andschar
Copy link
Contributor

I'd also prefer data.frame because then we have query and result, similarly to get_(). I like the idea of silent return, but I have no idea how that is implemented so I'll peek at your PRs:) Didn't we agree that tempdir() is not good practice afterall? We can just leave dir as a compulsory argument. I like dir because of its brevity, also I think path is ambiguous as it can refer to a directory or a file.

I also thin it's the best idea to have a mandatory dir = argument.

@stitam
Copy link
Contributor Author

stitam commented Jun 9, 2020

Do you think it is useful to return anything to the console, apart from the optional verbose messages? If we want to import the downloaded images into R later, we can easily construct the paths in the same order with paste0(dir, "/", csids, ".png"). I updated PR #253 for cs_img() by making dir mandatory and removing the tibble output. What do you think?

@Aariq
Copy link
Collaborator

Aariq commented Jun 12, 2020

Seems fine to me! Might be good to print the path and filename to the console with message() when verbose = TRUE, but I don't think it needs to output anything.

@Aariq Aariq mentioned this issue Jul 16, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consistent api Uniformity across functions and data sources
Projects
None yet
4 participants