Images #249
Comments
I was about to open the same issue^^
|
Up to now, I haven't thought of the |
I thought one option could be |
I would rather include the export functionality b/c the image is created by the web service anyway. We just have to call
Hm, not sure about keeping images in R. They can probably also get quite big. I would rather store them to disk and read them again when needed, using a non-webchem function. E.g. |
What do you prefer? |
I prefer |
I see your point regarding image size. However, if you wanted to use images of compounds as descriptors for an ML problem (I don't know if this is a reasonable thing to do), then returning a vector of images and keeping them in R would be much simpler than importing them from hard drive. |
Ok, sounds also good to me. |
Some image types can't be represented as R objects and only a download/export option would make sense. E.g. I think in #235 one of the file options is a PDF. |
I think the image functions should either return raster images or, in the case that they download image files, return the file paths (silently). That will make integration with things like
library(webchem)
library(pander)
library(knitr)
id <- c("Glyphosate", "Isoproturon", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
cir_img(id, bgcolor = "transparent", antialising = 0, width = 200, height = 200, symbolfontsize = 9, dir = here::here())
#> Querying: Glyphosate
#> https://cactus.nci.nih.gov/chemical/structure/Glyphosate/image?format=png&width=200&height=200&linewidth=2&symbolfontsize=9&bgcolor=transparent&csymbol=special&hsymbol=special
#> Image saved under: /private/var/folders/b_/2vfnxxls5vs401tmhhb3wqdh0000gp/T/RtmpH52OZ2/reprex9d0d4d96caea/Glyphosate.png
#> Querying: Isoproturon
#> https://cactus.nci.nih.gov/chemical/structure/Isoproturon/image?format=png&width=200&height=200&linewidth=2&symbolfontsize=9&bgcolor=transparent&csymbol=special&hsymbol=special
#> Image saved under: /private/var/folders/b_/2vfnxxls5vs401tmhhb3wqdh0000gp/T/RtmpH52OZ2/reprex9d0d4d96caea/Isoproturon.png
#> Querying: BSYNRYMUTXBXSQ-UHFFFAOYSA-N
#> https://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/image?format=png&width=200&height=200&linewidth=2&symbolfontsize=9&bgcolor=transparent&csymbol=special&hsymbol=special
#> Image saved under: /private/var/folders/b_/2vfnxxls5vs401tmhhb3wqdh0000gp/T/RtmpH52OZ2/reprex9d0d4d96caea/BSYNRYMUTXBXSQ-UHFFFAOYSA-N.png
paths <- list.files(here::here(), pattern = "\\.png$", full.names = TRUE) # pattern is a regular expression, not a glob
kable(data.frame(id = id, picture = pandoc.image.return(paths)))
Created on 2020-05-06 by the reprex package (v0.3.0) (pictures work when code is run locally with PR #250, but they don't show up in correct order. If |
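A likely cause of the ordering problem: list.files() returns file names sorted alphabetically, not in the order of the query. A minimal sketch of one way around it, assuming the files are named <id>.png as in the output above; img_dir is a hypothetical stand-in for the download directory.

```r
# Ids in the order they were queried (taken from the example above).
id <- c("Glyphosate", "Isoproturon", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N")

# Hypothetical download directory; the example above used here::here().
img_dir <- tempdir()

# Build the paths from the ids so that element i of paths matches id[i].
paths <- file.path(img_dir, paste0(id, ".png"))

# For comparison: sorted order, as list.files() would return the file names.
sort(basename(paths))
```

With paths built this way, data.frame(id = id, picture = ...) keeps each picture next to its own identifier.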
I have never worked with images as objects in R. Can all the common image formats (.png, .jpg, .gif, .svg) be imported as image objects? What class is that? Are these the raster images @Aariq mentioned in the above comment? Would I have to convert the images for that? Which function? Sorry for all the questions, I'm a bit puzzled. |
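For PNG files, one possible answer can be sketched with the png package (assumed to be installed): png::readPNG() returns a numeric array, and as.raster() converts that array into a "raster" object that plot() can draw. Other formats need other readers, so this covers PNG only. The tiny image is created locally so the sketch runs without a web service.

```r
library(png)

# Create a tiny 2 x 2 RGB image (values in [0, 1]) and write it to disk.
img <- array(c(1, 0, 0, 1,   # red channel
               0, 1, 0, 1,   # green channel
               0, 0, 1, 1),  # blue channel
             dim = c(2, 2, 3))
path <- tempfile(fileext = ".png")
writePNG(img, target = path)

# Read it back: readPNG() returns a numeric array (height x width x channels).
img_in <- readPNG(path)
class(img_in)
dim(img_in)

# Convert to a raster object; plot(r) would draw it on the active device.
r <- as.raster(img_in)
class(r)
```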
Thanks @Aariq for the comment. |
I have updated PR #250 to return (1) a file written to disk and (2) a data.frame containing the local path and the URL. What do you think? |
Working with images as objects:
We can get to
I like the idea of returning a data.frame! I think the query column would also be useful similarly to the |
Ok, cool. I really think adding dependencies such as magick and raster (?) is a bit too much for webchem. |
It's just very very polite:) |
Ok, now that we agree to write to disk, what function should we use? As always in R, there are many possibilities. I prefer httr:: style, just because it is similar to other
qurl = "https://cactus.nci.nih.gov/chemical/structure/Triclosan/image?format=png&width=500&height=500&linewidth=2&symbolfontsize=16&csymbol=special&hsymbol=special"
# httr
require(httr)
h <- try(
  GET(qurl,
      timeout(5),
      write_disk(tempfile(), overwrite = TRUE))
)
# base
download.file(qurl,
              destfile = tempfile(),
              mode = "wb") # binary mode, otherwise image files can get corrupted on Windows
# curl
curl::curl_download(qurl,
                    destfile = tempfile()) |
I personally prefer httr, also because of the nice status messages that can be used for verbose output, but as long as it doesn't involve another dependency, I'd rather not search for consensus here :) |
Sounds like consensus to me :) |
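A minimal sketch of how the httr status information mentioned above could feed the verbose output. download_img() is a hypothetical helper, not a webchem function; it only uses httr calls that exist (GET(), timeout(), write_disk(), http_error(), http_status()).

```r
library(httr)

# Hypothetical helper: download one image to `path`.
# Returns the path on success and NA_character_ on failure.
download_img <- function(qurl, path, verbose = TRUE) {
  res <- try(
    GET(qurl, timeout(5), write_disk(path, overwrite = TRUE)),
    silent = TRUE
  )
  # Connection problems (time-out, refused connection, ...) end up here.
  if (inherits(res, "try-error")) {
    if (verbose) message("Request failed: ", qurl)
    unlink(path)
    return(NA_character_)
  }
  if (verbose) message(http_status(res)$message)
  # HTTP errors (status >= 400) still write a response body; remove it.
  if (http_error(res)) {
    unlink(path)
    return(NA_character_)
  }
  path
}

# Nothing listens on this local port, so the call fails without internet access.
out <- download_img("http://127.0.0.1:9/none.png", tempfile(fileext = ".png"),
                    verbose = FALSE)
```

Returning NA for failed queries keeps the result the same length as the input, which fits a data.frame output.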
I think I've had a bit of change of heart on this issue. I think to keep things simple, image functions could return png files as R objects (with |
Thanks @Aariq for the comment! Would you store multiple outputs in an R list or directly plot them then? Agreeing:
My concerns:
Code example for the CIR image service comparing the two approaches
# setup -------------------------------------------------------------------
require(httr)
query = c(
  "25057-89-0", "72-43-5", "640-15-3", "82-68-8", "50471-44-8",
  "13457-18-6", "94-82-6", "93-76-5", "15972-60-8", "834-12-8",
  "1912-24-9", "2642-71-9", "314-40-9", "1689-84-5", "470-90-6",
  "1698-60-8", "1982-47-4", "2921-88-2", "15545-48-9", "21725-46-2",
  "52315-07-8", "6190-65-4", "30125-63-4", "1007-28-9", "1014-69-3",
  "333-41-5", "120-36-5", "62-73-7", "115-32-2", "83164-33-4",
  "60-51-5", "298-04-4", "330-54-1", "38260-54-7", "122-14-5",
  "93-72-1", "67564-91-4", "55-38-9", "96525-23-4", "51235-04-2"
)
# Option 1: Store as R object ---------------------------------------------
img_l = list()
for (i in query) {
  # url
  qurl = paste("https://cactus.nci.nih.gov/chemical/structure",
               i,
               "image?format=png&width=700&height=700",
               sep = '/')
  # query
  Sys.sleep(1.5)
  res = GET(qurl,
            timeout(5))
  img = png::readPNG(res$content)
  img_l[[i]] = img
  message('Processed: ', i, ' (stored in list).') # ~ 8KB / image: object.size(img)
}
object.size(img_l) # already 470 MB in memory
# Option 2: Save local ----------------------------------------------------
for (i in query) {
  # url
  qurl = paste("https://cactus.nci.nih.gov/chemical/structure",
               i,
               "image?format=png&width=700&height=700",
               sep = '/')
  # query
  Sys.sleep(1.5)
  GET(qurl,
      timeout(5),
      write_disk(file.path(tempdir(), paste0(i, ".png")), overwrite = TRUE))
  # ~ 8KB / image
  message('Processed: ', i, ' (saved to disk).')
}
This reflects just my opinion. If you, @Aariq and @stitam, prefer including images as R objects, I can also live with it :) |
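The 470 MB figure is consistent with how decoded images are stored: png::readPNG() returns an array of doubles, so every pixel channel takes 8 bytes no matter how small the compressed file is. A rough check, assuming 700 x 700 pixels and three colour channels (the channel count is an assumption, not stated in the thread):

```r
# Stand-in for one decoded 700 x 700 image with 3 channels, stored as doubles.
img <- array(0, dim = c(700, 700, 3))

# Expected payload: pixels * channels * 8 bytes per double.
bytes_per_img <- 700 * 700 * 3 * 8
bytes_per_img              # 11760000 bytes, roughly 11.8 MB per image
bytes_per_img * 40 / 1e6   # 470.4 MB for the 40 queries above

# object.size() reports the payload plus a small header.
object.size(img)
```

On disk the same images stay at a few KB each because PNG is compressed.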
Just some ideas how others manage images: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0405-0. |
I am ok with removing
|
Hm, that sounds convincing @stitam and would also greatly reduce the RAM footprint.
require(httr)
require(dplyr)
query = c(
  "25057-89-0", "72-43-5", "640-15-3", "82-68-8", "50471-44-8",
  "13457-18-6", "94-82-6", "93-76-5", "15972-60-8", "834-12-8",
  "1912-24-9", "2642-71-9", "314-40-9", "1689-84-5", "470-90-6",
  "1698-60-8", "1982-47-4", "2921-88-2", "15545-48-9", "21725-46-2",
  "52315-07-8", "6190-65-4", "30125-63-4", "1007-28-9", "1014-69-3",
  "333-41-5", "120-36-5", "62-73-7", "115-32-2", "83164-33-4",
  "60-51-5", "298-04-4", "330-54-1", "38260-54-7", "122-14-5",
  "93-72-1", "67564-91-4", "55-38-9", "96525-23-4", "51235-04-2"
)
# dummy function
fun_img <- function(query, directory = NULL) {
  foo <- function(query, directory) {
    qurl = paste("https://cactus.nci.nih.gov/chemical/structure",
                 query,
                 "image?format=png&width=700&height=700",
                 sep = '/')
    # query
    Sys.sleep(1.5)
    if (is.null(directory)) {
      message("Retrieving raw image: ", query)
      res <- GET(qurl,
                 timeout(5))
      content(res, type = "image")
    } else {
      message("Storing image: ", query, " to: ", directory)
      GET(qurl,
          timeout(5),
          write_disk(file.path(directory, paste0(query, ".png")), # depending on the webservice, more options here
                     overwrite = TRUE))
      data.frame(query = query,
                 qurl = qurl,
                 stringsAsFactors = FALSE)
    }
  }
  l <- lapply(query, foo, directory = directory)
  names(l) <- query
  if (is.null(directory)) {
    l
  } else {
    dplyr::bind_rows(l)
  }
}
img_l = fun_img(query[1:2])
lapply(img_l, head)
img_df = fun_img(query[1:2], directory = tempdir())
img_df
|
Good points about RAM. It looks like |
|
I think we should implement the functions as downloaders for now, because that works both with web services that provide urls, and with web services that don't, e.g. ChemSpider. We can always extend that functionality later. |
We should make sure @gjgetzinger is looped in on this decision, since it probably changes some of our suggestions on PR #235 |
I've been following the discussion. No objections from my side. I'm happy to make any adjustments to PR #235 as needed. (full disclosure, I am in the middle of a job change and relocation, so will be a minute before I make any changes) |
Good luck with your new endeavor! I relocated during last August:) |
To sum up, we agree to download the images and return a data.frame with the paths to the downloaded images. If no directory is supplied they will be downloaded to
Argument examples (* mandatory ones):
Should we name the directory argument |
If the output is just file paths, I'd say return it as a vector, silently. |
I'd also prefer data.frame because then we have query and result, similarly to |
I also think it's the best idea to have a mandatory |
Do you think it is useful to return anything to the console, apart from the optional verbose messages? If we want to import the downloaded images into R later, we can easily construct the paths in the same order with |
Seems fine to me! Might be good to print the path and filename to the console with |
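The agreed behaviour (download to a directory, report each saved file when verbose, return a data.frame of query and path) could be sketched roughly as follows. sketch_img() and its download_fun argument are hypothetical and only illustrate the interface; download_fun exists so that the example can run offline with a stub. The politeness delay between requests used earlier in the thread is left out for brevity.

```r
# Hypothetical sketch of the agreed interface; not webchem's actual code.
sketch_img <- function(query, dir, verbose = TRUE,
                       download_fun = function(url, path) {
                         httr::GET(url, httr::timeout(5),
                                   httr::write_disk(path, overwrite = TRUE))
                       }) {
  stopifnot(dir.exists(dir))
  # URL scheme of the CIR image service as used earlier in this thread.
  url <- paste("https://cactus.nci.nih.gov/chemical/structure",
               query, "image?format=png", sep = "/")
  # One file per query, named after the query.
  path <- file.path(dir, paste0(query, ".png"))
  for (i in seq_along(query)) {
    download_fun(url[i], path[i])
    if (verbose) message("Image saved under: ", path[i])
  }
  # Return silently, so nothing is printed unless the result is inspected.
  invisible(data.frame(query = query, path = path, url = url,
                       stringsAsFactors = FALSE))
}

# Offline usage with a stub that creates an empty file instead of downloading.
stub <- function(url, path) file.create(path)
out <- sketch_img(c("Glyphosate", "Isoproturon"), dir = tempdir(),
                  verbose = FALSE, download_fun = stub)
out$path
```

Because the rows follow the order of query, the paths can be passed on to other packages without re-listing the directory.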
Some web services allow us to retrieve images of substances. There are already implementations in PR #235 and PR #247. Also ChemSpider and PubChem both offer such functionality. I thought it would be good to open a small discussion about design considerations to ensure consistency.
*_view(), *_viewer(), *_img(), *_image(). Let's choose one and stick to it. I prefer *_img(), as proposed by @andschar, e.g. actor_img(), pc_img().
query, from, format, width, height, verbose, in this order?
output = c('image', 'download') as suggested by @gjgetzinger, and another for path.
On first thought, I think I would prefer these functions to return a list of raw vectors. I tried png, svg, jpeg; these can all be accessed e.g. through httr::content(httr::GET(url), type = "image"). Raw vectors can then be rendered easily with image processing packages like magick, e.g. plot(as.raster(image_read(raw))), and also exported in any format, but we wouldn't have to import them as dependencies, so webchem could remain lightweight.
What do you think?