---
title: "Share and Production"
output: html_notebook
---
## 11.1 - Publish dashboard

1. Open the dashboard `app.R` file

2. Click on File

3. Click on Publish

4. In the *Connect Account* dialog, click Next

5. Select RStudio Connect

6. Copy and paste **your** RStudio Server URL and add `/rsconnect`

7. Enter your credentials

8. Complete the form

9. Click Proceed

10. Click on Connect

11. Click Publish
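
The same publication can be sketched in code with the `rsconnect` package, which can help when the dashboard needs to be redeployed often. This is a minimal sketch, not the workshop's method: `connect_url()` and `publish_dashboard()` are hypothetical helpers, the server address and the `workshop-connect` name are placeholders, and how `addServer()` treats the URL may differ between `rsconnect` versions.

```r
# Hypothetical helper: append "/rsconnect" to the RStudio Server URL (step 6),
# dropping any trailing slash first
connect_url <- function(server_url) {
  paste0(sub("/+$", "", server_url), "/rsconnect")
}

# Placeholder address, replace with your own server
example_url <- connect_url("http://my-server.example.com/")
print(example_url)

# Hypothetical wrapper around rsconnect calls; it is only defined here, not run
publish_dashboard <- function(server_url, app_dir = ".",
                              server_name = "workshop-connect") {
  # Register the server under a local nickname
  rsconnect::addServer(url = connect_url(server_url), name = server_name)
  # Opens a browser window to enter your credentials
  rsconnect::connectUser(server = server_name)
  # Deploys the folder that contains app.R
  rsconnect::deployApp(appDir = app_dir, server = server_name)
}
```

Calling `publish_dashboard()` with your own URL from the folder that holds `app.R` would roughly mirror steps 4 through 11.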

## 11.2 - Schedule scoring

1. Create a new **RMarkdown** document

2. Start the new document by loading the needed libraries, connecting to the database, and setting `table_flights`
```{r, eval = FALSE}
library(tidyverse)
library(dbplyr)
library(tidypredict)
library(DBI)
library(lubridate)
con <- DBI::dbConnect(odbc::odbc(), "Postgres Dev")
table_flights <- tbl(con, in_schema("datawarehouse", "flight"))

```

3. Read the parsed model saved in exercise 5.6
```{r}
parsedmodel <- yaml::read_yaml("my_model.yml")
```

4. Copy the code from exercise 5.5, step 4. Assign the result to a variable called `predictions`, and change the model variable to `parsedmodel`
```{r}
predictions <- table_flights %>%
  filter(month == 2,
         dayofmonth == 1) %>%
  mutate(
    season = case_when(
      month >= 3 & month <= 5  ~ "Spring",
      month >= 6 & month <= 8  ~ "Summer",
      month >= 9 & month <= 11 ~ "Fall",
      month == 12 | month <= 2  ~ "Winter"
    )
  ) %>%
  select( season, depdelay) %>%
  tidypredict_to_column(parsedmodel) %>%
  remote_query()
```
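
To check the season logic without a database, the same `case_when()` rules can be run on a small local table. This is a sketch for illustration only: the `months` table is made up, and on the real `table_flights` the rules are translated to SQL instead of being run in R.

```r
library(dplyr)

# Made-up local data: one row per calendar month
months <- tibble(month = 1:12)

seasons <- months %>%
  mutate(
    season = case_when(
      month >= 3 & month <= 5  ~ "Spring",
      month >= 6 & month <= 8  ~ "Summer",
      month >= 9 & month <= 11 ~ "Fall",
      month == 12 | month <= 2 ~ "Winter"
    )
  )

# How many months fall in each season
season_counts <- count(seasons, season)
print(season_counts)
```

Every month matches exactly one rule, so no row ends up with a missing season.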

5. Change the `select()` verb to include `flightid`, and rename it to `p_flightid`
```{r, eval = FALSE}
# Replaces the select() line in the pipeline from step 4
select(p_flightid = flightid, season, depdelay) %>%
```


6. At the end of the document, append the code that builds and runs the SQL update inside the database
```{r}
update_statement <- build_sql(
  "UPDATE datawarehouse.flight SET nasdelay = fit FROM (",
  predictions,
  ") as p ",
  "WHERE flightid = p_flightid",
  con = con
)
# dbExecute() runs a data-modifying statement and returns the number of rows affected
dbExecute(con, update_statement)
```
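
To see the shape of the statement that step 6 produces, the pieces can be pasted together with a stand-in query. This is only a sketch: `predictions_sql` is a hypothetical placeholder, not real `tidypredict` output, and `paste0()` is used here for display, whereas the exercise relies on `build_sql()` and the live connection.

```r
# Hypothetical placeholder for the SQL text returned by remote_query()
predictions_sql <- "SELECT flightid AS p_flightid, 0 AS fit FROM datawarehouse.flight"

# Same pieces, in the same order, as the build_sql() call in step 6
update_sql <- paste0(
  "UPDATE datawarehouse.flight SET nasdelay = fit FROM (",
  predictions_sql,
  ") as p ",
  "WHERE flightid = p_flightid"
)

cat(update_sql, "\n")
```

The subquery aliased as `p` supplies one `fit` value per `p_flightid`, and the `WHERE` clause matches each value to the row it updates.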

7. `knit` the document to confirm it works

8. Click on File and then Publish

9. Select *Publish just this document*.  Confirm that the `my_model.yml` file is included in the list of files to be published.

10. In RStudio Connect, select `Schedule`

11. Click on `Schedule output for default`

12. Click on `Run every weekday (Monday to Friday)`

13. Click Save

## 11.3 - Scheduled pipeline

1. Create a new **RMarkdown** document

2. Copy the code from the **Class catchup** section in Spark Pipeline, unit 8
```{r}
library(tidyverse)
library(sparklyr)
library(lubridate)
top_rows <- read.csv("/usr/share/flights/data/flight_2008_1.csv", nrows = 5)
file_columns <- top_rows %>%
  rename_all(tolower) %>%
  map(function(x) "character")
conf <- spark_config()
conf$`sparklyr.cores.local` <- 4
conf$`sparklyr.shell.driver-memory` <- "8G"
conf$spark.memory.fraction <- 0.9
sc <- spark_connect(master = "local", config = conf, version = "2.0.0")
spark_flights <- spark_read_csv(
  sc,
  name = "flights",
  path = "/usr/share/flights/data/",
  memory = FALSE,
  columns = file_columns,
  infer_schema = FALSE
)
```

3. Move the *saved_model* folder under */tmp*

4. Copy all the code from exercise 8.3 starting with step 2
```{r, eval = FALSE}
reload <- ml_load(sc, "saved_model")
reload
library(lubridate)
current <- tbl(sc, "flights") %>%
  filter(
    month == !! month(now()),
    dayofmonth == !! day(now())
  )
show_query(current)
head(current)
new_predictions <- ml_transform(
  x = reload,
  dataset = current
)
new_predictions %>%
  summarise(late_flights = sum(prediction, na.rm = TRUE))
```
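
The `!!` operator in the filter above makes R compute `month(now())` and `day(now())` locally, so Spark only receives plain numbers. Because the values come from the current time, each scheduled run picks up that day's flights. A small sketch of the idea, using a fixed timestamp (an assumption, so the result is reproducible) in place of `now()`:

```r
library(lubridate)

# Fixed timestamp for the example; the exercise uses now() instead
run_time <- ymd_hms("2008-02-01 08:00:00", tz = "UTC")

# These values are computed in R, before anything is sent to Spark
run_month <- month(run_time)
run_day <- day(run_time)

# `!!` inserts the computed value into the expression
month_filter <- rlang::expr(month == !!run_month)
print(month_filter)
```

The printed expression contains the number itself, not a call to `month()`, which is what gets translated to SQL.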

5. Change the `ml_load()` location to `"/tmp/saved_model"`

6. Close the Spark session
```{r}
spark_disconnect(sc)
```

7. `knit` the document to confirm it works

8. Click on File and then Publish

9. Select *Publish just this document*

10. Click *Publish anyway* on the warning

11. In RStudio Connect, select `Schedule`

12. Click on `Schedule output for default`

13. Click on `Run every weekday (Monday to Friday)`

14. Click Save


## 11.4 - Scheduled re-fitting

1. Create a new **RMarkdown** document

2. Copy the code from the **Class catchup** section in Spark Pipeline, unit 8
```{r}
library(tidyverse)
library(sparklyr)
library(lubridate)
top_rows <- read.csv("/usr/share/flights/data/flight_2008_1.csv", nrows = 5)
file_columns <- top_rows %>%
  rename_all(tolower) %>%
  map(function(x) "character")
conf <- spark_config()
conf$`sparklyr.cores.local` <- 4
conf$`sparklyr.shell.driver-memory` <- "8G"
conf$spark.memory.fraction <- 0.9
sc <- spark_connect(master = "local", config = conf, version = "2.0.0")
spark_flights <- spark_read_csv(
  sc,
  name = "flights",
  path = "/usr/share/flights/data/",
  memory = FALSE,
  columns = file_columns,
  infer_schema = FALSE
)
```

3. Move the *saved_pipeline* folder under */tmp*

4. Copy all the code from exercise 8.4 
```{r}
pipeline <- ml_load(sc, "/tmp/saved_pipeline")
pipeline
sample <- tbl(sc, "flights") %>%
  sample_frac(0.001) 
new_model <- ml_fit(pipeline, sample)
new_model
ml_save(new_model, "new_model", overwrite = TRUE)
list.files("new_model")
spark_disconnect(sc)
```
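
A scheduled document runs unattended, so it can help to stop early with a clear message when the pipeline folder from step 3 is missing. This is a minimal sketch in base R: `require_folder()` is a hypothetical helper, not part of `sparklyr`, and the paths below are temporary folders created only for the demonstration.

```r
# Hypothetical helper: fail with a readable message if a folder is missing
require_folder <- function(path) {
  if (!dir.exists(path)) {
    stop("Expected folder not found: ", path, call. = FALSE)
  }
  invisible(path)
}

# Demonstration with a temporary folder that exists
demo_dir <- file.path(tempdir(), "saved_pipeline_demo")
dir.create(demo_dir, showWarnings = FALSE)
found <- require_folder(demo_dir)

# Demonstration with a folder that does not exist; capture the error message
missing_msg <- tryCatch(
  require_folder(file.path(tempdir(), "does_not_exist_demo")),
  error = function(e) conditionMessage(e)
)
print(missing_msg)
```

In the scheduled document, a call such as `require_folder("/tmp/saved_pipeline")` placed before `ml_load()` would surface the problem in the rendered output.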

5. Make sure the `ml_load()` location is `"/tmp/saved_pipeline"`

6. `knit` the document to confirm it works

7. Click on File and then Publish

8. Select *Publish just this document*

9. Click *Publish anyway* on the warning

10. In RStudio Connect, select `Schedule`

11. Click on `Schedule output for default`

12. On the *Schedule Type* dropdown, select *Monthly*

13. Click Save