Five Ways to Schedule R scripts on Google Cloud Platform
Scheduling scripts advice
But first, some notes I've picked up on the scripts you are scheduling.
Don't save data to the scheduling server
I would suggest not saving or using data in the same place you are doing the scheduling. Use a service like BigQuery (bigQueryR
) or Cloud Storage (googleCloudStorageR
) to first load any necessary data, do your work, then save it out again. This may be a bit more complicated to set up, but it will save you tears if the VM or service goes down - you still have your data.
To help with this, on Google Cloud you can authenticate with the same details you used to launch a VM to authenticate with the storage services above (as all are covered under the https://www.googleapis.com/auth/cloud-platform
scope) - you can access this auth when on a GCE VM in R via googleAuthR::gar_gce_auth()
An example skeleton script is shown below that may be something like what you are scheduling.
It downloads authentication files, makes an API call, then saves the results back up to the cloud again:
```r
library(googleAuthR)
library(googleCloudStorageR)

gcs_global_bucket("my-bucket")

## auth on the VM
options(googleAuthR.scopes.selected = "https://www.googleapis.com/auth/cloud-platform")
gar_gce_auth()

## use the GCS auth to download the auth files for your API
auth_file <- "auth/my_auth_file.json"
gcs_get_object(auth_file, saveToDisk = TRUE)

## now auth with the file you just downloaded
gar_auth_service(auth_file)

## do your work with APIs etc.
## .....

## upload results back up to GCS (or BigQuery, etc.)
gcs_upload(my_results, name = "results/my_results.csv")
```
Set up a schedule for logs too
Logs are important for scheduled jobs, so you have some idea of what happened when things go wrong. To help with debugging schedules, most googleAuthR
packages now have a timestamp on their output messages.
You can send the output of your scripts to log files; if using cron and Rscript it looks something like this:

```sh
Rscript /your-r-script.R > your-r-script.log
```
…where `>` sends the output to the log file.
Over time though, this can get big and (sigh) fill up your disk so you can't log in to the VM (speaking from experience here!), so I now set up another scheduled job that every week takes the logs, uploads them to GCS, then deletes the current ones.
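As an illustration, a minimal sketch of what that weekly clean-up job could look like in R - the bucket and log file names are placeholders, not from the original setup, and it assumes auth is already configured (e.g. via gar_gce_auth() on the VM):

```r
library(googleCloudStorageR)

gcs_global_bucket("my-bucket")

log_file <- "your-r-script.log"
if (file.exists(log_file)) {
  ## datestamp the upload so each week's logs are kept apart
  gcs_upload(log_file, name = paste0("logs/", Sys.Date(), "-", log_file))
  ## delete the local copy so the disk doesn't fill up
  unlink(log_file)
}
```

Scheduled weekly via cron, e.g. `0 3 * * 0 Rscript /rotate-logs.R`.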
Consider using Docker for environments
Several of the methods below use Docker
.
The reason for that is Docker provides a nice reproducible way to define exactly what packages and dependencies you need for your script to run, and it can run on top of any type of infrastructure, as Docker
has quickly become a cloud standard.
For instance, migrating from Google Cloud to AWS is much easier if both can be deployed using Docker, and below, Docker is instrumental in allowing you to run on multiple solutions.
Bear in mind that when a Docker container relaunches it won't keep any data, so any unsaved state will be lost (you should build a new container if you need it to contain data), but you're not saving your data to the Docker container anyway, are you?
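For a flavour of what such an environment definition looks like, here is a minimal sketch of a Dockerfile for an R script - the base image, package list and script name are illustrative assumptions, not from the original post:

```dockerfile
# pin an R version so the environment is reproducible
FROM rocker/r-ver:3.4.1

# install2.r ships with the rocker images via littler
RUN install2.r --error googleAuthR googleCloudStorageR

# copy in the script and run it when the container starts
COPY schedule.R /schedule.R
CMD ["Rscript", "/schedule.R"]
```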
Scheduling options - Pros and cons
edit 2020-05-22 - I've added googleCloudRunner as an option, and this I think is the best one at the moment
Here is an overview of the pros and cons of the options presented in more detail below:
Method | Pros | Cons |
---|---|---|
1. cronR addin on RStudio Server | Simple and quick | Not so robust, need to log into the server to make changes, package versioning |
2. gce_vm_scheduler and Dockerfiles | Robust and can launch from a local R session, supports versioning | Need to build Dockerfiles, all scripts on one VM |
3. Master & Slave VMs | Tailor a fresh VM for each script, cheaper | Need to build Dockerfiles, more complicated VM setup |
4. Google App Engine with flexible containers | Managed platform | Need to turn the script into web responses, more complicated setup |
5. googleCloudRunner scheduled R scripts | Modern and serverless, most recommended | Still need to know a bit of Docker |
1 - cronR plus RStudio Server
This is the simplest and the one to start with.
- Start up an RStudio Server instance
- Install cronR
- Upload your R script
- Schedule your script using the cronR RStudio addin
With googleComputeEngineR
and the new gcer-public
project containing public images, including one with cronR
already installed, this is as simple as the few lines of code below:
```r
library(googleComputeEngineR)

## get the tag for the prebuilt Docker image with googleAuthRverse, cronR and tidyverse
tag <- gce_tag_container("google-auth-r-cron-tidy", project = "gcer-public")
# gcr.io/gcer-public/google-auth-r-cron-tidy

## start a custom RStudio instance
vm <- gce_vm(name = "my-rstudio",
             predefined_type = "n1-highmem-8",
             template = "rstudio",
             dynamic_image = tag,
             username = "me",
             password = "mypassword")
```
Wait for it to launch and give you an IP, then log in, upload a script and configure the schedule via the cronR
addin.
Some more detail about this workflow can be found in the custom RStudio instance workflows on the googleComputeEngineR
website.
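If you prefer code to the addin, the same schedule can be set with cronR's own functions - a minimal sketch, with the script path and timing as placeholders:

```r
library(cronR)

## build the Rscript command for the uploaded script
cmd <- cron_rscript("/home/me/your-r-script.R")

## run it every day at 03:00
cron_add(cmd, frequency = "daily", at = "03:00", id = "my-scheduled-script")
```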
2 - gce_vm_scheduler and Dockerfiles
I prefer this method to the one above, since it lets you create the exact environment (e.g. package versions, dependencies) to run your script in, and lets you keep dev and production versions in step. It also works from a local R session, without needing to log into the server each time you deploy a script.
Recipe
Putting it all together then, documentation of this workflow for scheduling R scripts is found here.
- If you don't already have one, start up a scheduler VM using gce_vm_scheduler
- Create a Dockerfile, either manually or using containerit, that will run your script upon execution
- Upload the Dockerfile to a git repo (private or public)
- Set up a build trigger for that Dockerfile
- Once built, set the script inside that Dockerfile on a schedule with gce_schedule_docker
This is still in beta at the time of writing but should be stable by the time googleComputeEngineR
hits CRAN 0.2.0
.
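As a rough sketch of the first and last steps above - the image name and cron string are placeholders, and argument names may differ slightly between package versions:

```r
library(googleComputeEngineR)

## create (or fetch) the small VM that hosts the schedules
vm <- gce_vm_scheduler("my-scheduler")

## schedule the Docker image built from your Dockerfile to run daily at 04:30
gce_schedule_docker("gcr.io/your-project/your-scheduled-script",
                    schedule = "30 4 * * *",
                    vm = vm)
```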
3 - Master and Slave VMs
Some scripts take more resources than others, and since you are using VMs already you can have more control over the specification of VM to launch based on the script you want to run.
This means you can have a cheap scheduler server that launches bigger VMs for the duration of the job. As GCP charges per minute, this can save you money over having a schedule server as big as what your most expensive script needs running 24/7.
This method is largely like the scheduled scripts above, except in this case the scheduled script also launches the VMs to run the task upon.
Using googleCloudStorageR::gcs_source
you can run an R script directly from where it is hosted on GCS, meaning all data, authentication files and scripts can be kept separate from the computation. An example master script is shown below:
```r
## intended to be run on a small instance via cron
## use this script to launch other VMs with more expensive tasks
library(googleComputeEngineR)
library(googleCloudStorageR)

gce_global_project("my-project")
gce_global_zone("europe-west1-b")
gcs_global_bucket("your-gcs-bucket")

## auth to the same project we're on
googleAuthR::gar_gce_auth()

## launch the premade VM
vm <- gce_vm("slave-1")

## set up SSH to use the 'master' username as configured before
vm <- gce_ssh_setup(vm, username = "master", ssh_overwrite = TRUE)

## run the script on the VM, which will source it from GCS
runme <- "Rscript -e \"googleAuthR::gar_gce_auth();googleCloudStorageR::gcs_source('download.R', bucket = 'your-gcs-bucket')\""
out <- docker_cmd(vm,
                  cmd = "exec",
                  args = c("rstudio", runme),
                  wait = TRUE)

## once finished, stop the VM
gce_vm_stop(vm)
```
More detail is again available at the googleComputeEngineR
website.
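Note the master script assumes the slave VM already exists. A hedged sketch of creating it once up front, reusing the gce_vm() call from earlier - the name, machine type and credentials are placeholders:

```r
library(googleComputeEngineR)

## create the bigger worker VM once; the master script then just
## starts and stops it around each job
vm <- gce_vm(name = "slave-1",
             predefined_type = "n1-highmem-8",
             template = "rstudio",
             username = "master",
             password = "mypassword")

## stop it so it only costs money while a job is running
gce_vm_stop(vm)
```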
4 - Google App Engine with flexible custom runtimes
Google App Engine has always had schedule options, but only for its supported languages of Python, Java, PHP etc. Now with the introduction of flexible containers, any Docker container running any language (including R) can also be run.
This is potentially the best solution since it runs on a 100% managed platform, meaning you don't need to worry about servers at all, and it takes care of things like server maintenance, logging etc.
Setting up your script for App Engine
There are some requirements for the container that need configuring so it can run:
- You can not use googleAuthR::gar_gce_auth(), so you will need to upload the auth token within the Dockerfile.
- App Engine expects a web service to be listening on port 8080, so your scheduled script needs to be triggered via HTTP requests.
For authentication, I use the system environment variables (i.e. those normally set in .Renviron
) that googleAuthR
packages use for auto-authentication. Put the auth file (such as a JSON or .httr-oauth
file) into the deployment folder, then point to its location by specifying it in the app.yaml
. Details below.
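A quick sketch of how that auto-authentication behaves (the file name is a placeholder): if the environment variable is visible when the package loads, googleCloudStorageR authenticates itself on attach, so no explicit auth call is needed in the script.

```r
## in .Renviron locally, or under env_variables in app.yaml on App Engine:
## GCS_AUTH_FILE=auth.json

## with that set, loading the package authenticates automatically
library(googleCloudStorageR)

## calls now work without an explicit auth step
gcs_list_objects(bucket = "mark-edmondson-public-files")
```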
To solve the need for a web service on port 8080 (which is then proxied to the normal web ports 80/443), plumber
is a neat package by Jeff Allen of RStudio, which already comes with its own Docker solution. You can then modify that Dockerfile
slightly so that it works on App Engine.
Recipe
To then schedule your R script on App Engine, follow the guide below, first making sure you have set up the gcloud CLI.
- Create a Google App Engine project in the US region (the only region that supports flexible containers at the moment)
- Create a scheduled script, e.g. schedule.R - you can use auth from environment files specified in app.yaml.
- Make an API out of the script by using plumber - example:
```r
library(googleAuthR)          ## authentication
library(googleCloudStorageR)  ## google cloud storage
library(readr)

## gcs auto-authenticated via environment file
## pointed to via sys.env GCS_AUTH_FILE

#* @get /demoR
demoScheduleAPI <- function(){

  ## download or do something
  something <- tryCatch({
      gcs_get_object("schedule/test.csv",
                     bucket = "mark-edmondson-public-files")
    }, error = function(ex) {
      NULL
    })

  something_else <- data.frame(X1 = 1,
                               time = Sys.time(),
                               blah = paste(sample(letters, 10, replace = TRUE), collapse = ""))
  something <- rbind(something, something_else)

  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))
  write.csv(something, file = tmp, row.names = FALSE)

  ## upload something
  gcs_upload(tmp,
             bucket = "mark-edmondson-public-files",
             name = "schedule/test.csv")

  message("Done", Sys.time())
}
```
- Create the Dockerfile. If using containerit, replace the FROM with trestletech/plumber and add the lines below to use the correct App Engine port:
Example:
```r
library(containerit)

dockerfile <- dockerfile("schedule.R", copy = "script_dir", soft = TRUE)
write(dockerfile, file = "Dockerfile")
```
Then change/add these lines in the created Dockerfile:
```dockerfile
EXPOSE 8080
ENTRYPOINT ["R", "-e", "pr <- plumber::plumb(commandArgs()[4]); pr$run(host='0.0.0.0', port=8080)"]
CMD ["schedule.R"]
```
An example final Dockerfile is below. This doesn't need to be built in, say, a build trigger, as it is built upon App Engine deployment.
```dockerfile
FROM trestletech/plumber
LABEL maintainer="mark"

RUN export DEBIAN_FRONTEND=noninteractive; apt-get -y update \
  && apt-get install -y libcairo2-dev \
    libcurl4-openssl-dev \
    libgmp-dev \
    libpng-dev \
    libssl-dev \
    libxml2-dev \
    make \
    pandoc \
    pandoc-citeproc \
    zlib1g-dev

RUN ["install2.r", "-r 'https://cloud.r-project.org'", "readr", "googleCloudStorageR", "Rcpp", "digest", "crayon", "withr", "mime", "R6", "jsonlite", "xtable", "magrittr", "httr", "curl", "testthat", "devtools", "hms", "shiny", "httpuv", "memoise", "htmltools", "openssl", "tibble", "remotes"]
RUN ["installGithub.r", "MarkEdmondson1234/googleAuthR@7917351", "hadley/rlang@ff87439"]

WORKDIR /payload/
COPY [".", "./"]

EXPOSE 8080
ENTRYPOINT ["R", "-e", "pr <- plumber::plumb(commandArgs()[4]); pr$run(host='0.0.0.0', port=8080)"]
CMD ["schedule.R"]
```
- Specify app.yaml for flexible containers as detailed here. Add any environment variables, such as auth files, that will be included in the same deployment folder.
Example:
```yaml
runtime: custom
env: flex

env_variables:
  GCS_AUTH_FILE: auth.json
```
- Specify cron.yaml for the schedule needed:
```yaml
cron:
- description: "test cron"
  url: /demoR
  schedule: every 1 hours
```
- You should now have these files in the deployment folder:
  - app.yaml - configuration of general app settings
  - auth.json - an authentication file specified in env variables or app.yaml
  - cron.yaml - specification of when your schedule runs
  - Dockerfile - specification of the environment
  - schedule.R - the plumber version of your script containing your endpoints
- Open the terminal in that folder, and deploy via `gcloud app deploy --project your-project` and the cron schedule via `gcloud app deploy cron.yaml --project your-project`. It will take a while (up to 10 mins) the first time.
- The App Engine should then be deployed at https://your-project.appspot.com/ - every GET request to https://your-project.appspot.com/demoR (or other endpoints you have specified in your R script) will run the R code. The cron example above will hit this endpoint every hour.
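To check the deployment by hand, a small sketch of making the same GET request cron will make (the project name is a placeholder):

```r
library(httr)

## trigger the scheduled endpoint manually
resp <- GET("https://your-project.appspot.com/demoR")
status_code(resp)  # 200 means the R code ran
```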
Logs for the instance are found here.
This approach is the most flexible, and offers a fully managed platform for your scripts. Scheduled scripts are only the start, since deploying this actually gives you a way to run R scripts in response to any HTTP request from any language - triggers could also include someone updating a spreadsheet, adding a file to a folder, pushing to GitHub etc., which opens up a lot of exciting possibilities. You can also scale it up to become a fully functioning R API.
5 - googleCloudRunner
This is now possible from 2020 as googleCloudRunner is on CRAN. This is now what I use day to day.
It's the culmination of the techniques described above, and attempts to automate most of the painful bits. It uses serverless services such as Cloud Build to build your R script.
If you want to use base R it is as simple as:
```r
library(googleCloudRunner)
cr_deploy_r("your-script.R", schedule = "5 15 * * *")
```
If you want to use your own R packages then make a Docker image with those packages within it, and supply it to the function:
```r
cr_deploy_r("your-script.R",
            schedule = "5 15 * * *",
            r_image = "gcr.io/gcer-public/googleauthr-verse")
```
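If you need to build that custom image first, googleCloudRunner can do it via Cloud Build too - a minimal sketch, assuming a local folder holding your Dockerfile (the folder and image names are placeholders):

```r
library(googleCloudRunner)

## build the Dockerfile in ./my-image/ on Cloud Build and
## push it to your project's Container Registry
cr_deploy_docker("my-image/",
                 image_name = "gcr.io/your-project/my-r-image")
```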
There is also an RStudio gadget with a wizard to help you schedule them. Get started by going through the resources and the use cases listed on the website.
Summary
Hopefully this has given you an idea of your options for scheduling R on Google Cloud. If you have any other easier workflows or suggestions for improvements please put them in the comments below!
Source: https://code.markedmondson.me/4-ways-schedule-r-scripts-on-google-cloud-platform/