Title: | Interact with 'Condor' from R via SSH |
---|---|
Description: | Interact with 'Condor' from R via SSH connection. Files are first uploaded from user machine to submitter machine, and the job is then submitted from the submitter machine to 'Condor'. Functions are provided to submit, list, and download 'Condor' jobs from R. 'Condor' is an open source high-throughput computing software framework for distributed parallelization of computationally intensive tasks. |
Authors: | Arni Magnusson [aut, cre], Nan Yao [aut], Jemery Day [ctb], Thomas Teears [ctb] |
Maintainer: | Arni Magnusson <[email protected]> |
License: | GPL-3 |
Version: | 3.0.0 |
Built: | 2024-11-12 03:24:07 UTC |
Source: | https://github.com/pacificcommunity/ofp-sam-condor |
Interact with Condor from R via SSH connection. Files are first uploaded from user machine to submitter machine, and the job is then submitted from the submitter machine to Condor. Functions are provided to submit, list, and download Condor jobs from R.
Condor is an open source high-throughput computing software framework for distributed parallelization of computationally intensive tasks.
Main interface:
condor_submit | submit |
condor_q | list queue |
condor_dir | list directories |
condor_download | download |
Stop and remove:
condor_rm | stop jobs |
condor_rmdir | remove directories |
Utilities:
condor_log | show log file |
dos2unix | convert line endings |
summary.condor_log | show log file summary |
ssh_exec_stdout | execute command |
unix2dos | convert line endings |
Arni Magnusson and Nan Yao, with contributions by Jemery Day and Thomas Teears.
https://github.com/PacificCommunity/ofp-sam-condor
condor uses the ssh package to connect to the Condor submitter machine.
List Condor run directories, either on submitter machine or on a local drive.
condor_dir(top.dir = "condor", local.dir = NULL, pattern = "*", report = TRUE, sort = "job.id", session = NULL, ...)
top.dir | top directory on submitter machine that contains Condor run directories. |
local.dir | local directory to examine instead of |
pattern | regular expression identifying which run directories to show. The default is to show all directories inside |
report | whether to return a detailed report of the run status in each directory. |
sort | column name or column number used to sort the report data frame. |
session | optional object of class |
... | passed to |
If the user passes top.dir that resembles a Windows local directory (drive letter, colon, forward slash), it is automatically interpreted as a local.dir. In other words, condor_dir("c:/myruns") and condor_dir(local.dir="c:/myruns") are equivalent.
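The drive-letter rule described above can be illustrated with a small sketch. The helper name `looks_like_windows_dir` and the exact regular expression are assumptions made for illustration and may differ from what the package does internally.

```r
# Hypothetical helper (not part of the package): TRUE when a path starts
# with a drive letter, a colon, and a forward slash
looks_like_windows_dir <- function(path) {
  grepl("^[A-Za-z]:/", path)
}

looks_like_windows_dir("c:/myruns")  # TRUE
looks_like_windows_dir("condor")     # FALSE
```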
The default value of session = NULL looks for a session object in the user workspace. This allows the user to run Condor functions without explicitly specifying the session.
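One way such a lookup could work is sketched below. The helper name `find_session` is hypothetical, and the sketch assumes that "user workspace" means the global environment; the package's own code may be organized differently.

```r
# Hypothetical helper (not part of the package): fall back to an object
# called 'session' in the global environment when none is supplied
find_session <- function(session = NULL) {
  if (is.null(session)) {
    if (!exists("session", envir = .GlobalEnv, inherits = FALSE))
      stop("no 'session' object found in the user workspace")
    session <- get("session", envir = .GlobalEnv, inherits = FALSE)
  }
  session
}
```

With this pattern, a user who has run session <- ssh_connect("servername") once can call the other functions without passing the session every time.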
A data frame containing details about each directory, or if report = FALSE a character vector of directory names.
If there are many Condor run directories, the report generation can take substantial time (one SSH execution per run directory). To quickly return a vector of directory names, pass report = FALSE.
Arni Magnusson.
condor_submit, condor_q, condor_dir, and condor_download provide the main Condor interface.
condor_rm stops Condor jobs and condor_rmdir removes directories on the submitter machine.
condor_log and summary.condor_log are called to produce the detailed report if report = TRUE.
condor-package gives an overview of the package.
## Not run:
# General workflow
session <- ssh_connect("servername")
condor_submit()
condor_q()
condor_dir()
condor_download()  # after job has finished

# Alternatively, examine runs on local drive
condor_dir(local.dir="myruns")
condor_dir("c:/myruns")
## End(Not run)
Download results from a Condor job.
condor_download(run.dir = NULL, local.dir = ".", top.dir = "condor", create.dir = FALSE, pattern = "End.tar.gz|condor.*(err|log|out)$", overwrite = FALSE, untar.end = TRUE, session = NULL)
run.dir | name of a Condor run directory inside |
local.dir | local directory to download to. |
top.dir | top directory on submitter machine that contains Condor run directories. |
create.dir | whether to create |
pattern | regular expression identifying which result files to download. Passing |
overwrite | whether to overwrite local files if they already exist. |
untar.end | whether to extract |
session | optional object of class |
The default value of run.dir = NULL looks for Condor job results in top.dir/local.dir. For example, if local.dir = "c:/yft/run01" then the default run.dir becomes "condor/run01".
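The example above can be reproduced in base R. This is only a sketch of one way to derive the remote directory from the last component of local.dir; it is not taken from the package source.

```r
# Sketch: combine top.dir with the last path component of local.dir
local.dir <- "c:/yft/run01"
top.dir <- "condor"
run.dir <- file.path(top.dir, basename(local.dir))
run.dir  # "condor/run01"
```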
The default value of pattern="End.tar.gz|condor.*(err|log|out)$" downloads End.tar.gz and Condor log files. For many analyses, it can be convenient to pack all results into End.tar.gz to make it easy to find, download, and manage output files.
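To see which files the default pattern selects, it can be applied to a vector of file names with grepl. The file names below are hypothetical and chosen only to illustrate matches and non-matches.

```r
# Default pattern from the condor_download usage
pattern <- "End.tar.gz|condor.*(err|log|out)$"

# Hypothetical contents of a run directory
files <- c("End.tar.gz", "condor.err", "condor.log", "condor.out",
           "doitall.sh", "Start.tar.gz")

selected <- files[grepl(pattern, files)]
selected  # "End.tar.gz" "condor.err" "condor.log" "condor.out"
```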
The default value of session = NULL looks for a session object in the user workspace. This allows the user to run Condor functions without explicitly specifying the session.
No return value, called for side effects.
Arni Magnusson.
condor_submit, condor_q, condor_dir, and condor_download provide the main Condor interface.
condor_rm stops Condor jobs and condor_rmdir removes directories on the submitter machine.
condor-package gives an overview of the package.
## Not run:
# General workflow
session <- ssh_connect("servername")
condor_submit()
condor_q()
condor_dir()
condor_download()  # after job has finished

# Alternatively, download specific run to specific folder
condor_download("01_this_model", "c:/myruns/01_this_model")
## End(Not run)
Show Condor log file from a run directory, either on submitter machine or on a local drive.
condor_log(run.dir = ".", top.dir = "condor", local.dir = NULL, session = NULL)
run.dir | name of a Condor run directory inside |
top.dir | top directory on submitter machine that contains Condor run directories. |
local.dir | local directory to examine instead of top.dir |
session | optional object of class |
The default value of session = NULL looks for a session object in the user workspace. This allows the user to run Condor functions without explicitly specifying the session.
Log file contents as an object of class condor_log.
The condor_log class is simply a "character" vector with a print.condor_log method.
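The idea of a character vector with its own print method can be sketched with a stand-in class. The class name `demo_log` and the placeholder text are hypothetical; the actual print.condor_log method may format its output differently.

```r
# A plain character vector tagged with a hypothetical S3 class
x <- c("first line", "second line")
class(x) <- "demo_log"

# Print method: show one element per line, without quotes or indices
print.demo_log <- function(x, ...) {
  cat(unclass(x), sep = "\n")
  invisible(x)
}

print(x)         # dispatches to print.demo_log
is.character(x)  # TRUE, the underlying data is still a character vector
```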
Arni Magnusson.
summary.condor_log shows Condor log file summary.
condor_dir lists Condor directories.
condor-package gives an overview of the package.
## Not run:
# Examine log files on submitter machine
session <- ssh_connect("servername")
condor_dir()
condor_log()
summary(condor_log())

# Alternatively, examine log file on local drive
condor_dir(local.dir="c:/myruns")
condor_log(local.dir="c:/myruns/01_this_model")
summary(condor_log(local.dir="c:/myruns/01_this_model"))
## End(Not run)
List the Condor job queue.
condor_q(all = FALSE, count = FALSE, global = FALSE, user = "", session = NULL)
condor_qq(all = TRUE, count = TRUE, global = TRUE, user = "", session = NULL)
all | whether to list jobs from all users. |
count | whether to only show the number of jobs. |
global | whether to list jobs submitted from all submitter machines. |
user | username to list jobs submitted by a given user. |
session | optional object of class |
The default value of session = NULL looks for a session object in the user workspace. This allows the user to run Condor functions without explicitly specifying the session.
Screen output from the condor_q shell command, or a table if count = TRUE.
The condor_q R function has the same defaults as the condor_q shell command, listing only jobs that were submitted by the current user from the current submitter machine.
The condor_qq alternative is the same function but with different default argument values, convenient for a quick overview of the queue.
Arni Magnusson.
condor_submit, condor_q, condor_dir, and condor_download provide the main Condor interface.
condor_rm stops Condor jobs and condor_rmdir removes directories on the submitter machine.
condor-package gives an overview of the package.
## Not run:
# General workflow
session <- ssh_connect("servername")
condor_submit()
condor_q()
condor_dir()
condor_download()  # after job has finished

# Alternatively, list number of jobs being run by each user
condor_q(all=TRUE, count=TRUE)
## End(Not run)
Stop Condor jobs.
condor_rm(job.id = NULL, all = FALSE, top.dir = "condor", session = NULL)
job.id | a vector of integers or directory names, indicating Condor jobs to stop. |
all | whether to stop all Condor jobs owned by user. |
top.dir | top directory on submitter machine that contains Condor run directories. |
session | optional object of class |
The top.dir argument only has an effect when job.id is a vector of directory names. For example, condor_rm("01_this") will stop the Condor job corresponding to directory condor/01_this on the submitter machine.
The default value of session = NULL looks for a session object in the user workspace. This allows the user to run Condor functions without explicitly specifying the session.
No return value, called for side effects.
Nan Yao and Arni Magnusson.
condor_submit, condor_q, condor_dir, and condor_download provide the main Condor interface.
condor_rm stops Condor jobs and condor_rmdir removes directories on the submitter machine.
condor-package gives an overview of the package.
## Not run:
# General workflow
session <- ssh_connect("servername")
condor_submit()
condor_q()
condor_dir()
condor_download()  # after job has finished

# Stop one or multiple jobs
condor_rm(123456)                    # stop one job (integer)
condor_rm(c(123456, 123789))         # stop two jobs (integers)
condor_rm("01_this")                 # stop one job (dirname)
condor_rm(c("01_this", "02_that"))   # stop two jobs (dirnames)
condor_rm(all=TRUE)                  # stop all jobs
## End(Not run)
Remove directories on the submitter machine.
condor_rmdir(run.dir, top.dir = "condor", quiet = FALSE, session = NULL)
run.dir | name of a Condor run directory inside |
top.dir | top directory on submitter machine that contains Condor run directories. |
quiet | whether to suppress messages. |
session | optional object of class |
The default value of session = NULL looks for a session object in the user workspace. This allows the user to run Condor functions without explicitly specifying the session.
No return value, called for side effects.
Arni Magnusson.
condor_submit, condor_q, condor_dir, and condor_download provide the main Condor interface.
condor_rm stops Condor jobs and condor_rmdir removes directories on the submitter machine.
condor-package gives an overview of the package.
## Not run:
# General workflow
session <- ssh_connect("servername")
condor_submit()
condor_q()
condor_dir()
condor_download()  # after job has finished

# Remove one or more directories
condor_rmdir("01_this")                  # remove ~/condor/01_this (one run)
condor_rmdir(c("01_this", "02_that"))    # remove two model runs inside condor
condor_rmdir("test_runs", top.dir=".")   # remove ~/test_runs (many subdirs)
## End(Not run)
Submit a Condor job.
condor_submit(local.dir = ".", run.dir = NULL, top.dir = "condor", unix = "\\.sh$", exclude = "condor_mfcl|tar.gz|End", session = NULL)
local.dir | local directory containing a Condor |
run.dir | name of a Condor run directory to create inside |
top.dir | top directory on submitter machine that contains Condor run directories. |
unix | pattern identifying files in |
exclude | pattern identifying files in |
session | optional object of class |
The default value of run.dir = NULL runs the Condor job in top.dir/local.dir. For example, if local.dir = "c:/yft/run01" then the default run.dir becomes "condor/run01".
It can be practical to organize Condor runs inside the default top.dir = "condor" directory, to keep Condor runs separate from other directories inside the user home. To organize Condor runs directly in the home folder on the submitter machine, pass top.dir = "".
The default value of unix = "\\.sh$" ensures that shell scripts with a '.sh' file extension have Unix line endings. Pass unix = FALSE to disable conversion of line endings.
The default value of session = NULL looks for a session object in the user workspace. This allows the user to run Condor functions without explicitly specifying the session.
Remote directory name with the job id as a name attribute.
This function performs two core tasks: (1) upload files from local.dir to submitter machine, and (2) execute shell command condor_submit on submitter machine to launch the Condor job.
Arni Magnusson.
condor_submit, condor_q, condor_dir, and condor_download provide the main Condor interface.
condor_rm stops Condor jobs and condor_rmdir removes directories on the submitter machine.
dos2unix converts line endings.
condor-package gives an overview of the package.
## Not run:
# General workflow
session <- ssh_connect("servername")
condor_submit()
condor_q()
condor_dir()
condor_download()  # after job has finished

# Alternatively, submit a specific run
condor_submit("c:/myruns/01_this_model")
## End(Not run)
Convert line endings in a text file between DOS (CRLF) and Unix (LF) formats.
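The effect of a CRLF to LF conversion can be sketched in base R, without using the package. This is an illustration of the idea under the assumption of a small one-line file, not the package's implementation; the variable names are chosen for this example only.

```r
# Write a one-line file with a DOS (CRLF) line ending
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("123\r\n"), path)
size_before <- file.size(path)  # 5 bytes: "1", "2", "3", CR, LF

# readLines() accepts LF, CRLF, or CR terminators and drops them
lines <- readLines(path)

# Rewrite through a binary connection so "\n" is written as a single LF
con <- file(path, open = "wb")
writeLines(lines, con, sep = "\n")
close(con)
size_after <- file.size(path)   # 4 bytes: "1", "2", "3", LF
```

The one-byte difference per line is the same effect that the file.size calls in the dos2unix example are meant to reveal.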
dos2unix(file, force = FALSE)
unix2dos(file, force = FALSE)
file | a filename. |
force | whether to proceed with the conversion when the file is not a standard text file. |
The default value of force = FALSE is a safety feature that can avoid corrupting files that are not standard text files, such as binary files. A standard text file is one that can be read using readLines without producing warnings.
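The "readable without warnings" criterion can be expressed as a short check. The helper name `is_standard_text` is hypothetical and the sketch only mirrors the definition given above; the package may implement the test differently.

```r
# Hypothetical helper (not part of the package): TRUE when readLines()
# reads the file without producing a warning
is_standard_text <- function(file) {
  tryCatch({
    readLines(file)
    TRUE
  }, warning = function(w) FALSE)
}

# A well-formed text file passes the check
txt <- tempfile(fileext = ".txt")
writeLines(c("123", "456"), txt)
is_standard_text(txt)  # TRUE

# A file with an embedded nul byte triggers a readLines() warning
bin <- tempfile(fileext = ".bin")
writeBin(as.raw(c(0x31, 0x00, 0x32)), bin)
is_standard_text(bin)  # FALSE
```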
No return value, called for side effects.
Arni Magnusson.
condor_submit calls dos2unix to convert the line endings of shell scripts.
condor-package gives an overview of the package.
## Not run:
file <- "test.txt"
write("123", file)

dos2unix(file)
file.size(file)

unix2dos(file)
file.size(file)

file.remove(file)
## End(Not run)
Call ssh_exec_internal and convert the standard output to characters.
ssh_exec_stdout(command, session = NULL, ...)
command | command or script to execute. |
session | optional object of class |
... | passed to |
The default value of session = NULL looks for a session object in the user workspace. This allows the user to run Condor functions without explicitly specifying the session.
A "character" vector containing the standard output.
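The conversion from a raw buffer to a character vector can be sketched offline. The raw vector below is a stand-in for the stdout element that ssh_exec_internal would return, and the file names are hypothetical; splitting on newlines is an assumption about how the output is turned into one element per line.

```r
# Stand-in for the raw stdout buffer of a remote "ls" command
stdout_raw <- charToRaw("file_a.txt\nfile_b.txt\n")

# Convert bytes to one string, then split into one element per line
stdout_chr <- strsplit(rawToChar(stdout_raw), "\n")[[1]]
stdout_chr  # "file_a.txt" "file_b.txt"
```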
Arni Magnusson.
ssh_exec_wait runs a command or script and shows the standard output in the R console, while returning the exit status.
ssh_exec_internal runs a command or script and buffers the standard output into a raw vector.
condor-package gives an overview of the package.
## Not run:
session <- ssh_connect("servername")
ssh_exec_wait(session, "ls")             # returns 0
ssh_exec_internal(session, "ls")$stdout  # returns a raw vector
ssh_exec_stdout("ls")                    # returns directory names
## End(Not run)
Produce a summary of a Condor log file.
## S3 method for class 'condor_log'
summary(object, ...)
object | an object of class |
... | passed to |
Data frame with the following columns:
job.id | job id. |
status | text indicating whether job status is submitted, executing, aborted, or finished. |
submit.time | date and time when job was submitted. |
runtime | total duration of a job. |
disk | disk space used by job (MB). |
memory | memory used by job (MB). |
Arni Magnusson.
condor_log shows Condor log file.
condor-package gives an overview of the package.
## Not run:
# Examine log files on submitter machine
session <- ssh_connect("servername")
condor_dir()
condor_log()
summary(condor_log())

# Alternatively, examine log files on local drive
condor_dir(local.dir="c:/myruns")
condor_log(local.dir="c:/myruns/01_this_model")
summary(condor_log(local.dir="c:/myruns/01_this_model"))
## End(Not run)