You will need to set up a working location on your local computer. This will be a folder/directory that will contain copies of the two primary code bases used in this tutorial. For the remainder of this tutorial we will assume that you have created a new folder called web-of-science-export within your computer’s Desktop folder. We will then refer to this location using the Unix-style path convention ~/Desktop/web-of-science-export.
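You can create the folder from a terminal. This is a minimal sketch, and it assumes a Bash-compatible shell such as Terminal on macOS or Linux, or Git Bash on Windows:

```shell
# Create the working folder this tutorial assumes. The -p flag means
# "no error if it already exists" and also creates ~/Desktop if missing.
mkdir -p ~/Desktop/web-of-science-export

# Move into the folder and print the full path to confirm the location
cd ~/Desktop/web-of-science-export
pwd
```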
If you are a Windows user, set Git's line-ending options before you clone the projects below. Git on Windows is often configured to convert line endings to the Windows style (CRLF). Those line endings can stop your scripts from running correctly on the Unix-based CHTC systems and with Bash commands. To make sure your clones keep Unix-style (LF) line endings, run these two commands. Nothing will be printed, but if no error message appears, they most likely worked:
git config --global core.autocrlf false
git config --global core.eol lf
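To confirm the settings, read them back with `git config --get`. This is a minimal Bash sketch. After you run the two commands above, it should print `core.autocrlf=false` and `core.eol=lf`:

```shell
# Read back each global line-ending setting.
# "<not set>" means the matching git config command has not been run yet.
for key in core.autocrlf core.eol; do
  value=$(git config --global --get "$key" || true)
  echo "$key=${value:-<not set>}"
done
```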
On your computer, navigate in your terminal program to the project folder:
cd ~/Desktop/web-of-science-export
Clone the chtc-recipes project:
git clone https://github.com/UW-Madison-Library/chtc-recipes.git
We will only be using the recipe/project in the sub-folder wos-findbyexport.
The code in the wos-findbyexport recipe requires a custom Python library developed by the Libraries called the Web of Science Explorer (TODO: insert link). You will need to clone this library from GitHub and build it as a distribution file, which is then included in the Condor project. Cloning wos-explorer and creating the tar file is quick and easy.
While still in the web-of-science-export directory, clone the wos-explorer project:
git clone https://github.com/UW-Madison-Library/wos-explorer.git
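If you set the line-ending options before cloning, the checked-out files should use Unix-style line endings. One way to spot-check a clone is to search it for carriage-return characters. The sketch below does this with a hypothetical Bash function called `find_crlf_files`. That name is our own and is not part of either project. The sketch assumes GNU or BSD grep, since both support the options used:

```shell
# Hypothetical helper (not part of chtc-recipes or wos-explorer):
# print the names of text files under a directory that still contain
# Windows (CRLF) line endings. No output means none were found.
find_crlf_files() {
  local dir=${1:-.}
  # -r: recurse, -I: skip binary files, -l: print file names only.
  # $'\r' is Bash syntax for a literal carriage-return character.
  # --exclude-dir=.git skips Git's internal object store.
  grep -rIl --exclude-dir=.git $'\r' "$dir"
}
```

Run it from the web-of-science-export directory. Both `find_crlf_files chtc-recipes` and `find_crlf_files wos-explorer` should print nothing. Any file names that do appear still have Windows line endings.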
Next we will build this code base as a package so that it can be used by the CHTC recipe.
cd wos-explorer
python setup.py sdist
A compressed tar file will appear in the wos-explorer/dist directory. Copy it into the wos-findbyexport directory inside the chtc-recipes project. The version number in your file name may differ from the one shown here:
cp dist/wos_explorer-0.2.1.tar.gz ~/Desktop/web-of-science-export/chtc-recipes/wos-findbyexport
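The version number in the tarball name changes as wos-explorer is updated, so you may prefer not to type it by hand. The sketch below uses a hypothetical Bash helper called `copy_latest_sdist` to copy the most recently built wos_explorer tarball. It assumes the `wos_explorer-<version>.tar.gz` naming shown above:

```shell
# Hypothetical helper (not part of wos-explorer or chtc-recipes):
# copy the newest wos_explorer-*.tar.gz from a dist directory to a destination.
copy_latest_sdist() {
  local dist_dir=$1 dest_dir=$2
  local latest
  # ls -t lists newest first; errors (e.g. no matching file) are discarded
  latest=$(ls -t "$dist_dir"/wos_explorer-*.tar.gz 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "No wos_explorer tarball found in $dist_dir" >&2
    return 1
  fi
  cp "$latest" "$dest_dir"/ && echo "Copied $(basename "$latest") to $dest_dir"
}
```

To use it, run `copy_latest_sdist dist ~/Desktop/web-of-science-export/chtc-recipes/wos-findbyexport` from the wos-explorer directory.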
To get started, log in to the Web of Science database on the UW–Madison Library page.
To log in to the CHTC server, make sure you are connected to WiscVPN. Then use the credentials you received during your CHTC onboarding meeting:
ssh <netID>@submit-1.chtc.wisc.edu
Enter your password when prompted. It will not appear on the screen as you type.
Once you are logged in, you will see “CHTC” spelled out in oversized characters.
Next, create the directory where you will upload the wos-findbyexport scripts:
mkdir wos-findbyexport
Type exit to return to your own computer.
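You can also log in, create the directory, and exit in a single step. When you give ssh a command after the host name, it runs that command on the remote machine and then disconnects. This is a minimal sketch wrapped in a hypothetical Bash function called `chtc_mkdir`. It assumes you are connected to WiscVPN and uses the same host as above. You will still be prompted for your password:

```shell
# Hypothetical helper: create ~/wos-findbyexport on the CHTC submit server
# in one step. ssh runs the quoted command remotely, then disconnects.
chtc_mkdir() {
  local netid=$1
  if [ -z "$netid" ]; then
    echo "usage: chtc_mkdir <netID>" >&2
    return 1
  fi
  # -p: no error if the directory already exists
  ssh "$netid@submit-1.chtc.wisc.edu" 'mkdir -p ~/wos-findbyexport'
}
```

Usage: `chtc_mkdir <netID>`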
Once you are back on your computer, navigate to the directory that contains the wos-findbyexport scripts (~/Desktop/web-of-science-export/chtc-recipes/wos-findbyexport). Then use the secure copy command, scp, to copy them to the CHTC submit server:
scp -r ./* <netID>@submit-1.chtc.wisc.edu:/home/<netID>/wos-findbyexport
scp will prompt you for your password.
As the files are copied from your machine to the CHTC server, their names will print to the screen.
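To confirm the upload, you can list the remote directory without logging in interactively. The listing should include the .dag file and the wos_explorer tarball among the copied files. This is a minimal sketch, and `chtc_ls` is a hypothetical helper name:

```shell
# Hypothetical helper: list a directory (default ~/wos-findbyexport) on the
# CHTC submit server without opening an interactive session.
chtc_ls() {
  local netid=$1 dir=${2:-wos-findbyexport}
  if [ -z "$netid" ]; then
    echo "usage: chtc_ls <netID> [directory]" >&2
    return 1
  fi
  # $dir is filled in locally; ~ is expanded by the remote shell
  ssh "$netid@submit-1.chtc.wisc.edu" "ls -l ~/$dir"
}
```

Usage: `chtc_ls <netID>`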
You are now ready to run the jobs on the CHTC server. For your reference, here is the link to the CHTC instructions on starting your job on the CHTC server:
First, log in to the submit server again and navigate to the wos-findbyexport directory you created. Then run the Condor submit command followed by the .dag file you want it to execute:
condor_submit_dag wos-findbywosexport.dag
The .dag file tells Condor which jobs to run and in what order. For now, all you need to do is check on the process periodically to make sure it continues to run correctly.
Several commands let you check on the process, each with its own features. The most basic one is:
condor_q
The CHTC has created an extended guide on how to evaluate your jobs as they run using variations on the condor_q command. This guide lists the commands you can run to evaluate multiple aspects of the jobs as they run or to view them in certain formats to fit your needs.
Learning About Your Jobs Using condor_q
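condor_q shows jobs that are still queued. To judge whether the whole DAG has finished, you can also look at DAGMan's own log in the submit directory. The sketch below is a hypothetical Bash helper called `dag_status`, and it makes two assumptions. First, that the log follows HTCondor's usual `<dag file>.dagman.out` naming. Second, that DAGMan ends the log with a line containing "EXITING WITH STATUS", where 0 indicates success. Check both against your own log before relying on the helper:

```shell
# Hypothetical helper: report whether DAGMan has finished, based on its log.
# Assumes the log is named <dag file>.dagman.out and that DAGMan writes a
# line containing "EXITING WITH STATUS <code>" when it stops.
# Returns 0 if finished, 1 if still running, 2 if the log is missing.
dag_status() {
  local log=${1:-wos-findbywosexport.dag.dagman.out}
  if [ ! -f "$log" ]; then
    echo "Log not found: $log" >&2
    return 2
  fi
  if grep -q "EXITING WITH STATUS" "$log"; then
    # Print only the most recent exit line
    grep "EXITING WITH STATUS" "$log" | tail -n 1
    return 0
  fi
  echo "DAG still running"
  return 1
}
```

Run `dag_status` from ~/wos-findbyexport on the submit server.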
Once you are sure the process has completed, you can begin viewing the output files to check the results. You will be looking for the JSON output files located in the findbyexport and findreferences directories on the submit server.
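Before downloading, a quick sanity check is to count the JSON files in each output directory and flag any empty ones. This is a minimal sketch, and `summarize_json_output` is a hypothetical helper name. It only checks file counts and sizes, not the JSON contents. Pass the output directory names as arguments and adjust them if yours differ:

```shell
# Hypothetical helper: count *.json files in each given directory and
# list any zero-byte ones.
summarize_json_output() {
  local dir count
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "$dir: directory not found"
      continue
    fi
    # tr removes the padding some wc implementations add
    count=$(find "$dir" -maxdepth 1 -name '*.json' | wc -l | tr -d ' ')
    echo "$dir: $count JSON file(s)"
    # -empty matches zero-byte files
    find "$dir" -maxdepth 1 -name '*.json' -empty -exec printf '  empty: %s\n' {} \;
  done
}
```

Usage, from ~/wos-findbyexport: `summarize_json_output findbyexport findreferences`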
The CHTC server is not intended for storage, so best practice is to download your output files and then remove them from the server. You can use any SFTP or SCP client to do this. To fetch the files with the Unix secure copy utility, run the following command in a terminal window on your own computer:
scp <netID>@submit-1.chtc.wisc.edu:/home/<netID>/wos-findbyexport/findbyexportfile/*.json .
After you have downloaded all the necessary files, remove everything from the submit server.
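The cleanup can be done from your own computer with a single ssh command. Because `rm -r` cannot be undone, this minimal sketch asks for confirmation first. `chtc_cleanup` is a hypothetical helper name. Make sure your local copies are complete before running it:

```shell
# Hypothetical helper: delete a directory (default ~/wos-findbyexport) from the
# submit server after asking for confirmation. rm -r cannot be undone.
chtc_cleanup() {
  local netid=$1 dir=${2:-wos-findbyexport} answer
  if [ -z "$netid" ]; then
    echo "usage: chtc_cleanup <netID> [directory]" >&2
    return 1
  fi
  read -r -p "Delete ~/$dir on submit-1.chtc.wisc.edu? [y/N] " answer
  case "$answer" in
    y|Y) ;;
    *) echo "Cancelled"; return 1 ;;
  esac
  # The remote shell expands ~ to your home directory on the server
  ssh "$netid@submit-1.chtc.wisc.edu" "rm -r ~/$dir"
}
```

Usage: `chtc_cleanup <netID>`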