Step-by-Step Tutorial

Step 1: Create a Working Directory on Your Computer

You will need to set up a working location on your local computer. This will be a folder/directory that will contain copies of the two primary code bases used in this tutorial. For the remainder of this tutorial we will assume that you have created a new folder called web-of-science-export within your computer’s Desktop folder. We will then refer to this location using the Unix-style path convention ~/Desktop/web-of-science-export.
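As a minimal sketch, the folder described above can also be created from the terminal. The WORKDIR variable is a name introduced here only for illustration; it defaults to the location this tutorial assumes.

```shell
# WORKDIR is a hypothetical variable used only in this sketch.
# It defaults to the folder this tutorial assumes.
WORKDIR="${WORKDIR:-$HOME/Desktop/web-of-science-export}"

# -p creates missing parent folders and does not fail if the folder already exists.
mkdir -p "$WORKDIR"

# Move into the folder and print its full path as a sanity check.
cd "$WORKDIR"
pwd
```

Because of the -p option, re-running these lines is harmless if the folder is already there.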

Attention Windows Users

If you are a Windows user, Git's line-ending settings can prevent your scripts from running correctly on the Unix-based CHTC systems and with the Bash commands. Before cloning wos-explorer, run these two commands so that your copy uses Unix line endings (they print nothing; if no error message appears, they have most likely worked):

git config --global core.autocrlf false
git config --global core.eol lf

Step 2: Clone the CHTC Recipes Project

On your computer, navigate in your terminal program to the project folder:

cd ~/Desktop/web-of-science-export

Clone the chtc-recipes project:

git clone

We will only be using the recipe/project in the sub-folder wos-findbyexport.

Step 3: Clone & Build the Web of Science Explorer Package

The Condor project in wos-findbyexport requires a custom Python library developed by the Libraries called the Web of Science Explorer (TODO: insert link). You will need to clone the project from GitHub and build it as a distribution file to include in the Condor project. Cloning wos-explorer and creating the tar file is a relatively quick and easy process.

While still in the web-of-science-export directory, clone the wos-explorer project:

git clone

Next we will build this code base as a package so that it can be used by the CHTC recipe.

cd wos-explorer
python setup.py sdist

A compressed tar file will appear in the newly created dist directory. Copy it to the wos-findbyexport directory inside the chtc-recipes project:

cp dist/wos_explorer-0.2.1.tar.gz ~/Desktop/web-of-science-export/chtc-recipes/wos-findbyexport
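Before moving on, it may be worth confirming that the archive exists and contains the package files. The sketch below uses only the standard tar utility; the function name list_sdist is hypothetical and not part of wos-explorer.

```shell
# list_sdist (hypothetical helper): print the contents of a gzipped
# tar archive, or report an error if the archive is not there.
list_sdist() {
  archive="$1"
  if [ ! -f "$archive" ]; then
    echo "missing: $archive" >&2
    return 1
  fi
  # -t lists the contents, -z handles gzip compression, -f names the archive.
  tar -tzf "$archive"
}
```

For example, list_sdist dist/wos_explorer-0.2.1.tar.gz, run from inside the wos-explorer directory, should print the file names packed into the archive.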

Step 4: Preparing Your Input Data: savedrecs.txt

To get started, log into the Web of Science database on the UW–Madison Library page.

  1. Perform your search.
  2. At the top of the results you will see a button that says “Export”. From the menu, select “Fast 5K”.
  3. In the window that appears, select “Tab Delimited (Mac)” from the File Format dropdown. Even if you are using Windows, select the Mac option so that the file uses Unix-style line endings, because this file will ultimately be used in the Linux computing environment on the CHTC servers.
  4. A file named savedrecs.txt will automatically download. Move that file to the ~/Desktop/web-of-science-export/chtc-recipes/wos-findbyexport directory on your computer.
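Since the export steps above depend on the file having Unix-style line endings, a quick check can catch problems before the file is uploaded. This is a minimal sketch using grep; the function names has_cr and check_line_endings are hypothetical.

```shell
# has_cr (hypothetical helper): succeed if the file contains at least
# one carriage-return character, which Unix-style text files do not.
has_cr() {
  grep -q "$(printf '\r')" "$1"
}

# check_line_endings (hypothetical helper): print a short verdict and
# return non-zero when carriage returns are present.
check_line_endings() {
  if has_cr "$1"; then
    echo "$1: carriage returns found (not Unix-style line endings)"
    return 1
  fi
  echo "$1: no carriage returns found"
}
```

Running check_line_endings savedrecs.txt from the wos-findbyexport directory reports which case applies to your export.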

Step 5: Prepare a Working Directory on the CHTC Server

To log in to the CHTC server, confirm that you are connected to WiscVPN, then use the credentials you received during your CHTC onboarding meeting:

ssh <netid>

Enter your password when prompted (it will not appear on the screen).

Once you are logged in, you will see “CHTC” spelled out in oversized characters.

Next, make the directory where you will upload the wos-findbyexport scripts:

mkdir wos-findbyexport

Type exit to return to your own computer.

Step 6: Uploading wos-findbyexport to the CHTC Server

Once back on your computer, navigate to the directory that contains the wos-findbyexport scripts and use the secure copy (scp) command to copy them to the CHTC submit server. A command like this will suffice:

scp -r ./* <netID><netID>/wos-findbyexport

You will be prompted for your password.

As the files are copied from your machine to the CHTC server, their names will print to the screen.

Step 7: Run the Jobs on the CHTC Server

You are now ready to run the jobs on the CHTC server. For your reference, here is the link to the CHTC instructions on starting your job on the CHTC server:

Running Your First CHTC Jobs

The first step is to log in to the submit server again, navigate to the wos-findbyexport directory you created, and run the Condor submit command followed by the .dag file you want it to execute:

condor_submit_dag wos-findbywosexport.dag

The .dag file will schedule which jobs to run, so for now, all you need to do is check in on the process periodically to make sure it continues to run correctly.

There are several commands that allow you to check on the process, each with its own features. The most basic one is:

condor_q

The CHTC has created an extended guide on how to evaluate your jobs as they run using variations on the condor_q command. This guide lists the commands you can run to evaluate multiple aspects of the jobs as they run or to view them in certain formats to fit your needs.

Learning About Your Jobs Using condor_q
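Another way to check progress is to read the log that DAGMan writes next to the .dag file. The sketch below assumes a standard HTCondor setup in which that log is named after the .dag file with .dagman.out appended, and in which DAGMan's final log line contains the text "EXITING WITH STATUS" followed by its exit code. Both are assumptions to verify on your system, and the function name dag_status is hypothetical.

```shell
# dag_status (hypothetical helper): summarize a DAGMan log file.
# Assumes the final log line contains "EXITING WITH STATUS <code>".
# Return codes: 0 finished OK, 1 exited with error, 2 no log, 3 still running.
dag_status() {
  log="$1"
  if [ ! -f "$log" ]; then
    echo "no log yet: $log"
    return 2
  fi
  if grep -q "EXITING WITH STATUS 0" "$log"; then
    echo "DAG finished successfully"
  elif grep -q "EXITING WITH STATUS" "$log"; then
    echo "DAG exited with an error"
    return 1
  else
    echo "DAG still running"
    return 3
  fi
}
```

Under those assumptions, dag_status wos-findbywosexport.dag.dagman.out, run in the wos-findbyexport directory on the submit server, gives a one-line summary, and tail -n 20 on the same file shows the most recent log entries.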

Step 8: Download the Output Data

Once you are sure the process has completed, you can begin viewing the output files to check the results. You will be looking for the JSON output files located in the findbyexport and findreferences directories on the submit server.

Because the CHTC server is not intended for storage, it is best practice to download your output files and then remove them from the server. You can use any SFTP or SCP client to do this. To fetch the files using the Unix secure copy utility, run the following command in your terminal window:

scp <netID><netID>/wos-findbyexport/findbyexportfile/*.json .
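Before deleting anything on the server, it can help to confirm that the download produced usable files. The following sketch counts the non-empty .json files in a local directory; the function name count_nonempty_json is hypothetical, and the check does not validate the JSON itself.

```shell
# count_nonempty_json (hypothetical helper): print how many non-empty
# .json files sit directly inside the given directory.
count_nonempty_json() {
  # -size +0 skips empty files; tr strips the padding some wc versions add.
  find "$1" -maxdepth 1 -type f -name '*.json' -size +0 | wc -l | tr -d ' '
}
```

For example, count_nonempty_json . in the directory you downloaded into should print the number of output files you expect.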

After you have downloaded all the necessary files, remove everything from the submit server.