Web of Science Data Workshop

The UW-Madison Libraries offer a workshop that provides an overview of the Web of Science data set and how to use it within UW-Madison’s Center for High Throughput Computing (CHTC).

This workshop is divided into three one-hour sessions:

  1. An overview of the data set in which we write a few scripts that process it using the Python programming language.
  2. An overview of the Center for High Throughput Computing (CHTC).
  3. A tie-it-all-together session in which attendees will run jobs that process the Web of Science data using the CHTC’s high throughput cluster.

This workshop provides a guided tour of our Web of Science CHTC Tutorial.

Workshop Resources: Code & Data Samples

The code and data samples used in the workshop are available in a Google Shared Drive:

https://drive.google.com/drive/folders/1w62g97VP4svWx_rFxdhfsOj9_dd5Vc0H

Please note that you must log in to this Google Shared Drive using your UW-Madison NetID username and password. The folder is not configured to allow logins from a personal Google account, such as a Gmail account.

Web of Science Explorer

The UW-Madison Libraries maintain a small Python codebase that serves as a reference implementation for working with the data set. To run some of the workshop code samples on your computer, you must download and install this code locally. Access the codebase on GitHub:

https://github.com/UW-Madison-Library/wos-explorer

Installation

If you are comfortable using the git version control software, you can clone the repository for local use.

$ git clone https://github.com/UW-Madison-Library/wos-explorer.git

If you are not familiar with git, you can simply download a zipped version of the codebase. Click the Code button on the repository web page, then click the Download ZIP link.

After cloning or downloading a local copy of the repository, you will need to build the code for installation.

  1. Change into the root directory,
  2. Run the setup script to build the package from source, and
  3. Install the package using pip.

$ cd wos-explorer
$ python setup.py sdist
$ pip install dist/wos_explorer-0.8.0.tar.gz

Note that if you downloaded a zip file, the directory name will be wos-explorer-main rather than wos-explorer.
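To confirm that the install step worked, you can ask Python which version of the package it sees. The following is a minimal sketch, not part of the workshop materials: the helper name installed_version is hypothetical, and the distribution name wos_explorer is an assumption taken from the tarball file name above. It requires Python 3.8 or later for importlib.metadata.

```python
from importlib import metadata


def installed_version(*dist_names):
    """Return the version of the first named distribution that is installed, else None."""
    for name in dist_names:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    # Assumption: the distribution is registered under one of these two spellings.
    # Both are tried in case "-" and "_" are not treated as equivalent.
    version = installed_version("wos_explorer", "wos-explorer")
    if version is None:
        print("wos-explorer does not appear to be installed.")
    else:
        print("wos-explorer version:", version)
```

If the script reports that the package is missing, re-run the build and install commands from inside the repository's root directory.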

The build process above assumes that you have the Python package-management system pip and the package setuptools installed. Many Python installations include these utilities. If you see an error message while running the commands above, first make sure that pip is installed. If it is not, see the installation instructions on the pip documentation website. Using pip, you can then install setuptools:

$ pip install setuptools
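Before building, it can help to check whether both tools are visible to the Python interpreter you plan to use. This is a small sketch under the assumption that pip and setuptools are importable as modules with those names, which is how they are normally installed; the helper name missing_tools is hypothetical.

```python
import importlib.util


def missing_tools(names=("pip", "setuptools")):
    """Return the names that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("pip and setuptools are both available.")
```

Run the script with the same python command you will use for the build, since each Python installation keeps its own set of packages.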

Prerequisites: Required Software & Development Environment

This workshop assumes you have basic experience working in a Linux/Unix command line environment and basic familiarity with the Python programming language. It is an intermediate-level workshop that builds upon the skills learned, for example, in the Software Carpentry curriculum.

Terminal Program

Python scripts will be run from a Linux/Unix-style command line environment. This workshop does not use a Python environment like a Jupyter Notebook because attendees will need to use a terminal program to shell into the CHTC servers.

  • Mac: use the preinstalled Terminal app
  • Windows: install either the Git for Windows suite, which includes the Git Bash Unix emulator, or the program PuTTY

Python 3

Any Python 3 installation should work as long as your terminal program can see the Python executable. The Anaconda suite provides a GUI installer for both Mac and Windows.
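One way to check that your terminal can see a Python 3 executable is to run a short script from that terminal. The sketch below is illustrative only: the helper names is_python3 and commands_on_path are hypothetical, and the command names python and python3 are assumptions about how your installation exposes the interpreter.

```python
import shutil
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3."""
    return version_info[0] == 3


def commands_on_path(names=("python", "python3")):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("Running under:", sys.executable)
    print("Version:", sys.version.split()[0])
    print("Is Python 3:", is_python3())
    for name, path in commands_on_path().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If neither command is found, the Python installation directory probably needs to be added to your PATH.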

To run the code examples in this workshop you will also need a Python installation that includes the pip package-management system and the setuptools package. Both are used to build, from source, a small codebase that the UW-Madison Libraries maintain and to install it locally. See the Installation section under Web of Science Explorer above.

Programming Text Editor

Since we will not be using the same environment to edit and run our Python scripts, you will need a plain text editor. Ideally it should display line numbers for debugging and have syntax highlighting features. Microsoft’s Visual Studio Code is a great option for those who do not have one installed already.

Campus VPN Client

A campus VPN client is required for connecting to CHTC servers.