Web of Science – A Dataset for Academic Research and Data Science

November 16, 2020

**Updated January 2022** (Original publication November 2020)

The UW-Madison Libraries have long had a subscription to the Web of Science database, which tracks the citation patterns that demonstrate the major impact UW-Madison’s research community has on scholarship. In the past few years, however, this subscription has been enhanced with access to the raw data that sits behind the Web of Science user interface. This data set is licensed so that UW-Madison faculty, staff, and students can analyze citation patterns in their data science projects, enabling an unprecedented look at how scholarly citation patterns and networks have evolved over more than a century. Learn more below!

The Web of Science is a platform consisting of several literature search databases designed to support scientific and scholarly research. “The primary point to emphasize is this data set that describes the scholarly literature includes data points that allow one to trace how one article cites others,” says Stephen Meyer, Data Strategist Librarian. “It makes it an excellent source for doing citation analysis and bibliometric research.” 

These data points let scholars and researchers see all of the citations referenced in an article. Each citation acts as an explicit link between papers with particular points in common, forming a never-ending web of knowledge.

Like many databases of this size, however, this one has not come without its challenges. “The primary challenge with this data set is its size. Even though our personal computers have enormous hard drives by historical standards, this data set, though it is only text-based metadata, stretches the limits of what can be processed on a personal computer. This is especially true when conducting an analysis that needs to search through or analyze all citation records, which date back to the start of the 20th century,” Meyer shares. “For this reason, it has been a wonderful opportunity to have access to large scale computing resources at the UW-Madison Center for High Throughput Computing (CHTC). The collaboration with campus colleagues at the CHTC provided the infrastructure and opportunity for experimenting with the data ourselves and for our patrons.”

In time, Meyer hopes that access to this data set will facilitate data science education and research projects. By diving into a subset of the data, one can find a manageable data set to teach the techniques used for network/graph analysis. He also hopes that researchers will be able to use the data set to develop new models that utilize both the graph/network connections afforded by the citation data and the textual descriptions of research found in the title, abstract and subject data that describes these articles.
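For readers curious what network/graph analysis on a small subset might look like, here is a minimal Python sketch. It is only an illustration under stated assumptions: the paper identifiers and the list of (citing, cited) pairs are made up, and they do not reflect the actual format of the Web of Science data. The sketch counts how often each paper is cited and traces every paper reachable from a starting article by following its references.

```python
from collections import defaultdict, deque

# Hypothetical (citing, cited) pairs. The identifiers are invented for
# illustration and are NOT the real Web of Science record format.
CITATIONS = [
    ("paper_D", "paper_A"),
    ("paper_D", "paper_B"),
    ("paper_C", "paper_A"),
    ("paper_E", "paper_D"),
    ("paper_E", "paper_C"),
]


def build_graph(pairs):
    """Map each citing paper to the set of papers it cites."""
    graph = defaultdict(set)
    for citing, cited in pairs:
        graph[citing].add(cited)
    return graph


def citation_counts(pairs):
    """Count how many distinct papers cite each paper (its in-degree)."""
    counts = defaultdict(int)
    for _citing, cited in set(pairs):  # set() ignores duplicate pairs
        counts[cited] += 1
    return dict(counts)


def trace_references(graph, start):
    """Breadth-first walk: every paper reachable from `start` via citations."""
    seen = set()
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        for cited in graph.get(paper, ()):
            if cited not in seen:
                seen.add(cited)
                queue.append(cited)
    return seen


graph = build_graph(CITATIONS)
counts = citation_counts(CITATIONS)
reachable = trace_references(graph, "paper_E")

print(counts)
print(sorted(reachable))
```

The full data set is far too large for this in-memory approach on a personal computer, which is why a subset is the natural starting point for teaching. A dedicated graph library could replace these hand-written helpers in a real project.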

The UW-Madison Libraries share these hopes with Meyer and encourage users to utilize the Web of Science database for their academic research needs. 

To learn more about the Web of Science data set and how to use it, visit this page here.

Portrait of Stephen Meyer

A little about Stephen Meyer

Meyer took his first professional position through the North Carolina State University Libraries Fellows program, where he worked for the engineering library and systems department. He joined the UW-Madison Libraries in 2006 as a Library Application Developer in LTG and SDG. In this role, Meyer worked on various web application projects, including the first version of our discovery system, commonly referred to as Forward. After a brief stint from 2012 to 2014 working for OCLC as a technical product manager, he rejoined the UW-Madison Libraries in his current position as a Data Strategist.