GovScape Has Preserved Over 10 Million PDF Public Records Files in Searchable Service

GovScape is a collaboration among the University of Washington, Boston University, and several other institutions, led by Ben Lee and Kyle Deeds.

GovScape currently supports multimodal search over more than 10 million PDFs (more than 70 million PDF pages). The team's research paper explains that the project aims to make these records more searchable and accessible by supporting keyword, semantic, and visual search. As the paper puts it: “Efforts over the past three decades have produced web archives containing billions of webpage snapshots and petabytes of data. The GovScape End of Term Web Archive alone contains, among other file types, millions of PDFs produced by the federal government. While preservation with web archives has been successful, significant challenges for access and discoverability remain.”

Read More Here