The Internet Archive wants to add a new string to its bow by becoming the official host of electronic records from U.S. courts.
First launched in 1988, PACER (an acronym for “Public Access to Court Electronic Records”) is a publicly accessible database of U.S. federal court documents that includes information from cases across district courts, courts of appeals, and bankruptcy courts. The database was originally available through terminals in libraries and other public spaces, but in 2001 PACER moved online, where anyone can access it. Today, anyone can pay $0.10 to download a single page, with a maximum charge of $3 per document.
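The fee rule described above amounts to a capped per-page charge. Here is a minimal Python sketch of that arithmetic, assuming the cap applies per document exactly as stated and ignoring any exemptions or waivers the real system may have. The function name `pacer_fee_cents` is hypothetical and not part of any PACER interface.

```python
PER_PAGE_CENTS = 10      # $0.10 per page, as stated in the article
MAX_DOC_CENTS = 300      # $3 maximum charge per document, as stated in the article


def pacer_fee_cents(pages: int) -> int:
    """Return the download fee in cents for one document of `pages` pages.

    Hypothetical helper: it models only the two figures quoted in the article.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    # Charge per page, but never more than the per-document cap.
    return min(pages * PER_PAGE_CENTS, MAX_DOC_CENTS)


if __name__ == "__main__":
    for pages in (1, 12, 30, 200):
        print(f"{pages:>3} pages -> ${pacer_fee_cents(pages) / 100:.2f}")
```

Under these assumptions the cap is reached at 30 pages, so a 200-page filing costs the same as a 30-page one.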
The Internet Archive, for its part, has been documenting the web’s evolution since 1996, crawling millions of websites and recording their changes at intervals through the Wayback Machine. But the archive also hosts a wealth of additional media and material, including classic MS-DOS games, videos of Congressional hearings, and government website data. It’s all about saving as much as possible for posterity.
Back in 2010, the Internet Archive launched RECAP (“PACER” in reverse) in collaboration with Princeton University’s Center for Information Technology Policy. Through RECAP, anyone can upload PACER documents they’ve purchased and make them available to others for free. “We hope that the government will eventually put all of these documents in an open archive, but until then this repository will grow with use,” said Internet Archive founder Brewster Kahle at the time. Today the repository holds around 1.5 million documents.
The U.S. Congress is scheduled to kick off a series of hearings today that explore how the PACER database is operated — and the Internet Archive has issued an open letter that includes an offer to become the official host of PACER data to “…make the works of our federal courts more readily available, to inform the citizenry, and to further the effective and fair administration of justice,” according to a blog post on the matter.
“By this submission, the Internet Archive would like to clearly state to the Judiciary Committee, as well as to the Administrative Office of the U.S. Courts and the Judicial Conference of the United States, that we would be delighted to archive and host — for free, forever, and without restriction on access to the public — all records contained in PACER,” says Kahle, in the letter.
It’s worth noting here that PACER hosts roughly one billion documents, according to the Free Law Project. But that is well within the Internet Archive’s scope and capabilities: it already archives one billion web pages each week, along with millions of books, videos, and audio files, and it preserves a record of all websites belonging to Congress.
“At any given moment, we are delivering about 30 gigabits of data per second,” adds Kahle. “We host more than 20 petabytes of data in total. By comparison, the PACER corpus is a fraction of a petabyte and does not use a significant amount of bandwidth. We have the capacity to host this information, and I know there are many other organizations on the internet who would be able to make dramatic increases in the usability and utility of our Federal Judiciary’s database if it were made available in a more modern fashion and without artificial restrictions on use.”
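To put Kahle’s figures in perspective, a rough back-of-the-envelope calculation helps. The Python sketch below rests on several assumptions that are not in the letter: it uses decimal units (1 petabyte = 10^15 bytes), treats a full petabyte as an upper bound on the PACER corpus (the letter says only “a fraction of a petabyte”), and supposes, hypothetically, that the whole 30 gigabits per second were devoted to moving that data. The function name `transfer_days` is illustrative.

```python
BITS_PER_BYTE = 8
PETABYTE_BYTES = 10**15      # decimal petabyte (assumption)
GIGABIT_BITS = 10**9         # decimal gigabit (assumption)
SECONDS_PER_DAY = 86_400


def transfer_days(size_petabytes: float, rate_gbps: float) -> float:
    """Days needed to move `size_petabytes` at a sustained `rate_gbps`."""
    if rate_gbps <= 0:
        raise ValueError("rate_gbps must be positive")
    total_bits = size_petabytes * PETABYTE_BYTES * BITS_PER_BYTE
    seconds = total_bits / (rate_gbps * GIGABIT_BITS)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Upper-bound corpus size of 1 PB at the quoted 30 Gbit/s delivery rate.
    print(f"{transfer_days(1.0, 30.0):.1f} days")
```

With these assumed inputs, even a full petabyte corresponds to only about three days of traffic at the quoted delivery rate, which is consistent with the claim that PACER would not add a significant bandwidth burden.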
It’s not yet clear whether Congress intends to bring down the great PACER paywall and make the entire database free for the public to use. But the Internet Archive’s proposal is alluring: it’s offering to take PACER off the hands of the Administrative Office of the United States Courts and bear the cost of running the system.
You can read Kahle’s open letter in its entirety below.
February 10, 2017
The Honorable Darrell Issa, Chairman
The Honorable Jerry Nadler, Ranking Member
Subcommittee on Courts, Intellectual Property and the Internet
Committee on the Judiciary
House of Representatives
Washington, DC 20515
Dear Chairman Issa and Ranking Member Nadler,
Thank you for the opportunity to submit comments on the Judiciary Committee’s hearing entitled “Judicial Transparency and Ethics.” I write on behalf of the Internet Archive, a non-profit digital library that is based in San Francisco with facilities throughout the world.
For more than 20 years, the Internet Archive has been archiving digital collections and making them available at no cost and with no restriction on the Internet. The Internet Archive works with the Library of Congress, the National Archives, and numerous national libraries around the world to collect, store, and provide permanent access to millions of books, videos, audio and hundreds of millions of pages of U.S. government documents, including over 14,000 hours of video of Congressional hearings.
By this submission, the Internet Archive would like to clearly state to the Judiciary Committee, as well as to the Administrative Office of the U.S. Courts and the Judicial Conference of the United States, that we would be delighted to archive and host—for free, forever, and without restriction on access to the public—all records contained in PACER.
People download more than 20 million books from the Internet Archive each month. We preserve 1 billion web pages each week for public access through the “Wayback Machine.” Indeed, the Wayback Machine is the only publicly accessible archive of all the websites of Congress. At any given moment, we are delivering about 30 gigabits of data per second. We host more than 20 petabytes of data in total.
By comparison, the PACER corpus is a fraction of a petabyte and does not use a significant amount of bandwidth. We have the capacity to host this information, and I know there are many other organizations on the Internet who would be able to make dramatic increases in the usability and utility of our Federal Judiciary’s database if it were made available in a more modern fashion and without artificial restrictions on use.
The stated purpose of PACER is to make public court records “freely available to the greatest extent possible.” Sixteen years ago, the United States Courts predicted that PACER would allow the public to “surf to the courthouse door on the Internet.” Today, anyone visiting a federal courthouse can view the public record for free. PACER, on the other hand, charges users per-page fees that are prohibitive for many members of the public. The Judiciary could resolve this unfortunate discrepancy—immediately—at no cost. This is our offer.
The Internet Archive has deep experience with collections of this kind. In fact, we already host the records from over a million federal court cases that have been donated by the public as part of the RECAP Project. However, a million cases is a small portion of the hundreds of millions of cases that PACER contains, and we are frustrated that it is so difficult to obtain and serve the workings of our federal courts to the public. This is a fairly trivial technical task, and we would welcome the opportunity to make much more data available.
I must also note that the Internet Archive is not alone in being well-equipped to offer this service. There are other large digital repositories that similarly serve the public for free. I cannot speak for them, but I believe that once the corpus is available for no fee and without restriction, they too will replicate it and offer similar service. Indeed, others may build useful tools for reading, searching, and studying the corpus of public court records that makes up our federal case law.
In order to recognize the vision of universal free access to public court records, the Federal Judiciary would essentially have to do nothing. We are experts at “crawling” online databases in an efficient and careful fashion that does not burden those systems. We are already able to comprehensively crawl PACER from a technical perspective, but the resulting fees would be astronomical. The Federal Judiciary has a Memorandum of Understanding with both the Executive Office for US Trustees and with the Government Printing Office that gives each entity no-fee access for the public benefit. The collection we would provide to the public would be far more comprehensive than the GPO’s current court opinion program—although I must laud that program for providing a digitally-authenticated collection of many opinions.
By making federal judicial dockets available in this manner, the Federal Judiciary would enable free and unlimited public access to all records that exist in PACER, finally living up to the name of the program. In today’s world, public access means access on the Internet. Public access also means that people can work with big data without having to pass a cash register for each document.
This PACER collection we would maintain and improve would have far more detailed metadata and contextual information than the GPO service or the PACER Case Locator service. And, that’s just for starters, because we know that there are thousands of eager researchers, journalists, and government workers (including Congressional staff) who would immediately jump in and work with us.
By providing no-cost access to the Internet Archive to PACER and accepting our commitment to make this information available for use without restriction in perpetuity, we believe we can work with our government to make the workings of our court more usable to government attorneys, to members of the bar, and to the public at large.
Brewster Kahle
Digital Librarian and Founder, Internet Archive