

Yesterday, the Irish Data Protection Commission (DPC) fined Facebook parent company Meta €265 million ($274 million USD) for breaching Article 25 of the General Data Protection Regulation (GDPR), after hackers leaked the personal details of up to 533 million users on an online hacking forum. 

The hackers exploited data processing measures in Facebook’s contact importer feature (active from May 25, 2018 to September 2019) to scrape public profiles and connect users’ profiles with email addresses.

In a statement released by a Meta spokesperson, the organization claims to have “made changes to our system during the time in question, including removing the ability to scrape our features in this way using phone numbers.”

Liabilities of web scraping 

The news comes amid reports of a leak of the data of 500 million WhatsApp users, although WhatsApp has insisted that there is “no evidence of a data leak.” 


It also comes shortly after the DPC fined Meta €405 million ($419 million USD) for violating the GDPR and failing to prevent children from using business accounts, which made email addresses and/or phone numbers public by default. 

Meta’s latest fine stands out because it highlights the regulatory liabilities of failing to prevent web scraping of public data. 

“The fine itself shows that GDPR continues to be a powerful regulation that has consequences for non-adherence, and as reported, the section in question points to fundamental design ethos, protection by design and default, for compliance — which in turn also means security,” said Jon France, CISO of ISC2. “Privacy and security are a fundamental part of the development process, not a treatment to it.”

France added that “web scraping has long been a tactic used by many to get data from public websites, and the implications are that the design of websites needs to incorporate measures that protect from en masse scraping at scale, such as rate limiting, etc.”
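The rate limiting France refers to is typically enforced per client: each IP address gets a budget of requests that refills over time, so a scraper hammering thousands of profile pages is throttled while ordinary browsing is unaffected. A minimal token-bucket sketch (the rate and burst values here are illustrative, not a recommendation):

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-client token bucket: `rate` tokens refill per second,
    bursts are capped at `capacity`. One token is spent per request."""

    def __init__(self, rate=1.0, capacity=3):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: float(capacity))
        self.last = defaultdict(time.monotonic)

    def allow(self, client_ip):
        now = time.monotonic()
        elapsed = now - self.last[client_ip]
        self.last[client_ip] = now
        # Refill proportionally to elapsed time, never beyond capacity.
        self.tokens[client_ip] = min(
            self.capacity, self.tokens[client_ip] + elapsed * self.rate
        )
        if self.tokens[client_ip] >= 1.0:
            self.tokens[client_ip] -= 1.0
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=3)
# A rapid burst of 5 requests from one address: the first 3 pass,
# the rest are throttled until tokens refill.
results = [bucket.allow("203.0.113.7") for _ in range(5)]
```

In production this logic usually lives in a reverse proxy or API gateway rather than application code, but the principle is the same.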

Why is web scraping so common? 

While web scraping is frowned upon, it is not actually illegal. Market research firms and threat actors alike are free to harvest publicly available information on the internet. 

This was recently highlighted in the U.S. Ninth Circuit Court of Appeals, which ruled in hiQ Labs, Inc. v. LinkedIn Corp. that LinkedIn can’t prevent hiQ Labs from scraping LinkedIn users’ publicly available data. 

In this case, which dates back to 2017, LinkedIn argued that hiQ violated laws such as the Computer Fraud and Abuse Act (CFAA) and attempted to block the organization from scraping data from public LinkedIn profiles. 

Circuit Judge Marsha Berzon argued, at the time, that “there is little evidence that LinkedIn users who choose to make their profiles public maintain an expectation of privacy with respect to the information that they post publicly, and it is doubtful that they do.” 

Mitigating regulatory risk 

It’s important to note that web scraping presents regulatory risks when it relates to “data that’s covered by privacy law,” explained Mike Parkin, senior technical engineer at Vulcan Cyber. 

For this reason, organizations need to have a complete understanding of what information is publicly exposed. In practice, this comes down to reviewing all publicly available data exposed on their websites and completing a risk assessment to measure how this information could put user privacy at risk. 

“If your website makes information available, people will find it whether you want them to or not,” Parkin said. “Web scraping tools will follow any link they can find and can harvest any data they encounter. This can be a problem even with mundane data.” 
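Parkin’s point is mechanical, not rhetorical: a scraper simply walks every anchor tag and pattern-matches whatever text it finds along the way. The toy harvester below (operating on an inline HTML snippet with made-up profile links) shows how little code that takes:

```python
import re
from html.parser import HTMLParser

class LinkHarvester(HTMLParser):
    """Collects every href and any email-like strings in page text --
    roughly what an indiscriminate scraper does on each page it visits."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.emails = []

    def handle_starttag(self, tag, attrs):
        # Record every link; a real crawler would queue these for visiting.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Harvest anything that looks like an email address.
        self.emails += re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", data)

page = """<html><body>
  <a href="/profile/alice">Alice</a> contact: alice@example.com
  <a href="/profile/bob">Bob</a>
</body></html>"""

harvester = LinkHarvester()
harvester.feed(page)
```

Anything rendered in the page, however mundane, ends up in `harvester.links` and `harvester.emails`, which is why the review-what-you-expose step above matters.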

Another way to tackle web scraping is to add greater protections at the API level, creating an inventory of APIs and increasing visibility over them. 

“To prevent malicious web scraping, site owners need visibility into every API endpoint and the data exposed,” said Scott Gerlach, cofounder and CSO at StackHawk. “Testing web interfaces and APIs for vulnerabilities frequently and early on improves overall security posture and provides insight to act quickly if needed.”
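One concrete form the visibility Gerlach describes can take is an inventory diff: compare the endpoints actually being served (e.g., reconstructed from access logs) against a reviewed allow-list, and flag anything undocumented as a “shadow” API that may expose data nobody risk-assessed. A sketch with hypothetical endpoint names:

```python
# Endpoints that went through design review and risk assessment.
documented = {"/api/v1/users", "/api/v1/orders"}

# Endpoints actually observed in traffic (hypothetical, e.g. parsed from logs).
observed = {"/api/v1/users", "/api/v1/orders", "/api/v1/users/export"}

# Anything served but never reviewed is a candidate shadow API.
shadow = observed - documented
```

Here `/api/v1/users/export` would be the finding: an endpoint exposing data that the review process never saw.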

VentureBeat's mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.