Meta's RSC supercomputer brings revolutionary power — and privacy and bias concerns

Last month, Meta (formerly Facebook) announced that it had developed a supercomputer known as the AI Research SuperCluster (RSC). The company claims that when completed by the end of the year, it will be one of the world’s fastest AI supercomputers.

The technology and social media giant says the RSC will provide a strong backbone of compute capability and AI to power the metaverse. The company also claims that the RSC's advanced compute capabilities will help it tackle problems it has been criticized over, such as identifying harmful content and reducing algorithmic bias.

To address privacy concerns, Meta detailed in a blog post how it plans to safeguard the user data that AI models on the RSC will be trained on. Specifically, the company states that the "RSC is isolated from the larger internet, with no direct inbound or outbound connections, and traffic can flow only from Meta’s production data centers."

The company goes on to state the following: "Before data is imported to RSC, it must go through a privacy review process to confirm it has been correctly anonymized, or alternative privacy safeguards have been put in place to protect the data. The data is then encrypted before it can be used to train AI models, and both the data and the decryption keys are deleted regularly to ensure older data is not still accessible. And since the data is only decrypted at one endpoint, in memory, it is safeguarded even in the unlikely event of a physical breach of the facility."
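
The idea of deleting decryption keys so that older data becomes inaccessible can be illustrated with a toy sketch. This is not Meta's implementation, which is not public: the `EphemeralStore` class, its method names, and the sample record are hypothetical, and the one-time-pad XOR below merely stands in for whatever vetted encryption a production system would use.

```python
import secrets


class EphemeralStore:
    """Toy store: each record is encrypted under its own random key."""

    def __init__(self):
        self._ciphertexts = {}  # record_id -> encrypted bytes
        self._keys = {}         # record_id -> key, kept separately

    def put(self, record_id, plaintext: bytes):
        # One-time pad: a fresh random key as long as the data (illustration only).
        key = secrets.token_bytes(len(plaintext))
        self._keys[record_id] = key
        self._ciphertexts[record_id] = bytes(p ^ k for p, k in zip(plaintext, key))

    def read(self, record_id) -> bytes:
        # Decryption happens only here, in memory; raises KeyError if the key is gone.
        key = self._keys[record_id]
        return bytes(c ^ k for c, k in zip(self._ciphertexts[record_id], key))

    def shred(self, record_id):
        # Deleting the key makes the stored ciphertext unreadable.
        self._keys.pop(record_id, None)


store = EphemeralStore()
store.put("sample-1", b"example training text")
recovered = store.read("sample-1")

store.shred("sample-1")
try:
    store.read("sample-1")
    readable_after_shred = True
except KeyError:
    readable_after_shred = False

print(recovered, readable_after_shred)
```

Once the key is removed, the ciphertext that remains in the store can no longer be decrypted, which is the property the quoted passage relies on when it says older data is "not still accessible."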

A supercomputer with comprehensive benefits

Amid the skepticism, Meta's RSC supercomputer comes with immense technological potential and opportunity, including the ability to target hate-driven content. In 2020, the company reported that "97% of hate speech taken down from Facebook was spotted by our automated systems before any human flagged it."

Harmful content became a contentious issue in connection with election misinformation in both 2016 and 2020, and Meta says addressing it is a priority.

Because the RSC will enable AI models to learn from trillions of examples and understand hundreds of languages, it is designed to use that vast compute power to target hate speech and harmful content more quickly and thoroughly.

The RSC was designed and built remotely during the global coronavirus pandemic. Despite the hurdles, Meta says the RSC went from an idea in a shared document to a functioning cluster in just a year and a half, all while navigating the supply chain crisis, remote work, and the new safety protocols that COVID-19 necessitated.

"While the high-performance computing community has been tackling scale for decades, we also had to make sure we have all the needed security and privacy controls in place to protect any training data we use. Unlike with our previous AI research infrastructure, which leveraged only open source and other publicly available data sets, RSC also helps ensure that our research translates effectively into practice by allowing us to include real-world examples from Meta’s production systems in model training," Meta explained in an announcement about the RSC's capabilities.

Blending identities and addressing bias

The metaverse, which the RSC is built to power, essentially seeks to blur boundaries between physical and digital realities, and as a result, will further intertwine users' physical identity and their algorithmically determined digital identity.

This blending of physical and digital identities, though, is cause for concern.

"The problem starts with the philosophy that there's no need for privacy if you have complete transparency. And we've seen that ideology underline Facebook's decision-making, and we've seen the consequences of that. It underlines things like rigid real name policies … and associates that with some kind of fraud. We know who [policies like that have] hurt. It hurts sex workers. It hurts trans people. It hurts Native Americans and Indigenous people," said Os Keyes, an AI ethicist at the University of Washington. "These are individuals whose names and identities 'aren't real,' or 'don't make sense.'"

With a supercomputer that promises to become the backbone of an intricately complex metaverse, the algorithms and data that models on the RSC learn from could perpetuate AI-powered biases at massive scale.

The above example points to a wider question of bias, both within the tech industry and within AI. Who should control how AI-powered compute systems learn from algorithms that then define us based on our clicks, likes, friends, and content consumption? Should we have a say if what algorithms infer paints a misleading picture of us?

"If Meta was more explicit about what exact information they were keeping and how easily you could fix it and correct it if you needed to, that would be ideal," said Charles Simon, CEO of FutureAI, a company working to design AI with human-like general intelligence.

"We all have a general idea that any expectation of privacy on a picture you post to Facebook or something you say on Facebook is unrealistic. But if they know that I click on a site about this topic and that topic, and then infer from that, that I must be X years old and male and, and have a specific kind of political persuasion — then that is definitely a reason they should be explicit about what they inferred, so [that] I could choose how to be defined digitally, and correct anything that isn't fact," Simon added.

Meta says it is addressing bias concerns. Its pillars of responsible AI state that "our Responsible AI team has developed and is continually improving our Fairness Flow tools and processes to help our ML engineers detect certain forms of potential statistical bias in certain types of AI models and labels commonly used at Facebook, as described in our recent academic paper, with a goal of eventually scaling similar measurement to all our AI products."

Meta acknowledges the imperfections and biases of AI technologies and says they can be perpetuated by "not enough training data, a lack of features, a misspecified target of prediction, or a measurement error in the input features." It also recognizes that addressing bias is an ongoing challenge.

Can data truly be anonymized with the RSC?

Keyes says that Meta's new AI-powered supercomputer is ethically problematic in two ways related to privacy. The first is the type of privacy protection involved, which Keyes describes as "functionally protecting" user data without inherently protecting it.

"I don't trust what they're saying about protecting the data from a privacy perspective because a lot of the stuff they are explicitly talking about [in the blog post] is functionally protecting it from people who aren't Facebook," Keyes said. "They're talking about the machines and air gaps and everything that is functionally encrypted, … [but] unfortunately, the fundamental problem with this idea is the claim that they've truly anonymized the data."

Keyes says that a problem with large tech companies is their lack of transparency about what they mean by "anonymized data," as well as about who analyzes that user data and what analysis takes place. Data can be de-anonymized depending on surrounding factors: if a person's data has inherently unique qualities that are known or can reasonably be inferred from other context, then that data becomes de-anonymized to anyone who holds that information.
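
A minimal sketch can make this kind of context-dependent re-identification concrete. Everything here is hypothetical and invented for illustration: the two tiny data sets, the field names, and the `reidentify` helper are assumptions, not anything Meta or Keyes has described. The sketch joins a "de-identified" release against outside context on a few shared attributes and reports only the people whose combination of attributes is unique.

```python
from collections import Counter

# Hypothetical "anonymized" release: names removed, other attributes kept.
released = [
    {"zip": "98105", "birth_year": 1990, "gender": "F", "interest": "topic A"},
    {"zip": "98105", "birth_year": 1990, "gender": "F", "interest": "topic B"},
    {"zip": "98105", "birth_year": 1985, "gender": "M", "interest": "topic C"},
    {"zip": "98052", "birth_year": 1972, "gender": "M", "interest": "topic D"},
]

# Hypothetical outside context that an observer might already hold.
public = [
    {"name": "Person 1", "zip": "98105", "birth_year": 1985, "gender": "M"},
    {"name": "Person 2", "zip": "98105", "birth_year": 1990, "gender": "F"},
]

# Attributes that are harmless alone but identifying in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(released, public):
    """Link known people to released records whose attribute combination is unique."""
    group_sizes = Counter(key(r) for r in released)
    matches = {}
    for person in public:
        k = key(person)
        if group_sizes[k] == 1:  # exactly one released record fits this person
            record = next(r for r in released if key(r) == k)
            matches[person["name"]] = record["interest"]
    return matches


matches = reidentify(released, public)
print(matches)  # {'Person 1': 'topic C'}
```

In this toy data, Person 2 shares a ZIP code, birth year, and gender with two released records and so stays ambiguous, while Person 1's combination is unique and links straight to a single record. The same release is anonymous to one observer and identifying to another, depending on what else that observer knows.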

"Truly, whether data is anonymous or not depends on context. Specifically, it depends on what else you can correlate with and determine from it whether or not it truly is anonymized," Keyes says.

"Anonymous data can't be used to identify a specific individual, but certain AI research is not possible to perform with anonymized data (e.g., speech recognition),” Meta shared in a statement related to data anonymity and the RSC. In these instances, Meta’s privacy review will ensure that other appropriate safeguards relating to data are put in place, including deletion safeguards and access control safeguards.

Keyes argues that it's reasonable for users to be concerned about the paradox of "data shadows," essentially the "data doppelgängers" of ourselves. Keyes said that these can become dangerous when the representation of each of us within the data can "live a life of its own." Specifically, Keyes noted that "data might go off places you don't expect, which can cause privacy violations."

Keyes is not alone. AI and privacy experts have raised several concerns about the privacy and security of the RSC's calculations and data, as well as about Meta's privacy protocols and safeguards.

"Imagine that everything you put into your computer that went on the internet was a postcard. When someone delivers your mail, they get to read it on the way, the message isn't sealed," Simon said. "With technology, everything is like a postcard … So, we need to start thinking about truly secure ways for transferring secure information. … We [as an industry] need to get the sealed, secure delivery of information under control."

"The RSC announcement is about infrastructure, but that infrastructure does not change our commitment to advancing responsible AI. Furthermore, we are investing in new research on explainability, such as Captum, among others, to address the AI opacity concerns," a spokesperson for Meta told VentureBeat via email.

Meta's bumpy road to improving privacy and gaining user trust

Since 2006, when Facebook's first privacy "hiccup" occurred, the company has increasingly been in the hot seat for failing to protect users' data privacy effectively or to analyze that data ethically.

The company is now hoping to turn over a new leaf on privacy and bias.

In a statement provided to VentureBeat, Meta explained that its privacy work is currently overseen by Meta’s Privacy Team, which "is at the center of our company’s efforts to build a comprehensive privacy program and is joined by thousands of people in different organizations and roles across Meta who are working to embed privacy into all facets of our company operations, including public policy, privacy strategy, and legal."

Industry experts agree that Meta's biggest failure in the past has been handling user privacy reactively rather than proactively building in concrete measures from the beginning.

Even though Meta asserts the "RSC has been designed from the ground up with privacy and security in mind," both experts and users have doubts because of the company's long history of privacy protection failures.

"Meta is committed to safeguarding data, and at times, we use methods such as de-identification along with other safeguards to further protect privacy," Meta said in response to VentureBeat’s request for comment on the specific examples of the protections that are in place for the RSC and data privacy.

The RSC's enhanced compute and AI capabilities may equip Meta to pursue its aims of addressing bias, protecting privacy, and identifying harmful content at a larger scale and more quickly. Simon explained, however, that this doesn't necessarily mean the company will be able to accomplish those aims "any smarter."

"That's an open question and Meta has certainly been called onto the carpet to do better. And, maybe, they're turning over a new leaf, but a new leaf is not their business model. Their business model is to sell data and clicks," Simon said. "Clicks come from the most inflammatory text, typically. So, are they going to be willing to scale back their clicks for the benefit of users and mankind? I'm dubious of that idea."

Keyes expressed similar sentiments, saying that Meta's current policy appears to lack explicit detail about what happens when, inevitably, there is a mishap involving user data on the RSC, and about what processes are in place to address such a mishap and achieve some kind of justice.

"I feel that working to ensure those things is so antithetical to how Facebook operates that from my perspective, Facebook cannot adequately do it," Keyes said. "Ultimately, my fundamental concern with Facebook building the RSC, is however much they say that they are anonymizing things and swear that everything is safe because their entire company's purpose and profit margin is through coordinating and aligning and merging data and trying to assemble as much data on people as humanly possible. Why would anyone blindly trust that?"

A call to action for tech giants

Do Meta's first-ever drop in Facebook users and the subsequent fall in its stock price show that more users are paying attention and leaving the platform for other forms of entertainment? Experts think that is very likely.

"As users, we get to opt out, and I think that's going to be the real form of regulation," Simon said. "If they overstep their boundaries in terms of privacy, people are going to opt out and go somewhere else or simply back off."

How could Meta do better?

Privacy expert Walter Harrison built his company, Tapestri, around privacy transparency from the ground up. In fact, the entire company is based on transparently sharing where individuals' data ends up, how, and why, and on paying them for their consent to share it.

While that may be the opposite of Meta's business model, Harrison shared some insights into how the company could perhaps win back some public trust by taking a similar approach. "Our advice to bigger companies is to get explicit consent if you can. Obviously, users have already agreed to Facebook's terms and conditions and it's a mile long and the average person is never going to read it. But maybe reminding that consumer, again, of what exactly they're exchanging to use your service. That would be, I think, a step in the right direction."

Harrison also cautions individuals to do their due diligence in understanding all the personal information, biometrics, and preferences they may be sharing with these tech giants without realizing it. This matters particularly because the metaverse that the RSC will power may become a space where we represent ourselves in ways that are, ideally, even truer to our identities, and because, as mentioned earlier, it will bring our physical and digital personas even closer together. "If I am giving up so much of myself in exchange for this service, am I OK with that and is that exchange worth it? And that's the question that we're all going to have to answer for ourselves," he said.

What can users expect in exchange from the new supercomputer? With privacy protections in place from the ground up, the RSC will ideally keep the integrity of personal data intact while powering technological innovation. Meta expects the RSC to be a "step function change in compute capability" that will in turn enable the company to both "create more accurate AI models for our existing services" and "enable completely new user experiences, especially in the metaverse." These advancements, the tech giant claims, will allow Meta to build "next-generation AI infrastructure" as well as to create the foundational technologies that will power the metaverse and further advance the AI community at large across industries.