Anyone who has worked in a customer-facing job — or even just worked with a team of more than a few individuals — knows that every person on Earth has their own unique, sometimes baffling, preferences.

Understanding the preferences of every individual is difficult even for us fellow humans. But what about for AI models, which have no direct human experience to draw upon, let alone use as a frame of reference when trying to understand what people want?

A team of researchers from leading institutions and the startup Anthropic, the company behind the large language model (LLM)/chatbot Claude 2, is working on this very problem and has come up with a seemingly obvious solution: get AI models to ask more questions of users to find out what they really want.

Entering a new world of AI understanding through GATE

Anthropic researcher Alex Tamkin, together with colleagues Belinda Z. Li and Jacob Andreas of the Massachusetts Institute of Technology’s (MIT’s) Computer Science and Artificial Intelligence Laboratory (CSAIL), along with Noah Goodman of Stanford, published a research paper earlier this month on their method, which they call “generative active task elicitation (GATE).”


Their goal? “Use [large language] models themselves to help convert human preferences into automated decision-making systems.”

In other words: take an LLM’s existing ability to analyze and generate text, and use it to ask the user written questions during their first interaction. The LLM then reads the user’s answers and incorporates them into its generations going forward, on the fly. Crucially, it also infers from those answers — based on what other words and concepts they relate to in its training data — what the user is ultimately asking for.

As the researchers write: “The effectiveness of language models (LMs) for understanding and producing free-form text suggests that they may be capable of eliciting and understanding user preferences.”

The three modes of GATE

The method can be applied in various ways, according to the researchers:

  1. Generative active learning: The researchers describe this method as the LLM producing examples of the kind of responses it can deliver and asking how the user likes them. One example question they provide for an LLM to ask is: “Are you interested in the following article? The Art of Fusion Cuisine: Mixing Cultures and Flavors […] .” Based on what the user responds, the LLM will deliver more or less content along those lines.
  2. Yes/no question generation: This method is as simple as it sounds (and gets). The LLM will ask binary yes or no questions such as: “Do you enjoy reading articles about health and wellness?” and then take into account the user’s answers when responding going forward, avoiding information that it associates with those questions that received a “no” answer.
  3. Open-ended questions: Similar to the first method, but even broader. As the researchers write, the LLM will seek to obtain “the broadest and most abstract pieces of knowledge” from the user, including questions such as “What hobbies or activities do you enjoy in your free time […], and why do these hobbies or activities captivate you?”
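To make the three modes concrete, here is a minimal, hypothetical sketch of a GATE-style elicitation loop. The function names (`ask_llm`, `elicit_preferences`) and the scripted stub standing in for a real LLM call are illustrative assumptions, not code from the paper; a real implementation would query an actual model API.

```python
# Hypothetical sketch of a GATE-style preference-elicitation loop.
# ask_llm() is a stand-in for a real LLM call; here it is a scripted
# stub so the example runs without an API key.

def ask_llm(prompt: str) -> str:
    """Stand-in for an LLM call: returns a canned question per elicitation mode."""
    if "yes/no" in prompt:
        return "Do you enjoy reading articles about health and wellness?"
    if "open-ended" in prompt:
        return "What hobbies or activities do you enjoy in your free time?"
    # Default: example-based question (generative active learning)
    return "Are you interested in this article? 'The Art of Fusion Cuisine'"

def elicit_preferences(modes, get_user_answer):
    """Ask one question per elicitation mode and collect (question, answer) pairs.

    The resulting transcript would later be prepended to the task prompt,
    letting the model condition its responses on the user's stated preferences."""
    transcript = []
    for mode in modes:
        question = ask_llm(f"Generate a {mode} question to learn the user's preferences.")
        answer = get_user_answer(question)
        transcript.append((question, answer))
    return transcript

# Example run with a scripted "user" in place of interactive input()
canned = {"Do you enjoy reading articles about health and wellness?": "no"}
transcript = elicit_preferences(
    ["example-based", "yes/no", "open-ended"],
    lambda q: canned.get(q, "I like cooking and travel."),
)
for q, a in transcript:
    print(f"Q: {q}\nA: {a}")
```

The key design point the sketch captures is that the model itself generates the questions, and the answers become context for all later generations, rather than being labels for offline training.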

Promising results

The researchers tried out the GATE method in three domains — content recommendation, moral reasoning and email validation.

By fine-tuning GPT-4 from Anthropic rival OpenAI, and recruiting 388 paid participants at $12 per hour to answer questions from GPT-4 and grade its responses, the researchers discovered that GATE often yields more accurate models than baselines while requiring comparable or less mental effort from users.

Specifically, they found that GPT-4 fine-tuned with GATE did a better job of guessing each user’s individual preferences, improving the subjective accuracy measure by about 0.05 points. That sounds small, but it is a meaningful gain given that the researchers’ scale starts from zero.

Fig. 3 chart from the paper “Eliciting Human Preferences With Language Models” published on arXiv.org dated Oct. 17, 2023.

Ultimately, the researchers state that they “presented initial evidence that LMs can successfully implement GATE to elicit human preferences (sometimes) more accurately and with less effort than supervised learning, active learning, or prompting-based approaches.”

This could save enterprise software developers a lot of time when booting up LLM-powered chatbots for customer or employee-facing applications. Instead of training them on a corpus of data and trying to use that to ascertain individual customer preferences, fine-tuning their preferred models to perform the Q/A dance specified above could make it easier for them to craft engaging, positive, and helpful experiences for their intended users.

So, if your AI chatbot of choice begins asking you questions about your preferences in the near future, there’s a good chance it may be using the GATE method to try to give you better responses going forward.
