Apple Claims Its ReALM Performs Better Than GPT-4 at This Task. What Is It?

Apple researchers have published a new study claiming that their ReALM language model outperforms OpenAI’s GPT-4 at “reference resolution.”

Apple’s ReALM Language Model Beats GPT-4 in Reference Resolution Benchmark

Apple researchers submitted a preprint on Friday for their ReALM large language model, claiming that it can “substantially outperform” OpenAI’s GPT-4 on specific benchmarks. ReALM is designed to understand and handle a variety of contexts; in theory, this would let users refer to something on the screen or in the background and ask the language model about it.

Reference resolution is a language problem that involves determining what a particular expression refers to. For example, when we speak, we use words like “they” and “that.” What these words refer to is usually evident to humans, who understand the context. A chatbot like ChatGPT, however, may not always be able to work out what you mean.

Chatbots would benefit greatly from being able to grasp precisely what is being said. According to Apple, letting users refer to something on a screen with “that,” “it,” or another word, and having the chatbot understand exactly what they mean, is critical to delivering a genuinely hands-free screen experience.
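To make the idea concrete, here is a minimal sketch of how reference resolution can be framed as a question a text-only model can answer: list the candidate things the user might mean, then ask which one a word like “that” points to. The prompt wording and the build_resolution_prompt helper are illustrative assumptions for explanation, not Apple’s actual implementation.

```python
# Illustrative sketch only: the helper name and prompt format are assumptions,
# not the method described in Apple's paper or any specific API.

def build_resolution_prompt(utterance: str, candidates: list[str]) -> str:
    """Frame reference resolution as a multiple-choice question:
    which numbered candidate does "that"/"it" refer to?"""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, start=1))
    return (
        "Candidate entities:\n"
        f"{numbered}\n\n"
        f'User says: "{utterance}"\n'
        "Reply with the number of the entity the user is referring to."
    )

prompt = build_resolution_prompt(
    "Call that number",
    [
        "Phone number shown on the open business listing",
        "Street address shown below the listing",
        "Song currently playing in the background",
    ],
)
print(prompt)
```

A model that answers “1” here has resolved the reference, which is exactly the capability the paper benchmarks.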

This is Apple’s third AI paper in recent months, and while it is still too early to draw conclusions, these papers can be viewed as an early preview of capabilities the company intends to build into software offerings such as iOS and macOS.

According to the study, the researchers intend ReALM to recognize and identify three types of entities: onscreen entities, conversational entities, and background entities. Onscreen entities are things that appear on the user’s screen. Conversational entities are those that contribute to the conversation. For example, if you ask a chatbot “what workouts am I supposed to do today?” it should be able to determine from past conversations that you are on a three-day workout program and know what your daily routine is.

Background entities are objects that do not fit within the first two categories but are nonetheless relevant. For example, there could be a podcast playing in the background or a notification that just went off. Apple wants ReALM to recognize when a user refers to these as well.
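The three categories can be pictured as a simple taxonomy over candidate entities. The sketch below is a hedged illustration of that taxonomy using hypothetical class names and example entities drawn from the scenarios in the article; it is not Apple’s data model.

```python
# Hedged sketch of the three entity categories described in the paper.
# EntityKind, Entity, and the example candidates are illustrative assumptions.

from dataclasses import dataclass
from enum import Enum


class EntityKind(Enum):
    ONSCREEN = "onscreen"              # visible on the user's display
    CONVERSATIONAL = "conversational"  # mentioned earlier in the dialogue
    BACKGROUND = "background"          # ambient context, e.g. audio playing


@dataclass
class Entity:
    kind: EntityKind
    description: str


# Example candidate set matching the scenarios in the article.
candidates = [
    Entity(EntityKind.ONSCREEN, "Phone number displayed on the open webpage"),
    Entity(EntityKind.CONVERSATIONAL, "Day 2 of the user's three-day workout plan"),
    Entity(EntityKind.BACKGROUND, "Podcast episode currently playing"),
]

for entity in candidates:
    print(f"[{entity.kind.value}] {entity.description}")
```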

“We show significant improvements over an existing system with comparable capability across several sorts of references, with our smallest model achieving absolute benefits of more than 5% for on-screen references. We also benchmark against GPT-3.5 and GPT-4, with our smallest model performing similarly to GPT-4 and our larger models significantly outperforming it,” said the researchers in their report.

Keep in mind, however, that GPT-3.5 only accepts text, so the researchers’ input was limited to the prompt alone. For GPT-4, they also included a screenshot with the task, which significantly improved performance.
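The difference between the two setups can be illustrated schematically: a text-only request has to describe the screen in words, while a text-plus-image request can attach the screenshot itself. The dictionaries below are generic placeholders, not the researchers’ exact prompts or any specific API schema.

```python
# Schematic comparison of a text-only request and a text-plus-image request.
# The field names, model labels, and file path are illustrative assumptions.

text_only_request = {
    "model": "gpt-3.5",
    "input": [
        {
            "type": "text",
            "text": (
                "Candidate entities:\n"
                "1. Phone number on screen\n"
                "2. Address on screen\n\n"
                'User says: "Call that number"\n'
                "Which entity is the user referring to?"
            ),
        },
    ],
}

text_plus_image_request = {
    "model": "gpt-4",
    "input": [
        {
            "type": "text",
            "text": 'User says: "Call that number"\nWhich on-screen element is the user referring to?',
        },
        # The screenshot supplies the visual context that the text-only
        # variant has to approximate with written descriptions.
        {"type": "image", "image_path": "screenshot.png"},
    ],
}

print(text_only_request)
print(text_plus_image_request)
```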

“Please keep in mind that our ChatGPT prompt and prompt+image formulation are, to the best of our knowledge, innovative in their own right,” the researchers note. They suggest that a more sophisticated strategy, such as sampling semantically similar utterances until the prompt length is filled, could improve results further, but leave this for future work.

So, while ReALM outperforms GPT-4 in this particular benchmark, it would be far from accurate to say the former is a better model than the latter. ReALM simply beat GPT-4 on a benchmark it was explicitly built to excel at. It is also unclear when or how Apple intends to integrate ReALM into its devices.