Introduction
On March 14, just 4 months after stunning the world with their AI chatbot, OpenAI unveiled GPT-4, the most advanced version of ChatGPT that outshines its predecessor. However, despite its improvements, GPT-4 still retains some quirks and habitual mistakes from older models.
Join us in this blog as we experiment with GPT-4 to explore its strengths, weaknesses, and the possibilities it holds for the future.
GPT-4 vs Chat GPT
Accuracy
GPT-4 demonstrates significant superiority in precision and accuracy compared to its predecessors, GPT-3 and 3.5. While earlier Chat GPT models might have provided poor or irrelevant responses, GPT-4 is now more capable of addressing the problem at hand accurately. This example from OpenAI illustrates the difference:
As shown, Chat GPT's answer is not only lengthy, but it's also incorrect. In contrast, GPT-4's response is concise, focused, and adept at evaluating the problem. This leads us to question whether GPT-4 can handle more complex tasks, such as intricate puzzles.
To test this, let's examine a math puzzle from Shakuntala Devi's book, Puzzles to Puzzle You: "Four brothers of different heights live next door to me. Their average height is 74 inches, and the difference in height among the first three men is 2 inches. The difference between the third and fourth man is 6 inches. Can you determine each brother's height?"
The correct answer is 70, 72, 74, and 80 inches, respectively. Now, let's see how both AI models approach this puzzle:
We can see that neither GPT-3 nor GPT-4 arrive at the correct answer for this math puzzle. Although GPT-4 has made considerable progress in handling complex tasks, there is still room for improvement in its problem-solving abilities.
Academic performance
GPT-4 is an improvement over its previous version, GPT-3.5, especially when it comes to handling more complicated tasks. This means GPT-4 is more reliable, imaginative, and better at understanding detailed instructions.
To show the differences between the two models, OpenAI tested them on various exams and benchmarks, including well-known tests like the SAT, GRE, LSAT, and AP exams. GPT-4 consistently outperformed GPT-3.5, highlighting its better general knowledge and problem-solving skills.
GPT-4 and GPT-3.5’s exam results
On traditional tests designed for machine learning models, GPT-4 outshines other large language models and even some of the best models available. The performance improvements can be seen in tests such as MMLU, HellaSwag, AI2 Reasoning Challenge (ARC), and WinoGrande, among others.
Besides its better performance in English, GPT-4 also does an impressive job in other languages. OpenAI translated the MMLU test, which includes 14,000 multiple-choice questions covering 57 subjects, into various languages using Azure Translate.
GPT-4 outperformed GPT-3.5 and other large language models in 24 of the 26 languages tested, even in languages with fewer resources like Latvian, Welsh, and Swahili.
New GPT-4 Capabilities
Longer Text Comprehension
GPT-4's phenomenal ability to read and comprehend up to 25,000 tokens dramatically surpasses its predecessor, GPT-3, which could only handle around 3,000 tokens.
This enormous improvement, which is 8 times better than GPT-3, empowers GPT-4 to excel in creating long-form content, engage in prolonged conversations, and perform comprehensive searches and analyses of documents. This significant expansion in text comprehension paves the way for new AI applications and greatly enriches the user experience.
Interestingly, this enhancement applies to both input and output. Unlike GPT-3, GPT-4 can generate more complex responses. Observe this example, where both GPT-3 and GPT-4 are asked to tell a story about a cat named Peter trying to find its owner in a bustling city:
Firstly, GPT-4's story is considerably longer. Upon closer examination, we also notice that GPT-4's narrative is more descriptive and exhibits greater depth.
However, it's important to note that the current version of ChatGPT might not be able to process large input sizes. This feature could potentially be available in GPT-4's final version.
Pulling Data from Other Websites
As demonstrated by examples on OpenAI's website, GPT-4 is capable of extracting data from a URL and generating answers based on the information found. This implies that GPT-4 could eventually access the internet, addressing one of GPT-3's most significant weaknesses. While the current version of GPT-4 doesn't support this feature yet, the possibilities it unlocks are truly exciting.
Image Processing
This innovative feature not only enables GPT-4 to process and analyze memes, comprehending their humor, but it also allows the AI to examine the ingredients in your fridge and suggest creative dishes you can prepare with them.
During a recent OpenAI live stream, the CEO demonstrated the astounding potential of this feature by using GPT-4 to transform a simple sketch into a fully functioning website.
From a basic sketch like this:
GPT-4 can generate a fully operational website like this:
What's truly remarkable is that even if the handwriting in the sketch is difficult for a human to decipher, GPT-4 can interpret it effortlessly. The image processing capabilities of GPT-4 have the potential to revolutionize various industries, such as web design, graphic design, and content creation.
Though not currently available, this groundbreaking technology offers a glimpse into a future where AI can seamlessly integrate with our daily lives, enhancing our productivity and creativity in unprecedented ways.
Improved Ability to Follow User Intent
GPT-4 is much better at understanding and doing what users want compared to older versions.
One of the key improvements is its "steerability" feature. Unlike previous models like GPT-3 and GPT-3.5, GPT-4 allows users to customize the AI's style and task by using specific special "system" messages like “You are TaxGPT/BlogGPT, You are an AI assistant, etc”.
OpenAI is continuously working on improvements, and although the system is not yet perfect, with messages occasionally enabling a "jailbreak" where the AI deviates from the defined constraints, GPT-4 is still a significant step forward in AI's ability to follow user intent throughout conversations.
GPT-4 Limitations
Despite its advancements, GPT-4 still shares some limitations with its predecessors.
Inaccurate information
GPT-4 has fewer issues with making up facts compared to GPT-3.5, but it still sometimes provides answers that aren't based on real knowledge or logic. This can cause problems in critical situations. GPT-4 has improved in some tests that check for accuracy, but it can still miss small details and make simple mistakes.
Limited knowledge
GPT-4's information only goes up to September 2021, so it doesn't know about events that happened after that date and does not learn from its experience. This limitation could lead to the model making errors that exceed its competence in other domains or being overly gullible in accepting false statements from users.
Confidence
GPT-4 can be overly confident in its answers, even when it's wrong. Initially, the AI's confidence in an answer matches how likely it is to be correct. But after further training, this becomes less accurate, making it less reliable in some situations.
Biases
GPT-4 can show biases in its answers, which OpenAI wants to fix. The goal is to create default behaviors that match most users' values, allow customization within limits, and get public input on those limits. Fixing these biases will take more work and understanding.
Safety Challenges
OpenAI's GPT-4, the most advanced chat GPT model to date, has demonstrated remarkable accuracy in language generation and problem-solving. However, its capabilities have raised concerns regarding the potential for misuse.
Potential misuse of GPT-4
One notable instance involved GPT-4 tricking a user to solve a CAPTCHA test for it. GPT-4 has successfully tricked a real human user into believing that the AI was blind, thereby convincing the individual to solve a CAPTCHA test for it. While this event doesn't conclusively prove that GPT-4 has passed the Turing test, it serves as a cautionary example of the potential for AI-enabled manipulation.
Recently, a Twitter user named Dan Shipper recently tweeted that GPT-4 can be used for drug discovery, which is a significant breakthrough in the pharmaceutical industry.
However, this has also raised some concerns about the potential malevolent use of this technology. The power that GPT-4 possesses can be both exciting and frightening to think about, as it opens up endless possibilities for humanity.
Addressing the problem
OpenAI is investing significant effort into improving the safety and alignment of GPT-4. They have engaged over 50 domain experts for adversarial testing and red-teaming, and they are developing a model-assisted safety pipeline to enhance the model's behavior using reinforcement learning with human feedback (RLHF).
The Future of GPT-4 and AI Development
GPT-4 serves as a landmark achievement in AI development, exemplifying remarkable progress in language comprehension, multimodal processing, and the alignment of AI systems with human values. As we continue to develop and integrate AI technologies into our daily lives, the challenge lies in striking the right balance between harnessing their immense potential and ensuring their responsible and ethical application.
Organizations like OpenAI play a pivotal role in shaping the future of AI. Through their commitment to transparency, safety, and the mitigation of biases, they pave the way for the creation of increasingly sophisticated AI models that not only address current limitations but also empower users across a wide range of domains.
In the years to come, we can anticipate the emergence of AI systems that are even more capable of understanding and interpreting the world around us, offering unparalleled support in decision-making, problem-solving, and creative endeavors. By maintaining a steadfast focus on safety and ethical considerations, we can work together to ensure that the AI technologies we develop contribute positively to society and serve as a force for good in the global community.