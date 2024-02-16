
Explained: What does it mean for Google Gemini to have a 1 million token context window?

The context window is how much information an AI model can consider at any one time. Gemini 1.5 Pro’s 1 million token context window means it can take up to 1 million tokens into account when responding to user queries.

Published

Source: Google Blog

A few days after rebranding most of its AI services to Gemini, Google on February 15 announced Gemini 1.5, the successor to the Gemini 1.0 AI model that powers all these AI services.

“Gemini 1.5 Pro, our mid-sized model, will soon come standard with a 128K-token context window, but starting today, developers + customers can sign up for the limited Private Preview to try out 1.5 Pro with a groundbreaking and experimental 1 million token context window! The 1M tokens feature unlocks huge possibilities for devs – upload hundreds of pages of text, entire code repos, and long videos and let Gemini reason across them.” — Google CEO Sundar Pichai

The 1 million token context window certainly appears to be a big deal, but what does it really mean? I turned to the always-reliable subreddit Explain Like I’m Five for some clarity, and the folks there did not disappoint. I encourage you to read all the responses I got to my questions, because each one plays its part in improving your understanding of this subject, but here is a summary of what I learnt:

What is a token? AI models convert your input into a sequence of tokens, which is what they actually process. A single token could represent a word, a few letters, a punctuation mark, etc., depending on how the model is implemented. Here is a handy tool from OpenAI that shows how it converts queries to tokens for ChatGPT. For example, the sentence “Nemo is a fish and likes to wander around” gets converted to 12 tokens.
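
To make the idea concrete, here is a toy sketch of tokenization in Python. It simply splits text into whole words and punctuation marks, which is an assumption made purely for illustration. Real tokenizers, such as the ones behind ChatGPT or Gemini, use learned subword vocabularies, so their token counts will generally differ from what this sketch reports. The function name `toy_tokenize` is hypothetical.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens.

    This is a toy scheme for illustration, NOT how any real model tokenizes.
    """
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("Nemo is a fish and likes to wander around")
print(tokens)       # the individual tokens
print(len(tokens))  # how many tokens the toy scheme produced
```

The point is only that the model never sees your sentence as one piece of text; it sees a list of tokens, and the context window is counted in those units.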

What is a context window? The context window is how much information an AI model can consider at any one time. The model can use all the information in a context window to decide what it should output. It can also be thought of as the memory of the model for that particular conversation.

Let’s take the example above: “Nemo is a fish and likes to wander around.” Suppose a model has a context window of only 5 tokens. That would mean the model only takes into consideration the last part of our input. If you now ask this model, “Who likes to wander around?”, it will give some random answer rather than Nemo, because it never sees the first part of the input, which is “Nemo is a fish…”
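
The 5-token example can be sketched in a few lines of Python. This assumes, for simplicity, that each whitespace-separated word is one token and that a model which runs out of room drops the oldest tokens first. Real systems may handle overflow differently, and `apply_context_window` is a hypothetical helper, not part of any library.

```python
def apply_context_window(tokens: list[str], window: int) -> list[str]:
    """Keep only the most recent `window` tokens, dropping the oldest ones."""
    if window <= 0:
        return []
    return tokens[-window:]

# Assumption: one word = one token (real tokenizers split differently)
tokens = "Nemo is a fish and likes to wander around".split()

visible = apply_context_window(tokens, window=5)
print(visible)            # ['and', 'likes', 'to', 'wander', 'around']
print("Nemo" in visible)  # False: the model can no longer "see" Nemo
```

With a window of 9 or more, the whole sentence fits and “Nemo” stays visible, which is why a larger window lets the model answer the question correctly.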

What does a 1 million token context window mean? Gemini 1.5 Pro has a 1 million token context window, meaning it can take up to 1 million tokens into consideration when responding to your queries. In other words, “the bigger the context window, the more information it can take in from a prompt — making its output more consistent, relevant and useful,” Google DeepMind explained.

Just how much can be represented through 1 million tokens? According to Google, roughly:

  • 1 hour of video,
  • 11 hours of audio,
  • 30,000 lines of code, or
  • 700,000 words

For comparison, the Lord of the Rings trilogy has a combined word count of around 500,000 words. This means you can provide Gemini 1.5 with these three books and more as context!
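
As a back-of-the-envelope check, the figures above imply roughly 0.7 words per token. The Python sketch below uses only the numbers quoted in this article (1 million tokens, about 700,000 words, about 500,000 words for the trilogy). The resulting token estimate is a rough approximation, not an official figure, and `estimate_tokens` is a hypothetical helper.

```python
CONTEXT_TOKENS = 1_000_000   # Gemini 1.5 Pro preview window, per Google
WORDS_PER_WINDOW = 700_000   # Google's rough word estimate for 1M tokens
LOTR_WORDS = 500_000         # approximate trilogy word count quoted above

def estimate_tokens(word_count: int) -> int:
    """Estimate tokens for a word count using Google's rough words-per-window ratio."""
    return round(word_count * CONTEXT_TOKENS / WORDS_PER_WINDOW)

lotr_tokens = estimate_tokens(LOTR_WORDS)
print(lotr_tokens)                   # estimated tokens for the trilogy
print(CONTEXT_TOKENS - lotr_tokens)  # estimated tokens left over in the window
```

By this estimate the trilogy takes up a little over 700,000 tokens, leaving well over a quarter of the window free for more text or for your questions.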

In another example shared by Sundar Pichai, the Google team fed it a 44-minute silent film:

Here’s another video shared by Google DeepMind CEO Demis Hassabis:

How does this compare to the context windows of other AI models like OpenAI’s GPT-4?

OpenAI’s GPT-4 Turbo, which powers the paid ChatGPT Plus service, has a context window of 128,000 tokens. Here is a video comparing the context windows of different AI models.

One caveat, however: The 1 million token context window is only accessible to select developers via AI Studio and Vertex AI, while the default context window of Gemini 1.5 Pro is 128K tokens, the same as GPT-4 Turbo.

What can we expect going forward? Details of Gemini 1.5 Ultra, which is expected to be more advanced than 1.5 Pro, are yet to be disclosed. But interestingly, Google shared that it has tested Gemini 1.5 with a 10 million token context window for text! It is hard to even fathom how much information can be stuffed into such a window. OpenAI’s next model, GPT-5, will also likely get a significant bump in its context window. All this is to say that this appears to be just the start, and the future holds some impressive possibilities that we cannot fully understand yet.

What does Mixture-of-Experts (MoE) architecture mean? Apart from having a large context window, Google also announced that Gemini 1.5 has a new Mixture-of-Experts (MoE) architecture. This supposedly makes the model faster and more efficient because “depending on the type of input given, MoE models learn to selectively activate only the most relevant expert pathways in its neural network. This specialization massively enhances the model’s efficiency.” Here’s an explainer of what this means.
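
The idea of “selectively activating” experts can be illustrated with a toy Python sketch of top-k routing, a common pattern in Mixture-of-Experts models. Google has not described Gemini 1.5’s routing in the material quoted here, so this is a generic illustration under stated assumptions, not Gemini’s actual design. The experts are simple made-up functions, and the router scores are hard-coded, whereas a real model computes them with a learned network.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalise over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))

# Four hypothetical "experts"; in a real model each is a neural sub-network
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
gate_scores = [0.1, 2.0, 2.0, -1.0]  # made-up router scores for one input

print(route_top_k(gate_scores))                # which experts run, and their weights
print(moe_forward(3.0, experts, gate_scores))  # output mixed from those experts only
```

Only two of the four experts are evaluated for this input, which is where the efficiency gain comes from: the model can have many experts in total while doing only a fraction of the work for any given input.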

For more on Gemini 1.5, here’s Google’s press release.

