Google I/O 2024: Opening a new generation of I/O

2024-05-15

Special topic: Google I/O 2024 developer conference: real-time interaction and video models debut

Source: Google Blackboard

Author: Sundar Pichai

Google and Alphabet CEO

Editor's note: the following is an edited version of Sundar Pichai's remarks at Google I/O 2024, adapted to include more of what was announced on stage.

Google has fully entered the Gemini era.

Before we go any further, I'd like to reflect on the moment we're in. For more than a decade, we have been investing in AI and innovating at every layer: research, products, infrastructure. We'll cover all of it today.

Still, we are in the early days of the AI platform shift. We see enormous opportunity ahead for creators, developers, startups, and everyone. Helping to drive those opportunities is what our Gemini era is all about. So let's get started.

Gemini era

A year ago at I/O, we first shared our plan for Gemini: a frontier model built from the ground up to be natively multimodal, able to reason across text, images, video, code, and more. It marks a big step in turning any input into any output: a new generation of "I/O".

Since then, we released the first Gemini models, our most capable models yet. They delivered state-of-the-art performance on every multimodal benchmark. Two months later, we introduced Gemini 1.5 Pro, a major breakthrough in handling long contexts: it runs 1 million tokens in production, consistently, more than any other large-scale foundation model to date.

We want everyone to benefit from what Gemini can do, so we've worked quickly to share these advances with you. Today, more than 1.5 million developers use Gemini models across our tools. You're using them to debug code, gain new insights, and build the next generation of AI applications.

We are also bringing Gemini's breakthrough capabilities into our products in powerful ways. Today we'll show examples across Search, Photos, Workspace, Android, and more.

Product progress

Today, all of our 2-billion-user products use Gemini.

We've also launched new experiences, including on mobile, where people can now interact directly with Gemini through the app on Android and iOS, and through Gemini Advanced, which provides access to our most capable models. Over 1 million people have signed up to try it in just three months, and the momentum is still building.

Expanding AI Overviews in Search

One of the most exciting changes brought about by Gemini is in Google search.

Over the past year, we've answered billions of queries as part of our Search Generative Experience. People are using Search in entirely new ways: asking new kinds of questions, making longer and more complex queries, even searching with photos, and getting back the best the web has to offer.

We have been testing this experience outside of Labs, and we are encouraged to see not only an increase in Search usage, but also an increase in user satisfaction.

I am pleased to announce that we will launch this fully revamped experience, AI Overviews, to everyone in the United States this week, and we'll bring it to more countries soon.

There is a lot of innovation happening in Search. Thanks to Gemini, we can create much more powerful search experiences, including within our products.

Introducing Ask Photos

Google Photos is one example. We launched it about nine years ago, and since then, people have used it to organize their most precious memories. Today, more than 6 billion photos and videos are uploaded every day.

People love using Photos to search for specific moments in their lives. With Gemini, we're making that much easier.

Say you're paying at a parking station and can't remember your license plate number. Before, you'd search Photos for keywords and then scroll through years of photos looking for the plate. Now, you can simply ask Photos. It recognizes the cars that appear often, triangulates which one is yours, and returns the license plate number.

Ask Photos can also help you relive your memories in a deeper way. For example, you might be reminiscing about your daughter Lucia's early milestones. Now you can ask Photos directly: "When did Lucia learn to swim?"

You can even follow up with a more complex question: "Show me how Lucia's swimming has progressed."

Here, Gemini goes beyond a simple search. It recognizes different contexts, from doing laps in the pool, to snorkeling in the ocean, to the text and dates on her swimming certificates. Photos packages it all together in a summary, so you can take it all in and relive those wonderful memories again. We're rolling out Ask Photos this summer, with more capabilities to come.

Unlocking more knowledge with multimodality and long context

To unlock knowledge across the world's many formats, we built Gemini to be multimodal from the ground up. It is one model with all the modalities built in, so it not only understands each type of input, it finds connections between them.

Multimodality radically expands the questions we can ask and the answers we get back. Long context takes this a step further, letting us bring in even more information: hundreds of pages of text, hours of audio, a full hour of video, entire code repositories. Or, if you like, roughly 96 Cheesecake Factory menus.

For that many menus, you'd need a 1-million-token context window, now possible with Gemini 1.5 Pro. Developers have been using it in all sorts of interesting ways.
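To make that concrete, here is a minimal sketch using the google-generativeai Python SDK; the API key, file name, and prompt are placeholders, not anything shown at I/O.

```python
# Minimal sketch: feed a very large document into Gemini 1.5 Pro's
# 1-million-token context window. File name and prompt are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# e.g. the combined text of ~96 restaurant menus
with open("all_menus.txt", encoding="utf-8") as f:
    menus = f.read()

# Confirm the input fits inside the long context window before sending.
print(model.count_tokens(menus).total_tokens)

response = model.generate_content(
    [menus, "Which dishes appear on every menu, and what do they cost?"])
print(response.text)
```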

Over the past few months, we've been previewing Gemini 1.5 Pro with the long context window, and we've made a series of quality improvements across translation, coding, and reasoning. You'll see these updates reflected in the model starting today.

Now I'm pleased to announce that we're bringing this improved version of Gemini 1.5 Pro to all developers globally. In addition, starting today, Gemini 1.5 Pro with the 1-million-token context window is available directly to consumers in Gemini Advanced, in 35 languages.

Expanding to 2 million tokens in private preview

One million tokens is opening up entirely new possibilities. It's exciting, but I think we can push further still.

So today, we're expanding the context window to 2 million tokens and making it available to developers in private preview.

I am very excited about the progress we have made over the past few months, which represents another step towards the ultimate goal of infinite context.

Bringing Gemini 1.5 Pro to Workspace

So far, we've shared two technical advances: multimodality and long context. Each is powerful on its own, but together they unlock deeper capabilities and more intelligence.

This comes to life vividly in Google Workspace.

People have long been searching their email in Gmail. Now we're making it much more powerful with Gemini. For example, as a parent, you want to stay informed about what's happening at your child's school. Gemini can help.

Now we can ask Gemini to summarize all the recent emails from the school. Behind the scenes, it identifies the relevant emails and even analyzes attachments like PDFs, and you get back a summary of the key points and action items. Maybe you were traveling this week and couldn't make the parents' meeting, and the recording of the meeting is an hour long. If it was recorded in Google Meet, you can ask Gemini to give you the highlights. And if a parents' group is looking for volunteers and you happen to be free that day, Gemini can of course draft a reply for you.
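Workspace wires this directly into Gmail, but the underlying pattern can be approximated with the public Gemini API. A rough sketch follows; the file names and prompt are hypothetical.

```python
# Approximates the "summarize the school's emails" pattern with the public
# Gemini API; Workspace does this natively. File names are hypothetical.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Upload a PDF attachment through the File API so the model can read it.
attachment = genai.upload_file("permission_slip.pdf")

with open("school_emails.txt", encoding="utf-8") as f:
    emails = f.read()

response = model.generate_content([
    emails,
    attachment,
    "Summarize the key points and list action items with due dates.",
])
print(response.text)
```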

There are countless other examples of how Gemini can make life easier. Gemini 1.5 Pro is available in Workspace Labs starting today.

Audio output in NotebookLM

We just looked at an example of text output, but with a multimodal model, we can do more.

We're making progress here, and there will be more to come. Audio Overviews in NotebookLM show what's possible: using Gemini 1.5 Pro, NotebookLM can generate a personalized, interactive audio conversation based on your source materials.

This is the possibility of multimodality: soon you'll be able to mix and match inputs and outputs. This is what we mean by a new generation of I/O. But what if we could go even further?

Using AI agents to go a step further

One way to go further is with AI agents. I think of them as intelligent systems that can reason, plan, and remember. They can "think" multiple steps ahead and work across software and systems, all to get something done on your behalf and, most importantly, under your supervision.

We're still in the early days, but let me show you some of the kinds of use cases we're working hard to solve.

Take shopping, for example. Buying shoes is fun; returning them when they don't fit is not.

Imagine if Gemini could complete all the steps for you:

Search your inbox for receipts.

Find the order number in your email.

Fill in the return form.

Even arrange a UPS pickup.

Isn't that much easier?
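Google hasn't shared how these agent experiences are built. Purely as an illustration, the steps listed above could be orchestrated as a supervised, multi-step agent loop; in this Python sketch every function is a hypothetical stub.

```python
# Purely illustrative sketch of a supervised, multi-step agent loop for the
# shoe-return example. Google has not published how these experiences work;
# every function here is a hypothetical stub.

def find_receipt(inbox: list[str]) -> str:
    """Search the inbox for the purchase receipt (stub)."""
    return next(m for m in inbox if "receipt" in m.lower())

def extract_order_number(receipt: str) -> str:
    """Pull the order number out of the receipt email (stub)."""
    return receipt.split("order #")[-1].split()[0]

def fill_return_form(order_number: str) -> str:
    """Fill in the retailer's return form (stub)."""
    return f"return-confirmation-{order_number}"

def schedule_pickup(confirmation: str) -> str:
    """Arrange a UPS pickup for the packaged return (stub)."""
    return f"UPS pickup booked for {confirmation}"

def run_return_agent(inbox: list[str]) -> str:
    # The agent plans several steps ahead, but each consequential action
    # is confirmed by the user, keeping the whole flow under supervision.
    state: object = inbox
    for step in (find_receipt, extract_order_number,
                 fill_return_form, schedule_pickup):
        if input(f"OK to run '{step.__name__}'? [y/n] ") != "y":
            return "Cancelled by user."
        state = step(state)
    return state

# Example: run_return_agent(["Your receipt: order #12345 for running shoes"])
```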

Let's look at a more complex example.

Say you just moved to Chicago. Imagine Gemini and Chrome working together to handle much of the legwork for you: organizing, reasoning, and synthesizing on your behalf.

For example, you'll want to explore the city and find nearby services, from dry cleaners to dog walkers. And you'll have to update your new address across dozens of websites.

Gemini can work on these tasks for you, prompting you for more information when needed, so you're always in control.

This part is really important: as we prototype these experiences, we're thinking hard about how to do it in a way that's private, secure, and works for everyone.

These are simple use cases, but they give you a good sense of the kinds of problems we want to solve by building intelligent systems that think ahead, reason, and plan, all on your behalf.

What does this mean for our mission?

With multimodality, long context, and agents, Gemini brings us closer to our ultimate goal: making AI helpful for everyone.

We see this as how we'll make the most progress against our mission: organizing the world's information across every input, making it accessible through any output, and combining the world's information with the information in your world, in a way that's truly useful to you.

A new breakthrough

To realize the full potential of AI, we need to break new ground, and the Google DeepMind team has been hard at work on this.

We've gotten great feedback on 1.5 Pro and its long context window, but we've also heard from developers that they want something faster and more cost-effective. So today we're introducing Gemini 1.5 Flash, a lighter-weight model built for scale and optimized for low-latency, cost-sensitive tasks. 1.5 Flash is available today in AI Studio and Vertex AI.
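As a rough illustration of that latency trade-off, the sketch below uses the same google-generativeai SDK as earlier, simply swapping in the lighter model and streaming tokens as they arrive; the API key and prompt are placeholders.

```python
# A minimal sketch of using the lighter 1.5 Flash model for a
# latency-sensitive task; the API key and prompt are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

# stream=True yields partial chunks as they are generated rather than
# waiting for the full response, cutting time-to-first-token.
for chunk in model.generate_content(
        "Summarize this support chat in two sentences: ...", stream=True):
    print(chunk.text, end="", flush=True)
```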

Looking ahead, we've always wanted to build a universal agent that's helpful in everyday life. Project Astra demonstrates multimodal understanding and real-time conversational capabilities.

We've also made progress on video and image generation with Veo and Imagen 3, and introduced Gemma 2, the next generation of our open models for responsible AI innovation.

Infrastructure for the AI era: introducing Trillium

Training state-of-the-art models takes enormous computing power. Industry demand for machine learning compute has grown by a factor of 1 million over the last six years, and it continues to grow roughly tenfold every year.
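Those two figures are consistent with each other: an annual growth factor r compounding over six years satisfies

$$r^{6} = 10^{6} \implies r = \left(10^{6}\right)^{1/6} = 10,$$

that is, tenfold growth per year yields a millionfold increase over six years.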

Google was built for this. For 25 years, we've invested in world-class technical infrastructure, from the cutting-edge hardware that powers Search to our custom Tensor Processing Units (TPUs) that power our AI advances.

Gemini was trained and is served entirely on our fourth- and fifth-generation TPUs. Other leading AI companies, including Anthropic, have trained their models on TPUs as well.

Today, we're excited to announce our sixth generation of TPUs: Trillium. Trillium is our most performant and most efficient TPU to date, delivering a 4.7x improvement in compute performance per chip over the previous generation, TPU v5e.

We will provide Trillium to Cloud customers by the end of 2024.

Alongside our TPUs, we also offer CPUs and GPUs to support any workload. That includes the new Axion processors we announced last month, our first custom Arm-based CPU, which delivers industry-leading performance and energy efficiency.

We're also proud to be one of the first cloud providers to offer NVIDIA's cutting-edge Blackwell GPUs, available in early 2025. We're fortunate to have a longstanding partnership with NVIDIA and are excited to bring Blackwell's breakthrough capabilities to our customers.

Chips are a foundational part of our integrated end-to-end system, from performance-optimized hardware and open software to flexible consumption models. It all comes together in our AI Hypercomputer, a groundbreaking supercomputer architecture.

Businesses and developers are using it to tackle more complex challenges, with more than twice the efficiency of simply buying the raw hardware and chips. Our AI Hypercomputer advances are possible in part because we use liquid cooling in our data centers.

We've been doing this for nearly a decade, long before it became state of the art for the industry. Today, our deployed liquid-cooling capacity approaches 1 gigawatt and is growing: almost 70 times the capacity of any other fleet.

Underlying all this is the sheer scale of our network, which connects our infrastructure globally. Our network spans more than 2 million miles of terrestrial and subsea fiber: over 10 times the reach of the next leading cloud provider.

We will continue to make the necessary investments to advance AI innovation and provide state-of-the-art features.

The most exciting chapter of Search

One of our greatest areas of investment and innovation is in our founding product, Search. Twenty-five years ago, we created Search to help people make sense of the waves of information moving online.

With each platform shift, we've delivered breakthroughs to help answer your questions better. On mobile, we unlocked new types of questions and answers using better context, location awareness, and real-time information. With advances in natural language understanding and computer vision, we introduced new ways to search: by voice, by humming to find your favorite new song, or with an image of the flower you saw on a walk. Now you can even use Circle to Search for those cool new shoes you might want to buy. Go for it. You can always return them!

Of course, Search in the Gemini era will take this to a whole new level, combining our infrastructure strengths, the latest AI capabilities, our high bar for information quality, and decades of experience connecting you to the richness of the web. The result is a product that does the work for you.

Google Search is generative AI at the scale of human curiosity. And it's our most exciting chapter of Search yet.

A smarter Gemini experience

Gemini is much more than a chatbot; it's designed to be your personal assistant, one that can help you tackle complex tasks and take action on your behalf.

Interacting with Gemini should feel conversational and intuitive. So we've announced a new Gemini experience called Live, which lets you have in-depth voice conversations with Gemini. Later this year, we'll also upgrade Gemini Advanced to a 2-million-token context window, so you can upload and analyze very large files such as videos and lengthy code.

Gemini on Android

There are billions of Android users around the world, so we're excited to integrate Gemini more deeply into the Android experience. As your new AI assistant, Gemini is there to help you anytime, anywhere. And we've built Gemini models into Android, including Gemini Nano with Multimodality, our latest on-device model, which processes text, images, audio, and speech, unlocking new experiences while keeping your information private on the device.


Our responsible AI approach

We continue to seize the opportunities AI presents boldly, while making sure we do so responsibly. We're developing a cutting-edge technique called AI-assisted red teaming, which draws on Google DeepMind's breakthroughs in gaming, such as AlphaGo, to improve our models. And we've expanded our SynthID watermarking tool to two new modalities, text and video, making AI-generated content easier to identify.

Work together to create the future

All of this shows the important progress we have made in getting AI to help everyone in a bold and responsible way.

We've taken an AI-first approach for a long time. Our decades of research leadership have pioneered many of the modern breakthroughs that power AI progress, for us and for the industry. On top of that, we have:

World-leading infrastructure built for the AI era

Cutting-edge innovation in Search, now powered by Gemini

Products that help at extraordinary scale, including 15 products that each serve more than half a billion users

Platforms that enable everyone, from partners and customers to creators, to invent the future

None of this progress would be possible without our incredible developer community. You make it real through the experiences and applications you build every day. So to everyone here at Shoreline and the millions watching around the world: here's to the possibilities ahead, and to creating the future together.