Google is launching JetStream, a new engine for running generative AI models, and MaxDiffusion, a collection of reference implementations of various diffusion models.
In a typical year, Cloud Next — one of Google’s two major annual developer conferences, the other being I/O — almost exclusively features managed, closed source products and services gated behind locked-down APIs.
But this year, whether to foster developer goodwill or advance its ecosystem ambitions (or both), Google debuted a number of open source tools primarily aimed at supporting generative AI projects and infrastructure.
JetStream works with XLA-based frameworks such as JAX and PyTorch/XLA. “XLA” stands for Accelerated Linear Algebra, an admittedly awkward acronym for a compiler that optimizes and speeds up specific types of AI workloads, including fine-tuning and serving.
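For a sense of what that means in practice, here is a minimal, illustrative JAX sketch (not from the article; the function name and array shapes are invented for the example). Decorating a function with `jax.jit` traces it once and hands the trace to XLA, which fuses the operations into optimized kernels for CPUs, GPUs, or TPUs:

```python
# Illustrative sketch only: jax.jit compiles the traced function with XLA,
# which can fuse the matmul, scaling, and softmax below into fewer kernels.
import jax
import jax.numpy as jnp

@jax.jit
def attention_scores(q, k):
    # Scaled dot-product scores followed by a softmax over the last axis.
    return jax.nn.softmax(q @ k.T / jnp.sqrt(q.shape[-1]))

q = jnp.ones((8, 64))
k = jnp.ones((8, 64))
print(attention_scores(q, k).shape)  # (8, 8)
```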
“As customers bring their AI workloads to production, there’s an increasing demand for a cost-efficient inference stack that delivers high performance,” Mark Lohmeyer, Google Cloud’s GM of compute and machine learning infrastructure, wrote in a blog post shared with TechCrunch.
MaxText, Google’s collection of text-generating AI models targeting TPUs and Nvidia GPUs, now includes Gemma 7B, OpenAI’s GPT-3 (the predecessor to GPT-4), Llama 2 and models from AI startup Mistral — all of which Google says can be customized and fine-tuned to developers’ needs.
“These improvements maximize GPU and TPU utilization, leading to higher energy efficiency and cost optimization,” Lohmeyer added.