In this new series, World of Tech This Week, we take a look at how Flux’s GenAI is blowing people’s minds, Sarvam’s new made-in-India LLM, a humanoid robot that works at BMW, and more.
What’s next for OpenAI?
OpenAI, the San Francisco-based organization that is arguably the most popular name in AI, and its flagship product, ChatGPT, are awaiting a major upgrade. Before that could happen, a couple of unusual updates grabbed attention this week. The first was that Greg Brockman, one of OpenAI’s co-founders, announced that he is taking a break from his daily work to concentrate on his family and friends for the time being. A couple of days later, to dismiss rumors of his ouster, he tweeted that he already feels left out by not being part of the daily grind on OpenAI’s most exciting upcoming projects.
At around the same time, Sam Altman posted a photo on X of strawberries growing in his garden. This kickstarted a frenzy among OpenAI worshippers and AI enthusiasts alike over what it could mean for the next GPT version. “GPT 5 confirmed within 4-6 weeks,” one said, claiming to infer it from how much the strawberries had grown. Others claimed that Sam was hinting at a certain “Project Strawberry” or “Q*”, which his former colleague Ilya Sutskever had apparently “seen”. An anonymous account claiming to have inside info, which many deride as fake, said that this LLM was “fine tuned to reason like a human”, a phrase that has now become a running joke.
Amid the feverish wait for the next version of ChatGPT, the community still revelled in some humor and fun, but the episode also signalled something important about the state of the LLM industry. Anthropic’s Claude Sonnet, Meta’s Llama 3 and Grok 2 are breathing down OpenAI’s neck, while Sora, OpenAI’s video model that promises to be the best of its kind, and the 4o voice mode are still in the process of being rolled out. Both are also knee-deep in controversy, since some feel they are too advanced, or carry too many copyright issues, to be released to the public. Is that the case? We will have to wait and see. All said and done, sufficiently advanced technology is indistinguishable from magic, isn’t it?
This article is brought to you in partnership with Truetalks Community by Truecaller, a dynamic, interactive network that enhances communication safety and efficiency. https://community.truecaller.com
TED talks that never happened
This week’s favorite tool for GenAI enthusiasts is “Flux”, which made huge waves in this space by demonstrating stellar performance in creating realistic, photoreal synthetic images that match the prowess of the leading GenAI product, Midjourney. However, unlike Midjourney, which is paid, Flux is available as open models that can be hosted on any platform that offers compute capabilities. The new model started turning out amazing images that looked like they were taken on a phone, but the internet went wild when an image of a girl giving a TED talk appeared. It looked so real that it prompted everyone to give Flux a try, generating lots of images of really good-looking people giving TED talks, along with many other kinds of images that barely looked fake. There are several ways to give Flux a try. It comes in three variants (pro, dev and schnell): dev and schnell have openly available weights, while pro is offered through an API. People are feverishly working out ways to make it easier than ever to use.
No wonder this created a huge buzz on the web. In fact, already mind-boggling video tools like Runway and Luma Labs (of NeRF fame) catapulted Flux to a stardom that continues today, by bringing those insanely realistic images to life as video. Some went even further and added voice, taking it into the realm of completely synthetic video that we can’t tell apart from the real thing. In just one year, we have gone from absolute nonsense to almost real, and we are just getting started.
Casual robot works for BMW
When I say casual, I mean it. “Figure 02” is the latest iteration showcased by the company Figure, which counts OpenAI among its investors. The new robot has already started working at BMW’s plant in Spartanburg, South Carolina, and is showcasing some startling new features like “speech to speech reasoning”, the addition of “vision language model” understanding, a 20-hour battery life, six RGB cameras, new hands and 3x the computational power. All these upgrades leapfrog the company’s own 01, its first humanoid. The latest GPT model from OpenAI powers this new robot, which casually walks around with its new hands, working through complex tasks with extreme fidelity and stability. In a video released by the company’s founder, the robot is seen casually learning and performing important tasks at the BMW plant, assembling and moving things with superb precision. This marks a visible advancement in the field of robotics, and the engine behind it is, of course, LLMs.
When we watch the movie I, Robot, we see science fiction, but in 2024 it is becoming a reality. Of course, we are still far away from a situation like the one in the movies, but we are much closer to having robots in our factories. This makes LLMs even more important, as they completely change the game for “human labor” and everything related to it.
Enter the Indic foundational model “Sarvam”
The arena of LLMs is huge and the players are many, but when you count them by country, there are few. France has a prominent one in Mistral, but Germany has no comparably prominent one. LLMs are a key ingredient for adding “understanding” to machines, so every nation wants to own a piece of the technology and not depend on other countries. Almost all advanced nations are working on their own LLMs, with the US leading the race, as expected. But India is not going to sit this one out, because “Sarvam 2B” is here. Sarvam’s most recent launch is a foundational model on Hugging Face, driven by the idea of a “Sovereign AI”, with Nvidia becoming a key partner in helping to build such models. The model has native support for 10 Indic languages and is trained on 4 trillion tokens, with 50% of them coming from India. The rest is presumably scraped from the internet.
Additionally, the team at Sarvam has launched “Shuka”, which understands Indic languages in audio form. “Mayura” is their new translation model with support for 10 major languages from India: Hindi, Bengali, Gujarati, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil and Telugu. There is also “BulBul”, a text-to-speech model built on colloquial voice data, and “Saaras”, their speech-to-text model with, again, Indic language support. Catering to all these basic needs, Sarvam is one of the few well-funded efforts from India, backed by tech heavyweights like Nandan Nilekani, who was instrumental in the nation’s “Aadhaar” system.
Are men lying about their heights on dating apps?
A fun consequence of multi-modal capabilities is that a model can look at pictures and answer all kinds of questions about them. So, a very inquisitive bunch of girls found a way to check whether men are lying about their heights on their dating profiles. Well, obviously if men say they are tall, they are tall, right? Apparently not. The lying had apparently been going on for a long time, so these frustrated girls started downloading the profile pictures of men they liked from dating sites and feeding them to ChatGPT with the question “how tall is this man?” The trend then caught on with men as well, who started asking “how tall is this woman?” ChatGPT answered dutifully, with its insanely large training datasets giving it enough patterns to match against and estimate the measurements. Voilà! By these accounts, it was right almost every time! It has now become a trend in its own right: people who want to test the claim are uploading their own full-body pictures and asking ChatGPT for their height, just to see whether it gets it right. Would you do it?
So, that’s it for this week. Technology never waits for us. Let’s catch up in the next one!