Grok can create images now!
News as it breaks, unfiltered high-quality image generation, a funnier fun mode, a mini model with great coding skills: Grok's got it all now. Grok 2 (beta) is a sudden leap that gives its competition a run for its money. Last week, all premium users on X suddenly got the ability to generate images. It rolled out so fast that users had to check whether they were really on Grok or something else. To their disbelief, with no fanfare, a new blazing-fast model had entered the scene. Controversial image generations, powered by the FLUX genAI model from Black Forest Labs (whose FLUX.1 [schnell] variant is open source), drove the hype train for Grok 2. Some created hilariously edgy images, which cannot be shared here, prompting the "safety" team into action to censor some unfavorable requests from curious users. With the ability to summarize the latest events happening around the world, new coding skills and GenAI, Grok 2 stands out in benchmarks and joins the list of "multi-modal" LLMs that enrich our world right now. Best of all, xAI has a track record of open-sourcing its models (Grok-1's weights are public), so the full version may well become broadly available once it is out. Are you a premium user on X, and do you use Grok?
Gemini Live is now your brainstorming partner
Gemini Live is Google's exclusive conversational interface, available to all "Gemini Advanced" subscribers starting now. It is Google's response to industry leader OpenAI, which released a similar voice feature for its GPT-4o model. Google had its big moment this week with the launch of the new Pixel 9 series: the 9, 9 Pro, 9 Pro XL and 9 Pro Fold. Quite a mouthful, eh? Well, if you took the time to watch the presentation, you would already know that the new hardware was only a small part of the whole. Although we really enjoy the new Pixel experience, the Gemini features take the cake, especially Pixel Studio and Gemini Live. Pixel Studio lets you generate images largely on-device, without much help from the server, but Gemini Live goes one step further: you can talk to Gemini. What's new? Imagine having a free-flowing conversation with AI, as if you're talking to another person who is keenly listening to you, like over a regular phone call. Let's say you have a new idea and want someone to discuss it with, but your friends are way past their bedtime; well, you can do it with Gemini now, because it can remember your personal context and even lets you interrupt in the middle of the conversation and steer it onto an entirely new tangent. A new virtual friend with a casual demeanor is promised here, to help you accomplish some amazing things without feeling weird, but will it be good enough to be your long-term advisor? Only time will tell.
This article is brought to you in partnership with Truetalks Community by Truecaller, a dynamic, interactive network that enhances communication safety and efficiency. https://community.truecaller.com
TATA Curvv ADAS Level 2 demo
Tesla broke barriers in intelligent, vision-based self-driving cars, and Google's Waymo became the first real self-driven taxi service, but cruise control and "driving assistance" have been car features for a long time. Somehow we never got to see them from Indian brands, and that's about to change, with a system that promises to be good enough to one day grow genuinely intelligent. TATA, one of the biggest conglomerates in India, has started showing off "ADAS Level 2" in its new "Curvv". This coupe-styled SUV comes with a feature set aimed more at safety than intelligence: it can hold a set speed, maintain distance from the car ahead, perform autonomous emergency braking, keep its lane and change lanes. It also supports blind spot detection, and with sensors all around the body it offers a 360-degree, third-person view to help you park. Under the hood there seem to be plenty of sensors in the realm of radar, lidar and ultrasonics. Not much is officially confirmed, but a live demo of the feature started doing the rounds on Twitter and became quite controversial; it does give a great glimpse of this technology on live Indian roads. What do you feel about the evolution of Tata's tech here? Is it advanced enough?
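To make the "maintain a speed, maintain a distance" behavior concrete, here is a hedged, minimal sketch of the distance-keeping idea behind adaptive cruise control, one of the ADAS Level 2 features above. The function, thresholds and gain are all hypothetical illustrations; a real system fuses radar and camera data and uses far more sophisticated control.

```python
def cruise_speed(current_speed, set_speed, gap, safe_gap, k_gap=2.0):
    """Return a target speed: follow set_speed on an open road, but slow
    down proportionally when the gap to the car ahead shrinks below
    safe_gap (speeds in km/h, gaps in meters; all values illustrative)."""
    if gap >= safe_gap:
        return set_speed
    # reduce speed in proportion to how badly the safe gap is violated
    return max(0.0, set_speed - k_gap * (safe_gap - gap))

print(cruise_speed(80, 100, gap=50, safe_gap=40))  # open road: hold 100
print(cruise_speed(80, 100, gap=30, safe_gap=40))  # closing in: 80.0
print(cruise_speed(80, 100, gap=5, safe_gap=40))   # much too close: 30.0
```

The proportional response is the simplest possible policy; production controllers also account for relative speed and braking limits.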
Halide – Process Zero
At a time when AI has become the craze to follow for all kinds of marketing purposes, an anti-AI movement has naturally sprung up, with products proudly declaring themselves free of AI features. The newest entry on that list is, surprisingly, a popular camera app on the iPhone known as Halide. It's extremely popular among artists, photographers and videographers who don't like the default iPhone aesthetic. For those who feel that the stock iPhone camera app is too "basic", Halide opens up a whole world of pro tools that let you shoot whatever you want, exactly how you want it. Their new "Process Zero" takes it one step further. "Zero AI. Zero computational photography." claims Halide, which introduces this mode as an option for pro photographers. According to their tweets, it completely bypasses the iPhone's image processing pipeline and "develops" photos from "raw sensor data", meaning the app works directly with the sensor's output, which is how professionals like it. Of course, Apple doesn't allow full access to every raw megapixel: the 48-megapixel readout is binned down to 12, but from that point on there is access. "Film-like" is how we would frame the quality of these shots, which, to film enthusiasts, sounds like great news. Process Zero is already out with Halide's 2.15 update, and iPhone users can now take full advantage of their sensors and edit the results to their liking.
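The "binning it down from 48 to 12" step mentioned above is worth unpacking: binning averages neighboring pixels into one, quartering the resolution. A minimal sketch in plain Python (a real pipeline would operate on raw Bayer data with something like numpy, and the sample grid here is made up for illustration):

```python
def bin_2x2(pixels):
    """Average each 2x2 block of a grid of luminance values into one
    pixel, quartering the resolution, e.g. 48 MP -> 12 MP.
    Height and width must be even."""
    binned = []
    for row in range(0, len(pixels), 2):
        out_row = []
        for col in range(0, len(pixels[row]), 2):
            block = (pixels[row][col] + pixels[row][col + 1]
                     + pixels[row + 1][col] + pixels[row + 1][col + 1])
            out_row.append(block / 4)
        binned.append(out_row)
    return binned

# A toy 4x4 "sensor" becomes a 2x2 image:
sensor = [[10, 20, 30, 40],
          [30, 40, 50, 60],
          [0, 0, 100, 100],
          [0, 0, 100, 100]]
print(bin_2x2(sensor))  # [[25.0, 45.0], [0.0, 100.0]]
```

Binning trades resolution for cleaner pixels (averaging suppresses noise), which is why Apple does it before handing data to third-party apps.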
Ark is a survival device with offline AI
The first prototype of an extremely interesting concept showed up this week, to the excitement of "survival enthusiasts". Yes, you read that right: this device occupies an extremely small niche but totally redefines what an AI device can be. First of all, it is a "survival companion for extreme conditions". "Extreme" means the internet is not available and electricity is rare, basically an apocalyptic scenario that demands a gadget work completely OFFLINE. There are literally no network connectivity features on this small gadget, which sports a compact touch screen. It is also extremely rugged and tough, protected by the highest-grade armor-like materials, and carries a huge battery, which makes it look like a massive bar of soap, or, to some, an age-old portable FM radio. But the best thing about it: it has intelligence. Yes, there is a completely offline AI that can answer all kinds of survival questions, with the entire knowledge of Wikipedia plus very specific survival-focused data loaded onto it. It also has fully offline maps of the entire world with street-level detail. It need never run out of power, because it can charge itself on solar and last a very long time. Sounds crazy, right? Well, not many thought it could be real, but that's exactly what happened, with the first prototype shocking a lot of people, including us. We still don't know what model it runs, or what kind of hardware powers its innards, but all in all it looks like a great concept that uses AI in a way we do not usually imagine.
McDonalds commissions an ad for fries with AI women
McDonald's Japan has become the first major entity in popular culture to embrace AI artists, commissioning a stunning advertisement video for its products. The artist "kakudrop" is well known in Twitter circles for creating stunning imagery using GenAI tools like Midjourney and LumaLabs' Dream Machine. For example, he created an extremely realistic video of a Japanese girl taking a selfie in the middle of a busy road in Tokyo. Naturally, everyone wanted to know how he did it, and it became so big in Japan that he was invited to the National Art Center to exhibit his creations. Things went even further when McDonald's approached him to make a whole ad featuring beautiful AI-generated Japanese women eating its world-famous fries. Combined with a jingle, the ad has already garnered 12.7 million views on Twitter alone. Imagine the impact: entirely virtual characters, and all it took was an artist and an internet-connected computer to create this stunning advertisement. Certainly, a new era of content creation has arrived, with GenAI promising to deliver insanely high-quality renderings that look as real as, or, as we expect in advertising, better than reality.
Dream Machine 1.5 now with text-to-video skills
Well, if you saw the ad above, you know all this GenAI video madness is possible thanks to solutions like LumaLabs, which recently unveiled the latest version of its model, Dream Machine 1.5. What makes Dream Machine different from other video GenAI solutions is that it is a true video model, not an "image to animation" model. This is hard to grasp if you are new to the space, so to break it down into simpler terms: early "GenAI video" apps started with an image and generated animation "over the image", frame by frame, with every frame generated from the previous one. Errors compounded from frame to frame, which made it nearly impossible to keep details consistent. Other apps started with an image and then "animated" parts of it using the kind of visual tricks that editors and VFX artists use. LumaLabs went a completely different route: it takes an image and develops a video out of it using fully video-centric data. For example, it generates groups of frames and applies that motion to the image, instead of animating each frame and stitching them together. This lets artists take advantage of smooth motion and add "keyframes" from which they can continue their work into an even longer video. This core difference made Dream Machine one of the best video AI models to date, and the dominance continues with the 1.5 update, which enables "text to video", which in GenAI parlance means: type anything and get a video. Of course, it's no longer freely open to the public as it was in the beginning, but one can join the waitlist and try it when access opens up. Meanwhile, AI enthusiasts are still waiting for OpenAI's fully video-centered model "Sora", which promises to be a game changer in itself.
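The consistency argument above can be sketched with a toy model. This is not Luma's actual method, just an illustration under a made-up assumption: each generation step introduces a fixed amount of error, so chaining frame-on-frame lets drift grow without bound, while generating frames in groups anchored to a clean keyframe bounds the drift by the group size.

```python
STEP_ERROR = 0.1  # hypothetical error introduced by each generation step

def per_frame_drift(n_frames):
    """Each frame conditions on the previous one, so errors compound."""
    drift, history = 0.0, []
    for _ in range(n_frames):
        drift += STEP_ERROR
        history.append(round(drift, 2))
    return history

def grouped_drift(n_frames, group_size):
    """Each group conditions on a clean keyframe, so drift resets
    at every group boundary instead of accumulating."""
    history = []
    for i in range(n_frames):
        offset = i % group_size  # position within the current group
        history.append(round((offset + 1) * STEP_ERROR, 2))
    return history

print(per_frame_drift(8))              # [0.1, 0.2, ..., 0.8]: unbounded
print(grouped_drift(8, group_size=4))  # resets to 0.1 at frame 5: bounded
```

The numbers are fictitious, but the shape of the curves is the point: keyframe-anchored groups keep worst-case drift constant no matter how long the clip gets.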
Unitree G1 is now in mass production
This is the craziest robot video you will see this week. It's a humanoid called the "Unitree G1", and it's dancing, jumping, leaping, turning around and pulling off gymnastics we have never seen from a robot before. Yes, it also climbs stairs with ease, crosses obstacles and runs around like a normal person. At the end of the video comes the now-infamous demo of a person trying to dislodge it; it keeps its balance like a boss. This crazy video from Unitree, a company based in China, really shows how far the robotics industry has come with the help of reinforcement learning, or in other words, AI. The G1 is a bipedal humanoid robot with human-like abilities, made with the intention of helping us out in factories and possibly handling dangerous situations where humans are too valuable to send in. We have all seen these scenarios in science fiction movies, where robots augment the human world with their abilities and intelligence. Sure, there are concerns that humanoids and their neural systems might learn too much about our world and start "talking to each other", but are we in any position to stop technology from advancing into the future? And are we in any position to deny great powers the chance to show off their technology? Not really. Technology always has its uses, and it seems the extremely agile and flexible Unitree G1 will find great uses in the real world, especially in assisting old people and laboring in dangerous factories and industries.
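To give a flavor of the reinforcement learning behind such balance tricks, here is a deliberately tiny, hedged sketch: a tabular Q-learning agent on a one-dimensional "balance" task, learning which corrective push keeps a tilt near zero. The states, rewards and hyperparameters are all invented for illustration; real humanoid training uses physics simulators and deep neural policies at vastly larger scale.

```python
import random

random.seed(0)
STATES = [-2, -1, 0, 1, 2]   # discretised tilt angle
ACTIONS = [-1, 0, 1]         # push left, do nothing, push right
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(tilt, action):
    """Push changes the tilt; reward is highest when upright."""
    new_tilt = max(-2, min(2, tilt + action))
    reward = 1.0 if new_tilt == 0 else -abs(new_tilt)
    return new_tilt, reward

tilt = 2
for _ in range(5000):
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)                     # explore
    else:
        action = max(ACTIONS, key=lambda a: Q[(tilt, a)])   # exploit
    new_tilt, reward = step(tilt, action)
    best_next = max(Q[(new_tilt, a)] for a in ACTIONS)
    Q[(tilt, action)] += ALPHA * (reward + GAMMA * best_next
                                  - Q[(tilt, action)])
    tilt = new_tilt

# The learned policy pushes against the tilt to stay upright:
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(policy)
```

The agent is never told "push against the lean"; it discovers that rule purely from the reward signal, which is the same trial-and-error principle, scaled up enormously, behind the G1's recoveries.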
And that's it for this week's "World of Tech", a truly fascinating world of science fiction becoming reality at a speed we have never seen before. Have fun, take care, and see you in the next one.