SRI-Bangalore, Samsung’s largest R&D center outside Korea, has collaborated globally to develop AI language models for British, Indian, and Australian English, along with Thai, Vietnamese, and Indonesian.
Recently, engineers from various Samsung Research centers visited Bangalore to enhance Galaxy AI with Vietnamese, Thai, and Indonesian capabilities.
Development of Hindi Language Model
SRI-B also spearheaded the development of the Hindi language model for Galaxy AI, a complex endeavor that had to account for over 20 regional dialects, tonal variations, punctuation, and colloquialisms. The team trained the AI model extensively on translated and transliterated data to address the linguistic nuances unique to Hindi.
Technical Challenges and Solutions
Giridhar Jakki, Head of Language AI at Samsung R&D Institute India – Bangalore, highlighted the challenges and rewards of integrating Hindi into Galaxy AI.
He emphasized the intricate phonetic structure of Hindi, which includes retroflex sounds not found in many other languages. Working with native linguists, the team developed specialized phonemes to support various Hindi dialects and ensure accurate speech synthesis.
Academic Collaboration and Data Acquisition
Collaboration between Samsung and academic partners such as the Vellore Institute of Technology was pivotal. Together they curated nearly a million lines of audio data, which was crucial for incorporating Hindi into Galaxy AI with high-quality language support.
Galaxy AI now supports 16 languages, empowering users with offline translation capabilities in features like Live Translate, Interpreter, Note Assist, and Browsing Assist.
This project exemplifies Samsung’s philosophy of collaborative innovation, bridging cultural divides to deliver meaningful advancements in language technology.
Speaking about the collaboration, Giridhar Jakki said:
“I am extremely proud of what we have achieved with our partners. AI innovation through collaboration is central to our work. We will keep striving to understand, collect, and analyze language data so that more people can access AI tools in the future.”