Dhwani: Audio blog posts

An expressive oil painting of high frequency sound waves moving through space by Midjourney

The past few months have witnessed some of the most hyped AI demos of recent times. DALL-E, Stable Diffusion, ChatGPT, Bing Chat and others have made the promise and the fear of AI quite ubiquitous. The underlying technologies are incremental improvements over their previous versions, as with most things in science, and not particularly groundbreaking. But the demos have captured the imagination of everyday people, especially those outside the machine learning universe. So much so that a FedEx delivery guy at my office was overheard saying “ChatGPT, am I right?”. The collapse of web3 also suspiciously coincides with the rise of myriad generative AI companies. But let’s ignore that for now.

That AI has become the topic of the month in the ever-toxic culture war comes as no surprise. It faces widespread opposition, from NYT editorials to LessWrong forum posts to DEI seminars, while Microsoft Bing engineers just want Sydney to stop having public meltdowns. Also, Google? Are you there? I can hear you all go “sure, sure, the world as we know it will change, what’s new”. But I think we still don’t fully grasp that everything we hold familiar and commonplace will change and find its faithful place in a quaint movie about “the before times”. The world will change not because of better search, although that’s always great news. It will change the way the internet or the mobile phone changed the world. It is a “Britannica encyclopedia on the internet” vs “Google” kind of change. Every interface, every habit, every second of your existence will be shaped by AI, visibly and invisibly. AI will give us capabilities that we don’t yet know can exist. It won’t be your favourite science fiction plot; it will be a science fiction plot that has not been thought of yet.

The thought of these possibilities keeps me up at night, in a good way, but that’s enough AI schmoozing. Text to speech has long amazed me, yet most out-of-the-box TTS systems have been rudimentary at best and outright stupid at worst. Recent AI progress has, as you may expect, blown the competition out of the water. Dhwani, meaning “voice” in Sanskrit, is a side project I built that converts your blog posts into audio so that people can listen to them! It takes HTML files generated by Hugo or Jekyll and creates WAV files that you can embed in your Markdown posts. Check it out at https://github.com/nuwandavek/dhwani. Happy listening!
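If you are curious what an HTML-to-WAV pipeline of this kind might look like, here is a minimal sketch using only the Python standard library. To be clear, this is not Dhwani's actual code: the names `TextExtractor`, `html_to_text`, `synthesize_stub` and `write_wav` are all made up for illustration, and the synthesis step is a stand-in that emits silence where a real TTS model would go. It only shows the shape of the flow: pull the readable text out of a generated HTML page, turn it into PCM samples, and write those to a WAV file.

```python
import wave
from html.parser import HTMLParser

# Tags whose contents are usually not part of the post's prose (an assumption).
SKIP_TAGS = {"script", "style", "nav", "header", "footer"}


class TextExtractor(HTMLParser):
    """Collects visible text and ignores anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the readable text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def synthesize_stub(text, sample_rate=16000, chars_per_second=15):
    """Hypothetical placeholder for a TTS model.

    Returns silent 16-bit mono PCM whose length grows with the text length.
    """
    n_samples = len(text) * sample_rate // chars_per_second
    return b"\x00\x00" * n_samples


def write_wav(path, pcm, sample_rate=16000):
    """Write 16-bit mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
```

Calling `write_wav("post.wav", synthesize_stub(html_to_text(page_html)))` would produce a (silent) WAV file for a page held in `page_html`. Since Markdown allows inline HTML, a file like that can be embedded with a standard `<audio controls src="/audio/post.wav"></audio>` tag, where the path is whatever your site actually serves the file from.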