Using ChatGPT and Whisper: A New Approach to Blog Writing
An AI-generated image (using BlueWillow) with the prompt "A robot typing on a computer, simple design"
For my previous blog article on Excalidraw’s embedded scene, I wrote the post with the assistance of ChatGPT and Whisper. Today, I wanted to share more about how I approached this process and discuss the concept of grounding the model. Additionally, I’ll touch on the use of the Whisper model for accurate transcriptions. Let’s dive in!
To begin, I wanted to use ChatGPT to generate an entire blog post on a specific topic. However, it’s important to note that Large Language Models (LLMs), of which ChatGPT is one, can occasionally produce false information, a phenomenon known as hallucination. Therefore, it’s crucial to double-check the content generated by the model to ensure accuracy.
To mitigate hallucination, one effective technique is to ground the model by supplying it with relevant context (note: this article goes into much more depth on embeddings, which I didn’t use). At its core, the idea is that by providing additional information/context to the LLM, we can guide the model’s responses and reduce the occurrence of false information.
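To make the idea concrete, here is a minimal sketch of what grounding looks like in practice: the source material is prepended to the instruction, so the model answers from the supplied text rather than from memory alone. The function name and prompt wording are my own illustration, not part of any official API.

```python
def build_grounded_prompt(transcript: str, instruction: str) -> str:
    """Combine source material and an instruction into one grounded prompt.

    Asking the model to rely only on the supplied transcript reduces the
    chance of it inventing (hallucinating) details.
    """
    return (
        "Use ONLY the following transcript as your source material.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Instruction: {instruction}"
    )

# Hypothetical usage with a snippet of a voice transcript:
prompt = build_grounded_prompt(
    "Excalidraw can embed the scene data inside an exported image...",
    "Turn the following into a blog post (written in markdown).",
)
```

The same pattern works for any grounding material, not just a transcript: documentation excerpts, notes, or prior posts can all be pasted in as context before the instruction.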
ChatGPT has a 4096-token limit (approximately 3072 words), so you need to fit all of the upfront information within that limit. In my case, what I had was well within it.
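A common rule of thumb is that one token corresponds to roughly four characters (or about 0.75 words) of English text. The sketch below uses that heuristic to estimate whether a transcript will likely fit in the context window; the numbers are approximations of my own, not the model’s actual tokenizer, and the reserved-reply budget is an assumption.

```python
TOKEN_LIMIT = 4096   # context window of the model discussed above
CHARS_PER_TOKEN = 4  # rough heuristic for English text

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, reserved_for_reply: int = 1024) -> bool:
    """Check whether the prompt likely leaves room for the model's reply."""
    return estimate_tokens(text) <= TOKEN_LIMIT - reserved_for_reply

transcript = "word " * 2000          # ~10,000 characters of transcript
print(estimate_tokens(transcript))   # -> 2500
print(fits_in_context(transcript))   # -> True
```

For exact counts you would use the model’s real tokenizer (e.g. a library such as tiktoken), but a heuristic like this is enough to sanity-check a brain dump before pasting it in.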
To ground my conversation with ChatGPT, I needed to provide text on the topic. I figured the best way would be to simply talk about it and turn that audio into a text transcription. Whisper is another AI tool I could leverage: a highly accurate speech recognition system.
I used a wonderful iOS app called Whisperboard on my iPhone (which is also open source). This app allowed me to record my voice, and that audio was then processed locally on the device using a Whisper model. The result was an accurate transcription of the recording. While it took a few minutes for the process to complete, the quality of the transcription was impressive, in my opinion.
Here is a Gist linking to the Whisper transcription. It is worth noting that I used a medium-sized model, and I also passed the raw transcription (a single-line text dump) into ChatGPT and asked it to reformat the text into paragraphs.
With my transcript ready for grounding, I started chatting with ChatGPT to create the blog post:
Turn the following into a blog post (written in markdown):
It did a great job! However, I wanted it to take a different perspective and to include Markdown formatting.
Can we retry the blog post with markdown headings and the focus on the scene feature?
Bingo! I was happy with the results, except that the title was too long.
Can you try a shorter title?
I then needed my teaser text for the post (which shows on the homepage of my blog).
Can you write a post teaser (75~ words)?
Turns out that was too long.
Can you make it shorter?
At this point, almost everything was done.
Naturally, I still exercised caution and double-checked the output produced by ChatGPT. However, 95% of the blog post was composed by ChatGPT, with only minor adjustments or clarifications from my side.
As an experiment, I also attempted to generate a blog post without grounding the model. In this case, I simply asked ChatGPT:
Write a blog post (in Markdown) using the following title “Using Excalidraw’s Embedded Scene Feature for Collaborative Diagramming”
Here is a Gist of what ChatGPT produced. There are some glaring issues with it: the model assumes embedded scenes are HTML embeddings and hallucinates, fabricating incorrect details about Excalidraw. This highlights the importance of grounding.
Based on my experience, I believe this approach could be valuable for quickly transforming a brain dump into text. I’m going to continue to explore using LLMs and recent AI techniques for this blog and other areas of my life.