Create a Multilingual Chatbot using OpenAI APIs and Dash
This article provides an overview of how you can use OpenAI’s chat completions API and Whisper-1 API to develop a multilingual chatbot. Several links will be provided to my GitHub repo that stores that code for https://practicealanguage.xyz. Many tech stacks could be used to build this web app, but I selected Dash as the framework to keep the codebase small and make the UI interactive. This can be accomplished as Dash uses callbacks for simplified frontend-backend interactions and although Dash is built using React.js, the code can largely be written in Python.
To set up the conversation, dropdown menus are provided to specify the language that you speak, the language that you want to learn, and the conversation setting. This information is used to prompt GPT-3.5-Turbo (GPT). GPT-4 could also be used, but given its higher costs, and the quality of responses from GPT-3.5, I found that it was a suitable tradeoff to stick with GPT-3.5.
Writing a prompt that would reliably produce a suitable response from GPT took a few iterations as it could reply in the language that you want to learn as well as in the language that you speak. For example:
Bonjour, qu'est-ce que je peux vous servir aujourd'hui? (Hello, what can I serve you today?)
Fortunately, after adding an example response (one-shot learning), GPT hasn’t made this mistake again and offers varied statements to start the conversation. I’ve also increased the temperature to 1.5 (from the default of 1) to provide more diverse responses from GPT.
The user can respond using either text or audio. Providing a text response is pretty straightforward as the user can reply in the language that they want to learn, or in the language they know and it will be translated using GoogleTranslator (via the Python package deep-translator). Although this translation isn’t always perfect, it’s free, which makes it a great tool for keeping costs down.
If the user responds using an audio recording it’s a bit more complicated. A clientside_callback is required to access the microphone on the user’s device to create the audio recording. This is because when the app is deployed on to Google Cloud Run, a normal callback function, which operates on the server-side (backend), doesn't have access to a device's microphone. Using a clientside_callback can access a device's microphone as this code operates on the front-end of the app.
When the user makes a recording, it is sent to the server-side of the app using a POST request. The recording is then saved locally before being sent to OpenAI’s Whisper model for speech-to-text transformation. The transcript from the Whisper model populates the user response input field, which allows the user to verify that they have been understood correctly before submitting their response to GPT and continuing the conversation.
Continuing the conversation and keeping the costs down
Each message from GPT and the user is appended to a list. As the conversation grows, only the most recent portion of the list will be sent to GPT as the cost of the API call grows with each token. As this is a simple language-learning app, it’s not essential for GPT to be aware of the full conversation, so we can take this cost-saving measure.
Understanding GPT’s responses
Depending on your knowledge of the language that you want to learn, you might not fully understand what GPT has said in its response. The dash_selectable package allows you to know what text has been highlighted and this can be fed to GoogleTranslate to provide you with a translation (another good use case for free translation).
I hope that you’ve enjoyed this overview of developing a chatbot to help you practice a language. Feel free to clone the repo on my GitHub and run the app yourself, and if you would like to deploy your own chatbot to Google Cloud Run, take a look at this article to read how to do that. I'll be keeping https://practicealanguage.xyz free and open-source, so it'll be waiting for you whenever you want to take another look at it.