A Serverless API for ChatGPT text-to-speech answers
Lately, there has been a lot of noise around ChatGPT, for good reasons. Rather than just trying its service from OpenAI hosted web interface, I wanted to go one step further and combine ChatGPT with certain AWS serverless offerings.
This article presents a serverless API where consumers can POST a question towards ChatGPT. Its answer will convert text-to-speech as an MP3 audio file and notify users via email accordingly.
All resources will be provisioned via AWS Serverless Application Model (SAM). Let’s dive into it.
Architecture
The architecture consists of different serverless components, which will be provisioned via SAM CLI.
- An API Gateway acts as facade and entry point for our single RESTful endpoint. The “ask” POST endpoint accepts a JSON payload containing an English question.
- The Gateway forwards the payload to an asynchronous Step Function. The state machine consists of two lambda functions, responsible for using OpenAI API to get an answer and AWS Polly to convert text-to-speech as MP3 audio file stored to an S3 bucket.
- Finally, S3 event notifications trigger an SNS topic which sends an email with info about the newly uploaded file.
Provisioning resources
To provision resources via Infrastructure as Code (IaC), I will use SAM and the full template can be seen below. Let’s briefly analyse the file sections:
- Globals, contain common configuration to be shared across support SAM resources
- Parameters define dynamic data that can be provided as part of the sam deploy — guided command
- Resources include all AWS resources to be provisioned and mentioned in above architecture. Some details will follow shortly.
A state machine to manage them all
AWS Step Functions is a serverless orchestration service integrations various AWS services as tasks part of a state machine. At its core, the Amazon States Language provides a JSON-based language to define the state machine as a set of states.
Let’s see the two states in details via their Lambda function implementation:
- The Ask ChatGPT state will use OpenAI SDK to ask a question and receive an answer.
- The Text to Speech state will use Polly to convert the received answer into an audio file.
AskChatGPT Lambda function
This first Lambda function uses the question received by the API Gateway. OpenAI SDK package uses this info together with an AI model which does the magic. A list of available models can be seen below.
Using the SDK is very simple and just requires an API key which can be generated in your ChatGPT account. Below is the Lambda code:
For details about the request parameters, please refer to the below link.
TextToSpeech Lambda
The second function receives the answer from the previous Lambda function and leverages AWS Polly to perform an asynchronous conversion of the text string into an audio file via the Speech Synthesis Task.
In a nutshell, this creates an S3 synthesis scheduled task managed by Polly, which then saves its result into an S3 bucket. Below snippet will associate “Joanna” voice to the text and save the adutio MP3 file into S3.
Demo
By hitting the POST /ask endpoint from Postman, the result is an immediate 200 code as the call is asynchronous with a fire and forget style.
The Step Function starts and completes accordingly.
A S3 synthesis Polly task is scheduled.
Once the task completes, an MP3 file is saved into S3.
Finally an email gets sent with S3 Created event information:
Conclusion
OpenAI-based applications have the potential to be disruptive in many domains. This article just showcased the surface of how to combine ChatGPT with cloud native approaches with opportunities to automate smart text-to-speech applications in social networks apps.
Full code can be found on Github: https://github.com/aladevlearning/chatgpt-polly/tree/main/chatgtp-polly-api
Level Up Coding
Thanks for being a part of our community! Before you go:
- 👏 Clap for the story and follow the author 👉
- 📰 View more content in the Level Up Coding publication
- 🔔 Follow us: Twitter | LinkedIn | Newsletter
🚀👉 Join the Level Up talent collective and find an amazing job