A Serverless API for ChatGPT text-to-speech answers

Antonio Lagrotteria
Level Up Coding
Published in
4 min readDec 10, 2022

--

Lately, there has been a lot of noise around ChatGPT, for good reasons. Rather than just trying its service from OpenAI hosted web interface, I wanted to go one step further and combine ChatGPT with certain AWS serverless offerings.

This article presents a serverless API where consumers can POST a question towards ChatGPT. Its answer will convert text-to-speech as an MP3 audio file and notify users via email accordingly.

All resources will be provisioned via AWS Serverless Application Model (SAM). Let’s dive into it.

Architecture

The architecture consists of different serverless components, which will be provisioned via SAM CLI.

Serveless API for OpenAI text-to-speech answers
  • An API Gateway acts as facade and entry point for our single RESTful endpoint. The “ask” POST endpoint accepts a JSON payload containing an English question.
  • The Gateway forwards the payload to an asynchronous Step Function. The state machine consists of two lambda functions, responsible for using OpenAI API to get an answer and AWS Polly to convert text-to-speech as MP3 audio file stored to an S3 bucket.
  • Finally, S3 event notifications trigger an SNS topic which sends an email with info about the newly uploaded file.

Provisioning resources

To provision resources via Infrastructure as Code (IaC), I will use SAM and the full template can be seen below. Let’s briefly analyse the file sections:

  • Globals, contain common configuration to be shared across support SAM resources
  • Parameters define dynamic data that can be provided as part of the sam deploy — guided command
  • Resources include all AWS resources to be provisioned and mentioned in above architecture. Some details will follow shortly.
SAM template.yaml

A state machine to manage them all

AWS Step Functions is a serverless orchestration service integrations various AWS services as tasks part of a state machine. At its core, the Amazon States Language provides a JSON-based language to define the state machine as a set of states.

Step Function ASL

Let’s see the two states in details via their Lambda function implementation:

  1. The Ask ChatGPT state will use OpenAI SDK to ask a question and receive an answer.
  2. The Text to Speech state will use Polly to convert the received answer into an audio file.

AskChatGPT Lambda function

This first Lambda function uses the question received by the API Gateway. OpenAI SDK package uses this info together with an AI model which does the magic. A list of available models can be seen below.

ChatGPT Models

Using the SDK is very simple and just requires an API key which can be generated in your ChatGPT account. Below is the Lambda code:

Ask ChatGPT Lambda

For details about the request parameters, please refer to the below link.

TextToSpeech Lambda

The second function receives the answer from the previous Lambda function and leverages AWS Polly to perform an asynchronous conversion of the text string into an audio file via the Speech Synthesis Task.

In a nutshell, this creates an S3 synthesis scheduled task managed by Polly, which then saves its result into an S3 bucket. Below snippet will associate “Joanna” voice to the text and save the adutio MP3 file into S3.

Text-to-Speech Lambda

Demo

By hitting the POST /ask endpoint from Postman, the result is an immediate 200 code as the call is asynchronous with a fire and forget style.

1. Post request

The Step Function starts and completes accordingly.

2. Step Function execution

A S3 synthesis Polly task is scheduled.

Polly task

Once the task completes, an MP3 file is saved into S3.

Audio stored in S3

Finally an email gets sent with S3 Created event information:

Email received

Conclusion

OpenAI-based applications have the potential to be disruptive in many domains. This article just showcased the surface of how to combine ChatGPT with cloud native approaches with opportunities to automate smart text-to-speech applications in social networks apps.

Full code can be found on Github: https://github.com/aladevlearning/chatgpt-polly/tree/main/chatgtp-polly-api

Level Up Coding

Thanks for being a part of our community! Before you go:

🚀👉 Join the Level Up talent collective and find an amazing job

--

--

Engineering Manager | Full-Stack Architect | Team/Tech Lead with a passion for frontend, backend and cloud | AWS Community Builder