Bedrock unveiled: A RAG-based AI speaker assistant with LangChain

Antonio Lagrotteria
Nov 25, 2023

This article presents the final part of enabling the Retrieval Augmented Generation (RAG) technique for an AI personal assistant named Johanna, which I have been using for live talks and presentations.

The assistant evolved as follows:

1) Initially, the AI assistant was based on OpenAI’s ChatGPT.

2) When Bedrock was announced, the assistant’s foundation model (FM) was moved to AI21 Labs’ Jurassic-2 Ultra on Bedrock to achieve similar behaviour.

3) The next step focused on indexing external documents into embeddings. Those were eventually stored in a vector database for later consumption by the FM.

This article, the final piece of the puzzle, leverages LangChain to augment prompts sent to Bedrock models with several sources, such as the stored embeddings and the chat history.

Architecture

To architect such a tool, we need the following:

  • A NoSQL database to store previous chat conversations (Amazon DynamoDB).
  • A vector database to store documents with contextual information in the form of embeddings (Amazon OpenSearch Serverless).
  • Generative AI models on Amazon Bedrock: Amazon Titan to produce the embeddings used for the vector search, and the AI21 Labs FM to glue all the contextual documents together when answering.

The final architecture is shown below; the dotted red area outlines the scope of this article:

Before jumping straight to the implementation, let’s introduce some concepts and techniques, namely LangChain and RAG.

LangChain

LangChain is an open-source framework providing engineers with toolkits, libraries and abstractions to develop Large Language Model (LLM) applications.

At the time of writing, it supports Python and Node.js. This article uses the Python implementation, both because Python is the default choice for LLM work and because I wanted to get out of my comfort zone and learn a toolkit that is likely to be widely adopted for such use cases.
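
To give a feel for those abstractions, here is a minimal sketch of calling a Bedrock-hosted model through LangChain’s LLM wrapper. The model id and region are just examples; any Bedrock text model enabled in your account works:

import boto3
from langchain.llms.bedrock import Bedrock

# Plain boto3 client for the Bedrock runtime (region is an example)
bedrock_client = boto3.client("bedrock-runtime", region_name="eu-central-1")

# LangChain wraps the client so the model can be used like any other LLM object
llm = Bedrock(model_id="ai21.j2-ultra-v1", client=bedrock_client)

print(llm("Explain in one sentence what LangChain is."))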

Retrieval Augmented Generation

RAG is an AI technique that retrieves data from outside the FM and augments the user’s prompt by adding the relevant retrieved data to its context.

This article will not cover the topic in depth, but in a nutshell, the animation below should explain the flow and the AWS services used in this article quite intuitively.
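
For intuition, the whole flow boils down to a few steps. The sketch below is purely illustrative; the helper names are made up and are not the article’s code:

def answer_with_rag(question, embed, vector_store, llm, k=3):
    # 1) Turn the user question into an embedding
    query_vector = embed(question)
    # 2) Retrieve the k most similar document chunks from the vector database
    documents = vector_store.search(query_vector, k=k)
    # 3) Augment the prompt with the retrieved context
    context = "\n\n".join(doc.text for doc in documents)
    prompt = f"Use the following context to answer.\n\n{context}\n\nQuestion: {question}\nAnswer:"
    # 4) Let the foundation model generate the final answer
    return llm(prompt)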

How do we ask the assistant questions?

To ask questions to Johanna, we need to:

  • Provision DynamoDB to store conversations.
  • Build the AskLLM Lambda. This function merges all context inputs (the DynamoDB chat history and the OpenSearch Serverless index created in the last article) and uses Bedrock to ask the FM a specific question, leveraging the rich LangChain framework to do so.
  • Hook the frontend to consume the API.

Provision the DynamoDB database

To add the NoSQL database via Amplify, you can use the Storage module, as shown below. This will create a DynamoDB table with a hash key (primary key) named “SessionId”:

alatech:~/environment $ amplify add storage
? Select from one of the below mentioned services: NoSQL Database
Welcome to the NoSQL DynamoDB database wizard
This wizard asks you a series of questions to help determine how to set up your NoSQL database table.
✔ Provide a friendly name · dynamodbMemories
✔ Provide table name · memories
You can now add columns to the table.
✔ What would you like to name this column · SessionId
✔ Choose the data type · string
✔ Would you like to add another column? (Y/n) · no
Before you create the database, you must specify how items in your table are uniquely organized. You do this by specifying a primary key. The primary key uniquely identifies each item in the table so that no two items can have the same key. This can be an individual column, or a combination that includes a primary key and a sort key.
To learn more about primary keys, see:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey
Only one option for [Choose partition key for the table]. Selecting [SessionId].
✔ Do you want to add a sort key to your table? (Y/n) · no
You can optionally add global secondary indexes for this table. These are useful when you run queries defined in a different column than the primary key.
To learn more about indexes, see:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.SecondaryIndexes
✔ Do you want to add global secondary indexes to your table? (Y/n) · no
✔ Do you want to add a Lambda Trigger for your Table? (y/N) · no

I did change the table’s billing mode from provisioned (the default behaviour, and more expensive) to on-demand; the Amplify documentation describes how to achieve this by overriding the generated resources.
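
As a side note, outside of Amplify the same switch can be made with a one-off boto3 call. This is just an alternative sketch, not the Amplify approach referenced above; the table name below is the one my dev environment generated, so adjust it to yours:

import boto3

dynamodb = boto3.client('dynamodb')

# Switch the table from provisioned capacity to on-demand billing.
# Note: if Amplify manages the table, prefer the Amplify override route,
# otherwise the setting may be reverted on the next push.
dynamodb.update_table(TableName='memories-dev', BillingMode='PAY_PER_REQUEST')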

Prepare the “AskLLM” Lambda

The function code below, self-explanatory with inline comments, has four main tasks:

  • Retrieve past conversations from DynamoDB
  • Query the OpenSearch vector database
  • Build a proper prompt
  • Build a LangChain chain combining all of the above, which uses the Bedrock LLM to answer the question.

import os
import json
import boto3
from langchain.llms.bedrock import Bedrock
from langchain.schema import Document
from langchain.chains import ConversationChain
from langchain.memory.chat_message_histories import DynamoDBChatMessageHistory
from langchain.vectorstores import OpenSearchVectorSearch
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.embeddings import BedrockEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain

from opensearchpy import RequestsHttpConnection
from requests_aws4auth import AWS4Auth

service = 'aoss'
region = os.environ.get('REGION')
index_name = os.environ.get('INDEX_NAME')

bedrock_model_id = "ai21.j2-ultra-v1"
bedrock_embedding_model_id = "amazon.titan-embed-text-v1"

ssm = boto3.client('ssm')
endpoint = ssm.get_parameter(Name='/opensearch/serverless/endpoint')["Parameter"]["Value"]

def handler(event, context):
    print('received event:')
    print(event)

    # The question input of the end user
    question = json.loads(event.get("body")).get("input").get("question")

    # The Cognito identity, used to associate past conversations with the logged-in user
    identity_id = json.loads(event.get("body")).get("input").get("identityId")

    # The Bedrock Runtime client, used to invoke the AI21 Labs Jurassic model
    bedrock_client = get_bedrock_runtime_client(region)

    # The LangChain LLM wrapper used to invoke the model
    bedrock_llm = get_bedrock_llm(bedrock_client, bedrock_model_id)

    # The Bedrock embeddings client, used to invoke the Amazon Titan embedding model
    bedrock_embeddings_client = get_bedrock_embedding_client(bedrock_client, bedrock_embedding_model_id)

    # 1) The past interactions with the user, aka memories
    memory = get_memory_from_dynamo(identity_id)

    # 2) Initialize the vector database hosting all knowledge documents
    opensearch_vector_search_client = create_opensearch_vector_search_client(bedrock_embeddings_client)

    # 3) Build the prompt to be used by the LangChain chain
    PROMPT = build_prompt_template()

    # 4) The LangChain chain merging the LLM, vector database and memory, if provided
    qa = RetrievalQA.from_chain_type(
        llm=bedrock_llm,
        chain_type="stuff",
        retriever=opensearch_vector_search_client.as_retriever(),
        return_source_documents=False,
        chain_type_kwargs={"prompt": PROMPT, "verbose": True},
        memory=memory,
        verbose=True
    )

    # The response may contain additional information, such as source documents if requested.
    # In this case we are only interested in the LLM reply, aka the result
    response = qa(question, return_only_outputs=False)
    answer = response.get('result')

    print(f"The answer from Bedrock {bedrock_model_id} is: {answer}")

    return {
        'statusCode': 200,
        'headers': {
            'Access-Control-Allow-Headers': '*',
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Methods': 'OPTIONS,POST,GET'
        },
        'body': json.dumps({"Answer": answer})
    }

def get_bedrock_embedding_client(bedrock_client, bedrock_embedding_model_id):
    bedrock_embeddings_client = BedrockEmbeddings(
        client=bedrock_client,
        model_id=bedrock_embedding_model_id)
    return bedrock_embeddings_client

def get_bedrock_runtime_client(region):
    bedrock_client = boto3.client("bedrock-runtime", region_name=region)
    return bedrock_client

def get_bedrock_llm(bedrock_client, model_version_id):

    model_kwargs = {
        "maxTokens": 1024,
        "temperature": 0.8,
        "topP": 1
    }

    bedrock_llm = Bedrock(
        model_id=model_version_id,
        client=bedrock_client,
        model_kwargs=model_kwargs
    )

    return bedrock_llm

def get_memory_from_dynamo(session_id):
    message_history = DynamoDBChatMessageHistory(table_name="memories-dev", session_id=session_id)

    return ConversationBufferMemory(
        memory_key="history",
        chat_memory=message_history,
        return_messages=True,
        ai_prefix="A",
        human_prefix="H"
    )

def create_opensearch_vector_search_client(embedding_function):

    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                       region, service, session_token=credentials.token)

    docsearch = OpenSearchVectorSearch(
        index_name=index_name,
        embedding_function=embedding_function,
        opensearch_url=endpoint,
        http_auth=awsauth,
        is_aoss=True,
        timeout=300,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )

    return docsearch


def build_prompt_template():
    prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know; don't try to make up an answer. Don't include harmful content.

{context}

Question: {question}
Answer:"""

    return PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )
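
Before wiring the function to API Gateway, a quick local smoke test could look like the sketch below. The identity id and question are placeholders, and it assumes valid AWS credentials plus the SSM parameter and OpenSearch index created in the previous article:

if __name__ == "__main__":
    # Sample event mirroring the body the React client sends (see next section)
    sample_event = {
        "body": json.dumps({
            "input": {
                "question": "Who is Antonio?",
                "identityId": "eu-central-1:example-identity-id"
            }
        })
    }
    print(handler(sample_event, None))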

Hook the frontend to consume the API

The React client consumes the logic described above via a REST API call, as simple as that:

const {identityId} = await Auth.currentCredentials();

const completion = await API.post('bedrockapiassistant', '/ask-with-langchain', {
  body: {
    input: {
      question: transcript,
      identityId
    }
  },
});

Demo

A sneak peek of the interaction. The bot knew me from my previous conversations and now has context provided by the uploaded documents, text files with information like:

Antonio is an AWS Community Builder working in the financial sector who enjoys sharing technical content with crowds about frontend, backend and the AWS cloud.
Today he is joining a Show and Tell session for AWS re:Invent 2023: an event where AWS Community Builders and AWS Heroes showcase projects they have been working on. The presentation will be 10 minutes.

Finally, it is instructed to define itself as Joanna (the name comes from the selected Polly voice) because of this text:

Your task is to perform the following actions:
1- provide answers to my questions
2- act as an AI assistant who is going to help the presenter by answering questions
3- your name is Joanna and you will help the presenter during a live talk around the evolution of an AI assistant, from OpenAI to Retrieval Augmented Generation with Amazon Bedrock
4- provide answers no more than 2 sentences long
5- most importantly, always answer based on the context

Conclusion

The AI assistant is now complete. It can use external documents as context when answering questions.

This series was fun, allowing me to get hands-on with a concrete use case for a topic that will become ever more pervasive in our IT (and non-IT) lives.

I presented a journey through ever-evolving AI technologies, grounded in a practical example that I will be using in most of my public talks.

Unless some crazy developments emerge (not impossible!), this is it for the series. I hope you enjoyed the ride!
