Dataster Documentation

Dataster helps you build Generative AI applications with better accuracy and lower latency.

Add an Amazon OpenSearch Serverless Vector Store

This guide walks you through integrating an AWS-hosted Amazon OpenSearch Serverless collection as an asset in Dataster. It covers configuring the necessary IAM permissions, creating a vector index, and adding the vector store to the Dataster Vector Store catalog. As a running example, we index a set of movies with their titles, years, summaries, and vectorized representations of the summaries. By the end of this guide, you will have a fully functional vector store in Dataster that can be used to create a Retrieval-Augmented Generation (RAG) system.

Prerequisites

  1. A vector search collection has been created in Amazon OpenSearch Serverless.
  2. The collection is publicly accessible, meaning its network access policy allows public access.

Step 1: Configure IAM Principal

  1. Ensure the IAM principal has an access key and a secret access key.
  2. Assign the necessary privileges in IAM and Amazon OpenSearch Serverless to read documents in the collection.
  3. If the index still needs to be created, also grant the additional privileges required for that in IAM and Amazon OpenSearch Serverless.
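
On the Amazon OpenSearch Serverless side, the privileges described in the steps above are usually expressed as a data access policy. The following is a minimal sketch of such a policy document, built in Python. The helper name `build_data_access_policy`, the collection name `movies-collection`, and the principal ARN are hypothetical placeholders, and the exact permission set Dataster needs is an assumption; adjust it to your own security requirements. On the IAM side, the principal typically also needs the `aoss:APIAccessAll` action to reach the collection's data plane.

```python
import json


def build_data_access_policy(collection, principal_arn, allow_index_creation=False):
    """Build an OpenSearch Serverless data access policy document.

    Assumption: read access is enough for querying, and index creation
    plus ingestion need the extra write-side permissions.
    """
    index_permissions = ["aoss:DescribeIndex", "aoss:ReadDocument"]
    if allow_index_creation:
        # Extra permissions for creating the index and writing documents.
        index_permissions += [
            "aoss:CreateIndex",
            "aoss:UpdateIndex",
            "aoss:WriteDocument",
        ]
    return [
        {
            "Rules": [
                {
                    "ResourceType": "collection",
                    "Resource": [f"collection/{collection}"],
                    "Permission": ["aoss:DescribeCollectionItems"],
                },
                {
                    "ResourceType": "index",
                    "Resource": [f"index/{collection}/*"],
                    "Permission": index_permissions,
                },
            ],
            "Principal": [principal_arn],
        }
    ]


# Hypothetical collection name and principal, for illustration only.
policy = build_data_access_policy(
    "movies-collection",
    "arn:aws:iam::123456789012:user/dataster-reader",
    allow_index_creation=True,
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the data access policy editor in the AWS console, or passed as the `policy` string to the boto3 `opensearchserverless` client's `create_access_policy` call with `type="data"`.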


[Figure: IAM permissions]

Step 2: Create a Vector Index

  1. Create a vector index within the collection.
  2. Name the index "movies" and the vector field "vector".
  3. Index movies with their titles, years, summaries, and a vectorized representation of their summaries.
  4. Use the nmslib engine and set the dimensions to 1,024, the default output size of the Amazon Titan Text Embeddings V2 model used in this guide. Other embedding models may require a different number of dimensions.
  5. Choose cosine distance for the vector similarity measure.
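
If you prefer to create the index through the API rather than the console, the settings listed above correspond roughly to the index body sketched below. The index name `movies`, the field name `vector`, the nmslib engine, the 1,024 dimensions, and cosine similarity come from this guide; the helper name `build_index_body`, the HNSW method, and the field types chosen for `title`, `year`, and `summary` are assumptions.

```python
import json


def build_index_body(dimension=1024):
    """Build a k-NN index body matching the settings in this guide."""
    return {
        # Enable k-NN search on the index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Field types for the metadata are assumptions.
                "title": {"type": "text"},
                "year": {"type": "integer"},
                "summary": {"type": "text"},
                # The vector field holding the summary embedding.
                "vector": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "nmslib",
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
    }


index_body = build_index_body()
print(json.dumps(index_body, indent=2))
```

With the opensearch-py client, a body like this would typically be passed as `client.indices.create(index="movies", body=index_body)`, assuming the client is configured with SigV4 signing for the `aoss` service.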


[Figure: Add a vector field]

Step 3: Ingest Data

  1. Ingest the following movies and their vectorized summaries:

     [
       {
         "title": "The Shawshank Redemption",
         "year": 1994,
         "summary": "Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency."
       },
       {
         "title": "The Godfather",
         "year": 1972,
         "summary": "The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son."
       },
       {
         "title": "The Dark Knight",
         "year": 2008,
         "summary": "When the menace known as the Joker emerges from his mysterious past, he wreaks havoc and chaos on the people of Gotham."
       },
       {
         "title": "Pulp Fiction",
         "year": 1994,
         "summary": "The lives of two mob hitmen, a boxer, a gangster and his wife, and a pair of diner bandits intertwine in four tales of violence and redemption."
       },
       {
         "title": "Inception",
         "year": 2010,
         "summary": "A thief who steals corporate secrets through the use of dream-sharing technology is given the inverse task of planting an idea into the mind of a CEO."
       }
     ]
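
The ingestion step can be sketched as follows: each movie's summary is embedded, and the resulting vector is stored in the `vector` field next to the original fields. This is a minimal sketch, not a definitive implementation. The helper names `titan_embed` and `build_documents` are hypothetical, and the sketch assumes the Amazon Titan Text Embeddings V2 model is invoked through Amazon Bedrock with boto3, using AWS credentials and a region that are already configured.

```python
import json

# Bedrock model ID for Amazon Titan Text Embeddings V2.
MODEL_ID = "amazon.titan-embed-text-v2:0"


def titan_embed(text, dimensions=1024):
    """Embed text with Titan Text Embeddings V2 via Amazon Bedrock."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS

    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(
            {"inputText": text, "dimensions": dimensions, "normalize": True}
        ),
    )
    return json.loads(response["body"].read())["embedding"]


def build_documents(movies, embed=titan_embed):
    """Return copies of the movies with a 'vector' field added."""
    documents = []
    for movie in movies:
        document = dict(movie)  # keep title, year, and summary unchanged
        document["vector"] = embed(movie["summary"])
        documents.append(document)
    return documents
```

Each resulting document can then be written to the `movies` index, for example with the opensearch-py client's `index` method. Passing the embedding function as a parameter makes it easy to swap in a different model, as long as the index dimensions match the model's output size.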

Step 4: Add Vector Store

  1. Navigate to the Dataster Vector Store catalog.
  2. Add the Amazon OpenSearch Serverless vector store.
  3. Specify the embedding model used to create the vectors.


[Figure: Add an Amazon OpenSearch Serverless vector store]

Step 5: Explore Chunks

  1. Use the explorer to examine the chunks.


[Figure: Chunk Explorer]

Conclusion

You have successfully set up Amazon OpenSearch Serverless as a Vector Store in Dataster. This setup is a preliminary step that will enable you to create a Retrieval-Augmented Generation (RAG) system by combining this Vector Store with the Large Language Model (LLM) of your choice and a system prompt.


If you encounter any issues or need further assistance, please contact our support team at support@dataster.com.