Dataster Documentation

Dataster helps you build Generative AI applications with better accuracy and lower latency.

Add an Azure AI Search Vector Store

In this guide, we will walk you through the process of integrating an Azure-hosted AI Search instance as an asset in Dataster. This includes configuring the necessary API permissions, creating a vector index, and adding the vector store to the Dataster Vector Store catalog. Specifically, we will index a set of movies along with their titles, years, summaries, and vectorized representations of the summaries. By the end of this guide, you will have a fully functional vector store in Dataster that can be used to create a Retrieval-Augmented Generation (RAG) system.

Prerequisites

  1. An AI Search resource has been created in Azure.
  2. The resource is publicly accessible.

Step 1: Create a Query Key

  1. Create a query key for the AI Search resource.
  2. Optionally, generate an admin key if the index still needs to be created, since query keys are read-only.


Figure: Query Keys
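
To make the difference between the two key types concrete, here is a minimal sketch of how request headers for the Azure AI Search REST API could be assembled. The function name `build_headers` and the operation labels are hypothetical names chosen for illustration; the `api-key` header itself is what the REST API expects.

```python
from typing import Dict, Optional

# Hypothetical operation labels: these operations modify the service,
# so they need an admin key. Plain searches only need a query key.
ADMIN_OPERATIONS = {"create_index", "upload_documents"}


def build_headers(operation: str, query_key: str,
                  admin_key: Optional[str] = None) -> Dict[str, str]:
    """Return REST headers carrying the key that the operation requires."""
    if operation in ADMIN_OPERATIONS:
        if admin_key is None:
            raise ValueError(f"'{operation}' requires an admin key")
        key = admin_key
    else:
        key = query_key
    return {"Content-Type": "application/json", "api-key": key}
```

With this sketch, `build_headers("search", query_key)` returns headers carrying the query key, while `build_headers("create_index", query_key)` raises an error until an admin key is supplied.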

Step 2: Create a Vector Index

  1. Create a vector index within the resource.
  2. Name the index "movies" and the vector field "vector".
  3. Index movies with their titles, years, summaries, and a vectorized representation of their summaries.
  4. Use the HNSW algorithm and set the dimensions to 1,536, because the embeddings come from the OpenAI text-embedding-3-small model. Other embedding models may produce a different number of dimensions.
  5. Choose cosine distance for the vector similarity measure.

{
  "name": "movies",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "key": true,
      "retrievable": true,
      "stored": true,
      "searchable": false,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "synonymMaps": []
    },
    {
      "name": "title",
      "type": "Edm.String",
      "key": false,
      "retrievable": true,
      "stored": true,
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "analyzer": "standard.lucene",
      "synonymMaps": []
    },
    {
      "name": "summary",
      "type": "Edm.String",
      "key": false,
      "retrievable": true,
      "stored": true,
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "analyzer": "standard.lucene",
      "synonymMaps": []
    },
    {
      "name": "vector",
      "type": "Collection(Edm.Single)",
      "key": false,
      "retrievable": false,
      "stored": true,
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "synonymMaps": [],
      "dimensions": 1536,
      "vectorSearchProfile": "vector-profile-1735587575124"
    }
  ],
  "scoringProfiles": [],
  "suggesters": [],
  "analyzers": [],
  "tokenizers": [],
  "tokenFilters": [],
  "charFilters": [],
  "normalizers": [],
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity"
  },
  "vectorSearch": {
    "algorithms": [
      {
        "name": "vector-config-1735587576288",
        "kind": "hnsw",
        "hnswParameters": {
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500,
          "metric": "cosine"
        }
      }
    ],
    "profiles": [
      {
        "name": "vector-profile-1735587575124",
        "algorithm": "vector-config-1735587576288"
      }
    ],
    "vectorizers": [],
    "compressions": []
  },
  "@odata.etag": "\"0x8DD2909B8B2459F\""
}
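
If you prefer to create the index programmatically rather than in the Azure portal, the definition above can be sent to the Azure AI Search REST API. The following is a minimal sketch, not the only way to do it. The service name, the admin key value, and the API version `2024-07-01` are assumptions to adapt to your own resource. The `@odata.etag` entry is stripped as a precaution because it is a server-generated value.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"  # assumption: use a version your service supports


def build_create_index_request(service_name: str, admin_key: str,
                               index_definition: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates or updates the index."""
    index_name = index_definition["name"]
    # Drop server-generated OData annotations such as "@odata.etag".
    body = {k: v for k, v in index_definition.items()
            if not k.startswith("@odata.")}
    url = (f"https://{service_name}.search.windows.net"
           f"/indexes/{index_name}?api-version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        # Creating an index requires an admin key, not a query key.
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )
```

The request can then be sent with `urllib.request.urlopen(request)` once the index definition has been loaded, for example with `json.load` from a file that holds the JSON shown above.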

Step 3: Ingest Data

  1. Ingest the following movies and their vectorized summaries:

[
  {
    "title": "The Shawshank Redemption",
    "year": 1994,
    "summary": "Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency."
  },
  {
    "title": "The Godfather",
    "year": 1972,
    "summary": "The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son."
  },
  {
    "title": "The Dark Knight",
    "year": 2008,
    "summary": "When the menace known as the Joker emerges from his mysterious past, he wreaks havoc and chaos on the people of Gotham."
  },
  {
    "title": "Pulp Fiction",
    "year": 1994,
    "summary": "The lives of two mob hitmen, a boxer, a gangster and his wife, and a pair of diner bandits intertwine in four tales of violence and redemption."
  },
  {
    "title": "Inception",
    "year": 2010,
    "summary": "A thief who steals corporate secrets through the use of dream-sharing technology is given the inverse task of planting an idea into the mind of a CEO."
  }
]
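
One possible way to prepare this data for ingestion is sketched below. It turns each movie into an `upload` action in the batch format accepted by the `/indexes/movies/docs/index` endpoint of the Azure AI Search REST API. The function name `build_upload_payload`, the sequential `id` values, and the `embed` callable are assumptions made for illustration. In practice, `embed` would wrap a call to an embedding service that uses text-embedding-3-small. The sketch leaves out `year` because the index definition shown in Step 2 does not declare a `year` field, and a document with an undeclared field is expected to be rejected. Add the field to both the index and the payload if you need it.

```python
from typing import Callable, Dict, List, Sequence

EXPECTED_DIMENSIONS = 1536  # must match "dimensions" in the index definition


def build_upload_payload(movies: Sequence[Dict],
                         embed: Callable[[str], Sequence[float]]) -> Dict:
    """Build the JSON body for a batch upload of movies with their vectors."""
    actions: List[Dict] = []
    for position, movie in enumerate(movies, start=1):
        vector = list(embed(movie["summary"]))
        if len(vector) != EXPECTED_DIMENSIONS:
            raise ValueError(
                f"expected {EXPECTED_DIMENSIONS} dimensions, got {len(vector)}")
        actions.append({
            "@search.action": "upload",
            "id": str(position),  # assumption: simple sequential string keys
            "title": movie["title"],
            "summary": movie["summary"],
            "vector": vector,
        })
    return {"value": actions}
```

The resulting dictionary would be serialized to JSON and sent in a POST request with an admin key in the `api-key` header, because uploading documents is a write operation.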

Step 4: Add Vector Store

  1. Navigate to the Dataster Vector Store catalog.
  2. Add the Azure AI Search vector store.
  3. Specify the embedding model used to create the vectors.


Figure: LLM catalog

Step 5: Explore Chunks

  1. Use the explorer to examine the chunks.


Figure: Chunk Explorer
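
To cross-check what the explorer shows, the index can also be queried directly with the query key from Step 1. The sketch below builds a vector query against the `movies` index through the Azure AI Search REST API. The service name, the API version `2024-07-01`, and the function name `build_vector_query_request` are assumptions. The query vector must come from the same embedding model that was used at ingestion time. The `vector` field is not selected because the index marks it as not retrievable.

```python
import json
import urllib.request
from typing import Sequence

API_VERSION = "2024-07-01"  # assumption: use a version your service supports


def build_vector_query_request(service_name: str, query_key: str,
                               query_vector: Sequence[float],
                               k: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs a vector search."""
    url = (f"https://{service_name}.search.windows.net"
           f"/indexes/movies/docs/search?api-version={API_VERSION}")
    body = {
        "select": "id,title,summary",
        "vectorQueries": [{
            "kind": "vector",
            "vector": list(query_vector),
            "fields": "vector",  # name of the vector field in the index
            "k": k,              # number of nearest neighbors to return
        }],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        # A read-only query key is sufficient for searching.
        headers={"Content-Type": "application/json", "api-key": query_key},
    )
```

Sending the request with `urllib.request.urlopen(request)` should return the closest movies by cosine similarity, which can be compared with the chunks listed in the explorer.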

Conclusion

You have successfully set up Azure AI Search as a Vector Store in Dataster. This setup is a preliminary step that will enable you to create a Retrieval-Augmented Generation (RAG) system by combining this Vector Store with the Large Language Model (LLM) of your choice and a system prompt.


If you encounter any issues or need further assistance, please contact our support team at support@dataster.com.