Add comprehensive AWS Bedrock embedders guide #3426
Open: tbuatois wants to merge 4 commits into meilisearch:main from tbuatois:update-bedrock-embedders-guide
---
title: Semantic Search with AWS Bedrock Embedding Models
description: This guide will walk you through the process of setting up Meilisearch with AWS Bedrock embedding models to enable semantic search capabilities.
---

## Introduction

This guide will walk you through the process of setting up Meilisearch with AWS Bedrock embedding models to enable semantic search capabilities. By leveraging Meilisearch's AI features and AWS Bedrock's embedding models, you can enhance your search experience and retrieve more relevant results using high-quality embedding models from Amazon and third-party providers available on Bedrock.

## Requirements

To follow this guide, you'll need:

- A [Meilisearch Cloud](https://www.meilisearch.com/cloud) project or a self-hosted Meilisearch instance running version >=1.11
- An AWS account with Bedrock access and a Bedrock API key for embedding generation. You can sign up for an AWS account at [AWS](https://aws.amazon.com/).
- Access, in your AWS account, to the Bedrock embedding models you plan to use
- No additional backend: Meilisearch calls the Bedrock API directly

## Setting up Meilisearch

To set up an embedder in Meilisearch, you need to configure it in your settings. You can refer to the [Meilisearch documentation](/reference/api/settings) for more details on updating the embedder settings.

AWS Bedrock provides access to several embedding models:

- `amazon.titan-embed-text-v1`: 1,536 dimensions (Amazon Titan Text Embeddings G1)
- `amazon.titan-embed-text-v2:0`: 1,024 dimensions (Amazon Titan Text Embeddings V2)
- `amazon.nova-2-multimodal-embeddings-v1:0`: 256/384/1024/3072 dimensions (Amazon Nova 2 Multimodal Embeddings, supports text, images, video, and audio)
- `cohere.embed-english-v3`: 1,024 dimensions (Cohere English embeddings)
- `cohere.embed-multilingual-v3`: 1,024 dimensions (Cohere Multilingual embeddings)
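
If you script your setup, the model IDs and dimensions listed above can be kept in one place together with the endpoint pattern used in this guide's example configurations. The following is a minimal Python sketch; `BEDROCK_EMBEDDING_DIMENSIONS` and `bedrock_invoke_url` are hypothetical names, and the URL format is copied from the example configurations in this guide rather than taken from an AWS SDK.

```python
# Hypothetical lookup table: default output dimensions per Bedrock model ID, as listed above.
# Nova 2 Multimodal Embeddings is left out on purpose: its dimension is configurable
# (256, 384, 1024, or 3072) and must be chosen explicitly in the request.
BEDROCK_EMBEDDING_DIMENSIONS = {
    "amazon.titan-embed-text-v1": 1536,
    "amazon.titan-embed-text-v2:0": 1024,
    "cohere.embed-english-v3": 1024,
    "cohere.embed-multilingual-v3": 1024,
}


def bedrock_invoke_url(region: str, model_id: str) -> str:
    """Build the Bedrock Runtime invoke URL in the same form as this guide's examples."""
    return f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"


if __name__ == "__main__":
    model = "amazon.titan-embed-text-v2:0"
    print(bedrock_invoke_url("us-west-2", model))
    print(BEDROCK_EMBEDDING_DIMENSIONS[model])
```

The `url` and `dimensions` fields of an embedder must agree with each other, so deriving both from the same model ID avoids a common mismatch.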

### Getting a Bedrock API key

Before configuring the embedder, you'll need to obtain a Bedrock API key:

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com/)
2. Navigate to the [Amazon Bedrock console](https://console.aws.amazon.com/bedrock/)
3. In the left navigation, choose **API keys**
4. Choose **Generate API key**
5. Set an expiration period (recommended: 30 days for testing)
6. Copy the generated API key

**Important**: Make sure to generate your API key in the same AWS region where you plan to use the Bedrock embedding models, as API keys are region-specific.
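
Before handing the key to Meilisearch, it can help to confirm that it works by calling Bedrock directly. This minimal Python sketch (standard library only) builds an InvokeModel request for Titan Text Embeddings V2 and sends the key as a bearer token in the `Authorization` header. The helper names and the `BEDROCK_API_KEY` environment variable are assumptions made for illustration.

```python
import json
import os
import urllib.request


def build_titan_v2_request(region: str, api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) an InvokeModel request for Titan Text Embeddings V2."""
    url = (
        f"https://bedrock-runtime.{region}.amazonaws.com"
        "/model/amazon.titan-embed-text-v2:0/invoke"
    )
    body = json.dumps({"inputText": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The Bedrock API key is passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def check_key(region: str, api_key: str) -> int:
    """Send one embedding request (network call) and return the vector length."""
    req = build_titan_v2_request(region, api_key, "hello world")
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    return len(payload["embedding"])


if __name__ == "__main__":
    # Hypothetical variable name; only calls Bedrock when the key is set.
    key = os.environ.get("BEDROCK_API_KEY")
    if key:
        print(check_key("us-west-2", key))
```

With a valid key, `check_key` should print 1024, the default output size of Titan V2. An HTTP error instead typically means the key is invalid, expired, or was generated for a different region.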

Here's an example of embedder settings for Amazon Titan Text Embeddings V2:

```json
{
  "bedrock-titan": {
    "source": "rest",
    "url": "https://bedrock-runtime.us-west-2.amazonaws.com/model/amazon.titan-embed-text-v2:0/invoke",
    "apiKey": "<Your Bedrock API Key>",
    "dimensions": 1024,
    "documentTemplate": "<Custom template (Optional, but recommended)>",
    "request": {
      "inputText": "{{text}}"
    },
    "response": {
      "embedding": "{{embedding}}"
    }
  }
}
```

In this configuration:

- `source`: Set to `rest` so that Meilisearch calls Bedrock's REST API directly.
- `url`: The Bedrock Runtime API endpoint for the specific model and region.
- `apiKey`: Replace `<Your Bedrock API Key>` with your actual Bedrock API key.
- `dimensions`: The number of dimensions of the embeddings. Set it to 1024 for Titan V2 and Cohere models, or 1536 for Titan V1. For Nova 2 Multimodal Embeddings, it must match the `embeddingDimension` value sent in the request.
- `documentTemplate`: Optionally, a [custom template](/learn/ai_powered_search/getting_started_with_ai_search) that controls which document fields are turned into the text sent for embedding.
- `request`: The JSON body Meilisearch sends to the Bedrock model. Meilisearch replaces `{{text}}` with the text to embed.
- `response`: The shape of the JSON body the Bedrock model returns. `{{embedding}}` marks where the embedding vector is located.
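
To apply these settings from a script instead of the Cloud UI, send them to the index's embedders settings route, `PATCH /indexes/{index_uid}/settings/embedders`. This is a minimal Python sketch assuming a hypothetical index called `movies`, a Meilisearch instance at `http://localhost:7700`, and a Meilisearch API key allowed to update settings; `documentTemplate` is omitted for brevity.

```python
import json
import urllib.request

# Embedder settings from the Titan example above (placeholder API key kept as-is).
BEDROCK_TITAN_EMBEDDER = {
    "bedrock-titan": {
        "source": "rest",
        "url": "https://bedrock-runtime.us-west-2.amazonaws.com/model/amazon.titan-embed-text-v2:0/invoke",
        "apiKey": "<Your Bedrock API Key>",
        "dimensions": 1024,
        "request": {"inputText": "{{text}}"},
        "response": {"embedding": "{{embedding}}"},
    }
}


def build_embedder_update(
    meili_url: str, meili_key: str, index_uid: str, embedders: dict
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request for the index's embedders setting."""
    return urllib.request.Request(
        f"{meili_url.rstrip('/')}/indexes/{index_uid}/settings/embedders",
        data=json.dumps(embedders).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {meili_key}",
            "Content-Type": "application/json",
        },
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request (network call) and return the JSON reply."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

For example, `send(build_embedder_update("http://localhost:7700", "<Meilisearch API key>", "movies", BEDROCK_TITAN_EMBEDDER))` returns a summarized task whose `taskUid` you can use to follow the embedding generation.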

For different Bedrock embedding models, you'll need to adjust the URL and request/response formats:

**Cohere models** use a different format:
```json
{
  "cohere-english": {
    "source": "rest",
    "url": "https://bedrock-runtime.us-west-2.amazonaws.com/model/cohere.embed-english-v3/invoke",
    "apiKey": "<Your Bedrock API Key>",
    "dimensions": 1024,
    "request": {
      "texts": ["{{text}}"],
      "input_type": "search_document"
    },
    "response": {
      "embeddings": ["{{embedding}}"]
    }
  }
}
```
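
To make the role of the `request` template concrete, the following Python sketch substitutes one text into the Cohere body above, producing the JSON that would be posted to Bedrock for a single text. `render_request` is a hypothetical illustration of placeholder substitution, not Meilisearch's implementation, and it does not cover sending several texts in one request.

```python
import json
from typing import Any

# Request template from the Cohere configuration above.
COHERE_REQUEST_TEMPLATE = {
    "texts": ["{{text}}"],
    "input_type": "search_document",
}


def render_request(template: Any, text: str) -> Any:
    """Return a copy of `template` where every "{{text}}" string value is replaced by `text`."""
    if template == "{{text}}":
        return text
    if isinstance(template, dict):
        return {key: render_request(value, text) for key, value in template.items()}
    if isinstance(template, list):
        return [render_request(item, text) for item in template]
    # Other values (strings, numbers, booleans) are copied unchanged.
    return template


if __name__ == "__main__":
    print(json.dumps(render_request(COHERE_REQUEST_TEMPLATE, "red running shoes")))
    # prints {"texts": ["red running shoes"], "input_type": "search_document"}
```

Only the placeholder changes between calls; fixed fields such as `input_type` are sent as written in the template.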

**Amazon Nova 2 Multimodal Embeddings** uses a different request format:
```json
{
  "nova-multimodal": {
    "source": "rest",
    "url": "https://bedrock-runtime.us-west-2.amazonaws.com/model/amazon.nova-2-multimodal-embeddings-v1:0/invoke",
    "apiKey": "<Your Bedrock API Key>",
    "dimensions": 1024,
    "request": {
      "schemaVersion": "nova-multimodal-embed-v1",
      "taskType": "SINGLE_EMBEDDING",
      "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": 1024,
        "text": {
          "truncationMode": "END",
          "value": "{{text}}"
        }
      }
    },
    "response": {
      "embeddings": [{"embedding": "{{embedding}}"}]
    }
  }
}
```
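
The `response` templates work in the opposite direction: they tell Meilisearch where the vector sits in Bedrock's reply. This Python sketch follows the `{{embedding}}` placeholder of each response template above through a sample response, which is a quick way to check a template against a real payload. `placeholder_path` and `extract_embedding` are hypothetical helpers that assume a single placeholder per template, and the sample payloads are trimmed, made-up responses.

```python
from typing import Any, List, Optional, Union

# Response templates from the three configurations above.
TITAN_RESPONSE_TEMPLATE = {"embedding": "{{embedding}}"}
COHERE_RESPONSE_TEMPLATE = {"embeddings": ["{{embedding}}"]}
NOVA_RESPONSE_TEMPLATE = {"embeddings": [{"embedding": "{{embedding}}"}]}

PathStep = Union[str, int]


def placeholder_path(template: Any, marker: str = "{{embedding}}") -> Optional[List[PathStep]]:
    """Return the keys/indexes leading to `marker` inside `template`, or None if absent."""
    if template == marker:
        return []
    if isinstance(template, dict):
        children = list(template.items())
    elif isinstance(template, list):
        children = list(enumerate(template))
    else:
        return None
    for step, child in children:
        sub_path = placeholder_path(child, marker)
        if sub_path is not None:
            return [step, *sub_path]
    return None


def extract_embedding(template: Any, response: Any) -> List[float]:
    """Walk `response` along the placeholder's path and return the value found there.

    Raises KeyError or IndexError if the response does not have the template's shape.
    """
    path = placeholder_path(template)
    if path is None:
        raise ValueError("template contains no {{embedding}} placeholder")
    value = response
    for step in path:
        value = value[step]
    return value


if __name__ == "__main__":
    # Trimmed, made-up sample responses for illustration only.
    print(extract_embedding(TITAN_RESPONSE_TEMPLATE, {"embedding": [0.1, 0.2], "inputTextTokenCount": 2}))
    print(extract_embedding(COHERE_RESPONSE_TEMPLATE, {"embeddings": [[0.3, 0.4]]}))
    print(extract_embedding(NOVA_RESPONSE_TEMPLATE, {"embeddings": [{"embedding": [0.5, 0.6]}]}))
```

A `KeyError` or `IndexError` here is a sign that the `response` template does not match what the model actually returns.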

Once the embedder settings are saved, Meilisearch automatically generates embeddings for your documents by calling Bedrock and stores them in its vector store.

Embedding generation runs asynchronously as part of a task, so monitor the task queue to confirm it completes successfully. You can check the task queue from the Cloud UI or with the [Meilisearch API](/reference/api/tasks).
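
Embedding generation can take a while on large indexes, so scripted setups usually wait for the settings task to finish. This minimal Python sketch assumes you have the `taskUid` returned by the settings update and a Meilisearch API key that can read tasks; it polls `GET /tasks/{taskUid}` until the task reaches `succeeded`, `failed`, or `canceled`. The function names, poll interval, and timeout are illustrative choices.

```python
import json
import time
import urllib.request

# Statuses after which Meilisearch no longer processes the task.
FINISHED_STATUSES = {"succeeded", "failed", "canceled"}


def task_is_finished(task: dict) -> bool:
    """True once the task is no longer enqueued or processing."""
    return task.get("status") in FINISHED_STATUSES


def wait_for_task(
    meili_url: str,
    meili_key: str,
    task_uid: int,
    poll_seconds: float = 1.0,
    timeout: float = 600.0,
) -> dict:
    """Poll GET /tasks/{task_uid} (network calls) until the task finishes or times out."""
    deadline = time.monotonic() + timeout
    while True:
        req = urllib.request.Request(
            f"{meili_url.rstrip('/')}/tasks/{task_uid}",
            headers={"Authorization": f"Bearer {meili_key}"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            task = json.load(resp)
        if task_is_finished(task):
            return task
        if time.monotonic() > deadline:
            raise TimeoutError(f"task {task_uid} is still {task.get('status')!r}")
        time.sleep(poll_seconds)
```

When the returned task has the status `failed`, its `error` object describes what went wrong, which is the first place to look if Bedrock rejected the requests.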

## Testing semantic search

With the embedder set up, you can now perform semantic searches using Meilisearch. When you send a search query, Meilisearch generates an embedding for the query with the configured Bedrock embedding model and uses it to find the most semantically similar documents in the vector store.

To perform a semantic search, make a normal search request and include the `hybrid` parameter:

```json
{
  "q": "<Query made by the user>",
  "hybrid": {
    "semanticRatio": 1,
    "embedder": "bedrock-titan"
  }
}
```

In this request:

- `q`: The user's search query.
- `hybrid`: The hybrid search configuration:
  - `semanticRatio`: Controls the balance between semantic search and full-text search. A value of `1` means pure semantic search, `0` means pure full-text search, and values in between combine both.
  - `embedder`: The name of the embedder used to generate the query embedding. It must match the name used in the embedder configuration, which in this case is `bedrock-titan`.

You can use the Meilisearch API or client libraries to perform searches and retrieve the relevant documents based on semantic similarity.
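
Without a client library, the same search can be sent to the index's search route, `POST /indexes/{index_uid}/search`. This Python sketch builds the request shown above for a hypothetical `movies` index. `semantic_ratio` defaults to `1.0` here to match the example, which is a choice of this sketch: Meilisearch's own default for `semanticRatio` is `0.5`.

```python
import json
import urllib.request


def build_hybrid_search(
    meili_url: str,
    meili_key: str,
    index_uid: str,
    query: str,
    embedder: str = "bedrock-titan",
    semantic_ratio: float = 1.0,
) -> urllib.request.Request:
    """Build (but do not send) a search request that uses the `hybrid` parameter."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semanticRatio must be between 0.0 and 1.0")
    body = {
        "q": query,
        "hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder},
    }
    return urllib.request.Request(
        f"{meili_url.rstrip('/')}/indexes/{index_uid}/search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {meili_key}",
            "Content-Type": "application/json",
        },
    )


def search(req: urllib.request.Request) -> list:
    """Send the request (network call) and return the matching documents (`hits`)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["hits"]
```

Lowering `semantic_ratio` toward `0` shifts the ranking toward keyword matches, which can help for queries with exact product names or codes.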

## Important considerations

**Setup order**: Embeddings are generated when documents are indexed and whenever the embedder settings change. If your index already contains documents when you add the embedder, Meilisearch generates embeddings for all of them as part of the settings update task, which can take a while and sends many requests to Bedrock on large indexes.

**Regional endpoints**: Bedrock is available in multiple AWS regions. Use the endpoint URL for your region (for example, `us-east-1`, `us-west-2`, or `eu-west-1`), and make sure your API key was generated in that same region.

**Model availability**: Embedding models are generally available on Bedrock without requiring special access requests.

## Conclusion

By following this guide, you should now have Meilisearch set up with AWS Bedrock embedding models, enabling you to leverage semantic search capabilities in your application.

To explore further configuration options for embedders, consult the [detailed documentation about the embedder setting possibilities](/reference/api/settings).
@tbuatois I think "```" is missing, and because it's the delimiter of GitHub suggestion, I cannot add it myself
And what's the difference between what I have, which is working perfectly with our Flickr demo? I would rather use mine as I know it works.