
Get Started with Chat Using Azure AI Foundry

MENU: FEATURES | GETTING STARTED | CONFIGURE YOUR ENVIRONMENT | DEPLOYMENT | RESOURCE CLEAN-UP | TRACING AND MONITORING | GUIDANCE

This solution contains a simple chat application that is deployed to Azure Container Apps. There are instructions for deployment through GitHub Codespaces, VS Code Dev Containers, and your local development environment.

Important Security Notice

This template, the application code and configuration it contains, has been built to showcase Microsoft Azure specific services and tools. We strongly advise our customers not to make this code part of their production environments without implementing or enabling additional security features.

For a more comprehensive list of best practices and security recommendations for Intelligent Applications, visit our official documentation.

Features

This solution creates an Azure AI Foundry hub, project and connected resources including Azure AI Services, AI Search and more. More details about the resources can be found in the resources documentation. There are options to enable Retrieval-Augmented Generation (RAG) and use logging, tracing, and monitoring.

Architecture diagram

Architecture diagram showing that user input is provided to the Azure Container App, which contains the app code. With user identity and resource access through managed identity, the input is used to form a response. The input and Azure Monitor are able to use the Azure resources deployed in the solution: Application Insights, Azure AI Project, Azure AI Services, Azure AI Hub, Storage account, Azure Container App, Container Registry, Key Vault, Log Analytics Workspace, and Search Service. The app code runs in Azure Container Apps to process the user input and generate a response to the user. It leverages Azure AI projects and Azure AI services, including the model and search service.

Getting Started

Quick Deploy

Open in GitHub Codespaces Open in Dev Containers

GitHub Codespaces and Dev Containers both allow you to download and deploy the code for development. You can also continue with local development. Once you have selected your environment, follow the instructions below to customize and deploy your solution.

Prerequisites

Azure account

If you do not have an Azure Subscription, you can sign up for a free Azure account and create an Azure Subscription.

To deploy this Azure environment successfully, your Azure account (the account you authenticate with) must have the following permissions and prerequisites on the targeted Azure Subscription:

You can view the permissions for your account and subscription by going to Azure portal, clicking 'Subscriptions' under 'Navigation' and then choosing your subscription from the list. If you try to search for your subscription and it does not come up, make sure no filters are selected. After selecting your subscription, select 'Access control (IAM)' and you can see the roles that are assigned to your account for this subscription. If you want to see more information about the roles, you can go to the 'Role assignments' tab and search by your account name and then click the role you want to view more information about.

Additionally, the following are required for successful deployment:

  • Sufficient quotas available to deploy the selected chat and embedding model.
  • Regional availability: The chosen model must be available in the Azure region where your Azure AI Foundry environment is created. Verify region availability here.

Required tools

Make sure the following tools are installed:

  1. Azure Developer CLI (azd): install it or update to the latest version. Instructions can be found on the linked page.
  2. Python 3.9+
  3. Git
  4. Docker Desktop

Configure your Environment

This section details the customizable options for this solution, including chat model, knowledge retrieval, logging, tracing, and quota recommendations. If you want to proceed with the default settings, continue to the deployment section.

Code

If you are using one of the Quick Deploy options, open the codespace now.

If you are not using any of the Quick Deploy options, download the project code:

git clone https://github.com/Azure-Samples/get-started-with-ai-chat.git

At this point you could make changes to the code if required. However, no changes are needed to deploy and test the app as shown in the next step.

Logging

To enable logging to a file, navigate to src/Dockerfile and edit the code to uncomment the following line:

# ENV APP_LOG_FILE=app.log

By default the file name app.log is used. You can provide your own file name by replacing app.log with the desired log file name.

NOTE! Any changes to the Dockerfile require a re-deployment in order for the changes to take effect.

The provided file logging implementation is intended for development purposes only, specifically for testing with a single client/worker. It should not be used in production environments after the R&D phase.
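As a rough illustration of what the APP_LOG_FILE setting implies, the following minimal sketch attaches a file handler only when the variable is set. The function name `configure_logging` and the logger name are hypothetical, and this is not the template's actual logging code.

```python
import logging
import os
from typing import Mapping, Optional


def configure_logging(env: Optional[Mapping[str, str]] = None) -> logging.Logger:
    """Log to the console, and also to a file when APP_LOG_FILE is set."""
    env = os.environ if env is None else env
    logger = logging.getLogger("chat_app_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # keep repeated calls from stacking handlers
    logger.addHandler(logging.StreamHandler())

    log_file = env.get("APP_LOG_FILE", "").strip()
    if log_file:
        # delay=True opens the file only when the first record is written.
        logger.addHandler(logging.FileHandler(log_file, delay=True))
    return logger
```

Calling `configure_logging()` with no argument reads the real process environment, which is where an uncommented `ENV APP_LOG_FILE=app.log` line in the Dockerfile would end up.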

Tracing to Azure Monitor

To enable tracing to Azure Monitor, navigate to src/Dockerfile and modify the value of ENABLE_AZURE_MONITOR_TRACING environment variable to true:

ENV ENABLE_AZURE_MONITOR_TRACING=true

Note that the optional App Insights resource is required for tracing to Azure Monitor (it is created by default).

To enable message contents to be included in the traces, set the following environment variable to true in the same Dockerfile. Note that the messages may contain personally identifiable information.

ENV AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED=true
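The two variables above act as on/off switches. The sketch below shows one way an application could interpret them. The helper names are hypothetical, and treating only the string `true` (case-insensitive) as enabled is an assumption, not a documented rule of the template.

```python
import os
from typing import Mapping, Optional


def env_flag(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the variable is set to 'true' (assumed convention)."""
    env = os.environ if env is None else env
    return env.get(name, "").strip().lower() == "true"


def tracing_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect both tracing switches; each one defaults to off when unset."""
    return {
        "tracing": env_flag("ENABLE_AZURE_MONITOR_TRACING", env),
        "record_content": env_flag(
            "AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED", env
        ),
    }
```

Keeping message-content recording as a separate switch that defaults to off matches the note above about personally identifiable information.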

Configurable Deployment Settings

When you start a deployment, most parameters will have default values. You can change the following default settings:

| Setting | Description | Default value |
| --- | --- | --- |
| Azure Region | Select a region with quota which supports your selected model. | |
| Model | Choose from the list of models supported by Azure AI Agent Service for your selected region. | gpt-4o-mini |
| Model Format | Choose from OpenAI or Microsoft, depending on your model. | OpenAI |
| Model Deployment Capacity | Configure capacity for your model. Recommended value is 100k. | 30k |
| Embedding Model | Choose from text-embedding-3-large, text-embedding-3-small, and text-embedding-ada-002. | text-embedding-3-small |
| Embedding Model Capacity | Configure capacity for your embedding model. | 30k |

For a detailed description of customizable fields and instructions, view the deployment customization guide.

Quota Recommendations

The default model capacity for deployment is 30k tokens per minute. For optimal performance, increase it to 100k tokens per minute. You can change the capacity by following the steps in setting capacity and deployment SKU. To check the quota available to you:

  • Navigate to the home screen of the Azure AI Foundry Portal.
  • Select the Quota Management button at the bottom of the home screen.
  • In the Quota tab, click the GlobalStandard dropdown and select the model and region you are using for this accelerator to see your available quota. Please note gpt-4o-mini and text-embedding-ada-002 are used as default.
  • Request more quota or delete any unused model deployments as needed.
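The deploying steps in this README set the recommended 100k tokens per minute with `azd env set AZURE_AI_CHAT_DEPLOYMENT_CAPACITY 100`, which suggests the capacity value is expressed in thousands of tokens per minute. The small sketch below makes that conversion explicit. The helper names are hypothetical, and the 1,000-token unit is an assumption inferred from that command.

```python
TOKENS_PER_CAPACITY_UNIT = 1_000  # assumption: one capacity unit = 1k tokens per minute


def capacity_units(tokens_per_minute: int) -> int:
    """Convert a tokens-per-minute target into deployment capacity units."""
    if tokens_per_minute <= 0 or tokens_per_minute % TOKENS_PER_CAPACITY_UNIT:
        raise ValueError("expected a positive multiple of 1,000 tokens per minute")
    return tokens_per_minute // TOKENS_PER_CAPACITY_UNIT


def capacity_command(tokens_per_minute: int) -> str:
    """Build the azd command that sets the chat model capacity."""
    units = capacity_units(tokens_per_minute)
    return f"azd env set AZURE_AI_CHAT_DEPLOYMENT_CAPACITY {units}"


print(capacity_command(100_000))
# prints: azd env set AZURE_AI_CHAT_DEPLOYMENT_CAPACITY 100
```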

Retrieval-Augmented Generation (RAG)

The Retrieval-Augmented Generation (RAG) feature helps improve the responses from your application by combining the power of large language models (LLMs) with extra context retrieved from an external data source. Simply put, when you ask a question, the application first searches through a set of relevant documents (stored as embeddings) and then uses this context to provide a more accurate and relevant response. If no relevant context is found, the application returns the LLM response directly.

This feature is disabled by default. To configure and enable the RAG feature in your application, please refer to the following detailed documentation:

Retrieval-Augmented Generation (RAG) Setup Guide
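The retrieve-then-answer flow described above can be sketched with toy data. This is a simplified illustration only: the two-dimensional vectors, the `min_score` threshold, and all function names are hypothetical, and the template's real retrieval relies on its search service, not on this in-memory comparison.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query_vec: Sequence[float],
    docs: Sequence[Tuple[str, Sequence[float]]],
    min_score: float = 0.5,  # hypothetical relevance threshold
    top_k: int = 2,
) -> List[str]:
    """Return up to top_k document texts whose similarity passes the threshold."""
    scored = [(cosine(query_vec, vec), text) for text, vec in docs]
    hits = sorted((item for item in scored if item[0] >= min_score), reverse=True)
    return [text for _, text in hits[:top_k]]


def build_prompt(question: str, context: Sequence[str]) -> str:
    """Add retrieved context to the prompt, or pass the question through unchanged."""
    if not context:
        return question  # no relevant context: plain LLM request
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using this context:\n{joined}\n\nQuestion: {question}"
```

When `retrieve` returns an empty list, `build_prompt` returns the question unchanged, mirroring the fallback to a direct LLM response.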

Deployment

Deployment Options

Pick from the options below to see step-by-step instructions for: GitHub Codespaces, VS Code Dev Containers, and Local Environment. If you encounter an issue with any of the following options, try a different one.

GitHub Codespaces


You can run this template virtually by using GitHub Codespaces. The button will open a web-based VS Code instance in your browser:

  1. Open the template (this may take several minutes):

    Open in GitHub Codespaces

  2. Open a terminal window

  3. Continue with the deploying steps

VS Code Dev Containers


A related option is VS Code Dev Containers, which will open the project in your local VS Code using the Dev Containers extension:

  1. Start Docker Desktop (install it first if it is not already installed).

  2. Open the project:

    Open in Dev Containers

  3. In the VS Code window that opens, once the project files show up (this may take several minutes), open a terminal window.

  4. Continue with the deploying steps

Local Environment


  1. Confirm that you have the required tools installed from the prerequisites section and the code downloaded from the code section
  2. Open the project folder in your terminal or editor
  3. Continue with the deploying steps
Local Development Server


You can optionally use a local development server to test app changes locally. Make sure you have first deployed the app to Azure by following the deploying steps before running the development server.

  1. Create a Python virtual environment and activate it.

    On Windows:

    python -m venv .venv
    .venv\scripts\activate

    On Linux:

    python3 -m venv .venv
    source .venv/bin/activate
  2. Navigate to the src directory:

    cd src
  3. Install required Python packages:

    python -m pip install -r requirements.txt
  4. Duplicate src/.env.sample and name the copy .env.

  5. Fill in the environment variables in .env.

  6. Tracing and logging:

    To enable logging to a file, add the APP_LOG_FILE environment variable definition to the .env file in the src directory. See Logging for more information. As an example, to log to a file named app.log add the following to the .env file:

    APP_LOG_FILE=app.log
    

    To enable Azure Monitor tracing, add the ENABLE_AZURE_MONITOR_TRACING and AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED environment variable definitions to the .env file in the src directory. See Tracing to Azure Monitor for more information. As an example, to enable tracing to Azure Monitor without tracing message contents, add the following to the '.env' file:

    ENABLE_AZURE_MONITOR_TRACING=true
    AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED=false
    
  7. Run the local server:

    python -m uvicorn "api.main:create_app" --factory --reload
  8. Click 'http://127.0.0.1:8000' in the terminal, which should open a new tab in the browser.

  9. Enter your message in the box.
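Steps 4 to 6 above configure the local server through KEY=VALUE lines in src/.env. The sketch below shows how such a file can be read into a dictionary, which may help when checking that a setting was picked up. It is a simplified parser written for illustration. The function name is hypothetical, and this is not necessarily how the app itself loads the file.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


sample = """
# Tracing and logging
APP_LOG_FILE=app.log
ENABLE_AZURE_MONITOR_TRACING=true
AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED=false
"""
settings = parse_env(sample)
print(settings["APP_LOG_FILE"])
# prints: app.log
```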

Deploying Steps

Once you've opened the project in Codespaces, in Dev Containers, or locally, you can deploy it to Azure by following the steps below.

  1. Log in to Azure:

    azd auth login
  2. (Optional) If you would like to customize the deployment to disable resources, customize resource names, customize the models or increase quota, you can follow those steps now.

    ⚠️ NOTE! For optimal performance, the recommended quota is 100k tokens per minute. If you have the capacity, we recommend increasing the quota by running the following command:

    azd env set AZURE_AI_CHAT_DEPLOYMENT_CAPACITY 100

    ⚠️ If you do not increase your quota, you may encounter rate limit issues. If needed, you can increase the quota after deployment by editing your model in the Models and Endpoints tab of the Azure AI Foundry Portal.

  3. Provision and deploy all the resources by running the following in the get-started-with-ai-chat directory:

    azd up
  4. You will be prompted to provide an azd environment name (like "azureaiapp"), select a subscription from your Azure account, and select a location which has quota for all the resources. Then, it will provision the resources in your account and deploy the latest code.

    • For guidance on selecting a region with quota and model availability, follow the instructions in the quota recommendations section and ensure that your model is available in your selected region by checking the list of models supported by Azure AI Agent Service
    • This deployment will take 7-10 minutes to provision the resources in your account and set up the solution with sample data.
    • If you get an error or timeout with deployment, changing the location can help, as there may be availability constraints for the resources. You can do this by running azd down and deleting the .azure folder from your code, and then running azd up again and selecting a new region.

    NOTE! If you get authorization failed and/or permission related errors during the deployment, please refer to the Azure account requirements in the Prerequisites section. If you were recently granted these permissions, it may take a few minutes for the authorization to apply.

  5. When azd has finished deploying, you'll see an endpoint URI in the command output. Visit that URI, and you should see the app! 🎉

    • From here, you can send messages in the chat. Send a greeting, ask for a joke, or just start a conversation! If you have enabled Retrieval Augmented Generation, try asking about the uploaded data.
    • You can view information about your deployment with:
      azd show
  6. (Optional) Now that your app has deployed, you can view your resources in the Azure Portal and your deployments in Azure AI Foundry.

    • In the Azure Portal, navigate to your environment's resource group. The name will be rg-[your environment name]. Here, you should see your container app, storage account, and all of the other resources that are created in the deployment.
    • In the Azure AI Foundry Portal, select your project. If you navigate to the Models and Endpoints tab, you should see your AI Services connection with your model deployments.
  7. (Optional) If you make further modifications to the app code, you can deploy the updated version with:

    azd deploy

    You can get more detailed output with the --debug parameter.

    azd deploy --debug

    Check for any errors during the deployment, since updated app code will not get deployed if errors occur.

  8. (Optional) You can use a local development server to test app changes locally. To do so, follow the steps in the Local Development Server section after your app is deployed.
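The command sequence from steps 1 to 3 above can also be scripted. The sketch below only assembles and prints the azd commands by default, and runs them when `dry_run=False` is passed. The function names and the dry-run switch are hypothetical conveniences, while the commands themselves are the ones listed in the steps.

```python
import subprocess
from typing import List, Optional


def deployment_commands(capacity: Optional[int] = None) -> List[List[str]]:
    """Return the azd commands for login, optional capacity change, and deploy."""
    commands = [["azd", "auth", "login"]]
    if capacity is not None:
        commands.append(
            ["azd", "env", "set", "AZURE_AI_CHAT_DEPLOYMENT_CAPACITY", str(capacity)]
        )
    commands.append(["azd", "up"])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is True."""
    for command in commands:
        print("$", " ".join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(command, check=True)


run(deployment_commands(capacity=100))
```

Because `azd up` prompts for an environment name, subscription, and location, running the sequence for real still needs an interactive terminal.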

Resource Clean-up

To prevent incurring unnecessary charges, it's important to clean up your Azure resources after completing your work with the application.

  • When to Clean Up:

    • After you have finished testing or demonstrating the application.
    • If the application is no longer needed or you have transitioned to a different project or environment.
    • When you have completed development and are ready to decommission the application.
  • Deleting Resources: To delete all associated resources and shut down the application, execute the following command:

    azd down

    Please note that this process may take up to 20 minutes to complete.

⚠️ Alternatively, you can delete the resource group directly from the Azure Portal to clean up resources.

Tracing and Monitoring

You can view console logs in Azure portal. You can get the link to the resource group with the azd tool:

azd show

Or, if you want to navigate from the Azure portal main page, select your resource group from the 'Recent' list, or click 'Resource groups' and search for your resource group there.

After accessing your resource group in Azure portal, choose your container app from the list of resources. Then open 'Monitoring' and 'Log Stream'.

If you enabled logging to a file, you can view the log file by choosing 'Console' under 'Monitoring' (the same location as above for the console traces), opening the default console, and then running, for example, the following command (replace app.log with the actual name of your log file):

more app.log

You can view the App Insights tracing in Azure AI Foundry. Select your project on the Azure AI Foundry page and then click 'Tracing'.

Guidance

Costs

Pricing varies per region and usage, so it isn't possible to predict exact costs for your usage. The majority of the Azure resources used in this infrastructure are on usage-based pricing tiers. However, Azure Container Registry has a fixed cost per registry per day.

You can try the Azure pricing calculator for the resources:

  • Azure AI Foundry: Free tier. Pricing
  • Azure AI Search: Standard tier, S1. Pricing is based on the number of documents and operations. Pricing
  • Azure Storage Account: Standard tier, LRS. Pricing is based on storage and operations. Pricing
  • Azure Key Vault: Standard tier. Pricing is based on the number of operations. Pricing
  • Azure AI Services: S0 tier, defaults to gpt-4o-mini and text-embedding-ada-002 models. Pricing is based on token count. Pricing
  • Azure Container App: Consumption tier with 0.5 CPU, 1GiB memory/storage. Pricing is based on resource allocation, and each month allows for a certain amount of free usage. Pricing
  • Azure Container Registry: Basic tier. Pricing
  • Log analytics: Pay-as-you-go tier. Costs based on data ingested. Pricing

⚠️ To avoid unnecessary costs, remember to take down your app if it's no longer in use, either by deleting the resource group in the Portal or running azd down.

Security guidelines

This template uses Azure AI Foundry connections to communicate between resources; these connections store keys in Azure Key Vault. This template also uses Managed Identity for local development and deployment.

To ensure continued best practices in your own repository, we recommend that anyone creating solutions based on our templates ensure that the GitHub secret scanning setting is enabled.

You may want to consider additional security measures, such as:

Resources

This template creates everything you need to get started with Azure AI Foundry:

The template also includes dependent resources required by all AI Hub resources: