Llama - Developer Guide
Welcome to the Meetrix Llama Developer Guide! This guide is designed to assist you in seamlessly integrating Llama into your AWS environment. Whether you're new to AWS or an experienced developer, you'll discover step-by-step instructions, configuration details, and troubleshooting tips to ensure a smooth experience.
Brief Overview of Llama
Llama 2 is a pre-trained generative text model from Meta. These Amazon Machine Images are fully optimized for developers eager to harness the power of advanced generative text capabilities. API-integrated and OpenAI-compatible, the Llama 2 AMIs promise seamless deployment and unparalleled efficiency. Dive into the future of generative text operations with this cutting-edge integration tool.
Video
Blog
Prerequisites
Before you get started with the Llama AMIs, ensure you have the following prerequisites:
- Basic knowledge of AWS services, including EC2 instances and CloudFormation.
- An active AWS account with appropriate permissions.
- A sufficient vCPU quota to create g4dn-type instances
(Follow the https://meetrix.io/articles/how-to-increase-aws-quota/ blog post to verify and, if necessary, increase your quota)
Launching the AMI
Step 1: Find and Select 'Llama' AMI
- Log in to your AWS Management Console.
- Follow the provided links to access the Llama product you wish to set up.
Step 2: Initial Setup & Configuration
- Click the "Continue to Subscribe" button.
- After subscribing, you will need to accept the terms and conditions. Click on "Accept Terms" to proceed.
- Please wait for a few minutes while the processing takes place. Once it's completed, click on "Continue to Configuration".
- Select your preferred region in "Configure this software" page and click "Continue to Launch" button.
- From the "Choose Action" dropdown menu in "Launch this software" page, select "Launch CloudFormation" and click "Launch" button.
Create CloudFormation Stack
Step 1: Create stack
- Ensure the "Template is ready" radio button is selected under "Prepare template".
- Click "Next".
Step 2: Specify stack details
- Provide a unique "Stack name".
- Provide the "Admin Email" for SSL generation.
- For "DeploymentName", enter a name of your choice.
- Select your preferred "keyName".
- Provide a public domain name for "LlamaDomainName". (Llama will automatically try to set up SSL based on the provided domain name if that domain is hosted on Route 53, so please make sure your domain name is hosted on Route 53. If the automatic setup is unsuccessful, you will have to set up SSL manually.)
- Choose an instance type, "LlamaInstanceType" (Recommended: g4dn.xlarge for Llama 7B and 13B, g4dn.12xlarge for Llama 70B).
- Set "SSHLocation" as "0.0.0.0/0".
- Keep "SubnetCidrBlock" as "10.0.0.0/24".
- Keep "VpcCidrBlock" as "10.0.0.0/16".
- Click "Next".
Step 3: Configure stack options
- Under "Stack failure options", select "Roll back all stack resources".
- Click "Next".
Step 4: Review
- Review and verify the details you've entered.
- Tick the box that says, "I acknowledge that AWS CloudFormation might create IAM resources with custom names".
- Click "Submit".
Afterward, you'll be directed to the CloudFormation stacks page.
Please wait for 5-10 minutes until the stack has been successfully created. Afterward, you can click the "Refresh" button under the "Stacks" section.
Update DNS
Step 1: Copy IP Address
- Copy the public IP labeled "PublicIp" in the "Outputs" tab.
Step 2: Update DNS
- Go to AWS Route 53 and navigate to "Hosted Zones".
- From there, select the domain you provided for "LlamaDomainName".
- Click "Edit record" in the "Record details" panel, then paste the copied "PublicIp" into the "Value" textbox.
- Click "Save".
Access Llama
You can access the Llama application through the "DashboardUrl" provided in the "Outputs" tab.
(If you encounter a "502 Bad Gateway" error, please wait about 5 minutes before refreshing the page.)
Generate SSL Manually
Llama will automatically try to set up SSL based on the provided domain name if that domain is hosted on Route 53. If the automatic setup is unsuccessful, you will have to set up SSL manually.
Step 1: Copy IP Address
- Proceed with the instructions outlined in the "Update DNS" section above, if you have not already done so.
- Copy the public IP address indicated as "PublicIp" in the "Outputs" tab.
Step 2: Log in to the server
- Open the terminal and go to the directory where your private key is located.
- Paste the following command into your terminal and press Enter:
ssh -i <your key name> ubuntu@<Public IP address>
- Type "yes" and press Enter. This will log you into the server.
Step 3: Generate SSL
Paste the following command into your terminal, press Enter, and follow the instructions:
sudo /root/certificate_generate_standalone.sh
The "Admin Email" you provided is required for generating the SSL certificates.
Shutting Down Llama
- Click the link labeled "Llama" in the "Resources" tab; you will be directed to the Llama instance in EC2.
- Select the Llama instance by marking the checkbox and click "Stop instance" from the "Instance state" dropdown. You can restart the instance at your convenience by selecting "Start instance".
Remove Llama
Delete the stack that has been created in the AWS Management Console under 'CloudFormation Stacks' by clicking the 'Delete' button.
API Documentation
1. Retrieve Completions from CoPilot Codex Engine
Retrieves completions from the CoPilot Codex Engine based on the provided prompt.
- Endpoint: /v1/engines/copilot-codex/completions
- Method: POST
- Request Body:
{
"prompt": "\n\n### Instructions:\nWhat is the capital of France \n\n### Response:\n",
"stop": ["\n", "###"]
}
- Response Body:
{
"id": "cmpl-91cb1316-fff4-450d-8c01-ad98737afb9d",
"object": "text_completion",
"created": 1697623333,
"model": "/root/models/llama-2-13b-chat.Q5_K_M.gguf",
"choices": [
{
"text": "The capital of France is Paris.",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 7,
"total_tokens": 32
}
}
2. Retrieve Completions
Retrieves completions based on the provided prompt.
- Endpoint: /v1/completions
- Method: POST
- Request Body:
{
"prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n",
"stop": ["\n", "###"]
}
- Response Body:
{
"id": "cmpl-b185009b-cc7a-4ecf-ba0c-e953ffbc6bb5",
"object": "text_completion",
"created": 1697625396,
"model": "/root/models/llama-2-13b-chat.Q5_K_M.gguf",
"choices": [
{
"text": "The capital of France is Paris.",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 7,
"total_tokens": 32
}
}
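As a quick illustration, the completions endpoint above can be called from Node.js (version 18 or later, for the built-in fetch API). This is a minimal sketch, not part of the product itself: the LLAMA_BASE_URL variable and the placeholder domain are assumptions, so substitute the "DashboardUrl" value from your stack's "Outputs" tab.

```javascript
// Minimal sketch: call the /v1/completions endpoint from Node.js 18+.
// LLAMA_BASE_URL and the placeholder domain below are assumptions --
// replace them with the "DashboardUrl" from your stack's "Outputs" tab.
const BASE_URL = process.env.LLAMA_BASE_URL || "https://your-llama-domain.example.com";

// Build a request body in the shape shown above.
function buildCompletionRequest(question) {
  return {
    prompt: `\n\n### Instructions:\n${question}\n\n### Response:\n`,
    stop: ["\n", "###"],
  };
}

// POST the request and return the generated text from the first choice.
async function getCompletion(question) {
  const res = await fetch(`${BASE_URL}/v1/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildCompletionRequest(question)),
  });
  const data = await res.json();
  return data.choices[0].text;
}
```

For example, `getCompletion("What is the capital of France?")` should resolve to a short answer, per the sample response above.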
3. Retrieve Embeddings
Retrieves embeddings based on the provided input text.
- Endpoint: /v1/embeddings
- Method: POST
- Request Body:
{
"input": "The food was delicious and the waiter..."
}
- Response Body:
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [
-0.07521496713161469,
0.44098934531211853,
0.6786724328994751,
...
],
"index": 0
}
],
"model": "/root/models/llama-2-13b-chat.Q5_K_M.gguf",
"usage": {
"prompt_tokens": 11,
"total_tokens": 11
}
}
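A common use of the embeddings endpoint is comparing two texts by the cosine similarity of their vectors. The sketch below shows this pattern under the same assumptions as before (hypothetical LLAMA_BASE_URL variable, placeholder domain, Node.js 18+):

```javascript
// Minimal sketch: fetch embeddings and compare two texts with cosine
// similarity. LLAMA_BASE_URL and the placeholder domain are assumptions.
const BASE_URL = process.env.LLAMA_BASE_URL || "https://your-llama-domain.example.com";

// POST the input text to /v1/embeddings and return the embedding vector.
async function embed(text) {
  const res = await fetch(`${BASE_URL}/v1/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input: text }),
  });
  const data = await res.json();
  return data.data[0].embedding;
}

// Cosine similarity between two equal-length vectors: 1 means the
// directions match exactly, 0 means they are orthogonal.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Embedding two sentences and passing the vectors to `cosineSimilarity` gives a rough measure of how semantically close they are.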
4. Retrieve Chat Completions
Retrieves chat completions based on the provided chat messages.
- Endpoint: /v1/chat/completions
- Method: POST
- Request Body:
{
"messages": [
{
"content": "You are a helpful assistant.",
"role": "system"
},
{
"content": "What is the capital of France?",
"role": "user"
}
]
}
- Response Body:
{
"id": "chatcmpl-68a41146-ddc6-43e4-b12e-760be6e59e57",
"object": "chat.completion",
"created": 1697625763,
"model": "/root/models/llama-2-13b-chat.Q5_K_M.gguf",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": " Bonjour! 🇫🇷 The capital of France is Paris (pronounced \"pah-ree\"). 💖 It's a beautiful city known for its art, fashion, cuisine, and rich history. 🍽️🏰❤️ Do you have any other questions about France or Paris? 😊"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 37,
"completion_tokens": 85,
"total_tokens": 122
}
}
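The chat endpoint takes a list of messages rather than a single prompt string. The following sketch builds the message list in the shape shown above and posts it (same assumptions as the earlier examples: hypothetical LLAMA_BASE_URL variable, placeholder domain, Node.js 18+):

```javascript
// Minimal sketch: call /v1/chat/completions from Node.js 18+.
// LLAMA_BASE_URL and the placeholder domain are assumptions.
const BASE_URL = process.env.LLAMA_BASE_URL || "https://your-llama-domain.example.com";

// Build a message list in the shape shown above: a system prompt
// followed by the user's question.
function buildChatRequest(userMessage) {
  return {
    messages: [
      { content: "You are a helpful assistant.", role: "system" },
      { content: userMessage, role: "user" },
    ],
  };
}

// POST the messages and return the assistant's reply text.
async function chat(userMessage) {
  const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(userMessage)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```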
5. List Models
Retrieves a list of available models.
- Endpoint: /v1/models
- Method: GET
- Response Body:
{
"object": "list",
"data": [
{
"id": "/root/models/llama-2-13b-chat.Q5_K_M.gguf",
"object": "model",
"owned_by": "me",
"permissions": []
}
]
}
Testing the API
- Create a directory.
- Create 3 files (full code is given below):
  - app.js
  - package.json
  - .env
- Run the following command:
npm install
- Edit the variable file (.env).
- Run the following command:
npm start
- You will get the responses.
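As a rough sketch of what app.js might look like (the full files referenced above are the authoritative version, and the variable names here are assumptions), the snippet below lists the server's models. It requires Node.js 18+ and assumes the .env values are loaded into the environment, e.g. via the dotenv package:

```javascript
// Rough sketch of an app.js (hypothetical variable names; the full files
// referenced in the guide are authoritative). Assumes the .env values are
// loaded into the environment, e.g. via the dotenv package.
const API_URL = process.env.API_URL || "https://your-llama-domain.example.com";

// List the model IDs the server exposes (GET /v1/models).
async function listModels() {
  const res = await fetch(`${API_URL}/v1/models`);
  const data = await res.json();
  return data.data.map((model) => model.id);
}

// Only contact the server when API_URL is actually configured.
if (process.env.API_URL) {
  listModels().then(console.log).catch(console.error);
}
```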
Check Server Logs
Step 1: Log in to the server
- Open the terminal and go to the directory where your private key is located.
- Paste the following command into your terminal and press Enter:
ssh -i <your key name> ubuntu@<Public IP address>
- Type "yes" and press Enter. This will log you into the server.
Step 2: Check the logs
Paste the following command into your terminal and press Enter:
sudo tail -f /var/log/syslog
Upgrades
When there is an upgrade, we will update the product with a newer version. You can check the product version in AWS Marketplace. If a newer version is available, you can remove the previous version and launch the product again using the newer version. Remember to backup the necessary server data before removing.
Troubleshoot
- If you face a vCPU limit error, please follow the https://meetrix.io/articles/how-to-increase-aws-quota/ blog post to increase your vCPU quota.
- If you face an insufficient capacity error ("do not have sufficient <instance_type> capacity...") while creating the stack, try changing the region or try creating the stack again at a later time.
- If you face an error when you try to access the API dashboard, please wait 5-10 minutes and then try again.
Conclusion
The Meetrix Llama Developer Guide is a vital resource for smoothly incorporating Llama into your AWS setup. This guide offers step-by-step instructions, configuration details, and troubleshooting suggestions for a flawless experience, regardless of your level of experience with AWS. With Llama 2's pre-trained generative text model, optimized Amazon Machine Images, and an OpenAI-compatible API, developers can confidently take on the future of generative text operations, with unmatched efficiency and a streamlined deployment procedure.
Technical Support
Reach out to Meetrix Support (support@meetrix.io) for assistance with Llama issues.