Amazon Web Services (AWS)

Llama 3 - Developer Guide

Discover how to install and configure Llama 3 on AWS with our comprehensive developer guide. From initial setup to advanced configurations, this guide equips you with everything you need to successfully deploy Llama 3 and maximize its potential for your applications.

Hiruna Kumara

Jun 3, 2024 • 9 min read

Welcome to the Llama 3 Developer Guide for AWS integration! Experience the cutting-edge performance of Llama 3, boasting enhanced scalability and refined post-training processes. Elevate your AI projects with its advanced capabilities in language understanding, translation, dialogue generation, reasoning, code generation, and more. Let's dive in and unlock the full potential of Llama 3 within your AWS environment.

Prerequisites

Before you get started with the Llama 3 AMI, ensure you have the following prerequisites:

Basic knowledge of AWS services, including EC2 instances and CloudFormation.
An active AWS account with appropriate permissions.
A vCPU quota high enough to launch g4dn instances.
(See https://meetrix.io/articles/how-to-increase-aws-quota/ for how to check and increase it.)

Launching the AMI

Step 1: Find and Select 'Llama 3' AMI

Log in to your AWS Management Console.
Follow the provided links to access the 'Llama 3' product you wish to set up.
a. LLaMa 3 Meta AI 8B: OpenAI API Compatible AMI
b. LLaMa 3 Meta AI 70B: OpenAI API Compatible AMI

Step 2: Initial Setup & Configuration

Click the "Continue to Subscribe" button.
After subscribing, you will need to accept the terms and conditions. Click on "Accept Terms" to proceed.
Please wait for a few minutes while the processing takes place. Once it's completed, click on "Continue to Configuration".
Select the "CloudFormation Template for Llama 3 deployment" as the fulfilment option and choose your preferred region on the "Configure this software" page. Afterward, click the "Continue to Launch" button.
From the "Choose Action" dropdown menu on the "Launch this software" page, select "Launch CloudFormation" and click the "Launch" button.

Create CloudFormation Stack

Step 1: Create stack

1. Ensure the "Template is ready" radio button is selected under "Prepare template".

2. Click "Next".

Step 2: Specify stack options

Provide a unique "Stack name".
Provide the "Admin Email" for SSL generation.
For "DeploymentName", enter a name of your choice.
Provide a public domain name for "DomainName". (Llama 3 automatically tries to set up SSL for this domain if it is hosted on Route 53, so make sure your domain is hosted there. If the automatic setup fails, you have to set up SSL manually.)
Choose an instance type, "InstanceType" (Recommended: g4dn.xlarge).
Select your preferred "keyName".
Set "SSHLocation" as "0.0.0.0/0".
Keep "SubnetCidrBlock" as "10.0.0.0/24".
Keep "VpcCidrBlock" as "10.0.0.0/16".
Click "Next".
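If you change the default network values, the "SubnetCidrBlock" range has to stay inside the "VpcCidrBlock" range, as the defaults 10.0.0.0/24 and 10.0.0.0/16 do. The following is a minimal sketch of that check in JavaScript. The helper names ipToInt and cidrContains are hypothetical and are not part of the Llama 3 AMI or its CloudFormation template.

```javascript
// Hypothetical helpers for illustration only (IPv4, no input validation).

// Convert a dotted IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// Return true when the `inner` CIDR block lies entirely within the `outer` one.
function cidrContains(outer, inner) {
  const [outerIp, outerBits] = outer.split('/');
  const [innerIp, innerBits] = inner.split('/');
  const outerPrefix = Number(outerBits);
  const innerPrefix = Number(innerBits);

  // A shorter prefix means a larger range, which cannot fit inside the outer block.
  if (innerPrefix < outerPrefix) return false;

  // Network mask of the outer block; a /0 prefix matches every address.
  const mask = outerPrefix === 0 ? 0 : (0xffffffff << (32 - outerPrefix)) >>> 0;
  return ((ipToInt(outerIp) & mask) >>> 0) === ((ipToInt(innerIp) & mask) >>> 0);
}

console.log(cidrContains('10.0.0.0/16', '10.0.0.0/24')); // true (the guide's defaults)
console.log(cidrContains('10.0.0.0/16', '10.1.0.0/24')); // false
```

Running a check like this before submitting the stack can save a failed deployment and the 5-10 minute wait that goes with it.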

Step 3: Configure stack options

Choose "Roll back all stack resources" and "Delete all newly created resources" under the "Stack failure options" section.
Click "Next".

Step 4: Review

1. Review and verify the details you've entered.

2. Tick the box that says, "I acknowledge that AWS CloudFormation might create IAM resources with custom names".

3. Click "Submit".

Afterward, you'll be directed to the CloudFormation stacks page.

Please wait for 5-10 minutes until the stack has been successfully created.

Update DNS

Step 1: Copy IP Address

Copy the public IP labeled "PublicIp" in the "Outputs" tab.

Step 2: Update DNS

1. Go to AWS Route 53 and navigate to "Hosted Zones".
2. From there, click "Create record".

3. Add the record name and then paste the copied "PublicIp" into the "Value" textbox.

4. Click "Save".
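After saving the record, it can take a while before the domain resolves to the new address. The sketch below shows one way to confirm that the domain points at the copied "PublicIp". The function name domainPointsTo and the example domain and IP in the usage comment are hypothetical. The resolver is passed in as a parameter so the check can be tried without network access; in practice it would be the resolve4 function from Node.js's built-in dns module.

```javascript
// Hypothetical helper: resolves `domain` and checks whether `expectedIp`
// is among its IPv4 (A record) addresses.
// `resolve4` is injected; pass require('dns').promises.resolve4 for real lookups.
async function domainPointsTo(domain, expectedIp, resolve4) {
  try {
    const addresses = await resolve4(domain);
    return addresses.includes(expectedIp);
  } catch (err) {
    // Lookup failures (for example ENOTFOUND) are treated as "not pointing there yet".
    return false;
  }
}

// Usage (placeholder values, replace with your own domain and "PublicIp"):
// const { resolve4 } = require('dns').promises;
// domainPointsTo('llama.example.com', '203.0.113.10', resolve4).then(console.log);
```

If the function keeps returning false, re-check the record name and value in the hosted zone before moving on to the SSL steps.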

Access Llama 3

You can access the Llama 3 application through the "DashboardUrl" or "DashboardUrlIp" provided in the "Outputs" tab.

(If you encounter a "502 Bad Gateway" error, please wait for about 5 minutes before refreshing the page.)
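Instead of refreshing by hand, the wait can be scripted. The following is a rough sketch that polls a URL until it stops answering with status 502. The function name waitUntilReady and its default values (10 attempts, 30 seconds apart, roughly the 5 minutes mentioned above) are assumptions for illustration. It relies on the global fetch available in Node.js 18 and later unless a different fetch function is passed in.

```javascript
// Hypothetical helper: polls `url` until the response status is no longer 502.
// Returns the first non-502 status, or throws after `attempts` tries.
async function waitUntilReady(url, { attempts = 10, delayMs = 30000, fetchFn = fetch } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchFn(url);
      if (res.status !== 502) return res.status; // the backend is answering
    } catch (err) {
      // Connection errors are treated the same as "not ready yet".
    }
    // Pause between attempts, but not after the last one.
    if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`${url} still not ready after ${attempts} attempts`);
}

// Usage (replace the placeholder with your own "DashboardUrl"):
// waitUntilReady('https://<your domain name>/').then((status) => console.log('Ready, status', status));
```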

Generate SSL Manually

Llama 3 automatically tries to set up SSL for the provided domain name if that domain is hosted on Route 53. If this fails, you have to set up SSL manually.

Step 1: Copy IP Address

1. Follow the instructions in the "Update DNS" section above, if you have not already done so.

2. Copy the public IP address labeled "PublicIp" in the "Outputs" tab.

Step 2: Log in to the server

1. Open the terminal and go to the directory where your private key is located.
2. Paste the following command into your terminal and press Enter:
ssh -i <your key name> ubuntu@<Public IP address>

3. Type "yes" and press Enter. This will log you into the server.

Step 3: Generate SSL

Paste the following command into your terminal, press Enter, and follow the instructions:

sudo /root/certificate_generate_standalone.sh

The admin email is required to generate the SSL certificates.

Shutting Down Llama 3

1. Click the link labeled "Llama 3" in the "Resources" tab. You will be directed to the Llama 3 instance in EC2.

2. Select the Llama 3 instance by marking the checkbox and click "Stop instance" from the "Instance state" dropdown. You can restart the instance at your convenience by selecting "Start instance".

Remove Llama 3

Delete the stack that has been created in the AWS Management Console under 'CloudFormation Stacks' by clicking the 'Delete' button.

API Documentation

1. Retrieve Completions

Retrieves completions based on the provided prompt.

Endpoint: /v1/completions
Method: POST
Request Body:

{
  "model": "llama3-8b",
  "prompt": "\n\n### Instructions:\nWhat is the capital of France?\n\n### Response:\n",
  "stop": [
    "\n",
    "###"
  ]
}

Response Body:

{
  "id": "cmpl-498760e1-2b50-47c3-95fb-98c8bff8b10a",
  "object": "text_completion",
  "created": 1717503179,
  "model": "llama3-8b",
  "choices": [
    {
      "text": "The correct answer is **Paris.**",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 8,
    "total_tokens": 24
  }
}
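The request and response shapes above can be wrapped in two small helpers. This is a minimal sketch, assuming the "### Instructions / ### Response" prompt layout and the stop sequences from the sample request. The function names buildCompletionRequest and firstCompletionText are hypothetical, and the commented usage assumes Node.js 18+ for the global fetch.

```javascript
// Build a /v1/completions request body using the prompt layout from the sample above.
function buildCompletionRequest(question, model = 'llama3-8b') {
  return {
    model,
    prompt: `\n\n### Instructions:\n${question}\n\n### Response:\n`,
    stop: ['\n', '###'],
  };
}

// Pull the generated text out of a /v1/completions response body.
// Returns null when the response contains no choices.
function firstCompletionText(responseBody) {
  const choice = responseBody.choices && responseBody.choices[0];
  return choice ? choice.text.trim() : null;
}

// Usage (replace the placeholder with your own domain):
// fetch('https://<your domain name>/v1/completions', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(buildCompletionRequest('What is the capital of France?')),
// })
//   .then((res) => res.json())
//   .then((body) => console.log(firstCompletionText(body)));
```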

2. Retrieve Embeddings

Retrieves embeddings based on the provided input text.

Endpoint: /v1/embeddings
Method: POST
Request Body:

{
  "input": "The food was delicious and the waiter...",
  "model": "llama3-8b"
}

Response Body:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        -0.37700873613357544,
        1.3124240636825562,
        4.191315650939941,
        ...
      ],
      "index": 0
    }
  ],
  "model": "llama3-8b",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
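A common use of the returned vectors is comparing two texts by the cosine similarity of their embeddings. The following is a generic sketch of that calculation; it is not something the API does for you, and the function name cosineSimilarity is only an illustrative choice.

```javascript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value between -1 and 1 (NaN if either vector is all zeros).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Embeddings must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// With real responses, pass data[0].embedding from two /v1/embeddings calls.
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```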

3. Retrieve Chat Completions

Retrieves chat completions based on the provided chat messages.

Endpoint: /v1/chat/completions
Method: POST
Request Body:

{
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "What is the capital of France?",
      "role": "user"
    }
  ],
  "model": "llama3-8b",
  "stop": [
    "\n",
    "###"
  ]
}

Response Body:

{
  "id": "chatcmpl-8c5130ab-eca9-4760-8171-f7a3aee9b9ba",
  "object": "chat.completion",
  "created": 1717503343,
  "model": "llama3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "The capital city of France is Paris.<|im_end|>",
        "role": "assistant"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 13,
    "total_tokens": 63
  }
}
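In the sample response above, the assistant message ends with an "<|im_end|>" marker. Whether the server always appends it is not stated here, so the sketch below simply removes the marker when it is present. The function name assistantText is hypothetical.

```javascript
// Extract the assistant's reply from a /v1/chat/completions response body,
// dropping a trailing "<|im_end|>" marker if the server included one.
function assistantText(responseBody) {
  const content = responseBody.choices[0].message.content;
  return content.replace(/<\|im_end\|>\s*$/, '').trim();
}

// Example with the message from the sample response above.
const sample = {
  choices: [
    { index: 0, message: { content: 'The capital city of France is Paris.<|im_end|>', role: 'assistant' } },
  ],
};
console.log(assistantText(sample)); // The capital city of France is Paris.
```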

4. List Models

Retrieves a list of available models.

Endpoint: /v1/models
Method: GET
Response Body:

{
  "object": "list",
  "data": [
    {
      "id": "llama3-8b",
      "object": "model",
      "owned_by": "me",
      "permissions": []
    },
    {
      "id": "llama3-8b-instruct",
      "object": "model",
      "owned_by": "me",
      "permissions": []
    },
    {
      "id": "llama-guard-2-8b",
      "object": "model",
      "owned_by": "me",
      "permissions": []
    }
  ]
}

5. Use different Models

To change the model:
Run "List Models".
Select the preferred model and copy its "id" from the response.
Replace the "model" value in the request body of your preferred endpoint.

Note that after changing the model, the endpoint takes somewhat longer to respond.
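The steps for switching models can be sketched as follows: check that the wanted "id" appears in the "List Models" response, then copy it into a request body. The helper names resolveModelId and withModel are hypothetical; the model ids come from the sample "List Models" response above.

```javascript
// Confirm that `preferredId` is one of the ids in a /v1/models response body.
function resolveModelId(modelsResponse, preferredId) {
  const ids = modelsResponse.data.map((model) => model.id);
  if (!ids.includes(preferredId)) {
    throw new Error(`Unknown model "${preferredId}". Available: ${ids.join(', ')}`);
  }
  return preferredId;
}

// Return a copy of a request body with its "model" field replaced.
function withModel(requestBody, modelId) {
  return { ...requestBody, model: modelId };
}

// Example using ids from the sample "List Models" response.
const models = {
  object: 'list',
  data: [{ id: 'llama3-8b' }, { id: 'llama3-8b-instruct' }, { id: 'llama-guard-2-8b' }],
};
const body = withModel(
  { model: 'llama3-8b', prompt: 'Hello' },
  resolveModelId(models, 'llama3-8b-instruct')
);
console.log(body.model); // llama3-8b-instruct
```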

Testing the API

1. Create a directory.
2. Create 3 files (the full code is given below):
app.js
package.json
.env
3. Run the following command:
npm install
4. Edit the variable file (.env).
5. Run the following command:
npm start
6. You will get the responses.

const axios = require('axios');
require('dotenv').config();

const makePostRequest = async (url, data, timeout) => {
  try {
    const response = await axios.post(url, data, { timeout });
    return { success: response.status === 200, data: response.data };
  } catch (error) {
    return { success: false, error: error.message };
  }
};

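// The second parameter is unused; it keeps the argument order aligned with
// makePostRequest so that checkEndpoints can pass the timeout in the same position.
const makeGetRequest = async (url, _data, timeout) => {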
  try {
    const response = await axios.get(url, { timeout });
    return { success: response.status === 200, data: response.data };
  } catch (error) {
    return { success: false, error: error.message };
  }
};

const printResponseData = (endpoint, data) => {
  console.log(`Response for ${endpoint}:`);
  console.log(JSON.stringify(data, null, 2));
  console.log('');
};

const checkEndpoints = async () => {
  const baseUrl = process.env.BASE_URL;
  const model = process.env.MODEL;

  const endpoints = [
    { path: '/completions', method: makePostRequest, data: { "model": model, "prompt": process.env.PROMPT1 }, printEnv: 'PRINT_COMPLETIONS_RESPONSE' },
    { path: '/embeddings', method: makePostRequest, data: { "input": process.env.PROMPT2, "model": model }, printEnv: 'PRINT_EMBEDDINGS_RESPONSE' },
    { path: '/chat/completions', method: makePostRequest, data: { "messages": [{ "content": "You are a helpful assistant.", "role": "system" }, { "content": process.env.PROMPT1, "role": "user" }], "model": model }, printEnv: 'PRINT_CHAT_COMPLETIONS_RESPONSE' },
    { path: '/models', method: makeGetRequest, printEnv: 'PRINT_MODELS_RESPONSE' }
  ];

  for (const endpoint of endpoints) {
    const url = `${baseUrl}${endpoint.path}`;
    const { success, data, error } = await endpoint.method(url, endpoint.method === makePostRequest ? endpoint.data : null, Number(process.env.REQUEST_TIMEOUT) || 50000);
    const printResponse = process.env[endpoint.printEnv] === 'true';

    if (success) {
      console.log(`*** Endpoint ${endpoint.path} is reachable.`);
      if (printResponse) {
        printResponseData(endpoint.path, data);
      }
      console.log('');
    } else {
      console.log(`*** Endpoint ${endpoint.path} is not reachable. Error:`, error);
    }
  }
};

checkEndpoints();

app.js

{
  "name": "test-llama",
  "version": "1.0.0",
  "description": "",
  "main": "app.js",
  "scripts": {
    "start": "node app.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "axios": "^1.6.7",
    "dotenv": "^16.4.1"
  }
}

package.json

# Base URL for the API (your deployment's domain followed by /v1)
BASE_URL=https://<your domain name>/v1

# Model to be used in requests
MODEL=llama3-8b

# Prompts for different endpoints
# /completions and /chat/completions
PROMPT1=What is the capital of France?
# /embeddings
PROMPT2=The food was delicious and the waiter...

# Whether to print responses for each endpoint
PRINT_COMPLETIONS_RESPONSE=true
PRINT_EMBEDDINGS_RESPONSE=false
PRINT_CHAT_COMPLETIONS_RESPONSE=true
PRINT_MODELS_RESPONSE=true

# Timeout for requests in milliseconds (default is 50000)
REQUEST_TIMEOUT=50000

.env

Check Server Logs

Step 1: Log in to the server

1. Open the terminal and go to the directory where your private key is located.
2. Paste the following command into your terminal and press Enter:
ssh -i <your key name> ubuntu@<Public IP address>

3. Type "yes" and press Enter. This will log you into the server.

Step 2: Check the logs

sudo tail -f /var/log/syslog

Upgrades

When there is an upgrade, we will update the product with a newer version. You can check the product version in AWS Marketplace. If a newer version is available, you can remove the previous version and launch the product again using the newer version. Remember to back up the necessary server data before removing the previous version.

Troubleshoot

1. If you face a vCPU limit error while creating the stack, please follow the https://meetrix.io/articles/how-to-increase-aws-quota/ blog to increase the vCPU quota.

2. If you face a "do not have sufficient <instance_type> capacity..." error while creating the stack, try changing the region or try creating the stack at a later time.

3. If you face an error (such as "502 Bad Gateway") when you try to access the API dashboard, please wait 5-10 minutes and then try again.

4. If the llama service gets stuck, follow the steps below:

Log into the server (Find the steps in Check Server Logs section)
Run the below command.

sudo systemctl restart llama.service

Wait for several minutes and reload the dashboard URL.

5. Check whether the instance storage is full.

Log into the server and run the below command

df -h

If the root volume is 90-100% full, it is better to resize the EBS volume. Please follow the AWS documentation to increase the EBS volume size.
Then reboot the instance and restart the llama service.
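The 90% check can also be done on the text that df -h prints. The sketch below parses that output and returns the usage of the volume mounted on "/". The function name rootUsagePercent is hypothetical, and the sample output is made up for illustration; it is not taken from a real Llama 3 instance.

```javascript
// Parse `df -h` output and return the Use% of the volume mounted on "/",
// or null if no such line is found.
function rootUsagePercent(dfOutput) {
  const lines = dfOutput.trim().split('\n').slice(1); // skip the header row
  for (const line of lines) {
    // Columns: Filesystem, Size, Used, Avail, Use%, Mounted on
    const cols = line.trim().split(/\s+/);
    if (cols[5] === '/') return parseInt(cols[4], 10);
  }
  return null;
}

// Illustrative sample only (not real output from the AMI).
const sampleDf = [
  'Filesystem      Size  Used Avail Use% Mounted on',
  '/dev/root       124G  115G  9.0G  93% /',
  'tmpfs           7.8G     0  7.8G   0% /dev/shm',
].join('\n');

const usage = rootUsagePercent(sampleDf);
console.log(usage);       // 93
console.log(usage >= 90); // true, so resizing the EBS volume would be advisable
```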

Conclusion

In conclusion, the Llama 3 Developer Guide equips you with everything you need for a seamless integration of Llama 3 into your AWS environment. Whether you're a novice or an experienced developer, our guide offers detailed, step-by-step instructions to ensure a smooth setup process. Llama 3 represents a leap forward in AI sophistication, seamlessly integrating with AWS to offer unparalleled power and simplicity. From language understanding to code generation, Llama 3 empowers you to explore the frontiers of artificial intelligence effortlessly.

Technical Support

Reach out to Meetrix Support (support@meetrix.io) for assistance with Llama 3 issues.