HomeBlogBlog Detail

Anyscale Endpoints: Embedding endpoint, Llama-2 70B fine-tuning and improved sign-up experience

By The Anyscale Team   

Update June 2024: Anyscale Endpoints (Anyscale's LLM API Offering) and Private Endpoints (self-hosted LLMs) are now available as part of the Anyscale Platform. Click here to get started on the Anyscale platform.

LinkEmbedding Endpoints

Retrieval-augmented generation, or RAG applications are among the most popular applications built with LLMs. Embedding endpoints enables developers to use open-source embedding models. Today, we are starting with gte-large, and developers can access it at $0.05/MTokens. We plan to add more models in the future, and users can request newer embedding models by filling out this google form. For more info visit here.

Example usage:

1import openai
2client = openai.OpenAI(
3    base_url = "https://api.endpoints.anyscale.com/v1",
4    api_key = "esecret_YOUR_API_KEY"
5)
6embedding = client.embeddings.create(
7    model="thenlper/gte-large",
8    input="Your text string goes here",
9)
10print(embedding.model_dump())
11

The output:

1{
2    'data': [
3        {'embedding': [...],
4         'index': 0,
5         'object': 'embedding'
6         }
7     ],
8     'model': 'thenlper/gte-large'
9   ...
10}

LinkLlama-2 70B fine tuning

Fine tuning is a popular technique to allow for model personalization and optimization, making it possible to improve model quality for specific uses, while also reducing costs and improving performance.

We have seen good traction on Llama-2 7B and 13B fine-tuning API. Today we are extending the fine-tuning functionality to the Llama-2 70B model. Llama-2 70B is the largest model in the Llama 2 series of models, and starting today, you can fine-tune it on Anyscale Endpoints with a $5 fixed cost per job run and $4/M tokens of data. You can start inference on the fine-tuned model at $1/M tokens. For more info visit here.

LinkFine-tuning Pricing

Model

Fixed Cost/Run

Price ($/M tokens)

Llama-2-7b-chat-hf

5

1

Llama-2-13b-chat-hf

5

2

Llama-2-70b-chat-hf

5

4

LinkFine-tuned model inference Pricing

Model

Price ($/M tokens)

Llama-2-7b-chat-hf

0.25

Llama-2-13b-chat-hf

0.50

Llama-2-70b-chat-hf

1.00

Example usage:

1import openai
2
3client = openai.OpenAI(
4    base_url = "https://api.endpoints.anyscale.com/v1",
5    api_key = "esecret_yourAuthTokenHere"
6)
7
8# Upload the file
9file_name = "train.jsonl"
10file = client.files.create(
11  file=open(file_name, "rb"),
12  purpose="fine-tune",
13  user_provided_filename=file_name,
14)
15
16# Launch the finetuning job
17client.fine_tuning.jobs.create(
18    model="meta-llama/Llama-2-70b-chat-hf",
19    training_file="file_123",
20)

LinkImprovements to user experience

Users can now get started with Anyscale Endpoints without a credit card. Get started with free credits and add payment information on the account later.

Ready to try Anyscale?

Access Anyscale today to see how companies using Anyscale and Ray benefit from rapid time-to-market and faster iterations across the entire AI lifecycle.