Update June 2024: Anyscale Endpoints (Anyscale's LLM API Offering) and Private Endpoints (self-hosted LLMs) are now available as part of the Anyscale Platform. Click here to get started on the Anyscale platform.
Retrieval-augmented generation, or RAG applications are among the most popular applications built with LLMs. Embedding endpoints enables developers to use open-source embedding models. Today, we are starting with gte-large, and developers can access it at $0.05/MTokens. We plan to add more models in the future, and users can request newer embedding models by filling out this google form. For more info visit here.
Example usage:
1import openai
2client = openai.OpenAI(
3 base_url = "https://api.endpoints.anyscale.com/v1",
4 api_key = "esecret_YOUR_API_KEY"
5)
6embedding = client.embeddings.create(
7 model="thenlper/gte-large",
8 input="Your text string goes here",
9)
10print(embedding.model_dump())
11
The output:
1{
2 'data': [
3 {'embedding': [...],
4 'index': 0,
5 'object': 'embedding'
6 }
7 ],
8 'model': 'thenlper/gte-large'
9 ...
10}
Fine tuning is a popular technique to allow for model personalization and optimization, making it possible to improve model quality for specific uses, while also reducing costs and improving performance.
We have seen good traction on Llama-2 7B and 13B fine-tuning API. Today we are extending the fine-tuning functionality to the Llama-2 70B model. Llama-2 70B is the largest model in the Llama 2 series of models, and starting today, you can fine-tune it on Anyscale Endpoints with a $5 fixed cost per job run and $4/M tokens of data. You can start inference on the fine-tuned model at $1/M tokens. For more info visit here.
Model | Fixed Cost/Run | Price ($/M tokens) |
---|---|---|
Llama-2-7b-chat-hf | 5 | 1 |
Llama-2-13b-chat-hf | 5 | 2 |
Llama-2-70b-chat-hf | 5 | 4 |
Model | Price ($/M tokens) |
---|---|
Llama-2-7b-chat-hf | 0.25 |
Llama-2-13b-chat-hf | 0.50 |
Llama-2-70b-chat-hf | 1.00 |
Example usage:
1import openai
2
3client = openai.OpenAI(
4 base_url = "https://api.endpoints.anyscale.com/v1",
5 api_key = "esecret_yourAuthTokenHere"
6)
7
8# Upload the file
9file_name = "train.jsonl"
10file = client.files.create(
11 file=open(file_name, "rb"),
12 purpose="fine-tune",
13 user_provided_filename=file_name,
14)
15
16# Launch the finetuning job
17client.fine_tuning.jobs.create(
18 model="meta-llama/Llama-2-70b-chat-hf",
19 training_file="file_123",
20)
Users can now get started with Anyscale Endpoints without a credit card. Get started with free credits and add payment information on the account later.