Learn how to set up and use Llama 3.1 models with Continue using Ollama, Groq, Together AI, Replicate, SambaNova, or Cerebras Inference for local and cloud-based development
If you haven’t already installed Continue, you can do that here for VS Code or here for JetBrains. For more general information on customizing Continue, read our customization docs. Below we share some of the easiest ways to get up and running, depending on your use case.
Ollama is the fastest way to get up and running with local language models. We recommend trying Llama 3.1 8B, which is impressive for its size and will perform well on most hardware.
Download Ollama here (it should walk you through the rest of these steps)
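Once Ollama is running and you have pulled the model (for example, via `ollama run llama3.1:8b`), you can point Continue at it. Below is a minimal sketch of a `config.yaml` entry, assuming the `ollama` provider and Ollama's `llama3.1:8b` model tag; check the Continue reference for the full set of config fields:

```yaml
models:
  - name: Llama 3.1 8B
    provider: ollama
    model: llama3.1:8b # assumes this tag has been pulled locally with Ollama
```

By default, Continue will send requests to the local Ollama server at `http://localhost:11434`.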
Check if your chosen model is still supported by referring to the model documentation. If a model has been deprecated, you may encounter a 404 error when attempting to use it.
Groq provides the fastest available inference for open-source language models, including the entire Llama 3.1 family.
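To use Groq from Continue, you'll need a Groq API key. Here is a minimal `config.yaml` sketch, assuming the `groq` provider and Groq's `llama-3.1-8b-instant` model ID; verify the ID against Groq's current model list, since deprecated models return errors:

```yaml
models:
  - name: Llama 3.1 8B (Groq)
    provider: groq
    model: llama-3.1-8b-instant # model ID assumed; confirm against Groq's model list
    apiKey: <YOUR_GROQ_API_KEY>
```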
How to Use Llama 3.1 and 3.3 with Cerebras Inference
Check if your chosen model is still supported by referring to the model status. If a model has been deprecated, you may encounter a 404 error when attempting to use it.
Cerebras Inference uses specialized silicon to provide fast inference for Llama 3.1 8B and Llama 3.3 70B.
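As with Groq, you connect Continue to Cerebras with a provider entry and an API key. A minimal `config.yaml` sketch, assuming the `cerebras` provider and Cerebras's `llama3.1-8b` model ID; check the Cerebras Inference docs for the current model names:

```yaml
models:
  - name: Llama 3.1 8B (Cerebras)
    provider: cerebras
    model: llama3.1-8b # model ID assumed; verify in the Cerebras Inference docs
    apiKey: <YOUR_CEREBRAS_API_KEY>
```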