Agentbrisk

Continue.dev Model Not Loading with Ollama: Fix the Error

May 2, 2026 · Editorial Team · 5 min read · continue-dev · troubleshooting · error-fix

You installed Continue.dev, configured Ollama as your backend, pulled a model like codellama:13b or deepseek-coder:6.7b, and then opened VS Code expecting local AI completions to just work. Instead, the Continue sidebar shows "Model not loading" or "Connection refused" or sometimes just spins indefinitely without giving you any feedback. Ollama is running (you can see it in the menu bar or systemctl status ollama), the model is downloaded, and yet Continue won't connect. This is one of those setups where four things all have to be right simultaneously, and when one is off, the failure message tells you almost nothing useful.

What this error actually means

Continue.dev communicates with Ollama through its HTTP API, which by default runs on http://localhost:11434. The "model not loading" message in Continue can mean any of the following: the API endpoint is unreachable, the model name in Continue's config doesn't match what Ollama has installed, the model isn't ready yet (Ollama is still pulling it or still loading it into memory), or the Ollama server is running but bound to a different address.

Continue.dev's config file at ~/.continue/config.json holds the model provider settings, and errors in that file can cause models to fail loading silently. An incorrect API base URL, a mismatched model name string, or an unsupported parameter can all produce the same generic "model not loading" error.

Quick fix (when you need it working in 60 seconds)

  1. Verify Ollama is actually responding:

    curl http://localhost:11434/api/tags

    You should see a JSON list of installed models. If you get "Connection refused," Ollama isn't running or isn't bound to localhost.

  2. Start Ollama if it's not running:

    ollama serve

  3. Verify your model name exactly:

    ollama list

    Copy the model name exactly as shown, including any version tags like :13b-instruct-q4_0.

  4. Open ~/.continue/config.json and make sure the model name matches exactly what ollama list shows. No extra spaces, no version mismatches.

  5. Reload the VS Code window with Developer: Reload Window from the command palette.
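
Steps 1 and 3 above can be folded into one check. The following is a minimal sketch, not an official tool: OLLAMA_URL and both function names are made up for illustration, and the parsing assumes the compact JSON that /api/tags normally returns, with one "name" field per installed model.

```shell
#!/usr/bin/env bash
# Sketch only: OLLAMA_URL and the function names are illustrative,
# not part of Ollama or Continue.
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

# Read an /api/tags JSON response on stdin and print one model name per line.
# Uses grep/sed so it works without jq; assumes each model has a "name" field.
extract_model_names() {
  grep -o '"name":[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Return 0 if the server answers and the given model is installed,
# 1 if the model is missing, 2 if the server is unreachable.
check_model_installed() {
  local wanted="$1" tags
  if ! tags="$(curl -fsS --max-time 5 "$OLLAMA_URL/api/tags")"; then
    echo "Ollama is not reachable at $OLLAMA_URL" >&2
    return 2
  fi
  if printf '%s\n' "$tags" | extract_model_names | grep -Fxq -- "$wanted"; then
    echo "OK: $wanted is installed"
  else
    echo "Missing: $wanted is not in the installed model list" >&2
    return 1
  fi
}
```

Calling check_model_installed codellama:13b with the exact string from your config tells you in one go whether the problem is reachability or the model name.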

Why this happens

The model name mismatch is far more common than people expect. If you pulled codellama:13b but Continue's config has codellama without the tag, Ollama treats the bare name as codellama:latest, which isn't installed, so the request fails with a model-not-found error. Use the full model specifier exactly as ollama list shows it. The same applies in reverse: if you have llama3.1:8b-instruct-q4_K_M installed and your Continue config says llama3, Ollama looks for llama3:latest and the request fails.
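
To see why a bare name misses, it helps to spell out the tag fallback. This sketch mimics it in plain shell under the assumption that an untagged name resolves to :latest; normalize_model and model_in_list are hypothetical helpers, not Ollama commands.

```shell
#!/usr/bin/env bash
# Sketch: mimic the fallback of an untagged model name to ":latest".
# normalize_model and model_in_list are hypothetical helper names.

# Print the name unchanged if its last path segment has a tag,
# otherwise append ":latest".
normalize_model() {
  local name="$1"
  case "${name##*/}" in
    *:*) printf '%s\n' "$name" ;;
    *)   printf '%s:latest\n' "$name" ;;
  esac
}

# Read installed model names on stdin (one per line) and succeed only
# if the normalized form of $1 is among them.
model_in_list() {
  grep -Fxq -- "$(normalize_model "$1")"
}
```

A possible use is ollama list | awk 'NR>1 {print $1}' | model_in_list codellama, which fails when only codellama:13b is installed and succeeds once the config name carries the full tag.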

Ollama's listen address is another common cause. By default Ollama listens on 127.0.0.1:11434, but in some Docker or WSL2 setups the service binds to 0.0.0.0 or a specific container IP. Continue.dev defaults to http://localhost:11434. On systems where localhost resolves to ::1 (IPv6 loopback) but Ollama is only listening on 127.0.0.1 (IPv4), the connection fails because ::1 and 127.0.0.1 are separate addresses and nothing is listening on the one Continue tries. This is a common gotcha on macOS Sonoma and Ventura.
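
One way to tell whether the loopback mismatch applies to your machine is to probe both addresses separately. The sketch below uses curl's -4 and -6 flags for that; the function names and the wording of the advice are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: probe the IPv4 and IPv6 loopback addresses separately.
# loopback_advice and probe_loopback are hypothetical helper names.

# Turn two curl exit codes (0 = reachable) into a one-line recommendation.
loopback_advice() {
  local v4="$1" v6="$2"
  if [ "$v4" -eq 0 ] && [ "$v6" -ne 0 ]; then
    echo "IPv4 only: set apiBase to http://127.0.0.1:11434"
  elif [ "$v4" -eq 0 ]; then
    echo "IPv4 and IPv6 both answer: localhost is fine"
  elif [ "$v6" -eq 0 ]; then
    echo "IPv6 only: set apiBase to http://[::1]:11434"
  else
    echo "No answer on either loopback address: is Ollama running?"
  fi
}

# Hit /api/tags on each loopback address and report which one answers.
# -g stops curl from treating the [::1] brackets as a glob pattern.
probe_loopback() {
  local port="${1:-11434}" v4=0 v6=0
  curl -4 -g -fsS --max-time 3 -o /dev/null "http://127.0.0.1:$port/api/tags" 2>/dev/null || v4=$?
  curl -6 -g -fsS --max-time 3 -o /dev/null "http://[::1]:$port/api/tags" 2>/dev/null || v6=$?
  loopback_advice "$v4" "$v6"
}
```

Running probe_loopback with no arguments checks the default port. An "IPv4 only" result points to the localhost resolution problem described above.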

The model genuinely not being loaded is another simple cause that's easy to miss. A model in Ollama is in one of two states: downloaded but not resident in memory, or actively loaded. When Continue sends the first request, Ollama loads the model into memory, which can take 10-30 seconds for a 13B model on a machine without a dedicated GPU. If Continue's connection timeout is shorter than the model load time, it gives up and reports "model not loading" even though Ollama was about to be ready.
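
To find out how long the load actually takes on your hardware, you can time a request that names the model but sends no prompt, which asks Ollama to load it without generating anything. This is a rough sketch: the helper names and the OLLAMA_URL variable are assumptions, and the 5m fallback mirrors Ollama's usual keep-alive default.

```shell
#!/usr/bin/env bash
# Sketch: time how long Ollama needs to load a model.
# preload_payload, time_preload and OLLAMA_URL are illustrative names.

# Build the JSON body for a prompt-less /api/generate request.
# $1 = model name, $2 = keep_alive duration (defaults to 5m).
preload_payload() {
  printf '{"model":"%s","keep_alive":"%s"}\n' "$1" "${2:-5m}"
}

# Send the preload request and print the elapsed wall-clock seconds.
time_preload() {
  local model="$1" start end
  start="$(date +%s)"
  curl -fsS --max-time 300 -o /dev/null \
    -d "$(preload_payload "$model" "${2:-5m}")" \
    "${OLLAMA_URL:-http://127.0.0.1:11434}/api/generate" || return 1
  end="$(date +%s)"
  echo "$model loaded in $((end - start))s"
}
```

If time_preload codellama:13b reports a number close to or above the timeout you are hitting in Continue, pre-warming the model is the likely fix.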

Continue.dev versions below 0.9.185 had a bug where models with certain parameter configurations (specifically num_ctx values above 8192) would fail to load in Ollama even if the model supported them, because Ollama would return an error that Continue wasn't handling correctly.

Context length mismatches between what Continue requests and what the model actually supports are another cause. If your config.json sets contextLength: 128000 but your model only supports 32K tokens, Ollama will reject the load request.
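
A quick comparison of the two numbers can catch this before you reload VS Code. The sketch below is hedged on one point in particular: supported_context assumes that ollama show prints a "context length" row, which may differ between Ollama versions. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the contextLength from Continue's config with the
# model's supported context. Both helper names are hypothetical.

# $1 = requested context length, $2 = supported context length.
# Returns 1 when the request exceeds what the model supports.
context_fits() {
  local requested="$1" supported="$2"
  if [ "$requested" -le "$supported" ]; then
    echo "ok: $requested <= $supported"
  else
    echo "too large: $requested > $supported, lower contextLength"
    return 1
  fi
}

# Assumption: `ollama show MODEL` prints a row containing "context length"
# with the number as its last field.
supported_context() {
  ollama show "$1" | awk '/context length/ {print $NF; exit}'
}
```

For example, context_fits 128000 "$(supported_context codellama:13b)" flags the mismatch described above, provided the ollama show output matches the assumed layout.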

Permanent fix

  1. Set the Ollama API base explicitly in Continue's config:

    // ~/.continue/config.json
    {
      "models": [
        {
          "title": "CodeLlama 13B",
          "provider": "ollama",
          "model": "codellama:13b",
          "apiBase": "http://127.0.0.1:11434"
        }
      ]
    }

    Using 127.0.0.1 instead of localhost bypasses IPv6/IPv4 resolution issues.

  2. Pre-warm the model before starting your Continue session:

    ollama run codellama:13b ""

    This loads the model into memory. With the empty prompt the command normally returns on its own; if it drops you into an interactive session instead, press Ctrl+D to exit. The model stays loaded for Continue to use.

  3. Set OLLAMA_KEEP_ALIVE to keep the model warm:

    export OLLAMA_KEEP_ALIVE=60m

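    Ollama then keeps models loaded for 60 minutes after the last request. The variable has to be in the environment of the Ollama server process, so adding it to your shell's startup file only helps when you launch ollama serve from that shell. For the systemd service, set it with systemctl edit ollama.service; for the macOS app, use launchctl setenv and restart Ollama.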

  4. Match the context length to what your model actually supports:

    {
      "model": "codellama:13b",
      "contextLength": 16384
    }

    Check the model card on Ollama's library for the supported context length.

  5. Update Continue.dev to at least 0.9.185: Open the VS Code Extensions panel, find Continue, and click Update if available.

  6. If you're on WSL2, configure Ollama to listen on all interfaces (OLLAMA_HOST=0.0.0.0 in the server's environment) and set the API base to the WSL2 host IP:

    # In WSL2, print the host IP (the first nameserver entry)
    awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf
    # Use that IP in Continue's apiBase

  7. Add verbose logging to Ollama to see exactly what request Continue is sending:

    OLLAMA_DEBUG=1 ollama serve 2>&1 | tee /tmp/ollama-debug.log

    Then check the log when Continue shows the error. The request and error response are both logged.
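
After working through these steps, it can help to print what Continue is actually configured to call and compare it against ollama list. The following is a small sketch that assumes the config.json layout shown in step 1; show_continue_targets is a made-up helper, and the grep-based parsing is a shortcut rather than a real JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: print every "model" and "apiBase" value in a Continue config.
# show_continue_targets is a hypothetical helper; the pattern matching
# assumes simple string values as in the example config above.
show_continue_targets() {
  local config="${1:-$HOME/.continue/config.json}"
  grep -Eo '"(model|apiBase)"[[:space:]]*:[[:space:]]*"[^"]*"' "$config" |
    sed -E 's/^"([^"]*)"[[:space:]]*:[[:space:]]*"([^"]*)"$/\1=\2/'
}
```

Each output line has the form model=... or apiBase=..., so a stray space, a missing tag, or a localhost URL stands out at a glance.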

Prevention

Pin both Continue.dev and Ollama to versions that you've verified work together. The two projects release independently and occasionally introduce incompatibilities. I keep notes on which pairs work in a small TOOLS.md file in my home directory.

Keep a script for warming up your local models at the start of a dev session. Something like:

#!/bin/bash
# warm-models.sh
for model in codellama:13b deepseek-coder:6.7b; do
  echo "Warming $model..."
  ollama run "$model" "" < /dev/null 2>&1 | tail -1
done

Running this when you sit down means your models are ready before you open VS Code.
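
To confirm the warm-up worked, you can check which models are resident with ollama ps. This sketch assumes the usual table layout of that command (a header row, then the model name in the first column); loaded_models and report_unloaded are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: verify that the models you warmed are resident in memory.
# Assumes `ollama ps` prints a header row followed by one row per model
# with the model name in the first column. Helper names are hypothetical.

# Read `ollama ps` output on stdin and print the loaded model names.
loaded_models() {
  awk 'NR > 1 && NF {print $1}'
}

# $1 = newline-separated list of loaded names, remaining args = models
# you expect. Prints each missing model and returns 1 if any is missing.
report_unloaded() {
  local loaded="$1"
  shift
  local model missing=0
  for model in "$@"; do
    if ! printf '%s\n' "$loaded" | grep -Fxq -- "$model"; then
      echo "not loaded: $model"
      missing=1
    fi
  done
  return "$missing"
}
```

A call such as report_unloaded "$(ollama ps | loaded_models)" codellama:13b deepseek-coder:6.7b prints nothing when both models are ready.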

On machines where memory is tight (16GB RAM with a 13B model is marginal), let the model finish loading before you open VS Code, so the two aren't competing for memory during the load. The load-time timeout issue goes away if you give the model a head start.

When the fix doesn't work

If curl http://localhost:11434/api/tags works but Continue still won't connect, check the Continue extension logs directly. In VS Code, open the Output panel and select "Continue" from the dropdown. The full error from the Ollama API response will be there, including any parameter validation errors that the UI summarizes as "model not loading."

File issues at github.com/continuedev/continue. The project is active and Ollama integration is a first-class concern. Include your Ollama version (ollama --version), the model you're using, your Continue config (sanitized), and the Continue extension log output.

For teams running Ollama as a shared server (rather than locally), make sure the apiBase in Continue's config points to the server IP and that the Ollama server's firewall allows connections on port 11434.
