
Exposing Ollama LLM

Ollama is a local LLM runner that lets you serve AI models on your own machine. Inspectr makes it easy to expose your Ollama API to the internet for demos, integrations, or remote collaboration.

This guide walks through exposing your Ollama instance securely using Inspectr.


With Inspectr in front of Ollama, you can:

  • Share a live AI model without deploying it
  • Inspect and debug request/response cycles
  • Secure access with a one-time code
  • Replay or log incoming requests

First, start Ollama locally:

Terminal window
ollama serve

Then, in a separate shell, run a model:

Terminal window
ollama run llama3.2

By default, Ollama listens on:

http://localhost:11434
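
Before exposing it, you can confirm the API is reachable with a quick sanity check. Assuming the default port, the following lists the models installed locally:

Terminal window
curl http://localhost:11434/api/tags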

Next, expose Ollama by starting Inspectr with your local instance as the backend:

Terminal window
inspectr \
--listen=:8080 \
--backend=http://localhost:11434 \
--expose \
--channel=ollama-demo \
--channel-code=ollama123

Inspectr will:

  • Forward all traffic from https://ollama-demo.in-spectr.dev to your local Ollama
  • Log and show all requests/responses in the App UI
  • Require callers to include the access code ollama123

With Inspectr running, you can now send prompts remotely:

Terminal window
curl https://ollama-demo.in-spectr.dev/api/generate \
-d '{ "model": "llama3.2", "prompt": "Why is the sky blue?" }'
(Screenshot: an Ollama request captured in the Inspectr App UI)
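
By default, /api/generate streams the answer back as a series of JSON chunks. If a single JSON response is easier to work with, the same request can be sent with streaming disabled:

Terminal window
curl https://ollama-demo.in-spectr.dev/api/generate \
-d '{ "model": "llama3.2", "prompt": "Why is the sky blue?", "stream": false }'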

Use the Inspectr App UI (http://localhost:4004) to:

  • Monitor prompts sent to Ollama
  • Inspect request headers, bodies, and responses
  • Replay past requests to test model behavior (see the example below)
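
Any request that goes through the tunnel appears in the UI. For example, a chat-style call to Ollama's /api/chat endpoint produces a richer payload that is useful to inspect and replay:

Terminal window
curl https://ollama-demo.in-spectr.dev/api/chat \
-d '{ "model": "llama3.2", "stream": false, "messages": [{ "role": "user", "content": "Explain tunneling in one sentence." }] }'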

Inspectr + Ollama is ideal for:

  • Sharing demos with teammates or clients
  • Testing integration pipelines
  • Observing and replaying LLM behavior remotely