This tutorial shows you how to set up Ollama, a platform for running large language models, on a Runpod GPU Pod. By the end, you’ll have Ollama running with HTTP API access for external requests.

What you’ll learn

In this tutorial, you’ll learn how to:
  • Deploy a Pod with the PyTorch template.
  • Install and configure Ollama for external access.
  • Run AI models and interact via the HTTP API.

Requirements

  • A Runpod account with credits.

Step 1: Deploy a Pod

  1. Navigate to Pods and select Deploy.
  2. Choose a GPU (for example, A40).
  3. Select the latest PyTorch template.
  4. Under Pod Template, select Edit:
  • Under Expose HTTP Ports (Max 10), add port 11434.
  • Under Environment Variables, add an environment variable with key OLLAMA_HOST and value 0.0.0.0.
  5. Click Set Overrides and then Deploy On-Demand.

Step 2: Install Ollama

  1. Once the Pod is running, click the Pod to open the connection options panel and select Enable Web Terminal and then Open Web Terminal.
  2. Update packages and install dependencies:
    apt update && apt install -y lshw zstd
    
  3. Install Ollama and start the server in the background:
    (curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) &
    
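To confirm the installation worked before moving on, you can query the server's version endpoint from inside the Pod and check the log file created in the previous step (this assumes Ollama's default port, 11434):
# Check that the Ollama server is responding locally
curl http://localhost:11434/api/version
# Review the server log if the request fails
tail -n 50 ollama.log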

Step 3: Run a model

Download and run a model using the ollama run command:
ollama run llama2
Replace llama2 with any model from the Ollama library. You can now interact with the model directly from the terminal.
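If you'd rather download a model without opening the interactive prompt, you can pull it separately and then list what's installed (llama2 is used here only as an example):
# Download the model without starting an interactive session
ollama pull llama2
# Show the models available on this Pod
ollama list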

Step 4: Make HTTP API requests

With Ollama running, you can make HTTP requests to your Pod from external clients. Try the following commands, replacing OLLAMA_POD_ID with your actual Pod ID.

List available models:
curl https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/tags
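The response is a single JSON object. To make it easier to read, you can pipe it through Python's built-in JSON tool (the PyTorch template includes Python; adjust if your image differs):
curl https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/tags | python3 -m json.tool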
Generate a response:
curl -X POST https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/generate -d '{
  "model": "llama2",
  "prompt": "Tell me a story about llamas"
}'
Ollama returns streaming responses by default. To get a non-streaming response, add the stream: false parameter to the request body:
curl -X POST https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/generate -d '{
  "model": "llama2",
  "prompt": "Tell me a story about llamas",
  "stream": false
}'
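Ollama also provides a chat endpoint, /api/chat, which takes a list of messages rather than a single prompt. Here is a minimal non-streaming sketch against the same proxy URL, again using llama2 as the example model:
curl -X POST https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/chat -d '{
  "model": "llama2",
  "messages": [
    { "role": "user", "content": "Why do llamas hum?" }
  ],
  "stream": false
}'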
Congratulations! You’ve set up Ollama on a Runpod Pod and made HTTP API requests to it. For more API options, see the Ollama API documentation.

Next steps