This tutorial shows you how to set up Ollama, a platform for running large language models, on a Runpod GPU Pod. By the end, you’ll have Ollama running with HTTP API access for external requests.

What you’ll learn

In this tutorial, you’ll learn how to:
  • Deploy a Pod with the PyTorch template.
  • Install and configure Ollama for external access.
  • Run AI models and interact via the HTTP API.

Requirements

  • A Runpod account with credits.

Step 1: Deploy a Pod

  1. Navigate to Pods and select Deploy.
  2. Choose a GPU (for example, A40).
  3. Select the latest PyTorch template.
  4. Under Pod Template, select Edit:
  • Under Expose HTTP Ports (Max 10), add port 11434.
  • Under Environment Variables, add an environment variable with key OLLAMA_HOST and value 0.0.0.0.
  5. Click Set Overrides and then Deploy On-Demand.

Step 2: Install Ollama

  1. Once the Pod is running, click the Pod to open the connection options panel and select Enable Web Terminal and then Open Web Terminal.
  2. Update packages and install dependencies:
    apt update && apt install -y lshw zstd
    
  3. Install Ollama and start the server in the background:
    (curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) &
    
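To confirm the installation worked before moving on, you can query the server's version endpoint from inside the Pod and check the log file created in the previous step (this assumes Ollama's default port, 11434):
# Check that the Ollama server is responding locally
curl http://localhost:11434/api/version
# Review the server log if the request fails
tail -n 50 ollama.log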

Step 3: Run a model

Download and run a model using the ollama run command:
ollama run llama2
Replace llama2 with any model from the Ollama library. You can now interact with the model directly from the terminal.
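If you'd rather download a model without opening the interactive prompt, you can pull it separately and then list what's installed (llama2 is used here only as an example):
# Download the model without starting an interactive session
ollama pull llama2
# Show the models available on this Pod
ollama list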

Step 4: Make HTTP API requests

With Ollama running, you can make HTTP requests to your Pod from external clients. Try the following commands, replacing OLLAMA_POD_ID with your actual Pod ID.

List available models:
curl https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/tags
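The response is a single JSON object. To make it easier to read, you can pipe it through Python's built-in JSON tool (the PyTorch template includes Python; adjust if your image differs):
curl https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/tags | python3 -m json.tool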
Generate a response:
curl -X POST https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/generate -d '{
  "model": "llama2",
  "prompt": "Tell me a story about llamas"
}'
Ollama returns streaming responses by default. To get a non-streaming response, add the stream: false parameter to the request body:
curl -X POST https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/generate -d '{
  "model": "llama2",
  "prompt": "Tell me a story about llamas",
  "stream": false
}'
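Ollama also provides a chat endpoint, /api/chat, which takes a list of messages rather than a single prompt. Here is a minimal non-streaming sketch against the same proxy URL, again using llama2 as the example model:
curl -X POST https://OLLAMA_POD_ID-11434.proxy.runpod.net/api/chat -d '{
  "model": "llama2",
  "messages": [
    { "role": "user", "content": "Why do llamas hum?" }
  ],
  "stream": false
}'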
Congratulations! You’ve set up Ollama on a Runpod Pod and made HTTP API requests to it. For more API options, see the Ollama API documentation.

Next steps