Marlin’s Oyster CVM
Confidential Virtual Machines (CVMs) provide secure, private computation in untrusted environments. By leveraging hardware-based encryption and secure enclaves, CVMs ensure that sensitive data and applications remain protected even when running on third-party infrastructure. With Oyster CVM tooling, you can create secure execution environments that preserve data privacy while enabling complex computations.
AI Agents & The Challenge of On-Chain Execution
Running AI agents directly on-chain presents significant challenges:
- High computational demands
- Non-deterministic execution
- Dependence on general-purpose programming languages
Since on-chain execution isn’t feasible, the next-best solution is to run AI models off-chain while ensuring their execution is verifiable on-chain.
Ensuring Verifiable AI Execution
To guarantee trust in off-chain AI execution, we implemented an HTTP proxy server that lets clients verify that responses were generated within a trusted enclave.
- When the enclave boots, it generates a public-private key pair that remains inside the enclave and is lost upon reboot.
- The private key signs every model-generated response, so each response can be verified against the enclave's public key.
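To make the signing step concrete, here is a minimal sketch. The 65-byte hex signature shown later in this guide suggests an Ethereum-style secp256k1 scheme (via the eth_account library), but the exact curve and payload format used by the proxy are assumptions here, not confirmed internals:

from eth_account import Account
from eth_account.messages import encode_defunct

# Created at enclave boot; held only in enclave memory, so a reboot
# destroys it and generates a different key.
enclave_key = Account.create()

# Illustrative response body; the real payload is the model output.
response = b'{"model": "llama3.2", "response": "..."}'
signed = enclave_key.sign_message(encode_defunct(response))

# Anyone holding the response and signature can recover the signer
# address and check it against the enclave's attested address.
assert Account.recover_message(
    encode_defunct(response), signature=signed.signature
) == enclave_key.address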
Deploying Llama3.2 Securely in Oyster
We deployed Llama3.2 inside Oyster using the Ollama framework:
- An HTTP proxy forwards external prompts to the model running inside the enclave.
- The model processes the request and generates a response.
- Instead of streaming the response (as AI chatbots like ChatGPT do), the proxy collects it in full, signs and timestamps it, and only then sends it to the user.
This approach ensures that every response is both secure and cryptographically verifiable, maintaining the highest level of trust in AI-driven computations.
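For illustration, here is a minimal Python sketch of such a proxy. It assumes Ollama listens on its default port 11434 inside the enclave, that the proxy serves on port 5000 (matching the cURL example later), and that the signed payload is the timestamp concatenated with the raw response body; the deployed proxy presumably uses the key mounted in the Docker Compose file below, whereas this sketch generates one in-process for simplicity.

import json
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

from eth_account import Account
from eth_account.messages import encode_defunct

# Ephemeral signing key: created at startup, never persisted, so it
# is lost (and replaced) whenever the enclave reboots.
ENCLAVE_KEY = Account.create()

class SigningProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        raw = self.rfile.read(int(self.headers["Content-Length"]))
        # Forward the prompt to the Ollama server inside the enclave,
        # with streaming disabled so the response can be signed whole.
        payload = json.loads(raw)
        payload["stream"] = False
        upstream = urllib.request.Request(
            "http://127.0.0.1:11434" + self.path,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(upstream) as resp:
            body = resp.read()
        # Timestamp and sign the fully collected response
        # (assumed payload layout: timestamp || body).
        ts = str(int(time.time()))
        signed = ENCLAVE_KEY.sign_message(encode_defunct(ts.encode() + body))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("x-oyster-timestamp", ts)
        self.send_header("x-oyster-signature", bytes(signed.signature).hex())
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("0.0.0.0", 5000), SigningProxy).serve_forever()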
Deploying Llama3.2 in Oyster with Docker Compose
The following Docker Compose configuration sets up Llama3.2 inside Oyster, ensuring secure execution within a Confidential Virtual Machine (CVM):
services:
  # Llama Proxy Service
  llama_proxy:
    image: kalpita888/ollama_arm64:0.0.1
    container_name: llama_proxy
    init: true
    network_mode: host
    volumes:
      # Mount the enclave-generated signing key into the proxy
      - /app/ecdsa.sec:/app/secp.sec

  # Ollama Server
  ollama_server:
    image: ollama/ollama:0.6.0
    container_name: ollama_server
    init: true
    network_mode: host
    healthcheck:
      test: ["CMD-SHELL", "ollama --version"]
      interval: 10s
      retries: 3

  # Ollama Model Instance
  ollama_model:
    image: ollama/ollama:0.6.0
    container_name: ollama_model_llama3.2
    command: pull llama3.2
    init: true
    network_mode: host
    depends_on:
      ollama_server:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "ollama show llama3.2"]
      start_period: 2m30s
      interval: 30s
      retries: 3
Service Breakdown
- Llama Proxy Service:
  - Forwards prompts to the model
  - Signs responses for cryptographic verification
  - Proxy code & Dockerfile are available for further customization
- Ollama Server:
  - Runs Ollama without a desktop application
  - Exposes a healthcheck that confirms the server is running
- Ollama Model Instance:
  - Pulls and initializes Llama3.2 (3B parameters, ~2 GB)
  - Ensures the correct model is set up for AI inference
Note: If using a different model, check the Ollama Model Library and update the Docker Compose file accordingly.
Deploying the Enclave
Set up a wallet with 0.001 ETH and 1 USDC on the Arbitrum One network.
Deploy the enclave using the command for your system architecture:
- For amd64 systems:
# Replace <key> with the private key of your wallet
oyster-cvm deploy --wallet-private-key <key> \
--docker-compose ./docker-compose.yml \
--instance-type c6a.4xlarge \
--region ap-south-1 \
--operator 0xe10Fa12f580e660Ecd593Ea4119ceBC90509D642 \
--duration-in-minutes 20 \
--pcr-preset base/blue/v1.0.0/amd64 \
--image-url https://artifacts.marlin.org/oyster/eifs/base-blue_v1.0.0_linux_amd64.eif
- For arm64 systems:
# Replace <key> with the private key of your wallet
oyster-cvm deploy --wallet-private-key <key> \
--docker-compose ./docker-compose.yml \
--instance-type c6g.4xlarge \
--region ap-south-1 \
--operator 0xe10Fa12f580e660Ecd593Ea4119ceBC90509D642 \
--duration-in-minutes 20 \
--pcr-preset base/blue/v1.0.0/arm64 \
--image-url https://artifacts.marlin.org/oyster/eifs/base-blue_v1.0.0_linux_arm64.eif
Deployment & Execution Time
- Enclave setup: ~3 minutes
- Model download & initialization: ~4 minutes (longer for larger models)
Testing the Setup
Once the deployment is complete, you can verify the model by running the following cURL command:
curl http://{{instance-ip}}:5000/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'
Since the HTTP proxy collects, signs, and timestamps the response before sending the final output, expect a wait time of ~2 minutes.
Verifying the Response
Running the cURL command with the -v flag will display two critical headers for verification:
x-oyster-timestamp: 1741620242
x-oyster-signature: 8781e472b0f8e3693c1c6cec60b1ae0f5fed4c574d24e3bfcc6cc23f02a918a8785709ceb8a464a7d1dbbb8809ba73047acaa3ff5f1918ba565d82d177e123801b
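As a sketch of client-side verification, assuming the signed payload is the timestamp concatenated with the raw response body (matching the illustrative proxy above; the actual layout may differ), you can recover the signer address and compare it against the enclave's attested public key:

import urllib.request

from eth_account import Account
from eth_account.messages import encode_defunct

INSTANCE_IP = "203.0.113.10"  # placeholder: replace with your instance IP

req = urllib.request.Request(
    f"http://{INSTANCE_IP}:5000/api/generate",
    data=b'{"model": "llama3.2", "prompt": "Why is the sky blue?"}',
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = resp.read()
    ts = resp.headers["x-oyster-timestamp"]
    sig = resp.headers["x-oyster-signature"]

# Recover the signer; if it matches the enclave's attested address,
# the response provably came from inside the enclave.
signer = Account.recover_message(
    encode_defunct(ts.encode() + body),
    signature=bytes.fromhex(sig),
)
print("response signed by:", signer)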
That’s it! You now have Llama3.2 securely running inside Oyster, with cryptographic verification ensuring trusted communication.