By the MCPCore Team · Tags: mcp, deployment, production, devops

How to Deploy an MCP Server to Production: A Complete Guide

Everything you need to know to take an MCP server from localhost to a production deployment — HTTPS, authentication, rate limiting, process management, and zero-downtime updates.

Getting an MCP server running locally is the easy part. Taking it to production — where real AI assistants make real calls at unpredictable times — is where most developers run into friction. This guide covers the full production deployment checklist: HTTPS, authentication, process management, rate limiting, and keeping the server running reliably over time.

Why Production Deployment Is Different

A local MCP server on http://localhost:3000 only needs to work for one person on one machine. A production deployment needs to handle:

  • HTTPS — AI clients like Claude Desktop won't connect to plain HTTP endpoints in production, and most Streamable HTTP transport implementations require TLS.
  • Persistent process — The server has to keep running if your terminal closes or the machine reboots.
  • Authentication — You need to control who can call your tools, especially if they access databases, send emails, or modify data.
  • Rate limiting — Without limits, a runaway AI loop or a poorly-written prompt can exhaust your API quotas or trigger thousands of database operations.
  • Observability — When something goes wrong at 2am, you need logs that tell you what the AI called, with what parameters, and what failed.
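The authentication point above can be sketched as a small Express-style middleware. This is a minimal sketch assuming a single shared secret in an `MCP_API_KEY` environment variable — both the variable name and the plain bearer-token scheme are illustrative choices, not part of any spec (the MCP specification also defines a fuller OAuth-based authorization flow):

```javascript
// Minimal bearer-token check, shaped as Express middleware.
// MCP_API_KEY is an assumed env var holding a single shared secret.
function requireBearerToken(req, res, next) {
  const header = req.headers["authorization"] || "";
  const token = header.startsWith("Bearer ")
    ? header.slice("Bearer ".length)
    : null;
  if (!token || token !== process.env.MCP_API_KEY) {
    res.status(401).json({ error: "unauthorized" });
    return;
  }
  next();
}

// Mounted in front of the MCP endpoint:
// app.use("/mcp", requireBearerToken);
```

A shared token is the simplest starting point; per-client keys make revocation and per-key rate limiting possible later.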

Option 1: Self-Host on a VPS

The most flexible approach. Rent a VPS (DigitalOcean, Hetzner, Linode — any will do), deploy your Node.js server, and put Nginx in front of it.

Server setup

```shell
# Install Node.js (via nvm for version control)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash
nvm install 22
nvm use 22

# Clone and install your server
git clone https://github.com/your-org/your-mcp-server.git
cd your-mcp-server
npm install --production
```

Process management with PM2

PM2 keeps your process running and auto-restarts it after crashes or reboots.

```shell
npm install -g pm2

# Start the server
pm2 start server.js --name "mcp-server"

# Save the process list so it survives reboots
pm2 save
pm2 startup
```

PM2 also logs stdout and stderr to ~/.pm2/logs/, which is useful for debugging.

HTTPS with Nginx and Let's Encrypt

```nginx
server {
    listen 443 ssl;
    server_name mcp.yourcompany.com;

    ssl_certificate /etc/letsencrypt/live/mcp.yourcompany.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/mcp.yourcompany.com/privkey.pem;

    location /mcp {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Needed for SSE streaming
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 300s;
    }
}
```

The proxy_buffering off directive is critical. Streamable HTTP can return responses as SSE streams, and Nginx's default buffering will break those. Always disable buffering on your MCP endpoint proxy.

Get a free certificate with Certbot:

```shell
apt install certbot python3-certbot-nginx
certbot --nginx -d mcp.yourcompany.com
```

Rate limiting at the application level

In your Node.js server, use a middleware like express-rate-limit:

```javascript
import rateLimit from "express-rate-limit";

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 60,             // 60 requests per minute per IP
  standardHeaders: true,
  legacyHeaders: false,
});

app.use("/mcp", limiter);
```

For authenticated servers, rate limit by API key rather than IP — otherwise all calls from the same AI client host look like the same IP.
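With express-rate-limit, per-key limiting is done through its `keyGenerator` option. A sketch of the key function, assuming the API key arrives as a bearer token (the header format and the `key:`/`ip:` prefixes are illustrative):

```javascript
// Derive the rate-limit bucket key from the request.
// Falls back to the client IP when no API key is present.
function rateLimitKey(req) {
  const header = req.headers["authorization"] || "";
  if (header.startsWith("Bearer ")) {
    return "key:" + header.slice("Bearer ".length);
  }
  return "ip:" + req.ip;
}

// Plugged into the limiter from above:
// const limiter = rateLimit({ windowMs: 60 * 1000, max: 60, keyGenerator: rateLimitKey });
```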

Option 2: Serverless / Edge Functions

If your tools are stateless (no persistent connections, no in-memory state), you can deploy them as serverless functions on Vercel, Cloudflare Workers, or AWS Lambda.

The main advantage is zero infrastructure management: no VPS, no Nginx, no PM2. The tradeoff is cold starts (first request after idle is slow) and no persistent WebSocket connections.

Vercel works well for Node.js MCP servers:

```shell
npm install -g vercel
vercel deploy --prod
```

Your endpoint becomes https://your-project.vercel.app/mcp. Vercel handles TLS automatically.

Watch out: SSE streaming responses from Streamable HTTP may hit function timeout limits on serverless platforms. For tools that return results quickly (under 10 seconds), this isn't an issue. For long-running tools, self-hosting is the safer choice.
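On Vercel specifically, the function timeout can be raised per route in `vercel.json`. The available ceiling depends on your plan, and the path `api/mcp.js` is an assumed project layout:

```json
{
  "functions": {
    "api/mcp.js": {
      "maxDuration": 60
    }
  }
}
```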

Option 3: Docker

Containerizing your MCP server makes deployments reproducible and portable — same image runs on your laptop, on a VPS, on Kubernetes, or anywhere else.

```dockerfile
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install --production
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
```

Build and run:

```shell
docker build -t my-mcp-server .

docker run -d \
  -p 3000:3000 \
  -e OPENWEATHER_KEY=your-key \
  -e API_KEY=your-bearer-token \
  --restart unless-stopped \
  --name mcp-server \
  my-mcp-server
```

Secrets are passed as environment variables — never baked into the image.

Option 4: Skip the Infrastructure Entirely

Building and deploying the infrastructure described above takes time — probably a full day for someone doing it for the first time. If your goal is to write tools that connect AI assistants to your APIs and data, not to become a DevOps engineer, there's a faster path.

MCPCore is a hosted platform where you write tool code in a browser-based JavaScript editor, and the server is deployed instantly. HTTPS, process management, rate limiting, and authentication are all handled automatically. You get a live endpoint at https://your-subdomain.mcpcore.io/mcp in under a minute.

MCPCore is a good fit when you want to ship tools quickly without the infrastructure overhead. Self-hosting is the better choice when you have specific compliance requirements or need to keep all computation inside your own network perimeter.

Environment Variables and Secrets

Never hardcode API keys or database passwords in your server code. Use environment variables:

```javascript
// Good
const apiKey = process.env.STRIPE_SECRET_KEY;

// Bad — never do this
const apiKey = "sk_live_AbCdEf123456...";
```

For a self-hosted deployment, use a .env file (excluded from version control via .gitignore) loaded with dotenv, or use your hosting provider's secrets management (Vercel's Environment Variables, Railway's Variables panel, AWS Secrets Manager, etc.).
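Whichever source you use, it pays to fail fast at boot when a required secret is missing, rather than at the first tool call. A tiny sketch (the helper name is illustrative; the variable name follows the example above):

```javascript
// Fail fast at startup when a required secret is missing.
function requireEnv(name) {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// At server boot (after `import "dotenv/config"` if you use dotenv):
// const stripeKey = requireEnv("STRIPE_SECRET_KEY");
```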

For team deployments where multiple people need to run tools, a secrets vault (HashiCorp Vault, Doppler, AWS SSM Parameter Store) lets you manage credentials centrally without sharing .env files over Slack.

Monitoring and Alerting

Once deployed, set up basic monitoring so you know when the server goes down before your users do.

A simple uptime check (UptimeRobot, BetterUptime) that pings your /health endpoint every minute is usually enough. Add a GET /health endpoint to your server:

```javascript
app.get("/health", (req, res) => {
  res.json({ status: "ok", timestamp: new Date().toISOString() });
});
```

For deeper observability — request logs, error tracking, latency metrics — you'll either integrate a logging service (Datadog, Grafana, Logtail) or use a platform that provides this built in. MCPCore includes real-time traffic logs, error logs with full stack traces, and per-server analytics as part of the dashboard, which removes the need to wire up external monitoring tools.

Zero-Downtime Updates

When you update tool code, you don't want to drop in-flight requests. With PM2:

```shell
pm2 reload mcp-server
```

reload performs a rolling restart — new processes are brought up before the old ones are shut down, and in-flight requests are allowed to complete. Note that a true zero-downtime reload requires running the app in PM2's cluster mode (pm2 start server.js -i 1); in the default fork mode, reload behaves much like a plain restart.

With Docker, use rolling updates: start the new container, verify it's healthy, then stop the old one.
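That sequence can be sketched as a shell function. The container names, the spare port 3001, and the /health check are assumptions carried over from the examples above; in a real setup you would also repoint the Nginx upstream at the new port (or run a blue/green pair behind the proxy):

```shell
#!/usr/bin/env sh
# Rolling-update sketch: start the new container on a spare port,
# verify /health, then retire the old container.
rolling_update() {
  docker run -d -p 3001:3000 --name mcp-server-new my-mcp-server || return 1
  # Give the new container a few chances to report healthy.
  for attempt in 1 2 3 4 5; do
    if curl -fsS http://localhost:3001/health >/dev/null; then
      docker stop mcp-server && docker rm mcp-server
      docker rename mcp-server-new mcp-server
      return 0
    fi
    sleep 2
  done
  echo "new container never became healthy; rolling back" >&2
  docker rm -f mcp-server-new
  return 1
}
```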

Production Checklist

Before going live, confirm all of the following:

  • HTTPS is enabled and valid (no self-signed certificates)
  • Authentication is required (public-access tools are an intentional choice, not the default)
  • Rate limiting is configured
  • Secrets are in environment variables, not source code
  • Process manager is running and configured to survive reboots
  • Error logging is in place
  • A /health endpoint exists for uptime monitoring
  • The endpoint URL has been tested from an actual AI client before announcing it to users
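For that last item, a quick smoke test before pointing a real client at the endpoint is to send a JSON-RPC initialize request the way a Streamable HTTP client would. A sketch (the URL and bearer token are placeholders; the exact response shape depends on your server):

```shell
# Send an MCP initialize request to a deployed endpoint.
# Usage: smoke_test https://mcp.yourcompany.com/mcp "$MCP_API_KEY"
smoke_test() {
  curl -sS "$1" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "Authorization: Bearer $2" \
    -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"smoke-test","version":"1.0.0"}}}'
}
```

A well-configured server should answer with an initialize result; a 401 here means the auth layer is working but your token is wrong, and a hung connection usually points at proxy buffering.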

MCP deployment patterns are evolving quickly. The Streamable HTTP transport (introduced in protocol version 2025-03-26) replaced the older HTTP+SSE transport — make sure any libraries or tooling you use support the current spec.