Table of Contents
- Introduction
- Understanding the Foundations
- Setting Up Your Django Project
- Integrating LLM Models with Django
- Building Views and API Endpoints
- Database Design for LLM Applications
- Frontend Integration with HTMX
- Advanced Patterns and Best Practices
- Scaling and Performance Optimization
- Deployment to Production
- Resources and Further Learning
Introduction
Building web applications that leverage Large Language Models (LLMs) has become increasingly accessible to Django developers. Whether you’re creating an AI-powered chatbot, content generation tool, or intelligent assistant, Django provides a robust framework for integrating LLMs into production applications.
This comprehensive guide walks you through every step—from initial project setup to deploying a fully functional LLM-powered Django application. You’ll learn how to structure your code, handle API calls efficiently, manage costs, and ensure your application scales reliably under production loads.
Understanding the Foundations
What Are Large Language Models?
Large Language Models are sophisticated neural networks trained on vast amounts of text data. They can perform tasks like text generation, summarization, question answering, and code completion. Popular examples include proprietary models such as GPT-4 and open-weight alternatives such as Mistral and LLaMA.
Why Django for LLM Applications?
Django excels at building LLM applications because it provides:
- Batteries-included framework: Built-in admin panel, ORM, authentication, and security features
- Rapid development: Clean, pragmatic design patterns accelerate development
- Scalability: Proven in production at scale across thousands of applications
- Rich ecosystem: Extensive third-party packages for AI integration
- Security: Built-in protection against common vulnerabilities
Key Technologies You’ll Use
- Django: Web framework for building the application
- Django REST Framework: Optional but recommended for API development
- LangChain: Simplifies LLM orchestration and chains
- Transformers library: Access to thousands of pretrained models
- HTMX: Modern frontend interactivity without heavy JavaScript
- PostgreSQL: Recommended database for production
- Celery: For asynchronous task processing
- Docker: Containerization for consistent deployment
Setting Up Your Django Project
Installation and Initial Setup
Start by creating a virtual environment and installing Django:
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install Django
pip install Django
Create a new Django project:
django-admin startproject myproject
cd myproject
Create a new Django app for your LLM functionality:
python manage.py startapp llm_app
Project Structure
Organize your project with a clear structure:
myproject/
├── myproject/
│ ├── __init__.py
│ ├── settings.py
│ ├── urls.py
│ └── wsgi.py
├── llm_app/
│ ├── migrations/
│ ├── __init__.py
│ ├── admin.py
│ ├── apps.py
│ ├── models.py
│ ├── views.py
│ ├── urls.py
│ ├── serializers.py
│ ├── services.py
│ └── templates/
├── static/
│ ├── css/
│ ├── js/
│ └── htmx.min.js
├── templates/
│ └── base.html
├── manage.py
├── requirements.txt
└── .env
Installing Essential Dependencies
Create a requirements.txt file with all necessary packages (the pinned versions below are examples; update them to the current releases for your environment):
Django==4.2.0
djangorestframework==3.14.0
python-dotenv==1.0.0
langchain==0.1.0
langchain-openai==0.0.5
openai==1.3.0
transformers==4.35.0
torch==2.1.0
psycopg2-binary==2.9.9
celery==5.3.4
redis==5.0.0
requests==2.31.0
python-decouple==3.8
Install all dependencies:
pip install -r requirements.txt
Configuring Django Settings
Update your settings.py with essential configurations:
import os
from pathlib import Path
from decouple import config
BASE_DIR = Path(__file__).resolve().parent.parent
SECRET_KEY = config('SECRET_KEY', default='your-secret-key-here')
DEBUG = config('DEBUG', default=False, cast=bool)
ALLOWED_HOSTS = config('ALLOWED_HOSTS', default='localhost,127.0.0.1', cast=lambda v: [s.strip() for s in v.split(',')])
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
    'rest_framework',
    'rest_framework.authtoken',  # required for TokenAuthentication configured below
'llm_app',
]
MIDDLEWARE = [
'django.middleware.security.SecurityMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
ROOT_URLCONF = 'myproject.urls'
TEMPLATES = [
{
'BACKEND': 'django.template.backends.django.DjangoTemplates',
'DIRS': [BASE_DIR / 'templates'],
'APP_DIRS': True,
'OPTIONS': {
'context_processors': [
'django.template.context_processors.debug',
'django.template.context_processors.request',
'django.contrib.auth.context_processors.auth',
'django.contrib.messages.context_processors.messages',
],
},
},
]
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': config('DB_NAME', default='llm_db'),
'USER': config('DB_USER', default='postgres'),
'PASSWORD': config('DB_PASSWORD', default='password'),
'HOST': config('DB_HOST', default='localhost'),
'PORT': config('DB_PORT', default='5432'),
}
}
# REST Framework Configuration
REST_FRAMEWORK = {
'DEFAULT_PAGINATION_CLASS': 'rest_framework.pagination.PageNumberPagination',
'PAGE_SIZE': 10,
'DEFAULT_AUTHENTICATION_CLASSES': [
'rest_framework.authentication.TokenAuthentication',
],
'DEFAULT_PERMISSION_CLASSES': [
'rest_framework.permissions.IsAuthenticated',
],
}
# Celery Configuration
CELERY_BROKER_URL = config('CELERY_BROKER_URL', default='redis://localhost:6379')
CELERY_RESULT_BACKEND = config('CELERY_RESULT_BACKEND', default='redis://localhost:6379')
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
# LLM Configuration
OPENAI_API_KEY = config('OPENAI_API_KEY', default='')
LLM_MODEL = config('LLM_MODEL', default='gpt-3.5-turbo')
LLM_TEMPERATURE = config('LLM_TEMPERATURE', default=0.7, cast=float)
STATIC_URL = '/static/'
STATIC_ROOT = BASE_DIR / 'staticfiles'
STATICFILES_DIRS = [BASE_DIR / 'static']
MEDIA_URL = '/media/'
MEDIA_ROOT = BASE_DIR / 'media'
Create a .env file in your project root (and add it to .gitignore so secrets stay out of version control):
SECRET_KEY=your-super-secret-key-change-in-production
DEBUG=True
ALLOWED_HOSTS=localhost,127.0.0.1
DB_NAME=llm_db
DB_USER=postgres
DB_PASSWORD=your_password
DB_HOST=localhost
DB_PORT=5432
OPENAI_API_KEY=your-openai-api-key
LLM_MODEL=gpt-3.5-turbo
CELERY_BROKER_URL=redis://localhost:6379
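The Celery settings above configure the broker, but Celery also needs an application module so the tasks defined later in this guide are discovered. This is the standard Django bootstrap from the Celery documentation, conventionally placed at myproject/celery.py:
# myproject/celery.py
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

app = Celery('myproject')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
Then load it from myproject/__init__.py so the app is created whenever Django starts:
# myproject/__init__.py
from .celery import app as celery_app

__all__ = ('celery_app',)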
Integrating LLM Models with Django
Understanding LLM Integration Approaches
There are several ways to integrate LLMs with Django:
- API-based: Call external LLM services (OpenAI, Anthropic, Mistral)
- Local models: Run open-source models locally using Transformers
- Hybrid approach: Use external APIs for complex tasks, local models for simple ones (a routing sketch follows this list)
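The hybrid approach amounts to a thin dispatch layer. A minimal sketch, assuming the llm_service and local_llm singletons built in the next two subsections; the module name and length threshold are purely illustrative:
# llm_app/router.py - hypothetical helper for the hybrid approach
from .services import llm_service            # API-backed service (next subsection)
from .local_llm_service import local_llm     # local model (two subsections down)

# Illustrative cutoff; tune for your own workload and quality requirements
COMPLEX_PROMPT_THRESHOLD = 200

def generate(prompt: str, context: str = "") -> str:
    """Route short, context-free prompts to the cheap local model
    and everything else to the external API."""
    if not context and len(prompt) < COMPLEX_PROMPT_THRESHOLD:
        return local_llm.generate_response(prompt)
    return llm_service.generate_response(prompt, context=context)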
Creating an LLM Service Layer
The service layer pattern separates LLM logic from your views, making code more maintainable:
# llm_app/services.py
import json
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from decouple import config
class LLMService:
def __init__(self):
self.api_key = config('OPENAI_API_KEY')
self.model_name = config('LLM_MODEL', default='gpt-3.5-turbo')
self.temperature = config('LLM_TEMPERATURE', default=0.7, cast=float)
self.chat_model = ChatOpenAI(
api_key=self.api_key,
model=self.model_name,
temperature=self.temperature
)
def generate_response(self, prompt: str, context: str = "") -> str:
"""Generate a response from the LLM"""
template = """
You are a helpful assistant. Use the provided context to answer questions accurately.
Context: {context}
User Query: {prompt}
Answer:
"""
prompt_template = ChatPromptTemplate.from_template(template)
chain = prompt_template | self.chat_model | StrOutputParser()
response = chain.invoke({
"context": context,
"prompt": prompt
})
return response
def summarize_text(self, text: str) -> str:
"""Summarize provided text"""
template = "Please provide a concise summary of the following text:\n\n{text}"
prompt_template = ChatPromptTemplate.from_template(template)
chain = prompt_template | self.chat_model | StrOutputParser()
summary = chain.invoke({"text": text})
return summary
    def extract_entities(self, text: str) -> dict:
        """Extract named entities from text"""
        template = """Extract all named entities (people, places, organizations) from the following text.
        Return only valid JSON.
        Text: {text}"""
        prompt_template = ChatPromptTemplate.from_template(template)
        chain = prompt_template | self.chat_model | StrOutputParser()
        result = chain.invoke({"text": text})
        # The model returns a JSON string; parse it so the return type matches the annotation
        try:
            return json.loads(result)
        except json.JSONDecodeError:
            return {"raw": result}
# Create a singleton instance
llm_service = LLMService()
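Before wiring the service into views, you can exercise it from python manage.py shell. This assumes OPENAI_API_KEY is set in your .env; responses will vary:
from llm_app.services import llm_service

reply = llm_service.generate_response(
    "What is Django?",
    context="The user is a beginner web developer.",
)
print(reply)

print(llm_service.summarize_text("Django is a high-level Python web framework..."))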
Using Transformers for Local Models
For on-premises deployments or cost savings, use local models:
# llm_app/local_llm_service.py
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import torch
class LocalLLMService:
def __init__(self, model_name: str = "gpt2"):
"""Initialize with a local model"""
self.device = "cuda" if torch.cuda.is_available() else "cpu"
self.model_name = model_name
# Load tokenizer and model
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16 if self.device == "cuda" else torch.float32
).to(self.device)
# Create pipeline for text generation
self.generator = pipeline(
'text-generation',
model=self.model,
tokenizer=self.tokenizer,
device=0 if self.device == "cuda" else -1
)
    def generate_response(self, prompt: str, max_length: int = 100) -> str:
        """Generate response using local model"""
        result = self.generator(
            prompt,
            max_length=max_length,
            num_return_sequences=1,
            do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
            temperature=0.7,
            top_p=0.9
        )
        # The pipeline returns a list of dicts, one per generated sequence
        return result[0]['generated_text']
    def summarize_text(self, text: str) -> str:
        """Summarize using an abstractive summarization model"""
        from transformers import pipeline
        summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
        summary = summarizer(text, max_length=100, min_length=30, do_sample=False)
        # The summarization pipeline also returns a list of dicts
        return summary[0]['summary_text']
# Initialize with a specific model
local_llm = LocalLLMService(model_name="gpt2")
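Instantiating the service at module level loads the model weights as soon as Django imports the file and duplicates them in every worker process. A common alternative is lazy initialization; a minimal sketch of the same module using an on-demand singleton:
# llm_app/local_llm_service.py - lazy variant (illustrative pattern)
_local_llm = None

def get_local_llm(model_name: str = "gpt2") -> LocalLLMService:
    """Create the service on first use instead of at import time."""
    global _local_llm
    if _local_llm is None:
        _local_llm = LocalLLMService(model_name=model_name)
    return _local_llm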
Building Views and API Endpoints
Creating Django Models for LLM Data
Define models to store conversations and results:
# llm_app/models.py
from django.db import models
from django.contrib.auth.models import User
class Conversation(models.Model):
"""Store conversation history"""
user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='conversations')
title = models.CharField(max_length=255, blank=True)
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
is_archived = models.BooleanField(default=False)
class Meta:
ordering = ['-updated_at']
def __str__(self):
return self.title or f"Conversation {self.id}"
class Message(models.Model):
"""Store individual messages in conversations"""
ROLE_CHOICES = [
('user', 'User'),
('assistant', 'Assistant'),
]
conversation = models.ForeignKey(
Conversation,
on_delete=models.CASCADE,
related_name='messages'
)
role = models.CharField(max_length=10, choices=ROLE_CHOICES)
content = models.TextField()
tokens_used = models.IntegerField(default=0)
created_at = models.DateTimeField(auto_now_add=True)
class Meta:
ordering = ['created_at']
def __str__(self):
return f"{self.role}: {self.content[:50]}"
class APIUsageLog(models.Model):
"""Track API usage for cost monitoring"""
user = models.ForeignKey(User, on_delete=models.CASCADE)
model_used = models.CharField(max_length=100)
input_tokens = models.IntegerField()
output_tokens = models.IntegerField()
cost = models.DecimalField(max_digits=10, decimal_places=6)
timestamp = models.DateTimeField(auto_now_add=True)
class Meta:
ordering = ['-timestamp']
def __str__(self):
return f"{self.user.username} - {self.model_used} - {self.timestamp}"
class Document(models.Model):
"""Store documents for RAG (Retrieval-Augmented Generation)"""
user = models.ForeignKey(User, on_delete=models.CASCADE)
title = models.CharField(max_length=255)
content = models.TextField()
file = models.FileField(upload_to='documents/', null=True, blank=True)
created_at = models.DateTimeField(auto_now_add=True)
embedding_vector = models.JSONField(null=True, blank=True)
def __str__(self):
return self.title
Run migrations:
python manage.py makemigrations
python manage.py migrate
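With migrations applied, registering the models makes conversations, messages, and usage logs browsable through Django's built-in admin. A minimal sketch for llm_app/admin.py:
# llm_app/admin.py
from django.contrib import admin
from .models import Conversation, Message, APIUsageLog, Document

@admin.register(Conversation)
class ConversationAdmin(admin.ModelAdmin):
    list_display = ('title', 'user', 'updated_at', 'is_archived')
    list_filter = ('is_archived',)
    search_fields = ('title',)

@admin.register(Message)
class MessageAdmin(admin.ModelAdmin):
    list_display = ('conversation', 'role', 'tokens_used', 'created_at')
    list_filter = ('role',)

admin.site.register(APIUsageLog)
admin.site.register(Document)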
Creating API Serializers
# llm_app/serializers.py
from rest_framework import serializers
from .models import Conversation, Message, Document
class MessageSerializer(serializers.ModelSerializer):
class Meta:
model = Message
fields = ['id', 'role', 'content', 'tokens_used', 'created_at']
read_only_fields = ['id', 'created_at', 'tokens_used']
class ConversationSerializer(serializers.ModelSerializer):
messages = MessageSerializer(many=True, read_only=True)
class Meta:
model = Conversation
fields = ['id', 'title', 'created_at', 'updated_at', 'is_archived', 'messages']
read_only_fields = ['id', 'created_at', 'updated_at']
class DocumentSerializer(serializers.ModelSerializer):
class Meta:
model = Document
fields = ['id', 'title', 'content', 'file', 'created_at']
read_only_fields = ['id', 'created_at']
class ChatRequestSerializer(serializers.Serializer):
"""Validate incoming chat requests"""
message = serializers.CharField(max_length=4000)
conversation_id = serializers.IntegerField(required=False)
context = serializers.CharField(required=False, allow_blank=True)
Building REST API Views
# llm_app/views.py
from rest_framework import viewsets, status
from rest_framework.decorators import action
from rest_framework.response import Response
from rest_framework.permissions import IsAuthenticated
from django.shortcuts import get_object_or_404
from .models import Conversation, Message, Document
from .serializers import (
ConversationSerializer,
MessageSerializer,
DocumentSerializer,
ChatRequestSerializer
)
from .services import llm_service
from .tasks import process_llm_request
class ConversationViewSet(viewsets.ModelViewSet):
"""Handle conversation CRUD operations"""
serializer_class = ConversationSerializer
permission_classes = [IsAuthenticated]
def get_queryset(self):
return Conversation.objects.filter(user=self.request.user)
def perform_create(self, serializer):
serializer.save(user=self.request.user)
@action(detail=True, methods=['post'])
def send_message(self, request, pk=None):
"""Send a message in a conversation"""
conversation = self.get_object()
serializer = ChatRequestSerializer(data=request.data)
if serializer.is_valid():
message_text = serializer.validated_data['message']
context = serializer.validated_data.get('context', '')
# Save user message
user_message = Message.objects.create(
conversation=conversation,
role='user',
content=message_text
)
# Generate LLM response asynchronously
process_llm_request.delay(
conversation_id=conversation.id,
user_message_id=user_message.id,
context=context
)
return Response({
'status': 'processing',
'message_id': user_message.id
}, status=status.HTTP_202_ACCEPTED)
return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)
@action(detail=True, methods=['get'])
def messages(self, request, pk=None):
"""Retrieve all messages in a conversation"""
conversation = self.get_object()
messages = conversation.messages.all()
serializer = MessageSerializer(messages, many=True)
return Response(serializer.data)
@action(detail=True, methods=['post'])
def archive(self, request, pk=None):
"""Archive a conversation"""
conversation = self.get_object()
conversation.is_archived = True
conversation.save()
return Response({'status': 'archived'})
class DocumentViewSet(viewsets.ModelViewSet):
"""Handle document uploads and management"""
serializer_class = DocumentSerializer
permission_classes = [IsAuthenticated]
def get_queryset(self):
return Document.objects.filter(user=self.request.user)
def perform_create(self, serializer):
serializer.save(user=self.request.user)
URL Configuration
# llm_app/urls.py
from django.urls import path, include
from rest_framework.routers import DefaultRouter
from .views import ConversationViewSet, DocumentViewSet
router = DefaultRouter()
router.register(r'conversations', ConversationViewSet, basename='conversation')
router.register(r'documents', DocumentViewSet, basename='document')
urlpatterns = [
path('api/', include(router.urls)),
]
# In myproject/urls.py
from django.contrib import admin
from django.urls import path, include
from django.conf import settings
from django.conf.urls.static import static
urlpatterns = [
path('admin/', admin.site.urls),
path('api-auth/', include('rest_framework.urls')),
path('llm/', include('llm_app.urls')),
] + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
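With the routes wired up, you can drive the API from any HTTP client. A sketch using the requests package from requirements.txt; the host and token are placeholders, and it assumes you have generated a DRF token for your user (python manage.py drf_create_token <username>, available once rest_framework.authtoken is installed):
import requests

BASE = 'http://localhost:8000/llm/api'
HEADERS = {'Authorization': 'Token <your-token-here>'}  # placeholder token

# Create a conversation
conv = requests.post(f'{BASE}/conversations/',
                     json={'title': 'First chat'},
                     headers=HEADERS).json()

# Send a message; the API answers 202 while Celery does the work
resp = requests.post(f'{BASE}/conversations/{conv["id"]}/send_message/',
                     json={'message': 'Hello!'},
                     headers=HEADERS)
print(resp.status_code, resp.json())  # 202 {'status': 'processing', ...}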
Database Design for LLM Applications
Optimizing Database Schema
For optimal performance with LLM applications:
# llm_app/models.py - Enhanced with indexing
from django.db import models
from django.contrib.auth.models import User  # needed by APIUsageLog below

class Message(models.Model):
    conversation = models.ForeignKey('Conversation', on_delete=models.CASCADE, related_name='messages')
role = models.CharField(max_length=10, db_index=True)
content = models.TextField()
tokens_used = models.IntegerField()
created_at = models.DateTimeField(auto_now_add=True, db_index=True)
class Meta:
indexes = [
models.Index(fields=['conversation', 'created_at']),
models.Index(fields=['role', 'created_at']),
]
class APIUsageLog(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)  # ForeignKey columns are indexed automatically
model_used = models.CharField(max_length=100, db_index=True)
input_tokens = models.IntegerField()
output_tokens = models.IntegerField()
cost = models.DecimalField(max_digits=10, decimal_places=6)
timestamp = models.DateTimeField(auto_now_add=True, db_index=True)
class Meta:
indexes = [
models.Index(fields=['user', 'timestamp']),
models.Index(fields=['model_used', 'timestamp']),
]
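These composite indexes make per-user reporting cheap. For example, a monthly spend query with the ORM (a sketch; user is whatever User instance you are reporting on):
from django.db.models import Sum
from django.utils import timezone
from llm_app.models import APIUsageLog

month_start = timezone.now().replace(day=1, hour=0, minute=0,
                                     second=0, microsecond=0)
monthly = APIUsageLog.objects.filter(
    user=user, timestamp__gte=month_start
).aggregate(
    total_cost=Sum('cost'),
    input_tokens=Sum('input_tokens'),
    output_tokens=Sum('output_tokens'),
)
print(monthly)  # e.g. {'total_cost': Decimal('1.234567'), ...}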
Implementing Pagination and Caching
# llm_app/views.py - With caching
from django.core.cache import cache
from rest_framework.pagination import PageNumberPagination
from rest_framework.pagination import PageNumberPagination
class MessagePagination(PageNumberPagination):
page_size = 20
page_size_query_param = 'page_size'
max_page_size = 100
class ConversationViewSet(viewsets.ModelViewSet):
pagination_class = MessagePagination
@action(detail=True, methods=['get'])
def messages(self, request, pk=None):
conversation = self.get_object()
# Check cache first
cache_key = f'conversation_{pk}_messages'
cached_messages = cache.get(cache_key)
        if cached_messages is not None:  # an empty list is still a valid cached value
return Response(cached_messages)
messages = conversation.messages.all()
serializer = MessageSerializer(messages, many=True)
# Cache for 5 minutes
cache.set(cache_key, serializer.data, 300)
return Response(serializer.data)
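The cached list goes stale the moment the Celery worker saves the assistant's reply, so pair the cache with invalidation. One approach is a post_save signal (a sketch; the key format must match the view above):
# llm_app/signals.py
from django.core.cache import cache
from django.db.models.signals import post_save
from django.dispatch import receiver
from .models import Message

@receiver(post_save, sender=Message)
def invalidate_message_cache(sender, instance, **kwargs):
    """Drop the cached list whenever a message is created or edited."""
    cache.delete(f'conversation_{instance.conversation_id}_messages')
Import the signals module from llm_app/apps.py (inside AppConfig.ready) so the receiver is registered at startup.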
Frontend Integration with HTMX
Setting Up HTMX
HTMX enables dynamic interactions without writing complex JavaScript:
<!-- templates/base.html -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Django LLM App</title>
<script src="https://unpkg.com/htmx.org@1.9.10"></script> <!-- or serve the vendored copy from static/htmx.min.js -->
<style>
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
min-height: 100vh;
padding: 20px;
}
.container {
max-width: 1200px;
margin: 0 auto;
background: white;
border-radius: 12px;
box-shadow: 0 20px 60px rgba(0,0,0,0.3);
overflow: hidden;
}
.chat-container {
display: flex;
height: 600px;
}
.messages {
flex: 1;
overflow-y: auto;
padding: 20px;
background: #f8f9fa;
}
.message {
margin-bottom: 15px;
padding: 12px 16px;
border-radius: 8px;
max-width: 70%;
word-wrap: break-word;
}
.message.user {
background: #667eea;
color: white;
margin-left: auto;
}
.message.assistant {
background: white;
border: 1px solid #e0e0e0;
}
.input-area {
padding: 20px;
border-top: 1px solid #e0e0e0;
display: flex;
gap: 10px;
}
.input-area input {
flex: 1;
padding: 12px;
border: 1px solid #ddd;
border-radius: 8px;
font-size: 14px;
}
.input-area button {
padding: 12px 24px;
background: #667eea;
color: white;
border: none;
border-radius: 8px;
cursor: pointer;
font-weight: 600;
}
.input-area button:hover {
background: #5568d3;
}
.htmx-request .htmx-indicator {
display: inline-block;
}
.htmx-indicator {
display: none;
}
.spinner {
border: 3px solid #f3f3f3;
border-top: 3px solid #667eea;
border-radius: 50%;
width: 20px;
height: 20px;
animation: spin 1s linear infinite;
}
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
</style>
</head>
<body>
<div class="container">
{% block content %}{% endblock %}
</div>
</body>
</html>
Creating an HTMX-Powered Chat Interface
Because send_message replies with JSON (202 Accepted) while Celery generates the response, there is no HTML fragment to swap into the page directly. Instead, the message list polls a small HTML partial (built in the next subsection) and the form simply resets once the request is accepted:
<!-- templates/chat.html -->
{% extends 'base.html' %}
{% block content %}
<div class="chat-container">
<div class="messages" id="messages"
     hx-get="{% url 'conversation-messages-partial' conversation.id %}"
     hx-trigger="every 2s"
     hx-swap="innerHTML">
{% for message in conversation.messages.all %}
<div class="message {{ message.role }}">
{{ message.content }}
</div>
{% endfor %}
</div>
<div class="input-area">
<form hx-post="{% url 'conversation-send-message' conversation.id %}"
      hx-swap="none"
      hx-on::after-request="if(event.detail.xhr.status===202) this.reset()">
{% csrf_token %}
<input type="text"
name="message"
placeholder="Type your message..."
required>
<button type="submit">
Send
<span class="htmx-indicator">
<div class="spinner"></div>
</span>
</button>
</form>
</div>
</div>
<script>
// Auto-scroll to latest message
const messagesDiv = document.getElementById('messages');
messagesDiv.scrollTop = messagesDiv.scrollHeight;
document.body.addEventListener('htmx:afterSwap', function() {
messagesDiv.scrollTop = messagesDiv.scrollHeight;
});
</script>
{% endblock %}
HTMX View for Message Rendering
# llm_app/views.py - Add template view
from django.views.generic import TemplateView
from django.contrib.auth.mixins import LoginRequiredMixin
class ChatTemplateView(LoginRequiredMixin, TemplateView):
template_name = 'chat.html'
def get_context_data(self, **kwargs):
context = super().get_context_data(**kwargs)
conversation_id = self.kwargs.get('conversation_id')
context['conversation'] = get_object_or_404(
Conversation,
id=conversation_id,
user=self.request.user
)
return context
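To serve the fragment the chat page polls, add a view that renders only the message list, plus a URL named conversation-messages-partial. A minimal sketch; the URL path and the messages_partial.html template name are illustrative choices:
# llm_app/views.py - HTML fragment polled by chat.html
# (Conversation and get_object_or_404 are already imported above)
from django.contrib.auth.decorators import login_required
from django.shortcuts import render

@login_required
def messages_partial(request, conversation_id):
    """Render just the message list for HTMX polling."""
    conversation = get_object_or_404(
        Conversation, id=conversation_id, user=request.user
    )
    return render(request, 'messages_partial.html',
                  {'messages': conversation.messages.all()})
Register it alongside the router and create the fragment template:
# llm_app/urls.py - addition
from .views import messages_partial

urlpatterns += [
    path('conversations/<int:conversation_id>/messages/partial/',
         messages_partial, name='conversation-messages-partial'),
]

<!-- templates/messages_partial.html -->
{% for message in messages %}
<div class="message {{ message.role }}">{{ message.content }}</div>
{% endfor %}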
Advanced Patterns and Best Practices
Implementing Asynchronous Task Processing with Celery
For long-running LLM requests, use Celery:
# llm_app/tasks.py
from celery import shared_task
from django.core.mail import send_mail
from .models import Conversation, Message, APIUsageLog
from .services import llm_service
import logging
logger = logging.getLogger(__name__)
@shared_task(bind=True, max_retries=3)
def process_llm_request(self, conversation_id, user_message_id, context=''):
"""Process LLM request asynchronously"""
try: