The rise of agent programming and the integration of large language models (LLMs) are changing how we build, test, and monitor web applications. As systems become more autonomous and complex, robust performance and real-time observability are no longer optional; they are essential.
This article walks you through setting up a modern performance testing and monitoring stack using K6, InfluxDB, and Grafana, all within Docker containers. We'll also discuss why these tools matter in the context of agent-driven architectures and LLM-powered applications.
Why Performance Monitoring Matters in the Age of Agents and LLMs
Agent programming—where autonomous software agents interact, learn, and adapt—demands systems that are not only functional but also resilient under unpredictable loads. LLMs, meanwhile, introduce new performance variables: inference latency, API throughput, and dynamic user interactions.
Key reasons to invest in performance monitoring:
- Scalability: Agents and LLMs can generate bursty, unpredictable traffic.
- Reliability: Automated systems must recover gracefully from failures.
- User Experience: Slow responses from LLM-powered features can degrade trust.
- Continuous Improvement: Real-time metrics enable rapid iteration and optimization.
Setting Up Your Performance Testing and Monitoring Stack
Let's get hands-on! Here's how to deploy K6, InfluxDB, and Grafana using Docker, and connect them for seamless performance testing and visualization.
1. Directory Structure
Organize your files for clarity:
tests/performance/
├── docker-compose.yml
├── k6/
│   └── your_test_script.js
└── grafana-provisioning/
    ├── datasources/
    │   └── datasource.yml
    └── dashboards/
        └── k6-dashboard.json
2. Docker Compose File
Spin up InfluxDB and Grafana with this docker-compose.yml:
version: '3.7'
services:
  influxdb:
    image: influxdb:1.8
    container_name: influxdb
    ports:
      - "8086:8086"
    environment:
      - INFLUXDB_DB=k6
      - INFLUXDB_ADMIN_USER=admin
      - INFLUXDB_ADMIN_PASSWORD=admin123
    volumes:
      - influxdb-data:/var/lib/influxdb
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana-provisioning/datasources:/etc/grafana/provisioning/datasources
      - ./grafana-provisioning/dashboards:/etc/grafana/provisioning/dashboards
volumes:
  influxdb-data:
  grafana-data:
3. Provision Grafana Data Source
Create grafana-provisioning/datasources/datasource.yml:
apiVersion: 1
datasources:
  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086
    database: k6
    isDefault: true
4. Start the Monitoring Stack
From your tests/performance directory, run:
docker-compose up -d
5. Install and Run K6
Option 1: Local Installation (macOS with Homebrew; see the K6 installation docs for other platforms)
brew install k6
Option 2: Run K6 in a Docker Container
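# Join the Compose network so the "influxdb" host name resolves. The name below assumes
# the default project name taken from the tests/performance directory (check: docker network ls).
docker run -i --rm \
  --network performance_default \
  -v "$(pwd)/k6:/scripts" \
  grafana/k6 run /scripts/your_test_script.js \
  --out influxdb=http://influxdb:8086/k6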
6. Configure K6 Output
When running K6 directly on the host, send the results to InfluxDB through the published port:
k6 run your_test_script.js --out influxdb=http://localhost:8086/k6
Or, if running from a container in the same Docker network, use the service name instead:
k6 run your_test_script.js --out influxdb=http://influxdb:8086/k6
7. Access Grafana
- Open http://localhost:3000
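- Default login: admin/admin
- Import or use a K6 dashboard (JSON file) for real-time visualization. Grafana only auto-loads JSON files from the mounted dashboards directory when a dashboard provider YAML file is present there; otherwise, import the JSON through the Grafana UI.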
Summary Table of Components
| Component | Command/Config Example |
|---|---|
| InfluxDB | Docker Compose service, port 8086, DB: k6 |
| Grafana | Docker Compose service, port 3000, provisioned with InfluxDB |
| K6 | k6 run your_test_script.js --out influxdb=http://influxdb:8086/k6 |
Real-World Scenarios: What to Test
- LLM API Endpoints: Measure latency and throughput under concurrent requests.
- Agent Coordination: Simulate multiple agents interacting with your backend.
- User Workflows: Test login, dashboard, and data retrieval flows for bottlenecks.
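One way to cover the three areas above in a single run is K6's `scenarios` option. The sketch below only builds the configuration object. The scenario names, the `exec` function names (`llmRequest`, `agentLoop`, `userFlow`), and every rate and duration are illustrative assumptions rather than recommendations, and a real script would still need to export functions with those names.

```javascript
// Sketch: build a k6 `scenarios` object for the three test areas.
// Names and numbers are hypothetical; tune them for your own system.
function buildScenarios(baseVus) {
  return {
    // LLM API endpoints: start iterations at a fixed rate, so slow
    // responses do not reduce the load being applied.
    llm_api: {
      executor: 'constant-arrival-rate',
      rate: baseVus * 2, // iterations started per timeUnit
      timeUnit: '1s',
      duration: '1m',
      preAllocatedVUs: baseVus * 4,
      exec: 'llmRequest', // hypothetical exported function
    },
    // Agent coordination: a fixed pool of looping virtual users.
    agent_coordination: {
      executor: 'constant-vus',
      vus: baseVus,
      duration: '1m',
      exec: 'agentLoop', // hypothetical exported function
    },
    // User workflows: ramp up to the base load, then back down.
    user_workflows: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '30s', target: baseVus },
        { duration: '30s', target: 0 },
      ],
      exec: 'userFlow', // hypothetical exported function
    },
  };
}
```

In a K6 script this could be wired in as `export const options = { scenarios: buildScenarios(5) };`, replacing the top-level `vus` and `duration` settings.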
Sample K6 Test Script
Here's a basic K6 script to get you started:
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10, // 10 virtual users
  duration: '30s',
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests must complete below 500ms
    'http_req_duration{name:healthcheck}': ['p(99)<50'], // 99% of healthchecks must complete below 50ms
  },
};

export default function () {
  // Log in; K6 sends an object body as form-encoded data
  const loginRes = http.post('http://localhost:5001/api/login', {
    username: 'testuser',
    password: 'password123',
  });
  check(loginRes, {
    'login successful': (r) => r.status === 200,
    // A missing field comes back as undefined, so test for a truthy value
    'has auth token': (r) => Boolean(r.json('token')),
  });

  // Extract token for authenticated requests
  const token = loginRes.json('token');

  // Health check endpoint, tagged so the healthcheck threshold applies to it
  const healthCheck = http.get('http://localhost:5001/api/health', {
    tags: { name: 'healthcheck' },
  });
  check(healthCheck, {
    'status is up': (r) => r.json('status') === 'healthy',
  });

  // Test main dashboard endpoint
  const dashboardRes = http.get('http://localhost:5001/api/dashboard', {
    headers: { Authorization: `Bearer ${token}` },
  });
  check(dashboardRes, {
    'dashboard loaded': (r) => r.status === 200,
    // != null rejects both null and a missing (undefined) field
    'has dashboard data': (r) => r.json('data') != null,
  });

  sleep(1); // pause between iterations
}
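To make a threshold such as `p(95)<500` concrete, here is a small standalone sketch of how a percentile over request durations can be computed and compared with a limit. It uses linear interpolation between sorted samples, which is an assumption for illustration; K6's own calculation may differ in detail, and the sample durations are made up.

```javascript
// Sketch: percentile of a list of durations (ms) using linear interpolation.
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1); // fractional index into sorted
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower);
}

// Mirrors the meaning of a threshold such as 'p(95)<500'.
function passesThreshold(values, p, limitMs) {
  return percentile(values, p) < limitMs;
}

// Made-up sample: nine fast requests and one slow outlier.
const durations = [120, 130, 150, 160, 180, 200, 220, 250, 300, 900];
console.log(percentile(durations, 95));
console.log(passesThreshold(durations, 95, 500));
```

With this sample the 95th percentile lands at roughly 630 ms, so the threshold fails even though nine of the ten requests finished within 300 ms. A single slow outlier is enough to move a high percentile in a small sample.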
Advanced Tips
- Thresholds: Set performance thresholds in your K6 scripts (e.g., 95% of requests < 500ms).
- Custom Metrics: Track business-specific KPIs alongside system metrics.
- Alerting: Configure Grafana alerts for anomalies or SLA breaches.
- Progressive Load Testing: Start with few virtual users and gradually increase to find breaking points.
- Distributed Testing: The open-source K6 binary runs on a single machine, so for high-load scenarios spread the test across multiple machines with the k6 Kubernetes operator or Grafana Cloud k6.
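The progressive load testing tip can be sketched as a helper that generates K6 `stages`, stepping the virtual-user target up, holding at each level, and ramping down at the end. The function name `buildRampStages` and its parameters are hypothetical, and the step sizes are placeholders to adapt to your own system.

```javascript
// Sketch: build k6 `stages` that climb to maxVus in equal steps.
// Each step ramps to the next level, then holds there for one stepDuration.
function buildRampStages(maxVus, steps, stepDuration) {
  if (steps < 1) throw new Error('steps must be at least 1');
  const stages = [];
  for (let i = 1; i <= steps; i++) {
    const target = Math.round((maxVus * i) / steps);
    stages.push({ duration: stepDuration, target }); // ramp to this level
    stages.push({ duration: stepDuration, target }); // hold at this level
  }
  stages.push({ duration: stepDuration, target: 0 }); // ramp down
  return stages;
}
```

In a K6 script this could replace the fixed `vus` and `duration` settings, for example `export const options = { stages: buildRampStages(100, 4, '1m') };`. Watching the Grafana dashboard during each hold phase shows at which level latency or error rates start to climb.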
Troubleshooting
Common issues and solutions:
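- Connection errors: If K6 can't reach InfluxDB, check the Docker network settings and the host name in the --out URL (localhost when K6 runs on the host, influxdb when it runs in a container on the Compose network).
- Missing metrics: Confirm the --out URL points at the k6 database, and remember that a threshold such as http_req_duration{name:healthcheck} only sees requests tagged with name: healthcheck.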
- High memory usage: Consider batching results or using streaming output for long tests.
As agent programming and LLM integration become mainstream, robust performance testing and monitoring are critical. By combining K6, InfluxDB, and Grafana in a Dockerized environment, you gain a scalable, repeatable, and insightful workflow for ensuring your systems are ready for the demands of modern web development.
Ready to level up your observability? Try this stack on your next project and see the difference!
Have questions or want to see more example test scripts and dashboards? Drop an email to [email protected]