Architect Your Development Lifecycle with Context

The fastest-moving teams treat documentation as infrastructure, not an afterthought. SDLC.md gives you structured markdown templates that capture requirements, architecture decisions, and domain knowledge in a format both humans and AI understand.

Drop a CLAUDE.md or SDLC.md file into your repository root. AI coding assistants automatically consume it, gaining deep understanding of your project structure, conventions, and constraints before writing a single line of code.

Stop re-explaining your architecture in every prompt. Start versioning your context alongside your code and watch your entire team - human and AI - move faster with shared understanding.

SDLC Context Best Practices

Proven patterns for structuring your software development lifecycle documentation so AI assistants and team members get maximum value from every file.

Structure by Phase

Organize context files by SDLC phase - planning, design, implementation, testing, deployment. Each phase gets its own markdown section with clear ownership and review cadence.

Version Decision Records

Capture every architecture decision as a versioned ADR in markdown. Include the context, options considered, decision rationale, and consequences. Future developers and AI assistants need the why, not just the what.

Define Stakeholder Context

Document who owns each phase, who reviews, and who approves. Clear ownership in your SDLC context prevents decisions from falling through the cracks and helps AI assistants route questions appropriately.

Keep Context Fresh

Stale documentation is worse than no documentation - it actively misleads. Set review cadences per file type: architecture docs quarterly, API specs with every release, onboarding guides monthly.

Make Context Discoverable

Use consistent naming conventions and a root-level index file that maps context files to their purpose. AI assistants and new team members should find the right file in seconds, not minutes.
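
A root-level index can be as simple as a short table mapping each file to its purpose and review cadence. A hypothetical sketch (file names are illustrative, not prescribed):

```markdown
# Context Index

| File | Purpose | Review cadence |
|------|---------|----------------|
| SDLC.md | Full lifecycle context for AI assistants | Quarterly |
| docs/adr/ | Architecture decision records | On change |
| docs/api-spec.md | API contracts and versioning | Every release |
| docs/onboarding.md | New developer setup and conventions | Monthly |
```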

Layer Your Context

Build context in layers - project overview at the top, domain details in the middle, implementation specifics at the bottom. AI assistants perform best when context flows from broad to narrow.

Test Your Context

Give your context files to a new team member or AI assistant and measure how quickly they produce correct output. If the context does not enable accurate work, iterate on it just as you would on code.

Guard Sensitive Context

Mark sections that contain security constraints, compliance requirements, or access control rules. AI assistants need to know what they cannot do as much as what they should do.

The SDLC Principle

Context is not documentation - it is infrastructure. The teams that move fastest are the ones who treat their markdown files with the same rigor as their code: versioned, reviewed, tested, and continuously improved. When your SDLC context is strong, AI assistants write better code, new developers onboard faster, and architectural decisions compound rather than contradict.

The SDLC Template

# SDLC.md - Software Development Lifecycle Context
<!-- Template for AI coding assistants (CLAUDE.md, .cursorrules, etc.) -->
<!-- Provides full project lifecycle context: architecture, stack, workflows, deployment -->
<!-- Last updated: YYYY-MM-DD -->

## Project Overview

**Project**: Meridian - Customer Analytics Platform
**Version**: 3.2.1
**Status**: Production
**Repository**: https://github.com/acme-corp/meridian
**Team**: Platform Engineering (6 developers, 2 QA)

### Elevator Pitch

Meridian is a real-time customer analytics platform that ingests event data from web and mobile clients, processes it through a streaming pipeline, and serves interactive dashboards to business users. It replaces our legacy batch-processing system with sub-second query performance across 2TB+ of event data.

### Key Business Context

- Serves 340 internal business users across marketing, product, and sales
- Processes 12M events/day from 3 client applications
- SLA: 99.9% uptime, dashboard queries under 2 seconds
- Revenue impact: directly supports $4.2M ARR through customer insights

## Architecture Decisions (ADRs)

### ADR-001: ClickHouse for Analytics Storage
**Status**: Accepted
**Date**: 2026-02-09
**Context**: Our PostgreSQL analytics tables hit 800M rows. Aggregate queries were taking 30+ seconds even with materialized views and proper indexing. Business users complained about dashboard load times daily.
**Decision**: Adopt ClickHouse as the primary analytics data store. PostgreSQL remains for transactional data (users, configs, permissions). Events flow from Kafka into ClickHouse via a custom consumer service.
**Consequences**: Query performance improved from 30s to under 500ms for 95th percentile. Trade-off is that ClickHouse does not support UPDATE/DELETE well, so we use an append-only model with deduplication views. Team needed 3 weeks of ClickHouse training.
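
The append-only trade-off shows up in the read path. In Meridian this logic lives in a ClickHouse deduplication view; the conceptual model can be sketched in TypeScript as "keep only the latest version of each event ID":

```typescript
// Conceptual sketch of append-only storage with dedup-on-read.
// The real system does this in a ClickHouse view, not application code.
interface EventRow {
  eventId: string;
  version: number; // higher version supersedes lower (e.g. ingestion timestamp)
  payload: string;
}

// Appends never mutate existing rows; corrections arrive as new versions.
function dedupe(rows: EventRow[]): EventRow[] {
  const latest = new Map<string, EventRow>();
  for (const row of rows) {
    const seen = latest.get(row.eventId);
    if (!seen || row.version > seen.version) latest.set(row.eventId, row);
  }
  return [...latest.values()];
}
```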

### ADR-002: Migrate from REST to GraphQL for Dashboard API
**Status**: Accepted
**Date**: 2026-02-09
**Context**: Dashboard frontend was making 8-12 REST calls per page load to assemble widget data. This created waterfall latency and tight coupling between frontend components and backend endpoints.
**Decision**: Implement a GraphQL gateway (Apollo Server) that sits in front of the existing services. REST endpoints remain for mobile and third-party integrations.
**Consequences**: Frontend page loads dropped from 12 requests to 1-2. Reduced average dashboard load time by 60%. Added complexity to the backend - the GraphQL resolver layer requires its own testing and monitoring. Schema changes require coordination between frontend and backend teams.
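
The gateway pattern behind this ADR can be sketched in a few lines: one resolver fans out to the underlying services concurrently, so the browser issues a single request instead of assembling widget data itself. A simplified TypeScript sketch (the service functions are stand-ins, not the real Meridian API):

```typescript
// Hypothetical stand-ins for the services the gateway fronts.
async function fetchWidgets(dashboardId: string): Promise<string[]> {
  return [`widget-for-${dashboardId}`];
}
async function fetchFilters(dashboardId: string): Promise<string[]> {
  return [`filter-for-${dashboardId}`];
}

// A GraphQL-style resolver: one client request, concurrent backend fan-out.
// Under REST, the browser made these calls itself, one per widget.
async function resolveDashboard(dashboardId: string) {
  const [widgets, filters] = await Promise.all([
    fetchWidgets(dashboardId),
    fetchFilters(dashboardId),
  ]);
  return { dashboardId, widgets, filters };
}
```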

### ADR-003: Event-Driven Architecture with Kafka
**Status**: Accepted
**Date**: 2026-02-09
**Context**: The old system used synchronous API calls between services. When the analytics service was slow or down, it cascaded failures to the event ingestion API, causing data loss.
**Decision**: Introduce Apache Kafka as the central event bus. All services publish and consume events asynchronously. Events are persisted in Kafka for 7 days, allowing replay if a consumer falls behind or fails.
**Consequences**: Services are fully decoupled. We can add new consumers without modifying producers. Operational complexity increased - Kafka cluster requires dedicated monitoring and occasional partition rebalancing. Added ~200ms latency for event processing (acceptable for analytics use case).
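
The replay guarantee can be illustrated with a minimal in-memory log: consumers track their own offsets, so a consumer that fell behind or failed resumes from where it left off rather than losing events. A TypeScript sketch of the idea (not the real Kafka client):

```typescript
// Minimal in-memory sketch of Kafka-style replay semantics.
// Real code uses a Kafka client; this illustrates offset-based consumption.
class EventLog {
  private log: string[] = []; // retained events (Kafka: 7-day retention)

  publish(event: string): void {
    this.log.push(event); // append-only; consumers never mutate the log
  }

  // Each consumer reads from its own offset; a slow or restarted consumer
  // replays everything it has not yet processed.
  readFrom(offset: number): { events: string[]; nextOffset: number } {
    return { events: this.log.slice(offset), nextOffset: this.log.length };
  }
}
```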

## Tech Stack

### Frontend
- **Framework**: React 18 with TypeScript
- **State Management**: TanStack Query (server state) + Zustand (UI state)
- **Charting**: Recharts for dashboards, D3.js for custom visualizations
- **Styling**: Tailwind CSS v4 with component library (internal)
- **Build**: Vite 5, deployed as static assets to CloudFront

### Backend
- **Runtime**: Node.js 20 LTS
- **API Layer**: Apollo Server (GraphQL) + Express (REST)
- **Event Processing**: Custom Kafka consumers in Node.js
- **Task Queue**: BullMQ with Redis for scheduled reports and exports
- **Authentication**: Auth0 with SAML SSO for enterprise customers

### Data Layer
- **Transactional DB**: PostgreSQL 16 (AWS RDS) - users, configs, permissions
- **Analytics DB**: ClickHouse cluster (3 nodes) - event data, aggregations
- **Cache**: Redis 7 (ElastiCache) - session data, query cache, rate limiting
- **Message Bus**: Apache Kafka (MSK) - event streaming between services

### Infrastructure
- **Cloud**: AWS (us-east-1 primary, us-west-2 disaster recovery)
- **Container Orchestration**: EKS (Kubernetes 1.29)
- **CI/CD**: GitHub Actions with ArgoCD for GitOps deployments
- **Monitoring**: Datadog (APM, logs, metrics), PagerDuty (alerting)
- **IaC**: Terraform for AWS resources, Helm charts for K8s

## Development Workflows

### Local Setup
```bash
# Clone and install
git clone [email protected]:acme-corp/meridian.git
cd meridian
nvm use 20
npm install

# Start dependencies (Postgres, Redis, ClickHouse, Kafka)
docker compose up -d

# Environment setup
cp .env.example .env
# Edit .env - get secrets from 1Password vault "Meridian Dev"

# Run database migrations
npm run db:migrate
npm run db:seed

# Start development (all services)
npm run dev
# Frontend: http://localhost:5173
# GraphQL Playground: http://localhost:4000/graphql
# REST API: http://localhost:4000/api/v1
```

### Branch Strategy
- `main` - Production-ready code, deploys automatically
- `staging` - Pre-production integration, deploys to staging environment
- `feature/*` - New features, branch from `main`
- `hotfix/*` - Production fixes, branch from `main`, merge to both `main` and `staging`

### Code Review Process
1. Create feature branch from `main`
2. Implement changes with tests (minimum 80% coverage on new code)
3. Open PR with description template filled out (what, why, how to test)
4. Automated checks must pass: lint, type-check, unit tests, integration tests
5. Require 1 approval from code owner for the affected area
6. Squash and merge - PR title becomes the commit message

### Commit Convention
```
feat: add retention cohort chart to dashboard
fix: resolve timezone offset in event timestamps
perf: add ClickHouse projection for top-events query
refactor: extract shared auth middleware to @meridian/auth
docs: update API changelog for v3.2 release
test: add integration tests for GraphQL mutations
chore: upgrade TanStack Query to v5
```

## Deployment Procedures

### Staging Deployment
```bash
# Automated on merge to staging branch
git checkout staging
git merge feature/my-feature
git push origin staging
# GitHub Actions: lint -> test -> build -> deploy to EKS staging
# URL: https://staging.meridian.internal.acme.com
# Slack notification sent to #meridian-deploys
```

### Production Deployment
```bash
# Merge staging to main (after QA sign-off)
git checkout main
git merge staging
git push origin main
# GitHub Actions: full test suite -> build -> push to ECR -> ArgoCD sync
# ArgoCD performs rolling update (zero-downtime)
# Canary: 10% traffic for 15 minutes, then full rollout
# URL: https://meridian.acme.com
```

### Rollback Procedure
```bash
# Option 1: ArgoCD rollback (fastest, under 2 minutes)
argocd app rollback meridian-prod --revision [previous-revision]

# Option 2: Git revert (creates audit trail)
git revert [commit-hash]
git push origin main
# ArgoCD auto-syncs the revert

# Option 3: Emergency - direct image rollback
kubectl set image deployment/meridian-api api=meridian-api:[previous-tag] -n production
```

### Database Migrations
```bash
# Migrations run automatically during deployment
# For manual execution:
npm run db:migrate          # Apply pending migrations
npm run db:migrate:status   # Check migration status
npm run db:migrate:rollback # Rollback last migration

# ClickHouse migrations are separate:
npm run ch:migrate          # Apply ClickHouse DDL changes
```

## Critical Dependencies

- **Auth0** (authentication) - If Auth0 is down, no users can log in. Mitigation: JWT tokens are validated locally with cached JWKS, existing sessions continue to work for up to 1 hour.
- **Kafka (MSK)** (event pipeline) - If Kafka is down, events queue in the ingestion API (up to 10K events in memory buffer, then rejected with 503). Events are not lost if Kafka recovers within the buffer window.
- **ClickHouse** (analytics queries) - If ClickHouse is down, dashboards show cached data (up to 5 minutes stale) and a degraded mode banner. Transactional features (user management, project config) are unaffected.
- **Redis** (caching/sessions) - If Redis is down, query performance degrades (every query hits ClickHouse directly) and rate limiting is disabled. Sessions fall back to JWT-only validation.
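
The ingestion API's back-pressure behavior described above can be sketched as a bounded buffer: accept events into memory while Kafka is unreachable, reject with 503 once the buffer fills, and drain on recovery. A simplified TypeScript sketch (the 10K limit matches the description; class and method names are illustrative):

```typescript
// Simplified sketch of the ingestion API's Kafka outage buffer.
// Buffer size matches the documented 10K limit; names are illustrative.
const MAX_BUFFERED_EVENTS = 10_000;

class IngestionBuffer {
  private buffer: string[] = [];

  // Returns an HTTP-style status: 202 accepted, 503 buffer full.
  enqueue(event: string): 202 | 503 {
    if (this.buffer.length >= MAX_BUFFERED_EVENTS) return 503;
    this.buffer.push(event);
    return 202;
  }

  // Called when Kafka recovers: drain buffered events in arrival order.
  drain(): string[] {
    const pending = this.buffer;
    this.buffer = [];
    return pending;
  }
}
```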

## Known Issues and Technical Debt

- [ ] **ClickHouse dedup lag** - Deduplication views can lag up to 30 seconds during high-throughput periods. Users occasionally see duplicate events in real-time dashboards. Priority: Medium. Workaround: client-side dedup in the dashboard query layer.
- [ ] **GraphQL N+1 in nested resolvers** - The `project.members.recentActivity` resolver generates N+1 queries when loading team dashboards with 20+ members. Priority: High. Fix planned: implement DataLoader batching in Sprint 47.
- [ ] **Legacy REST endpoints** - 14 REST endpoints are still used by the mobile app (v2.x). These duplicate logic that now lives in GraphQL resolvers. Priority: Low. Plan: deprecate after mobile v3 ships (Q3 2026).
- [ ] **Test coverage gap in Kafka consumers** - Consumer error handling paths have ~40% test coverage. Priority: Medium. Integration tests are hard to write because they require a running Kafka instance.
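
The planned DataLoader fix for the N+1 issue collapses one-query-per-member into a single batched query per tick. A hand-rolled sketch of the batching pattern (production code would use the `dataloader` package; this minimal version just shows the mechanism):

```typescript
// Hand-rolled sketch of the DataLoader batching pattern.
// Loads requested in the same tick are coalesced into one batch call.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

class TinyLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];
  private scheduled = false;

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush after the current tick, once all resolvers have enqueued.
        queueMicrotask(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    // One backend query for the whole batch instead of one per key.
    const values = await this.batchFn(batch.map((b) => b.key));
    batch.forEach((b, i) => b.resolve(values[i]));
  }
}
```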

## Environment Configuration

| Variable | Required | Description |
|----------|----------|-------------|
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `CLICKHOUSE_URL` | Yes | ClickHouse HTTP endpoint |
| `REDIS_URL` | Yes | Redis connection string |
| `KAFKA_BROKERS` | Yes | Comma-separated Kafka broker addresses |
| `AUTH0_DOMAIN` | Yes | Auth0 tenant domain |
| `AUTH0_CLIENT_ID` | Yes | Auth0 application client ID |
| `AUTH0_CLIENT_SECRET` | Yes | Auth0 application client secret |
| `NODE_ENV` | No | Environment mode (development/staging/production) |
| `LOG_LEVEL` | No | Logging verbosity (debug/info/warn/error) |
| `GRAPHQL_INTROSPECTION` | No | Enable GraphQL introspection (disabled in prod) |

## Contact and Resources

- **Tech Lead**: [Name] - [Email/Slack handle]
- **Product Manager**: [Name] - [Email/Slack handle]
- **On-Call Rotation**: See #meridian-oncall channel topic in Slack
- **Runbook**: [Link to operational runbook]
- **Architecture Diagram**: [Link to Miro/Lucidchart]
- **Monitoring Dashboard**: [Link to Datadog dashboard]
- **Incident Response**: Page on-call via PagerDuty, escalation in #meridian-incidents

Why Markdown Matters for AI-Native Development

Context as Infrastructure

AI operates on context, not abstraction. The competitive advantage isn't your code - it's how effectively you architect context. Version it. Standardize it. Make it infrastructure. Your requirements, architecture decisions, and domain knowledge belong in markdown files within version control.

Markdown as Substrate

LLMs are optimized for markdown - fewer tokens, cleaner parsing, human-readable yet machine-native. Your strategic plans, architecture decisions, and product roadmaps should live in .md files. Word docs are legacy. Notion is transitional. Markdown is the substrate of AI-native organizations.

Requirements as Code

Requirements are code now - written in a language both humans and AI understand. This is agentic development: your IDE becomes a thinking partner. Context evolves with your code in the repo, not in disconnected wikis that decay with every hotfix. Junior engineers write code. Senior engineers architect context.

"The fastest-moving teams aren't winning by writing better prompts. They're winning by recognizing a fundamental shift: AI thrives on well-structured context. SDLC.md helps you architect that context as a first-class concern - standardized, versioned, and integrated with your development workflow."


About SDLC.md

Our Mission

Built by developers who believe context is the new competitive advantage in AI-native development.

We are passionate about helping teams understand that AI doesn't just need code - it needs context. The best codebases are those where knowledge is structured, versioned, and accessible. Markdown files are not just documentation - they are executable specifications that AI can understand and act upon.

Our goal is to show the world that .md files are infrastructure. When you treat context as a first-class concern - versioned in git, reviewed in pull requests, tested for effectiveness - you unlock a new level of development velocity. This is the future of software development: where humans architect context and AI executes implementation.

Why Markdown Matters

AI-Native

LLMs parse markdown better than any other format. Fewer tokens, cleaner structure, better results.

Version Control

Context evolves with code. Git tracks changes, PRs enable review, history preserves decisions.

Human Readable

No special tools needed. Plain text that works everywhere. Documentation humans actually read.

Have feedback? Found a bug? Want to contribute? We'd love to hear from you.