Infrastructure Guide
Version 1.0.0. Overview of Pulumi stacks and escape hatches.
App Factory Infrastructure Documentation
This document provides comprehensive information about the optional App Factory infrastructure, including architecture diagrams, deployment commands, cost controls, and troubleshooting guides.
Note: The active platform runs on Supabase + Vercel by default. Use these AWS/GCP stacks when a customer graduates to dedicated infrastructure or compliance requires it.
Table of Contents
- Architecture Overview
- Infrastructure Stacks
- Deployment Guide
- Cost Management
- Monitoring & Troubleshooting
- Runbooks
- Security Best Practices
Architecture Overview
App Factory uses a multi-cloud approach with support for both AWS and GCP deployments. The infrastructure is provisioned using Pulumi with TypeScript for infrastructure as code.
High-Level Architecture
graph TB
subgraph "Frontend"
WEB[Web Apps<br/>Next.js/React]
MOBILE[Mobile Apps<br/>React Native]
end
subgraph "CDN & Load Balancing"
CDN[CloudFront/Cloud CDN]
LB[Load Balancer]
end
subgraph "API Layer"
API[API Gateway/Cloud Run]
LAMBDA[Lambda Functions]
end
subgraph "Storage"
ASSETS[Assets Bucket]
UPLOADS[Uploads Bucket]
DB[(PostgreSQL Database)]
end
subgraph "External Services"
STRIPE[Stripe Payments]
AI[AI Providers<br/>OpenAI/Anthropic]
ANALYTICS[Analytics<br/>Mixpanel/Amplitude]
end
WEB --> CDN
MOBILE --> CDN
CDN --> LB
LB --> API
API --> LAMBDA
LAMBDA --> DB
LAMBDA --> ASSETS
LAMBDA --> UPLOADS
API --> STRIPE
API --> AI
API --> ANALYTICS
Technology Stack
| Component | AWS | GCP | Purpose |
|---|---|---|---|
| Compute | Lambda | Cloud Run | Serverless API functions |
| Database | RDS PostgreSQL | Cloud SQL PostgreSQL | Primary data storage |
| Storage | S3 | Cloud Storage | Static assets & uploads |
| CDN | CloudFront | Cloud CDN | Global content delivery |
| API Gateway | API Gateway | Load Balancer | HTTP API routing |
| Networking | VPC | VPC | Network isolation |
| IAM | IAM Roles | Service Accounts | Access control |
Infrastructure Stacks
AWS Stack (infra/pulumi-aws)
Components:
- VPC: Secure network with public/private subnets across 2 AZs
- RDS PostgreSQL: Encrypted database with automated backups
- S3 Buckets: Secure storage with encryption and lifecycle policies
- Lambda: Serverless compute with VPC integration
- API Gateway: HTTP API with CORS support
- CloudFront: Global CDN with S3 and API Gateway origins
- Security Groups: Network access controls
- IAM: Least-privilege roles and policies
Resource Naming Convention:
{app-name}-{environment}-{resource-type}
Example: focus-ai-production-database
GCP Stack (infra/pulumi-gcp)
Components:
- Cloud SQL PostgreSQL: Managed database with SSL encryption
- Cloud Storage: Secure buckets with lifecycle policies
- Cloud Run: Serverless container platform
- Cloud CDN: Global content delivery network
- Load Balancer: HTTP(S) load balancing
- IAM: Service accounts with minimal permissions
- API Services: Automatic enablement of required APIs
Resource Naming Convention:
{app-name}-{environment}-{resource-type}
Example: focus-ai-prod-db (this example abbreviates the environment and resource type, unlike the AWS example above)
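The naming convention used by both stacks can be sketched as a small shell helper. This is only an illustration: `resource_name` is a hypothetical function, not something the Pulumi stacks export, and the stacks themselves build names in TypeScript.

```shell
#!/usr/bin/env bash
# Build a name of the form {app-name}-{environment}-{resource-type}.
# resource_name is a hypothetical helper used only for illustration.
resource_name() {
  local app="$1" env="$2" type="$3"
  if [ -z "$app" ] || [ -z "$env" ] || [ -z "$type" ]; then
    echo "usage: resource_name APP ENVIRONMENT RESOURCE_TYPE" >&2
    return 1
  fi
  printf '%s-%s-%s\n' "$app" "$env" "$type"
}

resource_name focus-ai production database   # prints focus-ai-production-database
resource_name focus-ai prod db               # prints focus-ai-prod-db
```

A helper like this is handy in health-check scripts, where the same names are passed to `aws` and `gcloud` commands.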
Deployment Guide
Prerequisites
- Required Tools:
# Install Pulumi CLI
curl -fsSL https://get.pulumi.com | sh
# Install Node.js 22+
nvm install 22
nvm use 22
# Install pnpm
npm install -g pnpm@10
- Cloud Provider Setup:
AWS:
# Install AWS CLI
pip install awscli
# Configure credentials
aws configure
GCP:
# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
# Authenticate
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
Quick Deployment
AWS Deployment
# Navigate to AWS infrastructure
cd infra/pulumi-aws
# Install dependencies and build
pnpm install
pnpm build
# Initialize Pulumi stack
pulumi stack init focus-ai-production
# Configure stack
pulumi config set app:name focus-ai
pulumi config set app:environment production
pulumi config set aws:region us-east-1
# Deploy infrastructure
pulumi preview # Review changes
pulumi up # Deploy
GCP Deployment
# Navigate to GCP infrastructure
cd infra/pulumi-gcp
# Install dependencies and build
pnpm install
pnpm build
# Initialize Pulumi stack
pulumi stack init focus-ai-production
# Configure stack
pulumi config set app:name focus-ai
pulumi config set app:environment production
pulumi config set gcp:project your-gcp-project-id
pulumi config set gcp:region us-central1
# Deploy infrastructure
pulumi preview # Review changes
pulumi up # Deploy
Using the Example Script (GCP)
# Set environment variables
export APP_NAME="focus-ai"
export ENVIRONMENT="production"
export GCP_PROJECT="your-gcp-project-id"
export GCP_REGION="us-central1"
# Run deployment script
cd infra/pulumi-gcp
./example-deploy.sh
Environment-Specific Configurations
Development Environment
- Purpose: Testing and development
- Resources: Minimal sizing for cost optimization
- Retention: Shorter backup retention periods
- Protection: Deletion protection disabled
pulumi config set app:environment development
Production Environment
- Purpose: Live applications
- Resources: Production-grade sizing and redundancy
- Retention: Extended backup retention
- Protection: Deletion protection enabled
pulumi config set app:environment production
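How the `app:environment` value drives behaviour such as deletion protection can be sketched as follows. This is a minimal illustration assuming only the two environments described above; `deletion_protection_for` is a hypothetical helper, and the real stacks make this decision in their TypeScript code.

```shell
#!/usr/bin/env bash
# Map an environment name to the deletion-protection setting described above.
# deletion_protection_for is a hypothetical helper, not part of the stacks.
deletion_protection_for() {
  case "$1" in
    production)  echo "true" ;;
    development) echo "false" ;;
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}

for env in development production; do
  echo "$env: deletion protection = $(deletion_protection_for "$env")"
done
```

Rejecting unknown names early helps catch typos such as `prod` before they reach `pulumi config set`.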
Cost Management
Cost Optimization Strategies
AWS Cost Controls
- RDS Optimization:
  - Use GP3 storage for better price/performance
  - Environment-specific instance sizing
  - Automated backup retention policies
- Lambda Optimization:
  - Right-sized memory allocation
  - Efficient cold start handling
  - VPC integration optimization
- S3 Optimization:
  - Lifecycle policies for automatic cleanup
  - Intelligent tiering for infrequently accessed data
  - CloudFront caching to reduce origin requests
- CloudFront Optimization:
  - Limited to PriceClass_100 (North America & Europe)
  - Optimized caching policies
  - Compression enabled
GCP Cost Controls
- Cloud SQL Optimization:
  - Environment-specific machine types
  - Automatic storage increase limits
  - Scheduled maintenance windows
- Cloud Run Optimization:
  - Pay-per-request pricing
  - Automatic scaling to zero
  - CPU allocation optimization
- Cloud Storage Optimization:
  - Lifecycle policies (365-day deletion for uploads)
  - Multi-regional storage for availability
  - Reduced log sampling in production
- Cloud CDN Optimization:
  - Optimized caching policies
  - Reduced origin requests
  - Efficient cache invalidation
Cost Monitoring
AWS Cost Monitoring
# View current costs (the End date is exclusive, so this covers all of January)
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
--granularity MONTHLY \
--metrics BlendedCost
# Set up billing alerts
aws budgets create-budget \
--account-id YOUR_ACCOUNT_ID \
--budget file://budget.json
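The `create-budget` command above reads a `budget.json` file that this guide does not show. A minimal sketch of such a file follows; the budget name and the 1000 USD limit are assumptions (the limit simply mirrors the GCP example), so adjust both to your account.

```shell
#!/usr/bin/env bash
# Write a minimal monthly cost budget definition for `aws budgets create-budget`.
# The name and amount below are illustrative assumptions, not values from the stacks.
cat > budget.json <<'EOF'
{
  "BudgetName": "app-factory-monthly",
  "BudgetLimit": { "Amount": "1000", "Unit": "USD" },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST"
}
EOF

echo "wrote budget.json"
```

A budget created this way only tracks spend. To receive alerts, also supply notification subscribers, for example through the `--notifications-with-subscribers` option of `create-budget`.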
GCP Cost Monitoring
# Inspect billing accounts and the project's billing link
# (these commands show billing setup, not cost figures)
gcloud billing accounts list
gcloud billing projects describe YOUR_PROJECT_ID
# Set up budget alerts
gcloud billing budgets create \
--billing-account=YOUR_BILLING_ACCOUNT \
--display-name="App Factory Budget" \
--budget-amount=1000USD
Estimated Monthly Costs
Development Environment
| Service | AWS | GCP | Notes |
|---|---|---|---|
| Database | $25-50 | $20-40 | Small instance |
| Compute | $10-30 | $5-20 | Low traffic |
| Storage | $5-15 | $5-15 | Minimal data |
| CDN | $5-10 | $5-10 | Development traffic |
| Total | $45-105 | $35-85 | Per environment |
Production Environment
| Service | AWS | GCP | Notes |
|---|---|---|---|
| Database | $100-300 | $80-250 | Production instance |
| Compute | $50-200 | $30-150 | Moderate traffic |
| Storage | $20-100 | $20-100 | Production data |
| CDN | $20-100 | $20-100 | Global traffic |
| Total | $190-700 | $150-600 | Per environment |
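The totals in the tables above are the sums of the per-service ranges. As a quick check, the production figures can be recomputed with a small helper; `sum_costs` is a hypothetical name used only for this illustration.

```shell
#!/usr/bin/env bash
# Add up whole-dollar cost figures passed as arguments.
# sum_costs is a hypothetical helper for checking the table totals.
sum_costs() {
  local total=0 n
  for n in "$@"; do
    total=$((total + n))
  done
  echo "$total"
}

# Order of arguments: Database, Compute, Storage, CDN
echo "AWS production: \$$(sum_costs 100 50 20 20)-\$$(sum_costs 300 200 100 100)"
echo "GCP production: \$$(sum_costs 80 30 20 20)-\$$(sum_costs 250 150 100 100)"
```

The same helper reproduces the development totals when given the development figures.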
Monitoring & Troubleshooting
Health Checks
AWS Health Checks
# Check RDS status
aws rds describe-db-instances \
--db-instance-identifier focus-ai-production-database
# Check Lambda function status
aws lambda get-function \
--function-name focus-ai-production-api
# Check S3 bucket status
aws s3api head-bucket \
--bucket focus-ai-production-assets
GCP Health Checks
# Check Cloud SQL status
gcloud sql instances describe focus-ai-prod-db
# Check Cloud Run status
gcloud run services describe focus-ai-prod-api \
--region=us-central1
# Check Cloud Storage status
gsutil ls -b gs://focus-ai-prod-assets
Common Issues & Solutions
Database Connection Issues
Symptoms:
- Connection timeouts
- Authentication failures
- SSL certificate errors
Solutions:
# AWS RDS
# Check security groups
aws ec2 describe-security-groups \
--group-ids sg-xxxxxxxxx
# Test connection
psql "postgresql://username:password@endpoint:5432/database?sslmode=require"
# GCP Cloud SQL
# Check authorized networks
gcloud sql instances describe INSTANCE_NAME
# Test connection with Cloud SQL Proxy
cloud_sql_proxy -instances=PROJECT:REGION:INSTANCE=tcp:5432
Storage Access Issues
Symptoms:
- 403 Forbidden errors
- CORS issues
- Upload failures
Solutions:
# AWS S3
# Check bucket policy
aws s3api get-bucket-policy --bucket BUCKET_NAME
# Test upload
aws s3 cp test.txt s3://BUCKET_NAME/test.txt
# GCP Cloud Storage
# Check IAM permissions
gcloud projects get-iam-policy PROJECT_ID
# Test upload
gsutil cp test.txt gs://BUCKET_NAME/test.txt
API Gateway/Load Balancer Issues
Symptoms:
- 502/503 errors
- High latency
- CORS failures
Solutions:
# AWS API Gateway
# Check API status (the AWS stack uses an HTTP API, which is listed through apigatewayv2)
aws apigatewayv2 get-apis
# View logs
aws logs describe-log-groups \
--log-group-name-prefix /aws/apigateway/
# GCP Load Balancer
# Check backend health
gcloud compute backend-services get-health BACKEND_SERVICE
# View logs
gcloud logging read "resource.type=http_load_balancer"
Runbooks
Daily Operations
Morning Health Check
#!/bin/bash
# daily-health-check.sh
echo "Daily Infrastructure Health Check"
echo "=================================="
# Check database connectivity
echo "Database Status:"
pulumi stack output databaseUrl | xargs pg_isready -d
# Check API endpoints
echo "API Status:"
curl -f "$(pulumi stack output apiUrl)/health"
# Check CDN status
echo "CDN Status:"
curl -I "$(pulumi stack output cdnUrl)"
# Check storage buckets (tries AWS first, then GCP)
echo "Storage Status:"
bucket="$(pulumi stack output assetsBucketName)"
aws s3 ls "s3://$bucket" || gsutil ls "gs://$bucket"
echo "Health check complete"
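The script above prints each result but does not report how many checks failed. One possible refinement is a wrapper that counts failures, sketched below. `run_check` and `failures` are hypothetical names, and the `true`/`false` commands are stand-ins so the sketch runs without any cloud access.

```shell
#!/usr/bin/env bash
# Count failed checks so the script can exit non-zero when something is wrong.
# run_check and failures are hypothetical names used for illustration.
failures=0

run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $label"
  else
    echo "FAIL $label"
    failures=$((failures + 1))
  fi
}

# Stand-in commands; in a real script these would be the curl, pg_isready,
# and bucket-listing commands from the health check above.
run_check "always passes" true
run_check "always fails" false

echo "failed checks: $failures"
```

In a real health check, finishing with a non-zero exit status when `failures` is greater than zero lets cron jobs or CI pipelines flag the run.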
Weekly Maintenance
#!/bin/bash
# weekly-maintenance.sh
echo "Weekly Infrastructure Maintenance"
echo "==================================="
# Update Pulumi dependencies
echo "Updating Pulumi dependencies..."
cd infra/pulumi-aws && pnpm update
cd ../pulumi-gcp && pnpm update
# Check for security updates
echo "Checking for security updates..."
pulumi preview --diff
# Backup verification
echo "Verifying backups..."
# Add backup verification logic here
# Cost analysis
echo "Cost analysis..."
# Add cost analysis logic here
echo "Weekly maintenance complete"
Incident Response
Database Outage Response
- Immediate Actions:
# Check database status
pulumi stack output databaseUrl | xargs pg_isready -d
# Check recent logs (AWS)
aws logs tail /aws/rds/instance/focus-ai-production-database/postgresql
# OR (GCP)
gcloud logging read "resource.type=cloud_sql_database" --limit=50
- Escalation Steps:
  - Enable read replicas if available
  - Contact cloud provider support
  - Implement database failover procedures
API Service Outage
- Immediate Actions:
# Check API health
curl -f $(pulumi stack output apiUrl)/health
# Check function logs (AWS)
aws logs tail /aws/lambda/focus-ai-production-api
# OR (GCP)
gcloud logging read "resource.type=cloud_run_revision" --limit=50
- Recovery Steps:
  - Restart services if needed
  - Scale up resources temporarily
  - Implement circuit breaker patterns
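After a restart or scale-up, the API health check usually needs to be repeated until the service responds. A minimal retry helper is sketched below. Note that this is plain retry with a fixed delay, not a circuit breaker; `retry` is a hypothetical function, and `true`/`false` stand in for the real health check command.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# retry is a hypothetical helper used for illustration.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Stand-in commands so the sketch runs without any cloud access.
retry 3 0 true && echo "healthy"
retry 2 0 false || echo "still failing after 2 attempts"
```

During an incident, the stand-in could be replaced with the `curl -f` health check shown above, using a delay of a few seconds between attempts.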
Disaster Recovery
Complete Stack Recovery
#!/bin/bash
# disaster-recovery.sh
echo "Disaster Recovery Procedure"
echo "=============================="
# 1. Assess damage
echo "Assessing infrastructure state..."
pulumi refresh
# 2. Restore from backups
echo "Restoring from backups..."
# Database restore logic
# Storage restore logic
# 3. Redeploy infrastructure
echo "Redeploying infrastructure..."
pulumi up --yes
# 4. Verify services
echo "Verifying services..."
./daily-health-check.sh
echo "Disaster recovery complete"
Security Best Practices
Network Security
- VPC Configuration:
  - Private subnets for databases
  - Public subnets only for load balancers
  - Network ACLs and security groups
- Database Security:
  - Encryption at rest and in transit
  - Regular security patches
  - Restricted network access
- Storage Security:
  - Bucket policies and IAM roles
  - Encryption for sensitive data
  - Access logging enabled
Access Control
- IAM Best Practices:
  - Principle of least privilege
  - Regular access reviews
  - Multi-factor authentication
- Service Accounts:
  - Dedicated accounts per service
  - Key rotation policies
  - Audit logging enabled
Monitoring & Alerting
- Security Monitoring:
  - Failed authentication attempts
  - Unusual access patterns
  - Resource configuration changes
- Compliance:
  - Regular security assessments
  - Vulnerability scanning
  - Compliance reporting
Infrastructure Teardown
Safe Teardown Procedure
WARNING: This will permanently delete all resources, including databases and storage. Ensure you have backups!
Pre-Teardown Checklist
# 1. Backup critical data
echo "Creating final backups..."
# Database backup
pg_dump "$(pulumi stack output databaseUrl)" > "final-backup-$(date +%Y%m%d).sql"
# Storage backup
aws s3 sync s3://$(pulumi stack output assetsBucketName) ./assets-backup/
aws s3 sync s3://$(pulumi stack output uploadsBucketName) ./uploads-backup/
# OR
gsutil -m rsync -r gs://$(pulumi stack output assetsBucketName) ./assets-backup/
gsutil -m rsync -r gs://$(pulumi stack output uploadsBucketName) ./uploads-backup/
# 2. Export stack state (the full deployment checkpoint, not only config values)
pulumi stack export > stack-config-$(date +%Y%m%d).json
# 3. Document current state
pulumi stack output --json > stack-outputs-$(date +%Y%m%d).json
Teardown Commands
# Preview destruction without deleting anything (RECOMMENDED)
pulumi destroy --preview-only
# Confirm and destroy
pulumi destroy --yes
# Remove stack
pulumi stack rm $(pulumi stack --show-name)
Post-Teardown Cleanup
# Clean up local state
rm -rf .pulumi/
rm -rf node_modules/
# Verify resources are deleted
# AWS
aws resourcegroupstaggingapi get-resources \
--tag-filters Key=app:name,Values=focus-ai
# GCP
gcloud asset search-all-resources \
--query="labels.app_name:focus-ai"
Emergency Teardown
For immediate resource deletion (use with extreme caution):
#!/bin/bash
# emergency-teardown.sh
echo "EMERGENCY TEARDOWN - NO BACKUPS WILL BE CREATED"
echo "=================================================="
read -p "Type 'DESTROY EVERYTHING' to confirm: " confirmation
if [ "$confirmation" != "DESTROY EVERYTHING" ]; then
  echo "Teardown cancelled"
  exit 1
fi
# Disable deletion protection
pulumi config set app:environment development
pulumi up --yes
# Destroy everything
pulumi destroy --yes --skip-preview
# Clean up
pulumi stack rm $(pulumi stack --show-name) --yes
echo "Emergency teardown complete"
Support & Resources
- Pulumi Documentation: https://www.pulumi.com/docs/
- AWS Documentation: https://docs.aws.amazon.com/
- GCP Documentation: https://cloud.google.com/docs/
- Internal Runbooks: See docs/runbooks/ directory
- Emergency Contacts: See docs/emergency-contacts.md
Version: 1.0.0