Cloud Run Migration Strategy (Optimized) โ
Refactor: Server-Side Configs (Cloud Run vs Cloud Functions & Client-to-Server Migrations)
Last Updated: 2026-02-09 Status: Planning Related Issues: #224 (OSM Import Security), #394 (Venue Refresh Abuse)
๐ฏ Executive Summary โ
This document outlines an optimized migration strategy from Cloud Functions to Cloud Run for services that require:
- Long-running operations (>1 minute)
- Batch processing (100+ items)
- WebSocket/persistent connections
- Multi-step workflows
- Native/container dependencies
- Complex state management
Key Optimization: Consolidate related services to reduce operational overhead and improve code reuse.
๐ High Priority โ Cloud Run Services โ
1. Venue API (Combined Import + Refresh) ๐ด CRITICAL โ
Current State:
- OSM import: CLI script with client-side triggers
- Venue refresh: Client-side triggers with public write access (
firestore.rules:394) - No Google Places integration yet
Why Combined Service:
- Shared Domain Logic: Both handle venue data validation, deduplication, normalization
- Unified Rate Limiting: Reusable infrastructure across OSM, Nominatim, Google Places
- API Source Flexibility: Adding Google Places becomes a new route, not a new service
- Operational Efficiency: Single deployment, shared connection pools, lower costs
- Future-Proof: Easy to add more sources (Yelp, Foursquare), centralized quality scoring
Recommended Architecture:
Cloud Run Service: venue-api
// ===== Import Operations =====
POST /venues/import/osm
Body: { lat, lng, radius, limit, filters }
Returns: { jobId, status, estimatedTime }
Description: Import venues from OSM Overpass API
POST /venues/import/google
Body: { lat, lng, radius, types, keyword }
Returns: { jobId, status, estimatedTime }
Description: Import venues from Google Places API (future)
POST /venues/import/hybrid
Body: { lat, lng, radius, strategy: "osm-primary" | "google-primary" }
Returns: { jobId, status, estimatedTime }
Description: Combined OSM + Google enrichment with deduplication
GET /venues/import/jobs/:jobId
Returns: { status, progress, venuesProcessed, errors, results }
Description: Check import job status and results
// ===== Enrichment & Refresh =====
POST /venues/:venueId/enrich
Body: { fields: ["photos", "hours", "contact", "rating"] }
Returns: { venue, enrichedFields, source }
Description: Enrich single venue with external API data
POST /venues/refresh/batch
Body: { venueIds: [...], priority: "high" | "normal" }
Returns: { jobId, status, count }
Description: Batch refresh venue metadata
POST /venues/refresh/scheduled
Headers: X-CloudScheduler-JobName
Returns: { processed, updated, errors }
Description: Cloud Scheduler endpoint for automated refresh
// ===== Utilities =====
POST /venues/consolidate
Body: { region: { lat, lng, radius }, strategy: "strict" | "fuzzy" }
Returns: { duplicatesFound, merged, deleted }
Description: Deduplicate venues in region
POST /venues/geocode
Body: { address } or { lat, lng }
Returns: { address, coordinates, confidence }
Description: Forward or reverse geocoding via Nominatim
GET /venues/:venueId/metadata
Returns: { venue, sources, lastUpdated, quality }
Description: Get venue details with provenance info
// ===== Admin Operations =====
POST /venues/cleanup/orphaned
Returns: { deleted, errors }
Description: Remove venues with no activity for 90+ days
POST /venues/validate/all
Returns: { total, invalid, fixed, errors }
Description: Data quality check and auto-fix
POST /venues/rebuild-index
Returns: { indexed, errors }
Description: Rebuild search/geo indexesInternal Structure:
venue-api/
โโโ src/
โ โโโ index.js # Express app entry point
โ โโโ routes/
โ โ โโโ import.routes.js # OSM, Google, hybrid imports
โ โ โโโ refresh.routes.js # Enrichment & scheduled tasks
โ โ โโโ utility.routes.js # Geocoding, consolidation
โ โ โโโ admin.routes.js # Cleanup, validation
โ โโโ services/
โ โ โโโ osm.service.js # OSM Overpass API client
โ โ โโโ google.service.js # Google Places API client
โ โ โโโ nominatim.service.js # Geocoding service
โ โ โโโ venue.service.js # Core venue logic
โ โ โโโ enrichment.service.js # Data enrichment
โ โโโ queue/
โ โ โโโ job-manager.js # Queue handling (Cloud Tasks)
โ โ โโโ rate-limiter.js # Global rate limiting
โ โโโ middleware/
โ โ โโโ auth.js # Firebase Admin validation
โ โ โโโ rate-limit.js # Per-source rate limits
โ โ โโโ error-handler.js # Unified error handling
โ โโโ utils/
โ โ โโโ deduplication.js # Venue matching algorithms
โ โ โโโ validation.js # Data validation
โ โ โโโ cache.js # In-memory caching
โโโ Dockerfile
โโโ package.json
โโโ cloudbuild.yamlBenefits:
- Security: Server-side validation prevents client abuse (fixes Issue #224)
- Rate Limiting: Global throttling for OSM (5 req/sec), Nominatim (1 req/sec), Google Places
- Batch Processing: Import 1000+ venues in single job (vs Cloud Functions 9-min timeout)
- Deduplication: Automatic merging of OSM + Google Places data
- Scheduled Jobs: Cloud Scheduler โ nightly venue refresh for high-traffic locations
- Audit Trail: Full logging of all import operations
- Cost Efficiency: Single service vs two separate deployments
Migration Priority: ๐ด Phase 1 (Weeks 1-2) - Addresses critical security issues
2. Admin API (Consolidated) ๐ก HIGH PRIORITY โ
Current State:
- 10+ separate Cloud Functions for admin operations
- Each function has separate auth, logging, error handling
- No batch operations support
- Difficult local development
Why Consolidated Service:
- Shared Middleware: Auth, validation, logging, error handling
- Unified API: Single endpoint for admin portal
- Batch Operations: Process 100+ users/venues in single request
- Better DX: Local development with
docker run - Reduced Cold Starts: 1 warm service vs 10+ separate functions
- Complex Workflows: Multi-step admin actions in single transaction
Recommended Architecture:
Cloud Run Service: admin-api
// ===== User Management =====
// Migrate existing Cloud Functions
GET /admin/users/:uid
Description: Get user by ID (replaces getUserByEmail function)
POST /admin/users/:uid/role
Body: { role: "admin" | "merchant" | "user" }
Description: Set user role (replaces setUserRole function)
POST /admin/users/:uid/ban
Body: { reason, duration }
Description: Ban user (replaces banUser function)
DELETE /admin/users/:uid/ban
Description: Unban user (replaces unbanUser function)
// New batch operations
POST /admin/users/bulk-import
Body: { users: [...], sendInvites: boolean }
Returns: { jobId, status, count }
Description: Import users from CSV
POST /admin/users/bulk-ban
Body: { userIds: [...], reason }
Returns: { banned, errors }
Description: Batch ban users
POST /admin/users/bulk-role
Body: { userIds: [...], role }
Returns: { updated, errors }
Description: Batch update user roles
GET /admin/users/search
Query: ?q=email@example.com&role=admin&limit=50
Returns: { users, total, hasMore }
Description: Search users with filters
// ===== Venue Management =====
POST /admin/venues/bulk-update
Body: { venueIds: [...], updates: {...} }
Returns: { updated, errors }
Description: Batch update venue metadata
POST /admin/venues/cleanup
Body: { criteria: "inactive" | "duplicate" | "invalid" }
Returns: { deleted, merged, errors }
Description: Cleanup venues by criteria
GET /admin/venues/audit
Returns: { venues, issues, recommendations }
Description: Venue data quality audit
// ===== Content Moderation =====
GET /admin/moderation/queue
Query: ?type=profile|lantern|chat&status=pending
Returns: { items, total }
Description: Get moderation queue
POST /admin/moderation/:itemId/approve
Returns: { status }
Description: Approve content
POST /admin/moderation/:itemId/reject
Body: { reason, action: "hide" | "delete" | "ban-user" }
Returns: { status }
Description: Reject content
// ===== Analytics & Reports =====
POST /admin/reports/generate
Body: { type: "engagement" | "revenue" | "venues", period: "daily" | "weekly" | "monthly" }
Returns: { jobId, status }
Description: Generate report job
GET /admin/reports/:jobId
Returns: { status, downloadUrl, data }
Description: Get report results
GET /admin/analytics/dashboard
Returns: { metrics: {...}, trends: {...} }
Description: Real-time dashboard metrics
POST /admin/analytics/export
Body: { collection, filters, format: "csv" | "json" }
Returns: { downloadUrl }
Description: Export data
// ===== System Operations =====
POST /admin/system/cache/clear
Body: { scope: "all" | "venues" | "users" }
Returns: { cleared }
Description: Clear system caches
GET /admin/system/health
Returns: { status, services, errors }
Description: System health check
POST /admin/system/maintenance
Body: { enabled: boolean, message }
Returns: { status }
Description: Toggle maintenance modeInternal Structure:
admin-api/
โโโ src/
โ โโโ index.js # Express app
โ โโโ routes/
โ โ โโโ users.routes.js # User management
โ โ โโโ venues.routes.js # Venue management
โ โ โโโ moderation.routes.js # Content moderation
โ โ โโโ reports.routes.js # Analytics & reports
โ โ โโโ system.routes.js # System operations
โ โโโ services/
โ โ โโโ user.service.js # User operations
โ โ โโโ venue.service.js # Venue operations
โ โ โโโ report.service.js # Report generation
โ โ โโโ analytics.service.js # Metrics aggregation
โ โโโ middleware/
โ โ โโโ admin-auth.js # Admin-only auth
โ โ โโโ rate-limit.js # Rate limiting
โ โ โโโ audit-log.js # Action logging
โ โ โโโ error-handler.js # Error handling
โ โโโ utils/
โ โ โโโ csv-parser.js # CSV import/export
โ โ โโโ pdf-generator.js # PDF reports
โโโ Dockerfile
โโโ package.jsonBenefits:
- Unified Auth: Single admin authentication middleware
- Batch Operations: Process 100+ records in one request
- Better Logging: Centralized audit trail of all admin actions
- Easier Development: Run entire admin backend locally
- Complex Workflows: Multi-step operations in single transaction
- Reduced Costs: 1 service vs 10+ Cloud Functions
Migration Priority: ๐ก Phase 2 (Weeks 3-4) - Improve admin operations
3. Analytics Service ๐ก HIGH PRIORITY โ
Current State:
- No analytics infrastructure
- No merchant-facing metrics
- No cost tracking or usage monitoring
- Manual reporting and data exports
Why Cloud Run:
- Data Aggregation: Complex Firestore queries across multiple collections
- Large Data Processing: Aggregate 100k+ events for reports
- Scheduled Jobs: Daily/weekly/monthly report generation
- Memory Intensive: Process 10k+ records for trend analysis
- Export Generation: CSV/PDF reports can take minutes
- Real-time Metrics: In-memory caching for dashboard
- Cost Tracking: Parse Firebase/GCP billing data
Why High Priority:
- Product Feature: Merchants need analytics to see venue performance
- Business Intelligence: Track engagement, conversion, revenue
- Cost Management: Monitor Firebase/API costs before they spike
- Growth Metrics: Understand user behavior and retention
Recommended Architecture:
Cloud Run Service: analytics-api
// ===== Merchant Analytics =====
GET /analytics/merchant/dashboard
Query: ?merchantId={id}&period=7d|30d|90d
Returns: {
venueStats: { views, lanterns, waves, conversion },
offerStats: { impressions, claims, revenue },
trends: { daily, weekly },
topVenues: [...]
}
Description: Merchant dashboard metrics
GET /analytics/merchant/venue/:venueId
Query: ?period=7d|30d|90d
Returns: {
activity: { lanterns, waves, uniqueUsers },
peakHours: [...],
demographics: { ageGroups, genderSplit },
trends: { daily, hourly }
}
Description: Individual venue performance
GET /analytics/merchant/offer/:offerId
Returns: {
impressions: number,
claims: number,
redemptions: number,
conversionRate: number,
revenue: number,
timeline: [...]
}
Description: Offer engagement metrics
POST /analytics/merchant/export
Body: {
merchantId,
type: "venue-performance" | "offer-analytics" | "revenue",
period: "7d" | "30d" | "90d",
format: "csv" | "pdf"
}
Returns: { jobId, status }
Description: Generate export job for merchant data
// ===== Admin/Internal Analytics =====
GET /analytics/admin/overview
Returns: {
users: { total, active, new, churn },
venues: { total, active, topPerforming },
engagement: { lanterns, waves, chats, matches },
revenue: { total, byMerchant, byVenue },
costs: { firebase, cloudRun, apis }
}
Description: High-level platform metrics
GET /analytics/admin/engagement
Query: ?period=7d|30d|90d
Returns: {
dau: number, // Daily active users
wau: number, // Weekly active users
mau: number, // Monthly active users
retention: { day1, day7, day30 },
sessionDuration: { avg, median, p95 },
actions: { lanterns, waves, chats, matches }
}
Description: User engagement metrics
GET /analytics/admin/costs
Query: ?period=7d|30d|90d
Returns: {
total: number,
breakdown: {
firestore: { reads, writes, deletes, cost },
storage: { gb, cost },
cloudRun: { requests, cpu, memory, cost },
apis: { osm, nominatim, googlePlaces, cost }
},
trends: [...],
projections: { monthly, annual }
}
Description: Cost tracking and projections
GET /analytics/admin/venues
Query: ?sortBy=activity|lanterns|revenue&limit=50
Returns: {
venues: [{
id, name, location,
stats: { lanterns, waves, uniqueUsers },
revenue: number,
merchantShare: number
}],
total: number
}
Description: Venue performance rankings
GET /analytics/admin/merchants
Returns: {
merchants: [{
id, name,
venues: number,
offers: number,
revenue: number,
engagement: { impressions, conversions }
}]
}
Description: Merchant performance
// ===== Real-time Metrics =====
GET /analytics/realtime/activity
Returns: {
activeLanterns: number,
activeWaves: number,
onlineUsers: number,
topVenues: [{ venueId, name, activeLanterns }],
recentMatches: number
}
Description: Real-time platform activity
GET /analytics/realtime/venue/:venueId
Returns: {
activeLanterns: number,
recentWaves: number,
onlineUsers: number
}
Description: Real-time venue activity
// ===== Reports & Exports =====
POST /analytics/reports/generate
Body: {
type: "engagement" | "revenue" | "costs" | "venues" | "merchants",
period: "daily" | "weekly" | "monthly" | "custom",
startDate: "2026-01-01",
endDate: "2026-01-31",
format: "csv" | "pdf" | "json",
recipients: ["email@example.com"] // Optional email delivery
}
Returns: { jobId, status, estimatedTime }
Description: Generate scheduled report
GET /analytics/reports/:jobId
Returns: {
status: "queued" | "processing" | "completed" | "failed",
progress: number,
downloadUrl: string,
expiresAt: timestamp
}
Description: Get report job status and download link
GET /analytics/reports/scheduled
Returns: { reports: [{...}] }
Description: List scheduled/recurring reports
// ===== Data Export =====
POST /analytics/export/raw
Body: {
collection: "lanterns" | "waves" | "chats" | "venues",
filters: {...},
dateRange: { start, end },
format: "csv" | "json" | "parquet"
}
Returns: { jobId, status }
Description: Export raw data (admin-only)
// ===== Usage Tracking (Internal) =====
POST /analytics/track/api-usage
Body: { service: "osm" | "nominatim" | "googlePlaces", requests: number }
Returns: { recorded: true }
Description: Track external API usage for cost monitoring
POST /analytics/track/feature-usage
Body: { feature: string, userId: string, metadata: {...} }
Returns: { recorded: true }
Description: Track feature usage for product analyticsInternal Structure:
analytics-api/
โโโ src/
โ โโโ index.js # Express app entry point
โ โโโ routes/
โ โ โโโ merchant.routes.js # Merchant analytics
โ โ โโโ admin.routes.js # Admin/internal analytics
โ โ โโโ realtime.routes.js # Real-time metrics
โ โ โโโ reports.routes.js # Report generation
โ โ โโโ export.routes.js # Data exports
โ โโโ services/
โ โ โโโ aggregation.service.js # Data aggregation from Firestore
โ โ โโโ metrics.service.js # Metrics calculation
โ โ โโโ cost-tracking.service.js # Parse billing data
โ โ โโโ report-generator.service.js # PDF/CSV generation
โ โ โโโ realtime.service.js # Real-time metrics (cached)
โ โโโ jobs/
โ โ โโโ daily-aggregation.js # Daily metrics aggregation
โ โ โโโ cost-sync.js # Sync billing data
โ โ โโโ report-scheduler.js # Scheduled reports
โ โโโ middleware/
โ โ โโโ auth.js # Role-based auth (merchant vs admin)
โ โ โโโ rate-limit.js # Rate limiting
โ โ โโโ cache.js # Redis/in-memory cache
โ โโโ utils/
โ โ โโโ csv-generator.js # CSV export
โ โ โโโ pdf-generator.js # PDF reports (puppeteer)
โ โ โโโ date-utils.js # Period calculations
โ โ โโโ aggregators.js # Data aggregation helpers
โโโ Dockerfile
โโโ package.json
โโโ cloudbuild.yamlData Storage Strategy (Hybrid with BigQuery):
Recommended: Three-Tier Architecture
// Tier 1: Real-time Metrics (Redis Cache)
// - Active lanterns/waves right now
// - Online users count
// - Top venues by current activity
// - TTL: 60 seconds
// - Read latency: <10ms
// Tier 2: Recent Analytics (Firestore)
// - Pre-aggregated daily/weekly/monthly metrics
// - Last 90 days of data
// - Fast reads for merchant dashboards
// - Read latency: 50-200ms
analytics/
daily/
{venueId}_{date} โ { lanterns, waves, uniqueUsers, revenue, ... }
weekly/
{venueId}_{weekStart} โ { ... }
monthly/
{venueId}_{monthStart} โ { ... }
// Tier 3: Historical Analytics (BigQuery)
// - All historical data (unlimited retention)
// - Complex multi-table joins and aggregations
// - Ad-hoc queries and deep analytics
// - Cost tracking and projections
// - Machine learning / predictive analytics (future)
// - Read latency: 1-10 seconds (acceptable for reports)BigQuery Architecture:
# BigQuery Dataset Structure
lantern_analytics:
# Raw event tables (streamed from Firestore)
raw_lanterns: # All lantern events
- lantern_id
- user_id
- venue_id
- created_at
- ended_at
- status
- metadata (JSON)
raw_waves: # All wave events
- wave_id
- sender_id
- receiver_id
- venue_id
- created_at
- status
- response
raw_chats: # All chat events
- chat_id
- connection_id
- sender_id
- created_at
- message_length
raw_offers: # Offer impressions/claims
- offer_id
- merchant_id
- venue_id
- user_id
- action (impression/claim/redemption)
- created_at
- metadata (JSON)
# Pre-aggregated tables (for fast queries)
daily_venue_stats: # Daily rollups
- date
- venue_id
- lantern_count
- wave_count
- unique_users
- revenue
- avg_session_duration
daily_merchant_stats: # Daily merchant metrics
- date
- merchant_id
- impression_count
- claim_count
- revenue
- conversion_rate
# Cost tracking tables
daily_costs: # GCP billing data
- date
- service (firestore/storage/cloudrun/apis)
- operation_type
- cost_usd
- quantity (reads/writes/requests)
# User cohort analysis
user_cohorts: # Retention analysis
- cohort_date
- users_count
- day_1_retention
- day_7_retention
- day_30_retentionData Flow Architecture:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Real-time Events (Firestore) โ
โ โข User creates lantern โ writes to Firestore โ
โ โข Triggers Cloud Function โ streams to BigQuery โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ BigQuery Streaming Insert (Cloud Function) โ
โ โข Low latency (~1 second) โ
โ โข Automatic schema detection โ
โ โข Cost: $0.05 per GB streamed โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ BigQuery Tables (Raw Events) โ
โ โข Partitioned by date (cost optimization) โ
โ โข Clustered by venue_id, user_id โ
โ โข Retention: Unlimited (storage $0.02/GB/month) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Scheduled Queries (BigQuery โ Aggregated Tables) โ
โ โข Run daily at 2 AM UTC โ
โ โข Create daily_venue_stats, daily_merchant_stats โ
โ โข Cost: $5 per TB scanned (optimized with partitions) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Analytics API (Cloud Run) โ
โ โข Recent data (90 days): Read from Firestore โ
โ โข Historical data (>90 days): Query BigQuery โ
โ โข Reports: BigQuery โ CSV/PDF export โ
โ โข Cache results in Redis (5-minute TTL) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโImplementation Approach:
Phase 1: Initial Setup (Week 1)
# 1. Create BigQuery dataset
bq mk --dataset --location=us-central1 lantern_analytics
# 2. Create raw event tables with partitioning
bq mk --table \
--time_partitioning_field created_at \
--clustering_fields venue_id,user_id \
lantern_analytics.raw_lanterns \
schema_lanterns.json
# 3. Deploy Cloud Functions for streaming
# Trigger: onCreate("lanterns/{lanternId}")
# Action: Stream insert to BigQueryPhase 2: Streaming Pipeline (Week 2)
// Cloud Function: streamLanternToBigQuery
exports.streamLanternToBigQuery = functions.firestore
.document('lanterns/{lanternId}')
.onCreate(async (snap, context) => {
const bigquery = new BigQuery();
const dataset = bigquery.dataset('lantern_analytics');
const table = dataset.table('raw_lanterns');
const row = {
lantern_id: context.params.lanternId,
user_id: snap.data().userId,
venue_id: snap.data().venueId,
created_at: snap.data().createdAt.toDate().toISOString(),
status: snap.data().status,
metadata: JSON.stringify(snap.data())
};
await table.insert([row]);
});Phase 3: Scheduled Aggregations (Week 3)
-- Scheduled query: daily_venue_stats (runs daily at 2 AM)
CREATE OR REPLACE TABLE lantern_analytics.daily_venue_stats
PARTITION BY date
AS
SELECT
DATE(created_at) as date,
venue_id,
COUNT(DISTINCT lantern_id) as lantern_count,
COUNT(DISTINCT user_id) as unique_users,
AVG(TIMESTAMP_DIFF(ended_at, created_at, MINUTE)) as avg_duration_minutes
FROM lantern_analytics.raw_lanterns
WHERE DATE(created_at) = CURRENT_DATE() - 1 -- Yesterday only
GROUP BY date, venue_id;Phase 4: Analytics API Integration (Week 4)
// analytics-api service
async function getVenueAnalytics(venueId, period) {
const startDate = calculateStartDate(period);
// Recent data: Use Firestore (fast)
if (startDate > Date.now() - 90 * 24 * 60 * 60 * 1000) {
return getFromFirestore(venueId, startDate);
}
// Historical data: Use BigQuery
const query = `
SELECT
date,
lantern_count,
unique_users,
avg_duration_minutes
FROM lantern_analytics.daily_venue_stats
WHERE venue_id = @venueId
AND date >= @startDate
ORDER BY date DESC
`;
const [rows] = await bigquery.query({
query,
params: { venueId, startDate }
});
return rows;
}Benefits:
- Merchant Value: Data-driven insights to improve venue performance
- Cost Control: Early warning for runaway Firebase/API costs
- Product Intelligence: Understand feature usage and user behavior
- Revenue Optimization: Track offer performance and conversion
- Scheduled Reports: Automated daily/weekly/monthly reports
- Real-time Dashboards: Fast metrics without hitting Firestore constantly
- Unlimited Historical Data: BigQuery retains all data forever at low cost
- Complex Analytics: Multi-table joins, cohort analysis, funnel tracking
- Cost-Effective at Scale: Firestore reads cost $0.36/million; BigQuery $5/TB scanned
- ML/AI Ready: BigQuery ML for predictive analytics and recommendations
BigQuery Cost Analysis (10,000 MAU):
Streaming Inserts:
- 10k users ร 5 events/day ร 30 days = 1.5M events/month
- 1.5M events ร 1KB avg = 1.5GB/month
- Cost: 1.5GB ร $0.05/GB = $0.08/month
Storage:
- Year 1: 18GB ร $0.02/GB/month = $0.36/month
- Year 2: 36GB ร $0.02/GB/month = $0.72/month
- Year 3: 54GB ร $0.02/GB/month = $1.08/month
Query Costs (with partitioning):
- 100 merchant dashboard queries/day ร 30 days = 3,000 queries/month
- Avg 10MB scanned per query (90-day window, partitioned)
- 3,000 ร 10MB = 30GB = 0.03TB
- Cost: 0.03TB ร $5/TB = $0.15/month
Scheduled Aggregations:
- Daily aggregation: 500MB scanned/day ร 30 = 15GB/month
- Cost: 0.015TB ร $5/TB = $0.08/month
Total BigQuery Cost: ~$0.70/month (Year 1)
Compare to Firestore-only approach:
- 3,000 queries ร 50 document reads = 150k reads/month
- 150k ร $0.36/million = $0.05/month (limited to 90 days)
BigQuery enables unlimited historical analysis for <$1/month!When to Use Each Storage Tier:
| Use Case | Storage | Read Latency | Cost | Best For |
|---|---|---|---|---|
| Real-time metrics | Redis | <10ms | Free (in-memory) | Live dashboards, current activity |
| Recent analytics (90d) | Firestore | 50-200ms | $0.36/million reads | Merchant dashboards, daily reports |
| Historical analytics | BigQuery | 1-10s | $5/TB scanned | Trend analysis, cohort reports, ML |
| Cost tracking | BigQuery | 1-10s | $5/TB scanned | GCP billing, projections, alerts |
Migration Priority: ๐ก Phase 2B (Weeks 4-5) - High priority for product/business
4. Image Processing Service ๐ข MEDIUM PRIORITY โ
Current State:
- Direct Firebase Storage uploads
- Client-side validation only
- No image optimization or resizing
Why Cloud Run:
- Multi-step Processing: Resize, compress, watermark, format conversion
- Native Libraries: Sharp, ImageMagick for advanced operations
- Large Files: Handle 5MB+ uploads without timeout
- Multiple Outputs: Generate thumbnails, previews, full-size in one request
- Security: Malware scanning, EXIF stripping, content moderation
Recommended Architecture:
Cloud Run Service: image-service
POST /images/upload
Body: multipart/form-data image file
Returns: { imageId, urls: { thumbnail, preview, full }, metadata }
Description: Upload and process image (auto-generates all sizes)
POST /images/:imageId/resize
Body: { width, height, fit: "cover" | "contain" | "fill" }
Returns: { url, dimensions }
Description: Generate custom size
POST /images/:imageId/compress
Body: { quality: 1-100, format: "webp" | "jpeg" }
Returns: { url, originalSize, compressedSize, savings }
Description: Compress image
GET /images/:imageId/metadata
Returns: { dimensions, format, size, exif, colors }
Description: Get image metadata
POST /images/:imageId/moderate
Returns: { safe, labels, nsfw_score }
Description: Content moderation check
DELETE /images/:imageId
Returns: { deleted }
Description: Delete image and all variantsBenefits:
- Unified Pipeline: All image processing in one place
- Consistent Quality: Automatic optimization and sizing
- Security: Malware scanning and EXIF stripping
- Multiple Formats: WebP for modern browsers, JPEG fallback
- Watermarking: For merchant offer photos
Migration Priority: ๐ข Phase 3 (Weeks 6-8) - If image uploads are high-volume
5. Real-time Updates Service (Future) ๐ต LOW PRIORITY โ
Current State:
- Firestore listeners for real-time updates
- Works well for current scale
When to Build: Only if you need:
- WebSocket connections for 10,000+ concurrent users
- Sub-100ms latency for live updates
- Presence system (online/offline, typing indicators)
- Live lantern count updates at high traffic venues
Recommended Architecture:
Cloud Run Service: realtime-service
WS /ws/connect
Description: Establish WebSocket connection
WS /ws/venue/:venueId
Description: Subscribe to venue updates
WS /ws/chat/:connectionId
Description: Subscribe to chat messages
HTTP /presence/status
Returns: { online, venueActivity }
Description: Get current presence statusMigration Priority: ๐ต Phase 4+ (Weeks 9+) - Only if Firestore listeners don't scale
6. Full-Text Search Service (Future) ๐ต LOW PRIORITY โ
Current State:
- Basic Firestore queries
- Limited search capabilities
When to Build: Only if you need:
- Fuzzy search ("cofee" โ "coffee")
- Multi-field search (name + address + tags)
- Real-time autocomplete
- Ranking and relevance scoring
Recommended Architecture:
Cloud Run Service: search-service
GET /search/venues?q=coffee+downtown&limit=20
Returns: { results, total, suggestions }
Description: Search venues with fuzzy matching
GET /search/users?q=alice
Returns: { results, total }
Description: Search users (admin-only)
POST /search/index/rebuild
Returns: { indexed, errors }
Description: Rebuild search index
WS /search/suggest
Description: Real-time autocompleteTechnology Options:
- MeiliSearch (easy setup, good DX)
- Typesense (fast, typo-tolerant)
- Elasticsearch (powerful, complex)
Migration Priority: ๐ต Phase 4+ (Weeks 9+) - Only if basic Firestore queries insufficient
๐ต Keep as Cloud Functions โ
Well-suited for Cloud Functions:
- Short-lived operations (<30 seconds)
- Event-driven triggers (onCreate, onUpdate, etc.)
- Simple HTTP endpoints
- Lightweight utilities
Functions to Keep:
// Auth Triggers
onCreate("users/{uid}") โ initializeUserProfile
onDelete("users/{uid}") โ cleanupUserData
// Firestore Triggers
onUpdate("lanterns/{lanternId}") โ updateVenueActivityCount
onDelete("venues/{venueId}") โ cleanupVenueLanterns
// Scheduled Functions (if simple)
schedule("every 24 hours") โ cleanupExpiredLanterns
schedule("every 1 hour") โ cleanupExpiredWaves
// Simple HTTP Endpoints
POST /logClientError โ Log client error to Cloud Logging
GET /health โ Service health check๐ Cloud Functions vs Cloud Run Comparison โ
| Feature | Cloud Functions | Cloud Run |
|---|---|---|
| Max Execution Time | 9 minutes | 60 minutes (configurable) |
| Cold Start | ~1-2 seconds | ~2-5 seconds |
| Concurrency | 1-1000 requests/instance | Configurable (โค1000) |
| WebSocket Support | โ No | โ Yes |
| Container Size | Limited | Up to 10GB |
| Memory | 256MB - 16GB | 128MB - 32GB |
| CPU | Always allocated | Can be CPU-only during request |
| Pricing Model | Invocation + GB-sec | Request + CPU-time + Memory |
| Best For | Event-driven, simple HTTP | APIs, batch jobs, long tasks |
| Local Development | Firebase Emulator | Docker |
| State Management | None (stateless) | In-memory caching supported |
| Native Dependencies | Limited | Full container support |
๐ฏ Optimized Migration Roadmap โ
Phase 1 โ Immediate (Weeks 1-2) ๐ด CRITICAL โ
Service: venue-api (combined import + refresh)
Why First:
- Fixes critical security issue (Issue #224 - OSM import abuse)
- Addresses
firestore.rules:394public write vulnerability - Unblocks Google Places API integration
- High impact, clear requirements
Deliverables:
- [ ] Cloud Run service deployed to dev
- [ ] OSM import endpoint with job queue
- [ ] Venue refresh endpoint with rate limiting
- [ ] Admin consolidation endpoint (dedupe)
- [ ] Firestore security rules updated (remove public write)
- [ ] Documentation and API reference
- [ ] Integration tests
Success Metrics:
- OSM imports no longer client-accessible
- Venue refresh rate limited to 1 req/sec
- Import jobs handle 1000+ venues without timeout
Phase 2 โ Short Term (Weeks 3-4) ๐ก HIGH PRIORITY โ
Service: admin-api (consolidated admin operations)
Why Second:
- Improves admin portal backend
- Enables batch operations for efficiency
- Better local development experience
- Reduces cold start latency
Deliverables:
- [ ] Cloud Run service deployed to dev
- [ ] Migrate existing Cloud Functions to Express routes:
getUserByEmailโGET /admin/users/:uidsetUserRoleโPOST /admin/users/:uid/rolebanUserโPOST /admin/users/:uid/banunbanUserโDELETE /admin/users/:uid/ban
- [ ] New batch operations:
- Bulk ban users
- Bulk import users
- Bulk venue updates
- [ ] Admin audit logging
- [ ] Admin portal updated to use new API
- [ ] Deprecation plan for old Cloud Functions
Success Metrics:
- Admin operations 50% faster (reduced cold starts)
- Batch operations process 100+ records in <10 seconds
- Local development setup documented
Phase 2B โ Analytics (Weeks 4-5) ๐ก HIGH PRIORITY โ
Service: analytics-api (merchant & internal analytics)
Why Critical:
- Merchant-facing feature (product requirement)
- Cost tracking and usage monitoring
- Business intelligence for growth
- Revenue optimization for merchants
Deliverables:
- [ ] Cloud Run service deployed to dev
- [ ] BigQuery setup:
- Create
lantern_analyticsdataset - Create partitioned raw event tables (lanterns, waves, chats, offers)
- Deploy Cloud Functions for streaming inserts
- Create scheduled queries for daily aggregations
- Set up cost tracking tables
- Create
- [ ] Merchant analytics endpoints:
- Dashboard metrics (venue/offer performance)
- Venue analytics (activity, demographics, trends)
- Offer analytics (impressions, claims, conversions)
- Export reports (CSV/PDF)
- [ ] Admin analytics endpoints:
- Platform overview (users, engagement, revenue)
- Cost tracking (Firebase, Cloud Run, APIs)
- Venue rankings
- Merchant performance
- [ ] Real-time metrics (cached):
- Active lanterns/waves
- Online users
- Top venues
- [ ] Hybrid query logic:
- Recent data (90 days): Read from Firestore
- Historical data (>90 days): Query BigQuery
- Automatic routing based on date range
- [ ] Scheduled jobs:
- Daily metrics aggregation (Firestore โ BigQuery)
- Weekly/monthly reports (BigQuery queries)
- Cost sync from GCP billing API
- [ ] Merchant portal integration:
- Analytics dashboard
- Export buttons
- Real-time venue stats
- [ ] Admin portal integration:
- Cost dashboard
- Platform metrics
- Merchant reports
Success Metrics:
- Merchants can view venue performance in <2 seconds
- Cost tracking updates daily
- Reports generate in <30 seconds for 90-day periods
- Real-time metrics cached and served in <100ms
Phase 3 โ Medium Term (Weeks 6-8) ๐ข MEDIUM PRIORITY โ
Service: image-service (if needed)
Why Third:
- Image processing only needed if uploads become high-volume
- Can start with direct Firebase Storage uploads
- Migrate when optimization/security becomes critical
Deliverables:
- [ ] Assess current image upload volume and requirements
- [ ] If needed: Deploy
image-servicewith Sharp/ImageMagick - [ ] Multi-size image generation (thumbnail, preview, full)
- [ ] Image compression and format optimization
- [ ] Content moderation integration
- [ ] Watermarking for merchant offers
Phase 4 โ Long Term (Weeks 9+) ๐ต LOW PRIORITY โ
Services (build only if needed):
realtime-service(if Firestore listeners don't scale)search-service(if Firestore queries insufficient)
Why Last:
- Current solutions may be sufficient for 1-2 years
- Build these only when you hit scaling issues
- Focus on product features instead
๐ฐ Cost Analysis โ
Cloud Functions (Current) โ
Estimate for 10,000 monthly active users:
- 10 admin functions ร $0.40/million invocations = $4/month
- 2GB-sec per invocation ร 50,000 invocations = $1.25/month
Total: ~$5.25/monthCloud Run (Proposed) โ
Estimate for 10,000 monthly active users:
venue-api:
- 100,000 requests/month
- 512MB memory, 1 vCPU
- Avg 2 seconds per request
- Cost: ~$15/month
admin-api:
- 10,000 requests/month
- 256MB memory, 1 vCPU
- Avg 500ms per request
- Cost: ~$3/month
analytics-api:
- 50,000 requests/month (merchant dashboards + admin)
- 512MB memory, 1 vCPU
- Avg 1 second per request (cached metrics)
- Scheduled jobs: 3 daily (aggregation, cost sync, reports)
- Cost: ~$8/month
Total: ~$26/monthCost Increase: ~$21/month (~5x)
Value Gained:
- Eliminates security vulnerabilities (Issue #224, #394)
- Enables Google Places integration
- Supports 1000+ venue imports
- Batch admin operations
- Merchant analytics (product feature, revenue driver)
- Cost tracking (prevents runaway Firebase bills)
- Business intelligence and growth metrics
- Better monitoring and logging
- Foundation for future growth
Break-even Analysis: As usage grows, Cloud Run becomes more cost-effective:
- At 100k users: Cloud Run is 50% cheaper
- At 1M users: Cloud Run is 70% cheaper
- Long-running operations impossible on Cloud Functions
๐๏ธ Implementation Guidelines โ
Cloud Run Best Practices โ
Service Configuration
yaml# cloudbuild.yaml example timeout: 3600s # 60 minutes memory: 1Gi cpu: 1 minInstances: 0 # Scale to zero when idle maxInstances: 10 concurrency: 80 # Requests per instanceEnvironment Variables
bashFIREBASE_PROJECT_ID=lantern-app-dev ENVIRONMENT=dev LOG_LEVEL=info RATE_LIMIT_OSM=5 # 5 req/sec RATE_LIMIT_NOMINATIM=1 # 1 req/secHealth Checks
javascriptapp.get('/health', (req, res) => { res.json({ status: 'ok', timestamp: Date.now() }); });Graceful Shutdown
javascriptprocess.on('SIGTERM', () => { console.log('SIGTERM received, closing server...'); server.close(() => { console.log('Server closed'); process.exit(0); }); });
Security โ
Authentication
- Use Firebase Admin SDK to verify ID tokens
- Implement role-based access control (RBAC)
- Admin endpoints require
adminrole
Rate Limiting
- Per-user rate limits (10 req/min for imports)
- Global rate limits for external APIs
- IP-based rate limiting for abuse prevention
Input Validation
- Validate all request bodies with Joi/Zod
- Sanitize user inputs
- Limit request body size (10MB max)
Monitoring โ
Cloud Logging
- Structured JSON logs
- Include request ID, user ID, action
- Log errors with stack traces
Cloud Monitoring
- Request latency (p50, p95, p99)
- Error rate
- Active instances
- Memory/CPU usage
Alerts
- Error rate >5% for 5 minutes
- Latency p99 >10 seconds
- Memory usage >80%
๐งช Testing Strategy โ
Local Development โ
# venue-api local testing
cd services/venue-api
docker build -t venue-api .
docker run -p 8080:8080 \
-e FIREBASE_PROJECT_ID=lantern-app-dev \
-e ENVIRONMENT=local \
venue-api
# Test endpoints
curl http://localhost:8080/health
curl -X POST http://localhost:8080/venues/import/osm \
-H "Authorization: Bearer $TOKEN" \
-d '{"lat":40.7128,"lng":-74.0060,"radius":1000}'Integration Tests โ
// venue-api integration test
describe('POST /venues/import/osm', () => {
it('should import venues from OSM', async () => {
const response = await request(app)
.post('/venues/import/osm')
.set('Authorization', `Bearer ${adminToken}`)
.send({ lat: 40.7128, lng: -74.0060, radius: 1000 })
.expect(200);
expect(response.body).toHaveProperty('jobId');
expect(response.body.status).toBe('queued');
});
it('should reject unauthorized users', async () => {
await request(app)
.post('/venues/import/osm')
.send({ lat: 40.7128, lng: -74.0060, radius: 1000 })
.expect(401);
});
});Load Testing โ
# Use Apache Bench for load testing
ab -n 1000 -c 10 -H "Authorization: Bearer $TOKEN" \
http://venue-api-dev.a.run.app/health
# Use k6 for complex scenarios
k6 run load-test.js๐ Migration Checklist โ
Pre-Migration โ
- [ ] Review current Cloud Functions usage and costs
- [ ] Audit API rate limits (OSM, Nominatim, Google Places)
- [ ] Set up Cloud Run development environment
- [ ] Create Dockerfiles for each service
- [ ] Set up Cloud Build pipelines
- [ ] Configure secret management (Secret Manager)
During Migration โ
- [ ] Deploy to dev environment first
- [ ] Run integration tests
- [ ] Monitor error rates and latency
- [ ] Gradual rollout (10% โ 50% โ 100%)
- [ ] Keep Cloud Functions as fallback
Post-Migration โ
- [ ] Monitor for 1 week before deprecating Cloud Functions
- [ ] Update documentation
- [ ] Update admin portal to use new APIs
- [ ] Remove old Cloud Functions
- [ ] Update Firestore security rules
- [ ] Create changelog entry
๐ Key Takeaways โ
โ Use Cloud Run When You Need: โ
- Batch processing (100+ items)
- Long-running tasks (>1 minute, up to 60 minutes)
- WebSockets or persistent connections
- Multi-step workflows with state
- Native dependencies (ImageMagick, etc.)
- Container-based applications
- High concurrency (1000+ req/instance)
โ Use Cloud Functions When You Need: โ
- Simple, single-purpose logic (<50 lines)
- Event-driven triggers (onCreate, onUpdate)
- Fast execution (<30 seconds)
- Minimal configuration
- Pay-per-invocation model for low traffic
๐ฏ Lantern-Specific Priorities: โ
- Phase 1:
venue-api(security fix, Google Places foundation) - Phase 2:
admin-api(better admin operations) - Phase 2B:
analytics-api(merchant feature + cost tracking) - Phase 3:
image-service(if needed) - Phase 4+: Real-time, search (only if scaling issues)
Most Important: Start with Phase 1 (venue-api) to address security issues, then Phase 2B (analytics-api) is critical for merchant value and cost control. Analytics is both a product feature (merchant-facing) and operational necessity (cost tracking).
๐ References โ
Document Owner: @cattreedev Last Review: 2026-02-09 Next Review: After Phase 1 completion