Introduction
Modern software delivery demands speed, reliability, and reproducibility. Traditional monolithic test and staging environments often become bottlenecks: they are expensive to maintain, prone to configuration drift, and can hide integration issues until the very last stages of a release pipeline.
Ephemeral environments—short‑lived, on‑demand instances of an application stack—offer a compelling alternative. By provisioning a fresh copy of the entire system for each pull request, feature branch, or even a single test case, teams gain:
- Isolation – no cross‑contamination from other developers’ changes.
- Consistency – the environment mirrors production configuration exactly.
- Scalability – environments spin up only when needed, keeping cloud spend under control.
- Speed – parallel execution of many environments accelerates feedback loops.
In this article we dive deep into building and managing ephemeral environments using three complementary technologies:
- CCR (Container‑based Compute Runtime) – a lightweight, serverless container execution platform (we’ll focus on Google Cloud Run, which embodies the CCR model).
- GitHub Actions (GHA) – the CI/CD engine that can orchestrate the entire lifecycle of an environment, from creation to teardown.
- SDKs (Software Development Kits) – language‑specific libraries (e.g., Google Cloud SDK, AWS CDK, Terraform) that let you codify infrastructure as code (IaC) and interact programmatically with CCR and GHA.
By the end of this guide you’ll be able to:
- Design a reusable architecture for on‑demand environments.
- Write GitHub Actions workflows that automatically spin up, test, and destroy CCR instances.
- Leverage SDKs to parameterize, secure, and monitor your environments.
- Apply best‑practice patterns for cost control, security, and observability.
Let’s get started.
1. The Ephemeral Environment Landscape
1.1 What Makes an Environment “Ephemeral”?
An ephemeral environment is:
| Characteristic | Description |
|---|---|
| Transient | Lives for a bounded period (minutes to hours). |
| Immutable | Once created, its configuration never changes; updates are achieved by destroying and recreating. |
| Isolated | No shared state with other environments (e.g., separate databases, queues). |
| Reproducible | Built from the same source code, Docker image, and IaC templates each time. |
| Automated | Provisioned, exercised, and torn down by code, not by manual ops. |
These traits collectively eliminate “it works on my machine” scenarios and bring the shift‑left testing philosophy to life.
1.2 Typical Use Cases
| Use Case | Why Ephemeral? |
|---|---|
| Pull‑request previews | Stakeholders can interact with a live UI that reflects the exact code under review. |
| Feature‑branch integration tests | Run end‑to‑end (E2E) suites against a full stack without affecting other branches. |
| Canary releases | Deploy a temporary copy of a microservice to validate performance under realistic load. |
| Security scanning | Spin up a sandboxed environment for dynamic analysis tools (e.g., OWASP ZAP). |
| Developer onboarding | Provide a ready‑made environment that mirrors production for new hires. |
2. Understanding CCR – Container‑Based Compute Runtime
2.1 Why CCR?
CCR platforms abstract away the underlying VM, networking, and scaling concerns. They accept a container image and automatically:
- Allocate compute resources (CPU, memory).
- Route HTTP traffic (or expose gRPC).
- Scale to zero when idle, eliminating idle cost.
Google Cloud Run is a canonical CCR: it runs any OCI‑compatible container, scales from zero to thousands of requests per second, and integrates tightly with IAM, Cloud Logging, and Cloud Monitoring.
2.2 Core CCR Concepts
| Concept | Explanation |
|---|---|
| Service | A logical unit representing a container image, runtime config, and traffic settings. |
| Revision | Immutable snapshot of a service’s configuration. New revisions are created on each deployment. |
| Concurrency | Number of requests a single container instance can handle concurrently (default 80). |
| CPU Allocation | CPU is allocated only while the container is processing a request unless “CPU always allocated” is enabled. |
| Traffic Splits | You can route a percentage of traffic to specific revisions, enabling canary testing. |
2.3 CCR vs. Traditional VMs
| Feature | CCR (e.g., Cloud Run) | Traditional VM (e.g., Compute Engine) |
|---|---|---|
| Provisioning time | Seconds | Minutes |
| Scaling model | Automatic from 0 to N | Manual or auto‑scaler with warm‑up |
| Operational overhead | None (no OS patching) | Full OS maintenance |
| Pricing model | Pay per request + CPU‑seconds | Pay for uptime (hourly) |
| Isolation | Container sandbox (gVisor) | Full VM isolation |
3. GitHub Actions – Orchestrating Ephemeral Environments
GitHub Actions (GHA) provides a YAML‑based workflow engine that can run jobs on GitHub‑hosted or self‑hosted runners. By combining GHA with CCR, you can:
- Detect a pull‑request event.
- Build a Docker image (or reuse an existing one).
- Deploy the image to a new CCR service revision.
- Run integration tests against the live endpoint.
- Tear down the service after the workflow finishes.
3.1 High‑Level Workflow Diagram
PR Opened ──► GitHub Actions (Workflow)
│
├─► Build Docker Image
│
├─► Deploy to Cloud Run (CCR)
│
├─► Run Tests (curl, playwright, etc.)
│
└─► Delete Cloud Run Service
3.2 Example Workflow File
Below is a full ci-ephemeral.yml that demonstrates the pattern. It uses the Google Cloud SDK Action and the Google Cloud Run Action (both community‑maintained).
name: Ephemeral Environment CI
on:
pull_request:
types: [opened, synchronize, reopened]
jobs:
build-and-test:
runs-on: ubuntu-latest
env:
PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
REGION: us-central1
SERVICE_NAME: pr-${{ github.event.number }}-${{ github.sha }}
IMAGE_TAG: gcr.io/${{ secrets.GCP_PROJECT_ID }}/my-app:${{ github.sha }}
steps:
# 1️⃣ Checkout source
- name: Checkout repository
uses: actions/checkout@v4
# 2️⃣ Set up Cloud SDK
- name: Set up Cloud SDK
uses: google-github-actions/setup-gcloud@v2
with:
project_id: ${{ env.PROJECT_ID }}
service_account_key: ${{ secrets.GCP_SA_KEY }}
# 3️⃣ Build Docker image and push to Artifact Registry
- name: Build and push Docker image
run: |
gcloud auth configure-docker ${{ env.REGION }}-docker.pkg.dev
docker build -t ${{ env.IMAGE_TAG }} .
docker push ${{ env.IMAGE_TAG }}
# 4️⃣ Deploy a new Cloud Run service (ephemeral)
- name: Deploy to Cloud Run (ephemeral)
id: deploy
run: |
gcloud run deploy ${{ env.SERVICE_NAME }} \
--image ${{ env.IMAGE_TAG }} \
--region ${{ env.REGION }} \
--platform managed \
--allow-unauthenticated \
--no-traffic
# Capture the service URL for later steps
URL=$(gcloud run services describe ${{ env.SERVICE_NAME }} \
--region ${{ env.REGION }} \
--format='value(status.url)')
echo "url=$URL" >> $GITHUB_OUTPUT
# 5️⃣ Run integration tests against the live URL
- name: Run integration tests
env:
APP_URL: ${{ steps.deploy.outputs.url }}
run: |
# Example: simple curl health check
curl -f -s $APP_URL/healthz || exit 1
# Run your test suite (e.g., npm test, pytest, etc.)
npm ci && npm run test:e2e
# 6️⃣ Cleanup – delete the Cloud Run service
- name: Delete Cloud Run service
if: always()
run: |
gcloud run services delete ${{ env.SERVICE_NAME }} \
--region ${{ env.REGION }} \
--platform managed \
--quiet
Key points:
- The service name incorporates the PR number and commit SHA, guaranteeing uniqueness.
--no-trafficensures the new revision isn’t exposed to production traffic.- The
if: always()guard on the cleanup step guarantees teardown even if tests fail. - Secrets (
GCP_PROJECT_ID,GCP_SA_KEY) are stored in the repository’s Settings → Secrets.
3.3 Advanced GHA Features for Ephemeral Environments
| Feature | How It Helps |
|---|---|
| Matrix builds | Spin up multiple environment variants (different runtime versions, feature flags). |
| Self‑hosted runners with GPU | Run performance or ML tests in a dedicated runner before deploying to CCR. |
| Artifacts | Store test reports, screenshots, or logs for later analysis. |
| Workflow dispatch | Manually trigger a full‑stack preview for non‑PR branches. |
| Environment protection rules | Prevent merges until the ephemeral environment passes all checks. |
4. SDKs – Codifying Infrastructure and Runtime Logic
While the GitHub Actions workflow above uses the gcloud CLI, many teams prefer SDKs to embed provisioning logic directly into application code or IaC pipelines. Below we explore three popular SDK approaches:
4.1 Google Cloud SDK (Python example)
from google.cloud import run_v2
from google.oauth2 import service_account
PROJECT_ID = "my-gcp-project"
REGION = "us-central1"
SERVICE_NAME = f"pr-{pr_number}-{commit_sha}"
IMAGE = f"gcr.io/{PROJECT_ID}/my-app:{commit_sha}"
credentials = service_account.Credentials.from_service_account_file(
"path/to/sa-key.json"
)
client = run_v2.ServicesClient(credentials=credentials)
parent = f"projects/{PROJECT_ID}/locations/{REGION}"
service = run_v2.Service()
service.name = f"{parent}/services/{SERVICE_NAME}"
service.template.revision = f"projects/{PROJECT_ID}/locations/{REGION}/services/{SERVICE_NAME}/revisions/1"
service.template.containers = [
run_v2.Container(
image=IMAGE,
env=[
run_v2.EnvVar(name="ENV", value="preview")
]
)
]
operation = client.create_service(parent=parent, service=service, service_id=SERVICE_NAME)
print("Creating service...")
operation.result() # Wait for completion
print(f"Service URL: {service.uri}")
The SDK approach shines when you need conditional logic (e.g., only deploy if code coverage > 80%) or when you want to bundle deployment into a CLI tool for developers.
4.2 Terraform – Declarative IaC
Terraform can manage Cloud Run resources via the google_cloud_run_service resource. Pair this with GitHub Actions’ hashicorp/setup-terraform action for a fully declarative pipeline.
resource "google_cloud_run_service" "pr_preview" {
name = var.service_name
location = var.region
template {
spec {
containers {
image = var.image
env {
name = "ENV"
value = "preview"
}
}
}
}
traffic {
percent = 100
latest_revision = true
}
}
Workflow snippet:
- name: Init Terraform
run: terraform init
- name: Apply preview
env:
TF_VAR_service_name: ${{ env.SERVICE_NAME }}
TF_VAR_image: ${{ env.IMAGE_TAG }}
TF_VAR_region: ${{ env.REGION }}
run: terraform apply -auto-approve
Terraform’s state can be stored in a remote backend (Google Cloud Storage) to keep track of each PR’s resources and enable safe teardown with terraform destroy.
4.3 AWS CDK (if you prefer AWS Fargate as CCR)
For teams on AWS, the AWS Cloud Development Kit (CDK) can launch AWS App Runner services, which are AWS’s managed CCR equivalent.
import * as apprunner from '@aws-cdk/aws-apprunner-alpha';
import * as cdk from 'aws-cdk-lib';
export class PrPreviewStack extends cdk.Stack {
constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
super(scope, id, props);
new apprunner.Service(this, 'PreviewService', {
source: apprunner.Source.fromEcr({
imageRepository: ecrRepo,
tag: commitSha,
}),
serviceName: `pr-${prNumber}-${commitSha}`,
cpu: apprunner.Cpu.ONE_VCPU,
memory: apprunner.Memory.TWO_GB,
});
}
}
The CDK can be invoked from a GitHub Actions job via cdk deploy and cdk destroy, mirroring the Cloud Run example.
5. Building a Full‑Featured Ephemeral Environment Pipeline
Let’s stitch everything together into a real‑world pipeline that satisfies the following requirements:
- Zero‑touch preview for each PR (URL posted as a comment).
- Automatic cleanup within 30 minutes of inactivity.
- Secure secrets – no plaintext credentials in the repository.
- Observability – logs streamed to Cloud Logging and test results attached to the PR.
- Cost control – enforce CPU limits and idle‑timeout policies.
5.1 Architecture Overview
GitHub PR ──► GitHub Actions
│ │
│ ├─► Build Docker image → Artifact Registry
│ ├─► Deploy Cloud Run service (CCR)
│ ├─► Run Cypress/Playwright UI tests
│ ├─► Post preview URL as PR comment
│ └─► Schedule Cloud Scheduler job to delete after 30m
│
└─► Cloud Logging (centralized)
5.2 Step‑by‑Step Implementation
5.2.1 Build & Push Image
Already covered in the workflow example. Use Google Artifact Registry (or Docker Hub) for image storage.
5.2.2 Deploy Ephemeral Cloud Run Service
Add a Cloud Scheduler job that invokes a Cloud Function to delete the service after a TTL.
cloud-function-delete.js
const {CloudRunServiceClient} = require('@google-cloud/run');
const client = new CloudRunServiceClient();
exports.deleteService = async (req, res) => {
const {project, region, service} = req.body;
const name = `projects/${project}/locations/${region}/services/${service}`;
await client.deleteService({name});
res.send(`Deleted ${service}`);
};
Create the Scheduler job via Terraform or gcloud:
gcloud scheduler jobs create http delete-pr-${PR_NUMBER} \
--schedule "*/30 * * * *" \
--uri https://REGION-PROJECT.cloudfunctions.net/deleteService \
--http-method POST \
--message-body "$(cat <<EOF
{
"project":"${PROJECT_ID}",
"region":"${REGION}",
"service":"${SERVICE_NAME}"
}
EOF
)" \
--time-zone "UTC"
The job runs every 30 minutes and deletes the service if it still exists.
5.2.3 Run Integration Tests
Use Playwright for a headless browser test suite:
npm ci
npx playwright test --baseURL=${APP_URL}
Configure Playwright to generate an HTML report and upload it as a workflow artifact.
- name: Upload Playwright Report
uses: actions/upload-artifact@v3
with:
name: playwright-report
path: playwright-report/
5.2.4 Post Preview URL as a PR Comment
GitHub’s REST API makes this trivial.
- name: Post preview comment
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
PREVIEW_URL: ${{ steps.deploy.outputs.url }}
run: |
COMMENT_BODY="🚀 Preview ready: $PREVIEW_URL"
curl -s -X POST -H "Authorization: token $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"body\":\"$COMMENT_BODY\"}" \
"https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments"
5.2.5 Observability & Logging
- Cloud Run automatically streams
stdout/stderrto Cloud Logging. - Use structured logs (JSON) to embed request IDs for traceability.
- Enable Cloud Trace for latency analysis.
import logging, json, os
logging.basicConfig(level=logging.INFO)
def handler(request):
request_id = request.headers.get('X-Request-ID', 'unknown')
logging.info(json.dumps({
"msg": "incoming request",
"path": request.path,
"request_id": request_id
}))
# ...rest of handler...
5.3 Security Hardening Checklist
| Item | Recommendation |
|---|---|
| Service Account | Use a minimal SA with only run.services.create, run.services.delete, artifactregistry.read. |
| Network | Deploy Cloud Run with VPC‑Connector and Ingress = internal if the preview should not be publicly reachable. |
| Secrets | Store env vars in Secret Manager and inject via --set-secrets flag. |
| IAM | Restrict gcloud run services describe to the CI service account. |
| TTL Enforcement | Use Cloud Scheduler + Cloud Functions (as above) to guarantee cleanup. |
6. Best Practices & Common Pitfalls
6.1 Naming Conventions
Consistent naming prevents collisions and eases debugging.
service = pr-{PR_NUMBER}-{COMMIT_SHA}
artifact = my-app:{COMMIT_SHA}
Avoid characters that are illegal in Cloud Run service names (uppercase, spaces).
6.2 Managing Concurrency
- For UI tests, set
--concurrency=1to avoid race conditions on shared resources (e.g., a single‑user database). - For API‑only workloads, increase concurrency to reduce cost.
gcloud run services update $SERVICE_NAME \
--concurrency=1 \
--cpu=1 \
--memory=512Mi
6.3 Database Provisioning Strategies
Ephemeral environments often need a temporary database:
| Strategy | Pros | Cons |
|---|---|---|
| Separate Cloud SQL instance per PR | Full isolation, realistic performance | High cost, slower provisioning |
| Shared instance with schema per PR | Cheap, fast | Requires cleanup scripts, risk of cross‑contamination |
| In‑memory SQLite / Dockerized Postgres | Instant, zero cost | Not production‑grade; may miss scaling bugs |
A hybrid approach (shared instance + per‑PR schema + automatic DROP SCHEMA on teardown) balances cost and realism.
6.4 Handling Flaky Tests
Ephemeral environments can expose flakiness due to network latency or cold‑start delays. Mitigate by:
- Adding retry logic in test frameworks (e.g., Playwright
retries: 2). - Using warm‑up requests before the test suite starts.
- Monitoring cold‑start duration via Cloud Logging and adjusting
--cpu-always-allocatedif needed.
6.5 Cost Monitoring
- Enable Budget Alerts in GCP to receive notifications when spend exceeds thresholds.
- Tag resources with
pr-numberandcommit-shafor granular cost attribution. - Use Cloud Run “max instances” limit to cap resource usage per PR.
gcloud run services update $SERVICE_NAME \
--max-instances=2
6.6 Debugging Failed Deployments
- Inspect Cloud Run revision logs – look for
Container failed to start. - Check Artifact Registry permissions – the service account must be able to pull the image.
- Validate environment variables – missing secret values often cause runtime crashes.
You can add a debug step in the workflow that pulls the image locally and runs it with docker run to surface errors early.
7. Real‑World Case Study: Acme Corp’s Migration to Ephemeral Previews
7.1 Background
Acme Corp, a SaaS provider, historically used a single staging environment for all feature branches. Problems:
- Merge conflicts caused nightly build failures.
- Long feedback loops – developers waited hours for UI reviewers to see changes.
- Staging drift – configuration drift led to production bugs.
7.2 Solution Architecture
| Component | Choice |
|---|---|
| Container runtime | Google Cloud Run (CCR) |
| CI/CD | GitHub Actions |
| IaC | Terraform |
| Secrets | Secret Manager |
| Database | Shared Cloud SQL with per‑PR schema |
| Observability | Cloud Logging + Cloud Monitoring dashboards |
7.3 Implementation Highlights
- PR‑triggered workflow creates a Cloud Run service named
pr-<num>-<sha>. - Terraform provisions a schema in Cloud SQL:
schema_pr_<num>. - Playwright runs a full UI test suite; results posted to the PR as a check run.
- Cloud Scheduler deletes the service and drops the schema after 45 minutes of inactivity.
7.4 Outcomes (6‑month period)
| Metric | Before | After |
|---|---|---|
| Average PR merge time | 4.2 days | 1.1 days |
| Staging‑related incidents | 12 per month | 2 per month |
| Cloud Run cost (ephemeral) | $0 (unused) | $450/month (controlled) |
| Developer satisfaction (survey) | 3.2/5 | 4.7/5 |
The case study demonstrates that ephemeral environments are not a theoretical curiosity; they deliver measurable productivity gains when paired with CCR, GHA, and robust SDK tooling.
8. Frequently Asked Questions (FAQ)
Q1: How does Cloud Run handle long‑running jobs?
A: Cloud Run is optimized for request‑driven workloads. For jobs exceeding the request timeout (default 15 min, max 60 min), consider Cloud Run Jobs (a newer offering) or switch to Cloud Batch.
Q2: Can I use GitHub Actions self‑hosted runners inside the same VPC as my Cloud Run services?
A: Yes. Deploy a self‑hosted runner on a GKE node or Compute Engine VM with a VPC connector, allowing private traffic between the runner and Cloud Run services configured for ingress=internal.
Q3: What happens if the workflow crashes before the cleanup step?
A: The if: always() guard ensures the deletion step runs even on failure. Additionally, the Cloud Scheduler TTL acts as a safety net.
Q4: Is it possible to preview a PR without rebuilding the Docker image each time?
A: If the changes are limited to configuration files or static assets, you can reuse the latest image and mount a GitHub checkout as a volume via Cloud Run “continuous deployment” using the --source flag (experimental). However, for reliable reproducibility, rebuilding is recommended.
Q5: How do I debug a failing test that depends on external APIs?
A: Use service-mocking (e.g., WireMock) deployed as a sidecar container in the same Cloud Run revision, or route traffic through a mock server hosted on Cloud Run itself.
Conclusion
Ephemeral environments have moved from a niche experiment to a mainstream production pattern. By leveraging Container‑based Compute Runtime (CCR) platforms like Google Cloud Run, orchestrating the lifecycle with GitHub Actions, and codifying the infrastructure through SDKs (Google Cloud SDK, Terraform, CDK), teams can achieve:
- Rapid, isolated feedback for every code change.
- Cost‑effective scaling—resources exist only when needed.
- Improved security and compliance through immutable, short‑lived stacks.
- Higher developer velocity and better collaboration across product, QA, and security teams.
The journey begins with a single workflow, but the principles—immutability, automation, observability—scale across an organization’s entire delivery pipeline. Adopt the patterns described here, iterate on your own constraints, and you’ll soon experience the tangible benefits of truly ephemeral development.
Resources
- Google Cloud Run Documentation – Official guide to deploying containers as serverless services.
- GitHub Actions Official Docs – Comprehensive reference for workflow syntax, actions, and best practices.
- Terraform Provider for Google Cloud Platform – IaC reference for provisioning Cloud Run, Artifact Registry, and more.
- Playwright Testing Framework – Modern end‑to‑end testing library used in the examples.
- Google Cloud SDK (gcloud) Reference – Command‑line tool for interacting with GCP services programmatically.