gstack Mastery Guide

A complete course to take you from "what's a slash command?" to running an entire virtual engineering team. No coding knowledge required.

What is gstack?

gstack is an open-source toolkit created by Garry Tan (CEO of Y Combinator) that turns Claude Code into a virtual software development team. Instead of one AI that does everything the same way, gstack gives you 28+ specialized slash commands that each activate a different "role" — like switching between a CEO, a designer, an engineer, a QA tester, and a release manager.

Think of it this way: without gstack, asking Claude to build something is like hiring one person and saying "do everything." With gstack, it's like having a full team where each person has a clear job and they hand work off to each other in a structured sequence.

KEY INSIGHT

gstack doesn't add new AI capabilities. It adds structure, roles, and discipline to what Claude Code can already do, in the same way that a great process makes a team 10x more effective than a group of talented individuals working chaotically.

The Mental Model

Before you touch any commands, internalize this: gstack follows the same cycle that real software teams use. Every feature, every bug fix, every product idea flows through these stages.

Think: sharpen the idea
Plan: lock strategy and architecture
Design: visual system and UI
Build: write the code
Review: catch bugs and issues
Test: QA in a real browser
Ship: deploy and release
Reflect: retro and learn
Repeat: next iteration

You don't always need every phase

Fixing a small bug? You might jump straight to Build → Review → Ship. Building a new product from scratch? Start from Think. gstack is a toolkit, not a prison — use what fits.

Install gstack From Zero (Mac)

If you've never opened Terminal before, start here. Every step is a copy-paste. Total time: about 15 minutes, most of which is waiting for downloads.

BEFORE YOU START What you'll end up with

A working gstack setup on your Mac: Claude Code installed, gstack's 28+ slash commands ready to use, and your first session running in under 20 minutes.

You will not need to know how to code. You will need to paste commands into a black window (called Terminal) and click "Allow" on a few system prompts.

Step 1 — Open Terminal

Terminal is a built-in app on every Mac. It's the black window where you type commands directly to your computer.

To open it: press Cmd + Space to open Spotlight, type Terminal, hit Enter. A black or white window will open. That's it — you're in.

Leave this window open for the rest of the install. Every remaining step is pasted into it.

Step 2 — Install Homebrew

Homebrew is a tool that lets you install other tools with a single command. Think of it as the App Store for developer software. Installing it once unlocks every other step.

Copy this entire line, paste it into Terminal, hit Enter:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

It will ask for your Mac password. You won't see anything as you type — that's normal, it's a security feature. Type your password anyway and hit Enter.

Installation takes 2–5 minutes. When it finishes, you'll see "Installation successful!"

If Terminal prints "Next steps" with extra commands at the end of Homebrew's install (usually two lines starting with echo and eval), copy and run those too. They tell your computer where to find Homebrew.

Step 3 — Install Node.js

Node is the runtime Claude Code is built on. One command:

brew install node

Takes about a minute. When your prompt comes back without errors, you're done.

Step 4 — Install Claude Code

npm install -g @anthropic-ai/claude-code

This installs Claude Code globally on your machine. Another minute or so.
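
Before moving on, it can help to confirm that the tools from Steps 2 to 4 are visible to your shell. The sketch below is not part of gstack or Claude Code; check_tool is a hypothetical helper name, and it relies only on the standard command -v lookup.

```shell
# Hypothetical helper (not part of gstack): report whether a command
# is on your PATH. Returns 0 if found, 1 if missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '%s: found\n' "$1"
  else
    printf '%s: missing\n' "$1"
    return 1
  fi
}

# Check the tools installed in Steps 2-4. A "missing" line points at
# the step to repeat; the loop keeps going either way.
for tool in brew node npm claude; do
  check_tool "$tool" || true
done
```

If every line says "found", the install so far is in good shape. If one says "missing", repeat the matching step before continuing.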

Step 5 — Install gstack

gstack isn't one "thing" — it's a bundle of skills you drop into Claude Code. Run gstack's installer:

curl -fsSL https://gstack.dev/install.sh | bash

This places gstack's slash commands into the right folder so Claude Code picks them up automatically on your next session.

Step 6 — Start your first gstack session

Pick a folder to work in. If you don't have one, make a new one on your Desktop:

mkdir ~/Desktop/my-first-gstack
cd ~/Desktop/my-first-gstack

Now launch Claude Code:

claude

Claude Code opens inside Terminal. The first time you run it, it walks you through signing in with your Anthropic account. Follow the prompts.

Once you're in, try your first slash command:

/careful

If gstack loaded correctly, Claude will confirm safety mode is on. You're live.

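STUCK? Common first-time errors

"command not found: brew": Homebrew's install finished, but your shell doesn't know where it is yet. Close Terminal completely (Cmd + Q), reopen it, and try again. If the error persists, run the "Next steps" commands printed at the end of Step 2. On Apple Silicon Macs, Homebrew lives in /opt/homebrew, which is not on the default PATH until you do.

"permission denied" during npm install: npm is trying to write to a folder your account doesn't own. Avoid prefixing the command with sudo; Anthropic's setup documentation advises against sudo npm install -g because it can cause permission and security problems later. Instead, point npm at a folder you own: run mkdir -p ~/.npm-global, then npm config set prefix ~/.npm-global, add the line export PATH=~/.npm-global/bin:$PATH to your ~/.zshrc file, reopen Terminal, and re-run Step 4.

Slash commands don't appear in Claude Code: gstack didn't install into the right directory. Check that ~/.claude/skills/gstack exists. If not, re-run Step 5.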
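
To check the skills folder from Terminal instead of Finder, a small test like the following should work. It is a minimal sketch: gstack_dir_exists is a hypothetical helper name, and the default path is simply the one named in the note above.

```shell
# Hypothetical helper: succeed only if the gstack skills folder exists.
# With no argument it checks the path named in the troubleshooting note.
gstack_dir_exists() {
  dir="${1:-$HOME/.claude/skills/gstack}"
  if [ -d "$dir" ]; then
    printf 'found: %s\n' "$dir"
  else
    printf 'not found: %s\n' "$dir"
    return 1
  fi
}

gstack_dir_exists || echo "Re-run Step 5, then start a new Claude Code session."
```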

From here, keep reading. The next section (Getting Started) shows you what to type once you're inside a session.

Getting Started

How to use slash commands

Every gstack skill is activated by typing a / followed by the command name inside Claude Code. You type these in the same place you'd type any message to Claude.

/office-hours ← type this and hit Enter

Claude will then "become" that role. It'll change how it thinks, what it focuses on, and what questions it asks you. When you're done with that role, just invoke the next one.

The two ways to run gstack

OPTION A Step by step (recommended for learning)

Call each skill manually in sequence. This gives you control and teaches you what each phase does.

/office-hours ← refine your idea
/plan-ceo-review ← product strategy
/plan-eng-review ← technical architecture
/review ← code review
/qa ← test it
/ship ← deploy it
OPTION B Autoplan (for when you know the flow)

One command chains CEO → Design → Eng review automatically, pausing only for your approval at key decision points.

/autoplan

Phase 1: Think

/office-hours
Role: Startup Advisor / Brainstorm Partner
Category: Think

This is where every new idea should start. gstack asks you 6 forcing questions designed to cut through vague thinking and expose whether your idea has real demand. It has two modes:

Startup mode — challenges you on demand reality, the status quo your users tolerate, how desperate users are for a solution, the narrowest possible wedge to start with, observations others have missed, and future-fit.

Builder mode — for side projects, hackathons, and learning. More exploratory, less pressure-testing.

When to use: Every time you have a new product idea, feature concept, or are starting something from scratch.
You: /office-hours
You: "I want to build a tool that helps restaurants manage their waitlist via SMS"
gstack: [asks 6 hard questions about demand, alternatives, specificity...]
Don't skip this phase

The biggest mistake beginners make is jumping straight to "build me X." The /office-hours questions force you to think clearly about what you're building and why before any code gets written. It saves hours of building the wrong thing.

Phase 2: Plan

Planning in gstack starts from two perspectives: the product leader and the engineer. This dual review catches problems that a single perspective would miss. Design and developer-experience reviews are available for projects that need them.

/plan-ceo-review
Role: Product CEO
Category: Plan

Thinks like a product leader: "What's the 10-star version hiding inside this request?" Has 4 modes:

Expansion — go bigger, find the hidden opportunity
Selective Expansion — expand some parts, hold others
Hold Scope — keep it exactly as specified
Reduction — cut it down to the absolute minimum viable version

Outputs a Markdown plan document that becomes the source of truth for what gets built.

When to use: After /office-hours, before any code is written. Also when scope is creeping and you need someone to challenge it.
You: /plan-ceo-review
gstack: "Let me review this plan as if I'm the CEO. Here's what I think the 10-star version looks like..."
/plan-eng-review
Role: Staff Engineer
Category: Plan

Locks down the technical architecture: data models, API structure, file organization, edge cases, error handling, and what tests need to be written. Produces architecture diagrams and a technical spec.

This is what prevents the "I built it but it's a mess" problem. The eng review creates a blueprint that the build phase follows.

When to use: After the CEO review approves the product direction. Before any implementation.
You: /plan-eng-review
gstack: "Here's my architecture recommendation: data flow, component structure, edge cases to handle..."
/plan-design-review
Role: Senior Product Designer
Category: Plan

Opens your live site and audits it the way a Stripe or Linear designer would, with immediate, opinionated visual reactions. Covers an 80-item checklist across 10 categories, including typography, spacing, hierarchy, color, responsive design, interaction states, motion, content quality, and AI Slop Detection.

Outputs two headline scores: Design Score and AI Slop Score (letter grades A–F). Also extracts your Inferred Design System from the live CSS — fonts, colors, spacing scale — and can save it as DESIGN.md.

Report only: it never touches your code. Use /design-review (legacy alias: /qa-design-review) when you want the findings fixed.

When to use: After building any UI. Catches AI slop patterns (purple gradients, 3-column icon grids, uniform border-radius) and visual hierarchy problems before you ship.
You: /plan-design-review https://myapp.com
gstack: "Design Score: C | AI Slop Score: D — 'The site communicates generic SaaS energy.' Top issue: gradient hero + 3-column icon grid. [Full report with 12 findings]"
/plan-devex-review
Role: Developer Experience Lead
Category: Plan

Reviews your plan through a developer experience lens: API ergonomics, onboarding friction, documentation needs, tooling setup, and local dev workflow. Has multiple interactive modes to target specific DX concerns.

Catches problems that will slow down future developers (including yourself) before you've built the wrong thing.

When to use: When building APIs, SDKs, developer tools, or anything where DX is a first-class concern. Also valuable for internal tools where developer onboarding matters.
You: /plan-devex-review
gstack: "Reviewing from a DX perspective... The current API design requires 4 steps to authenticate. Let's talk about what a zero-friction first call looks like..."
/autoplan
Role: Orchestrator
Category: Shortcut

Chains CEO Review → Design Review → Eng Review automatically. Pauses at key decision points for your approval, so you stay in control without manually calling each command.

This is the "I trust the process, just run it" button for planning.

When to use: When you're comfortable with the planning flow and want to save time.
You: /autoplan
gstack: [runs CEO review, pauses for approval, runs design review, pauses, runs eng review]

Phase 3: Design

gstack's design skills are among its strongest differentiators. They don't just suggest colors; they build complete design systems and generate real, production-quality HTML.

/design-consultation
Role: Design Director
Category: Design

Builds your design system from scratch: researches the competitive space, proposes a visual direction, defines typography, colors, spacing, component patterns, and creative risks. Outputs a DESIGN.md file that becomes your design source of truth.

When to use: At the start of a new project, or when you need a coherent visual identity.
You: /design-consultation
gstack: "Let me research the space and propose a design system. What's the product and who's it for?"
/design-shotgun
Role: Visual Explorer
Category: Design

Generates multiple visual variants side-by-side so you can compare directions. Instead of getting one design and tweaking it 50 times, you see 3-4 options at once and pick the best one.

When to use: When you're not sure what direction to go visually and want to explore options quickly.
You: /design-shotgun
You: "Show me 3 different hero section styles for a SaaS landing page"
/design-html
Role: Frontend Implementer
Category: Design

Takes approved mockups, CEO plans, design reviews, or a from-scratch brief and generates production-quality Pretext-native HTML. Text reflows on resize, heights adjust to content, and the skill auto-routes to the right API per design type.

Includes framework detection — if your project is React, Svelte, or Vue, it emits idiomatic component code instead of raw HTML.

When to use: After you've approved a design direction and want real, shippable code in your project's framework.
You: /design-html
gstack: "Detected Next.js + Tailwind. Converting approved hero to a responsive React component..."
/design-review
Role: Designer-Who-Codes
Category: Design

Live-site visual audit + fix loop. Runs the same 80-item audit as /plan-design-review (typography, spacing, hierarchy, color, AI Slop Detection), then enters a fix loop: locates the source file for each finding, makes the minimal CSS change, and commits with style(design): FINDING-NNN.

One commit per fix, before/after screenshots, fully bisectable. CSS-only changes get a free pass; component JSX edits count against a self-regulation risk budget (hard cap 30 fixes).
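
The commit naming described above can be pictured with a one-line formatter. This is an illustration only: finding_commit_subject is a hypothetical name, and the three-digit zero padding is inferred from the NNN placeholder rather than taken from gstack's source.

```shell
# Illustrative only: build a commit subject in the
# "style(design): FINDING-NNN" shape, zero-padded to three digits.
finding_commit_subject() {
  printf 'style(design): FINDING-%03d\n' "$1"
}

finding_commit_subject 1    # prints: style(design): FINDING-001
finding_commit_subject 42   # prints: style(design): FINDING-042
```

Because each fix is its own commit, git log --oneline --grep='FINDING-' lists them all, and git revert with a single commit hash undoes one fix without touching the others.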

Note: /design-review is the current upstream skill for fixing design issues. The older /qa-design-review name is a legacy alias; both folders may coexist during the transition.

When to use: After /plan-design-review surfaces issues you actually want fixed in code, not just listed.
You: /design-review https://myapp.com
gstack: "Design Score C → B+ | AI Slop Score D → A. 9 fixes applied across 9 atomic commits. Before/after screenshots in .gstack/design-reports/."

Phase 4: Build

This is where Claude Code writes the actual code. In gstack's flow, building happens after the plan and design are locked. The plans from earlier phases become the instructions Claude follows.

HOW IT WORKS

You don't need a special slash command for building. Once your plans are set (from /plan-eng-review), you simply ask Claude to implement them:

"Implement the user authentication feature according to the engineering plan"
"Build the landing page following DESIGN.md"
"Add the API endpoints from the architecture doc"

Claude will reference the plan files that /plan-eng-review and /design-consultation created, and build against those specs rather than improvising.

Why this matters for non-coders

Even if you can't read code, the planning phase created documents in plain English that describe what should be built. Claude follows those documents. Your job during the build phase is to make sure Claude is sticking to the plan — and if something seems off, ask "does this match the engineering plan?"

Phase 5: Review

/review
Role: Senior Code Reviewer
Category: Review

A thorough code review that looks for bugs, security issues, performance problems, edge cases, and style inconsistencies. Its findings don't end as a list: /ship later verifies that the issues raised here were actually fixed.

Think of it as the "experienced developer looking over the junior developer's shoulder" step.

When to use: After any code has been written, before testing and shipping.
You: /review
gstack: "I'll review the recent changes. Found 3 issues: a potential null pointer, a missing error handler, and an N+1 query..."
/cso
Role: Chief Security Officer
Category: Security

Runs a threat model on your code: injection attacks, authentication holes, data exposure, CSRF, XSS, and more. Thinks like a hacker trying to break your app.

When to use: Before shipping anything that handles user data, authentication, payments, or is publicly accessible.
You: /cso
gstack: "Running threat model... Found: API endpoint doesn't validate auth tokens, SQL injection possible in search..."
/health
Role: Code Quality Inspector
Category: Review

An overall code quality health check. Scans the codebase for technical debt, code smells, outdated dependencies, missing tests, and structural issues. Think of it as a routine checkup rather than a targeted review of recent changes.

When to use: Periodically to get a snapshot of overall code health, or before starting a new sprint to understand what needs the most attention.
You: /health
gstack: "Code health report: 3 files with high complexity, 2 outdated dependencies, test coverage at 61%. Priority issues: [list]..."
/devex-review
Role: DX Auditor
Category: Review

A live developer experience audit — runs real tests of your local dev setup, CLI tools, build process, and onboarding flow. Unlike /plan-devex-review (which reviews plans), this reviews actual running code and tools.

Tests the "getting started" journey from scratch, identifies friction points a new developer would hit, and suggests concrete fixes.

When to use: After building a tool, API, or SDK to verify the real-world DX matches the intended design. Also useful before open-sourcing or sharing with a new team.
You: /devex-review
gstack: "Running DX audit... Cloning repo... Running npm install... Found: 3 undocumented env variables are required before the app starts. Here's how to fix it..."

Phase 6: Test

/qa
Role: QA Lead
Category: Test

The breakthrough skill. /qa opens a real Chromium browser, navigates to your app, takes screenshots, clicks buttons, fills forms, and verifies things actually work. When it finds a bug, it doesn't just report it — it fixes the code, writes a regression test, and verifies the fix.

This was Garry Tan's "massive unlock" — Claude saying "I SEE THE ISSUE" and then actually fixing it live.

When to use: After /review has cleaned up the code. Before shipping.
You: /qa
gstack: "Opening browser... navigating to localhost:3000... Screenshot taken. Testing login flow... [clicks, types, verifies] Found: the signup button doesn't respond on mobile widths. Fixing..."
/qa-only
Role: QA Tester (no fixes)
Category: Test

Same as /qa but only reports issues without fixing them. Use this when you want a clean bug report without Claude modifying your code.

When to use: When you want a list of bugs but want to decide yourself how to handle them.
You: /qa-only
gstack: "Found 5 issues: [list with screenshots and reproduction steps]"
/qa-design-review
Role: Designer Who Codes
Category: Test

Runs the same 80-item visual audit as /plan-design-review, then fixes what it finds. For each design finding it locates the source file, makes the minimal CSS/styling change, commits with style(design): FINDING-NNN, re-navigates to verify, and takes before/after screenshots. One atomic commit per fix — fully bisectable.

CSS-only changes get a free pass. Changes to component JSX/TSX count against a risk budget. Hard cap at 30 fixes; if risk score exceeds 20% it stops and asks.
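
The budget rules above can be sketched as a small loop. The 30-fix cap and the 20% threshold come from the text; the cost of 5 points per component edit is an assumption invented for this example, and run_fix_budget is a hypothetical name, not gstack's implementation.

```shell
# Sketch of a fix loop with a risk budget. The 30-fix cap and the 20%
# threshold come from the text; JSX_WEIGHT is an ASSUMPTION made up
# for this example.
MAX_FIXES=30
RISK_LIMIT=20
JSX_WEIGHT=5   # hypothetical cost of one JSX/TSX edit, in percent

# Reads one fix kind per line ("css" or "jsx") on stdin and reports
# where the loop would stop.
run_fix_budget() {
  fixes=0
  risk=0
  while read -r kind; do
    if [ "$fixes" -ge "$MAX_FIXES" ]; then
      echo "stop: hard cap of $MAX_FIXES fixes reached"
      return 0
    fi
    case "$kind" in
      css) ;;                               # CSS-only: free pass
      jsx) risk=$((risk + JSX_WEIGHT)) ;;   # component edit: costs budget
    esac
    if [ "$risk" -gt "$RISK_LIMIT" ]; then
      echo "stop: risk ${risk}% exceeds ${RISK_LIMIT}% after $fixes fixes, ask the user"
      return 0
    fi
    fixes=$((fixes + 1))
  done
  echo "done: $fixes fixes applied, risk ${risk}%"
}

printf 'css\ncss\njsx\ncss\n' | run_fix_budget   # prints: done: 4 fixes applied, risk 5%
```

The point of the design is that cheap, low-risk changes can run freely while riskier edits use up a limited allowance, so the loop hands control back to you before it has changed too much.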

When to use: When /plan-design-review has flagged issues and you want them fixed automatically, not just reported.
You: /qa-design-review
gstack: "Design Score: C → B+ | AI Slop Score: D → A. 9 fixes applied (8 verified). style(design): FINDING-001 — replace 3-column icon grid with asymmetric layout..."
/benchmark
Role: Performance Engineer
Category: Test

Performance testing — measures load times, response times, memory usage, and identifies bottlenecks.

When to use: When you suspect performance issues or before launching to production.
You: /benchmark
gstack: "Running performance tests... API response time: 340ms (acceptable), but the dashboard page loads in 4.2s (too slow). The bottleneck is..."

Phase 7: Ship

/ship
Role: Release Manager
Category: Ship

Pushes your code as a pull request. Handles git operations, commit messages, PR descriptions. Verifies that review issues were actually fixed before allowing the push.

When to use: After /review and /qa have passed. This is the "go" button.
You: /ship
gstack: "Creating PR... All review issues resolved. PR created: #47 'Add user authentication'..."
/land-and-deploy
Role: DevOps Engineer
Category: Ship

Merges the PR and deploys to production. Handles the actual deployment pipeline.

When to use: After the PR is approved and you're ready to go live.
You: /land-and-deploy
gstack: "Merging PR #47... Deploying to production... Deploy successful."
/canary
Role: Release Safety
Category: Ship

Monitors the deployment for errors after it goes live. Watches logs, error rates, and performance metrics to catch problems early.

When to use: Immediately after /land-and-deploy to make sure nothing broke in production.
You: /canary
gstack: "Monitoring deployment... Error rate: 0.1% (normal). No new exceptions. Deploy looks healthy."
/setup-deploy
Role: Deploy Configurator
Category: Ship

One-time deployment pipeline configuration. Sets up the connection between your repository and your hosting platform so that /ship and /land-and-deploy work correctly for your project.

Guides you through selecting your hosting provider, configuring environment variables, and verifying the deploy pipeline end-to-end.

When to use: Once at the start of a new project, before you use /ship for the first time.
You: /setup-deploy
gstack: "What hosting provider are you using? [Vercel / Railway / Fly / AWS / other]... Configuring deployment pipeline..."
/landing-report
Role: Ship Queue Dashboard
Category: Ship

Read-only snapshot of the workspace-aware ship queue. Shows which version slots are claimed and which sibling workspaces have WIP — critical when running 5–10 Conductor workspaces in parallel and you don't want two of them claiming the same version number.

Doesn't modify anything. Pure visibility into "who is shipping what right now" across the whole cluster.

When to use: Coordinating multiple branches/workspaces about to ship; before /ship when you suspect another workspace is also racing to land.
You: /landing-report
gstack: "Ship queue: v2.4.1 claimed by workspace-alpha (45min ago, branch feature/billing). v2.4.2 free. workspace-beta has WIP on hotfix/auth. workspace-gamma idle."

Phase 8: Reflect

/retro
Role: Team Facilitator
Category: Reflect

Runs a retrospective on what just happened: what went well, what didn't, what to change. Stores learnings so future sessions benefit.

When to use: After completing a feature or sprint. Especially after something went wrong.
You: /retro
gstack: "Retrospective: The auth feature shipped in 2 sessions. What went well: clear plan prevented scope creep. What to improve: QA found 3 bugs that review should have caught..."
/document-release
Role: Documentation Engineer
Category: Reflect

Generates release notes, changelogs, and documentation for what was shipped. Keeps your project's documentation up to date automatically.

When to use: After deploying, to keep a clean record of what changed and why.
You: /document-release
gstack: "Release v1.3.0 — Added: user authentication with OAuth, password reset flow. Fixed: mobile navigation bug..."

GBrain — Persistent Memory

GBrain is gstack's persistent knowledge base for AI agents. It stores what your agent learns, what you've decided, what worked, and what didn't — and lets the agent search all of it on demand. Without GBrain, every new session starts cold. With it, yesterday's learning on machine A surfaces in today's session on machine B.

HOW IT FITS THE FLOW

GBrain sits underneath every other skill. /learn writes session-scoped notes; GBrain promotes them to durable, searchable memory. /retro pulls cross-machine history when sync is on. /health includes a GBrain dimension (doctor status, sync queue depth, last-push age) in its composite score.

Three deployment paths: Supabase (cloud, shareable across teammates and machines), auto-provisioned Supabase (one-command project creation), or PGLite (local-only, zero network, ~30s setup).

/setup-gbrain
Role: Memory Architect
Category: Meta

One command from zero to "GBrain is running and my agent can call it." Detects current state, asks at most three questions, walks through install, init, MCP registration for Claude Code, and per-repo trust policy.

Per-remote trust triad: every repo gets a policy — read-write (search + write back), read-only (search but never contaminate), or deny. Set once per repo, sticky across worktrees and branches. Critical for multi-client consultants who don't want Client A's code leaking into Client B's brain.
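
One way to picture the three policies is as a lookup from policy and action to allowed or blocked. The sketch below models only the behavior described above; policy_allows and the action names "search" and "write" are hypothetical, not GBrain's actual interface.

```shell
# Illustrative model of the three trust policies; not GBrain's code.
# Usage: policy_allows <policy> <action>, where action is "search" or "write".
policy_allows() {
  case "$1:$2" in
    read-write:search|read-write:write) return 0 ;;
    read-only:search)                   return 0 ;;
    *)                                  return 1 ;;  # deny, or anything unknown
  esac
}

# Print the full table of policies and actions.
for policy in read-write read-only deny; do
  for action in search write; do
    if policy_allows "$policy" "$action"; then verdict=allowed; else verdict=blocked; fi
    printf '%-10s %-6s %s\n' "$policy" "$action" "$verdict"
  done
done
```

Treating anything unrecognized as blocked is the safer default for a multi-client setup: a typo in a policy name then fails closed instead of leaking data.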

Sub-flags: --switch (migrate PGLite ↔ Supabase, lossless), --repo (re-prompt trust policy for current repo), --cleanup-orphans (delete abandoned Supabase projects), --resume-provision <ref> (recover from interrupted provision).

When to use: Once per machine. Re-run with --switch if you want to move from local PGLite to a shared cloud brain.
You: /setup-gbrain
gstack: "Where should your brain live? 1) Supabase (paste URL) 2) Auto-provision Supabase 3) PGLite local. Pick one..."
Flow + combines with: Prerequisite for cross-machine memory. Pair with gstack-brain-init (machine-local gstack memory sync) for full persistence. /retro and /health automatically pick it up once configured.
gstack-brain-init / -restore / -sync / -uninstall
Role: GStack Memory Sync (CLI)
Category: Meta

Terminal-level controls for syncing your ~/.gstack/ state (learnings, plans, retros, design docs, developer profile) to a private git repo. Different from GBrain itself — this is the gstack memory layer that becomes indexable by GBrain when both are on.

gstack-brain-init — turns ~/.gstack/ into a git repo, pushes initial commit, writes the URL-only marker file (safe to copy between machines).

gstack-brain-restore — on a new machine, after copying ~/.gstack-brain-remote.txt over, this clones and rehydrates everything.

gstack-brain-sync --status — last successful push, pending queue depth, sync blocks, current privacy mode.

gstack-brain-uninstall — removes .git and .brain-* config; never touches your learnings, plans, or retros. Add --delete-remote to also delete the private GitHub repo.

Privacy modes: off, artifacts-only (plans + designs + retros + learnings, skip behavioral data), full (everything in the allowlist). Set with gstack-config set gbrain_sync_mode <mode>.

Secret protection: every commit is scanned before leaving your machine — AWS keys, GitHub tokens, OpenAI keys, PEM blocks, JWTs, and bearer tokens are all blocked. If a scan hits, sync stops and the queue is preserved.
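
As a rough illustration of what a pre-push secret scan does, the sketch below searches a file for three well-known public key formats. The patterns are common conventions, not gstack's actual rule set, and contains_secret is a hypothetical name. The sample key is the placeholder value AWS uses in its own documentation.

```shell
# Illustrative pre-push scan; these patterns are common public formats,
# NOT gstack's actual rule set.
#   AKIA + 16 chars           AWS access key ID
#   ghp_ + 36 chars           GitHub personal access token
#   -----BEGIN ... KEY-----   PEM private key block
SECRET_PATTERN='AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}|-----BEGIN [A-Z ]*PRIVATE KEY-----'

# Returns 0 (and prints a warning) if the file looks like it holds a secret.
contains_secret() {
  if grep -Eq -e "$SECRET_PATTERN" "$1"; then
    echo "blocked: possible secret in $1"
    return 0
  fi
  return 1
}

# Example: scan a throwaway file that contains a documentation-only key.
tmp=$(mktemp)
echo 'aws_key=AKIAIOSFODNN7EXAMPLE' > "$tmp"
if contains_secret "$tmp"; then
  echo "sync would stop here and keep the queue"
fi
rm -f "$tmp"
```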

When to use: Run gstack-brain-init on the machine you work on most. Run gstack-brain-restore on every new machine after copying the URL marker. Run --status when something feels stale or after /gstack-upgrade.
$ gstack-brain-sync --status
last push: 2026-05-09 18:42 UTC (12h ago)
queue: 0 pending
mode: artifacts-only
status: clean
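
If you want to pull one value out of that report, for example in a script, a tiny helper is enough. This sketch assumes the "key: value" layout shown in the sample above; status_field is a hypothetical name, and the real output format may differ.

```shell
# Assumes the "key: value" layout shown in the sample output above.
# Reads status text on stdin and prints the value for one key.
status_field() {
  sed -n "s/^$1: *//p"
}

sample='last push: 2026-05-09 18:42 UTC (12h ago)
queue: 0 pending
mode: artifacts-only
status: clean'

printf '%s\n' "$sample" | status_field mode     # prints: artifacts-only
printf '%s\n' "$sample" | status_field status   # prints: clean
```

With a real install, the same helper could read live output, for example gstack-brain-sync --status | status_field status, provided the real format matches the sample.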
Flow + combines with: The "follow me across machines" half of persistent memory. Pair with /setup-gbrain for the indexable-knowledge-base half. Together: yesterday's machine-A learning is searchable in today's machine-B session — the magic moment.
/sync-gbrain
Role: Keep Brain Current
Category: Memory

Refreshes GBrain against the current repo's code and teaches the agent when to prefer gbrain search / code-def over Grep. Idempotent — safe to re-run anytime.

Without this, GBrain's index drifts behind your code and the agent falls back to slow grep when a fast semantic search exists. Run after big merges, refactors, or whenever search results feel stale.

When to use: After significant code changes, after rebasing onto a busy main branch, or periodically to keep brain answers current.
You: /sync-gbrain
gstack: "Indexed 247 changed files. Updated 1,820 code defs. Agent will now prefer gbrain search for symbol lookups, code-def for definitions, Grep only for raw text patterns."
Why two systems instead of one?

GBrain is a separate, actively-developed project with its own release cadence, schema migrations, and MCP surface. Bundling it would slow GBrain improvements from reaching users. GStack memory sync is gstack's own state — kept separate so you can run one without the other. They become powerful together: GBrain provides the indexable surface, gstack-brain provides the content to index.

Safety Guardrails

These commands protect you from accidents. Especially important when you're not a coder and can't easily spot if Claude is about to do something destructive.

/careful
Warns before destructive commands
Category: Safety

Activates warnings before any destructive operation: rm -rf (permanently deletes files and folders), DROP TABLE (destroys a database table), git reset --hard (discards uncommitted changes), and force-push (overwrites remote history). Claude will stop and ask you to confirm before running these.
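
To make the idea concrete, here is a minimal sketch of pattern-based detection for the operations listed above. It is not gstack's implementation; is_destructive is a hypothetical name, and simple text matching like this will miss variations a real guard would need to handle.

```shell
# Illustrative check, not gstack's implementation: flag a command line
# that contains one of the destructive operations named above.
is_destructive() {
  case "$1" in
    *"rm -rf"*|*"DROP TABLE"*|*"git reset --hard"*) return 0 ;;
    *"git push"*"--force"*|*"git push"*" -f"*)      return 0 ;;
    *) return 1 ;;
  esac
}

for cmd in "ls -la" "rm -rf build/" "git push --force origin main"; do
  if is_destructive "$cmd"; then
    echo "CONFIRM FIRST: $cmd"
  else
    echo "ok: $cmd"
  fi
done
```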

When to use: Always. Turn this on at the start of every session.
You: /careful
gstack: "Safety mode ON. I'll warn before any destructive operations."
/freeze
Locks edits to one directory
Category: Safety

Restricts all file edits to a single directory. If Claude tries to edit files outside that boundary, it's blocked. Essential when debugging — prevents Claude from "fixing" unrelated code while investigating a specific module.
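
The boundary rule amounts to asking "is this path inside the frozen directory?" The sketch below shows one simple way to answer that with string matching. It is an illustration under that assumption, not gstack's implementation, and inside_frozen_dir is a hypothetical name.

```shell
# Illustrative boundary check, not gstack's implementation.
# Succeeds only when <path> is the frozen directory or lives beneath it.
# Usage: inside_frozen_dir <frozen-dir> <path>
inside_frozen_dir() {
  frozen="${1%/}"   # drop a trailing slash so "src/auth/" equals "src/auth"
  case "$2" in
    "$frozen"|"$frozen"/*) return 0 ;;
    *) return 1 ;;
  esac
}

for path in src/auth/login.ts src/authz/roles.ts src/payments/charge.ts; do
  if inside_frozen_dir src/auth "$path"; then
    echo "edit allowed: $path"
  else
    echo "edit blocked: $path"
  fi
done
```

Matching on "src/auth/" with the slash, and not just the prefix "src/auth", is what keeps a sibling folder like src/authz outside the boundary. This version compares plain strings, so a path such as src/auth/../payments would slip through; a real check would resolve paths first.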

When to use: When debugging a specific part of the code and you don't want Claude touching anything else.
You: /freeze src/auth
gstack: "Frozen. All edits restricted to src/auth/. Nothing else can be modified."
/guard
Activates both /careful + /freeze
Category: Safety

Combo command that turns on both safety guardrails at once. Maximum protection.

When to use: When working on critical/sensitive parts of the codebase.
You: /guard src/payments
gstack: "Guard mode ON. Careful warnings active + edits frozen to src/payments/"
/unfreeze
Removes the /freeze restriction
Category: Safety

Removes the directory lock so Claude can edit files anywhere again.

When to use: When you're done debugging and want Claude to have full access again.
You: /unfreeze
gstack: "Directory lock removed. I can now edit files anywhere."
Non-coder pro tip

If you don't understand what Claude is about to do, ask it to explain in plain English before you approve. "Explain what this command does and what could go wrong" is a perfectly valid and smart thing to say. Never approve something you don't understand.

Browser & Browse

gstack includes a built-in headless browser — a real Chromium instance that Claude can control to visit websites, take screenshots, click buttons, and test your app.

/browse
Role: Browser Agent
Category: Browser

Opens a persistent, high-performance Chromium session (~100-200ms per command). Claude can navigate pages, take screenshots, verify layouts, check console errors, and interact with web pages. Used by /qa under the hood.

When to use: For web research, testing your app, or any task that requires seeing a webpage.
You: /browse https://yourapp.com
gstack: "Opening browser... [screenshot] I can see the landing page. What should I check?"
/open-gstack-browser
Role: Browser Launcher
Category: Browser

Launches gstack's AI-controlled Chromium browser with a visible sidebar agent. Unlike /browse (which is headless), this opens a browser window you can see — with Claude's controls visible in a sidebar next to the page.

Supports anti-bot stealth mode and can import cookies from your existing browser for authenticated sessions.

When to use: When you want to see the browser as Claude navigates it, or when working with sites that block headless browsers.
You: /open-gstack-browser
gstack: "Launching Chromium with sidebar agent... Browser open. Where should I navigate?"
/setup-browser-cookies
Browser Configuration
Category: Browser

Auto-detects installed Chromium-based browsers (such as Chrome, Arc, Brave, Edge, and Comet), decrypts cookies via the macOS Keychain, and loads them into the Playwright session. An interactive picker lets you choose which domains to import, and cookie values are never displayed. You can also skip the picker by passing a domain directly.

When to use: Before /qa or /browse need to test pages behind login.
You: /setup-browser-cookies github.com
gstack: "Imported 12 cookies for github.com from Comet. Session is ready."
/scrape
Role: Browser Data Extractor
Category: Browser

Pull structured data from a web page. First call prototypes the extraction interactively via the $B browser primitive — Claude figures out the selectors and shape live. Subsequent calls matching the same intent run a codified browser-skill in ~200ms.

The two-phase flow turns one-off scraping into reusable, fast pipelines: prototype once, run forever.

When to use: Pulling structured data from any web page — first time as a prototype, then reuse the codified version.
You: /scrape get the top 10 posts from news.ycombinator.com with titles and points
gstack: "Prototyping... extracted 10 items. Want me to /skillify this for ~200ms reuse?"
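If it helps to picture the two-phase flow, think of it as a cache keyed by what you asked for. Here is a minimal Python sketch of that idea. It is an illustration only, not gstack's actual code: SkillCache, slow_prototype, and the rule for matching intents are all made up for this example, and it folds the codify step into the first call for brevity.

```python
from typing import Callable, Dict, List

Extractor = Callable[[str], List[str]]


class SkillCache:
    """Hypothetical cache: the first call for an intent prototypes, later calls reuse."""

    def __init__(self, prototype: Callable[[str], Extractor]) -> None:
        self._prototype = prototype            # the slow, exploratory step
        self._skills: Dict[str, Extractor] = {}
        self.prototype_runs = 0

    @staticmethod
    def _key(intent: str) -> str:
        # Assumption: two intents match if they are equal after
        # lowercasing and collapsing whitespace.
        return " ".join(intent.lower().split())

    def scrape(self, intent: str, page: str) -> List[str]:
        key = self._key(intent)
        if key not in self._skills:
            self.prototype_runs += 1
            self._skills[key] = self._prototype(intent)
        return self._skills[key](page)


def slow_prototype(intent: str) -> Extractor:
    # Stand-in for the interactive phase: returns a simple line-based extractor.
    def extract(page: str) -> List[str]:
        return [line.strip() for line in page.splitlines() if line.strip()]
    return extract


cache = SkillCache(slow_prototype)
page = "First post\nSecond post\n"
first = cache.scrape("Get top posts", page)
second = cache.scrape("get   top posts", page)   # same intent, reuses the stored extractor
```

The second call never touches the prototype step, which is the whole point of "prototype once, run forever."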
/skillify
Role: Skill Codifier
Meta +

Walks back through your conversation, finds the most recent /scrape prototype, and synthesizes a script + test + fixture. Runs the test to verify, then asks before committing.

This is how exploratory browser work becomes durable infrastructure. After /skillify, the same scrape runs in milliseconds without an LLM in the loop.

When to use: Right after a /scrape prototype works — codify it for reuse.
You: /skillify
gstack: "Found /scrape prototype from earlier. Synthesizing script + test + fixture... Tests pass. Commit as scripts/scrape-hn-top.ts? [Y/n]"

Learning System

gstack gets smarter over time. It stores patterns, preferences, and pitfalls specific to your project so it doesn't repeat mistakes.

/learn
Role: Knowledge Manager
Learn +

Manage what gstack has learned across sessions. You can:

Review — see all stored patterns
Search — find specific learnings
Prune — remove outdated or wrong learnings
Export — save your learnings to share with others

This is how gstack avoids making the same mistake twice. Every other skill auto-searches your learnings before recommending — when a past insight applies, it surfaces as "Prior learning applied". After /retro captures learnings, /learn lets you manage them.

When to use: Periodically, to review and clean up what gstack has stored about your project. Also: export to share with teammates.
You: /learn
gstack: "23 learnings (14 high, 6 medium, 3 low). [9/10] API responses always wrapped in { data, error } envelope. [8/10] Tests use factory helpers in test/support/factories.ts. 3 stale (referenced files deleted) — prune? [Y/n]"
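To make the "stale" idea in that reply concrete, here is a small Python sketch of how learnings could be split into fresh and stale by checking whether the files they mention still exist. This is an assumption about the general approach, not gstack's real storage format: the Learning class, its fields, and split_stale are hypothetical names.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple


@dataclass
class Learning:
    text: str
    confidence: int              # 1-10, like the "[9/10]" labels above
    referenced_files: List[str]  # paths relative to the project root


def split_stale(
    learnings: List[Learning], root: Path
) -> Tuple[List[Learning], List[Learning]]:
    """Return (fresh, stale); a learning is stale if any referenced file is gone."""
    fresh: List[Learning] = []
    stale: List[Learning] = []
    for item in learnings:
        missing = [f for f in item.referenced_files if not (root / f).exists()]
        (stale if missing else fresh).append(item)
    return fresh, stale


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "factories.ts").write_text("// helpers")
    learnings = [
        Learning("Tests use factory helpers", 8, ["factories.ts"]),
        Learning("Old cache layer quirk", 4, ["cache.ts"]),  # file was deleted
    ]
    fresh, stale = split_stale(learnings, root)
```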
/context-save
Role: Save State
Memory +

Saves the working context — git state, decisions made, remaining work — so a future session can resume from the same point. Not just a TODO list: captures the why behind in-progress choices so the next session doesn't have to re-derive them.

When to use: Ending a session mid-task, before a Conductor handoff, or any time you want a checkpoint you can pick up later.
You: /context-save
gstack: "Saved. Branch: feature/upload-flow. Decisions: chose async enrichment over sync. Remaining: hook up Stripe billing + write 3 regression tests. Resume with /context-restore."
/context-restore
Role: Restore State
Memory +

Resumes from a saved context — even across Conductor workspace handoffs. Pairs with /context-save to make multi-session work feel continuous instead of starting cold each time.

When to use: Starting a new session and wanting to pick up exactly where you left off.
You: /context-restore
gstack: "Resuming feature/upload-flow. Last decision: async enrichment. Next 3 items: Stripe wiring → regression tests → /qa pass."
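A saved context is essentially a small structured record that can be written out and read back. The Python sketch below shows that round trip under stated assumptions: SavedContext and its three fields are hypothetical, chosen to mirror the branch, decisions, and remaining work mentioned in the examples above, and JSON is just one plausible storage format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SavedContext:
    branch: str
    decisions: List[str]   # the "why" behind in-progress choices
    remaining: List[str]   # work still to do


def save(ctx: SavedContext) -> str:
    """Serialize the context to a JSON string."""
    return json.dumps(asdict(ctx), indent=2)


def restore(payload: str) -> SavedContext:
    """Rebuild the context from a JSON string produced by save()."""
    return SavedContext(**json.loads(payload))


ctx = SavedContext(
    branch="feature/upload-flow",
    decisions=["chose async enrichment over sync"],
    remaining=["hook up Stripe billing", "write 3 regression tests"],
)
restored = restore(save(ctx))
```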
/plan-tune
Role: Question Tuner
Meta +

Self-tunes the sensitivity of AskUserQuestion prompts across all plan/review skills. Mark questions as never-ask (you've decided), always-ask (high-stakes, never auto-resolve), or only-for-one-way (only interrupt for irreversible decisions).

This is the lever for prompt fatigue. Run after /autoplan if it's stopping you on the same questions over and over.

When to use: When the planning skills keep asking you about decisions you've already made — or, conversely, when they're auto-resolving things that should always pause for you.
You: /plan-tune
gstack: "Last 10 plan runs hit 'pick database' 8 times — you always picked Postgres. Mark as never-ask? Also 'should we add auth?' triggered twice — mark as always-ask?"
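The three settings boil down to one decision: should this question interrupt you or not? Here is a minimal Python sketch of that rule. The Policy enum and should_interrupt function are hypothetical names for illustration; gstack's own logic may weigh more factors.

```python
from enum import Enum


class Policy(Enum):
    NEVER_ASK = "never-ask"
    ALWAYS_ASK = "always-ask"
    ONLY_FOR_ONE_WAY = "only-for-one-way"


def should_interrupt(policy: Policy, irreversible: bool) -> bool:
    """Decide whether a planning question should pause for the user."""
    if policy is Policy.NEVER_ASK:
        return False          # you've already decided
    if policy is Policy.ALWAYS_ASK:
        return True           # high-stakes, never auto-resolve
    return irreversible       # only pause for one-way decisions
```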
/codex
Role: Second Opinion (OpenAI Codex CLI)
Review +

Independent review from OpenAI Codex CLI. Runs against the same diff that /review saw, but with a different training distribution — catches what Claude misses. Three modes:

Review: codex review against the current diff; classifies findings as P1/P2/P3; any P1 = FAIL. Fully independent (Codex doesn't see Claude's review).
Challenge: adversarial mode at maximum reasoning effort (xhigh); pen-tests your logic.
Consult: open conversation with session continuity for follow-ups.

Cross-model analysis: when both /review and /codex have run, shows overlap (high confidence), unique-to-Codex, and unique-to-Claude — "two doctors, same patient."

When to use: When passing tests aren't enough and you want a different model's distribution to spot the bugs Claude misses. Especially before shipping anything risky.
You: /codex
gstack: "CODEX REVIEW: PASS (3 findings) [P2] race condition in payment handler — concurrent charges double-debit without advisory lock. Cross-model: OVERLAP race condition; UNIQUE TO CODEX token timing; UNIQUE TO CLAUDE N+1 in listing photos."
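Two rules from this card are easy to show in code: the "any P1 = FAIL" gate and the overlap/unique split between the two reviews. The Python sketch below is illustrative only. It assumes findings from both models carry identical labels so they can be compared as sets, which real reviews would not guarantee; verdict and cross_model are hypothetical names.

```python
from typing import Dict, Iterable, Set


def verdict(severities: Iterable[str]) -> str:
    """Any P1 finding fails the review; otherwise it passes."""
    return "FAIL" if "P1" in set(severities) else "PASS"


def cross_model(claude: Set[str], codex: Set[str]) -> Dict[str, Set[str]]:
    """Split findings into overlap and per-model unique sets."""
    return {
        "overlap": claude & codex,           # both models agree: high confidence
        "unique_to_codex": codex - claude,
        "unique_to_claude": claude - codex,
    }


claude_findings = {"race condition", "N+1 in listing photos"}
codex_findings = {"race condition", "token timing"}
report = cross_model(claude_findings, codex_findings)
```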
/gstack-upgrade
Role: Self-Updater
Learn +

Upgrades gstack to the latest version. Fetches updates from the official repository, applies them to your local install, and reports what changed.

Run this periodically to get new skills, bug fixes, and improvements as the project evolves.

When to use: Whenever you want to pull the latest gstack improvements, or when you see a reference to a command you don't have yet.
You: /gstack-upgrade
gstack: "Checking for updates... Found v1.4.2 (you have v1.3.1). Upgrading... Done. New skills added: /devex-review, /pair-agent."
/benchmark-models
Role: Model Benchmark
Meta +

Side-by-side cross-model benchmark for skills — Claude vs GPT vs Gemini on the same prompt. Measures latency, tokens, and cost; optionally adds an LLM-judged quality score.

Distinct from /benchmark (which measures site performance). This is for deciding which model to route a skill to, or for evaluating a new model release before swapping defaults.

When to use: Routing a new skill to the right model, or comparing model performance after a major release.
You: /benchmark-models /review
gstack: "Sonnet 4.6: 12.4s, $0.18, P1=2 P2=4. GPT-5.1: 18.1s, $0.31, P1=1 P2=6. Gemini 3 Pro: 9.2s, $0.09, P1=1 P2=3. LLM-judge: Claude > Gemini > GPT on signal density."
/make-pdf
Role: PDF Generator
Utility +

Turn any markdown file into a publication-quality PDF — proper margins, page numbers, cover pages, clickable table of contents. Pairs naturally with the markdown artifacts gstack produces (retros, design docs, learning exports, release notes).

When to use: Producing a sharable PDF from a /retro, /document-release output, /learn export, or design doc from /plan-*-review.
You: /make-pdf .context/retros/2026-W18.md
gstack: "Rendered to retros-2026-W18.pdf — 8 pages, cover, TOC, page numbers."

Parallel Work (Advanced)

/investigate
Role: Detective
Utility +

Deep-dive investigation into a specific issue or module. Automatically freezes edits to the module being investigated, so Claude doesn't accidentally change unrelated code while digging into a problem.

When to use: When something is broken and you need Claude to figure out why without touching other things.
You: /investigate "Why is the payment webhook failing?"
gstack: "Auto-freezing to src/webhooks... Investigating... Found: the Stripe signature validation is using the wrong key."
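One way to think about auto-freeze is as a path check: an edit is allowed only if the file sits inside the frozen folder. The Python sketch below shows that check. It is an assumption about the general idea rather than gstack's implementation, and is_edit_allowed is a hypothetical name.

```python
import posixpath
from pathlib import PurePosixPath


def is_edit_allowed(path: str, frozen_dir: str) -> bool:
    """Allow edits only to the frozen directory or files beneath it."""
    # Normalize first so "a/../b" tricks can't escape the frozen scope.
    target = PurePosixPath(posixpath.normpath(path))
    scope = PurePosixPath(posixpath.normpath(frozen_dir))
    return target == scope or scope in target.parents
```

Comparing whole path segments (rather than checking whether one string starts with another) keeps a sibling folder such as src/webhooks-old from slipping through.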
/checkpoint
Role: State Manager
Utility +

Saves the current state of your work so you can safely pause and resume later. Documents what's been completed, what's in progress, and what comes next — giving the next session full context to pick up exactly where you left off.

When to use: Before ending a session mid-task, switching context, or handing work to a new session.
You: /checkpoint
gstack: "Saving progress... Completed: auth UI. In progress: API integration (50% done). Next: wire up login form to /api/auth endpoint."
/pair-agent
Role: Multi-Agent Coordinator
Think +

Enables cross-vendor browser sharing — lets multiple AI agents (Claude, GPT-4, Gemini, etc.) share the same browser session. One agent can hand off a browser context to another without losing state.

Powers advanced multi-agent workflows where different AI models collaborate on the same task, each contributing what they're best at.

When to use: Advanced workflows where you want multiple AI models to collaborate, or when you need to delegate browser tasks to a different agent mid-session.
You: /pair-agent
gstack: "Sharing browser session... Agent handoff ready. Which agent should take over?"
CONDUCTOR Parallel Multi-Session Work

Conductor is an advanced orchestration tool, used alongside gstack, that runs multiple Claude Code sessions in parallel, each in an isolated workspace. Imagine 4 developers working simultaneously:

Session 1: implementing Feature A
Session 2: running QA on Feature B
Session 3: doing a code review on Feature C
Session 4: investigating a production bug

This is power-user territory. Start with single sessions first and graduate to Conductor when you're comfortable with the core workflow.
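For a rough feel of "parallel sessions, each in its own workspace," here is a toy Python sketch. It is only an analogy and does not use Conductor or Claude Code: each "session" is a plain function running in a thread with its own temporary folder, and run_session is a hypothetical name.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def run_session(task: str) -> str:
    """Do one task inside a private scratch folder that no other session sees."""
    with tempfile.TemporaryDirectory() as workspace:
        notes = Path(workspace) / "notes.txt"
        notes.write_text(task)
        return f"done: {notes.read_text()}"


tasks: List[str] = [
    "implement Feature A",
    "run QA on Feature B",
    "review Feature C",
    "investigate production bug",
]

# Four workers run at once; map() returns results in the same order as tasks.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_session, tasks))
```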

Full Skill Reference

Every gstack command in one place.

/office-hours 6 forcing questions to sharpen your idea
/plan-ceo-review Product strategy review (4 scope modes)
/plan-design-review UX and interaction review of plans
/plan-eng-review Lock architecture, data flow, edge cases
/plan-devex-review Review plan for developer experience and API ergonomics
/autoplan Chain all 3 reviews automatically
/design-consultation Build design system from scratch
/design-shotgun Generate multiple visual variants to compare
/design-html Convert designs to production HTML
/design-review Catch "AI slop" and design inconsistencies
/review Senior code review for bugs & issues
/cso Security threat model
/qa Full QA with real browser, auto-fixes bugs
/qa-only QA report only, no auto-fixes
/qa-design-review Visual design audit + auto-fixes (AI slop, typography, spacing)
/benchmark Performance testing
/investigate Deep-dive with auto-freeze
/health Overall code quality health check
/devex-review Live DX audit — tests real onboarding and dev workflow
/ship Create PR and push
/land-and-deploy Merge PR and deploy to production
/canary Post-deploy monitoring
/setup-deploy Configure deployment pipeline
/retro Sprint retrospective
/document-release Generate release notes & docs
/learn Manage stored project learnings
/codex Independent second-opinion review via OpenAI Codex CLI
/checkpoint Save progress to resume later
/gstack-upgrade Update gstack to latest version
/pair-agent Share browser session across multiple AI agents
/open-gstack-browser Launch visible Chromium with sidebar agent

Leveling Up: Your Path to Pro

1. Beginner → 2. Intermediate → 3. Advanced → 4. Pro

Level 1: Beginner (Week 1-2)

Goal: Understand the flow and get comfortable with the core commands.

Practice this loop:

/office-hours → /plan-ceo-review → /plan-eng-review → /review → /ship

Start with /careful at the beginning of every session. Don't skip it.

Focus on answering gstack's questions thoughtfully. The quality of what gets built depends on the quality of your answers in /office-hours and /plan-ceo-review.

Level 2: Intermediate (Week 3-4)

Goal: Add design and QA to your workflow.

Expanded loop:

/office-hours → /autoplan → /design-consultation → Build → /review → /qa → /ship

Start using /autoplan instead of calling each plan review manually.

Add /qa to catch bugs Claude's code review missed.

Try /freeze when debugging — notice how it prevents Claude from wandering.

Level 3: Advanced (Month 2)

Goal: Use the full toolkit including security, performance, and learning.

Full workflow:

/office-hours → /autoplan → /design-shotgun → /design-html → Build → /review → /cso → /qa → /benchmark → /ship → /retro

Add /cso for security reviews on anything user-facing.

Use /benchmark to catch performance issues before users do.

Run /retro after each sprint and /learn to manage accumulated knowledge.

Level 4: Pro (Month 3+)

Goal: Run parallel sessions and customize gstack to your workflow.

Pro patterns:

Use Conductor to run multiple features in parallel across isolated sessions.

Customize the learning system with /learn to encode your project's specific patterns.

Use /open-gstack-browser and /setup-browser-cookies to test authenticated flows.

Chain /canary after deploys to catch production issues instantly.

Run /investigate for surgical debugging without breaking unrelated code.

Use /design-shotgun as a rapid exploration tool, not just for final designs.

Prompting Tips for Non-Coders

Your skill as a gstack user comes down to how well you communicate with Claude. Here's how to get better results.

1. Be specific about the outcome, not the implementation

BAD: "Make me a React app with a Postgres database and REST API"
(You're prescribing solutions you don't understand)
GOOD: "I need a web app where restaurant owners can manage their
waitlist. Customers join via a link, owners see the queue
in real time, and customers get an SMS when it's their turn."
(You're describing the problem and the user experience)

2. Answer gstack's questions with real-world context

When /office-hours asks "who is desperate for this?", don't say "restaurant owners." Say "small restaurant owners in busy neighborhoods who currently use a paper clipboard and lose 20% of their waitlist because people leave without being called." The more specific, the better the output.

3. Challenge Claude — don't just accept everything

If Claude proposes something that doesn't feel right, push back. Say "I don't think that's right because..." or "That seems overcomplicated, can we simplify?" Claude works best when you treat it like a smart colleague, not an oracle.

4. Use these magic phrases

"Explain what you're about to do in plain English before doing it"

"What could go wrong with this approach?"

"Is there a simpler way to do this?"

"Walk me through this like I'm not a developer"

"What assumptions are you making?"

"Before you write code, summarize the plan in bullet points"

5. Iterate in small steps

Don't ask for an entire app in one message. Build one piece, verify it works (/qa), then build the next piece. Small iterations catch problems early and prevent massive rework.

Common Mistakes

Skipping /office-hours

Jumping straight to "build me X" is the #1 mistake. You end up building the wrong thing faster. The 10 minutes you spend in /office-hours save hours of rework.

Approving things you don't understand

When Claude asks "should I proceed?", never say yes to something you can't explain in your own words. Ask "explain what this means" first. There's no shame in that — it's the smart move.

Skipping /review and /qa

Code that hasn't been reviewed and tested is code you'll regret shipping. These steps exist because AI-generated code frequently has subtle bugs. Always review, always test.

Not using /careful

Without /careful, Claude can run destructive commands without warning. One accidental rm -rf can delete your entire project. Always start with /careful.
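To see what "warn before destructive commands" can mean in practice, here is a tiny Python sketch of a guard that flags a few risky commands before they run. It is not how /careful is implemented: needs_confirmation is a hypothetical name, and the git rules are extra assumptions added for illustration (the text above only mentions rm -rf). A real guard would cover many more cases.

```python
import shlex


def needs_confirmation(command: str) -> bool:
    """Return True for a few commands that can destroy work irreversibly."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    program, args = tokens[0], tokens[1:]
    if program == "rm":
        # Collect short flags such as "-rf" or "-r -f" into one string.
        flags = "".join(
            a[1:] for a in args if a.startswith("-") and not a.startswith("--")
        )
        return "r" in flags and "f" in flags
    if program == "git":
        # Assumption: hard resets and force pushes also count as destructive.
        hard_reset = args[:2] == ["reset", "--hard"]
        force_push = args[:1] == ["push"] and "--force" in args
        return hard_reset or force_push
    return False
```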

Trying to do too much at once

Building an entire app in one Claude session leads to context overload. Break your project into small features, build each one through the full Think → Ship cycle, then move to the next.

Ignoring /retro and /learn

These aren't optional nice-to-haves. They're what make gstack get better over time. Skip them and you repeat the same mistakes. Use them and each session is better than the last.

Prompt Builder

Describe what you want to do in plain English. Get back a ready-to-paste gstack prompt with the right slash commands in the right order.

HOW IT WORKS Bring your own Anthropic API key

This tool calls Claude directly from your browser using your Anthropic API key. Your key stays in this browser tab — it's not stored on any server and it disappears when you close the tab. Get a key at console.anthropic.com. A typical conversion costs less than a cent.


You've reached the end.

Now go use /office-hours on something real. That's where mastery begins.