Best Practices

Tips for designing effective scenarios and troubleshooting common issues.

Scenario Design

Start with the End in Mind

Before building, define:

  1. Learning objective: What should participants be able to do after completing the scenario?
  2. Prerequisites: What should they already know?
  3. Time estimate: How long should it take?

Keep It Focused

| Do | Don't |
| --- | --- |
| One clear objective per scenario | Multiple unrelated topics |
| 3-5 components max for beginners | Overwhelm with infrastructure |
| 15-30 minute completion time | Multi-hour marathons |

Progressive Complexity

Structure scenarios from simple to complex:

Beginner:     1 component, basic operations

Intermediate: 2-3 components, interactions

Advanced:     Full stack, complex workflows

Real-World Relevance

Use realistic scenarios:

  • Actual use cases from production
  • Industry-standard configurations
  • Meaningful sample data

Instruction Writing

The WHAT-WHY-HOW Pattern

For each instruction, state what to do, why it matters, and how to do it, then give a way to verify the result:

````markdown
## Create a Kafka Topic

**What:** Create a topic called `orders` with 3 partitions.

**Why:** Partitions allow parallel processing. Three partitions
let us scale to three consumers.

**How:** Open the [Kafka terminal](tab:kafka) and run:

```bash
kafka-topics --create --topic orders \
  --bootstrap-server localhost:9092 \
  --partitions 3
```

**Verify:** List topics to confirm creation:

```bash
kafka-topics --list --bootstrap-server localhost:9092
```

You should see `orders` in the output.
````

Show Expected Output

Always tell participants what success looks like:

````markdown
You should see output similar to:

```
Created topic orders.
```
````

Include Checkpoints

Add verification steps throughout:

```markdown
**Checkpoint:** Before continuing, verify:
- [ ] Topic `orders` exists
- [ ] You can produce a test message
- [ ] Consumer receives the message
```

Anticipate Mistakes

Add troubleshooting tips where errors are common:

```markdown
::: tip Troubleshooting
If you see "Connection refused", wait 30 seconds for Kafka
to finish starting, then try again.
:::
```

Resource Management

Right-Size Components

Start minimal, increase if needed:

| Component | Start With (CPU / memory) | Increase If |
| --- | --- | --- |
| PostgreSQL | 250m / 512Mi | Query timeouts |
| Kafka | 500m / 1Gi | Slow message processing |
| Elasticsearch | 500m / 1Gi | Search timeouts |
| Simple app | 100m / 128Mi | OOM kills |

Calculate Total Resources

Sum all components and add 20% headroom:

PostgreSQL:    250m / 512Mi
Kafka:         500m / 1Gi
API:           200m / 256Mi
─────────────────────────────
Subtotal:      950m / 1792Mi
+ 20%:         190m / 359Mi (rounded up)
─────────────────────────────
Request:       1140m / 2151Mi
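
The same arithmetic can be scripted so the totals stay correct as components change. The following is a minimal sketch rather than part of any platform tooling: the function name `total_with_headroom` and the `cpu_millicores:memory_Mi` argument format are assumptions made for illustration. It rounds the headroom up to whole units, so its memory figures can differ by a unit or two from hand-rounded numbers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum CPU (millicores) and memory (Mi) requests,
# then add 20% headroom, rounding the headroom up to a whole unit.
# Each argument is "cpu_millicores:memory_Mi", e.g. "250:512".
total_with_headroom() {
  local cpu=0 mem=0 pair
  for pair in "$@"; do
    cpu=$(( cpu + ${pair%%:*} ))
    mem=$(( mem + ${pair##*:} ))
  done
  # Integer ceiling of 20%: (n * 20 + 99) / 100
  local cpu_extra=$(( (cpu * 20 + 99) / 100 ))
  local mem_extra=$(( (mem * 20 + 99) / 100 ))
  echo "Subtotal: ${cpu}m / ${mem}Mi"
  echo "Headroom: ${cpu_extra}m / ${mem_extra}Mi"
  echo "Request:  $(( cpu + cpu_extra ))m / $(( mem + mem_extra ))Mi"
}

# PostgreSQL, Kafka (1Gi = 1024Mi), and the API from the example
total_with_headroom 250:512 500:1024 200:256
```

For the three components in the example, the subtotal is 950m / 1792Mi and the resulting request is 1140m / 2151Mi.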

Watch for Resource Contention

If scenarios fail under load:

  1. Check if pods are being OOM-killed
  2. Look for CPU throttling
  3. Consider reducing replica counts
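
For the first check, a small filter over `kubectl get pods` output can surface pods that were killed or restarted. This is a sketch under the assumption that you have `kubectl` access to the namespace where the scenario runs, which your platform may not expose; `flag_suspect_pods` is a hypothetical helper name. It does not detect CPU throttling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print pods that are not healthy or have restarted at least once.
# Columns are NAME READY STATUS RESTARTS AGE.
flag_suspect_pods() {
  awk '($3 != "Running" && $3 != "Completed") || $4 + 0 > 0 {
    printf "%s status=%s restarts=%d\n", $1, $3, $4
  }'
}

# Usage (assumes kubectl access to the scenario's namespace):
#   kubectl get pods --no-headers | flag_suspect_pods
#   kubectl describe pod <pod-name>   # look for "Reason: OOMKilled" under Last State
```

A pod that shows `Running` with a non-zero restart count is worth inspecting with `kubectl describe pod`, since the previous container may have been OOM-killed.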

Component Configuration

Use Clear Labels

Labels appear in the UI and in instructions, so choose names that describe the component:

| Good | Bad |
| --- | --- |
| postgres | db1 |
| kafka | component-2 |
| api-server | my-app |

Set Appropriate Start Order

Order 0:  Databases (postgres, redis)
Order 1:  Message brokers (kafka)
Order 2:  Backend services (api)
Order 3:  Frontend (web)

Configure Health Checks

Ensure components report ready status correctly:

  • Helm charts usually include probes
  • Custom apps need explicit configuration

Testing

Test the Happy Path

Run through your scenario as a participant:

  1. Follow every instruction literally
  2. Copy-paste all commands
  3. Verify all expected outputs

Test Error Recovery

Intentionally break things:

  1. Run commands out of order
  2. Make typos in commands
  3. Skip steps and see what happens

Test Time Limits

Ensure the scenario fits within its time-to-live (TTL):

  1. Time yourself completing the lab
  2. Add buffer for slower participants
  3. Set TTL to 1.5x your completion time
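
The 1.5x rule is simple enough to compute by hand, but a helper keeps it consistent across scenarios. This is an illustrative sketch; `suggest_ttl_minutes` is a hypothetical name, and rounding up to a whole minute is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest a TTL in minutes as 1.5x the measured
# completion time, rounded up to a whole minute.
suggest_ttl_minutes() {
  local completion_minutes=$1
  # (n * 3 + 1) / 2 is the integer ceiling of n * 1.5
  echo $(( (completion_minutes * 3 + 1) / 2 ))
}

suggest_ttl_minutes 20   # prints 30
suggest_ttl_minutes 25   # prints 38 (37.5 rounded up)
```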

Test with Fresh Eyes

Ask someone unfamiliar with the topic to run the scenario:

  1. Have them follow your instructions without help
  2. Note where they get confused
  3. Refine the instructions based on their feedback

Common Patterns

Database Initialization

````markdown
Wait for PostgreSQL to be ready, then create the schema:

```bash
# Wait for database
until pg_isready; do sleep 1; done

# Create tables
psql -U postgres -d mydb -f /scripts/schema.sql
```
````

Service Dependencies

````markdown
Before starting the API, verify Kafka is ready:

```bash
kafka-topics --list --bootstrap-server kafka:9092
```

If this command succeeds, Kafka is ready.
````
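
A single readiness check fails if the dependency is still starting, so participants often need to retry. One option is a bounded retry loop, sketched below. The helper name `wait_for` and its argument order are assumptions for illustration, not something the platform provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or the
# attempt limit is reached. Returns 0 on success, 1 on timeout.
# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
wait_for() {
  local max_attempts=$1 delay=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  echo "Timed out after ${max_attempts} attempts: $*" >&2
  return 1
}

# Example: give Kafka up to 30 attempts, 2 seconds apart
#   wait_for 30 2 kafka-topics --list --bootstrap-server kafka:9092
```

Unlike an unbounded `until` loop, this gives up after a fixed number of attempts, so a misconfigured scenario produces a clear error instead of hanging.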

Data Flow Verification

````markdown
Let's trace a message through the system:

1. **Produce** a message:
   ```bash
   echo "test" | kafka-console-producer --topic orders \
     --bootstrap-server kafka:9092
   ```

2. **Consume** to verify:
   ```bash
   kafka-console-consumer --topic orders \
     --from-beginning --max-messages 1 \
     --bootstrap-server kafka:9092
   ```

You should see `test` in the output.
````

Troubleshooting Guide

Scenario Won't Start

| Symptom | Likely Cause | Solution |
| --- | --- | --- |
| Stuck on "Provisioning" | Resource quota exceeded | Reduce component resources |
| Pod stays "Pending" | No available nodes | Check cluster capacity |
| Multiple pods failing | Shared dependency issue | Check start order |

Component Issues

| Symptom | Likely Cause | Solution |
| --- | --- | --- |
| ImagePullBackOff | Image doesn't exist | Verify image name and tag |
| CrashLoopBackOff | Container keeps crashing | Check logs for errors |
| OOMKilled | Out of memory | Increase memory limit |
| Evicted | Node pressure | Reduce resource usage |

Connectivity Issues

| Symptom | Likely Cause | Solution |
| --- | --- | --- |
| "Connection refused" | Service not ready | Wait, check readiness |
| "Name not resolved" | Wrong hostname | Use component label |
| "Timeout" | Firewall/network | Check service ports |

Instruction Issues

| Symptom | Likely Cause | Solution |
| --- | --- | --- |
| Tab link doesn't work | Wrong label | Match component label exactly |
| Code not highlighted | Missing language | Add language to fence |
| Images not showing | Wrong path | Use /images/filename.png |
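
Missing language hints are easy to overlook in long instruction files. The sketch below scans a Markdown file and reports opening code fences that have no language. It is an illustrative lint, not a platform feature: `find_unlabeled_fences` is a hypothetical name, and the script assumes plain three-backtick fences without nesting.

```bash
#!/usr/bin/env bash
# Hypothetical lint: print the line number of each opening code fence that
# has no language hint. Does not handle nested fences of four or more backticks.
find_unlabeled_fences() {
  local fence
  fence=$(printf '\140\140\140')   # three backticks, built from octal escapes
  awk -v fence="$fence" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)              # ignore leading indentation
      sub(/[ \t]+$/, "", line)              # ignore trailing whitespace
      if (index(line, fence) != 1) next     # not a fence line
      if (in_block) { in_block = 0; next }  # closing fence
      in_block = 1
      if (line == fence) print FNR          # opening fence with no language
    }
  ' "$1"
}

# Usage: find_unlabeled_fences instructions.md
```

Blocks that show expected output may legitimately have no language, so treat the reported lines as candidates to review rather than errors.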

Studio Issues

| Symptom | Likely Cause | Solution |
| --- | --- | --- |
| Canvas won't load | Large scenario | Reduce components |
| Auto-save failing | Network issue | Check connection |
| Changes not reflecting | Cache issue | Refresh browser |

Anti-Patterns to Avoid

Don't: Assume Prior Knowledge

```markdown
❌ Bad: "Configure Kafka as usual"
✅ Good: "Set the following Kafka configuration..."
```

Don't: Skip Verification Steps

```markdown
❌ Bad: "Create the topic and continue"
✅ Good: "Create the topic, then verify with..."
```

Don't: Use Hardcoded Values

```markdown
❌ Bad: "Connect to 192.168.1.100:9092"
✅ Good: "Connect to kafka:9092"
```

Don't: Forget Cleanup Instructions

```markdown
❌ Bad: (scenario ends abruptly)
✅ Good: "Your lab will automatically clean up in X minutes.
         To stop early, click Stop Lab."
```

Don't: Over-Engineer

```markdown
❌ Bad: 15 components for a basic demo
✅ Good: Minimum viable infrastructure
```

Checklist Before Publishing

  • [ ] All instructions tested step-by-step
  • [ ] Expected outputs documented
  • [ ] Tab navigation links work
  • [ ] Code blocks have language hints
  • [ ] Resources calculated and appropriate
  • [ ] Start order configured correctly
  • [ ] TTL covers expected completion time plus buffer
  • [ ] Troubleshooting tips included
  • [ ] Scenario has clear name and description
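
The tab-link item on this checklist can be partly automated. The sketch below extracts every `(tab:label)` target from an instructions file and reports labels that match none of the component labels you pass in. The function name `find_unknown_tab_labels` and the idea of passing labels as arguments are assumptions for illustration; only the `tab:` link syntax comes from the examples in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical lint: list tab-link labels used in an instructions file
# that do not match any known component label.
# Usage: find_unknown_tab_labels <instructions.md> <label> [label...]
find_unknown_tab_labels() {
  local file=$1 label known
  shift
  # Pull out "(tab:label)" targets, strip the wrapper, and de-duplicate
  grep -o '(tab:[^)]*)' "$file" | sed 's/^(tab://; s/)$//' | sort -u |
  while read -r label; do
    for known in "$@"; do
      [ "$label" = "$known" ] && continue 2   # label is valid, skip it
    done
    echo "$label"
  done
}

# Example: find_unknown_tab_labels instructions.md postgres kafka api-server
```

Empty output means every tab link points at a known label. The comparison is exact and case-sensitive, in line with the advice to match component labels exactly.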

Getting Help

If you're stuck:

  1. Check component logs in the Interfaces view
  2. Review pod events for deployment issues
  3. Test components individually to isolate problems
  4. Ask a colleague in your organization to review the scenario

Released under the MIT License.