Decision agent, output quality, and safer defaults

Mar 30, 2026 · Day 25

Today was primarily a code day, not a content day. The main objective was to make Ryva runs more reliable, less noisy, and more defensible under real project conditions.

The core theme: better agent internals first, then distribution.

What I shipped

Content and distribution still shipped today: one post each on X, LinkedIn, and Reddit.

Customer-facing execution shipped today:

  • ran Ryva for a fresh repo conversation: Open run
  • reran CyberMinds after analytics updates: Open run
  • closed all pending X/LinkedIn/Reddit DMs and reply threads from yesterday
  • worked 10 targeted Reddit ICP replies/DMs and 10 targeted X replies/DMs

CyberMinds run after analytics update

Engineering deep dive

This was the highest-leverage part of the day.

1) Decision-agent pipeline refactor

I refactored the decision-agent run path to make it less brittle and more deterministic:

  • moved to staged source compression instead of one-shot compression
  • added source-cache reuse where valid artifacts already exist
  • replaced streamed raw JSON parsing with AI SDK structured output

Why this matters:

  • streamed raw JSON parsing was fragile under partial/invalid token streams
  • structured output reduces parser errors and makes failures explicit at schema boundaries
  • staged compression improves recovery paths when one stage fails
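To make the staged idea concrete, here is a minimal sketch of per-source compression with cache reuse. It is not the Ryva implementation: the `Source`, `Compressed`, and `StageResult` shapes, the `contentKey` helper, and the synchronous `compressOne` stand-in for a model call are all assumptions made for illustration. The point it shows is that one failing stage is recorded and skipped, while cached and successful stages survive.

```typescript
// Hypothetical shapes; the real Ryva types are not shown in this log.
type Source = { id: string; text: string };
type Compressed = { sourceId: string; summary: string };
type StageResult = {
  compressed: Compressed[];
  failedSourceIds: string[];
  cacheHits: number;
};

// Cheap, non-cryptographic content key (djb2-style). A production system
// would more likely use a proper hash; this only needs to change when the
// source text changes.
function contentKey(source: Source): string {
  let h = 5381;
  for (let i = 0; i < source.text.length; i++) {
    h = ((h * 33) ^ source.text.charCodeAt(i)) >>> 0;
  }
  return `${source.id}:${h.toString(16)}`;
}

// Compress each source as its own stage. `compressOne` stands in for a
// model call (kept synchronous here for brevity) and may throw.
function compressInStages(
  sources: Source[],
  compressOne: (source: Source) => string,
  cache: Map<string, Compressed>,
): StageResult {
  const result: StageResult = { compressed: [], failedSourceIds: [], cacheHits: 0 };
  for (const source of sources) {
    const key = contentKey(source);
    const cached = cache.get(key);
    if (cached) {
      // Reuse a valid artifact instead of recompressing unchanged content.
      result.compressed.push(cached);
      result.cacheHits += 1;
      continue;
    }
    try {
      const artifact: Compressed = { sourceId: source.id, summary: compressOne(source) };
      cache.set(key, artifact);
      result.compressed.push(artifact);
    } catch {
      // One failed stage does not discard the stages that succeeded.
      result.failedSourceIds.push(source.id);
    }
  }
  return result;
}
```

With one-shot compression, the equivalent of the `catch` branch would lose the whole run; here the caller can retry only the ids in `failedSourceIds`.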

2) Output quality hardening

I tightened recommendation quality so outputs are tied to concrete repo evidence:

  • recommendations now prioritize exact commit/file/line references
  • missing-decision detection now rejects generic standup/checklist filler
  • low-signal evidence anchors are now filtered out, so fallback output stops over-focusing on import and frontmatter noise

Why this matters:

  • generic insights create “agree but ignore” behavior
  • commit/file/line anchors increase operator trust and actionability
  • fallback quality is now less repetitive and less cosmetic
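A rough sketch of what anchor filtering and prioritization could look like, under stated assumptions: the `EvidenceAnchor` shape, the regex heuristics for "import" and "frontmatter" lines, and the `specificity` score are hypothetical and much cruder than anything a real pipeline would use.

```typescript
// Hypothetical anchor shape: a piece of repo evidence a recommendation cites.
type EvidenceAnchor = {
  commit?: string;
  file: string;
  line?: number;
  text: string;
};

// Heuristic only: treat blank lines, import/require lines, and common
// frontmatter keys as low-signal evidence.
function isLowSignal(anchor: EvidenceAnchor): boolean {
  const t = anchor.text.trim();
  if (t === "") return true;
  if (/^import\b/.test(t)) return true;
  if (/^export\b.*\bfrom\b/.test(t)) return true;
  if (/\brequire\(/.test(t)) return true;
  if (t === "---") return true;
  if (/^(title|date|tags|description|slug):/i.test(t)) return true;
  return false;
}

// More specific anchors (commit and line both present) rank first.
function specificity(anchor: EvidenceAnchor): number {
  return (anchor.commit ? 1 : 0) + (anchor.line !== undefined ? 1 : 0);
}

// Drop low-signal anchors, then order the rest by specificity.
// filter() returns a new array, so the input is not mutated by sort().
function rankAnchors(anchors: EvidenceAnchor[]): EvidenceAnchor[] {
  return anchors
    .filter((anchor) => !isLowSignal(anchor))
    .sort((a, b) => specificity(b) - specificity(a));
}
```

The design choice worth noting is that filtering happens before ranking, so a fully specified anchor that points at an import line still gets dropped.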

3) Timeline noise reduction and write discipline

I reduced recommendation spam and balanced persistence behavior:

  • collapsed repetitive recommendation writes
  • rebalanced persistence to keep one recommendation block and up to two missing-decision blocks

Why this matters:

  • timeline spam dilutes urgency and makes first-screen comprehension worse
  • fewer, stronger blocks improve scan speed and conversion to follow-up action
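The write discipline above can be sketched as a small selection step before persistence. The per-kind limits (one recommendation, up to two missing-decision blocks) come from this log; everything else, including the `TimelineBlock` shape, the `score` field, and the whitespace-insensitive duplicate key, is an assumption for illustration.

```typescript
type BlockKind = "recommendation" | "missing_decision";

// Hypothetical block shape; `score` is an assumed ranking signal.
type TimelineBlock = { kind: BlockKind; body: string; score: number };

// Limits taken from the log: one recommendation, up to two missing decisions.
const MAX_PER_KIND: Record<BlockKind, number> = {
  recommendation: 1,
  missing_decision: 2,
};

function selectBlocksToPersist(candidates: TimelineBlock[]): TimelineBlock[] {
  const seen = new Set<string>();
  const kept: Record<BlockKind, number> = { recommendation: 0, missing_decision: 0 };
  const out: TimelineBlock[] = [];

  // Copy before sorting so the caller's array is left untouched.
  const byScore = [...candidates].sort((a, b) => b.score - a.score);

  for (const block of byScore) {
    // Collapse repeats that differ only in case or whitespace.
    const normalized = block.body.trim().toLowerCase().replace(/\s+/g, " ");
    const key = `${block.kind}:${normalized}`;
    if (seen.has(key)) continue;
    seen.add(key);

    // Enforce the per-kind cap.
    if (kept[block.kind] >= MAX_PER_KIND[block.kind]) continue;
    kept[block.kind] += 1;
    out.push(block);
  }
  return out;
}
```

Given six candidates with duplicates, this returns at most three blocks, which is what keeps the first screen of the timeline scannable.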

4) Snapshot/bootstrap reliability fixes

I improved how project state is initialized and loaded:

  • added GitHub snapshot auto-load on project creation
  • added auto-load on first project view
  • fixed race condition causing duplicate snapshot/context blocks

Why this matters:

  • duplicate blocks erode trust and create avoidable confusion
  • first-load reliability is a direct conversion factor in first-run experience
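This log does not say how the race was fixed. One common pattern for this class of bug, where two triggers (creation and first view) can both start a load, is to share a single in-flight promise per project. The sketch below assumes that pattern; `ensureSnapshot`, `snapshotLoads`, and the `Snapshot` shape are hypothetical names, and a server-side idempotency check would be an equally valid approach.

```typescript
// Hypothetical snapshot shape.
type Snapshot = { projectId: string; loadedAt: number };

// One entry per project: either a load in flight or a completed load.
const snapshotLoads = new Map<string, Promise<Snapshot>>();

// Start a snapshot load at most once per project. A second caller gets the
// same promise instead of triggering a duplicate load.
function ensureSnapshot(
  projectId: string,
  load: (projectId: string) => Promise<Snapshot>,
): Promise<Snapshot> {
  const existing = snapshotLoads.get(projectId);
  if (existing) return existing;

  const started = load(projectId).catch((err) => {
    // Forget failed loads so a later call can try again.
    snapshotLoads.delete(projectId);
    throw err;
  });
  snapshotLoads.set(projectId, started);
  return started;
}
```

Both the "on project creation" and "on first project view" paths would call `ensureSnapshot` with the same project id, so whichever fires second reuses the first load.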

5) Failure handling and observability

I added more useful internal telemetry while keeping logs safe:

  • per-attempt Convex logging for synthesis/compression failures
  • logging includes model names + failure messages only (no secrets, no raw provider payload dumps)
  • stopped caching deterministic fallback compressions
  • added retry flow across stronger models when first pass fails

Why this matters:

  • observability makes failure modes debuggable without leaking sensitive content
  • stronger-model retry improves completion rate on difficult contexts
  • not caching deterministic fallback reduces stale/low-quality repeat output
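A minimal sketch of the retry-and-log behavior described above, assuming a "model ladder" ordered from weakest to strongest. The function names, the model names a caller would pass in, and the synchronous `callModel` stand-in are all hypothetical; a real provider call would be asynchronous. The log entries carry only the model name and the failure message, and fallback results are marked as not cacheable.

```typescript
// What gets logged per attempt: model name and failure message only.
type AttemptLog = { model: string; ok: boolean; message?: string };

type SynthesisResult = {
  text: string;
  source: "model" | "fallback";
  cacheable: boolean;
};

// Deterministic last resort when every model attempt fails (placeholder logic).
function deterministicFallback(input: string): string {
  return input.slice(0, 200);
}

function synthesizeWithRetry(
  input: string,
  modelLadder: string[], // weakest first; names are placeholders
  callModel: (model: string, input: string) => string, // stand-in; may throw
  logAttempt: (entry: AttemptLog) => void,
): SynthesisResult {
  for (const model of modelLadder) {
    try {
      const text = callModel(model, input);
      logAttempt({ model, ok: true });
      return { text, source: "model", cacheable: true };
    } catch (err) {
      // Log the message only: no input text, no raw provider payload.
      const message = err instanceof Error ? err.message : "unknown error";
      logAttempt({ model, ok: false, message });
    }
  }
  // Mark fallback output as not cacheable so it is never replayed as if it
  // were a real synthesis.
  return { text: deterministicFallback(input), source: "fallback", cacheable: false };
}
```

The `cacheable` flag is the piece that prevents a one-off provider failure from turning into stale output on every later run.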

6) Primary files touched

  • convex/lib/decision_agent/actionsRuntime.ts
  • convex/decisionAgentInternal.ts
  • convex/githubInternal.ts
  • src/components/project/project-page-container.tsx

7) Validation and checks

All targeted checks passed:

pnpm exec eslint convex/lib/decision_agent/actionsRuntime.ts convex/decisionAgentInternal.ts convex/githubInternal.ts src/components/project/project-page-container.tsx
pnpm exec tsc --noEmit
npx convex codegen

Security review and risk posture

Security status on today’s code changes:

  • no new authentication or authorization regression found in touched paths
  • no new input-validation regression found in touched paths
  • new logging is constrained to operational metadata and error messages

Critical existing repo-level risk (not introduced today):

  • real secrets are still present in tracked .env and .env.production files
  • .gitignore only keeps untracked files from being added; it does not untrack files that are already committed, and their contents stay in git history

Required remediation (not auto-applied because operationally destructive):

  • rotate exposed credentials
  • remove secrets from git index/history in a coordinated rollout
  • update deployment/runtime secrets in lockstep
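Until that remediation lands, a small guard can at least keep the exposure visible. The sketch below is an assumption, not something shipped today: `findTrackedEnvFiles` is a hypothetical helper that takes the text output of `git ls-files` (which lists tracked paths) and returns any env files in it, skipping conventional placeholder files such as `.env.example`.

```typescript
// Given the output of `git ls-files`, return tracked paths that look like
// real env files. Placeholder suffixes are assumed to be safe to commit.
function findTrackedEnvFiles(lsFilesOutput: string): string[] {
  return lsFilesOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((path) => path !== "")
    .filter((path) => {
      const base = path.split("/").pop() ?? "";
      // Matches ".env" and ".env.<anything>".
      if (!/^\.env(\..+)?$/.test(base)) return false;
      // Skip conventional placeholder files.
      return !/\.(example|sample|template)$/.test(base);
    });
}
```

A non-empty result means the files are still tracked. Removing a file from the index with `git rm --cached` stops tracking it going forward, but it does not erase earlier commits, which is why rotation comes first in the list above.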

Product updates from direct feedback

Two major product-level insights became clearer:

  • white-glove first runs now generate replies reliably
  • the larger retention problem is making a second run feel inevitable, not acquiring the first run

This reframes product direction:

  • first run = snapshot
  • second run = delta story (“what changed vs last run”)
  • stickiness comes from continuity, not one-time insight quality

CyberMinds remained the strongest behavior-change proof:

  • workflow moved toward GitHub Issues
  • Ryva outputs are now part of recurring review flow
  • migrating from WhatsApp to Slack improved operational fit for repo-linked execution context

Strategic signal today:

  • inbound in a Composio co-founder context suggests Ryva is visible in agent-infra-adjacent circles
  • this is useful mainly as failure-mode learning leverage, not vanity validation

Execution and channel signal

Outreach execution today:

  • replied across all pending channels from yesterday before opening new loops
  • sent 10 high-context Reddit replies/DMs
  • sent 10 high-context X replies/DMs
  • connected with many operators on LinkedIn and crossed 600 connections

Signal quality today:

  • X reply loops continue to convert better than top-level posting
  • Reddit remains strong for pain articulation but can throttle deep thread scanning
  • best-performing ask remains repo-specific and binary: run now vs schedule short review

Analytics snapshot today:

  • Ryva: 1500+ monthly views and 400+ unique visitors
  • egeuysal.com: 2k+ views and ~600 unique visitors in under 25 days

Traffic snapshot

Personal context and consistency

After the lighter travel-day cadence, today was a full deep-work reset focused on shipping core reliability improvements. Energy was directed to internal quality, not just output volume.

The main win was treating engineering stability as the immediate PMF multiplier.

Conversion checklist result

Completed today:

  • closed warm loops across X/LinkedIn/Reddit with value-first follow-up
  • shipped core decision-agent reliability and output-quality improvements
  • reran CyberMinds after analytics implementation and captured fresh evidence
  • enforced outbound safety guardrails (public repos only, sensitive-context avoidance)
  • shipped one X, one LinkedIn, and one Reddit post for continuity

Partially complete:

  • “3 public repos run today” target landed at 2 completed runs
  • second-run conversion sequencing still needs an explicit, productized follow-up template

Friction and risk

  • first-run quality is improving faster than second-run conversion mechanics
  • wide channel scanning can still steal time from high-intent thread follow-up
  • fallback compression can regress quality without strict evidence filtering (partially mitigated today)
  • tracked secret exposure remains a serious operational risk until rotated/removed

Numbers

  • 2 Ryva runs shared (run_Ueft0cdaAZ1I, run_0RaRpmAwKw6b)
  • 20 targeted replies/DMs total (10 Reddit + 10 X)
  • 3 posts published (X, LinkedIn, Reddit)
  • 600+ LinkedIn connections reached
  • 3 engineering checks passed (eslint, tsc --noEmit, convex codegen)
  • 4 core engineering files updated in critical run/snapshot path

Quotes of today

Indeed, that ownership part is where it gets messy fast.

Logs tell you something happened, but not always who was responsible for the decision path.

Main progress today: Ryva became materially more reliable and actionable at the code path level, and that directly supports the next PMF objective, which is converting first-run curiosity into second-run expectation. Ryva actually works now.