Notes on a GDC 2019 talk by Dallas Dickinson of QC Games (developers of Breach) and Abel Mathew, CEO of Backtrace
- When you build new stuff, it crashes
- Need a way to prioritize which crashes to fix first
- Tracking crashes both in their own stack and in Unreal
- Process (unidirectional flow here)
    - Trunk development (70% of dev time spent here)
        - New system dev
        - Main dev branch, with nightly tests
    - Support branch (gone through whole QA process; 25% of dev time)
        - Live bug fixing
        - Emergency fixes
    - Release (live game; 5% of dev time)
- Trunk development (70% of dev time spent here)
    - Uses something resembling CI/CD so that problems surface quickly
    - Crashes get labeled to tell you what branch & build you were running against
- Crash analysis
    - Collection
    - Triage
    - Examine/Diagnose
    - Test/treat (may need to re-triage/diagnose afterward)
    - Cure
- Phantom crash: a fix was committed and released, but the crash kept occurring
    - Dug into the logs, saw which changelist the crash was happening on, and realized the fix hadn’t actually gone out
    - Could find this in an hour with Backtrace
    - Can create a Jira ticket directly from Backtrace; the systems sync!
    - Can tag the fix as going out in a certain version, then filter on the crashes you expect to be fixed in that version
- Infrequent crash in QA
    - Never crossed a must-fix threshold
    - When load suddenly rose, it started happening far more often
    - Could then look at past callstacks to fix it as they saw the priority bubble up
- Make sure you’re running Backtrace on debug builds too
- “Explore” view
    - See how many unique users are affected by the crash
    - Can dig in to the exact set of circumstances that produce the bug via filtering
    - Filter by process age, or view as a histogram
- Make sure you’re logging assertion failures
- Auto-upload symbols as you build or release
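
To make a few of the ideas above concrete (labeling crashes with branch and build, counting unique users, spotting a phantom crash, and bucketing by process age), here is a minimal Python sketch. It is not Backtrace's API: `CrashReport`, its fields, and every function name are hypothetical, and it assumes changelists are integers that only increase over time.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashReport:
    """One crash, labeled with where it came from (all fields hypothetical)."""
    fingerprint: str      # groups crashes that share a callstack
    user_id: str
    branch: str           # e.g. "trunk", "support", "release"
    changelist: int       # assumed: increasing integer build identifier
    process_age_s: float  # seconds the process ran before crashing


def unique_users(reports):
    """Map each fingerprint to the number of distinct users who hit it."""
    users = defaultdict(set)
    for r in reports:
        users[r.fingerprint].add(r.user_id)
    return {fp: len(ids) for fp, ids in users.items()}


def diagnose_phantom(reports, fingerprint, fix_changelist):
    """Classify a crash that keeps showing up after a fix was 'released'.

    Pass only reports collected after the supposed release.
    """
    hits = [r.changelist for r in reports if r.fingerprint == fingerprint]
    if not hits:
        return "no crashes"
    if max(hits) < fix_changelist:
        # Every crashing build predates the fix: it never actually shipped.
        return "fix not deployed"
    return "fix ineffective"


def process_age_histogram(reports, bucket_s=60):
    """Count crashes per process-age bucket, keyed by bucket start (seconds)."""
    counts = Counter(int(r.process_age_s // bucket_s) * bucket_s for r in reports)
    return dict(sorted(counts.items()))


# Made-up sample data for illustration only.
reports = [
    CrashReport("null-deref-in-spawn", "u1", "release", 1040, 12.0),
    CrashReport("null-deref-in-spawn", "u2", "release", 1040, 75.0),
    CrashReport("null-deref-in-spawn", "u1", "release", 1040, 130.0),
    CrashReport("oom-on-load", "u3", "trunk", 1100, 5.0),
]

print(unique_users(reports))
print(diagnose_phantom(reports, "null-deref-in-spawn", fix_changelist=1055))
print(process_age_histogram(reports))
```

With this sample data, every `null-deref-in-spawn` crash comes from a changelist older than the fix, which is the situation described in the phantom crash notes: the fix had not gone out.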