Notes on a roundtable talk at GDC 2019
- Again, see autotestingroundtable.com
- How do you define a sufficient level of test coverage?
- Aim to cover 95% of the weight of the code based on what gets run/used the most
- Cover the hotspots/inner loops
- Add tests as a part of every refactoring, bug fix
- Best place to start is to figure out your current coverage, then try to correlate that with the number of bugs coming out of each subsystem (see the sketch below)
- In the areas that are producing bugs, improve your coverage
- Can go low on coverage, but high on assertions
- Run auto-play overnight and find all the assertion failures
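A rough sketch (not from the talk) of the coverage-vs-bugs idea, assuming a coverage.py JSON report and a bug-tracker CSV export; the file names and the directory-to-subsystem mapping are hypothetical:

```python
import json
import csv
from collections import defaultdict

# Assumed inputs (hypothetical paths): a report produced with `coverage json`,
# and a bug tracker export with one row per bug and a "subsystem" column.
COVERAGE_REPORT = "coverage.json"
BUG_EXPORT = "bugs.csv"

def subsystem_of(path):
    # Naive mapping: treat the top-level source directory as the subsystem.
    return path.split("/")[0]

def load_coverage(report_path):
    with open(report_path) as f:
        report = json.load(f)
    covered, total = defaultdict(int), defaultdict(int)
    for filename, data in report["files"].items():
        summary = data["summary"]
        sub = subsystem_of(filename)
        covered[sub] += summary["covered_lines"]
        total[sub] += summary["num_statements"]
    return {sub: covered[sub] / total[sub] for sub in total if total[sub]}

def load_bug_counts(csv_path):
    counts = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["subsystem"]] += 1
    return counts

if __name__ == "__main__":
    coverage = load_coverage(COVERAGE_REPORT)
    bugs = load_bug_counts(BUG_EXPORT)
    # Subsystems with many bugs and low coverage are the first candidates for new tests.
    ranked = sorted(coverage, key=lambda s: (bugs.get(s, 0), -coverage[s]), reverse=True)
    for sub in ranked:
        print(f"{sub:20s} coverage={coverage[sub]:5.1%} bugs={bugs.get(sub, 0)}")
```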
- How to change company culture to write more tests
- Show success stories: such-and-such system has heavy tests and never has any bugs
- Easier to get people to test new code rather than going through old stuff
- Make adding a test to verify a fix a prerequisite to committing
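One way to make a verifying test a prerequisite to committing is a small CI check; a minimal sketch, assuming code lives under src/ and tests under tests/ (both hypothetical):

```python
import subprocess
import sys

# Hypothetical layout: production code under src/, tests under tests/.
SRC_PREFIX = "src/"
TEST_PREFIX = "tests/"

def changed_files(base="origin/main"):
    # List files changed relative to the target branch.
    out = subprocess.run(
        ["git", "diff", "--name-only", base, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

if __name__ == "__main__":
    files = changed_files()
    touches_src = any(f.startswith(SRC_PREFIX) for f in files)
    touches_tests = any(f.startswith(TEST_PREFIX) for f in files)
    if touches_src and not touches_tests:
        print("This change modifies src/ without adding or updating any tests.")
        sys.exit(1)
    print("OK")
```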
- Practical tips for testing against client/server setups
- See talk “Automated Testing of Gameplay Features in Sea of Thieves” by Robert Masella
- Can spin up the server and client processes on the same machine and pass control between them (see the sketch below)
- Alternatively, mock the input/output on one side
- Can just run the AI in networked mode overnight and see what fails
- Can you record the I/O from the server for replay?
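A rough sketch of the same-machine server/client setup; the binary names, flags, and port are assumptions about a hypothetical game:

```python
import socket
import subprocess
import time

# Hypothetical binaries and flags for a headless dedicated server and client.
SERVER_CMD = ["./game_server", "--headless", "--port", "7777"]
CLIENT_CMD = ["./game_client", "--headless", "--connect", "127.0.0.1:7777",
              "--run-test", "gameplay_smoke"]

def wait_for_port(port, host="127.0.0.1", timeout=30.0):
    # Poll until the server is accepting connections.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False

if __name__ == "__main__":
    server = subprocess.Popen(SERVER_CMD)
    try:
        assert wait_for_port(7777), "server never came up"
        # The client runs a scripted test and exits non-zero on failure.
        client = subprocess.run(CLIENT_CMD, timeout=600)
        assert client.returncode == 0, "client-side test failed"
    finally:
        server.terminate()
        server.wait(timeout=30)
```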
- Tips for structuring code for testability
- Bringing in dependency injection helps (see the sketch below)
- Writing testable code is an education problem—read Working Effectively with Legacy Code
- Even Google, with their very strong testing culture, has a constant investment in improving education around testing
- Code reviews go a long way toward making this part of the culture
- Start of the code review: Show me the tests!
- Let junior devs review code from seniors (educational—helps trickle down experience)
- Improve testing by modularizing the components, so that you can mock out both sides of the communication
- Google Testing blog is good for education
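A minimal dependency-injection sketch (the game-side classes are invented for illustration): the service takes its backend as a constructor argument, so a test can inject a fake instead of a real network client:

```python
import unittest

class AchievementService:
    """Gameplay-side logic; the backend client is injected rather than created internally."""

    def __init__(self, backend):
        self._backend = backend
        self._unlocked = set()

    def unlock(self, achievement_id):
        if achievement_id in self._unlocked:
            return False
        self._unlocked.add(achievement_id)
        self._backend.post_unlock(achievement_id)
        return True

class FakeBackend:
    """Test double standing in for the real HTTP/platform client."""

    def __init__(self):
        self.posted = []

    def post_unlock(self, achievement_id):
        self.posted.append(achievement_id)

class AchievementServiceTest(unittest.TestCase):
    def test_unlock_is_posted_once(self):
        backend = FakeBackend()
        service = AchievementService(backend)
        self.assertTrue(service.unlock("first_blood"))
        self.assertFalse(service.unlock("first_blood"))  # second unlock is a no-op
        self.assertEqual(backend.posted, ["first_blood"])

if __name__ == "__main__":
    unittest.main()
```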
- At what point is a unit test too trivial? (Testing by rote—is it worth testing “plumbing” code?)
- Purpose of tests can go beyond catching bugs: “living documentation”/a specification explaining what the system is doing (example below)
- Can you find a broader test that covers this?
- Keep the tests, but maybe don’t run them all all the time
- What’s the cost of running the test?
- What would be the business cost of this failing?
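An example of the “living documentation” angle, with the matchmaking rule invented purely for illustration; the test names read as a specification of the behavior rather than a bug net:

```python
import unittest

def party_size_allowed(mode, size):
    # Hypothetical matchmaking rule, included only so the tests below can run.
    limits = {"duos": 2, "squads": 4}
    return 1 <= size <= limits.get(mode, 1)

class MatchmakingRulesTest(unittest.TestCase):
    """Reads as a specification of the matchmaking rules."""

    def test_solo_players_can_queue_for_any_mode(self):
        self.assertTrue(party_size_allowed("duos", 1))
        self.assertTrue(party_size_allowed("squads", 1))

    def test_squads_are_capped_at_four_players(self):
        self.assertTrue(party_size_allowed("squads", 4))
        self.assertFalse(party_size_allowed("squads", 5))

if __name__ == "__main__":
    unittest.main()
```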
- Testing against a number of hardware permutations
- Screenshot tests: it’s hard to know whether the screenshot is okay/expected for a given GPU
- Can improve things by giving your QA multiple configurations and making sure it looks okay as they change configs
- Can do “non-deterministic” screenshot tests for cases where you can’t automate it: have a human tester look through the results once a week
- Get a quantitative measure of how different two screenshots are (see the sketch below)
- On mobile, make sure you talk to the community team to find out what actual chipsets you have problems with
- May be getting an outsized number of issues from particular hardware—go buy it
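For a quantitative measure of how different two screenshots are, a per-pixel RMS difference with a tunable threshold is one option; this sketch uses Pillow, and the file names and threshold are hypothetical:

```python
import math
from PIL import Image, ImageChops

def rms_difference(path_a, path_b):
    """Root-mean-square pixel difference between two same-sized screenshots."""
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB")
    if a.size != b.size:
        raise ValueError(f"size mismatch: {a.size} vs {b.size}")
    diff = ImageChops.difference(a, b)
    histogram = diff.histogram()  # 256 buckets per channel, concatenated
    squares = sum(count * (value % 256) ** 2 for value, count in enumerate(histogram))
    pixels = a.size[0] * a.size[1] * 3
    return math.sqrt(squares / pixels)

if __name__ == "__main__":
    # Hypothetical file names; the threshold would be tuned per test/per GPU family.
    score = rms_difference("master/main_menu.png", "run/main_menu.png")
    print(f"RMS difference: {score:.2f}")
    assert score < 8.0, "screenshot drifted beyond the allowed tolerance"
```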
- Property-based testing or fuzzing for games
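This topic didn’t get much elaboration, but as an illustration of property-based testing, the Hypothesis library generates inputs and shrinks failing cases automatically; the damage rule below is invented:

```python
from hypothesis import given, strategies as st

def apply_damage(health, damage, armor):
    # Hypothetical gameplay rule: armor absorbs up to half the damage,
    # and health never goes below zero.
    absorbed = min(armor, damage // 2)
    return max(0, health - (damage - absorbed))

@given(
    health=st.integers(min_value=0, max_value=10_000),
    damage=st.integers(min_value=0, max_value=10_000),
    armor=st.integers(min_value=0, max_value=10_000),
)
def test_damage_properties(health, damage, armor):
    new_health = apply_damage(health, damage, armor)
    assert 0 <= new_health <= health                  # damage never heals
    assert apply_damage(health, 0, armor) == health   # zero damage is a no-op
```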
- Improve run-time of tests (or stop people from skipping tests)
- Have a server run the tests, not individual devs
- Compile tests (run automatically as you build)
- Commit tests (run when you merge)
- Nightly tests (may be very long running; see the tier sketch below)
- Gate all pull requests on having been tested
- Make the people who skip the tests fix the bugs
- Figure out why the tests are slow
- Devs don’t treat tests like production code (don’t pay attention to runtime)
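One way to get the compile/commit/nightly tiers with pytest is markers plus `-m` selection (the marker names here are assumptions, not from the talk), and `--durations` helps find the slow tests worth fixing:

```python
import time
import pytest

# Tier markers (register them in pytest.ini under "markers =" so pytest
# doesn't warn about unknown marks):
#   pytest -m commit        -> run on every merge
#   pytest -m nightly       -> the long overnight job
#   pytest --durations=20   -> list the slowest tests

@pytest.mark.commit
def test_save_file_roundtrip():
    # Fast, deterministic: belongs in the per-merge tier.
    assert "hello".encode().decode() == "hello"

@pytest.mark.nightly
def test_long_soak():
    # Deliberately slow; only the nightly job pays for this.
    time.sleep(0.1)
    assert True
```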
- UI visual regression testing
- Use screenshots with image detection scripts, just like you would for a rendering engine
- Versioning test results (comparing images—how do you decide the “master”?)
- Test runner automatically keeps artifacts
- Don’t bother storing images if they were a 100% pass
- All pass/fail results for all tests get recorded in a database
- Allows you to do things like skip tests that have never failed, query flaky tests, etc.
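A minimal sketch of recording pass/fail results in a database and querying for flaky or never-failing tests; sqlite3 and the schema are just to keep the example self-contained:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS test_results (
    test_name TEXT NOT NULL,
    build_id  TEXT NOT NULL,
    passed    INTEGER NOT NULL,
    duration_seconds REAL NOT NULL,
    recorded_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""

def record_result(conn, test_name, build_id, passed, duration_seconds):
    conn.execute(
        "INSERT INTO test_results (test_name, build_id, passed, duration_seconds) "
        "VALUES (?, ?, ?, ?)",
        (test_name, build_id, int(passed), duration_seconds),
    )
    conn.commit()

def flaky_tests(conn):
    # "Flaky" here = has both passes and failures on record.
    return [row[0] for row in conn.execute(
        "SELECT test_name FROM test_results "
        "GROUP BY test_name "
        "HAVING SUM(passed) > 0 AND SUM(1 - passed) > 0"
    )]

def never_failed(conn):
    # Candidates for running less often.
    return [row[0] for row in conn.execute(
        "SELECT test_name FROM test_results GROUP BY test_name HAVING MIN(passed) = 1"
    )]

if __name__ == "__main__":
    conn = sqlite3.connect("test_results.db")
    conn.executescript(SCHEMA)
    record_result(conn, "test_save_file_roundtrip", "build_1234", True, 0.8)
    record_result(conn, "test_matchmaking_timeout", "build_1234", False, 30.2)
    record_result(conn, "test_matchmaking_timeout", "build_1235", True, 12.0)
    print("flaky:", flaky_tests(conn))
    print("never failed:", never_failed(conn))
```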