Automated Testing of Gameplay Features in “Sea of Thieves”

Notes on a talk by Robert Masella (@ZipLockBagMan) of Rare/Microsoft Studios

  • Wanted automated testing on all parts of the code, incl. (notoriously difficult to test) gameplay features
  • Challenges
    • Complexity from open world design
    • Constantly evolving game-as-a-service
    • Wanted to cut releases in a week… can’t spend weeks on testing
  • Why use automated testing
    • Manual testing slow, unreliable
    • Subtle bug in AI behavior gets missed by dev and by manual testers; weeks later players notice the bug, and an engineer has to be pulled off their work to find and fix it… then the testers have to spend time verifying the fix
      • What if the bug reoccurs later??
      • Want to check for bugs like this regularly… AI behavior in open areas versus crowded places with lots of cover (so that devs don’t miss a case)
    • Automated testing is precise, faster than manual, and can test the game at different levels
      • Humans can only “eyeball” the game, not verify internal state
    • Humans are better at:
      • visual & audio bugs
      • exploratory testing
      • assessing the experience
  • How the testing framework works
    • Built on Unreal Engine automation system
      • Go to the automation tab, select a test you want to run; it runs and gives you a pass/fail result
      • Very useful for test team
    • Unit tests (also registered with GUI automation system!)
      • Given unit tests that cover all aspects of the game, you’re in good shape
    • Integration tests (cover communication between units)
      • If unit tests pass, but integration tests fail, issue might be with art assets, or interactions between units
      • Created as various test maps within the editor
        • Don’t want to load up the whole world if you can avoid it
      • Logic written in Unreal Blueprint system
        • Better than writing tests in code
        • E.g., Begin > Delay 2 seconds > Set actor rotation to zero > …
        • Steps can wait until a particular condition is true
        • E.g., test player turning the ship’s wheel
          • Test map is just a player standing on a platform with a wheel in front of them (no ship, no world)
          • Have the player apply an input, assert that the wheel angle has changed (within some tolerance)
      • Make sure you don’t rely too much on the implementation of the feature
        • E.g., Negative input on character input handler > Input handler sends input to wheel > wheel applies input to wheel mesh angle
        • Make sure you won’t fail if the code changes and you delay applying the input change (e.g., for an animation)
          • Don’t want to inspect the animation in your test… too much coupling
        • Don’t want to delay too much (e.g., fixed time)
        • Instead, use “delay until” the wheel gets rotated enough—times out after x seconds
          • Safe to set the timeouts high because they shouldn’t be hit often
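The "delay until" idea above can be sketched outside of Blueprints. This is a minimal Python illustration, not the framework's actual code: `wait_until`, `FakeWheel`, and `wheel_turned` are hypothetical names, and the fake wheel simply applies its input after a delay to stand in for an animation the test should not know about.

```python
import time
from typing import Callable, Optional, Tuple


def wait_until(condition: Callable[[], bool], timeout_s: float,
               poll_s: float = 0.01) -> bool:
    """Poll `condition` until it is true or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_s)
    return condition()  # one last check at the deadline


class FakeWheel:
    """Hypothetical stand-in for the wheel; input lands after a delay."""

    def __init__(self, apply_delay_s: float) -> None:
        self.angle = 0.0
        self._apply_delay_s = apply_delay_s
        self._pending: Optional[Tuple[float, float]] = None  # (due time, delta)

    def apply_input(self, delta: float) -> None:
        self._pending = (time.monotonic() + self._apply_delay_s, delta)

    def tick(self) -> None:
        if self._pending and time.monotonic() >= self._pending[0]:
            self.angle += self._pending[1]
            self._pending = None


def wheel_turned(wheel: FakeWheel, expected: float, tolerance: float) -> bool:
    """Advance the fake wheel, then check its angle within a tolerance."""
    wheel.tick()
    return abs(wheel.angle - expected) <= tolerance
```

A test would call `wheel.apply_input(-15.0)` and then assert `wait_until(lambda: wheel_turned(wheel, -15.0, 0.5), timeout_s=2.0)`. The test passes as soon as the angle is right, so a generous timeout only costs time when something is actually broken.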
    • Networked integration testing
      • Wanted integration tests to be able to transfer control between clients and servers (wow!)
      • Example:
        • Test begins on the server (set up, handshake clients, etc.)
        • Switch over to client and ensure that the setup is correct
        • Sends a message back to server
        • Server transfers control to client #2 to verify that it also sees the same state
      • Other test types
        • Asset audit—check correct setup on assets
        • Screenshot—do visual comparison of levels
        • Performance—collect perf data to spot trends or spikes
        • Bootflow—check communication between client, server, and services
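The server/client hand-off in the networked example above might be modelled roughly like this. It is only a single-process sketch: `Host`, `run_networked_test`, the step functions, and the `chest_count` value are all hypothetical, and replication is faked by copying a dictionary rather than going over a network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Host:
    """Hypothetical stand-in for a server or client process."""
    name: str
    state: Dict[str, int] = field(default_factory=dict)
    inbox: List[str] = field(default_factory=list)


Step = Tuple[str, Callable[[Host, Dict[str, Host]], None]]


def run_networked_test(hosts: Dict[str, Host], steps: List[Step]) -> List[str]:
    """Run each step on the host it names and record where control went."""
    trace = []
    for host_name, action in steps:
        action(hosts[host_name], hosts)  # an AssertionError fails the test
        trace.append(host_name)
    return trace


def server_setup(server: Host, hosts: Dict[str, Host]) -> None:
    server.state["chest_count"] = 3  # made-up piece of world state
    for host in hosts.values():      # fake replication to every client
        if host is not server:
            host.state = dict(server.state)


def client_verify(client: Host, hosts: Dict[str, Host]) -> None:
    assert client.state == hosts["server"].state
    hosts["server"].inbox.append(f"{client.name}: ok")  # message back to server


def server_check_reply(server: Host, hosts: Dict[str, Host]) -> None:
    assert server.inbox, "expected a reply from the first client"


STEPS: List[Step] = [
    ("server", server_setup),
    ("client1", client_verify),
    ("server", server_check_reply),
    ("client2", client_verify),
]
```

Calling `run_networked_test` with three hosts named `server`, `client1`, and `client2` walks control through them in the order listed in `STEPS`.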
    • Testing infrastructure
      • See “Adopting Continuous Delivery” talk from Jafar Soltani for more
      • Tests run as part of build system
      • Each test runs once every 20 mins
      • Test failure causes “red” build
        • Contact the people likely responsible
      • Merge process
        • Can only merge if the build of master is currently “green”
        • Your change must have reasonable test coverage
          • Engineers themselves responsible for writing the tests
        • Complete a pre-merge first that runs all tests related to this change (don’t have to run all tests necessarily)
      • Process
        • Dev makes a change locally
        • Runs change through pre-merge tests (tests related to the change)
        • Dev submits change
        • Build system runs extensive automated tests of the release branch
        • Build system creates build
        • Manual testers check the latest build of the release branch
        • Build system creates a game update
        • “Insider” players check how game plays with new features
        • Players play game update
  • Problems they ran into
    • Slow to run, slow to create, can be unreliable
    • All of these issues are much worse for integration tests than for unit tests (e.g., a run takes about 0.1 seconds versus 20 seconds)
      • Unit tests don’t have dependencies on Unreal or the game itself
      • Integration tests need everything though
      • The more things the test depends on, the slower it is to run
      • Gameplay tests are the worst offenders here, where they depend on everything
        • Created new kind of test: “Actor tests” (use Unreal’s Actor object)
          • Run on the most minimal world possible
          • Basically just big enough to fit the player and the thing they’re interacting with
        • Essentially a unit test for Unreal game code—actors and components were the primary things under test
          • Super useful for testing the logic of those components (despite not testing “the whole world”)
        • Example:
          • Skeletons become easier to kill during the day than at night (two different states)
          • Actor test (code looks like a unit test):
            1. Spawn shadow skeleton
            2. Set state to night mode
            3. Set game world time to midday
            4. Tick the skeleton 1 frame forward
              • Wouldn’t normally do this, but you want to avoid doing it the “right way” (having the engine tick everything forward) for the sake of speed
            5. Confirm that the skeleton’s state is now day
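The five steps above map onto code fairly directly. The real tests target Unreal actors in C++; the sketch below is a language-neutral illustration in Python, and `World`, `ShadowSkeleton`, `SkeletonState`, and the daylight hours are all assumptions made for the example.

```python
from enum import Enum


class SkeletonState(Enum):
    DAY = "day"
    NIGHT = "night"


class World:
    """Hypothetical minimal world: it only holds the time of day in hours."""

    def __init__(self, hour: float) -> None:
        self.hour = hour


class ShadowSkeleton:
    """Hypothetical actor whose state follows the world clock on each tick."""

    DAY_START, DAY_END = 6.0, 18.0  # assumed daylight hours

    def __init__(self, world: World) -> None:
        self.world = world
        self.state = SkeletonState.NIGHT

    def tick(self, delta_seconds: float) -> None:
        is_day = self.DAY_START <= self.world.hour < self.DAY_END
        self.state = SkeletonState.DAY if is_day else SkeletonState.NIGHT


def test_shadow_skeleton_switches_to_day() -> None:
    world = World(hour=0.0)
    skeleton = ShadowSkeleton(world)            # 1. spawn shadow skeleton
    skeleton.state = SkeletonState.NIGHT        # 2. set state to night mode
    world.hour = 12.0                           # 3. set world time to midday
    skeleton.tick(1 / 60)                       # 4. tick only this actor, one frame
    assert skeleton.state is SkeletonState.DAY  # 5. confirm the state is now day
```

Step 4 is the part that makes this fast: the test ticks the one actor it cares about instead of asking an engine to advance the whole world.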
      • When to use actor tests versus integration tests?
        • Use integration tests for “golden path” (assume everything works)
        • Use actor tests to test all the failure modes
        • Example: giving an item to another player
          • Actor:
            1. Player can’t give the item due to having no item
            2. Player can’t give this particular type of item
            3. Etc.
          • Integration: Player successfully gives the item to another player
        • Wound up with 12-to-1 ratio of actor to integration tests
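To make the split concrete, here is a rough sketch of what the cheap failure-mode checks for the item-giving example could look like. `Player`, `GiveItemError`, and the item names are hypothetical, and only the two failure modes named above are modelled.

```python
from typing import Optional


class GiveItemError(Exception):
    """Raised when an item cannot be handed over."""


class Player:
    GIVEABLE = {"banana", "plank"}  # assumed giveable item types

    def __init__(self, held: Optional[str] = None) -> None:
        self.held = held

    def give_to(self, other: "Player") -> None:
        if self.held is None:
            raise GiveItemError("no item held")
        if self.held not in self.GIVEABLE:
            raise GiveItemError("item type cannot be given")
        other.held, self.held = self.held, None


def failure_reason(giver: Player, receiver: Player) -> Optional[str]:
    """Return the error message for a failed give, or None on success."""
    try:
        giver.give_to(receiver)
    except GiveItemError as err:
        return str(err)
    return None


def test_cannot_give_without_item() -> None:
    assert failure_reason(Player(), Player()) == "no item held"


def test_cannot_give_ungiveable_item() -> None:
    assert failure_reason(Player("chest"), Player()) == "item type cannot be given"
```

Each failure mode gets its own small test against the logic alone, which leaves the slower integration test free to cover only the successful hand-over between two real players.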
      • Combine integration tests to test multiple things at once
        • E.g., all 3 skeleton attacks in sequence
    • Transition the same player to new level for each test, rather than reloading the complete state of the world for every test
      • Forces you to fix bugs where state leaks between level loads
    • Fixing intermittent test failures
      • Some level of flakiness is inevitable
      • Google Testing Blog: “Almost 16% of our tests have some level of flakiness associated with them”
      • Investigate and fix the causes
        • …but don’t stop all production when the framework finds one of these
      • Have the build system automatically retry failing tests
        • Keep a record of intermittent failures, though, and have engineers look into the worst offenders each week
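A retry-and-record policy like the one described above could be sketched as follows. The talk does not show the build system's code, so `run_with_retries`, `worst_offenders`, and the use of a `Counter` as the flakiness log are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, List, Tuple


def run_with_retries(name: str, test: Callable[[], None],
                     max_attempts: int, flaky_log: Counter) -> bool:
    """Run `test` up to `max_attempts` times.

    A pass after at least one failure counts as an intermittent failure
    and is recorded in `flaky_log`; the build still sees a pass.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            test()
        except AssertionError:
            continue  # try again
        if attempt > 1:
            flaky_log[name] += 1
        return True
    return False  # failed every attempt: a real failure, not flakiness


def worst_offenders(flaky_log: Counter, top_n: int) -> List[Tuple[str, int]]:
    """Tests with the most recorded intermittent failures, worst first."""
    return flaky_log.most_common(top_n)
```

The weekly review would then start from `worst_offenders(flaky_log, top_n)`, so retries hide flakiness from the build status without hiding it from the engineers.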
    • Consistently failing tests
      • Probably badly written
      • Can’t be trusted—worse than having no test at all!
      • “Quarantine” these: continue running them, but don’t fail the build when they fail
      • Contact the responsible engineer, ask them to fix it—if they don’t do so in a timely manner, trash it
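One way to picture quarantine is as a filter on the build verdict. This is a hedged sketch rather than the studio's implementation; `build_status` and `quarantine_report` are hypothetical names, and results are modelled as a simple name-to-pass/fail mapping.

```python
from typing import Dict, List, Set


def build_status(results: Dict[str, bool], quarantined: Set[str]) -> str:
    """Return "red" only if a non-quarantined test failed."""
    blocking = [name for name, passed in results.items()
                if not passed and name not in quarantined]
    return "red" if blocking else "green"


def quarantine_report(results: Dict[str, bool],
                      quarantined: Set[str]) -> List[str]:
    """Quarantined tests that still fail, for chasing up with their owners."""
    return sorted(name for name in quarantined
                  if not results.get(name, True))
```

Quarantined tests still run and still show up in the report, which is what gives the responsible engineer something to fix before the test gets deleted.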
  • Breakdown of tests
    • 70% of tests were “actor” tests
    • 23% unit tests
    • 5% integration (about half of these network tests)
    • Total of 23,000 tests
  • Benefits of testing
    • Extra build confidence
    • Reduced time to verify build (1.5 days versus 10 days for the previous game)
    • Reduced manual testing team (from 50 people on previous game down to 17)
    • Can use QA team more effectively, for things humans are uniquely good at
    • Very low bug count
      • About 150 open bugs at any given time, compared to over 2000 for previous games
      • That’s a sustainable number, rather than having to panic as you near release
    • No build-up of bugs prior to release
      • Reduced crunch
  • Lessons learned
    • Team buy-in is important, especially given concerns about developer velocity
      • Time it takes to write tests is the primary concern
      • Counterargument: devs might spend the same amount of time or less overall, because the tests prevent bugs (and prevent their recurrence)
    • Allow time for building testing knowledge and making the infrastructure robust
    • Start testing on a small part of the project, build up over time
    • Iterative development and testing don’t mix
      • If you’re still checking to “find the fun” you’ll have to constantly rework the tests
      • Prototype branch for working on new feature doesn’t have test requirements
      • Didn’t do TDD; instead, an engineer would make a change, then write a test to pin down what they’d done, to make sure it doesn’t break in the future
      • Pragmatism is important; perfect testing is impossible
        • Do what works for you
        • Testing has a cost, so you can’t test everything
          • If it’s trivial, don’t test it
          • If it’s going to be hard to maintain, don’t do it
          • Concentrate testing on code with complicated logic
  • Mocking
    • Did a bit of mocking for network stuff in the actor tests (fake replication across the network—sending information from the client & server)
  • To decide which tests are affected by your code change, you can look at past test coverage reports to see which tests cover the files you changed
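Selecting tests from past coverage data comes down to a set intersection. The sketch below assumes coverage reports have already been reduced to a mapping from test name to the source files that test touched; `affected_tests` and the file names in the usage note are made up for illustration.

```python
from typing import Dict, List, Set


def affected_tests(coverage: Dict[str, Set[str]],
                   changed_files: Set[str]) -> List[str]:
    """Tests whose recorded coverage overlaps the changed files."""
    return sorted(test for test, files in coverage.items()
                  if files & changed_files)
```

For example, with coverage `{"wheel_test": {"Wheel.cpp", "Input.cpp"}, "skeleton_test": {"Skeleton.cpp"}}` and a change to `Input.cpp`, only `wheel_test` is selected for the pre-merge run. Coverage from past runs can be stale, so a selection like this narrows the pre-merge step while the full suite still runs in the build system.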
