Clinc Conversational UX

Notes on a talk from Dr. Jason Mars of Clinc

  • “The new frontier of free-form, voice enabled gaming”
  • Goal: Reduce complexity of UI interactions
  • Voice UI has been “one shot”—no context, dumb single commands
  • Clinc rejects computational linguistics in favor of recurrent neural nets
    • No part-of-speech trees, so you don’t have to encode ahead of time what it should understand
    • Goal is to understand language like a human does (including previously unknown words and phrases)
    • Goal is to extract key semantic features without parsing the way traditional bottom-up methods do
  • Live demo
    • Pretty good, but has problems with text to speech, and understanding context is hit or miss
    • The demoer may well be adding more context to their speech than a real user would
    • Conversations are less constrained, less linear than demos from Google and the like
    • Looks like it’s 70 or 80% of the way to perfect
  • Use cases for us:
    • Control a copilot
    • Interact with ATC
    • Virtual assistant for onboarding
  • Multilanguage support is “free” as long as you have training data for each language
  • Demo in Booth P1657
  • Runs on-premises… can even run inference on an Arduino (not training, obviously)
  • Internal model
    • Has a conversational flow in which each state is represented by a “competency” (a thing it can do), like adding something to your cart or confirming a transaction
      • This is “a thing the system knows how to do”
      • These are all stateful: it doesn’t matter in what order it gets the info it needs
      • Can attach actions to each state transition, depending on what information it has gathered so far
  • Speech recognition: the model is trained on text, so you first need speech-to-text; they integrate with whatever speech-to-text is already available on the client side (usually the OS-provided one)
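
A rough sketch of how the “competency” idea under “Internal model” might look in code. This is not Clinc’s API: the `Competency` class, its fields, and the `on_update` hook are all hypothetical names made up for illustration, and the talk did not show any implementation. It only tries to capture the two properties noted above: the info can arrive in any order, and an action can fire on each transition based on what has been gathered so far.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Competency:
    """One thing the system knows how to do (hypothetical structure)."""
    name: str
    required_slots: List[str]
    slots: Dict[str, str] = field(default_factory=dict)
    # Hypothetical hook: called after every update with the info gathered so far.
    on_update: Optional[Callable[[Dict[str, str]], None]] = None

    def update(self, extracted: Dict[str, str]) -> None:
        """Merge newly extracted values; arrival order does not matter."""
        for key, value in extracted.items():
            if key in self.required_slots:  # ignore info this competency doesn't need
                self.slots[key] = value
        if self.on_update is not None:
            self.on_update(dict(self.slots))

    def missing(self) -> List[str]:
        """Slots still needed before the competency can complete."""
        return [s for s in self.required_slots if s not in self.slots]

    def is_complete(self) -> bool:
        return not self.missing()


# Usage: the quantity arrives before the item, and that is fine.
add_to_cart = Competency("add_to_cart", ["item", "quantity"])
add_to_cart.update({"quantity": "2"})
print(add_to_cart.missing())      # ['item']
add_to_cart.update({"item": "coffee"})
print(add_to_cart.is_complete())  # True
```

In this sketch the values passed to `update` stand in for whatever the language model extracts from an utterance; how that extraction works is outside what these notes cover.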

