Since the @voice message only appears once per however-long period of time, and sometimes not in the same scene or even the next one, deliberately listening for the tone, quality, gender and accent of someone's @voice seems like something that should be possible.
Maybe it could be an argument to "speaking".
> speaking NinjaBob
could either immediately tell you that Bob "*speaks English, in a mall-ninja poseur voice*" or else just cue up the voice message next time Bob speaks out loud within your immediate earshot.
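
To make the two options concrete, here is a rough Python sketch of how such a command might behave. Everything in it is hypothetical: the `Character` class, the `listening_for` set, and the `speaking` and `say` functions are names made up for illustration, not anything from the actual game code.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    # All fields are illustrative assumptions, not real game attributes.
    name: str
    language: str = "English"
    voice: str = ""  # the character's @voice description
    listening_for: set = field(default_factory=set)  # names queued by "speaking <name>"

    def voice_line(self) -> str:
        return f"{self.name} speaks {self.language}, in a {self.voice} voice."


def speaking(listener: Character, target: Character, immediate: bool = True) -> str:
    """Option 1 (immediate): report the target's voice right away.
    Option 2 (deferred): queue the voice message for the target's next speech."""
    if immediate:
        return target.voice_line()
    listener.listening_for.add(target.name)
    return f"You start listening for {target.name}'s voice."


def say(speaker: Character, text: str, in_earshot: list) -> dict:
    """Return the lines each listener in earshot would see when speaker talks."""
    seen = {}
    for listener in in_earshot:
        lines = []
        if speaker.name in listener.listening_for:
            # Cue up the voice message once, then stop listening for it.
            lines.append(speaker.voice_line())
            listener.listening_for.discard(speaker.name)
        lines.append(f'{speaker.name} says, "{text}"')
        seen[listener.name] = lines
    return seen
```

With the deferred option, the voice message would show up only the first time Bob speaks within earshot after the command, and then the normal timer would presumably take over again.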