It's been almost 30 years since the world was introduced to the internet through the World Wide Web, and in this short time it has had a profound impact on our everyday lives. The way we keep in touch with friends and family across the world has changed significantly, and so has the way we use language with them.
Lots of new words have been introduced with technology, such as "unfriending" and "photobombing". We have also appropriated existing vocabulary to mean something different, such as "tablet", "wireless" or "cloud". This week new research looked into the relatively new world of stretchable words and how they have become part of our everyday communication through social media.
Stretchable words are regular words that we lengthen by repeating letters to add emphasis, and they often appear in text messages and on social media. While rarely used in formal writing, many of us will have written text messages that include "hahahahaha" to emphasise that something was funny, or "gooooooaaaaaalll" when our favourite team scores in a sports game.
While we as humans may be able to guess what is meant by these words, there is generally no correct or defined spelling of stretched words, making it very difficult for artificial intelligence algorithms to be programmed to recognise them.
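To make the difficulty concrete, here is a minimal Python sketch of the most obvious rule a programmer might try: treat any letter repeated three or more times in a row as a stretch, and shrink each run back to a single letter. The function names `collapse_runs` and `looks_stretched` are made up for this illustration and are not taken from the study.

```python
import re


def collapse_runs(word: str, keep: int = 1) -> str:
    """Shrink every run of a repeated character down to `keep` copies."""
    return re.sub(r"(.)\1+", lambda m: m.group(1) * keep, word.lower())


def looks_stretched(word: str, min_run: int = 3) -> bool:
    """Return True if any character appears `min_run` or more times in a row."""
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    return re.search(pattern, word.lower()) is not None


for text in ["suuuuuure", "gooooooaaaaaalll", "good", "hahahahaha"]:
    print(text, looks_stretched(text), collapse_runs(text))
```

Even this toy rule shows the problem. It turns "suuuuuure" back into "sure", but it also turns "good" into "god", and it does not notice "hahahahaha" at all because no single letter is doubled.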
How can you program a computer to know that "suuuuuure" might imply sarcasm whereas "yeeeeeessss" might imply excitement? This is especially difficult because the number of u's or e's in each stretched word is often determined by how we are feeling at the time of writing.
The journal PLOS ONE has published one of the most comprehensive studies of stretchable words in social media to help create automated ways of identifying and analysing them. This new recognition method was applied to more than 100 billion tweets published over eight years to see if there was a pattern in the way that we stretch our words online.
The study described the characteristics of stretchable words using two key measures: their balance and their stretch. Balance refers to the degree to which different letters are repeated. For example, when laughing online we may write "ha", "haha" or "hahaha", where we repeat the h and the a equally. By contrast, the word "no" can be emphasised by writing "noooooooo", where the balance is unequal because only the letter o is repeated.
Stretch refers to how far a word is typically lengthened. Short words tended to be stretched more, and people often repeated them many times, as in "hahahahaha", while other words like "huuuuuuuge" typically had just one letter repeated.
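As a rough illustration of these two ideas, the Python sketch below counts how often each letter appears in a stretched word. It is a simplified stand-in, not the study's own method: the `balance` score here is a toy measure (1.0 when every letter is used equally often, lower when one letter dominates), and `extra_letters` assumes we already know the base word that was stretched. All function names are hypothetical.

```python
from collections import Counter
from itertools import groupby
from math import log


def runs(word: str) -> list[tuple[str, int]]:
    """Run-length encode a word, e.g. 'nooo' -> [('n', 1), ('o', 3)]."""
    return [(ch, len(list(group))) for ch, group in groupby(word.lower())]


def balance(word: str) -> float:
    """Toy balance score: how evenly the word's letters are used (0 to 1)."""
    counts = Counter(word.lower())
    if len(counts) < 2:
        return 1.0  # a single repeated letter has nothing to be uneven against
    total = sum(counts.values())
    entropy = -sum((n / total) * log(n / total) for n in counts.values())
    return entropy / log(len(counts))  # divide by the largest possible value


def extra_letters(word: str, base: str) -> int:
    """Toy stretch score: how many characters were added to the base word."""
    return len(word) - len(base)


laugh = "ha" * 5          # "hahahahaha"
refusal = "n" + "o" * 8   # "noooooooo"

print(runs(refusal), round(balance(refusal), 2), extra_letters(refusal, "no"))
print(round(balance(laugh), 2), extra_letters(laugh, "ha"))
```

With these toy scores the laugh comes out perfectly balanced, because h and a appear equally often, while the drawn-out "no" scores much lower because the o dominates.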
If computers and artificial intelligence are ever going to understand the full range of communication that people use day to day, then being able to model how humans stretch words is important. This new research shows how much work there still is to do in understanding how we modify our language online, and how few rules we apply when doing it.
Studies like this bring us closer to a world where computers can quantify words, including stretched words and newly appropriated words, and translate them into a form that machines can understand. They do so by developing tools to improve natural language processing, search engines and spam filters.
While future linguistic trends and the effect technology will have on them are still unknown, what is apparent is that it will still take a very loooooong time for computers to understand what we mean.