Jun 22, 2022
In General Discussions
Despite all the advances made by search engines and computational linguists, unsupervised and semi-supervised approaches such as Word2Vec and Google Pygmalion have a number of shortcomings that prevent large-scale understanding of human language, and it is easy to see how these have held back the progress of conversational search.

Pygmalion is not scalable for internationalization

Labeling training datasets with part-of-speech annotations is both time-consuming and costly for any organization. Humans are also not perfect, so there is room for error and disagreement: which part of speech a particular word belongs to in a given context can keep linguists debating among themselves for hours. The team of Google linguists (Google Pygmalion) working on Google Assistant, for example, consisted of around 100 PhD linguists in 2016. In an interview with Wired magazine, Google product manager David Orr explained how the company still needed its team of PhD linguists to label parts of speech (calling it the "gold" data) in ways that help neural networks understand how human language works. Orr said of Pygmalion: "The team covers between 20 and 30 languages. But there are hopes that companies like Google can eventually move to a more automated form of AI called 'unsupervised learning.'" By 2019 the Pygmalion team was an army of 200 linguists around the world, a mix of permanent and agency staff, but the work was not without its challenges given the laborious, daunting nature of manual tagging and the long hours involved. In the same Wired article, Chris Nicholson, founder of a deep learning company called Skymind, commented on the non-scalable nature of projects like Google Pygmalion, particularly from an internationalization perspective, because part-of-speech tagging would have to be done by linguists for every language in the world for the system to be truly multilingual.

Internationalization of conversational search

The manual tagging involved in Pygmalion does not seem to take advantage of the transferable natural phenomena of computational linguistics. For example, Zipf's law, a power-law frequency distribution, dictates that in any given language the frequency of a word is proportional to one over its rank, and this holds even for languages not yet translated.

The one-way nature of "context windows" in RNNs (Recurrent Neural Networks)

Training models such as Skip-gram and Continuous Bag-of-Words are unidirectional in the sense that the context window containing the target word and the surrounding context words to the left and right only moves in one direction. The words after the target word have not yet been seen, so the full context of the sentence is incomplete until the very last word, which carries the risk of missing some contextual patterns. A good example of the challenge of one-way moving context windows is provided by Jacob Uszkoreit on the Google AI blog when discussing the Transformer architecture. Deciding on the most likely meaning and appropriate representation of the word "bank" in the sentence "I came to the bank after crossing the…" requires knowing whether the sentence ends in "…road" or "…river".

Missing text cohesion

One-way training approaches prevent the model from capturing text cohesion. As the philosopher Ludwig Wittgenstein said in 1953: "The meaning of a word is its use in the language" (Wittgenstein, 1953). Often the tiny words, and the way words are held together, are the "glue" that brings common sense to language. This "glue" is collectively referred to as "text cohesion". It is the combination of entities and the different parts of speech around them, phrased together in a particular order, that gives a sentence structure and meaning.
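The Zipf's law claim above is easy to check empirically. The following is a toy illustration (not from the article): it counts word frequencies in a small sample corpus and compares the observed counts with the one-over-rank prediction.

```python
# Toy illustration (not from the article) of Zipf's law: a word's
# frequency in natural text is roughly proportional to 1 / rank.
from collections import Counter

corpus = (
    "the cat sat on the mat and the dog sat on the rug "
    "the cat and the dog ran to the mat"
).split()

counts = Counter(corpus)
ranked = counts.most_common()  # [(word, freq), ...], highest frequency first

top_freq = ranked[0][1]
for rank, (word, freq) in enumerate(ranked, start=1):
    predicted = top_freq / rank  # Zipf's prediction: f(rank) ~ f(1) / rank
    print(f"rank {rank}: {word!r} observed={freq} predicted~{predicted:.1f}")
```

On a corpus this small the fit is rough, but on real text the rank-frequency curve follows the power law remarkably well, and, as noted above, it does so across languages.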
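The one-way context-window problem can also be sketched in a few lines. This is a simplified illustration (not Google's code, and the `seen_left_context` helper is hypothetical): a left-to-right model choosing a representation for "bank" has, at that point, seen only the words to its left, while the disambiguating word sits in the unseen right context.

```python
# Simplified sketch (not Google's code) of a unidirectional context window:
# a left-to-right model deciding on "bank" sees only the words before it;
# the disambiguating word ("road" vs "river") lies in the unseen tail.

def seen_left_context(tokens, target_index, window=3):
    """Words a left-to-right model has processed before the target word."""
    start = max(0, target_index - window)
    return tokens[start:target_index]

sentence = "I came to the bank after crossing the river".split()
i = sentence.index("bank")

print(seen_left_context(sentence, i))  # ['came', 'to', 'the']
print(sentence[i + 1:])  # ['after', 'crossing', 'the', 'river'] -- unseen
```

A bidirectional model (the motivation behind the Transformer work mentioned above) conditions on both slices at once, so "river" is available when "bank" is encoded.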
The order in which a word occurs in a phrase or sentence also adds to this context. Without this contextual collage of surrounding words in the correct order, the word itself simply doesn't make sense. The meaning of the same word can also change as a phrase or sentence develops, because of dependencies on co-occurring phrase or sentence members, with the context shifting along the way. Linguists may also disagree in the first place about which part of speech a word belongs to in a given context. Take the example of the word "bucket". As humans, we can automatically visualize a bucket that can be filled with water as a "thing", but there are nuances everywhere. What if the word "bucket" appeared in the sentence "He kicked the bucket" or "I haven't crossed that off my bucket list yet"? Suddenly the word takes on a whole new meaning. Without the textual cohesion of the accompanying, often tiny, words around "bucket", we cannot know whether it refers to a water-carrying tool or a list of life goals.