Google AI Creates Its Own Language: AI New Language
Google Brain's neural network AI has reportedly created its own universal language, which allows the system to translate between other languages without knowin...
Skype Begins Dismantling the Language Barrier
Microsoft on Monday announced the first phase of its Skype Translator preview program, which initially will facilitate conversations between English and Spanish speakers. It will convert spoken words ...
Automatic summarization
I made this video to illustrate automatic video segmentation and summarization, for a course called Advanced Topic in Multimedia in Eurecom (engineer scho...
Parsing
Parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The term parsing comes...
Word-sense disambiguation
In computational linguistics, word-sense disambiguation (WSD) is an open problem of natural language processing and ontology. WSD is identifying which sense of a word (i.e. meaning) is used in a sente...
Yarowsky algorithm
In computational linguistics the Yarowsky algorithm is an unsupervised learning algorithm for word sense disambiguation that uses the "one sense per collocation" and the "one sense per discourse" prop...
Lexicography
Lexicography is divided into two separate but equally important groups: A person devoted to lexicography is called a lexicographer. General lexicography focuses on the design, compilation, use and evalu...
Named-entity recognition
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify elements in text in...
Cache language model
A cache language model is a type of statistical language model. These occur in the natural language processing subfield of computer science and assign probabilities to given sequences of words by mean...
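A minimal sketch of the caching idea, assuming a unigram base model that is linearly interpolated with the relative frequency of words seen recently. The class name CacheUnigramModel, the weight lam, and the cache size are illustrative assumptions, not taken from the excerpt above.

```python
from collections import Counter, deque


class CacheUnigramModel:
    """Hypothetical unigram model interpolated with a cache of recent words."""

    def __init__(self, base_probs, cache_size=100, lam=0.8):
        self.base_probs = base_probs           # static word -> probability table
        self.cache = deque(maxlen=cache_size)  # most recent words, oldest dropped first
        self.lam = lam                         # weight given to the static model

    def observe(self, word):
        """Record a word that has just been seen in the text."""
        self.cache.append(word)

    def prob(self, word):
        """Return the interpolated probability of `word`."""
        base = self.base_probs.get(word, 0.0)
        if not self.cache:
            # Nothing cached yet: fall back to the static model alone.
            return base
        cache_p = Counter(self.cache)[word] / len(self.cache)
        return self.lam * base + (1 - self.lam) * cache_p
```

After a word has been observed a few times, its probability rises above the static estimate, which is the effect the cache is meant to capture.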
TMG (language)
TMG is a compiler-compiler created by Robert M. McClure and presented in 1968, and implemented by Douglas McIlroy. TMG ran on systems like OS/360 and early Unix. It was used to build EPL, an early vers...
Textual entailment
Textual entailment (TE) in natural language processing is a directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from another text. In the TE...
Automatic summarization
Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. As the problem ...
Word-sense induction
In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses of a word (i.e. ...
ROUGE (metric)
ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language...
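As an illustration of the recall-oriented idea, here is a rough sketch of an n-gram recall score in the style of ROUGE-N. It assumes whitespace tokenization and a single reference summary. The function name rouge_n_recall is hypothetical, and this is not the official ROUGE software package.

```python
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by the candidate count so repeats are not over-credited.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For example, rouge_n_recall("the cat sat on the mat", "the cat is on the mat") matches five of the six reference unigrams.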
Machine translation software usability
The sections below give objective criteria for evaluating the usability of machine translation software output.
Do repeated translations converge on a single expression in both languages? I.e. do...
Anaphora (linguistics)
In linguistics, anaphora /əˈnæfərə/ is the use of an expression the interpretation of which depends upon another expression in context (its antecedent or postcedent). In the sentence Sally arrived, bu...
Relationship extraction
A relationship extraction task requires the detection and classification of semantic relationship mentions within a set of artifacts, typically from text or XML documents. The task is very similar to ...
Stemming
Stemming is the term used in linguistic morphology and information retrieval to describe the process for reducing inflected (or sometimes derived) words to their word stem, base or root form—generally...
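To make the idea concrete, the sketch below strips a few common English suffixes. It is a deliberately naive illustration, not the Porter algorithm or any published stemmer. The suffix list, the min_stem threshold, and the name naive_stem are assumptions chosen for the example.

```python
# Suffixes are tried in this order; the first one that applies wins.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word, min_stem=3):
    """Strip one common suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word
```

With these rules "fishing", "fished" and "fishes" all reduce to "fish", while a short word such as "sing" is left alone. Real stemmers add many more rules to handle cases like doubled consonants.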
Construct (python library)
Construct is a Python library for the construction and deconstruction of data structures in a declarative fashion. In this context, construction, or building, refers to the process of converting (seri...
Phrase chunking
Phrase chunking is a natural language processing task that separates and segments a sentence into its subconstituents, such as noun, verb, and prepositional phrases.
Collocation extraction
Collocation extraction is the task of extracting collocations automatically from a corpus using a computer. Within the area of corpus linguistics, collocation is defined as a sequence of words or terms...
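One common way to approach this task, shown here only as a sketch, is to score adjacent word pairs by pointwise mutual information (PMI), which compares how often a pair occurs with how often it would occur if the two words were independent. The function name bigram_pmi and the min_count cutoff are assumptions for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by PMI (log base 2), highest first."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            # Rare pairs give unreliable PMI estimates, so skip them.
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Pairs with a high positive score co-occur more often than chance would predict, which makes them candidate collocations.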
Language identification
In natural language processing, language identification or language guessing is the problem of determining which natural language given content is in. Computational approaches to this problem view it ...
Irony (framework)
Irony is a parser generator framework for language implementation on the .NET platform. Unlike most existing yacc/lex-style solutions, it does not employ code generation of a scanner/parser from gramm...
Multilingual notation
A multilingual notation is a representation in a lexical resource that allows the translation between two or more words.
For instance, within LMF, a multilingual notation could be as presented in ...
Automatic acquisition of sense-tagged corpora
The knowledge acquisition bottleneck is perhaps the major impediment to solving the word sense disambiguation (WSD) problem. Unsupervised learning methods rely on knowledge about word senses, which is...
Text segmentation
Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental processes used by humans when reading text, and t...
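For the sentence-level case, a very rough sketch might split on whitespace that follows terminal punctuation. This assumes English-style punctuation and will wrongly split after abbreviations such as "Dr."; the function name split_sentences is hypothetical.

```python
import re


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string produced when the input is empty.
    return [part for part in parts if part]
```

Practical segmenters add abbreviation lists or trained models on top of a rule like this one.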
Tokenization
Tokenization may refer to:
Lexical substitution
Lexical substitution is the task of identifying a substitute for a word in the context of a clause. For instance, given the following text: "After the match, replace any remaining fluid deficit to preve...
Word sense
In linguistics, a word sense is one of the meanings of a word. A word sense may correspond to either a seme (the smallest unit of meaning) or a sememe (the next larger unit of meaning), and polysemy i...