IBM is announcing several new IBM Watson technologies designed to help organizations begin identifying, understanding and analyzing some of the most challenging aspects of the English language with greater clarity, for greater insights.
The new technologies represent the first commercialization of key Natural Language Processing (NLP) capabilities to come from IBM Research’s Project Debater, the only AI system capable of debating humans on complex topics. For example, a new advanced sentiment analysis feature is designed to identify and analyze idioms and colloquialisms for the first time. Phrases like ‘hardly helpful’ or ‘hot under the collar’ have been challenging for AI systems because they are difficult for algorithms to spot. With advanced sentiment analysis, businesses can begin analyzing such language data with Watson APIs for a more holistic understanding of their operation. Further, IBM is bringing technology from IBM Research for understanding business documents, such as PDFs and contracts, to also add to their AI models.
“Language is a tool for expressing thought and opinion, as much as it is a tool for information,” said Rob Thomas, General Manager, IBM Data and AI. “This is why we believe that advancing our ability to capture, analyze, and understand more from language with NLP will help transform how businesses utilize their intellectual capital that is codified in data.”
Today IBM is announcing that it will integrate Project Debater technologies into Watson throughout the year, with a focus on advancing clients’ ability to exploit natural language:
A. Analysis – Advanced Sentiment Analysis. IBM has enhanced sentiment analysis to better identify and understand complicated word schemes such as idioms (phrases and expressions) and so-called sentiment shifters, which are combinations of words that together take on a new meaning, such as “hardly helpful.” This technology will be integrated into Watson Natural Language Understanding this month.
B. Briefs – Summarization. This technology pulls textual data from a variety of sources to provide users with a summary of what is being said and written about a particular topic. An early version of Summarization was used at The GRAMMYS this year to analyze over 18 million articles, blogs and bios to produce bite-sized insights on hundreds of GRAMMY artists and celebrities. The data was then infused into the red carpet live stream, on-demand videos and photos across www.grammy.com to give fans deeper context about the leading topics of the night. Summarization will be added to IBM Watson Natural Language Understanding later in the year.
C. Clustering – Advanced Topic Clustering. Building on insights gained from Project Debater, new topic clustering techniques will enable users to "cluster" incoming data to create meaningful "topics" of related information, which can then be analyzed. The technique, which will be integrated into Watson Discovery later this year, will also allow subject matter experts to customize and fine-tune the topics to reflect the language of specific businesses or industries, like insurance, healthcare and manufacturing.
D. Documents – Customizable Classification of Elements in Business Documents. This technology, which will also be added to Watson Discovery later this year, enables clients to create AI models to more easily classify clauses that occur in such business documents as procurement contracts. Based on Project Debater’s deep learning-based classification technology, the new capabilities can learn from as few as several hundred samples to do new classifications quickly and easily.
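To make the idea of a sentiment shifter in item A concrete, the following is a minimal toy sketch in Python. It is not IBM's method and does not use any Watson API; the word lists, the idiom table and the `score` function are all hypothetical and exist only to show why “hardly helpful” cannot be scored word by word.

```python
import re

# Toy lexicons. These are illustrative assumptions, not Watson data.
POLARITY = {"helpful": 1, "good": 1, "useful": 1, "slow": -1, "broken": -1}
SHIFTERS = {"hardly", "barely", "not", "never"}
IDIOMS = {"hot under the collar": -1}  # idioms are scored as whole phrases


def score(text):
    """Return a crude sentiment score: positive > 0, negative < 0."""
    lowered = text.lower()
    total = 0

    # Score idioms first, then remove them so their words are not re-scored.
    for phrase, value in IDIOMS.items():
        total += value * lowered.count(phrase)
        lowered = lowered.replace(phrase, " ")

    tokens = re.findall(r"[a-z']+", lowered)
    for i, token in enumerate(tokens):
        if token in POLARITY:
            value = POLARITY[token]
            # A shifter directly before a polar word flips its polarity.
            if i > 0 and tokens[i - 1] in SHIFTERS:
                value = -value
            total += value
    return total


if __name__ == "__main__":
    for sentence in (
        "The agent was helpful",
        "The agent was hardly helpful",
        "The customer was hot under the collar",
    ):
        print(sentence, "->", score(sentence))
```

A plain word-level lexicon would give “helpful” and “hardly helpful” the same score; the shifter check is what separates them in this sketch. A production system would presumably learn such patterns from data rather than rely on fixed lists.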