New breakthrough in AI for high volume unstructured text

Wed, 13th Jul 2022

Cortical.io has announced its breakthrough prototype for classifying high volumes of unstructured text.

Classifying documents or messages is one of the most fundamental Natural Language Understanding (NLU) functions for business artificial intelligence (AI). The benchmark was carried out on two similar system setups using the same off-the-shelf, dual AMD-Epyc server hardware. The “BERT” system, a transformer-based machine learning technique for natural language processing, was augmented by an NVIDIA GPU. The “Semantic Folding” approach used a cost-comparable number of Xilinx Alveo FPGA accelerator cards.

The goal of the benchmark was to compare the throughput of the classification-inference engines of the two systems. To measure performance, Cortical.io classified sixteen different data sets, including well-known ones such as Enron (Kaminski, Farmer, and Lokay), DBPedia, IMDb, PubMed, Reuters (R8, R52), Ohsumed, Web of Science, BBC news text and others.
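
As an illustration of what a throughput figure in MB per second means, the following is a minimal Python sketch that times a classifier over a batch of documents. It is not Cortical.io's benchmark harness: the function names and the keyword-based `toy_classify` stand-in are hypothetical and exist only to make the example runnable.

```python
import time


def toy_classify(text):
    """Hypothetical stand-in classifier (NOT BERT or Semantic Folding)."""
    return "spam" if "free" in text.lower() else "ham"


def measure_throughput_mb_per_s(classify, documents):
    """Return classification throughput in MB/s (1 MB = 1,000,000 bytes)."""
    # Size of the input as UTF-8 bytes, which is what "MB of text" refers to here.
    total_bytes = sum(len(doc.encode("utf-8")) for doc in documents)

    start = time.perf_counter()
    for doc in documents:
        classify(doc)
    elapsed = time.perf_counter() - start

    if elapsed <= 0:
        raise ValueError("workload too small to time reliably")
    return (total_bytes / 1_000_000) / elapsed


if __name__ == "__main__":
    sample = ["Free offer inside", "Meeting moved to noon"] * 10_000
    rate = measure_throughput_mb_per_s(toy_classify, sample)
    print(f"{rate:.2f} MB/s")
```

The measured value depends entirely on the machine and the classifier, so no expected output is shown.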

Staggering results were achieved by the simultaneous application of three distinct steps:

Improving the machine learning approach by applying Semantic Folding.
Using tooling that enabled the concurrent implementation of software, hardware and networking aspects of the Semantic Folding approach.
Using the parallelism of large gate arrays, implemented in practice with FPGA technology in the form of COTS datacenter hardware from Xilinx.

Benchmark results
BERT implemented in Python on an AMD Epyc Milan + NVIDIA GPU
Performance: 0.18 MB/s
Acceleration: 1x
Power consumption: 2,260 mWh/MB
Efficiency: 1x

Semantic Folding implemented in Java on an AMD Epyc Milan
Performance: 18.2 MB/s
Acceleration: 100x
Power consumption: 15 mWh/MB
Efficiency: 150x

Semantic Folding implemented in binary on an AMD Epyc Milan + 4-card Xilinx FPGA
Performance: 528.30 MB/s
Acceleration: 2856x
Power consumption: 0.46 mWh/MB
Efficiency: 4298x
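
The acceleration and efficiency figures are ratios against the BERT baseline. The short Python sketch below recomputes them from the rounded throughput and energy values listed above; the dictionary keys and the `compare` helper are illustrative names, not part of the announcement. Working from the rounded inputs gives roughly 101x and 151x for the Java version and roughly 2,935x and 4,913x for the FPGA version, so the published FPGA ratios (2856x and 4298x) were presumably derived from unrounded measurements.

```python
# Reported (throughput in MB/s, energy in mWh per MB), rounded as published.
# Assumption: "mwh" in the announcement means milliwatt-hours.
REPORTED = {
    "bert_gpu": (0.18, 2260.0),
    "semantic_folding_java": (18.2, 15.0),
    "semantic_folding_fpga": (528.30, 0.46),
}


def compare(baseline, system):
    """Return (speed-up, energy-efficiency gain) of `system` over `baseline`."""
    base_throughput, base_energy = REPORTED[baseline]
    sys_throughput, sys_energy = REPORTED[system]
    speedup = sys_throughput / base_throughput   # higher throughput is better
    efficiency = base_energy / sys_energy        # lower energy per MB is better
    return speedup, efficiency


if __name__ == "__main__":
    for name in ("semantic_folding_java", "semantic_folding_fpga"):
        speedup, efficiency = compare("bert_gpu", name)
        print(f"{name}: {speedup:.0f}x faster, {efficiency:.0f}x more efficient")
```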

The benchmark results show that with Semantic Folding, operating costs can be reduced from several dollars per classifier to a fraction of a cent, making large-scale classification use cases commercially viable for the first time. Example real-world workloads include hate-speech detection for nearly three billion Facebook users or content filtering of the Twitter firehose for hundreds of millions of users.

“Efficiency is the new precision in Artificial Intelligence,” says Francisco Webber, CEO at Cortical.io.

“While large industries are determined to use less energy, the AI and ML industry is headed in the opposite direction: growing its carbon footprint exponentially,” he says.

“The future of green computing hangs by the thread of high-efficiency AI capabilities.”

Cortical.io delivers highly efficient AI-based solutions that help enterprises unlock the value of unstructured text through its approach to Natural Language Understanding (NLU). Cortical.io SemanticPro is an intelligent document processing solution that accurately extracts, analyses and classifies information based on meaning, and forms the basis for document workflow automation.

Cortical.io has offices in the U.S. and Europe.