Thursday, October 29, 2020

Untether AI leverages at-memory computation for inference processing

Untether AI, a start-up based in Toronto, introduced its "tsunAImi" accelerator cards, which are powered by four of its own runAI200 processors. The processors feature a unique at-memory compute architecture that aims to rethink how computation for machine learning is accomplished. The company says that 90 percent of the energy for AI workloads in current processing architectures is consumed by data movement: transferring weights and activations from external memory, through on-chip caches, and finally to the computing element itself.

Untether AI says it is able to deliver two PetaOperations per second (POPs) with its new standard PCI-Express cards -- more than twice the compute of any currently announced PCIe card. The company says this translates into over 80,000 frames per second of ResNet-50 v1.5 throughput at batch=1, three times the throughput of its nearest competitor. For natural language processing, tsunAImi accelerator cards are rated at more than 12,000 queries per second (qps) on BERT-base, four times faster than any announced product.
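As a rough sanity check on how the claimed peak rate and the claimed ResNet-50 throughput relate, the arithmetic below is a hedged back-of-envelope sketch. The ~8 GOPs-per-inference figure for ResNet-50 v1.5 is an assumed, commonly cited estimate (counting each multiply-accumulate as two operations); it is not a number from Untether AI.

```python
# Back-of-envelope check of the claimed throughput figures.
# Assumption: ResNet-50 v1.5 costs roughly 8e9 operations per inference
# (a commonly cited estimate; conventions for counting ops vary).

PEAK_POPS = 2.0           # claimed peak: 2 PetaOperations per second
FPS = 80_000              # claimed ResNet-50 v1.5 throughput at batch=1
OPS_PER_INFERENCE = 8e9   # ~8 GOPs per inference (assumed estimate)

useful_pops = FPS * OPS_PER_INFERENCE / 1e15  # useful work, in POPs
utilization = useful_pops / PEAK_POPS         # fraction of peak sustained

print(f"useful throughput: {useful_pops:.2f} POPs")  # 0.64 POPs
print(f"implied utilization: {utilization:.0%}")     # 32%
```

Under this assumption, the claimed frame rate corresponds to sustaining roughly a third of the card's peak rate on a real network, which is the kind of utilization gap the at-memory architecture is pitched at closing.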

“For AI inference in cloud and datacenters, compute density is king. Untether AI is ushering in the PetaOps era to accelerate AI inference workloads at scale with unprecedented efficiency,” said Arun Iyengar, CEO of Untether AI.

“When we founded Untether AI, our laser focus was unlocking the potential of scalable AI, by delivering more efficient neural network compute,” said Martin Snelgrove, co-founder and CTO of Untether AI. “We are gratified to see our technology come to fruition.”

The imAIgine SDK is currently in Early Access (EA) with select customers and partners. The tsunAImi accelerator card is sampling now and will be commercially available in 1Q2021.

Untether AI is funded by Radical Ventures and Intel Capital.