Microsoft’s new image-captioning AI will help accessibility in Word, Outlook, and beyond

Published on October 14, 2020 | Updated on October 14, 2020

Microsoft has developed a new image-captioning algorithm that exceeds human accuracy in certain limited tests. The AI system has been used to update the company’s assistant app for the visually impaired, Seeing AI, and will soon be incorporated into other Microsoft products like Word, Outlook, and PowerPoint. There, it will be used for tasks like creating alt-text for images — a function that’s particularly important for increasing accessibility.

“Ideally, everyone would include alt text for all images in documents, on the web, in social media, as this enables people who are blind to access the content and participate in the conversation,” said Saqib Shaikh, a software engineering manager with Microsoft’s AI team, in a press statement. “But, alas, people don’t. So, there are several apps that use image captioning as [a] way to fill in alt text when it’s missing.”

These apps include Microsoft’s own Seeing AI, which the company first released in 2017. Seeing AI uses computer vision to describe, for visually impaired users, the world as seen through a smartphone camera. It can identify household items, read and scan text, describe scenes, and even identify friends. It can also be used to describe images in other apps, including email clients, social media apps, and messaging apps like WhatsApp.

Microsoft does not disclose user numbers for Seeing AI, but Eric Boyd, corporate vice president of Azure AI, told The Verge the software is “one of the leading apps for people who are blind or have low vision.” Seeing AI has been voted best app or best assistive app three years in a row by AppleVis, a community of blind and low-vision iOS users.

Microsoft’s new image-captioning algorithm will improve the performance of Seeing AI significantly, as it’s able to not only identify objects but also more precisely describe the relationship between them. So, the algorithm can look at a picture and not just say what items and objects it contains (e.g., “a person, a chair, an accordion”) but how they are interacting (e.g., “a person is sitting on a chair and playing an accordion”). Microsoft says the algorithm is twice as good as its previous image-captioning system, in use since 2015.

The algorithm, which was described in a pre-print paper published in September, achieved the highest-ever scores on an image-captioning benchmark known as “nocaps.” This is an industry-leading scoreboard for image captioning, though it has its own constraints.

The nocaps benchmark consists of more than 166,000 human-generated captions describing some 15,100 images taken from the Open Images Dataset. These images span a range of scenarios, from sports to holiday snaps to food photography and more. Algorithms are tested on their ability to create captions for these pictures that match those from humans.

It’s important to note, though, that the nocaps benchmark captures only a tiny sliver of the complexity of image captioning as a general task. Although Microsoft claims in a press release that its new algorithm “describes images as well as people do,” this is only true insofar as it applies to the very small subset of images contained within nocaps.

As Harsh Agrawal, one of the creators of the benchmark, told The Verge over email: “Surpassing human performance on nocaps is not an indicator that image captioning is a solved problem.” Agrawal noted that the metrics used to evaluate performance on nocaps “only roughly correlate with human preferences” and that the benchmark itself “only covers a small percentage of all the possible visual concepts.”

“As with most benchmarks, [the] nocaps benchmark is only a rough indicator of the models’ performance on the task,” said Agrawal. “Surpassing human performance on nocaps by no means indicates that AI systems surpass humans on image comprehension.”

This problem — assuming that performance on a specific benchmark can be extrapolated as performance on the underlying task more generally — is a common one when it comes to exaggerating the ability of AI. Indeed, Microsoft has been criticized by researchers in the past for making similar claims about its algorithms’ ability to comprehend the written word.

Nevertheless, image captioning is a task that has seen huge improvements in recent years thanks to artificial intelligence, and Microsoft’s algorithms are certainly state-of-the-art. In addition to being integrated into Word, Outlook, and PowerPoint, the image-captioning AI will also be available as a standalone model via Microsoft’s cloud and AI platform Azure.
