Computer vision a crucial bridge between AI and human intelligence, says Roboflow CEO
The ultimate aim of modern computing advancements, such as artificial intelligence and machine learning, is to make as much of the human experience as possible programmable.
And with advances in generative AI led by companies such as Roboflow Inc., we may be witnessing the maturation of computer vision and a broad expansion of what modern software can do.
“Roboflow exists to really make the world programmable,” said Joseph Nelson (pictured), co-founder and chief executive officer of Roboflow. “And our North Star is enabling developers predominantly to build that future. But the limiting reactant is how to enable computers and machines to understand things as well as people can. And, in many ways, computer vision is that missing element that enables anything you see to become software. If software is eating the world, computer vision makes the aperture infinitely wide.”
Nelson spoke with theCUBE industry analyst John Furrier at the AWS Startup Showcase: “Top Startups Building Generative AI on AWS” event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the current state of AI and how the playing field has advanced from just a few years ago. (* Disclosure below.)
LLMs and their impact on the AI landscape
Everyone’s talking about large language models and the tools built on them, such as ChatGPT and Bard, and taking advantage of their vast spectrum of functions. However, even these super-capable tools have a notable deficiency, according to Nelson.
“The rise of large language models is showing what’s possible, especially with text,” he explained. “Although there’s this core missing element of understanding. The rise of large language models creates this new area of generative AI. In the context of computer vision, it is a lot of creating video and image assets and content. There’s also this whole surface area to understanding what’s already created — basically digitizing physical, real-world things.”
In essence, computer vision links virtual, AI-driven experiences to the physical ones with which we interact in our everyday lives. And mirroring these experiences will be crucial in cases such as the budding metaverse, Nelson added.
“The metaverse can’t be built if we don’t know how to mirror, create or identify the objects that we wanna interact with in our everyday lives,” he said. “Where computer vision comes into play, especially with what we’ve seen at Roboflow, is a little over 100,000 developers now have built with our tools over 10,000 pre-trained models using more than 100 million labeled open-source images.”
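To illustrate how developers typically build on such tools, here is a minimal sketch using Roboflow’s Python SDK to run inference against a hosted, pre-trained model; the API key, workspace, project and file names are all placeholders:

```python
from roboflow import Roboflow

# All names below are placeholders; substitute your own
# API key, workspace slug, project ID and version number.
rf = Roboflow(api_key="YOUR_API_KEY")
model = rf.workspace("your-workspace").project("your-project").version(1).model

# Run hosted inference on a local image, print the raw predictions
# and save an annotated copy of the image.
result = model.predict("example.jpg", confidence=40, overlap=30)
print(result.json())
result.save("annotated.jpg")
```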
Human intuition and decision-making, as advanced as they are, remain fallible. Generative AI, as expressed in these LLMs, imbues computers with the logic, reasoning and critical thinking to understand visual and auditory input cues and compensate for human shortcomings, Nelson concluded.
Computer vision today vs. a few years ago
Computer vision describes a set of processes by which machines and computers gain the ability to act on visual data as effectively as humans do. These capabilities have typically seen immense use in tasks such as object identification, classification and manipulation.
“Then you have key point detection, which is where you see athletes on screen and each of their joints is outlined,” Nelson explained. “This is another more traditional type of problem in signal processing and computer vision.”
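For readers curious what key point detection looks like in practice, here is a minimal sketch using torchvision’s COCO-pretrained keypoint model, which predicts 17 human joints per detected person; the image path and score threshold are illustrative assumptions:

```python
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Load a COCO-pretrained keypoint detector (17 human joints per person).
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# "athlete.jpg" is a placeholder path; the model expects float images in [0, 1].
img = convert_image_dtype(read_image("athlete.jpg"), torch.float)

with torch.no_grad():
    outputs = model([img])  # the model takes a list of images

# Each detected person gets boxes, a confidence score and
# (x, y, visibility) coordinates for each of the 17 joints.
people = outputs[0]
for score, joints in zip(people["scores"], people["keypoints"]):
    if score > 0.9:  # keep only confident detections
        print(joints[:, :2])  # pixel coordinates of the joints
```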
The subfield is reimagining what’s possible within artificial intelligence, setting the course for new levels of precision and accuracy in carrying out tasks. Rivian Automotive Inc., an electric vehicle maker and Roboflow customer, is a case in point.
“One of our customers, Rivian, in tandem with AWS, is tackling visual quality assurance and manufacturing in production processes,” Nelson explained. “Now, only Rivian knows what a Rivian is supposed to look like. Only they know the imagery of what their goods that are gonna be produced are. And then between those long tails of proprietary data with highly specific things in the center of the curve, you have a whole kind of messy middle type of problem.”
Machine learning model requirements are only going to become more complex. And as that happens, companies will rely on techniques such as computer vision to efficiently and effectively feed those models the most important resource of all: data.
“My mental model for how computer vision advances is this: You have that bell curve, and you have increasingly powerful models that eat outward,” Nelson stated. “And multimodality has a role to play in that; larger models also have a role to play in that. The existence of more compute and data also has a role to play in that.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the AWS Startup Showcase: “Top Startups Building Generative AI on AWS” event:
(* Disclosure: Roboflow Inc. sponsored this segment of theCUBE. Neither Roboflow nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)
Photo: SiliconANGLE