Explore Computer Vision APIs


Learn how to bring Computer Vision intelligence to your app when you combine the power of Core Image, Vision, and Core ML. Go beyond machine learning alone and gain a deeper understanding of images and video. Discover new APIs in Core Image and Vision to bring Computer Vision to your application like new thresholding filters as well as Contour Detection and Optical Flow. And consider ways to use Core Image for preprocessing and visualization of these results. To learn more about the underlying frameworks see "Vision Framework: Building on Core ML" and "Core Image: Performance, Prototyping, and Python." And to further explore Computer Vision APIs, be sure to check out the "Detect Body and Hand Pose with Vision" and "Explore the Action & Vision app" sessions.

Related Videos

WWDC 2021
- Extract document data using Vision
WWDC 2020
WWDC 2019
- Text Recognition in Vision Framework
- Understanding Images in Vision Framework
WWDC 2018
- Object Tracking in Vision
- Vision with Core ML
WWDC 2017
- Advances in Core Image: Filters, Metal, Vision, and More
- Vision Framework: Building on Core ML

Hello and welcome to WWDC. My name is Frank Doepke, and together with my colleague David Hayward, we're going to explore Computer Vision APIs.
So why would we talk about Computer Vision? Computer Vision can really enhance your application. And even if it's not at the core of your business, it really brings something new to your application.
Let me give you an example. Banking applications allow you to deposit checks. They use Computer Vision for the camera to actually read the check for you, so you don't have to type in the information anymore. And clearly Computer Vision is not at the core of the banking industry. But by doing this you really can save a lot of steps for your user. They don't have to type anything in anymore.
Another example might be that you want to read a QR code or a receipt. All of that may not be at the core of what you wanna do in your application, but using the camera makes it much easier for your users.
So what APIs do we have available for Computer Vision? At the highest level, we have VisionKit. It's the home of the VNDocumentCamera that you might have seen in Notes, Messages, or Mail to scan a document. Then we use Core Image for image processing, Vision for the analysis of images, and last but not least, Core ML for machine learning inference.
Today we're gonna focus just on Core Image and Vision. But I wanna make sure that you don't just think of them as pillars that stand side by side. They can actually be nicely intertwined. I might do some image preprocessing, run the result through Vision, take the results from there, and feed them into Core ML, or back into Core Image to create some effects.
Now, to talk about how we can use Core Image to preprocess images for Computer Vision, I would like to hand it over to my colleague David Hayward.
Thank you, Frank. I'd like to take this opportunity to describe how you can improve your Computer Vision algorithms using Core Image.
If you are unfamiliar with Core Image, it is an optimized, easy-to-use image processing framework built upon Metal. For a deep dive on how it works, I recommend you watch our WWDC 2017 presentation on the subject.
There are two primary reasons why your app should use Core Image with Vision.
Using Core Image to preprocess an input to Vision can make your algorithms faster and more robust.
Using Core Image to post-process the outputs from Vision can give your app new ways to show those results to your users.
Also, Core Image is a great tool for augmentation in machine learning training. There are some great examples of this in our presentation from WWDC 2018.
One of the best ways to prepare an image for analysis is to downscale it for best performance. The scaler with the best overall quality is CILanczosScale.
It is very easy to use this filter in your code. All you need to do is import the CIFilterBuiltins header, create a filter instance, set the input properties, and then get the outputImage. It's that easy.
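The steps just listed can be sketched roughly as follows. This assumes the Swift filter accessor `CIFilter.lanczosScaleTransform()` (the built-in filter's full name is CILanczosScaleTransform); `downscaled(_:by:)` is a hypothetical helper name, not part of Core Image.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical helper: downscales `image` by `scale`
/// (for example, 0.5 halves each dimension).
func downscaled(_ image: CIImage, by scale: Float) -> CIImage? {
    let filter = CIFilter.lanczosScaleTransform() // create a filter instance
    filter.inputImage = image                     // set the input properties
    filter.scale = scale
    filter.aspectRatio = 1.0                      // keep the original proportions
    return filter.outputImage                     // get the output image
}
```

The result is only a recipe until it is rendered or handed to Vision, so chaining this with other filters does not create intermediate copies.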
But that is just one of several resampling filters in Core Image. Depending on your algorithm, it may be better to use the linear interpolated CIAffineTransform.
Morphology operations are a great technique to make small features in your image more prominent.
Performing Dilate using CIMorphologyRectangleMaximum will make brighter areas of the image larger.
Performing Erode using CIMorphologyRectangleMinimum will make those areas smaller.
Better still is to perform Close using CIMorphologyRectangleMinimum followed by CIMorphologyRectangleMaximum. This is very useful for removing small areas of noise from your image that may affect the algorithm.
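A minimal sketch of chaining the two morphology filters in the order described (minimum first, then maximum). The helper name and the default size of 5 are assumptions for illustration. Note that in general morphology terminology, erode followed by dilate is usually called "opening", while "closing" is dilate followed by erode; swap the two stages if that is the behavior you need.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical helper: erodes and then dilates with a square window,
/// which removes bright specks smaller than the window.
func removeSmallBrightSpecks(from image: CIImage, size: Float = 5) -> CIImage? {
    let erode = CIFilter.morphologyRectangleMinimum()
    erode.inputImage = image
    erode.width = size
    erode.height = size

    let dilate = CIFilter.morphologyRectangleMaximum()
    dilate.inputImage = erode.outputImage
    dilate.width = size
    dilate.height = size
    return dilate.outputImage
}
```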
Some algorithms only need monochrome inputs, and for these, Vision will automatically convert RGB to grayscale. If you have domain knowledge about your input images, you might get better results using Core Image to convert to gray.
With CIColorMatrix you can specify any weighting you want for this conversion.
Or with CIMaximumComponent, the channel with the greatest signal will be used.
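Both grayscale conversions might look like the sketch below. The helper names are hypothetical, and the weights you pass are whatever your domain knowledge suggests; keep in mind that Core Image applies them in its linear working space.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical helper: gray = r*R + g*G + b*B, written to all three channels.
func weightedGray(_ image: CIImage, r: CGFloat, g: CGFloat, b: CGFloat) -> CIImage? {
    let weights = CIVector(x: r, y: g, z: b, w: 0)
    let filter = CIFilter.colorMatrix()
    filter.inputImage = image
    filter.rVector = weights   // each output channel is dot(input, vector)
    filter.gVector = weights
    filter.bVector = weights
    filter.aVector = CIVector(x: 0, y: 0, z: 0, w: 1)   // keep alpha unchanged
    filter.biasVector = CIVector(x: 0, y: 0, z: 0, w: 0)
    return filter.outputImage
}

/// Hypothetical helper: gray = max(R, G, B).
func maxComponentGray(_ image: CIImage) -> CIImage? {
    let filter = CIFilter.maximumComponent()
    filter.inputImage = image
    return filter.outputImage
}
```

For example, `weightedGray(image, r: 0, g: 1, b: 0)` would use only the green channel.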
Noise reduction before image analysis is also worth considering.
A couple passes of CIMedianFilter can reduce noise without softening the edges.
CIGaussianBlur and CIBoxBlur are also fast ways to reduce noise.
And consider using the CINoiseReduction filter too.
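As a rough sketch, two of these options could be wrapped like this. The helper names, the default of two passes, and the blur radius are illustrative assumptions.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical helper: applies CIMedianFilter `passes` times.
func medianDenoised(_ image: CIImage, passes: Int = 2) -> CIImage {
    var result = image
    for _ in 0..<max(passes, 0) {
        let filter = CIFilter.median()
        filter.inputImage = result
        result = filter.outputImage ?? result
    }
    return result
}

/// Hypothetical helper: Gaussian blur that keeps the original extent.
/// Clamping first avoids dark, semi-transparent borders; cropping afterwards
/// restores the input size, since a blur grows the image extent.
func blurDenoised(_ image: CIImage, radius: Float = 2) -> CIImage? {
    let filter = CIFilter.gaussianBlur()
    filter.inputImage = image.clampedToExtent()
    filter.radius = radius
    return filter.outputImage?.cropped(to: image.extent)
}
```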
Core Image also has a variety of edge detection filters.
For a Sobel edge detector, you can use CIConvolution3X3.
Even better is to use CIGaborGradients, which will produce a 2D gradient vector that is also more tolerant of noise.
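A sketch of both edge detectors follows. The Sobel kernel shown is the standard horizontal-gradient kernel, passed as nine weights in row order; the helper names and the bias of 0 are assumptions.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical helper: horizontal Sobel gradient via CIConvolution3X3.
func sobelHorizontalGradient(_ image: CIImage) -> CIImage? {
    let kernel: [CGFloat] = [-1, 0, 1,
                             -2, 0, 2,
                             -1, 0, 1]
    let filter = CIFilter.convolution3X3()
    filter.inputImage = image
    filter.weights = CIVector(values: kernel, count: kernel.count)
    filter.bias = 0 // gradients can be negative; add a bias if you need to display them
    return filter.outputImage
}

/// Hypothetical helper: 2D gradient vectors via CIGaborGradients.
func gaborGradients(_ image: CIImage) -> CIImage? {
    let filter = CIFilter.gaborGradients()
    filter.inputImage = image
    return filter.outputImage
}
```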
Enhancing the contrast of an image can aid in object detection.
CIColorPolynomial allows you to specify an arbitrary 3rd degree contrast function. CIColorControls provides a linear contrast parameter.
Core Image also has some new filters this year that can convert your image to just black and white.
For example, CIColorThreshold allows you to set the threshold value in your application code, while CIColorThresholdOtsu will automatically determine the best threshold value based on the image's histogram.
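The two thresholding filters could be combined in one helper, sketched below. `binarized(_:threshold:)` is a hypothetical name; the availability annotation reflects the fact that these filters were introduced in the 2020 OS releases.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical helper: uses a fixed threshold when one is given,
/// otherwise lets CIColorThresholdOtsu choose it from the histogram.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, *)
func binarized(_ image: CIImage, threshold: Float? = nil) -> CIImage? {
    if let threshold = threshold {
        let filter = CIFilter.colorThreshold()
        filter.inputImage = image
        filter.threshold = threshold
        return filter.outputImage
    } else {
        let filter = CIFilter.colorThresholdOtsu()
        filter.inputImage = image
        return filter.outputImage
    }
}
```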
Core Image also has filters for comparing two images. This can be useful to prepare for detecting motion between frames of video.
For example, CIColorAbsoluteDifference is a new filter this year that can help with this.
Also, the CILabDeltaE will compare two images using a formula designed to match human perception of color.
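For two frames of the same size, the comparison filters might be used as in this sketch. The helper names are hypothetical.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

/// Hypothetical helper: per-channel absolute difference of two frames.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, *)
func absoluteDifference(_ a: CIImage, _ b: CIImage) -> CIImage? {
    let filter = CIFilter.colorAbsoluteDifference()
    filter.inputImage = a
    filter.inputImage2 = b
    return filter.outputImage
}

/// Hypothetical helper: perceptual color difference (Delta E) per pixel.
func perceptualDifference(_ a: CIImage, _ b: CIImage) -> CIImage? {
    let filter = CIFilter.labDeltaE()
    filter.inputImage = a
    filter.image2 = b
    return filter.outputImage
}
```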
These are just a sampling of the more than 200 filters built into Core Image.
To help you use these built-in filters, this documentation includes parameter descriptions, sample images, and even sample code.
And if none of these filters suits your needs, you can easily write your own using Metal Core Image. We recommend that you see our session on that topic, which we also made available this year.
With image processing and Computer Vision, it is important to be aware that images can come in a wide variety of color spaces.
Your app may receive images in spaces ranging from the traditional sRGB, to wide gamut P3, even to HDR color spaces, which are now supported.
Your app should be prepared for this variety of color spaces, and the good news is that Core Image makes this very easy. Core Image automatically converts inputs to its working space, which is Unclamped, Linear, BT.709 primaries.
Your algorithm might want images in a different color space though. In that case, you should do the following. You will want to get a variable for the color space that you want to use from CGColorSpace. And you will call image.matchedFromWorkingSpace.
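A minimal sketch of this round trip, including the conversion back to the working space that completes it. sRGB is used here only as an example target space, and `processInSRGB(_:using:)` and its `algorithm` closure are hypothetical names.

```swift
import CoreImage
import CoreGraphics

/// Hypothetical helper: runs `algorithm` on an image expressed in sRGB,
/// then converts the result back to Core Image's working space.
func processInSRGB(_ image: CIImage,
                   using algorithm: (CIImage) -> CIImage) -> CIImage? {
    guard let space = CGColorSpace(name: CGColorSpace.sRGB),
          let inSpace = image.matchedFromWorkingSpace(to: space) else {
        return nil
    }
    let processed = algorithm(inSpace)                  // work in the chosen space
    return processed.matchedToWorkingSpace(from: space) // and convert back
}
```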
Apply your algorithm in that space, and then call image.matchedToWorkingSpace. That's all you need to do.
My last topic today will be using Core Image to post-process the outputs from Vision. One example of this is using Core Image to regenerate a barcode image from a Vision barcode observation.
All you need to do in your code is create the filter instance... set its barcodeDescriptor property to be that of the Vision observation, and lastly, get the outputImage. And the result looks just like this.
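In Swift, those three steps might look like the following sketch. The helper name is hypothetical; the observation's `barcodeDescriptor` is optional, so the function returns nil when Vision did not provide one.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins
import Vision

/// Hypothetical helper: rebuilds a clean barcode image from a Vision observation.
func regenerateBarcodeImage(from observation: VNBarcodeObservation) -> CIImage? {
    guard let descriptor = observation.barcodeDescriptor else { return nil }
    let filter = CIFilter.barcodeGenerator()  // create the filter instance
    filter.barcodeDescriptor = descriptor     // use the observation's descriptor
    return filter.outputImage                 // get the output image
}
```

The generated image is small (roughly one pixel per barcode module), so you would typically scale it up with nearest-neighbor sampling before display.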
Similarly, your app can apply filters based on Vision face observations.
As an example, you can use a vignette effect very easily using this.
The code is actually very simple. One thing you need to be aware of is that you will need to convert from Vision's normalized coordinate system to Core Image's Cartesian coordinate system.
And once you create the vignette filter, you can then put that vignette over the image using compositing over.
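One way this could be put together is sketched below. The helper name, the radius heuristic, and the intensity value are assumptions, not the presenter's exact code. Vision's normalized coordinates and Core Image's coordinates both use a lower-left origin, so the conversion is a scale by the image size, which `VNImageRectForNormalizedRect` performs.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins
import Vision

/// Hypothetical helper: centers a vignette on a detected face.
func vignette(on image: CIImage, around face: VNFaceObservation) -> CIImage? {
    let extent = image.extent
    // Normalized (0...1) bounding box -> pixel coordinates.
    let faceRect = VNImageRectForNormalizedRect(face.boundingBox,
                                                Int(extent.width),
                                                Int(extent.height))
    let filter = CIFilter.vignetteEffect()
    filter.inputImage = image
    filter.center = CGPoint(x: extent.minX + faceRect.midX,
                            y: extent.minY + faceRect.midY)
    filter.radius = Float(max(faceRect.width, faceRect.height)) // illustrative choice
    filter.intensity = 1.0
    // Put the vignetted result over the original image.
    return filter.outputImage?.composited(over: image)
}
```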
You can also use Core Image to visualize vector fields, which Frank will be demonstrating later on.
That concludes my part of this presentation. Here's Frank to talk more about Vision.
All right. Thank you, David. So, now I'm gonna talk about how we can understand images by using Vision.
We have a task, the machinery, and the results. The task is what you wanna do. The machinery is what actually performs the work. And the results are, of course, what you're looking for, what you want to get back. The task is what we call a VNRequest, like a VNDetectFaceRectanglesRequest. The machinery is one of two: an ImageRequestHandler or a SequenceRequestHandler. And the results that we get back are what we call VNObservations. These depend on which task you performed, like a VNRectangleObservation for detected rectangles.
We first perform the request on the ImageRequestHandler. And from there, we get our observations. Let's look at a concrete example.
We want to read text, so we use the VNRecognizeTextRequest.
Then I create an ImageRequestHandler with my image.
And out of that, I now get my observations, which are just plain text.
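The task, machinery, and results pattern for this text example might be sketched as follows. `recognizeText(in:)` is a hypothetical helper, and taking only the top candidate of each observation is one possible choice.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: returns the best text candidate for each detected line.
@available(iOS 13.0, macOS 10.15, tvOS 13.0, *)
func recognizeText(in image: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()                              // the task
    let handler = VNImageRequestHandler(cgImage: image, options: [:])   // the machinery
    try handler.perform([request])
    let observations = request.results as? [VNRecognizedTextObservation] ?? [] // the results
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```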
So, what's new in Vision in 2020? First, we have Hand and Body Pose. To see more about that, please look at the "Detect Body and Hand Pose with Vision" session.
Then you might have seen our Trajectory Detection. You can see more about that in the "Explore the Action & Vision app" session. Today, we're just going to focus on Contour Detection and Optical Flow.
What is Contour Detection? With Contour Detection, I can find edges in my image.
As we see here, the red lines show the contours that we found in this graphic.
So we start with an image, and then we create our VNDetectContoursRequest.
We can now adjust the contrast of the image to make the contours stand out more. We can also choose whether to detect dark objects on a light background or light objects on a dark background, which helps separate the foreground from the background. Last but not least, we can set the maximumImageDimension. That allows you to trade off performance versus accuracy.
For instance, at a lower resolution you still get your contours and detection runs much faster, but the contours might not follow the edges as closely. In comparison, at a higher resolution, which you might want for post-processing, you get much more accurate contours, but it takes a little longer because there is more work to do.
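A sketch of configuring and running the request with these three settings. The helper name, the `fast` flag, and the specific values (2.0, 256, 1024) are illustrative assumptions. The dark-on-light switch is written here as `detectsDarkOnLight`; the original 2020 SDK spelled this property `detectDarkOnLight`.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: runs contour detection, trading accuracy for speed
/// when `fast` is true.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, *)
func detectContours(in image: CGImage, fast: Bool) throws -> VNContoursObservation? {
    let request = VNDetectContoursRequest()
    request.contrastAdjustment = 2.0                   // boost contrast before detection
    request.detectsDarkOnLight = true                  // dark objects on a light background
    request.maximumImageDimension = fast ? 256 : 1024  // lower is faster, higher is more precise
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    return request.results?.first as? VNContoursObservation
}
```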
Let's look at the observation that we get back.
Here we have a very simple image of two squares with a circle in it.
We are getting back a VNContoursObservation.
The topLevelContours are our two rectangles that we see.
Inside of those we have childContours. They are nested and those are the circles.
Then we get back the contourCount, which I can use to walk through all of my contours. But it's much easier, for instance, to use the index path. As you can see, the contours are nested in each other, and I can now traverse my graph.
Last but not least, I also get the normalizedPath. And this is a CGPath I can use easily for rendering.
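Walking the nested contours and scaling the normalized path for rendering could look like this sketch. The function names and the text format of each line are hypothetical.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: one text line per contour, indented by nesting depth.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, *)
func describe(_ contour: VNContour, depth: Int = 0) -> [String] {
    let indent = String(repeating: "  ", count: depth)
    var lines = ["\(indent)contour \(contour.indexPath): \(contour.pointCount) points"]
    for child in contour.childContours {
        lines += describe(child, depth: depth + 1)
    }
    return lines
}

/// Hypothetical helper: describes every contour, starting at the top level.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, *)
func outline(_ observation: VNContoursObservation) -> [String] {
    observation.topLevelContours.flatMap { describe($0) }
}

/// Hypothetical helper: scales the normalized (0...1) path to a drawing size.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, *)
func path(for observation: VNContoursObservation, in size: CGSize) -> CGPath? {
    var transform = CGAffineTransform(scaleX: size.width, y: size.height)
    return observation.normalizedPath.copy(using: &transform)
}
```

A single contour can also be fetched directly with `observation.contour(at:)`, passing either an index path or a flat index below `contourCount`.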
Now, what is a VNContour? In our example we get a VNContour here, and that is the outermost contour, our parent. Nested inside of it are childContours. These are the inner contours.
My contour has an index path and, of course, with that every childContour has the index path, which I can use again to traverse my graph.
Then I get the normalizedPoints and the pointCount. Now, that is actually the real meat of the contour, because it describes each of the line segments that we discover. We didn't just discover pixels; we really get a contour, which is a path.
We also have an aspectRatio. I'm gonna talk about that on the next slide.
And then we have the normalizedPath to render.
When we want to work with contours, there are a few things we need to keep in mind.