Face detection with Core Image on Live Video

In this article I will explain how to perform face detection on a live video feed on an iOS 5 device. We will be using Core Image to do the heavy lifting. The code is loosely based on Apple's SquareCam sample code.

To get started, we need to show the live video from the front-facing camera. We use AVFoundation for this, starting by setting up an AVCaptureSession with 640×480 as the capture resolution. Keep in mind that face detection is relatively compute-intensive: the fewer pixels we need to process, the faster each frame can be handled, and since this is an interactive application, real-time performance matters. We then tell the AVCaptureSession which camera to use as the input device.

To show the preview, we create an AVCaptureVideoPreviewLayer and add it to the previewView, which was created in the XIB. Don’t forget to call [session startRunning]. That was the easy part.

NSError *error = nil;
AVCaptureSession *session = [[AVCaptureSession alloc] init];
if ([[UIDevice currentDevice] userInterfaceIdiom] == UIUserInterfaceIdiomPhone){
    [session setSessionPreset:AVCaptureSessionPreset640x480];
} else {
    [session setSessionPreset:AVCaptureSessionPresetPhoto];
}
// Select a video device, make an input
AVCaptureDevice *device = nil;
AVCaptureDevicePosition desiredPosition = AVCaptureDevicePositionFront;
// find the front facing camera
for (AVCaptureDevice *d in [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo]) {
    if ([d position] == desiredPosition) {
        device = d;
        self.isUsingFrontFacingCamera = YES;
        break;
    }
}
// fall back to the default camera.
if (device == nil) {
    self.isUsingFrontFacingCamera = NO;
    device = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
}
// get the input device
AVCaptureDeviceInput *deviceInput = [AVCaptureDeviceInput deviceInputWithDevice:device error:&error];
if( !error ) {

    // add the input to the session
    if ( [session canAddInput:deviceInput] ){
        [session addInput:deviceInput];
    }

    self.previewLayer = [[AVCaptureVideoPreviewLayer alloc] initWithSession:session];
    self.previewLayer.backgroundColor = [[UIColor blackColor] CGColor];
    self.previewLayer.videoGravity = AVLayerVideoGravityResizeAspect;

    CALayer *rootLayer = [self.previewView layer];
    [rootLayer setMasksToBounds:YES];
    [self.previewLayer setFrame:[rootLayer bounds]];
    [rootLayer addSublayer:self.previewLayer];
    [session startRunning];

}
session = nil; // the preview layer retains the session, so the local reference is no longer needed
if (error) {
    UIAlertView *alertView = [[UIAlertView alloc] initWithTitle:[NSString stringWithFormat:@"Failed with error %d", (int)[error code]]
                                                        message:[error localizedDescription]
                                                       delegate:nil
                                              cancelButtonTitle:@"Dismiss"
                                              otherButtonTitles:nil];
    [alertView show];
    [self teardownAVCapture];
}

Now for the face detection.

We create the face detector itself in viewDidLoad, and keep a reference to it with a property. We use low accuracy, again for performance reasons.

NSDictionary *detectorOptions = [[NSDictionary alloc] initWithObjectsAndKeys:CIDetectorAccuracyLow, CIDetectorAccuracy, nil];
self.faceDetector = [CIDetector detectorOfType:CIDetectorTypeFace context:nil options:detectorOptions];
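
For reference, these are the properties the listings in this article rely on. Their exact declarations are not shown in the article, so treat the following class extension as a sketch inferred from how the properties are used:

@interface ViewController () <AVCaptureVideoDataOutputSampleBufferDelegate>
@property (nonatomic, strong) AVCaptureVideoPreviewLayer *previewLayer;
@property (nonatomic, strong) AVCaptureVideoDataOutput *videoDataOutput;
@property (nonatomic, assign) dispatch_queue_t videoDataOutputQueue; // dispatch objects are not ARC-managed on iOS 5
@property (nonatomic, strong) CIDetector *faceDetector;
@property (nonatomic, strong) UIImage *borderImage;   // the image drawn over each detected face
@property (nonatomic, assign) BOOL isUsingFrontFacingCamera;
@end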


We access the data captured by the camera by creating an AVCaptureVideoDataOutput, using BGRA as the pixel format. Frames we cannot process in time are dropped. To do the actual processing, we create a separate serial dispatch queue; the sample buffer delegate method is called on that queue for each frame. Note that this block belongs in the same setup method as the code above, before the local session reference is dropped.

// Make a video data output
self.videoDataOutput = [[AVCaptureVideoDataOutput alloc] init];
// we want BGRA, both CoreGraphics and OpenGL work well with 'BGRA'
NSDictionary *rgbOutputSettings = [NSDictionary dictionaryWithObject:
                                   [NSNumber numberWithInt:kCMPixelFormat_32BGRA] forKey:(id)kCVPixelBufferPixelFormatTypeKey];
[self.videoDataOutput setVideoSettings:rgbOutputSettings];
[self.videoDataOutput setAlwaysDiscardsLateVideoFrames:YES]; // discard if the data output queue is blocked
// create a serial dispatch queue used for the sample buffer delegate
// a serial dispatch queue must be used to guarantee that video frames will be delivered in order
// see the header doc for setSampleBufferDelegate:queue: for more information
self.videoDataOutputQueue = dispatch_queue_create("VideoDataOutputQueue", DISPATCH_QUEUE_SERIAL);
[self.videoDataOutput setSampleBufferDelegate:self queue:self.videoDataOutputQueue];
if ( [session canAddOutput:self.videoDataOutput] ){
    [session addOutput:self.videoDataOutput];
}
// get the output for doing face detection.
[[self.videoDataOutput connectionWithMediaType:AVMediaTypeVideo] setEnabled:YES];

The actual processing happens in the delegate method, which is called on the background queue. First a CIImage is created from the pixel buffer, carrying over all the attachments that come with the captured frame. We also pass the EXIF orientation of the image to the detector, because it needs to know which side is up. The actual face detection is done by calling [self.faceDetector featuresInImage:ciImage options:imageOptions].

- (void)captureOutput:(AVCaptureOutput *)captureOutput
    didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
	// get the image
	CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
	CFDictionaryRef attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate);
	CIImage *ciImage = [[CIImage alloc] initWithCVPixelBuffer:pixelBuffer
                                                      options:(__bridge NSDictionary *)attachments];
	if (attachments) {
		CFRelease(attachments);
    }

    // make sure your device orientation is not locked.
	UIDeviceOrientation curDeviceOrientation = [[UIDevice currentDevice] orientation];

	NSDictionary *imageOptions = [NSDictionary dictionaryWithObject:[self exifOrientation:curDeviceOrientation]
	                                                          forKey:CIDetectorImageOrientation];

	NSArray *features = [self.faceDetector featuresInImage:ciImage
                                                   options:imageOptions];

    // get the clean aperture
    // the clean aperture is a rectangle that defines the portion of the encoded pixel dimensions
    // that represents image data valid for display.
	CMFormatDescriptionRef fdesc = CMSampleBufferGetFormatDescription(sampleBuffer);
	CGRect cleanAperture = CMVideoFormatDescriptionGetCleanAperture(fdesc, false /*originIsTopLeft == false*/);

	dispatch_async(dispatch_get_main_queue(), ^(void) {
		[self drawFaces:features
            forVideoBox:cleanAperture
            orientation:curDeviceOrientation];
	});
}
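
The exifOrientation: helper used above converts the UIDeviceOrientation into one of the EXIF orientation values (1 to 8) that CIDetectorImageOrientation expects. The following sketch mirrors the mapping in Apple's SquareCam sample; treat it as illustrative rather than definitive:

// Maps the device orientation to an EXIF orientation value (1-8) for CIDetectorImageOrientation.
// Mapping follows Apple's SquareCam sample; verify against your own camera setup.
- (NSNumber *)exifOrientation:(UIDeviceOrientation)orientation
{
    enum {
        EXIF_0ROW_TOP_0COL_LEFT     = 1, // default
        EXIF_0ROW_BOTTOM_0COL_RIGHT = 3,
        EXIF_0ROW_RIGHT_0COL_TOP    = 6,
        EXIF_0ROW_LEFT_0COL_BOTTOM  = 8
    };
    int exifOrientation;
    switch (orientation) {
        case UIDeviceOrientationPortraitUpsideDown: // home button on top
            exifOrientation = EXIF_0ROW_LEFT_0COL_BOTTOM;
            break;
        case UIDeviceOrientationLandscapeLeft:      // home button on the right
            exifOrientation = self.isUsingFrontFacingCamera ? EXIF_0ROW_BOTTOM_0COL_RIGHT : EXIF_0ROW_TOP_0COL_LEFT;
            break;
        case UIDeviceOrientationLandscapeRight:     // home button on the left
            exifOrientation = self.isUsingFrontFacingCamera ? EXIF_0ROW_TOP_0COL_LEFT : EXIF_0ROW_BOTTOM_0COL_RIGHT;
            break;
        case UIDeviceOrientationPortrait:           // home button on the bottom
        default:
            exifOrientation = EXIF_0ROW_RIGHT_0COL_TOP;
            break;
    }
    return [NSNumber numberWithInt:exifOrientation];
}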

The last step is to actually draw something on the screen where a face has been detected. The method drawFaces:forVideoBox:orientation: is called on the main thread to do this.

In this method, we draw an image onto a CALayer inside the previewLayer. For each detected face, we create or reuse a layer and give it the correct size based on the bounds of the detected face. Because the video has been scaled to fit the preview, the face coordinates have to be scaled by the same factor. Then we position the image on the layer, and finally the layer is rotated into the right orientation, based on the device orientation.

// called asynchronously as the capture output is capturing sample buffers, this method asks the face detector
// to detect features and for each draw the green border in a layer and set appropriate orientation
- (void)drawFaces:(NSArray *)features
      forVideoBox:(CGRect)clearAperture
      orientation:(UIDeviceOrientation)orientation
{
	NSArray *sublayers = [NSArray arrayWithArray:[self.previewLayer sublayers]];
	NSInteger sublayersCount = [sublayers count], currentSublayer = 0;
	NSInteger featuresCount = [features count], currentFeature = 0;

	[CATransaction begin];
	[CATransaction setValue:(id)kCFBooleanTrue forKey:kCATransactionDisableActions];

	// hide all the face layers
	for ( CALayer *layer in sublayers ) {
		if ( [[layer name] isEqualToString:@"FaceLayer"] )
			[layer setHidden:YES];
	}	

	if ( featuresCount == 0 ) {
		[CATransaction commit];
		return; // early bail.
	}

	CGSize parentFrameSize = [self.previewView frame].size;
	NSString *gravity = [self.previewLayer videoGravity];
	BOOL isMirrored = [self.previewLayer isMirrored];
	CGRect previewBox = [ViewController videoPreviewBoxForGravity:gravity
                                                        frameSize:parentFrameSize
                                                     apertureSize:clearAperture.size];

	for ( CIFaceFeature *ff in features ) {
		// find the correct position for the square layer within the previewLayer
		// the feature box originates in the bottom left of the video frame.
		// (Bottom right if mirroring is turned on)
		CGRect faceRect = [ff bounds];

		// flip preview width and height
		CGFloat temp = faceRect.size.width;
		faceRect.size.width = faceRect.size.height;
		faceRect.size.height = temp;
		temp = faceRect.origin.x;
		faceRect.origin.x = faceRect.origin.y;
		faceRect.origin.y = temp;
		// scale coordinates so they fit in the preview box, which may be scaled
		CGFloat widthScaleBy = previewBox.size.width / clearAperture.size.height;
		CGFloat heightScaleBy = previewBox.size.height / clearAperture.size.width;
		faceRect.size.width *= widthScaleBy;
		faceRect.size.height *= heightScaleBy;
		faceRect.origin.x *= widthScaleBy;
		faceRect.origin.y *= heightScaleBy;

		if ( isMirrored )
			faceRect = CGRectOffset(faceRect, previewBox.origin.x + previewBox.size.width - faceRect.size.width - (faceRect.origin.x * 2), previewBox.origin.y);
		else
			faceRect = CGRectOffset(faceRect, previewBox.origin.x, previewBox.origin.y);

		CALayer *featureLayer = nil;

		// re-use an existing layer if possible
		while ( !featureLayer && (currentSublayer < sublayersCount) ) {
			CALayer *currentLayer = [sublayers objectAtIndex:currentSublayer++];
			if ( [[currentLayer name] isEqualToString:@"FaceLayer"] ) {
				featureLayer = currentLayer;
				[currentLayer setHidden:NO];
			}
		}

		// create a new one if necessary
		if ( !featureLayer ) {
			featureLayer = [[CALayer alloc] init];
			featureLayer.contents = (id)self.borderImage.CGImage;
			[featureLayer setName:@"FaceLayer"];
			[self.previewLayer addSublayer:featureLayer];
			// note: do not nil out featureLayer here, or the setFrame: below would be a no-op for new layers
		}
		[featureLayer setFrame:faceRect];

		switch (orientation) {
			case UIDeviceOrientationPortrait:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(0.))];
				break;
			case UIDeviceOrientationPortraitUpsideDown:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(180.))];
				break;
			case UIDeviceOrientationLandscapeLeft:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(90.))];
				break;
			case UIDeviceOrientationLandscapeRight:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(-90.))];
				break;
			case UIDeviceOrientationFaceUp:
			case UIDeviceOrientationFaceDown:
			default:
				break; // leave the layer in its last known orientation
		}
		currentFeature++;
	}

	[CATransaction commit];
}
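
Two more helpers appear in drawFaces:forVideoBox:orientation: but are not shown above: the DegreesToRadians macro and the class method videoPreviewBoxForGravity:frameSize:apertureSize:, which computes the rectangle that the preview layer actually fills with video for a given gravity setting. The sketch below follows the logic of the SquareCam sample and is meant as an illustration:

// Conversion macro used in the rotation switch above.
#define DegreesToRadians(degrees) ((degrees) * M_PI / 180)

// Computes the on-screen rectangle the preview layer uses for video, given its gravity,
// the size of the preview view and the clean aperture of the video.
+ (CGRect)videoPreviewBoxForGravity:(NSString *)gravity
                          frameSize:(CGSize)frameSize
                       apertureSize:(CGSize)apertureSize
{
    // the aperture is in landscape coordinates, the preview view is in portrait, hence the swap
    CGFloat apertureRatio = apertureSize.height / apertureSize.width;
    CGFloat viewRatio = frameSize.width / frameSize.height;

    CGSize size = frameSize;
    if ([gravity isEqualToString:AVLayerVideoGravityResizeAspectFill]) {
        if (viewRatio > apertureRatio) {
            size.width = frameSize.width;
            size.height = apertureSize.width * (frameSize.width / apertureSize.height);
        } else {
            size.width = apertureSize.height * (frameSize.height / apertureSize.width);
            size.height = frameSize.height;
        }
    } else if ([gravity isEqualToString:AVLayerVideoGravityResizeAspect]) {
        if (viewRatio > apertureRatio) {
            size.width = apertureSize.height * (frameSize.height / apertureSize.width);
            size.height = frameSize.height;
        } else {
            size.width = frameSize.width;
            size.height = apertureSize.width * (frameSize.width / apertureSize.height);
        }
    } // AVLayerVideoGravityResize: use the full frame size

    // center the video box in the preview view
    CGRect videoBox;
    videoBox.size = size;
    videoBox.origin.x = (frameSize.width - size.width) / 2;
    videoBox.origin.y = (frameSize.height - size.height) / 2;
    return videoBox;
}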

There you go. That is the basic principle behind face detection in iOS 5. For the nitty-gritty details, have a look at the code on GitHub or download the zip.

There is much more to be explored. Core Image also provides access to the detected positions of the eyes and mouth, which would make it even easier to place the mustache correctly. We could also rotate the image based on the angle of the face on the screen.
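
As a hint of what that could look like, here is a small sketch that inspects the eye and mouth positions of a CIFaceFeature. The properties are part of the public CIFaceFeature API; how they are used here is just an illustration:

// Illustration: inspecting eye and mouth positions on the detected faces.
for (CIFaceFeature *ff in features) {
    if (ff.hasLeftEyePosition && ff.hasRightEyePosition) {
        // the line between the eyes could be used to estimate the face angle
        CGPoint left = ff.leftEyePosition;
        CGPoint right = ff.rightEyePosition;
        CGFloat angle = atan2(right.y - left.y, right.x - left.x);
        NSLog(@"eye line angle: %f radians", angle);
    }
    if (ff.hasMouthPosition) {
        // a good anchor point for placing a mustache just above the mouth
        NSLog(@"mouth at %@", NSStringFromCGPoint(ff.mouthPosition));
    }
}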

Adios!

Any feedback is appreciated in the comments.

7 responses to “Face detection with Core Image on Live Video”

  1. Hugo

    Hi Jeroen,

    What kind of license is your code under? I’d like to use a few pieces of it in some code of mine. But instead of drawing something on the face, I want to cut out the face and center it in a box. Since your AV setup is pretty good I’d love to use that.

    1. Jeroen Trappers

      MIT Licence. By the way, most of the code you see here is based on the Apple sample, so take that licence into account.
      http://jtr.mit-license.org/

  2. Manoel Costa

    I’m trying to save the current view as an image with this code:

    if ([[UIScreen mainScreen] respondsToSelector:@selector(scale)]) {
        if ([[UIScreen mainScreen] scale] == 2.0) {
            UIGraphicsBeginImageContextWithOptions(self.previewView.bounds.size, YES, 2.0);
        } else {
            UIGraphicsBeginImageContext(self.previewView.bounds.size);
        }
    } else {
        UIGraphicsBeginImageContext(self.previewView.bounds.size);
    }

    [self.previewView.layer renderInContext:UIGraphicsGetCurrentContext()];
    UIImage *viewImage = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();

    UIImageWriteToSavedPhotosAlbum(viewImage, self, @selector(image:didFinishSavingWithError:contextInfo:), nil);

    I just want to save an image of the whole thing that is displaying on the screen, the “live camera” plus the overlay image.

    But with this code, I’m getting a completely black image as output.

    Can you help me with this? Thank you so much 🙂

    1. Jeroen Trappers

      You should have a look at the original Apple sample “SquareCam”. It shows how to save the image to the camera roll.

  3. Manoel Costa

    Commenting again just to turn on the “Notify me of follow-up comments by email” option 🙂 Thanks

  4. sverre

    Cool thanks, I can use this… Have been playing around with putting “stuff” in face just like you, recently. I am planning to use SVGKit ( https://github.com/SVGKit/SVGKit ) to add a vector mask mask (no pun intended), instead of using raster graphics that could look bad when scaled. Just an idea.

  5. Rajesh

    Does it detect the iris and eyes?
