Saturday, August 02, 2008

Next-Generation Podcasting Solutions

Bertrand Serlet, Senior Vice President of Software Engineering at Apple, has filed a patent application on a next-generation podcasting solution. It has some intriguing features, such as automated switching between video and display graphics, perhaps using a pointer device to detect when attention should be focused on the display graphics.

But I don't think Serlet's idea represents a significant innovation in podcasting technology - it is really about presentation recording in general, and intelligent switching in particular. And as a presentation recording technology, it is not nearly as impactful as Panopto's innovations, which are available today.

What makes Panopto so different from what Apple is proposing is multi-stream synchronisation. Apple's idea approaches the intelligent switching problem from the perspective of producing a single final output video stream, and that has its limits. Look at TED Talks: no matter how good the switching between video and display graphics, there is still a sense that the recorded presentation is playing catch-up with what the live audience could see.

On the other hand, Panopto's approach inherently scales to the synchronisation of many data inputs: chats, instant messaging, Twitter feeds, pointer coordinates, assessment data, audience response data, and of course audio, video and display graphics. In fact, one piece of feedback we hear from clients is delight that they can use Panopto to record two screens at once (e.g. PowerPoint slides and a Bloomberg terminal). The final output is not a merged, flattened video file but a set of separate standards-based data streams that can be rendered in a variety of ways using player skins. That is far more flexible in the face of device and bandwidth constraints.
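To make the multi-stream idea concrete, here is a minimal sketch (not Panopto's actual format, just an illustration of the principle): each capture source is kept as its own list of timestamped events, and seeking simply means aligning every stream to the same playhead time. The stream names and sample data are invented for the example.

```python
from bisect import bisect_right

def seek(stream, t):
    """Return the most recent (timestamp, payload) event at or before time t, or None."""
    times = [ts for ts, _ in stream]
    i = bisect_right(times, t)
    return stream[i - 1] if i else None

def seek_all(streams, t):
    """Align every named stream to the same playhead time t."""
    return {name: seek(events, t) for name, events in streams.items()}

# Invented sample capture: slides, chat and pointer coordinates, each on its own clock.
streams = {
    "slides":  [(0.0, "slide-1"), (65.0, "slide-2"), (130.0, "slide-3")],
    "chat":    [(12.0, "hello"), (80.0, "question about slide 2")],
    "pointer": [(64.5, (0.31, 0.72)), (66.0, (0.40, 0.70))],
}

state = seek_all(streams, 70.0)  # at t=70s: slide-2 is up, first chat message shown
```

Because nothing is flattened, a player skin can render any subset of these streams, or drop the heavy ones on a constrained device.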

Apple's useful innovation in this patent, if they can deliver it, is to produce a stream of pointer data that can be used for switching. My friend Peter Du and I noticed that presenters are very comfortable with laser pointers, but that this pointer data is lost in a recorded presentation. He suggested we design a tool that senses the location of the pointer beam relative to the screen and uses it to drive the cursor in real time. That would capture the gestures of speakers who prefer laser pointers.
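The core of such a tool is simple in principle: find the bright laser dot in each camera frame and map its pixel position to screen coordinates. A toy sketch (a frame is modelled here as a plain 2D brightness grid; the threshold value is an assumption, and a real implementation would need camera-to-screen calibration):

```python
def find_beam(frame, threshold=200):
    """Return (row, col) of the brightest pixel above threshold, else None."""
    best, pos = threshold, None
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if value > best:
                best, pos = value, (r, c)
    return pos

def to_screen(pos, frame_h, frame_w):
    """Map a pixel position to (x, y) in [0, 1] normalised screen coordinates."""
    r, c = pos
    return (c / (frame_w - 1), r / (frame_h - 1))

frame = [[0] * 8 for _ in range(6)]
frame[2][5] = 250                # simulated laser dot
pos = find_beam(frame)           # (2, 5)
x, y = to_screen(pos, 6, 8)      # feed this to the cursor driver
```

Capturing the (x, y) pairs as a timestamped stream would also make the pointer data recordable alongside the other streams.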

Some work has been done in this area by Johnny Lee. Lee is a really interesting guy with great educational technology projects built around the Wiimote (using its infrared camera to track pens and pointers). Check out his Wii projects page. One possible application is a classroom tracking camera that follows the presenter.

Coming back to the switching problem highlighted by Apple's patent, I think the real switching problem is between cameras - TED Talks use at least two cameras, usually three. If one camera is wide and another is tight, how do you control the PTZ and switch between them automatically? Audio sensing has too much lag, and motion detection is prone to extraneous inputs (e.g. a member of the audience walks in front of the presenter and the camera follows him or her).

The best approach I've seen is similar to what Lee proposes: an IR badge or reflective tape worn by the presenter, which the wide camera locates within the 'stage area'. The wide camera relays the location data to a second camera, which pans, tilts and zooms for a tight shot on that target. This is a smart and reliable way to follow the presenter. You can then switch cameras as suggested by Apple's patent application, e.g. if the presenter is using the keyboard or mouse, has his back to the camera, or is using the pointer, switch to the wide shot. Overlay that with a voice-sensing or push-to-talk audio subsystem so that a third or fourth camera can zoom in when someone asks a question, and you have a fully automated presentation recording system.
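The switching rules above amount to a small rule-based controller. A hypothetical sketch (the event names, priority order, and the stage-to-PTZ mapping are all assumptions, not anything from Apple's filing):

```python
def pick_camera(event):
    """Choose a camera feed from a dict of boolean event flags."""
    if event.get("audience_speaking"):       # push-to-talk or voice sensing
        return "audience"
    if (event.get("keyboard") or event.get("mouse")
            or event.get("back_to_camera") or event.get("pointer")):
        return "wide"                        # presenter interacting with the screen
    return "tight"                           # default: IR-badge-tracked close-up

def pan_tilt(badge_xy):
    """Toy mapping from a badge position in the wide shot to a PTZ preset.
    Assumes the stage spans +/-30 degrees of pan and +/-15 degrees of tilt."""
    x, y = badge_xy                          # normalised (0..1) stage coordinates
    return (x - 0.5) * 60.0, (y - 0.5) * 30.0
```

The priority order matters: an audience question overrides everything, and the tight shot is only the default when nothing else is happening.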

Finally, Apple's patent application leaves me wondering whether there is something inherent in podcasting technology that limits delivery to a single stream. I'm sure you can put an XML descriptor of a synchronised presentation into an RSS 2.0 enclosure, but I doubt today's players would know how to fetch it for local storage. Presumably an iPod, eBook reader or phone wouldn't know what to do with it either.


Unknown said...

My one problem with Panopto (possibly, as we don't yet use it) -- how do you keep multiple streams (say, 3, 4 or 5) all at high bit rates (let's say VGA and HD video) in synch? I would think that much data would overwhelm the capture PC, and streams would fall out of synch over stretches of time (say, greater than 20 minutes). So, that would lead me to believe that the capture has to be managed (i.e. someone turning on/shutting off data streams), which is what we don't have the staffing for.

Like I said, we don't use Panopto, but that is my concern on so many streams captured simultaneously.