learning communications blog: August 2008

Bertrand Serlet, Senior Vice President of Software Engineering at Apple, has filed a patent application on a next-generation podcasting solution. It has some intriguing features such as automated switching between the video and display graphics, perhaps using a pointer device to detect when attention should be focused on the display graphics.

But I don't think Serlet's idea represents a significant innovation in podcasting technology - it is generally about presentation recording and specifically intelligent switching technology. And as for presentation recording, Serlet's idea is not nearly as impactful as Panopto's innovations, which are available today.

The factor that makes Panopto so different from what Apple is proposing is multi-stream synchronisation. Apple's idea approaches the intelligent switching problem from the perspective of producing one final output video stream. This has its limits. If you look at, say, TED Talks, you can see that no matter how good the switching between video and display graphics, there is still a sense that the recorded presentation is playing catch-up with what the audience is able to see.

On the other hand, Panopto's approach inherently scales to synchronisation of many data inputs, which might include chats, instant messaging, Twitter feeds, pointer coordinates, assessment data, audience response data, and of course audio, video and display graphics. In fact, one piece of feedback we hear from clients is delight that they can use Panopto to record from two screens at once (e.g. PowerPoint and a Bloomberg terminal). The final output is not a merged and flattened video file, but a set of separate standards-based data streams which can be rendered in a variety of ways using player skins. This is far better than a flattened video file, and adapts more readily to device and bandwidth constraints.
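To make the multi-stream idea concrete, here is a minimal sketch in Python. It is not Panopto's actual format or API; the `Stream` class, the `snapshot` function and the stream names are all hypothetical. The only point is that each input stays a separate, time-stamped stream, and the player looks up what every stream was showing at a given moment.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Stream:
    """One independently recorded input, kept as time-ordered events."""
    name: str
    times: list = field(default_factory=list)   # seconds from session start
    values: list = field(default_factory=list)  # payload recorded at each time

    def add(self, t, value):
        # Assumes events arrive in time order, as they would during capture.
        self.times.append(t)
        self.values.append(value)

    def at(self, t):
        """Return the most recent value at or before time t, or None."""
        i = bisect_right(self.times, t)
        return self.values[i - 1] if i else None


def snapshot(streams, t):
    """What each stream was showing at time t, for a player skin to render."""
    return {s.name: s.at(t) for s in streams}


# Hypothetical session: slides, a second screen, and a chat feed.
slides = Stream("slides")
slides.add(0.0, "slide-1")
slides.add(42.5, "slide-2")

second_screen = Stream("second-screen")
second_screen.add(10.0, "terminal-frame-1")

chat = Stream("chat")
chat.add(50.0, "Q: can you repeat that?")

state = snapshot([slides, second_screen, chat], 45.0)
print(state)
```

Because nothing is flattened, a player on a small device could simply skip the second screen, while a desktop skin could render every stream side by side.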

Apple's useful innovation in this patent, if they can do it, is to provide a set of pointer data that can be used for switching. My friend Peter Du and I had noticed that presenters are very comfortable using laser pointers, but that this data is lost in a recorded presentation. He suggested we design a tool to sense the location of the pointer beam relative to the screen, and use it to drive the cursor in real time. That would be a way to capture the gestures of speakers who prefer laser pointers.
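The coordinate mapping at the heart of such a tool might look something like the sketch below. It assumes a camera facing the screen squarely, so that the screen appears as an upright rectangle in the camera image; a real installation would need perspective correction. The function name `beam_to_cursor` and its parameters are made up for illustration, and detecting the beam in the image is a separate problem not shown here.

```python
def beam_to_cursor(beam_xy, screen_box, display_size):
    """Map a detected beam position in the camera image to display pixels.

    beam_xy      -- (x, y) of the beam in camera-image pixels
    screen_box   -- (left, top, right, bottom) of the projected screen as
                    seen in the camera image, found in a calibration step
    display_size -- (width, height) of the presenter's display in pixels

    Returns (x, y) on the display, or None if the beam is off the screen.
    """
    x, y = beam_xy
    left, top, right, bottom = screen_box
    if not (left <= x <= right and top <= y <= bottom):
        return None  # beam is pointing somewhere other than the screen

    width, height = display_size
    # Normalise to 0..1 across the screen, then scale to display pixels.
    u = (x - left) / (right - left)
    v = (y - top) / (bottom - top)
    # Clamp so a beam on the far edge still lands on a valid pixel.
    cursor_x = min(int(u * width), width - 1)
    cursor_y = min(int(v * height), height - 1)
    return cursor_x, cursor_y


# Hypothetical calibration: the screen occupies this box in a 640x480 image.
cursor = beam_to_cursor((320, 240), (120, 90, 520, 390), (1024, 768))
print(cursor)
```

Logging each returned coordinate with a timestamp would give exactly the kind of pointer data stream that is otherwise lost in a recording.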

Some work has been done in this area by Johnny Lee. Lee is a really interesting guy with great educational technology projects, including his Wiimote projects (using the Wii Remote as a pointing and tracking device). Check out his Wii projects page. One possible application would be a classroom tracking camera which follows the presenter.

Coming back to the switching problem highlighted by Apple's patent, I think the real switching problem is between cameras. TED Talks have at least 2 cameras, usually 3. Say that one camera is wide and another is tight: how do you control the pan-tilt-zoom (PTZ) and switch between them automatically? Audio sensing has too much lag, and motion detection is prone to extraneous inputs (e.g. a member of the audience walks in front of the presenter and the camera follows him/her).

The best approach I've seen is similar to what Lee proposes: using an IR badge or reflective tape worn by the presenter, which the wide camera locates within the 'stage area'. It then relays the location data to a second camera so that it can pan, tilt and zoom for a tight shot on that target. This is a smart and reliable way to follow the presenter. You can switch cameras as suggested by Apple's patent application, e.g. if the presenter is using the keyboard or mouse, has his back to the camera, or is using the pointer, switch to the wide shot. Overlay that with a voice sensing or push-to-talk audio subsystem so that a third or fourth camera can zoom in when someone asks questions, and you have a fully automated presentation recording system.
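As a rough sketch of how those rules could fit together, the Python below picks a camera from a handful of presenter signals and works out a pan angle for the tight camera from the badge location. Everything here is an assumption made for illustration: the `PresenterState` fields, the camera names, the choice to let an audience question override the other rules, and the stage coordinate convention (metres, with the tight camera facing along the +y axis).

```python
import math
from dataclasses import dataclass


@dataclass
class PresenterState:
    """Signals the switching logic is assumed to receive each moment."""
    using_keyboard: bool = False
    using_mouse: bool = False
    using_pointer: bool = False
    back_to_camera: bool = False
    audience_mic_open: bool = False  # push-to-talk or voice sensing


def choose_camera(state):
    """Return which camera should be live: 'audience', 'wide' or 'tight'."""
    # Assumption: a question from the floor takes priority over the rest.
    if state.audience_mic_open:
        return "audience"
    # Rules from the text: these activities look better in the wide shot.
    if (state.using_keyboard or state.using_mouse
            or state.using_pointer or state.back_to_camera):
        return "wide"
    return "tight"


def pan_angle_degrees(badge_xy, camera_xy):
    """Pan angle for the tight camera to face the badge.

    0 degrees means straight ahead (+y); positive angles pan towards +x.
    """
    dx = badge_xy[0] - camera_xy[0]
    dy = badge_xy[1] - camera_xy[1]
    return math.degrees(math.atan2(dx, dy))


live = choose_camera(PresenterState(using_pointer=True))
angle = pan_angle_degrees((1.0, 1.0), (0.0, 0.0))
print(live, round(angle, 1))
```

The wide camera's job is then only to keep `badge_xy` up to date; the tight camera never has to find the presenter for itself.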

Finally, Apple's patent application leaves me wondering if there is something inherent in podcasting technology that limits delivery to a single stream. I'm sure you can put an XML descriptor of a synchronised presentation into an RSS 2.0 enclosure, but I don't think today's players would know how to fetch it for local storage. Presumably an iPod, eBook reader or phone wouldn't know what to do with it either.
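For what it's worth, building such a feed item is the easy part. The sketch below uses Python's standard `xml.etree.ElementTree` to create an RSS 2.0 `<item>` whose `<enclosure>` points at an XML descriptor rather than a media file. The `url`, `length` and `type` attributes are the ones RSS 2.0 defines for enclosures; the descriptor URL, the helper name `build_item` and the choice of `application/xml` as the MIME type are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def build_item(title, descriptor_url, length_bytes, mime_type="application/xml"):
    """Build an RSS 2.0 <item> whose enclosure is a presentation descriptor."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "enclosure", {
        "url": descriptor_url,        # where the descriptor lives
        "length": str(length_bytes),  # size in bytes, as RSS 2.0 requires
        "type": mime_type,            # assumed type; players expect audio/video
    })
    return item


# Hypothetical descriptor that would list the separate synchronised streams.
item = build_item(
    "Lecture 1 (synchronised presentation)",
    "http://example.com/lecture1/presentation.xml",
    2048,
)
print(ET.tostring(item, encoding="unicode"))
```

The feed would be valid, but a podcast client would download only the descriptor itself; nothing tells it to fetch the streams the descriptor refers to, which is the limitation in question.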

There is a great quote by Thaksin Shinawatra, past Prime Minister of Thailand. Referring to the finance companies that collapsed in 1997 after devaluation of the Thai Baht, he said: "Sometimes the dog gets too mangy and you just have to shoot it".

In that vein, I am no longer reselling Echo360 products. My company Iterate Pte Ltd has resigned as an Anystream reseller effective 31 July 2008. There are better solutions available for presentation recording, especially Panopto.

The company Panopto is a recent spinoff from Carnegie Mellon University. Their solution is built on a distributed multi-stream architecture, and is priced competitively for large and small installations. In fact, the pricing for corporate customers is less than 50% of what Echo360 charges, and qualifying academic customers need only pay for support services. This is in line with my prediction that capture of display graphics will be commoditised, and the value-add will increasingly be on the server side.

Panopto today supports editing, something that Echo360 has promised for more than a year. You can tell the difference between Panopto and Echo360 - one logo is pointing forward and the other is pointing backwards.


Saturday, August 02, 2008

Next-Generation Podcasting Solutions

Friday, August 01, 2008

Ending the Relationship
