iXML in QuickTime Movies

[Movie User Data udType = 'iXML'] * DEPRECATED
QuickTime Metadata = 'info.ixml.metadata'

iXML is useful in many different areas of media recording, playback and transfer. QuickTime movies are another file family that can benefit from iXML. In areas like Video Assist, it may be extremely desirable to embed scene and take information into a QuickTime movie. In order to embed iXML into a QuickTime movie, [we use the Movie User Data technique]* we use the modern QuickTime metadata mechanism with the reverse DNS tag 'info.ixml.metadata'. This allows an application to embed arbitrary data into any QuickTime movie in a standardised and compatible manner.

[This is done by passing a Handle containing your iXML data to QuickTime's AddUserData function, along with the udType for iXML, which is (of course..) 'iXML'. Similarly, the data can be retrieved from the movie using the GetUserData function, with the udType 'iXML' and index 1] *
This is done by adding an iXML text block as QuickTime metadata with the tag 'info.ixml.metadata', using the function QTMetaDataAddItem with kQTMetaDataStorageFormatQuickTime and kQTMetaDataKeyFormatQuickTime.

Since QuickTime supports multiple items of the same type, we should restrict ourselves to a single iXML data object. Applications writing iXML into movie metadata should therefore check before writing whether the movie already has an existing iXML object. If so, they should remove it before adding a new one, or write updated data into the existing iXML object.
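The check-then-replace rule above can be sketched independently of the QuickTime API. The following is a minimal Python sketch that models a movie's metadata as a plain list of (key, value) pairs. The list model and the function name set_single_ixml are assumptions made for illustration; they are not QuickTime calls.

```python
IXML_KEY = "info.ixml.metadata"  # reverse DNS tag named in this document


def set_single_ixml(items, ixml_text):
    """Return a new item list holding exactly one iXML entry.

    `items` is a hypothetical in-memory model of movie metadata:
    a list of (key, value) tuples. It is not a QuickTime structure.
    """
    # Drop every existing iXML item and keep all other metadata untouched.
    kept = [(key, value) for key, value in items if key != IXML_KEY]
    # Add the new iXML block as the only item with this key.
    kept.append((IXML_KEY, ixml_text))
    return kept
```

A real writer would perform the same steps with the metadata calls of whichever API it uses: enumerate the items with the iXML key, remove or update them, and then add at most one new item.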

QuickTime already has a standardised method for storing timecode, based on frames, so the BWF_TIME_REFERENCE_LOW and BWF_TIME_REFERENCE_HIGH sample count parameters of the BEXT object in iXML are not appropriate for video. However, they may be useful if you are storing audio in your QuickTime movie. Almost all of the other iXML parameters apply equally well to video QuickTime movies. In movies containing both video and audio, the iXML specification requires you to include both picture and sound in your TRACK_LIST. QuickTime movies can contain multiple video tracks and multiple audio tracks. The audio tracks can contain interleaved sound data, so we must be careful about how this information is delivered in iXML.
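For audio stored in a movie, the two time reference parameters together carry one sample count. The sketch below assumes the usual BWF convention that the count is a 64-bit value, with the low 32 bits in BWF_TIME_REFERENCE_LOW and the high 32 bits in BWF_TIME_REFERENCE_HIGH. The function names are hypothetical.

```python
def split_time_reference(sample_count):
    """Split a 64-bit sample count into (low, high) 32-bit words.

    Assumes the BWF convention: low word = bits 0..31, high word = bits 32..63.
    """
    if not 0 <= sample_count < 2**64:
        raise ValueError("sample count must fit in an unsigned 64-bit value")
    low = sample_count & 0xFFFFFFFF
    high = sample_count >> 32
    return low, high


def join_time_reference(low, high):
    """Rebuild the sample count from its low and high 32-bit words."""
    return (high << 32) | low


# One hour of audio at 48000 samples per second.
one_hour = 60 * 60 * 48000
low, high = split_time_reference(one_hour)
```

In this example the count fits entirely in the low word, so the high word is zero; the high word only becomes non-zero once the count reaches 2^32 samples.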

Let's take a common example: a QuickTime movie with one video track, one sound track with stereo interleaved data, and one timecode track. How should this be documented in iXML? Of course, QuickTime already details all of this information explicitly in the movie, so iXML is not strictly needed to express it. If we do document it, we first determine the channel indexing. QuickTime allows you to gather information about each track by index and to read its type. In this case we will find one video track, one audio track and one timecode track (which we don't document in iXML).

In the case of the audio track, we have a slight dilemma with the INTERLEAVE_INDEX, since this usually shows us how to pick a channel from a polyphonic data stream where there is only a single data stream in the file. In this example, we treat the audio track of the QuickTime movie as a single stream, and use INTERLEAVE_INDEX 1 and 2, just as we would with a soundfile. This gives us 3 entries in total for the video track and the stereo audio track.

The CHANNEL_INDEX can be defined by the application, but typically we use it to identify input connections, and in our example we label the video with a CHANNEL_INDEX of zero. The INTERLEAVE_INDEX is not included in the video track, since currently there are no multichannel interleaved video formats. You may use a different representation if you like, and even duplicate the use of CHANNEL_INDEX 1, if you wish to have multiple video tracks with video input indexes. To clarify this position, all video tracks should use the FUNCTION 'VIDEO', so that audio-only applications can ignore the metadata. In a mixed audio and video iXML, we should keep the audio metadata as close as possible to a parallel audio-only iXML object. So, for the example movie, our track list would look like this:

<TRACK_LIST>
<TRACK_COUNT>3</TRACK_COUNT>
<TRACK>
<CHANNEL_INDEX>0</CHANNEL_INDEX>
<NAME>Video</NAME>
<FUNCTION>VIDEO</FUNCTION>
</TRACK>
<TRACK>
<CHANNEL_INDEX>1</CHANNEL_INDEX>
<INTERLEAVE_INDEX>1</INTERLEAVE_INDEX>
<NAME>Mid</NAME>
<FUNCTION>M-MID_SIDE</FUNCTION>
</TRACK>
<TRACK>
<CHANNEL_INDEX>2</CHANNEL_INDEX>
<INTERLEAVE_INDEX>2</INTERLEAVE_INDEX>
<NAME>Side</NAME>
<FUNCTION>S-MID_SIDE</FUNCTION>
</TRACK>
</TRACK_LIST>
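
As an illustration, the track list above can be generated and then read back by an audio-only application that skips entries whose FUNCTION is 'VIDEO'. This is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper names build_track_list and audio_tracks, and the dictionary layout of EXAMPLE_TRACKS, are assumptions for this example and not part of iXML.

```python
import xml.etree.ElementTree as ET

# The three entries from the example movie; the dict layout is hypothetical.
EXAMPLE_TRACKS = [
    {"CHANNEL_INDEX": 0, "NAME": "Video", "FUNCTION": "VIDEO"},
    {"CHANNEL_INDEX": 1, "INTERLEAVE_INDEX": 1, "NAME": "Mid", "FUNCTION": "M-MID_SIDE"},
    {"CHANNEL_INDEX": 2, "INTERLEAVE_INDEX": 2, "NAME": "Side", "FUNCTION": "S-MID_SIDE"},
]


def build_track_list(tracks):
    """Build a TRACK_LIST element; TRACK_COUNT is derived from the entries."""
    track_list = ET.Element("TRACK_LIST")
    ET.SubElement(track_list, "TRACK_COUNT").text = str(len(tracks))
    for track in tracks:
        node = ET.SubElement(track_list, "TRACK")
        # Emit fields in a fixed order; video entries have no INTERLEAVE_INDEX.
        for tag in ("CHANNEL_INDEX", "INTERLEAVE_INDEX", "NAME", "FUNCTION"):
            if tag in track:
                ET.SubElement(node, tag).text = str(track[tag])
    return track_list


def audio_tracks(track_list):
    """Return the TRACK elements an audio-only reader would keep."""
    return [t for t in track_list.findall("TRACK")
            if t.findtext("FUNCTION") != "VIDEO"]


track_list = build_track_list(EXAMPLE_TRACKS)
xml_text = ET.tostring(track_list, encoding="unicode")
```

Deriving TRACK_COUNT from the number of entries keeps the count and the list from drifting apart when tracks are added or removed.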

In practice, it may be more common to use the iXML metadata in QuickTime movies for descriptive metadata only and to skip the description of the track layout (since this is available through the QuickTime API).
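
A descriptive-only iXML block of this kind might be assembled as follows. This sketch assumes the BWFXML root element and the IXML_VERSION, PROJECT, SCENE and TAKE elements from the general iXML specification, which should be checked against that specification. The function name and all values shown, including the version string, are placeholders.

```python
import xml.etree.ElementTree as ET


def build_descriptive_ixml(version, project, scene, take):
    """Return iXML text with descriptive fields only and no TRACK_LIST.

    Element names are assumed from the general iXML specification.
    """
    root = ET.Element("BWFXML")
    ET.SubElement(root, "IXML_VERSION").text = version
    ET.SubElement(root, "PROJECT").text = project
    ET.SubElement(root, "SCENE").text = scene
    ET.SubElement(root, "TAKE").text = take
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only.
ixml_text = build_descriptive_ixml("2.0", "Example Project", "12A", "3")
```

The resulting text is what would be stored as the single iXML metadata item in the movie.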