Individual Object Details
In accordance with the XML Standard, each parameter in the iXML chunk data is defined by a standard XML tag, ie:
<TAGNAME>free text data</TAGNAME>.
These parameters may be embedded in object containers which group the parameters together. ie
<TAGNAME>free text data</TAGNAME>
These object containers may also be embedded in other object containers, ie
<TAGNAME>free text data</TAGNAME>
Objects and parameters are discussed in detail below:
All parameters and sub objects are optional.
The entire iXML metadata group for a given file is wrapped inside a single master <BWFXML> object.
The version number of the iXML specification used to prepare the iXML audio file. This version appears in the front page at http://www.ixml.info, and takes the form of x.y where x and y are whole numbers, for example 1.51
The name of the project to which this file belongs. This might typically be the name of the motion picture or program which is in production.
The name of the scene / slate being recorded. For US system this might typically be 32, 32A, 32B, A32B, 32AB etc. For UK system this might typically be a incrementing number with no letters.
The SoundRoll which identifies a group of recordings. Normally, the SoundRoll is a vital component of workflow to differentiate audio recorded with time of day on different days. In other words for 2 (completely different) recordings each covering a period around 11am, the soundroll would differentiate them by (typically) telling you which shooting day this recording applies to. Some projects may turnover sound more than once per day, and increment the soundroll at this point. In any event, the soundroll should change at least once in any 24 hour period. Some systems change the soundroll for every recording which is also a valid option, in effect using the soundroll as a unique file identifer (although this function is explicitly provided with the iXML FILE_UID parameter).
The number of the take in the current scene or slate. Usually this will be a simple number, although variations for things like wild tracks may yield takes like 1, 2, 3, WT1, WT2 etc.
(New in iXML v2.0)
A dictionary based tag allowing selection from a defined list of values to explicitly categorise the type/purpose/function of the current take. This tag overlaps with the existing NO_GOOD / FALSE_START / WILD_TRACK which are deprecated in iXML 2.0, This tag can contain multiple entries, separated by commas and can be expanded in the future with additional dictionary entries, detailed in the TAKE_TYPE dictionary
<NO_GOOD> (Deprecated in iXML v2.0, superceded by TAKE_TYPE)
This parameter allows a recorder to mark this recording as "no good" (ie, of no use whatsoever, and in effect to be deleted). The value should be TRUE or FALSE. If absent, this should be assumed FALSE.
<FALSE_START> (Deprecated in iXML v2.0, superceded by TAKE_TYPE)
This parameter allows a recorder to mark this recording as a false start, this may indicate that another file could exist with the same take number. Typically this file might also be marked as <NO_GOOD>. The value should be TRUE or FALSE. If absent, this should be assumed FALSE.
<WILD_TRACK> (Deprecated in iXML v2.0, superceded by TAKE_TYPE)
This parameter allows a recorder to mark this recording as a wild track, with no specific relationship to any take, although it might be marked with a specific scene, for example when recording ambience for a given location. The value should be TRUE or FALSE. If absent, this should be assumed FALSE.
This parameter allows a recorder to mark this recording as a circle-take. The value should be TRUE or FALSE. If absent, this should be assumed FALSE.
A unique number which identifies this physical FILE, regardless of the number of channels etc. If your system employs a unique SoundRoll per recording, your FILE_UID and TAPE parameters should be the same.
The userbits associated with this recording. This may have been extracted from incoming timecode when the file was recorded, or generated by the recorder from the date, or any other metadata. Typically the userbits are rarely used now because other more explicit metadata supercedes this function.
A free text note to add user metadata to the recording. This might typically used to communicate information such as TAIL SLATE, NO SLATE, or to warn of noise interruptions - PLANE OVERHEAD etc.
<SYNC_POINT_TYPE> - RELATIVE OR ABSOLUTE
<SYNC_POINT_FUNCTION> - SEE SYNC POINT FUNCTION DICTIONARY
These parameters allow the communication of multiple sample based counts which represents a sync point for this recording. Future generations of "METASlate" Clapperboards will include a radio transmitter communicating the clap point to recorders. This will allow the automatic identification of the sync point within the file without manually locating it in telecine. Depending on the <SYNC_POINT_TYPE> parameter, which can be either RELATIVE or ABSOLUTE, which represents a sample frames count from the start of the file, or an absolute sample count since midnight. SYNC_POINT_FUNCTION determines the function of this sync point. There are a number of defined functions for sync points, indicating things like the Pre-Record Sample Count, or the primary slate. See the Sync Point Function dictionary for a list of defined functions. Note the relative sample count needs multiplying by the wordsize and number of channels to translated into a byte count from the start of the file. For 32 bit sample counts, only the _LOW parameter is needed, with _HIGH set to zero. For high sample rates, the _HIGH parameter is also needed to communicate 24 hour sample counts. Since an audio recording may include multiple camera slates, the SYNC_POINT_COMMENT allows a note for each sync point to be entered, for example camera a, camera b etc. Sync Points are very useful for playback on field recorders, where they can be used to store cue points into sound files. SYNC_POINT_EVENT_DURATION allows a sync point to be a region with a defined start and stop point/duration. This can be useful for playbacks where you would like the play to stop automatically at the end of the sync event. A value of 0 in the SYNC_POINT_EVENT_DURATION implies a non-duration (marker) or unknown duration event. SYNC_POINT_COUNT should appear at the start of the SYNC POINT LIST to allow readers to prepare memory for the following list.
The SPEED object allows communication of the ratio of recorded speed to target playback speed. For example, in the case of a slow motion shot, the camera speed might be doubled to 48 FPS, with the intention of ultimately playing the picture back at the normal speed of 24 FPS. In this case, in order for the audio to match the finished shot at 24FPS it would need to be slowed down by 50%. This information is presented in iXML using a pair of ratios. The MASTER_SPEED parameter would typically be X/Y, eg 24/1, 25/1, 30000/1001 (NTSC) - this determines the DEFAULT speed of the project, or the speed at which the material will be replayed. The CURRENT_SPEED parameter identifies the speed of this recording which is different to the MASTER_SPEED. So in the slow motion example, MASTER_SPEED would be 24/1, and CURRENT_SPEED would be 48/1. We use a ratio in each in order to easily accommodate accurate representations of NTSC and HD frame rates. The NOTE parameter allows the user to explain any details about the change.
The SPEED object can also be used by recorders to communication the default timecode frame rate using during record. In this case, the TIMECODE_RATE tag should be included and set to 24/1, 25/1, 30000/1001, 24000/1001, 30/1 etc as appropriate, and the TIMECODE_FLAG parameter can be NDF or DF for non drop or drop timecode (defaults to non-drop). Although the timecode rate is not important to a BWF file, some translations from iXML to other text formats may require the timestamps to be converted to H:m:s:f format, which will require inference of the frame rate. Allowing the recorder to set this will avoid the need to specify the displayed rate during conversion (unless you want to change the displayed frame rate, of course).
All speed and rate parameters in the SPEED object must be integer over integer, so to imply NTSC rates, use the ratios shown below.
Common Speed Ratios:
NTSC - often called 29.97, is actually not exactly 29.97, and should be labelled '30000/1001'
HD 23.98 - same story, needs to be labelled '24000/1001'.
Using the mathematically correct ratio allows the Speed object to be used for precise calculations rather than simply operating as a label.
SampleCount maths in SPEED tags: (New in iXML v1.5)
When complex workflows are developed, particularly for HD oriented workflows, it can be necessary to run recorders at pull up or down rates. In some cases this rate is designed to deliberately introduce a speed change from recording to the playback speed during post. A common example would be the practice of recording audio at 48048 Hz with 24 FPS film, and then transferring both to a 23.976 HD video tape with 48000 Hz sound. Many audio workstations are unhappy with files marked with unusual sample rates and so it can be necessary to label files as 48000Hz, even if they were really recorded at 48048Hz. To resolve all these ambiguities, iXML 1.5 adds some new tags to the SPEED chunk designed to clearly indicate exactly what is going on. <FILE_SAMPLE_RATE> is the same information included in the BWF fmt chunk - ie the declared WAVE sample rate, repeated here for convenience. Being human readable, iXML is the perfect markup language allowing workflow problems to be analysed by viewing information in the SPEED chunk, using Metadata display tools like iXML Reader. <AUDIO_BIT_DEPTH> is the same information included in the BWF fmt chunk - ie the declared WAVE sample rate, repeated here for convenience. <DIGITIZER_SAMPLE_RATE> is where it gets interesting - this is the true wordclock speed of the A-D convertors used during the recording. If a player wants 100% natural speed on playback this is the rate it would use for playback. <TIMESTAMP_SAMPLES_SINCE_MIDNIGHT_HI> and <TIMESTAMP_SAMPLES_SINCE_MIDNIGHT_LO> is the same information included in the BWF fmt chunk - ie the declared WAVE sample rate, repeated here for convenience. <TIMESTAMP_SAMPLE_RATE> indicates the sample rate used to calculate the timestamp, and which must be used in mathematic calculations to recover the timecode timestamp for the file. IMPORTANT - the file samplerate, audio bit depth and timestamp samples since midnight are included redundantly in the SPEED object in order to assemble all the important data in a single place for human readability when troubleshooting workflow problems. iXML Readers should ignore this information and instead take the data from the official fmt and bext chunks in the file. Generic utilities changing file data might change the fmt or bxt chunk but not update the iXML SPEED tag, and for EBU officially specified BWF data like timestamp, the EBU data takes precedence (unlike the unofficial informal bext metadata which is superceded by iXML)
The HISTORY object allows tracking of a file's origins, where it may have been created as a derivation of another file. A very common example would be where a polyphonic (multichannel embedded) file is recorded on set, and later, during post production, child monophonic files are created for editorial. It is extremely desireable to know the parent from which a file is derived, in order to assist in assembling families of tracks for audio conforming. The ORIGINAL_FILENAME allows a permanent record of the name given to this file when it was created. If the file is subsequently renamed by the user, this may cause problems in the workflow. Using the ORIGINAL_FILENAME metadata would allows systems to track back to the original given name. The PARENT_FILENAME and PARENT_UID allow identification of the source of a derived file. PARENT_FILENAME is equivalent to Avid's TAPEID parameter.Multiple mono second generation files may be associated with one another using a common PARENT_UID. Alternatively they could also be associated using a common FAMILY_UID given to the set of files when a polyphonic file was split. See below for details on FAMILY_UID, in the FILE_SET object. If a PARENT_UID equates to another FAMILY_UID group, then it might be fair to assume that this is a mixdown from a multi-file source (especially if the track function says so). If a PARENT_UID equates to the FILE_UID of a single BWFp file, this is probably one of a derived split set. If PARENT_UID equates to the FILE_UID of a single BWFm file, this is probably a derived version, ie reduced bit depth, where the parent is 24 bit, and this file is 16 bit.
Often, it will be necessary to deliver a given recorded event in several files. This might be a group of Mono files, recorded at the same time from different microphones, or it may be a pair of polyphonic files, where one contains discreet microphone recordings and another contains a stereo mixdown. In these cases, it is very useful to know how many companions a given file has, and this information is presented in the TOTAL_FILES paremeter. For a 4 channel mono set of files, TOTAL_FILES would be 4. For a 4+2 6 channel set, spread over a 4 channel file and a stereo mix file, TOTAL_FILES would be 2. Multiple Files which represent a single recording should share a common FAMILY_UID so they can be associated. Their track / channel relationship (across the entire multichannel set) is determined by each file's TRACKLIST object shown below. Where a mixdown is later created from a set of discreet tracks, it should share the same familyUID, but use a CHANNEL_INDEX (see later) picking up from the last channel mixed. For applications which are splitting Poly files into mono, care should be taken to detect TOTAL_FILES > 1 when splitting a Poly. This is important since when splitting a Dual Poly (4+2 channels) into 6 x Mono, it is VITAL that all 6 mono files have the same FAMILY_UID, in order that conforming software can accurately re-assemble the family. Since the FAMILY_UID may be created at the point where a poly is split, 2 seperate operation on the 4 and 2 channel Poly may result in different FAMILY_UID's being created for the 2 mono file sets. As an alternative to this, you may choose to simply suffix the Poly files' FAMILY_UID when creating the new FAMILY_UID for the children, where the same suffix would be added for both Poly conversions. An example of this might be - FAMILY_UID of BOTH parent Poly files (assuming the recorder was iXML compliant) may be 123456789. When splitting either or both of the Poly parent files, you would create new FAMILY_UIDs for the monos, but in this example you would suffix the parent's FAMILY_UID resulting in a FAMILY_UID of all mono files being '123456789-CHILD' or any other suffix. The point here being that the same rule of suffix would apply to both Poly-mono conversion from the original family, even if the 2 conversions were done seperately. Since the original Dual Poly set should have had its Channel Indexes set correctly (1-4 in file 1, and 5-6 in file 2) these will have been carried through and your finished set of 6 mono files should have the correct Channel index, and all share the same FAMILY_UID regardless of how the splitting process occurred. It is not advisable to simply repeat the parent's FAMILY-UID in the child files, since this might cause complications when assembling sets, possibly leading to a future app down the pipeline attempting to re-split the polys. The parent File's FILE-UID will be stored in the child files' PARENT_UID field. Asset management systems could perform more complex assembly of history by tracking back from PARENT-UID, gathering parent FAMILY_UIDs etc. For assisting in metadata editing and tracking of complex file sets like Dual Poly (where you may using naming conventions such as sc_tk and sc+tk), the FILE_SET_INDEX allows tagging of files to indicate their origination index in a group of files. This would allow you to ensure that a file which was originally designated as the + member of the dual poly set, always maintains the + in its name rather than _ which belongs to the other indexed file set member. Since in iXML each file should contain all the data needed to interpret its usage, this is a subtle issue of compatibility with preserving the names used on other non iXML workflows. For mono files, the practice should be to use indexes 1 to n. for multi-poly files, use letters A, B etc. It is strongly recommended that dual poly recordings should use this tag. <FAMILY_NAME> is a non-unique text name for the file set, this is equivalent to the ClipID in a video editing system which identifies then entire editing 'object' clip as a group of audio and video tracks. For film projects the <FAMILY_NAME> might take the form "scene/take", and editing systems have the option of reading the <FAMILY_NAME> tag and using this as the clip ID rather than conjuring one up from other metadata.
NEW: FILE_SETs can also be used to imply grouping of objects in a virtual multichannel sequence, to support recorders which can punch individual tracks in and out during a single recording. In these File Set Groups, tracks are labelled as normal (with multiple files sharing the same TRACK data). In the SYNC_POINT_LIST of each file, a FUNCTION "GROUP_OFFSET" is added as a relative value, indicating the SAMPLE COUNT offset (using the Digitizer sample rate) of the start of this file compared to the group. To locate the group as a whole, each file of a non-coherent FILE_SET group MUST include the additional FILE_SET tag <FILE_SET_START_TIME_HI> and <FILE_SET_START_TIME_LO> using the same counting used for the TIMESTAMP in the <SPEED> tag. The FILE_SET_START_TIME is used to locate the start of the FILE_SET group, with each file offset by its GROUP_OFFSET SYNC_POINT. Where a member of the FILE_SET has not GROUP_OFFSET sync point, it is assumed that this file starts with an offset of 0 from the group. A schematic illustrating this is available here:
One of the primary objectives of iXML is to offer an umabigous way to identify track labels, track indexing (microphone source identification), multichannel / multifile relationship of tracks and track function identification. The TRACKLIST object accomplishes all of this. The TRACK_COUNT parameter determines the number of TRACK objects in this group, which would normally match the number of tracks in the file. For each one, we have a TRACK obect. Within that, we have several parameters. CHANNEL_INDEX communicates the 'input' or 'source' number on a recorder. For an 8 channel recorder, the inputs might be identified as 1 to 8, however for a given recording, only tracks 4 and 6 may be armed. In this file, the tracks for these 2 channels would have CHANNEL_INDEX of 4 and 6 respectively so that anyone reading the file knows that the audio on the 2 audio tracks in the file come from microphone sources 4 and 6. The INTERLEAVE_INDEX supports this function, confirming which tracks we are talking about for a multichannel interleaved file. Since XML does not require the objects to be in any particular order, readers cannot assume that the TRACK objects will be in order, and hence the INTERLEAVE_INDEX identifies which TRACK object this is. The NAME parameter allows a track label to be attached, and the FUNCTION parameter allows explicit identification of a track's purpose, such that software reading the file can apply appropriate treatment to that track's audio data. For example it could communicate LEFT or RIGHT so that a player could pan each channel automatically. Also it can identify MID-SIDE functionality so that a player can apply the appropriate signal processing to these channels to derive a stereo image to a listener. A dictionary of standard FUNCTION values is given in the next chapter. Note that Channel and Interleave Indexes are indexed from a base of 1 (not zero), so the first channel / interleaved track will be designated 1 for both parameters.
<PRE_RECORD_SAMPLECOUNT> (Deprecated - do not use)
This parameter allows a recorder which incorporates a pre-record buffer to communicate how much of the recording came from the pre-record buffer. This is a sample count (like timestamp) offset from the start of the file. Playback applications can, by default skip past the pre-record stage, offering the user a more intuitive playback which starts from the point where the record button was actually pressed. NOTE: iXML 1.4 defines a SYNC_POINT_FUNCTION for Pre-record sample count, which is preferable to using this old iXML 1.3 deprecated root tag. For full backwards compatibility you should support both the root Pre-Record Samplecount tag, and also override it if one is found inside the Sync Point List. Versions of Gallery Metacorder prior to version 1.2 use the root PRE_RECORD_SAMPLECOUNT, whilst later versions use the Sync Point List.
<BWF_DESCRIPTION>all the old stuff</BWF_DESCRIPTION>
Broadcast WAVE files employ a bext chunk to communicate EBU standardised metadata, which complements iXML. In non WAVE files, such as AIFF, it may be desireable to also store an equivalent set of information. For this purpose iXML includes an optional BEXT object which can communicate the standard EBU bext metadata in any file employing iXML. If this <BEXT> object appears (redundantly) in a WAVE file (for continuity of format) it is ESSENTIAL that the values match those in the official bext chunk in the same file.
The USER field is completely defined by the user, hardware or application, it has no defined function, and may contain any kind of human readable data. It is intended that this field be used (comptible with a defined schema) to store miscellaneous information, which is not appropriate for any other field. Applications are free to sub-divide this field with tagging systems like the old bext description, although typically this field is designed to be human readable rather than machine readable, so any tagging should be based on interpretation of human readable, neat text. iXML viewing applications will typically display the entire USER field in one text area. One of the primary functions of this field would typically be to allow extended information about the recording process used, and personnel involved with a field recording. This is typically not file-specific but will be the same for a whole group of recordings, and which appears in the iXML of all recordings. It is ideally suited to storage of the metadata which would normally appear at the top of a sound report, ie. the name of the mixer, contact details etc.
UPDATE: iXML v2.0 - the USER field can now contain XML tagged data, providing more explicit and machine readable information, using the list of fields recommended by AFSI:
The LOCATION object group is designed to hold machine readable information about the location this recording ws made in. In particular to support geotagging of recordings. LOCATION_GPS is specified in standard decimal form: latitude, longitude
<LOCATION_NAME>Human readable description of location</LOCATION_NAME>