Solved: Clipped/cutoff audio when using VXML recordutterance property

benjaminturner · ‎09-22-2016

Hello,

I have an IVR application written with Cisco Unified Call Studio version 9.0(1), running on CVP 9 with Nuance Recognizer 9 as the ASR. I need to capture the caller's utterance at a certain form, and submit it to another application. To do this I'm using the VXML property "recordutterance". I've created a custom element which takes the <formname>$.recording shadow variable and stores it in element data, so that it's accessible to my Studio application and I can turn around and submit it to the external app.

This works fine. However, we've noticed a few problems with the audio data:

1) The audio starts with an SND header (first 4 characters in the data are .snd), which is not supported by our external app. To get around this, we're stripping the first 24 characters off the data (SND headers are static length). Ideally we'd like to not have to do this, but it's no big deal.

2) More of a concern is that the remaining audio data is sometimes incorrect. What I mean is: after the 24-character SND header comes a RIFF header (which is normal for .wav files, and our external application knows how to process RIFFs)--but the metadata within this RIFF header is sometimes wrong! In detail: the RIFF header contains a field that specifies the length of the audio subchunk. We can compare this to the number of bytes (characters) present in the raw audio data. We expect the two numbers should match; the point of the RIFF header is to provide metadata about the audio, so naturally, the header should tell us how big the audio is. BUT, if the true size of the audio (measured by counting the characters in the byte array) is > 32768 and < 40960, then the RIFF header contains the wrong value. It shows a value much smaller than the actual number of characters present. The RIFF header will show an incorrect value between 16174 and 16398.
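The two checks in 1) and 2) can be sketched in Python as below. This is only an illustration, not the custom element's actual code: the function names are hypothetical, and it assumes the recording is available as raw bytes (not a decoded string) and that the RIFF `data` subchunk is the last chunk in the stream. The Sun/NeXT `.snd` header stores its own payload offset in bytes 4 to 7 (24 is the minimum), so the sketch reads that field instead of assuming a fixed length.

```python
import struct
from typing import Tuple

SND_MAGIC = b".snd"
SND_MIN_HEADER = 24  # minimum Sun/NeXT audio header size in bytes


def strip_snd_header(data: bytes) -> bytes:
    """Return the payload after a .snd header, or data unchanged if none."""
    if len(data) < SND_MIN_HEADER or data[:4] != SND_MAGIC:
        return data
    # Bytes 4-7 hold the big-endian offset at which the payload starts.
    (offset,) = struct.unpack(">I", data[4:8])
    if offset < SND_MIN_HEADER or offset > len(data):
        raise ValueError(f"implausible .snd data offset: {offset}")
    return data[offset:]


def riff_data_sizes(wav: bytes) -> Tuple[int, int]:
    """Return (declared, actual) byte counts for the RIFF 'data' subchunk."""
    if len(wav) < 12 or wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    pos = 12
    while pos + 8 <= len(wav):
        chunk_id = wav[pos:pos + 4]
        (size,) = struct.unpack("<I", wav[pos + 4:pos + 8])  # little-endian
        if chunk_id == b"data":
            # Assumption: 'data' is the final subchunk, so every byte after
            # its 8-byte chunk header counts as audio.
            return size, len(wav) - (pos + 8)
        pos += 8 + size + (size & 1)  # subchunks are padded to even length
    raise ValueError("no 'data' subchunk found")
```

Calling `riff_data_sizes(strip_snd_header(recording))` on a captured utterance would return the declared and actual sizes side by side, which makes affected recordings easy to log.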

The effect of this is that our external application processes the RIFF header and assumes the audio data is much smaller than the true size. So we effectively "cut off" the end of the utterance.
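One possible stopgap, until the root cause is found, is to rewrite the size fields before handing the audio to the external application. The sketch below is a hedged illustration only: `patch_riff_sizes` is a hypothetical helper, it assumes the input is the raw RIFF bytes with the `.snd` header already removed, and it assumes `data` is the final subchunk. It does not explain why the header is wrong in the first place.

```python
import struct


def patch_riff_sizes(wav: bytes) -> bytes:
    """Return a copy whose RIFF and 'data' size fields match the real length."""
    if len(wav) < 12 or wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    out = bytearray(wav)
    pos = 12
    while pos + 8 <= len(out):
        chunk_id = bytes(out[pos:pos + 4])
        (size,) = struct.unpack_from("<I", out, pos + 4)
        if chunk_id == b"data":
            # Assumption: 'data' is the last subchunk, so everything after
            # its 8-byte chunk header is audio.
            struct.pack_into("<I", out, pos + 4, len(out) - (pos + 8))
            # The top-level RIFF size covers everything after the first 8 bytes.
            struct.pack_into("<I", out, 4, len(out) - 8)
            return bytes(out)
        pos += 8 + size + (size & 1)  # subchunks are padded to even length
    raise ValueError("no 'data' subchunk found")
```

The audio bytes themselves are left untouched; only the two length fields change, so a recording whose header was already correct passes through unmodified.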

Has anyone seen anything like this before? I've checked with Nuance engineers but they don't see anything wrong with the NR9 setup.

Thanks,

Ben

goujain · ‎11-02-2016

Ben,

I haven't heard back from the Gateway team so far. You may want to post this on the Gateway DevNet forum at

Voice Gateway

From the CVP point of view, I wouldn't recommend chopping off the header, which is causing the issue. But I think the Gateway team is best placed to answer this.

goujain · ‎10-19-2016

I have forwarded the query to the VXML Gateway team for their input and will reply as soon as I hear back from them.
