New opus test build
NullC
post Apr 16 2013, 23:31
Post #1





Group: Developer
Posts: 200
Joined: 8-July 03
Member No.: 7653



I've posted up a new win32 build of libopus git master: https://people.xiph.org/~greg/opus-tools_exp_a8783a1.zip

There aren't many major changes from the prior alpha, but there are a number of small bug fixes, some of which impact quality. I think we're particularly interested in knowing about any quality regressions vs prior versions, and in particular about cases where the speech/music detection gets confused. (For the actual release we'll likely back that detector off at high rates, but for now it's set more aggressively in order to find problems.)
Replies
Dynamic
post Apr 17 2013, 19:32
Post #2





Group: Members
Posts: 832
Joined: 17-September 06
Member No.: 35307



I was trying to test the speech/music detection, but wouldn't mind a bit of guidance. I presume it mainly kicks in at the lower end of the target bitrate range.

My source was a digital soundboard recording of nearly three hours length at a small gig recently, captured over USB cable at 48kHz/24bit stereo including some talk from the singers between songs (usually remembering to mute the effects, so fairly clean and dry) before and after the music - mainly ALAC backing tracks via an iPad and one or two singers, occasional live semiacoustic guitar, plus fiddle and crescent tambourine picked up on one of the vocal mics. I mainly jumped to the track boundaries and places I knew the music subsided to listen for music/speech variations.

I could do with some guidance as I'm not entirely sure what bitrate region I should be concentrating on for speech/music detection to have a significant effect, but I presume it's certainly below 48kbps, and most likely a fair bit below 32kbps that the transition might be quite noticeable if speech mode remains active too long.

I used this test version of opusenc with the option --bitrate nn.n (VBR by default) to make four versions at first, with 40.0, 32.0, 27.0 and 24.0 kbps as the target bitrates.
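A minimal sketch of how a batch of encodes like this could be scripted in Python. It only builds the command lines and does not run them. The file names (`clip.flac` and the `_NN.Nkbps.opus` outputs) are hypothetical, and the only encoder option used is the `--bitrate` option mentioned above.

```python
def opusenc_commands(source, bitrates_kbps):
    """Build one opusenc command line per target bitrate (VBR is the default)."""
    stem = source.rsplit(".", 1)[0]  # hypothetical naming scheme for the outputs
    commands = []
    for kbps in bitrates_kbps:
        output = f"{stem}_{kbps:.1f}kbps.opus"
        commands.append(["opusenc", "--bitrate", f"{kbps:.1f}", source, output])
    return commands


if __name__ == "__main__":
    # The four target bitrates tried first in this post.
    for cmd in opusenc_commands("clip.flac", [40.0, 32.0, 27.0, 24.0]):
        print(" ".join(cmd))
```

Each returned list could be handed to `subprocess.run` once the input file and encoder binary are in place.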

All four versions seemed to me limited to 12 kHz (SWB) audio bandwidth once the music really kicked in, though at 40 kbps there was full bandwidth occasionally during some of the speech and sometimes as the song started (e.g. first cymbal hit) showing about 16-18 kHz then cut down to 12 kHz without a bad transition (saw it on fb2k spectrogram, but barely noticed, and I think it was in CELT mode the whole time, as the bass and percussion were well handled). I guess it's possible that these instances were hybrid mode at FB switching to CELT-only at SWB, but they sounded OK.

I then tried a 48.0 kbps version, which was 20 kHz full bandwidth much of the time, especially with percussive backing tracks. In the closing track in particular I noticed it was 12 kHz bandwidth. That track has little high end and subdued percussion, but a couple of percussive hits registered up to 15 or 16 kHz in the original backing track and didn't trigger a change in bandwidth in the 48.0 kbps Opus encode. I think that is probably good, as it sounded consistent throughout and probably reduced the likelihood of artifacts among the multilayered vocals in the backing of the closing number. (That backing was recorded many years ago; I have since only tweaked its gain envelope for dynamic variation and increased impact as the layered vocals really build up in the final verse and chorus.)

So I got the impression that bandwidth detection was doing a reasonably good job with nothing sounding egregiously out-of-place, and also I got the impression that bandwidth restriction to reduce artifacts from bitrate starvation was keeping obvious artifacts mostly at bay (whereas I personally dislike what I've heard of HE-AACv2 at 24 to 32 kbps as far as treble reproduction is concerned - albeit far superior to the old RealAudio proprietary codec which over-reached in audio bandwidth at the expense of bad treble artifacts in my experience).

I also did 16.0 and 20.0 kbps versions. Both were limited to 8kHz (WB) bandwidth throughout, it seemed, with the 16.0kbps being pretty rough in a number of spots and almost certainly in SILK (LP) mode most of the time. I felt that the 20.0kbps version was a good deal more musical, and would in many cases be hard to distinguish from a 16.0 kHz sampled PCM file, though I found some rough spots.

One of these rough spots was on the first beat of a new song after a speech section between songs. I worked up in bitrate and found it was still a bit poor at 27.0 kbps but reasonably well handled at 32.0 kbps, so it might be a speech/music detection candidate. Then again, it might just be somewhere that is music and simply needs more bitrate to handle the simultaneous instruments present.

I trimmed the lossless version on whole second boundaries to make a 56 second clip (less than 30 seconds of copyright material each side, includes tail of one track, speech between tracks and the start of the second track) to ensure the frames line up with the original opus (20.0ms frames). It's a longer clip than the usual 30 seconds because it contains less than 30 sec of music and I sought to include the music-speech-music transitions. I doubt it's important, but the source of
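The frame-alignment reasoning above comes down to simple arithmetic, sketched here in Python: at 48 kHz a 20 ms frame is 960 samples, so any whole-second offset is a whole number of frames. The function names are illustrative only, and the sketch ignores any encoder delay.

```python
SAMPLE_RATE_HZ = 48_000  # sample rate of the source recording
FRAME_MS = 20            # frame duration used for the encodes


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def is_frame_aligned(offset_samples, sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """True if a cut at this sample offset falls on a frame boundary."""
    return offset_samples % samples_per_frame(sample_rate_hz, frame_ms) == 0


if __name__ == "__main__":
    start_seconds = 42 * 60 + 53  # clip start, taken from the attached file name
    print(samples_per_frame())
    print(is_frame_aligned(start_seconds * SAMPLE_RATE_HZ))
```

A cut at an arbitrary sample offset (say 480 samples, i.e. 10 ms) would fail the check, which is why trimming on whole seconds is a convenient choice.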

The problem is in the first note of the second track, starting at the 0:36 point, which includes a jumble of percussion, arpeggiated chords and bass note but is the first appreciable sound since the preceding speech sound.

I attach this as a FLAC (48kHz, 24bit stereo, 1058kbps, 7.24MiB captured via BOSE Tonematch USB & Audacity) and also attach a selection of Opus encodes using this build at the target bitrates 16.0, 20.0, 24.0, 27.0, 32.0, 40.0 and 48.0 kbps, which I uploaded in a single ZIP archive (1.33MiB)

The whole show recording had a ReplayGain of +2.98 dB (no tagging in attached files) and the live vocals are, I believe, dead-centre.
Attached File(s)
Attached File  CT_00_42_53_to_00_43_49.flac ( 7.24MB ) Number of downloads: 114
Attached File  opus_16_20_24_27_32_40_48kbps_versions.zip ( 1.33MB ) Number of downloads: 173
 
jmvalin
post Apr 17 2013, 21:13
Post #3


Xiph.org Speex developer


Group: Developer
Posts: 485
Joined: 21-August 02
Member No.: 3134



QUOTE (Dynamic @ Apr 17 2013, 14:32) *
I was trying to test the speech/music detection, but wouldn't mind a bit of guidance. I presume it mainly kicks in at the lower end of the target bitrate range.


Yes, speech/music is mostly used at lower rates. The range over which it has an impact is 20-64 kb/s, though it's most noticeable between 24 kb/s and 48 kb/s for stereo.
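As a rough illustration of those ranges (stereo figures as stated above), here is a small Python helper that sorts target bitrates into bands. The labels are made up for this sketch, and it is not how the encoder itself makes any decision.

```python
def speech_music_impact(bitrate_kbps):
    """Classify a stereo target bitrate using the ranges quoted in this post."""
    if 24 <= bitrate_kbps <= 48:
        return "most noticeable"
    if 20 <= bitrate_kbps <= 64:
        return "some impact"
    return "outside stated range"


if __name__ == "__main__":
    # The bitrates tested earlier in the thread.
    for kbps in [16.0, 20.0, 24.0, 27.0, 32.0, 40.0, 48.0]:
        print(kbps, speech_music_impact(kbps))
```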

QUOTE (Dynamic @ Apr 17 2013, 14:32) *
All four versions seemed to me limited to 12 kHz (SWB) audio bandwidth once the music really kicked in, though at 40 kbps there was full bandwidth occasionally during some of the speech and sometimes as the song started (e.g. first cymbal hit) showing about 16-18 kHz then cut down to 12 kHz without a bad transition (saw it on fb2k spectrogram, but barely noticed, and I think it was in CELT mode the whole time, as the bass and percussion were well handled). I guess it's possible that these instances were hybrid mode at FB switching to CELT-only at SWB, but they sounded OK.


Yeah, bandwidth switching is still a known issue. It's not quite clear how to solve this. It would be nice if the bandwidth didn't change like this along with the speech/music decisions, but at the same time the encoder is trying to do what's optimal for each. I don't yet have a good solution for this -- even a conceptual one. Open to suggestions.

QUOTE (Dynamic @ Apr 17 2013, 14:32) *
One of these rough spots was on the first beat of a new song after a speech section between songs, and I worked up in bitrate to find it was still a bit poor at 27.0 kbps, but reasonably well-handled at 32.0 kbps and might be a speech/music detection candidate. Then again, it might just be somewhere that is music and just needs more bitrate to handle the simultaneous instruments present.


The good news is that Opus already supports "delayed decisions" in the speech/music detector, so it's possible to avoid having the transitions happen too late. It's not yet supported in the opusenc tool, but that should happen soon.


