DAMAGE - File damaging tool, Useful to test the error resistance of lossless codecs
TBeck
post Jul 5 2006, 21:29
Post #1


TAK Developer


Group: Developer
Posts: 1098
Joined: 1-April 06
Member No.: 29051



Well, I know that there is an Upload section, but I would first like to ask whether anyone would find this little tool useful. It's one of several tools I have written to test Yalac and other lossless audio codecs.

The documentation is in HTML, but I don't know how to insert it here, so I have pasted a plain-text copy. It may look a bit strange.

Overview

I wrote Damage to test the error recognition and recovery abilities of my new lossless audio compressor YALAC.

Damage generates a copy of each user-selected file, appends the extension '.err' and then damages the copy. You can define damage (bit) patterns and the frequency of the damage. A list of the changes made to the data is written to a protocol file.

Command line options

Helpscreen
CODE
DAMAGE files [-e -f -s -w]

files       specify file or directory (Dir\*.ext) to be processed
-e x1 x2... specify up to 5 errors xn of type i or r:
              i 3     = (i)nvert a sequence of 3 bits
              r 10010 = (r)eplace a bit sequence with bits 10010
                        (msb left, up to 40 bits)
            The default definition is:
              i 1 i 2 i 3 i 36 r 000000000000000000000000000000000000
-f r x      relative frequency of damage as errors per MByte.
              Maximum: 128 Default: 1
-f a x      absolute frequency of damage as errors per file.
              Up to 1024, but will also be limited to relative maximum.
-s x        no damage until file position x in bytes. Default: 4096
-w          wait for enter key when finished

files

Specify a single file or use wildcards.

Examples:

d:\VocComp_Data\Sample.wav

Damage file "Sample.wav" in directory "d:\VocComp_Data".

d:\VocComp_Data\*.wav

Damage any file with the extension ".wav" in directory "d:\VocComp_Data".

*.*

Damage any file in the current directory, whatever its extension.

Damage creates files with the same name as the source, but with the extension '.err':

Sample.wav -> Sample.wav.err

Existing files will always be overwritten without any warning!

-e x1 x2...

Specify up to 5 errors xn of type i or r:

i 3

(I)nvert a sequence of 3 bits. All bits are flipped. Valid range: 1 to 40 bits.

r 10010

(R)eplace a bit sequence with the bits 10010. The leading (left) bit is the most significant. Valid range: 1 to 40 bits.

The default definition is:

i 1
i 2
i 3
i 36
r 000000000000000000000000000000000000
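To make the -e syntax concrete, here is a rough Python sketch of how such a definition list could be parsed and validated. It is not Damage's own code: the names `ErrorDef` and `parse_error_defs` are invented for illustration, and only the limits stated above (up to 5 errors, 1 to 40 bits) are taken from the documentation.

```python
from typing import List, NamedTuple

MAX_ERRORS = 5   # "up to 5 errors" per the help screen
MAX_BITS = 40    # "Valid range: 1 to 40 bits"


class ErrorDef(NamedTuple):
    kind: str       # "i" (invert) or "r" (replace)
    bit_count: int  # number of affected bits
    pattern: str    # replacement bits, msb left; empty for "i"


def parse_error_defs(tokens: List[str]) -> List[ErrorDef]:
    """Parse tokens such as ["i", "3", "r", "10010"] into ErrorDef records."""
    if len(tokens) % 2 != 0:
        raise ValueError("each error needs a type and an argument")
    defs = []
    for kind, arg in zip(tokens[0::2], tokens[1::2]):
        if kind == "i":
            if not arg.isdigit():
                raise ValueError(f"bit count expected, got {arg!r}")
            count, pattern = int(arg), ""
        elif kind == "r":
            if not arg or set(arg) - {"0", "1"}:
                raise ValueError(f"bit string expected, got {arg!r}")
            count, pattern = len(arg), arg
        else:
            raise ValueError(f"unknown error type {kind!r}")
        if not 1 <= count <= MAX_BITS:
            raise ValueError(f"bit count {count} outside 1..{MAX_BITS}")
        defs.append(ErrorDef(kind, count, pattern))
    if len(defs) > MAX_ERRORS:
        raise ValueError(f"at most {MAX_ERRORS} errors may be defined")
    return defs


# The documented default set: i 1, i 2, i 3, i 36 and a 36-bit run of zeros.
DEFAULT_DEFS = parse_error_defs(
    ["i", "1", "i", "2", "i", "3", "i", "36", "r", "0" * 36]
)
```

For example, `parse_error_defs(["i", "3", "r", "10010"])` yields one 3-bit inversion and one 5-bit replacement.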

-f

Specify the frequency of the errors. The error patterns (see -e) will be randomly repeated if necessary.

Specify a relative frequency as errors per MByte:

-f r 8

Generates 8 errors per MByte. Maximum: 128.

Or specify the absolute frequency of damage as errors per file:

-f a 25

Generates 25 errors per file. The count will be limited to the relative maximum of 128 per MByte, if the file is small.

In both cases the error count is limited to 1024 errors per file.

The default setting is: -f r 1.
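The interaction of the limits is easier to see in code. The following is only a sketch of the rules described above, not the tool's implementation: `error_count` is a made-up name, one MByte is assumed to be 2**20 bytes, results are rounded down, and the whole file size is used even though -s skips the start of the file. None of these details are stated in the documentation.

```python
MBYTE = 1024 * 1024   # assumption: 1 MByte = 2**20 bytes
MAX_PER_MBYTE = 128   # documented relative maximum
MAX_PER_FILE = 1024   # documented absolute maximum


def error_count(file_size: int, mode: str = "r", x: int = 1) -> int:
    """Number of errors for a file of file_size bytes.

    mode "r": x errors per MByte; mode "a": x errors per file.
    """
    relative_cap = file_size * MAX_PER_MBYTE // MBYTE
    if mode == "r":
        count = file_size * min(x, MAX_PER_MBYTE) // MBYTE
    elif mode == "a":
        count = min(x, relative_cap)
    else:
        raise ValueError("mode must be 'r' or 'a'")
    return min(count, MAX_PER_FILE)
```

Under these assumptions, `-f r 8` on a 10 MByte file gives 80 errors, while `-f a 25` on a 128 KByte file is cut down to 16 by the relative maximum.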

-s x

No damage until file position x in bytes. Default: 4096.

Useful if you don't want to damage a file header.

Protocol file

Damage generates a protocol file "Damage_Result.txt" in the source file directory. It contains a detailed list of any changes performed on the files.

Example:
CODE
D:\VocComp\Tools\ATrain.yaa

No   Position              BitOfs BitNum Original           New value
   1      142614  00022D16      0     36  B1 DA 8A 49 DD     00 00 00 00 D0  
   2      444685  0006C90D      0     36  6F 17 E4 F2 14     90 E8 1B 0D 1B  
   3      771488  000BC5A0      5      1  95                 B5              
   4     1046975  000FF9BF      5      3  9A                 7A              
   5     1264657  00134C11      1      2  78                 7E              
   6     1454473  00163189      5     36  7A F5 C2 36 2A 36  1A 00 00 00 00 36

The file name is followed by a list of the applied errors, one per line.

Position

File position of the first affected byte in bytes. First in decimal, then in hexadecimal representation.

BitOfs

Position (0-7) of the first affected bit in the byte specified by file position.

BitNum

Count of affected bits.

Original - New value

Comparison of the original and the new values after the damage. Both are in hexadecimal notation, least significant (lowest address) byte left.
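The rows of the example can be reproduced with a few lines of Python. This is a sketch and not Damage's actual code: the function name `damage` and its parameters are invented here. It assumes that bits are counted from the least significant bit of the first affected byte, which is what rows 3 to 5 imply (0x95 with BitOfs 5 and BitNum 1 becomes 0xB5). How a non-zero replace pattern is laid out in the file cannot be read off the all-zero example, so that part is a guess.

```python
from typing import Optional


def damage(chunk: bytes, bit_ofs: int, bit_num: int,
           pattern: Optional[int] = None) -> bytes:
    """Return a damaged copy of chunk.

    pattern=None inverts bit_num bits starting at bit_ofs; otherwise the
    low bit_num bits of pattern replace them (assumed layout: lowest
    pattern bit lands on the first affected bit).
    """
    if bit_ofs < 0 or bit_num < 1 or bit_ofs + bit_num > 8 * len(chunk):
        raise ValueError("bit range does not fit into chunk")
    # Little-endian: byte 0 holds bits 0..7, byte 1 holds bits 8..15, ...
    value = int.from_bytes(chunk, "little")
    mask = ((1 << bit_num) - 1) << bit_ofs
    if pattern is None:
        value ^= mask                                    # invert
    else:
        value = (value & ~mask) | ((pattern << bit_ofs) & mask)  # replace
    return value.to_bytes(len(chunk), "little")


# Row 4 of the example: 3 bits inverted from bit offset 5.
row4 = damage(bytes.fromhex("9A"), bit_ofs=5, bit_num=3)
# Row 6 of the example: 36 bits replaced by zeros from bit offset 5.
row6 = damage(bytes.fromhex("7A F5 C2 36 2A 36"), 5, 36, pattern=0)
```

With this bit numbering, all six "New value" columns of the example come out as listed.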
Shade[ST]
post Jul 5 2006, 21:43
Post #2





Group: Members
Posts: 1189
Joined: 19-May 05
From: Montreal, Canada
Member No.: 22144



This program is interesting, but it would be nice if you could set a damage frequency and generate damage randomly. Maybe using the Mersenne Twister could help you. It would also be nice to have random bits instead of just inverted ones or ones replaced with known bits.

The compare feature is nice, though. So is the copy feature, but it should be toggleable.

Also, you should be able to limit damage to a specific zone (in bytes?) in the file, or to limit the damage per zone (e.g. 3 bits changed every 30 bytes, maximum).

Please tell me if this is clear, or if I am asking too much.

You're my favourite developer ;)

Peace,
Tristan.
TBeck
post Jul 5 2006, 22:17
Post #3


TAK Developer


Group: Developer
Posts: 1098
Joined: 1-April 06
Member No.: 29051



QUOTE (Shade[ST] @ Jul 5 2006, 22:43) *
This program is interesting, but it would be nice if you could set a damage frequency and generate damage randomly. Maybe using the Mersenne Twister could help you. It would also be nice to have random bits instead of just inverted ones or ones replaced with known bits.

There is some randomness in the positions. The frequency specification defines intervals, for instance 1 MB per error. The first error will be inserted between 0 and 1.5 MB, the next between the end of the previous one and 2.5 MB, and so on.

My first implementation used random bit patterns, but that was not optimal for my purposes. I like to have control over the conditions; otherwise the results would not be easy to interpret. It seems to make even more sense to use controlled conditions for comparisons between compressors. You can evaluate how resistant they are to, for instance, 1-bit or 2-bit errors, but if you apply random errors and get different results for two compressors, you will not know whether this is caused by the difference between the compressors or by the difference in the test data caused by the randomness.
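The interval scheme described in the paragraph on error positions might look roughly like this in Python. It is a reading of that description rather than the tool's code: `error_positions` and its parameters are hypothetical names, the upper bound of each window is taken as inclusive, and 6 bytes are reserved per error because a 40-bit pattern starting at bit offset 7 can touch at most 6 bytes.

```python
import random
from typing import List, Optional

MAX_ERROR_BYTES = 6  # 40 bits starting at bit offset 7 span at most 6 bytes


def error_positions(file_size: int, interval: int, count: int,
                    start: int = 4096,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Pick count byte positions, one per widening window.

    Error k (counting from 0) is placed between the end of the previous
    error and (k + 1.5) * interval, and never before start.
    """
    rng = rng or random.Random()
    positions = []
    low = start
    for k in range(count):
        high = min(int((k + 1.5) * interval), file_size - MAX_ERROR_BYTES)
        if low > high:
            break  # no room left in this file
        pos = rng.randint(low, high)
        positions.append(pos)
        low = pos + MAX_ERROR_BYTES  # the next error starts after this one
    return positions
```

Passing an explicitly seeded `random.Random` makes the positions repeatable from run to run.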

QUOTE (Shade[ST] @ Jul 5 2006, 22:43) *
Also, you should be able to limit damage to a specific zone (in bytes?) in the file, or to limit the damage per zone (e.g. 3 bits changed every 30 bytes, maximum).

That's indeed useful!

QUOTE (Shade[ST] @ Jul 5 2006, 22:43) *
You're my favourite developer ;)

You know, this one is especially for you...
jcoalson
post Jul 6 2006, 06:58
Post #4


FLAC Developer


Group: Developer
Posts: 1526
Joined: 27-February 02
Member No.: 1408



neat program.

QUOTE (TBeck @ Jul 5 2006, 16:17) *
My first implementation used random bit patterns, but that was not optimal for my purposes. I like to have control over the conditions; otherwise the results would not be easy to interpret. It seems to make even more sense to use controlled conditions for comparisons between compressors. You can evaluate how resistant they are to, for instance, 1-bit or 2-bit errors, but if you apply random errors and get different results for two compressors, you will not know whether this is caused by the difference between the compressors or by the difference in the test data caused by the randomness.

this can mostly be solved by using a pseudo-random generator and exposing the seed as a command-line option.
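A small Python sketch of the idea, with invented names throughout: the `--seed` option, `parse_seed` and `random_patterns` are not part of Damage. Python's `random` module happens to use the Mersenne Twister, so the same seed always gives the same patterns, and two compressors can be tested against identical "random" damage.

```python
import argparse
import random
from typing import List, Optional


def parse_seed(argv: List[str]) -> Optional[int]:
    """Read a hypothetical --seed option; None means 'not given'."""
    parser = argparse.ArgumentParser(prog="damage-sketch")
    parser.add_argument("--seed", type=int, default=None,
                        help="seed for the pseudo-random generator")
    return parser.parse_args(argv).seed


def random_patterns(seed: int, count: int, bit_num: int = 8) -> List[int]:
    """Return count pseudo-random bit patterns of bit_num bits each."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.getrandbits(bit_num) for _ in range(count)]


# The same seed reproduces the same damage patterns on every run.
first_run = random_patterns(1234, 4)
second_run = random_patterns(1234, 4)
```

Writing the seed into the protocol file would let anyone repeat a given test run.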

Josh
SebastianG
post Jul 6 2006, 08:06
Post #5





Group: Developer
Posts: 1318
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



I just want to say that real errors are usually "bursts" (a group of consecutive bytes that are totally wrong, i.e. a whole sector).

This could be modeled via a two-state system. Depending on the state, the current bit will either be kept or replaced by a randomly chosen one. After processing a bit you either stay in the same state or change to the other, based on a randomly chosen number and a threshold. Obviously the model's parameters are the two thresholds (one for each state). The initial state should be the "keep-original-data-state".

I just realized that this is actually a simulation of a Markov process.
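A minimal Python sketch of this two-state model, under a few assumptions of mine: the function and parameter names are made up, the thresholds are expressed as probabilities `p_enter` and `p_leave`, and bits are visited least significant first within each byte.

```python
import random


def burst_damage(data: bytes, p_enter: float, p_leave: float,
                 seed: int = 0) -> bytes:
    """Damage data with a two-state (good/burst) Markov model.

    In the good state bits are kept; in the burst state each bit is
    replaced by a random one. After every bit the state may change:
    good -> burst with probability p_enter, burst -> good with p_leave.
    """
    rng = random.Random(seed)
    out = bytearray(data)
    in_burst = False  # initial state: keep original data
    for bit_index in range(len(out) * 8):
        byte, bit = divmod(bit_index, 8)
        if in_burst:
            if rng.getrandbits(1):
                out[byte] |= 1 << bit            # random bit is 1
            else:
                out[byte] &= ~(1 << bit) & 0xFF  # random bit is 0
        # State transition after processing the bit.
        if in_burst:
            if rng.random() < p_leave:
                in_burst = False
        elif rng.random() < p_enter:
            in_burst = True
    return bytes(out)
```

The burst length is geometrically distributed with a mean of 1 / p_leave bits, so p_leave = 1 / 4096 gives bursts of about one 512-byte sector on average.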

This post has been edited by SebastianG: Jul 6 2006, 08:16
TBeck
post Jul 6 2006, 08:29
Post #6


TAK Developer


Group: Developer
Posts: 1098
Joined: 1-April 06
Member No.: 29051



QUOTE (jcoalson @ Jul 6 2006, 07:58) *
neat program.

Yes, nothing special. But it's possibly easier than using a hex editor for the intended purpose.

QUOTE (jcoalson @ Jul 6 2006, 07:58) *
this can mostly be solved by using a pseudo-random generator and exposing the seed as a command-line option.

Good idea! I will add this as an option.
TBeck
post Jul 6 2006, 09:31
Post #7


TAK Developer


Group: Developer
Posts: 1098
Joined: 1-April 06
Member No.: 29051



QUOTE (SebastianG @ Jul 6 2006, 09:06) *
I just want to say that real errors are usually "bursts" (a group of consecutive bytes that are totally wrong, i.e. a whole sector).

Good point! Possibly I will add an option to generate such errors.

I know that the current implementation is very limited. It would be nice to be able to specify some model of the expected errors: error types, distribution of the error types, distribution of distances between errors and possibly more. But this is currently beyond the scope of my quick and dirty tool.

QUOTE (SebastianG @ Jul 6 2006, 09:06) *
This could be modeled via a two-state system. Depending on the state, the current bit will either be kept or replaced by a randomly chosen one. After processing a bit you either stay in the same state or change to the other, based on a randomly chosen number and a threshold. Obviously the model's parameters are the two thresholds (one for each state). The initial state should be the "keep-original-data-state".

I just realized that this is actually a simulation of a Markov process.

That's interesting. Honestly, I know almost nothing about Markov processes, but this might be a good starting point for me if I want to improve this tiny tool.
cabbagerat
post Jul 6 2006, 16:10
Post #8





Group: Members
Posts: 1018
Joined: 27-September 03
From: Cape Town
Member No.: 9042



While I think a complete error simulation would be overkill, support for both random bit errors and burst errors would be useful. This is because error correction schemes respond differently to these two different classes of errors and may perform very well on one and really badly on the other.


--------------------
Simulate your radar: http://www.brooker.co.za/fers/
TBeck
post Jul 6 2006, 18:17
Post #9


TAK Developer


Group: Developer
Posts: 1098
Joined: 1-April 06
Member No.: 29051



QUOTE (cabbagerat @ Jul 6 2006, 17:10) *
While I think a complete error simulation would be overkill, support for both random bit errors and burst errors would be useful. This is because error correction schemes respond differently to these two different classes of errors and may perform very well on one and really badly on the other.

I don't see a need for random patterns. The default test set already provides some randomness:

- The test patterns are applied to random positions within the file.
- The inversion of existing bits at random positions creates different patterns according to the variations of the original bits. Obvious exception: if the data bits don't vary (for instance all zero), the inversion brings no variation. But this is very unlikely to happen with compressed data, which should be random to some degree.

I am interested in two test cases:

1) If the data is protected by a CRC, which bit patterns will stay undetected?

To simplify a bit: CRCs should be able to detect any 1-bit error and any (single) 2-bit error if the data size isn't too big. Bursts should be detected up to the bit count of the CRC (again simplified). Damage's default test set damages 1, 2, 3 and 36 bits, so there is already a small chance of slipping through a CRC-32.

2) If the data is not protected, what happens to the decoder?

In this case even a single bit error in the right place can get the decoder into trouble.
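Test case 1 can be tried out on a small scale with the CRC-32 from Python's `zlib` module. This is an illustrative sketch only; `undetected_inversions` is a made-up helper, and the CRC used by any particular codec may differ. It inverts a run of bits at every possible offset of a block and counts how often the checksum fails to change.

```python
import zlib


def undetected_inversions(data: bytes, bit_num: int) -> int:
    """Count offsets where inverting bit_num consecutive bits goes unnoticed."""
    reference = zlib.crc32(data)
    # Little-endian integer: bit i of value is bit (i % 8) of byte (i // 8).
    value = int.from_bytes(data, "little")
    run = (1 << bit_num) - 1
    missed = 0
    for bit_ofs in range(len(data) * 8 - bit_num + 1):
        damaged = (value ^ (run << bit_ofs)).to_bytes(len(data), "little")
        if zlib.crc32(damaged) == reference:
            missed += 1
    return missed


block = bytes(range(64))
# Runs no longer than the CRC width (32 bits) are always caught.
misses = {n: undetected_inversions(block, n) for n in (1, 2, 3, 32)}
```

For runs of up to 32 bits the count stays at zero, in line with the burst rule above. Longer damage that depends on the original data, such as replacing 36 bits with zeros, is where the small residual chance of an undetected error comes from.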
cabbagerat
post Jul 6 2006, 18:31
Post #10





Group: Members
Posts: 1018
Joined: 27-September 03
From: Cape Town
Member No.: 9042



QUOTE (TBeck @ Jul 6 2006, 09:17) *
QUOTE (cabbagerat @ Jul 6 2006, 17:10) *
While I think a complete error simulation would be overkill, support for both random bit errors and burst errors would be useful. This is because error correction schemes respond differently to these two different classes of errors and may perform very well on one and really badly on the other.

I don't see a need for random patterns. The default test set already provides some randomness:

- The test patterns are applied to random positions within the file.
- The inversion of existing bits at random positions creates different patterns according to the variations of the original bits. Obvious exception: if the data bits don't vary (for instance all zero), the inversion brings no variation. But this is very unlikely to happen with compressed data, which should be random to some degree.

That's more or less what I meant by "random bit errors": the sort of errors that noise introduces on some sorts of channel. Maybe what I should have said is "errors where every bit in the file has the same finite probability of being flipped", with the second kind being "errors where the probability that a bit is flipped depends on recent bit flips". The first kind you have already implemented, and some people have already suggested the second kind.

These two types of errors will provide decent coverage for testing commonly used ECC schemes, ranging from the simple (CRC) to more complex things like VRS and Turbo codes. I don't know if any user file formats actually use these more advanced schemes, but it would still be a cool feature, IMHO.
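For the first kind, where every bit has the same probability of being flipped, a sketch in Python could be as short as this. The name `flip_bits_independently` and the parameter `p` are mine, not taken from any existing tool.

```python
import random


def flip_bits_independently(data: bytes, p: float, seed: int = 0) -> bytes:
    """Flip every bit of data independently with probability p."""
    rng = random.Random(seed)
    out = bytearray(data)
    for i in range(len(out) * 8):
        if rng.random() < p:          # same chance for every bit
            out[i // 8] ^= 1 << (i % 8)
    return bytes(out)
```

On average this flips p * 8 * len(data) bits, with no memory of earlier flips, which is exactly what distinguishes it from the burst kind.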

Anyways, it looks like a cool piece of software and is a great idea. Good work.


