Reading Flac tags using C#

Reply #25 – 2014-05-05 06:21:51

Quote from: lvqcl on 2014-05-04 12:42:59

Quote from: pkfox on 2014-05-04 09:25:12
I don't know if I'm reading the files correctly as the data after the "fLaC" marker is very odd looking ( I'm reading 4 bytes at a time into a byte array ) smiley faces and musical notes - do I need to convert these values ?

What did you expect to see - numbers in text format? FLAC is a binary format so you have to interpret these 32 bits correctly:
Quote from: nu774 on 2014-05-04 11:52:01
1bit flag, followed by 7bit BLOCK_TYPE, followed by 24bit length of the metadata

Hi there and thanks, how do I interpret the data ?

Reply #26 – 2014-05-05 07:13:31

Quote from: pkfox on 2014-05-05 06:21:51

Hi there and thanks, how do I interpret the data ?

See the structure here:

https://xiph.org/flac/format.html#metadata_block_header

But basically the first (most significant) bit tells you if it's the last metadata block, the next 7 bits encode the block type, and the remaining 3 bytes are the length of the block.
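
A minimal sketch of that layout in C, to match the snippets used elsewhere in this thread. The struct and function names are made up for illustration (they are not from libFLAC), and it assumes the 4 header bytes have already been read from the file in order:

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef struct {
    int      is_last;  /* 1 if this is the last metadata block before the audio */
    unsigned type;     /* 7-bit block type, e.g. 0 = STREAMINFO, 4 = VORBIS_COMMENT */
    uint32_t length;   /* number of bytes of block data following the header */
} flac_block_header;

/* Decode the 4 header bytes exactly as they appear in the file. */
static flac_block_header flac_decode_header(const unsigned char h[4])
{
    flac_block_header out;
    out.is_last = (h[0] & 0x80) != 0;   /* top bit of the first byte */
    out.type    = h[0] & 0x7F;          /* remaining 7 bits of the first byte */
    out.length  = ((uint32_t)h[1] << 16) |
                  ((uint32_t)h[2] << 8)  |
                   (uint32_t)h[3];      /* 24-bit big-endian length */
    return out;
}
```

For the bytes 84 00 01 02 this gives is_last = 1, type = 4 and length = 258. The same masks and shifts should carry over to C# on a byte[].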

Reply #27 – 2014-05-05 10:09:15

Quote from: lvqcl on 2014-05-04 12:42:59

Quote from: pkfox on 2014-05-04 09:25:12
I don't know if I'm reading the files correctly as the data after the "fLaC" marker is very odd looking ( I'm reading 4 bytes at a time into a byte array ) smiley faces and musical notes - do I need to convert these values ?

What did you expect to see - numbers in text format? FLAC is a binary format so you have to interpret these 32 bits correctly:
Quote from: nu774 on 2014-05-04 11:52:01
1bit flag, followed by 7bit BLOCK_TYPE, followed by 24bit length of the metadata

How do I interpret these 32 bits correctly ?

Reply #28 – 2014-05-05 10:14:20

Quote from: saratoga on 2014-05-05 07:13:31

Quote from: pkfox on 2014-05-05 06:21:51
Hi there and thanks, how do I interpret the data ?

See the structure here:

https://xiph.org/flac/format.html#metadata_block_header

But basically the first (most significant) bit tells you if it's the last metadata block, the next 7 bits encode the block type, and the remaining 3 bytes are the length of the block.

Pardon my ignorance but I'm reading the data into a 4 byte array which I take to mean 32 bits am I right in thinking this ?

Reply #29 – 2014-05-05 13:05:09

Quote from: pkfox on 2014-05-05 10:14:20

Pardon my ignorance but I'm reading the data into a 4 byte array which I take to mean 32 bits am I right in thinking this ?

Yes - but remember that the byte order of the header fields is big-endian. With this in mind, you have to be sure how your platform treats multi-byte reads. If they are read as little-endian then you need to take that into account.

Reply #30 – 2014-05-05 16:59:16

Quote from: pkfox on 2014-05-05 10:14:20

Pardon my ignorance but I'm reading the data into a 4 byte array which I take to mean 32 bits am I right in thinking this ?

Yes that is correct. There are 8 bits in a byte, so if you have 4 bytes you also have 32 bits.

Anyway hydrogen audio may not be the best place for programming questions, and parsing compression formats may not be the best way to learn about programming. I would use the libraries other people linked.

Reply #31 – 2014-05-05 18:03:29

You have an unsigned char[4], possibly as a pointer. Best to make it unsigned because you don't want large numbers getting treated as negative.

The first member of the array has to be treated as a bitmap. The first (most significant) bit indicates whether this metadata block is the last one before the audio data. You can test it using byte_one & 0x80, or ignore it using byte_one & 0x7F.

The rest of the first byte is a number indicating the type of metadata block. The STREAMINFO block is type 0, the VORBIS_COMMENT block (the one holding the tags) is type 4.

The other three bytes are to be interpreted as a big-endian number, meaning the first byte (byte two of the array) is the most significant (largest) part of the number. Best not to rely on the compiler and architecture (unless you want to get into different code based on defines about which endian is in effect) and just build yourself a 24-bit number (unsigned still!):

Code:

uint32_t length = byte[1]<<16 | byte[2]<<8 | byte[3];

Note that I had to use a 32 bit integer even though the number is only 24 bits long, because there is no 24-bit numeric data type in C.

So now you know how much metadata there is. However, you probably don't need to know that, since the Vorbis comment block itself contains its own internal length indicators. The overall length can be a valuable sanity check though. You don't want to be parsing the block if the length field says it is zero bytes long. Your next challenge comes with parsing the Vorbis comment block itself, where the numbers are all little-endian.

Reply #32 – 2014-05-05 18:56:52

Quote from: lithopsian on 2014-05-05 18:03:29

The other three bytes are to be interpreted as a big-endian number, meaning the first byte (byte two of the array) is the most significant (largest) part of the number. Best not to rely on the compiler and architecture (unless you want to get into different code based on defines about which endian is in effect) and just build yourself a 24-bit number (unsigned still!):
Code:
uint32_t length = byte[1]<<16 | byte[2]<<8 | byte[3];

Surely big-endian also means that the bytes are in reverse-bit order compared to little-endian ints in C?

Reply #33 – 2014-05-05 19:15:36

C is "endian-neutral". On some platforms it will use little endian, on others big-endian. Unless you are exchanging data with a fixed format such as a file or internet stream then you never see these details. This is driven by the underlying processor architecture.

While the byte order of a particular multi-byte data type varies with the endianness, C will always treat such types as a continuous sequence of bits, in this case 32 of them. You can shift up and down those 32 bits without worrying about which order the underlying bytes are arranged in.

What you can't do is map or cast those 32 bits onto a non-endian data type such as a character array and expect the bytes to be consistent between platforms. There are only two possibilities, and you can determine whether the high order byte is the first or last using compiler-defines, but usually it is best to avoid this sort of assumption. While fixed byte orders in files such as Flac correspond to big- or little-endian byte arrangements in memory, you do yourself a favour if you just treat them as raw arrangements of bits and don't try to get cute about whether they will map into one particular memory arrangement or another.
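
To make that concrete, here is a small sketch in C contrasting the two approaches. The function names are hypothetical; the first builds the value with shifts, the second copies the bytes straight into an integer, which is the kind of mapping to avoid:

```c
#include <stdint.h>
#include <string.h>

/* Portable: the result depends only on the order of the bytes in the file. */
static uint32_t read_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Not portable: the bytes are copied into the integer as-is, so the value
   depends on the byte order of the machine running the code. */
static uint32_t read_native32(const unsigned char b[4])
{
    uint32_t v;
    memcpy(&v, b, sizeof v);
    return v;
}
```

For the bytes 00 00 00 22, read_be32 returns 34 on any machine. read_native32 returns 34 on a big-endian machine but 0x22000000 on a little-endian one, which is why the shifting version is the safer habit.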

Reply #34 – 2014-05-05 19:16:40

I just tried to edit my first reply to clarify a few expressions, but apparently too late, so apologies if anything is a little confusing or unclear.

Reply #35 – 2014-05-05 19:18:45

Quote from: Nick.C on 2014-05-05 18:56:52

Surely big-endian also means that the bytes are in reverse-bit order compared to little-endian ints in C?

No, bytes are bytes. http://en.wikipedia.org/wiki/Endianness#At...ment_size_8-bit
Also, http://en.wikipedia.org/wiki/Endianness#.22Bit_endianness.22

Reply #36 – 2014-05-05 19:29:27

Ta both.

Reply #37 – 2014-05-05 22:30:33

The equivalent if the bytes were specified to be little-endian (the numbers inside the Vorbis comment block are 32 bits, 4 bytes, little-endian):

Code:

uint32_t length = byte[0] | byte[1]<<8 | byte[2]<<16 | (uint32_t)byte[3]<<24;

One or other of those would work on your machine if you just mapped the bytes straight onto an integer, but the other one would not. Which one depends on the machine you compile on/for.
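
A sketch of the little-endian read as a small C function (the name is made up for illustration). The casts keep every shift in unsigned 32-bit arithmetic; without one, shifting a byte of 0x80 or more left by 24 happens in a signed int and can overflow:

```c
#include <stdint.h>

/* Read a 32-bit little-endian value: the first byte is the least significant. */
static uint32_t read_le32(const unsigned char b[4])
{
    return  (uint32_t)b[0]        |
           ((uint32_t)b[1] << 8)  |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}
```

For the bytes 78 56 34 92 this returns 0x92345678.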

Reply #38 – 2014-05-06 08:25:17

Quote from: lithopsian on 2014-05-05 18:03:29

You have an unsigned char[4], possibly as a pointer. Best to make it unsigned because you don't want large numbers getting treated as negative.

The first member of the array has to be treated as a bitmap. The first (most significant) bit indicates whether this metadata block is the last one before the audio data. You can test it using byte_one & 0x80, or ignore it using byte_one & 0x7F.

The rest of the first byte is a number indicating the type of metadata block. The STREAMINFO block is type 0, the VORBIS_COMMENT block (the one holding the tags) is type 4.

The other three bytes are to be interpreted as a big-endian number, meaning the first byte (byte two of the array) is the most significant (largest) part of the number. Best not to rely on the compiler and architecture (unless you want to get into different code based on defines about which endian is in effect) and just build yourself a 24-bit number (unsigned still!):
Code:
uint32_t length = byte[1]<<16 | byte[2]<<8 | byte[3];
Note that I had to use a 32 bit integer even though the number is only 24 bits long, because there is no 24-bit numeric data type in C.

So now you know how much metadata there is. However, you probably don't need to know that, since the Vorbis comment block itself contains its own internal length indicators. The overall length can be a valuable sanity check though. You don't want to be parsing the block if the length field says it is zero bytes long. Your next challenge comes with parsing the Vorbis comment block itself, where the numbers are all little-endian.

Good lord no wonder I wasn't getting anywhere, how do I check if the block is of type 4 ? And thanks for your time.

Reply #39 – 2014-05-06 11:14:01

Quote from: lithopsian on 2014-05-05 19:16:40

I just tried to edit my first reply to clarify a few expressions, but apparently too late, so apologies if anything is a little confusing or unclear.

Hi there and thanks very much for your help. I'm a professional IT man but have never messed with stuff at bit level, so forgive me if my questions seem a bit basic. My basic read code in C# is this:

Code:

using System;
using System.IO;

int BlockSize = 4;
int BytesRead = 0;
byte[] block;

// filename holds the path to the FLAC file
using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
using (BinaryReader br = new BinaryReader(fs))
{
    // Read the file in 4 byte chunks.
    // ReadBytes never returns null; at end of file it returns a shorter (or empty) array.
    while ((block = br.ReadBytes(BlockSize)).Length == BlockSize)
    {
        BytesRead += BlockSize;
        // on the first pass block == "fLaC" as expected.
        // the second has block[0] = 0 , block[1] = 0, block[2] = 0, block[3] = 34.
        // if I apply your code (the cast is needed because the shifts produce an int)
        uint length = (uint)(block[1] << 16 | block[2] << 8 | block[3]);
        // Unsurprisingly I get 34 which is the value I have in block[3] and the other elements are 0 - but interestingly
        // 34 hex = 52 decimal which is the ASCII code for '4' which is the block type I'm looking for.
    }
}

In my ignorance of bit manipulation I'm guessing that on my second read the values of my byte array are interpreted thus

Code:

block[0] = 0 // which means this is not the last block.
block[1] + block[2] + block[3] = the block type, which in this case is 4 if my guess that the 34 value is in hex is correct.

How am I doing ?

Reply #40 – 2014-05-06 15:02:13

Quote from: lithopsian on 2014-05-05 18:03:29

You have an unsigned char[4], possibly as a pointer. Best to make it unsigned because you don't want large numbers getting treated as negative.

The first member of the array has to be treated as a bitmap. The first (most significant) bit indicates whether this metadata block is the last one before the audio data. You can test it using byte_one & 0x80, or ignore it using byte_one & 0x7F.

The rest of the first byte is a number indicating the type of metadata block. The STREAMINFO block is type 0, the VORBIS_COMMENT block (the one holding the tags) is type 4.

The other three bytes are to be interpreted as a big-endian number, meaning the first byte (byte two of the array) is the most significant (largest) part of the number. Best not to rely on the compiler and architecture (unless you want to get into different code based on defines about which endian is in effect) and just build yourself a 24-bit number (unsigned still!):
Code:
uint32_t length = byte[1]<<16 | byte[2]<<8 | byte[3];
Note that I had to use a 32 bit integer even though the number is only 24 bits long, because there is no 24-bit numeric data type in C.

So now you know how much metadata there is. However, you probably don't need to know that, since the Vorbis comment block itself contains its own internal length indicators. The overall length can be a valuable sanity check though. You don't want to be parsing the block if the length field says it is zero bytes long. Your next challenge comes with parsing the Vorbis comment block itself, where the numbers are all little-endian.

Hi again, I checked the "endianness" of my C# setup with BitConverter.IsLittleEndian and it reports little-endian, so I guess I need to reverse the byte order?

Reply #41 – 2014-05-08 11:56:22

9 out of 0x0A programmers never have to go near endianness, but it is an important concept when transferring multi-byte numbers in a completely portable way. The whole point of the bit-shifting code is that you still shouldn't have to worry about the endianness of your machine, only about arranging 24 or 32 bits in the correct way in your file.

I see you are also struggling with the concept of whether "4" should be encoded as a binary 4 or ASCII 4, and also between bits and bytes. The numbers in the Flac metadata blocks are not stored as ASCII digits. A "4" is stored as 0x04 (or \4 in octal), not 0x34, ignoring for now multi-byte mappings.

The metadata block type is only the first byte, or to be more accurate the lowest 7 bits of the first byte. The other three bytes are the length of the metadata block. So the "4" is stored simply as 0x04 in a single byte. If it is the last metadata block, then set the high order bit and you get 0x84. You could set the bit simply by adding 0x80 to your block type number, but since you are setting a bit, doing 0x04 | 0x80 would be clearer. You could also do it with octal, for example '\4' | '\200', giving '\204'. Similarly, when reading the first byte (block[0] holds both the last-block bit and the block type number), you can check the bit using block[0] & 0x80, and you can obtain the block type number (excluding the high order bit) using block[0] & 0x7F. Note that endianness is not relevant for this single byte.

For the block length, now you are constructing a multi-byte number and should be doing your bit-shifting. Remember, still numbers, not ASCII digits. If block[3] really does contain 0x34 (ignoring the two higher order, bigger, bytes) then that represents a length of 52 bytes, the length of the metadata block itself (excluding the four header bytes). Quite a coincidence, but might be right. Are you parsing a real Flac file? Lengths smaller than 256 bytes will only have block[3] set, so if you want to really test the bit-shifting code then you'll need bigger lengths.

P.S. Your compiler is neither big endian nor little endian. It can do either, but it detects and reports the relevant endianness for the machine you are running it on.
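
Going the other way, a sketch of how such a header could be written in C. The function name and its return convention are made up for illustration:

```c
#include <stdint.h>

/* Write a 4-byte metadata block header into out.
   Returns 0 on success, -1 if the type or length does not fit its field. */
static int flac_encode_header(unsigned char out[4], unsigned type,
                              int is_last, uint32_t length)
{
    if (type > 0x7Fu || length > 0xFFFFFFu)
        return -1;
    out[0] = (unsigned char)(type | (is_last ? 0x80u : 0u)); /* flag bit + 7-bit type */
    out[1] = (unsigned char)((length >> 16) & 0xFF);         /* most significant byte */
    out[2] = (unsigned char)((length >> 8) & 0xFF);
    out[3] = (unsigned char)(length & 0xFF);                 /* least significant byte */
    return 0;
}
```

Encoding type 4 as the last block gives a first byte of 0x84. A length of 34 decimal ends up as 0x22 in the fourth byte, whereas 0x34 would mean 52 (and happens to be the ASCII code for the character '4').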

Reply #42 – 2014-05-08 15:41:05

Quote from: lithopsian on 2014-05-08 11:56:22

If block[3] really does contain 0x34 then (ignoring the two higher order, bigger, bytes) then that represents a length of 52 characters

I'm sure that it contains 34 (dec), not 0x34.

Reply #43 – 2014-05-09 12:30:40

Quote from: lithopsian on 2014-05-08 11:56:22

9 out of 0x0A programmers never have to go near endianness, but it is an important concept when transferring multi-byte numbers in a completely portable way. The whole point of the bit-shifting code is that you still shouldn't have to worry about the endianness of your machine, only about arranging 24 or 32 bits in the correct way in your file.

I see you are also struggling with the concept of whether "4" should be encoded as a binary 4 or ASCII 4, and also between bits and bytes. The numbers in the Flac metadata blocks are not stored as ASCII digits. A "4" is stored as 0x04 (or \4 in octal), not 0x34, ignoring for now multi-byte mappings.

The metadata block type is only the first byte, or to be more accurate the lowest 7 bits of the first byte. The other three bytes are the length of the metadata block. So the "4" is stored simply as 0x04 in a single byte. If it is the last metadata block, then set the high order bit and you get 0x84. You could set the bit simply by adding 0x80 to your block type number, but since you are setting a bit, doing 0x04 | 0x80 would be clearer. You could also do it with octal, for example '\4' | '\200', giving '\204'. Similarly, when reading the first byte (block[0] holds both the last-block bit and the block type number), you can check the bit using block[0] & 0x80, and you can obtain the block type number (excluding the high order bit) using block[0] & 0x7F. Note that endianness is not relevant for this single byte.

For the block length, now you are constructing a multi-byte number and should be doing your bit-shifting. Remember, still numbers, not ASCII digits. If block[3] really does contain 0x34 (ignoring the two higher order, bigger, bytes) then that represents a length of 52 bytes, the length of the metadata block itself (excluding the four header bytes). Quite a coincidence, but might be right. Are you parsing a real Flac file? Lengths smaller than 256 bytes will only have block[3] set, so if you want to really test the bit-shifting code then you'll need bigger lengths.

P.S. Your compiler is neither big endian nor little endian. It can do either, but it detects and reports the relevant endianness for the machine you are running it on.

Hi there and thanks again. To be clear, are all the values I see in my byte array in hex? Obviously 0x04 is just 4. The problem (one of them!) I'm having is that I'm reading the file in 4 byte chunks and the first byte, which you say should be 0x04 for a comment block, is never there! If I did find a value of 0x04 in the first byte, would your bit shifting code on the remaining 3 bytes give me the length of the meta block? Also, can you show me how I would code the test for 0x04 being in the first byte of the block array? if(block[0] & 0x7F) ? Thank you very much for your help.

Reply #44 – 2014-05-09 12:42:36

Quote from: lvqcl on 2014-05-08 15:41:05

Quote from: lithopsian on 2014-05-08 11:56:22
If block[3] really does contain 0x34 then (ignoring the two higher order, bigger, bytes) then that represents a length of 52 characters

I'm sure that it contains 34 (dec), not 0x34.

Hi there, why do you think that ?

Reply #45 – 2014-05-09 12:46:21

The vorbis comment block is never the first block after fLaC. The first block is always streaminfo and its magic byte is 0x00. It is almost never 0x80, because streaminfo is rarely the last block. The streaminfo block has a fixed length, but still has a length indicator in the three bytes following the 0x00: always 0x00, 0x00, and 0x22, indicating 34 bytes.

Following that there are other metadata blocks, which may or may not be vorbis comment blocks. The seektable seems to be a common second block, with magic byte 0x03, followed by the vorbis comment block, usually with a padding block last. You have to parse through the blocks in order: read the type, read the length, then either parse that length, or skip over it to the next block.

You check the block type very easily:

Code:

if ((block[0] & 0x7F) == 4) // then vorbis comment block

If you're confused, look at a Flac file with a hex editor. If you can't read through it that way then you'll never be able to code your way through it.
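
The read-type, read-length, parse-or-skip loop described above could be sketched in C like this. It assumes the whole file (or at least its metadata) is already in a memory buffer, and the function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Search a FLAC file held in memory for the first metadata block of the
   wanted type. On success returns 1 and stores the offset and length of the
   block data (the bytes after its 4-byte header). Returns 0 if the block is
   not found or the data is malformed. */
static int flac_find_block(const unsigned char *buf, size_t size, unsigned wanted,
                           size_t *data_offset, uint32_t *data_length)
{
    size_t pos = 4;                     /* first header follows the "fLaC" marker */

    if (size < 4 || memcmp(buf, "fLaC", 4) != 0)
        return 0;

    while (size - pos >= 4) {
        int      is_last = (buf[pos] & 0x80) != 0;
        unsigned type    = buf[pos] & 0x7F;
        uint32_t length  = ((uint32_t)buf[pos + 1] << 16) |
                           ((uint32_t)buf[pos + 2] << 8)  |
                            (uint32_t)buf[pos + 3];
        pos += 4;
        if (length > size - pos)
            return 0;                   /* length runs past the end of the data */
        if (type == wanted) {
            *data_offset = pos;
            *data_length = length;
            return 1;
        }
        if (is_last)
            return 0;                   /* no more metadata, audio frames follow */
        pos += length;                  /* skip this block's data */
    }
    return 0;
}
```

Calling it with wanted set to 4 looks for the vorbis comment block. Reading the file in fixed 4 byte chunks without skipping by the length, as in the earlier loop, would land in the middle of the streaminfo data instead of on the next header.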

Reply #46 – 2014-05-09 13:22:28

Quote from: lithopsian on 2014-05-09 12:46:21

The vorbis comment block is never the first block after fLaC. The first block is always streaminfo and its magic byte is 0x00. It is almost never 0x80, because streaminfo is rarely the last block. The streaminfo block has a fixed length, but still has a length indicator in the three bytes following the 0x00: always 0x00, 0x00, and 0x22, indicating 34 bytes.

Following that there are other metadata blocks, which may or may not be vorbis comment blocks. The seektable seems to be a common second block, with magic byte 0x03, followed by the vorbis comment block, usually with a padding block last. You have to parse through the blocks in order: read the type, read the length, then either parse that length, or skip over it to the next block.

You check the block type very easily:
Code:
if ((block[0] & 0x7F) == 4) // then vorbis comment block
If you're confused, look at a Flac file with a hex editor. If you can't read through it that way then you'll never be able to code your way through it.

Ok thanks - apologies for pestering you but I really want to understand all this stuff :-)

Reply #47 – 2014-05-09 14:01:02

The vorbis comment metadata block is a good one to look at in a hex editor, because the field contents are readable ascii (usually!) and so it is easy to find the length markers between them. Looking at the length indicators inside a vorbis comment block and those of the metadata block headers will also make very clear to you the meaning of big- and little-endianness. Again, don't get hung up on names, but see that the four (or three) bytes making up the length are in a different order in the file.
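
For reference, a sketch in C of walking those little-endian length markers. It assumes the usual Vorbis comment layout as stored in FLAC (vendor length, vendor string, comment count, then a length and a NAME=value string per comment), and the helper names are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit little-endian read, as used inside the vorbis comment block. */
static uint32_t vc_read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* ASCII-only lower-casing, so "artist" matches "ARTIST". */
static int vc_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Look up a tag such as "ARTIST" in the data of a vorbis comment block.
   Returns a pointer to the value (not NUL-terminated) and stores its length,
   or returns NULL if the tag is missing or the data is malformed. */
static const unsigned char *vc_find_tag(const unsigned char *data, size_t size,
                                        const char *name, size_t *value_len)
{
    size_t pos = 0, name_len = 0, i;
    uint32_t len, count;

    while (name[name_len] != '\0')
        name_len++;

    /* vendor string: 4-byte length, then that many bytes */
    if (size - pos < 4) return NULL;
    len = vc_read_le32(data + pos); pos += 4;
    if (len > size - pos) return NULL;
    pos += len;

    /* number of comments that follow */
    if (size - pos < 4) return NULL;
    count = vc_read_le32(data + pos); pos += 4;

    while (count-- > 0) {
        if (size - pos < 4) return NULL;
        len = vc_read_le32(data + pos); pos += 4;
        if (len > size - pos) return NULL;

        /* each comment is NAME=value; compare the NAME part */
        if (len > name_len && data[pos + name_len] == '=') {
            for (i = 0; i < name_len; i++)
                if (vc_lower(data[pos + i]) != vc_lower((unsigned char)name[i]))
                    break;
            if (i == name_len) {
                *value_len = len - name_len - 1;
                return data + pos + name_len + 1;
            }
        }
        pos += len;
    }
    return NULL;
}
```

The data pointer and size here would be the contents of the type 4 block, excluding its 4-byte big-endian header.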

Reply #48 – 2014-05-09 17:06:10

Quote from: lithopsian on 2014-05-09 14:01:02

The vorbis comment metadata block is a good one to look at in a hex editor, because the field contents are readable ascii (usually!) and so it is easy to find the length markers between them. Looking at the length indicators inside a vorbis comment block and those of the metadata block headers will also make very clear to you the meaning of big- and little-endianness. Again, don't get hung up on names, but see that the four (or three) bytes making up the length are in a different order in the file.

I'll have a go thanks again

Reply #49 – 2014-05-16 19:19:54

Not exactly C# and a little messy in design
https://github.com/drogatkin/JustFLAC
However it works for me like a charm
