Topic: Media player Data Structures

Media player Data Structures

Let me provide a little background before I ask my question:

I have been experimenting with various Linux-based media players with regard to how they handle very large music libraries, and so far I have been generally dissatisfied with the performance of all the major offerings (i.e. Rhythmbox, BMP/BMPx, Banshee, etc.). While they all pass as acceptable media players, none has come anywhere close to serving as a drop-in replacement for everyone's favorite media player, foobar2000. My biggest complaint about these players is how they handle my obscenely large music collection (27,000+ tracks). I have been spoiled by the foobar experience, where I can open the player, have it come up near instantaneously, and immediately have access to my album list in my nifty album list panel. From there I can immediately click on an artist/album/track/etc. and start playing. This is with the very same music collection. This feat, however, seems to elude all of the developers of Linux-based media players, as there are unacceptable wait times between each of the steps I mentioned. I know for a fact that this is not a performance bottleneck in Linux/GNOME, because everything else on the same system, for lack of a better term, "hauls ass".

Given the size of the library it has to manage, I see there being two bottlenecks. The first is the backend storage method: some players use relational databases, some use XML files, and so on. The performance of the backend has a huge impact on how quickly the media player loads, since it needs to query the entire database on startup. The second is the method used to juggle all that information in memory (i.e. the data structures). These can be as simple as arrays, lists, and hashtables, or as complex as binary-tree derivatives or various styles of linked lists. This influences the responsiveness of playlists and searches, which consume the majority of the CPU time, actual playback and miscellaneous processing notwithstanding.

(I will use Banshee as my primary example since I have analyzed its source code.) Banshee uses SQLite for its backend; from what I've read, this seems to be the preferred lightweight SQL-oriented storage method. Internally it uses a managed array for its playlists and hashtables for its library (meta info, track names, etc.). OK, fine, this is an acceptable way to do it and works fine for most applications.

So take this implementation and put it under stress. Let's say we import 160 GB of MP3s, MPCs, M4As, FLACs, Oggs, and so on, all properly tagged. This obviously takes a while to process on the initial import, so no big deal; go grab a cup of joe, or a smoke, or what have you. Even foobar takes its time during this process, which is understandable. As an everyday end user, my expectation for the next time I open my media player is to have it come up without any noticeable delay, since all of the data has already been processed, packaged, and filed away; in theory all it needs to do is a quick SQL query to populate its internal library. Well, in practice this is not what happens. Banshee can take as long as one minute before my library is presented to me, and many times longer. Rhythmbox typically takes about 5 minutes before it begins to settle down (this surprises me since it uses some sort of tree structure internally). Either way, this is unbearable. BMPx just crashes (super-alpha software, so it's off the hook).

So let me get to the point. Let's say I want to reimplement these features in these media players, or maybe write a player from scratch. I now have to decide which data structures and backends would be most efficient for my purpose. So my question is: what would a really intelligent developer do in this scenario? Obviously the solution is not to rely on vanilla language constructs like the existing players do; it would make more sense to use something a little more high-performance. What I need is a data structure that can handle very large amounts of data with respectable performance. Ideally I would like to get ahold of the foobar2000 source code and see WWPD (what would Peter do?). It is apparent to me that whatever data structure/algorithm he uses to manage the internal library and playlists clearly rocks. Anyone have any clue regarding his secret ingredients?

I have been doing some reading on efficient data structures, and what seems to have the most potential in this scenario is the skip list. This structure looks like it would perform phenomenally for rapid searching, requiring only a handful of comparisons/operations (12-18) to search a list numbering in the hundreds of thousands. This sounds like a pretty good deal compared to other methods. The best explanation of how a skip list works that I found is here: MSDN Data Structures (SkipList). What is the general consensus on this? Would anyone agree or disagree that this would be an ideal method? Is there something better?
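To make the idea concrete, searching a skip list just walks right along the top level until the next node overshoots, then drops down a level and repeats. Here is a minimal sketch of that search in C# (the node layout is my own illustration, not taken from any existing player):

Code: [Select]
using System;

// Each node carries an array of forward pointers, one per level it participates in.
public class SkipNode<T> where T : IComparable<T>
{
    public T Value;
    public SkipNode<T>[] Forward;

    public SkipNode(T value, int levels)
    {
        Value = value;
        Forward = new SkipNode<T>[levels];
    }
}

public static class SkipListSearch
{
    // 'head' is a sentinel node whose Forward array spans all levels; returns null if not found.
    public static SkipNode<T> Find<T>(SkipNode<T> head, T target, int topLevel) where T : IComparable<T>
    {
        SkipNode<T> current = head;
        for (int level = topLevel; level >= 0; level--)
        {
            // Move right while the next value at this level is still smaller than the target.
            while (current.Forward[level] != null &&
                   current.Forward[level].Value.CompareTo(target) < 0)
            {
                current = current.Forward[level];
            }
            // Overshot at this level: drop down one level and keep going.
        }
        SkipNode<T> candidate = current.Forward[0];
        return (candidate != null && candidate.Value.CompareTo(target) == 0) ? candidate : null;
    }
}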

With that out of the way, I would then want to know what an ideal backend data store would be. Performance being the key motive, would it be worthwhile to just serialize our skip list into a binary file, or are relational databases more suitable? I'm not at my personal computer right now so I can't analyze foobar's database file, but I assume it's a binary format of Peter's design?

I would certainly appreciate any insight into these types of problems. I pose the question on HA because I know almost everyone here is a developer (some for media players), not to mention there are some brilliant mathematical minds as well. I am even open to the theoretical.

On a tangent: if these points were optimized in a Linux-based player, then combined with Mono's recent support for Windows.Forms, it does not seem like such a stretch that a near-identical clone of foobar2000 could actually come to life on the Linux platform. Any plans for Peter to embrace .NET?

Media player Data Structures

Reply #1
A book called Managing Gigabytes might be of help.

Are you into research? If so, there's a doctor at uni that has some papers about efficient data structures. Search citeseer for Edgar Chavez.



Media player Data Structures

Reply #2
A book called Managing Gigabytes might be of help.

Are you into research? If so, there's a doctor at uni that has some papers about efficient data structures. Search citeseer for Edgar Chavez.




Thanks for the info, I'll check it out. Actually, though, I do intend to implement some of these algorithms for the purpose I described and run some benchmarks. If I like the results, I may take it further and build infrastructure around it.

Media player Data Structures

Reply #3
Hmm... I was hoping to get more bites than this. Can someone recommend a good forum to post this question to?

Media player Data Structures

Reply #4
I think you can learn a lot of foobar2000's "secrets" by studying the SDK. Not using a relational database to store metadata is one of them. If I had to name the three most important points, they would be the following:

Memory management

I am not talking about garbage collection in any form here, although foobar2000 does use reference counting for several kinds of objects, but about fast memory allocation. The foobar2000 SDK supports fast allocators for array-like structures - arrays, strings, linear lists, etc. - that allocate memory for a power-of-two number of items instead of the current item count, which speeds up insert operations (especially at the end).
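To illustrate the idea (this is just a sketch in C#, not actual SDK code, which is C++):

Code: [Select]
using System;

// Growable array that always keeps a power-of-two capacity, so appends
// reallocate rarely and inserts at the end stay cheap on average.
public class PowerOfTwoList<T>
{
    private T[] _items = new T[16];
    private int _count;

    public int Count
    {
        get { return _count; }
    }

    public T this[int index]
    {
        get { return _items[index]; }
        set { _items[index] = value; }
    }

    public void Add(T item)
    {
        if (_count == _items.Length)
        {
            // Double the backing store instead of growing it by the exact item count.
            T[] larger = new T[_items.Length * 2];
            Array.Copy(_items, larger, _count);
            _items = larger;
        }
        _items[_count++] = item;
    }
}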


Batch operations

This is probably best illustrated with an example: inserting 1000 items into a list naturally takes longer than inserting one item, but inserting 1000 items in a single operation is a lot faster than inserting them one by one. The situation is similar for deletions. Batch operations are used almost everywhere in the SDK and API, as are linear lists, by the way.
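In C# terms the difference looks roughly like this (List<T> is used purely for illustration):

Code: [Select]
using System.Collections.Generic;

public static class BatchExample
{
    // Inserting at the front one item at a time shifts the tail of the list on every call.
    public static void InsertOneByOne(List<string> playlist, List<string> newTracks)
    {
        foreach (string track in newTracks)
        {
            playlist.Insert(0, track);       // O(n) element moves per insert
        }
    }

    // A single batched insert moves the existing elements only once.
    public static void InsertBatched(List<string> playlist, List<string> newTracks)
    {
        playlist.InsertRange(0, newTracks);  // one shift for the whole batch
    }
}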


Metadata handling

Titleformatting scripts are used pervasively in foobar2000, so fast access to the metadata of a given track is essential to achieving good performance. Let me give you a short description of how metadata handling works in foobar2000.

Every audio track in foobar2000 is identified by a playable location, which is the combination of a URL and a subsong index. (The subsong index is used to handle multiple tracks within one file.) Metadata for a single track is accessed through the "file_info" interface, which contains technical information (a string-to-string map) and user-defined metadata (a string-to-list-of-strings map). Locations and cached metadata are tied together by handles managed by the metadata cache. Each such handle is a pointer to a C++ object, and there is exactly one handle for any location that is currently in use. These handles are the usual form in which tracks or lists of tracks are passed around between components, including the foobar2000 core, with the notable exception of certain parts that deal with I/O. Accessing the cached metadata for a given handle merely requires dereferencing a few pointers - not more than two or three, I think. In reality, there are also a few virtual method calls and the acquisition of a mutex to guarantee thread-safety, but it is still a fast operation.
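A rough C# analogue of that arrangement, purely to illustrate the shape (the real SDK is C++ and none of these type names are real):

Code: [Select]
using System.Collections.Generic;

// Playable location: URL plus subsong index.
public struct PlayableLocation
{
    public string Url;
    public int SubsongIndex;
}

// Analogue of "file_info": technical info is a string-to-string map, user-defined
// metadata is a string-to-list-of-strings map (a field may carry several values).
public class TrackInfo
{
    public Dictionary<string, string> Technical = new Dictionary<string, string>();
    public Dictionary<string, List<string>> Meta = new Dictionary<string, List<string>>();
}

// A handle ties a location to its cached metadata; components pass handles around,
// and reading cached metadata is only a couple of dereferences behind a lock.
public class TrackHandle
{
    public PlayableLocation Location;
    public TrackInfo CachedInfo = new TrackInfo();
}

public class MetadataCache
{
    private readonly object _lock = new object();
    private readonly Dictionary<string, TrackHandle> _handles = new Dictionary<string, TrackHandle>();

    // Exactly one handle exists for any location that is currently in use.
    public TrackHandle GetHandle(PlayableLocation location)
    {
        string key = location.Url + "|" + location.SubsongIndex;
        lock (_lock)
        {
            TrackHandle handle;
            if (!_handles.TryGetValue(key, out handle))
            {
                handle = new TrackHandle();
                handle.Location = location;
                _handles.Add(key, handle);
            }
            return handle;
        }
    }
}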

Media player Data Structures

Reply #5
This is fantastic information. Thank you foosion. I will study the foobar sdk fastidiously. I did get a chance to look at the foobar "database" file which is obviously a binary file. It appears to be nothing more than a serialized data structure, is this a correct assumption?

I have been working on my experiment for the last week or so and have made some pretty decent progress. I have implemented a track object and a playlist object. The track object contains a URL reference as well as a string dictionary holding all the user-defined meta tags. The playlist object implements one of my nifty skip lists, which is optimized by the sorting method. I've done some minimal testing to verify that filling one of these data structures is a fairly quick process: instantiating 200,000 track objects, setting all the common meta fields (artist, album, track number, genre, date, etc.) and then adding them to the playlist takes about 4-5 seconds. Granted, this is all done in memory without having to load any data from disk, but it seems fairly reasonable. Ultimately I plan to rely heavily on caching, similar to foobar, in order to avoid any major overhead. Needless to say, I am optimizing wherever I can; wherever I need a collection, I make sure to use a C# generic to avoid the costs of casting/boxing. For persistence, I think I will follow Peter's example and just serialize one of my skip lists and dump it to a binary file.
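Just to sketch what I mean by dumping to a binary file (this writes a flat list of tracks rather than the skip list itself, and it isn't my actual code, only the general shape):

Code: [Select]
using System.Collections.Generic;
using System.IO;

public class Track
{
    public string Url;
    public Dictionary<string, string> Meta = new Dictionary<string, string>();
}

public static class TrackStore
{
    // Write the track count, then each track's URL and its key/value pairs.
    public static void Save(string path, IList<Track> tracks)
    {
        using (BinaryWriter writer = new BinaryWriter(File.Create(path)))
        {
            writer.Write(tracks.Count);
            foreach (Track track in tracks)
            {
                writer.Write(track.Url);
                writer.Write(track.Meta.Count);
                foreach (KeyValuePair<string, string> pair in track.Meta)
                {
                    writer.Write(pair.Key);
                    writer.Write(pair.Value);
                }
            }
        }
    }

    // Read the whole library back in one sequential pass.
    public static List<Track> Load(string path)
    {
        using (BinaryReader reader = new BinaryReader(File.OpenRead(path)))
        {
            int trackCount = reader.ReadInt32();
            List<Track> tracks = new List<Track>(trackCount);
            for (int i = 0; i < trackCount; i++)
            {
                Track track = new Track();
                track.Url = reader.ReadString();
                int tagCount = reader.ReadInt32();
                for (int j = 0; j < tagCount; j++)
                {
                    track.Meta[reader.ReadString()] = reader.ReadString();
                }
                tracks.Add(track);
            }
            return tracks;
        }
    }
}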

In the process of conducting this experiment, I have started another related experiment involving the rendering of a playlist. Initially I had intended to follow the herd and just make a dumb list box; this was actually another reason I settled on GTK over Windows.Forms, because GTK seemed to be one of the only GUI kits with Mono bindings whose list widget uses the model-view-controller pattern, and I felt it would be self-defeating not to use my optimized skip list as my list data model as well. I analyzed the Banshee source code for an example of implementing a custom MVC-based list, but then I decided to try out an idea I've had for some time instead. Thanks to the availability of gecko#, I have embedded the Gecko rendering engine into my application and I am using it to render my playlist. Furthermore, I have written a custom XML-based template language to define custom playlist formatting (I call it PML, or Playlist Markup Language). This template language essentially allows someone to specify the placement of meta info, the definition of blocks, loop definitions, etc., not at all dissimilar to the way an HTML template language like Zope Page Templates, Cheetah, or Kid works. Here is an example:

Code: [Select]
<?xml version="1.0" encoding="ISO-8859-1"?>
<playlist grouping="album">
    <loop collection="playlist.members" iter="album">
        <block name="album_block">
            <image name="coverart" value="album.coverart"/>
            <block name="album_title_block">
                <label name="artist" value="album.artist"/>
                <label name="album" value="album.album"/>
            </block>
            <loop collection="album.members" iter="track">
                <block name="track_block">
                    <label name="tracknumber" value="track.tracknumber"/>
                    <label name="title" value="track.title"/>
                    <label name="duration" value="track.duration"/>    
                </block>
            </loop>
        </block>
    </loop>
</playlist>


When this is loaded and associated with a playlist, the tags are bound to the actual meta info and converted into displayable HTML. This HTML is then combined with a CSS stylesheet and fed to the Gecko rendering canvas, presenting us with a very pretty DHTML-based playlist. (I can post a screenshot if anyone is interested.) I think this method allows for some very interesting customization possibilities, which I will leave open for speculation. I have already written a functional parser and tested it successfully with the very PML above. There is still a lot of work to be done, as my parser is not very efficient in how it handles the loops yet (it completely chokes on my 200,000-track test playlist), but I already know what I need to do to optimize it.

Anyways, I welcome comment.

Edit:
P.S.: In order to be fair to the developer of the Banshee player, I determined that the reason it was taking so long to load my playlist was actually due to the DAP plugin and not the player itself. Banshee actually rocks and I am studying the source code for ideas. I have actually considered contributing to this project during the course of my experimentation.

Media player Data Structures

Reply #6
You would get nothing from looking at the code.  Having fast load times isn't about developing the best data structure, writing the most interesting algorithm or about having the coolest code.  It's about managing priorities.  Parsing XML, building dynamic data structures and making objects is going to be slooooow.

Media player Data Structures

Reply #7
You would get nothing from looking at the code.  Having fast load times isn't about developing the best data structure, writing the most interesting algorithm or about having the coolest code.  It's about managing priorities.  Parsing XML, building dynamic data structures and making objects is going to be slooooow.


Would you care to elaborate benski? The purpose of doing this experiment is to see what "will" work. Now that you have pointed out what "will not" work, can you offer some suggestions? Thanks.

Media player Data Structures

Reply #8
First of all, parsing is slow, ALWAYS. Especially XML. Guess what: it takes O(n) comparison operations to find the file entry, then you have to parse the entry, which is O(m), where m is the number of tags. This gives O(n*m) for whole-database operations in an efficient implementation.

The problem with other programs is that they load too much of the database at once. Read: they aren't lazy.
You should read the metadata only when it is needed, and then in batches when it is required,
e.g. when the window is scrolled to display a file whose metadata hasn't been loaded yet.
You should also put the metadata updates in another thread, which of course makes the code more complicated, but also more responsive. Then signal that thread when metadata needs updating, send it a list of files, and use a LIFO strategy, pausing the current update so that the most recent request completes first.
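Roughly like this, as a sketch in C# (which I'm not fluent in, so treat it as pseudocode; all the names are made up, and a real version would need cancellation, error handling, and a way to pause a half-finished batch when a newer request arrives):

Code: [Select]
using System.Collections.Generic;
using System.Threading;

public class MetadataLoader
{
    private readonly object _lock = new object();
    // LIFO: the most recently requested batch of files is served first.
    private readonly Stack<List<string>> _requests = new Stack<List<string>>();

    public MetadataLoader()
    {
        Thread worker = new Thread(Run);
        worker.IsBackground = true;
        worker.Start();
    }

    // Called from the UI thread, e.g. when files scroll into view without cached metadata.
    public void Request(List<string> files)
    {
        lock (_lock)
        {
            _requests.Push(files);
            Monitor.Pulse(_lock);
        }
    }

    private void Run()
    {
        while (true)
        {
            List<string> batch;
            lock (_lock)
            {
                while (_requests.Count == 0)
                {
                    Monitor.Wait(_lock);
                }
                batch = _requests.Pop();   // newest request wins
            }
            foreach (string file in batch)
            {
                ReadTags(file);            // placeholder for the actual tag reading
            }
        }
    }

    private void ReadTags(string file)
    {
        // Hypothetical: parse the file's tags here and hand the result back to the UI thread.
    }
}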

As for relational DBs: XMMS2 uses SQLite quite efficiently.
ruxvilti'a


Media player Data Structures

Reply #10
First of all, parsing is slow, ALWAYS. Especially XML. Guess what: it takes O(n) comparison operations to find the file entry, then you have to parse the entry, which is O(m), where m is the number of tags. This gives O(n*m) for whole-database operations in an efficient implementation.


Yeah, I ran into this problem during the initial implementation of my PML parser, so I am experimenting with another implementation that I believe should be quite a bit more efficient. The biggest bottleneck I was running into was a combination of the XML parsing itself and the extreme overhead of creating new XML readers to handle looping blocks, since I am using a forward-only reader. I came up with another idea that only requires me to parse the XML data once rather than repeatedly looping through it.

The idea is to parse the XML once to determine where the loops are and which tags are associated with which metadata. I then basically cache a list of tasks to be performed to actually create the playlist HTML. The list is a List collection containing delegate functions (i.e. anonymous methods) which are executed whenever a playlist needs to be rendered. The list of functions is stored in a cache and then reused with subsequent playlists. Additionally, I have made my best attempt to make the initial parsing process as quick as possible through the use of NameTables. I am not so concerned with parsing speed in this implementation, however, as my PML files are relatively small and simple. If I really wanted to get anal about it, I suppose I could take the cached function list and compile it into its own assembly.

Regarding the use of relational databases, I can see having this as an option, but I really don't see the necessity of all that extra overhead, regardless of how efficient one engine is over another. Wouldn't pure binary in an efficient structure be the best route?

Regarding managing metadata, that has been on my mind as well. I would think the most efficient practice would be to, as you put it, only load it at the last possible moment, when it's needed, in order to keep the memory footprint down. I already know that something like this will require a lot of cache-based and batching trickery. I am of the opinion that I would like to avoid as many disk reads as I can, as that is indisputably the most problematic bottleneck.

Media player Data Structures

Reply #11
The idea is to parse the XML once to determine where the loops are and which tags are associated with which metadata. I then basically cache a list of tasks to be performed to actually create the playlist HTML.

Lazy evaluation, works quite well indeed.

Quote
The list is a List collection containing delegate functions (i.e. anonymous methods) which are executed whenever a playlist needs to be rendered. The list of functions is stored in a cache and then reused with subsequent playlists.

I've lost you a bit - how is this any better than hardcoding the common case with ifs?
Or are you talking about memoisation of the processed data?
(Pardon, I'm not much into C#, but I do know enough about dynamic languages due to using Python a lot)

I'd try a hash table.
You'd have to create a good and fast hash from the name of the file. There are some viable examples available in the literature. Bonus points for keeping file name locality while not colliding too often.

Maybe even a C# hash table would be small and fast enough.
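Something like 32-bit FNV-1a would do as a starting point (the constants below are the standard FNV parameters; note that this hashes UTF-16 code units and does not preserve any path locality):

Code: [Select]
public static class PathHash
{
    // 32-bit FNV-1a: fast and reasonably well distributed for short strings like file paths.
    public static uint Fnv1a(string path)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;

        uint hash = offsetBasis;
        foreach (char c in path)              // hashes UTF-16 code units
        {
            hash ^= c;
            hash = unchecked(hash * prime);   // wraparound on overflow is intended
        }
        return hash;
    }
}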

Quote
Regarding the use of relational databases, I can see having this as an option, but I really don't see the necessity of all that extra overhead, regardless of how efficient one engine is over another. Wouldn't pure binary in an efficient structure be the best route?

Maybe, it depends on how much sorting and filtering you want to do.

An RDBMS isn't really that much overhead, especially if you can precompile common SQL queries. You gain a flexible query language with great filtering and indexing possibilities.
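For example (sketching in C# to match your project, assuming the Mono.Data.Sqlite binding; the table and column names are invented), a common query can be prepared once and reused:

Code: [Select]
using System;
using System.Data;
using Mono.Data.Sqlite;   // assumption: any ADO.NET SQLite provider looks much the same

public class LibraryQueries
{
    private readonly IDbConnection _connection;
    private readonly IDbCommand _byArtist;
    private readonly IDbDataParameter _artistParam;

    public LibraryQueries(string dbPath)
    {
        _connection = new SqliteConnection("Data Source=" + dbPath);
        _connection.Open();

        // Build and prepare the query once; only the parameter value changes per call.
        _byArtist = _connection.CreateCommand();
        _byArtist.CommandText =
            "SELECT uri, title FROM tracks WHERE artist = :artist ORDER BY album, track_number";
        _artistParam = _byArtist.CreateParameter();
        _artistParam.ParameterName = ":artist";
        _byArtist.Parameters.Add(_artistParam);
        _byArtist.Prepare();
    }

    public void DumpTracksBy(string artist)
    {
        _artistParam.Value = artist;
        using (IDataReader reader = _byArtist.ExecuteReader())
        {
            while (reader.Read())
            {
                Console.WriteLine(reader.GetString(0) + " - " + reader.GetString(1));
            }
        }
    }
}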

Quote
Regarding managing metadata, that has been on my mind as well. I would think the most efficient practice would be to, as you put it, only load it at the last possible moment, when it's needed, in order to keep the memory footprint down.

Lazy evaluation again. :-)

Quote
I already know that something like this will require a lot of cache-based and batching trickery. I am of the opinion that I would like to avoid as many disk reads as I can, as that is indisputably the most problematic bottleneck.

The size of the file would be on the order of 100 MB if there are 1024 bytes of metadata stored per file on average and ~100,000 files in the database. That is still quite cacheable nowadays, and not all that slow even if it isn't cached. You could even just load it all into memory, but of course in the background, in order not to stall the application.

The required operations would be: defragmenting the index, which on any sane FS (VFAT is not one of them) could be done by rewriting it into a new file; pruning, i.e. removal of old entries; and rechecking the file metadata.

More or less it's just an RDBMS index. ;-)
ruxvilti'a

Media player Data Structures

Reply #12
Ok, the "lazy evaluation" method worked superbly. I was surprised that that it worked almost flawlessly on the first compile. Xml is parsed only once and the playlist datastructure is handed off to an array of anonymous functions which do all the work outside of the xml parser. Playlist creation time through the pml parser is negligable compared to what it was. Large playlists which took 45+ seconds to generate using the old parser now complete in 1-2 seconds which is obviously a major improvement.

Quote
I've lost you a bit - how is this any better than hardcoding the common case with ifs?
Or are you talking about memoisation of the processed data?
(Pardon, I'm not much into C#, but I do know enough about dynamic languages due to using Python a lot)

I'd try a hash table.
You'd have to create a good and fast hash from the name of the file. There are some viable examples available in the literature. Bonus points for keeping file name locality while not colliding too often.

Maybe even a C# hash table would be small and fast enough.


What's getting memoized is the collection of anonymous functions which actually generate the HTML from whatever playlist data they are handed. Provided the user doesn't switch playlist templates (which would result in the old function list being discarded and a new one generated), the steps required to render a playlist are identical regardless of the contents of the playlist. The method I used to do this was actually pretty interesting, as it takes advantage of most of the new optimization features in the latest C# spec.

My anonymous functions are prototyped as such:
Code: [Select]
private delegate string PMLTask<T>(T param);


This basically states that each step of the playlist rendering process can take a value of an arbitrary datatype (i.e. track lists, track-list groupings, the playlist, etc.) and will return a string (our HTML chunk).

When I go to parse the xml, a generic list is initialized to hold these functions:
Code: [Select]
List<PMLTask<object>> local_tasks = new List<PMLTask<object>>();


For every loop structure I encounter in the PML, I create a list of these and add another PMLTask that processes that list, creating the ability to recurse:
Code: [Select]
local_tasks.Add(delegate {
    string html = "";
    foreach (SkipList<Track> tracklist in this.groups) {
        foreach (PMLTask<object> task in loop_tasks) {
            html += task(tracklist);
        }
    }
    return html;
});


At the very end I process the list:
Code: [Select]
foreach (PMLTask<object> task in tasklist){
    html += task(null);
}

This works on whatever playlist is currently stored in memory. When the playlist is altered, only the very last step needs to be done again; hence our nifty cached instruction set. When the user decides to load a different PML file, this cache is cleared out and a new instruction set is created.

Now that that is out of the way, I am running into another issue which may end up being a showstopper. The problem is that with the gecko# wrapper, which is what allows me to embed the Gecko engine, I do not have access to the DOM once the HTML is rendered. This is a limitation of the bindings, unfortunately, and most work being done to overcome it is pretty immature atm. I was hoping to be able to interact with the DOM dynamically from C# to handle any number of tasks, but as it stands this is impossible. I am currently experimenting with a home-brewed XPCOM wrapper that someone else wrote in order to bridge C# <-> JavaScript, but I don't know how that's going to turn out yet. These kinds of hacks make me really uncomfortable, so I may end up looking for alternatives. Anyone have any suggestions on how to go about this?

In the process of developing this, I've gotten a buddy of mine interested in the project, and I may actually push this into a prototype for a real media player at some point if I can maintain this pace. More to come on that.

Media player Data Structures

Reply #13
This is fantastic information. Thank you foosion. I will study the foobar sdk fastidiously. I did get a chance to look at the foobar "database" file which is obviously a binary file. It appears to be nothing more than a serialized data structure, is this a correct assumption?

I *think* you'll be interested in this topic: How does the Database Store Info, Datatypes and such...

Media player Data Structures

Reply #14
This is fantastic information. Thank you foosion. I will study the foobar sdk fastidiously. I did get a chance to look at the foobar "database" file which is obviously a binary file. It appears to be nothing more than a serialized data structure, is this a correct assumption?

I *think* you'll be interested in this topic: How does the Database Store Info, Datatypes and such...


Thanks for the link, kjoonlee. I think I have that issue pretty well managed; I'm currently using a Dictionary<string, string> to store metadata. I fail to see the advantage of allowing multiple values to be stored using the same key. Can anyone enlighten me on that?