
NewSchema

From SqueezeboxWiki

Revision as of 15:20, 30 June 2010 by Soulkeeper (Talk | contribs)

Background and Design Goals

Bug #8303 is the catch-all for issues requiring schema work that have been piling up over the last few years.

The design goals here are to solve a bunch of problems with the current music database and scanner, specifically:

  • Database/Search/Scan speedups
  • Sorting out the Artists mess once and for all to satisfy all niche users:
    • Band, Composer, Artist, AlbumArtist, Conductor, Track Artist, Various Artists ...
    • Orchestra, Ensemble, Performer, SampledArtists, Drummer ...
    • MixingArtist, ArtworkArtist, CoverBand ...
  • More flexible tagging / tag-scanning
  • More flexible search/filter/sort capabilities
  • Saved searches (by parameters, not by results)
  • Support multiple music libraries
    • Separate directories or mountpoints with separate databases
    • Allow search/browse to use specific libraries or all libraries
  • Support temporary music libraries
    • Plugging a USB hard drive full of music, which your friend brought over, into your SC machine
    • Or an iPod...
  • Support multiple users
    • With permissions for configuration editing
    • And per-library permissions (e.g. JoeJr can't listen to DaddysLibrary)

Design

Everything from here down is subject to change as we learn from actually trying to implement it.

A rough high-level design was presented at the Sept. 2008 engineering on-site; the first draft implementation isn't quite ready yet. The core of it will be a new, independent chunk of code that implements a generic media library service and API, which can then be plugged into the back of SC in place of the current code. It will probably reuse significant code from the existing implementation, but only where that makes sense. The functional domain of this new code is managing on-disk media libraries: it will encapsulate all physical access to the on-disk media files, the database of metadata, the scanning that imports that metadata, and the APIs for search/filter/browse/etc.

A key component of the design is a move back to SQLite in place of MySQL. SQLite has improved a lot since we last used it, and removing the disk/code size and complexity of shipping and running an independent MySQL server is a win for us, especially on small platforms. We believe the key limitation that originally prompted the move to MySQL (the scanner and SC could not simultaneously update the database in some situations) can now be overcome in the new schema design.

Another key change is moving to multiple independent physical databases, one per library, stored with the library itself. By default, if your music is in /my/music/folder, the database will be stored there as well. There should probably be an option (either global or per-library) to store the database elsewhere, for the case where access to the music drive is fast enough for streaming but too slow for scanning/searching. In those cases, this could be implemented by storing a small file at the library root, in place of the database, which just redirects to the real location.
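The redirect idea could be sketched roughly as follows. This is only an illustration: the file names `library.db` and `library.db.redirect` and the function `resolve_database_path` are hypothetical and not taken from any existing SC code.

```python
from pathlib import Path

# Hypothetical file names; the design above does not fix them.
DB_NAME = "library.db"
REDIRECT_NAME = "library.db.redirect"


def resolve_database_path(library_root: str) -> Path:
    """Return where the metadata database for a library should live.

    Default: next to the music, at <library_root>/library.db.
    If a small redirect file exists at the library root, its first
    line is taken as the real database location instead.
    """
    root = Path(library_root)
    redirect = root / REDIRECT_NAME
    if redirect.is_file():
        lines = redirect.read_text(encoding="utf-8").splitlines()
        if lines and lines[0].strip():
            return Path(lines[0].strip())
    # No redirect (or an empty one): keep the database with the library.
    return root / DB_NAME
```

Keeping the redirect as a plain one-line file means a slow or read-mostly music drive only has to serve one tiny read before all further database access goes to the faster location.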

The other big game-changing component of the design is to drop the idea of a one-size-fits-all database schema completely. It is replaced with a language for defining the schema (somewhere between a config language and a DSL). That language allows specification of arbitrary typed attributes for albums and "tracks", arbitrary mapping of the tags the scanner sees in various music formats to those attributes, and transformation code (to allow things like: concatenate the two scanned tags X and Y from FLAC files, strip the leading << from them, and put the result into database attribute Z). There would still be a single central SC database to handle things not specific to libraries - these new databases are just library index metadata and nothing else.
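To make the tag-mapping idea concrete, here is a minimal sketch of the X, Y to Z example in Python rather than in the actual spec language, which is not shown here. The names `Mapping`, `SPEC`, `concat_and_strip` and `apply_spec` are all hypothetical, and the sketch assumes the leading << is stripped from each tag before concatenation.

```python
from typing import Callable, Dict, List, NamedTuple


class Mapping(NamedTuple):
    """One rule: which scanned tags feed which database attribute."""
    file_format: str                       # e.g. "flac"
    source_tags: List[str]                 # tags read by the scanner
    attribute: str                         # database attribute to fill
    transform: Callable[[List[str]], str]  # combines the tag values


def concat_and_strip(values: List[str]) -> str:
    """Strip a leading '<<' from each value, then concatenate them."""
    return "".join(v[2:] if v.startswith("<<") else v for v in values)


# Hypothetical spec fragment: tags X and Y from FLAC files become attribute Z.
SPEC = [Mapping("flac", ["X", "Y"], "Z", concat_and_strip)]


def apply_spec(file_format: str, tags: Dict[str, str]) -> Dict[str, str]:
    """Turn the raw tags of one scanned file into database attributes."""
    attributes = {}
    for rule in SPEC:
        if rule.file_format != file_format:
            continue
        # Skip the rule if the file is missing any tag it needs.
        if not all(t in tags for t in rule.source_tags):
            continue
        values = [tags[t] for t in rule.source_tags]
        attributes[rule.attribute] = rule.transform(values)
    return attributes
```

With this sketch, `apply_spec("flac", {"X": "<<Foo", "Y": "<<Bar"})` returns `{"Z": "FooBar"}`, while the same tags on a non-FLAC file produce no attributes.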

We would ship a default specification which is close to what the current code does, and we can provide alternatives (as can the community) to support all niche uses of nonstandard schemas, such as "Classical libraries in Ogg Format tagged by MyObscureTagProgram1.33", or "iTunes", or whatever the case may be. Aside from allowing us to solve all the currently-known niche cases, it also provides future flexibility for other media formats that don't fit our current "albums of music tracks" concept.

These library-type specs would be single files stored in a directory in the SC tree. Additionally, libraries that use them would store a copy of the spec inside the database itself, so that the code can always read the spec out of a fixed table at library init time to know how to handle this library, even if the user has since deleted the original spec from the SC directory. Changing the library spec will always require a full rescan (destroy database and rebuild).
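A minimal sketch of embedding the spec in the library database, using Python's standard `sqlite3` module. The table name `library_spec` and the three function names are assumptions made for illustration; the design only calls for "a fixed table".

```python
import sqlite3

# Hypothetical table name; the design only says "a fixed table".
SPEC_TABLE = "library_spec"


def create_library(db_path: str, spec_text: str) -> sqlite3.Connection:
    """Create a fresh library database and embed a copy of its spec."""
    conn = sqlite3.connect(db_path)
    conn.execute(f"CREATE TABLE {SPEC_TABLE} (spec TEXT NOT NULL)")
    conn.execute(f"INSERT INTO {SPEC_TABLE} (spec) VALUES (?)", (spec_text,))
    conn.commit()
    return conn


def load_spec(conn: sqlite3.Connection) -> str:
    """Read the embedded spec at library init time."""
    row = conn.execute(f"SELECT spec FROM {SPEC_TABLE}").fetchone()
    if row is None:
        raise RuntimeError("library database has no embedded spec")
    return row[0]


def needs_full_rescan(conn: sqlite3.Connection, new_spec_text: str) -> bool:
    """A changed spec means the database must be destroyed and rebuilt."""
    return load_spec(conn) != new_spec_text
```

Because the copy lives inside the database, `load_spec` keeps working after the original spec file has been removed from the SC tree.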

Implementation Issues

The surface has barely been scratched on implementation. Soon I'll have some very rough draft code running, which will make it easier for us to discuss and fix all of the issues. These are the notable implementation speedbumps on the horizon:

  • Defining the spec language (mostly complete)
  • Ensuring that we can use modern SQLite from Perl reliably (DBD::SQLite is outdated)
  • Subclassing parts of DBIx::Class to support connecting to dynamically-defined library schemas by reading the spec from them (and creating them initially). The alternative is finding another solution or writing one from scratch, but the ResultSet, Storage, and Perl-to-SQL code in DBIx::Class solves a lot of problems for us and would be painful to re-implement specifically for this
  • Identifying all the special cases that can't be handled generically, such as:
    • Duration - it's special because it's universal and we'll want to sum durations for albums, playlists, searches, etc. (in other words, it's not like other arbitrary typed attributes that just support basic flexible search/sort in a generic way)
    • For that matter, Duration and a few other things will probably form a fixed subset of the schema that can't be modified by the spec
    • For some special cases, we'll probably just want additional metadata in the spec language, for example to say "this attribute X is a type of artist attribute"
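The special cases above could look something like the following sketch, which combines a fixed set of columns (including duration) with spec-defined attributes that carry extra metadata. Everything here is an assumption for illustration: the column names, the `artist` metadata flag, and the helper functions are not part of the actual spec language.

```python
import sqlite3

# Hypothetical fixed part of the schema: always present, not spec-editable.
FIXED_COLUMNS = {"id": "INTEGER PRIMARY KEY", "album": "TEXT", "duration": "REAL"}

# Hypothetical spec-defined attributes, each with extra metadata.
SPEC_ATTRIBUTES = {
    "composer": {"type": "TEXT", "artist": True},
    "conductor": {"type": "TEXT", "artist": True},
    "mood": {"type": "TEXT", "artist": False},
}


def create_tracks_table(conn):
    """Build the tracks table from the fixed columns plus spec attributes."""
    columns = [f"{name} {sqltype}" for name, sqltype in FIXED_COLUMNS.items()]
    columns += [f"{name} {meta['type']}" for name, meta in SPEC_ATTRIBUTES.items()]
    conn.execute(f"CREATE TABLE tracks ({', '.join(columns)})")


def album_duration(conn, album):
    """Sum the universal duration column over one album, in seconds."""
    row = conn.execute(
        "SELECT COALESCE(SUM(duration), 0) FROM tracks WHERE album = ?", (album,)
    ).fetchone()
    return row[0]


def artist_attributes():
    """Names of the attributes the spec marks as a kind of artist."""
    return [name for name, meta in SPEC_ATTRIBUTES.items() if meta["artist"]]
```

Since duration is always a known column, code like `album_duration` can rely on it for any library type, while generic code can use `artist_attributes()` to find whichever artist-like attributes a given spec defines.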