Meta Filesystem (MetaFS)

This is a prototype I wrote of a filesystem that uses metadata stored with files to create dynamic directories and lets the user attach arbitrary attributes to any file in the system.

It looks like this:

bmills:> ls location/
Pennsylvania/    Mexico/   California/  Pittsburgh/   Gibsonia/
bmills:> ls location/Pennsylvania
Topic/    Date/   DSC0042.jpg   Screaming_monkeys.mp3
bmills:> ls location/Pennsylvania/Topic
Monkeys/   Computers/
bmills:> ls location/Pennsylvania/Topic/Monkeys
Screaming_monkeys.mp3

This prototype gives the user a way to store arbitrary attributes about the files in their system. It runs in parallel with another filesystem and provides the indexing, searching, and attribute storage for the files. It is implemented as an ordinary filesystem with two additional functions available on each file: Add Attribute and Delete Attribute. These functions let the user set or remove any arbitrary key/value pair on any document in the system.
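As a rough illustration of those two operations (this is not the prototype's actual code), a minimal in-memory sketch might look like the following. The class name `AttributeStore`, the method names, and the decision to allow several values per attribute are all assumptions made for the example.

```python
from collections import defaultdict


class AttributeStore:
    """Hypothetical in-memory stand-in for the attribute storage."""

    def __init__(self):
        # filename -> {attribute key -> set of values}
        self._attrs = defaultdict(lambda: defaultdict(set))

    def add_attribute(self, filename, key, value):
        """Attach a key/value pair to a file."""
        self._attrs[filename][key].add(value)

    def delete_attribute(self, filename, key, value=None):
        """Remove one value, or the whole key when value is None."""
        keys = self._attrs.get(filename)
        if keys is None or key not in keys:
            return  # nothing to delete
        if value is None:
            del keys[key]
        else:
            keys[key].discard(value)
            if not keys[key]:
                del keys[key]  # drop keys that no longer have any value

    def attributes(self, filename):
        """Return a plain-dict copy of a file's attributes."""
        keys = self._attrs.get(filename, {})
        return {k: set(v) for k, v in keys.items()}


store = AttributeStore()
store.add_attribute("Screaming_monkeys.mp3", "location", "Pennsylvania")
store.add_attribute("Screaming_monkeys.mp3", "Topic", "Monkeys")
store.delete_attribute("Screaming_monkeys.mp3", "Topic")
print(store.attributes("Screaming_monkeys.mp3"))
```

A real implementation would persist this mapping rather than keep it in memory; the sketch only shows the shape of the data.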

The system then allows the user to browse the files by any attribute that has been set. This enables the user to create dynamic groupings of the files in their document space. The query mechanism is a set of dynamic directories that are created as attributes change. For instance, if an attribute called location is created, a directory called location appears in the root of the document space. This directory is called an attribute directory, and it lists all the values that the attribute location takes in the document space. Each of those values in turn defines a directory called a value directory. Value directories are created dynamically when the user selects an attribute directory, and each one contains all the files that have the selected value for the selected attribute. As the user drills down, the attribute directories appear again inside each value directory, so the search query can be refined further simply by extending the path.
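The drill-down behaviour can be sketched as a small path resolver. This is only an illustration under stated assumptions, not the prototype's code: the function name `list_directory` is hypothetical, path components are assumed to alternate attribute/value, the root is assumed to list attribute directories only, and the sample data (including the Date value) is made up to mirror the listing at the top of this post.

```python
def list_directory(files, path):
    """Return the entries of a virtual path such as 'location/Pennsylvania'.

    files: dict mapping filename -> {attribute: set of values}
    Path components are assumed to alternate attribute, value, attribute, ...
    """
    parts = [p for p in path.strip("/").split("/") if p]
    matches = dict(files)
    used = set()

    # Each complete (attribute, value) pair narrows the set of matching files.
    for i in range(len(parts) // 2):
        attr, value = parts[2 * i], parts[2 * i + 1]
        matches = {name: attrs for name, attrs in matches.items()
                   if value in attrs.get(attr, ())}
        used.add(attr)

    if len(parts) % 2 == 1:
        # Path ends on an attribute directory: list its value directories.
        attr = parts[-1]
        values = set()
        for attrs in matches.values():
            values |= attrs.get(attr, set())
        return sorted(v + "/" for v in values)

    # Path ends on a value directory (or the root): list the attribute
    # directories still available, then the matching files.
    remaining = set()
    for attrs in matches.values():
        remaining |= set(attrs) - used
    entries = sorted(a + "/" for a in remaining)
    if parts:  # assumption: the root shows attribute directories only
        entries += sorted(matches)
    return entries


# Made-up sample data, loosely following the listing above.
files = {
    "DSC0042.jpg": {"location": {"Pennsylvania"}, "Topic": {"Computers"},
                    "Date": {"2005"}},
    "Screaming_monkeys.mp3": {"location": {"Pennsylvania"},
                              "Topic": {"Monkeys"}},
}
print(list_directory(files, "location/Pennsylvania/Topic/Monkeys"))
# ['Screaming_monkeys.mp3']
```

Each extra attribute/value pair in the path acts as one more filter, which is why extending the path refines the query.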

The system itself is implemented in user space using FUSE. I chose FUSE because it works cleanly with Linux kernel 2.6.10 and provides a Python interface. The Python interface was a big advantage because it allowed me to prototype the system quickly. As a result, the entire metadata filesystem is implemented in pure Python. The layer between FUSE and Python is implemented as a C extension to Python. For a quick how-to on getting Python and FUSE to play together, look here.

The motivation was that I really wanted a filesystem that would organize my digital media files better. The traditional filesystem doesn’t give the user enough organizational constructs to fully express a file’s contents. One solution to this problem is to encode information about files that can then take on different organizations or meanings depending on how that data is used. This prototype was an experiment in exploring that notion. I’m currently using it on my desktop computer to organize both my digital images and music. If you want more details, you can read the paper that I wrote for my operating systems class here.

Download the prototype here, and read the README document to learn how to set up the system. The tarball also includes FUSE and the various other packages needed.

One Response to “Meta Filesystem (MetaFS)”

  1. kirk Says:

    hi!

    i’ve read your paper about this fs,
    and i have also thought about something similar to this, but i think that doing it this way would cause some extra problems. if the user has some freedom to describe in what way he would like to see the data, it can be better - i’ve tried to assemble an example:
    i imagined a two-sided approach to come up with an acceptable solution
    * object tagging:
    it is obvious that people can give very precise information to a given
    object, in this example i would take some papers
    [obj1] - filename: 'something1.pdf'
    type = "paper"
    filetype = "application/pdf"
    title = {Title1},
    author = {Author1},
    year = {2005},
    [obj2] - filename: 'xasd1.XXXpdf'
    type = "paper"
    filetype = "pdf"
    title = {Title2},
    author = {Author2},
    year = {2005}
    [obj3] - filename: 'qqq.pdf'
    type = "paper"
    filetype = "pdf"
    title = {Title3},
    author = {Author3},
    /** year = {2005} not set **/
    [obj4] - filename: 'asd.png'
    type = "image/png"
    filetype = "png"
    title = {Title2},
    author = {Author2},
    year = {2005}

    all this data is accessible through extended attributes

    * layout rules
    the user must give some guidance about the way he wants to see
    those files. in this example i will choose the following rules:
    x_node0 = _root_
    x_node1 = {parent:x_node0}
    {entry:"paper",
    require:{type = "paper"},
    unassigned:{collect,"unassigned"}
    }
    x_node2 = {parent:x_node1} {entry:"by_year"}
    x_node3 = {parent:x_node2} {entry:#year,
    type:collection,
    unassigned:{collect,"year-not-set"} }
    x_node4 = {parent:x_node3} {entry:#author|'_'|#title,
    unassigned:{cascade} }

    i hope that this is understandable

    the desired layout would be for those files:

    /
    /paper
    /paper/by_year
    /paper/by_year/2005
    /paper/by_year/2005/Author1_Title1.pdf -> [obj1]
    /paper/by_year/2005/Author2_Title2.pdf -> [obj2]
    /paper/by_year/year-not-set/qqq.pdf -> [obj3]
    /unassigned/asd.png -> [obj4]

    and if someone adds some new rules
    x_node5 = {parent:x_node2} {entry:"by_author",
    type:collection,
    unassigned:{collect,"unknown-author"} }
    x_node6 = {parent:x_node5} {entry:#title|'_'|#year }
    then the following nodes should appear
    /paper/by_author
    /paper/by_author/Author1/Title1_2005 -> [obj1]
    […]

    i want to forget the original filename, because there are alternate ways to obtain (accurate) metadata (ex: blitzi), and in most cases the filename doesn’t matter much anymore: when you are in the artist’s directory, you don’t need to read his name on every file. you know precisely where you are, and repeating it just confuses things (and wastes display space)

    kirk
