Geek Thoughts: September 2010

Thursday, 9 September 2010

Of the Perception of Open Source Tools

Today, while discussing options for an identity federation solution, I had the following comment from one of my colleagues (I paraphrase as I don't remember the exact words):

The advantage with a Microsoft product compared to an open source product is that you get a good administration interface.

Note that no specific package had actually been mentioned so this was an obvious generalisation which may or may not be true. However, irrespective of the validity of the statement, why is it that some IT professionals have this view of open source software? Is there any grounding in this comment? And is there something that we, as the open source community, can do to change this mindset?

Sunday, 5 September 2010

Adding Support for Alien Database Import in Shotwell

When developing the import from F-Spot feature in Shotwell, I made sure I isolated the F-Spot specific code from the generic code so that the same feature could easily be implemented to import photographs from other photo management applications. So if you want to contribute an import from Picasa feature, here's a quick guide on how to do it.

Alien Database Framework

Everything you need in order to implement a new import feature is provided by the alien database framework. It is composed of 5 classes that you will need to know about and 5 interfaces that you will need to implement, as shown in the class diagram below:

Alien database framework class diagram

Here's a quick overview of the different components:

AlienDatabaseHandler

The class where everything starts from. This object is responsible for managing a set of drivers. At the moment, there is only one, the F-Spot driver but hopefully, by the time you finish reading this, there will be a new one.

AlienDatabaseDriverID

A light-weight struct that identifies a unique driver. This struct wraps a simple string. In the case of the F-Spot driver, that string is f-spot.

AlienDatabaseDriver

The interface that specifies what the driver implementation needs to provide. It includes a few simple methods that enable the driver support to be included in the Shotwell interface, as well as three more heavy-weight methods that actually perform the database handling. Those methods are called in two steps:

get_discovered_databases is called first so that the driver can provide the UI a list of databases that are automatically discovered, typically database files found in well known locations such as ~/.config/f-spot/photos.db for F-Spot;
open_database or open_database_from_file gets called when the user has selected what database to load: those are the methods that will do the heavy lifting.

DiscoveredAlienDatabase

A light-weight wrapper that identifies a discovered database and implements lazy loading of the real database.

AlienDatabaseID

A light-weight struct that behaves exactly the same as AlienDatabaseDriverID but uniquely identifies a given database, including its related driver.

AlienDatabase

The interface that specifies what the database implementation needs to provide. The method that does all the work is get_photos.

AlienDatabaseVersion

A light-weight class that implements a version number in the format x.y.z and is able to compare versions between each other. This is meant to make it possible to validate whether the version found is actually supported by the driver. This is also used heavily in the F-Spot implementation to provide support for different versions of the database.

AlienDatabasePhoto, AlienDatabaseTag and AlienDatabaseEvent

A set of interfaces that define data objects handled by the database: photos, tags and events.

Implementing a new driver

Now that you've decided to implement a new driver, let's go through it step by step.

Driver implementation

The first thing to do is to provide an implementation of the AlienDatabaseDriver interface. Let's get the UI related methods out of the way first.

get_id: This should return a hard-coded AlienDatabaseDriverID used to identify the driver.
get_display_name: This should return a display name, most likely the name of the application the driver is for. You may want to make it translatable if you know that the application doesn't have the same name in all languages.
get_menu_name: The name to be used for the menu identifier, which is referenced in the action (see below). This should be a simple hard-coded string.
get_action_entry: A method that returns a Gtk.ActionEntry that will be used to construct menu items. The return value should be hard-coded and be consistent with what the get_menu_name method returns. I know, it looks like there's redundant code in there but it seems that Vala has issues with building structs with non-hard-coded strings. This may be simplified in the future. Don't forget to set the label and the tooltip for the action and to make them translatable: see the F-Spot implementation for an example.

That's the UI out of the way. Now let's have a look at database discovery and load.

get_discovered_databases: This method will be called when the user selects the import menu item and the dialog box appears. It should look in all the well-known locations for a database and return a collection of DiscoveredAlienDatabase objects. Those objects are created from an AlienDatabaseID object that contains two pieces of information: the driver ID and a driver specific string that identifies the database. That driver specific string can be whatever you want. At its simplest, it can just be the path to the database file.
open_database and open_database_from_file: Those two methods are the ones that do the heavy lifting. They basically provide exactly the same function with a slightly different signature so you will probably want to factorise the code and make both of them call an internal private function that performs the bulk of the work. At this point, you need to open the database file and extract the version number out of it. If there is any error, it's the time to report it as this is called while the import dialogue is still displayed to the user and can provide early feedback. There are two error domains you can use for that: use DatabaseError for any generic database issue, such as problems opening the file or reading the tables; and use AlienDatabaseError to report a database that you can read but which has a version number that you don't support. If everything goes well, return an implementation of AlienDatabase.

Database implementation

Now that you've loaded the database, it's time to extract data out of it. For that purpose, you now need to provide an implementation of the AlienDatabase interface. Once again, there are a few UI related methods and some data related ones so let's start with the UI bits.

get_uri: This method returns the driver specific URI for this database. It must be consistent with what is used in the AlienDatabaseID struct.
get_display_name: A string that is suitable for display in the UI and that identifies the database to the user.

And now for the bulk of the implementation:

get_version: Return the version of this database. By the time this method is called, the database should already have been opened so it should only fail if something really unexpected happens.
get_photos: This is the important method, where the content of the database is read and photo references are extracted. Try to be as lenient as possible in this method so that it doesn't throw an exception. If an unexpected piece of data is returned, try to recover from it rather than throw an exception. If a photo entry can't be read properly, just ignore it and continue with the next one. Throwing an exception will abort the whole import so make sure you only do it if you're absolutely sure that you can't do anything else.

Data objects implementation

The last three interfaces define light-weight data objects that should all be created by the get_photos method in the AlienDatabase implementation. I will only detail AlienDatabasePhoto as the other two are trivial and only return a name.

get_folder_path and get_filename: Both strings together provide the fully qualified path of the image file for the photo.
get_tags and get_event: Return the objects that contain the details of the tags and event for the photo. Note that because Shotwell only supports one event per photo, only a single instance can be returned, not a collection. Also note that because Shotwell does not currently support hierarchical tags, all tags just contain a name. If the database you import from supports hierarchical tags, you should decide whether you want to import all tags or only leaf tags.
get_rating: Return a five-star rating for the photo. If the database you import from doesn't support ratings, just return Rating.UNRATED. If your rating is stored as an integer, use the Rating.unserialize method.
get_title: A title for the photo, if the source database supports it. Otherwise, return null.
get_import_id: This method returns a value that identifies an import roll. It should be a value that is equivalent to a time stamp. If the source database doesn't support this, just return null.

Register your driver

The only thing left to do is to register your new driver with the handler. For this, you will need to modify the AlienDatabaseHandler constructor to add a call to register_driver with an instance of your driver. I know, this is not ideal but there is a ticket on the Yorba tracker to implement a real plugin mechanism. Once this is done, this last step should go away. And as an added bonus, if libpeas is used for this as expected, you should then be able to write your plugins in other languages than Vala!

Go and write great code!

That's it. If you want to enable import from your (old) favourite photo management application to Shotwell, just follow this guide and contribute a patch. Of course, the reality of things means that it will probably not be that simple: it all depends on the complexity of the source database. You may also want to be able to support several versions of that source database. For an example of how to do this, have a look at the F-Spot implementation.

Sound Issue in Ubuntu 10.10 Beta

This morning when logging in to my newly-upgraded-to-Ubuntu-10.10 laptop, sound was not working. It appears that the solution was very simple: my user was not authorised to use audio devices. I don't know why it was disabled as it had always worked fine before but it's very easy to solve so if you have the same problem check that first. To resolve it, go to System → Administration → Users and Groups, select your user, click on the Advanced Settings button, enter your password, click the User Privileges tab and make sure the Use audio devices box is checked. While you're at it, do the same for the other users on your system.

User Settings dialogue

If that doesn't work, there is a handy wiki page on debugging sound problems.
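The same check can also be done from a terminal. The sketch below assumes that the Use audio devices box corresponds to membership of the audio group, which is my understanding of how Ubuntu maps that privilege; in_group is a hypothetical helper name, not a standard command.

```shell
# in_group USER GROUP: succeed if USER is a member of GROUP.
# Hypothetical helper built on the standard id command.
in_group() {
    # id -nG prints the user's group names separated by spaces;
    # put one name per line and look for an exact match.
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}
```

For example, in_group "$USER" audio && echo "audio access OK" prints the message only if your user is in that group. If it is not, sudo adduser "$USER" audio followed by logging out and back in should have the same effect as ticking the box, under the assumption above.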

Saturday, 4 September 2010

Memory Usage Graphs with ps and Gnuplot

When developing the import from F-Spot feature in Shotwell, a user who tested the patch found out that there was a bit of a memory leak. After finding the cause, I produced a patch to fix it but I also wanted to identify what the difference was between the development trunk and the patch. So here's how I did it.

Gathering Data

The first step was to gather relevant memory usage data. For this I needed a repeatable test that I could run against both the trunk build and the patch build. As I had a test F-Spot database, that proved quite straightforward:

1. Delete the Shotwell database,
2. Build the trunk version,
3. Import the test F-Spot database using the trunk build,
4. Delete the Shotwell database again,
5. Build the patched version,
6. Import the test F-Spot database using the patched build.

With the test process sorted, I needed to gather memory data during steps 3 and 6. That's easily done using the ps command in a loop and sending the output to files. So, for the trunk build, I just started this command in a terminal before starting Shotwell and stopped it once finished:

$ while true; do
ps -C shotwell -o pid=,%mem=,vsz= >> mem-trunk.log
sleep 1
done

The one for the patched version is virtually the same:

$ while true; do
ps -C shotwell -o pid=,%mem=,vsz= >> mem-patch.log
sleep 1
done

Note that the equal sign (=) after each field specification tells ps not to output the column header. So at the end of this, you end up with two files that contain 3 columns of data each: the PID, the percentage of memory used and the total virtual memory used for the process at intervals of one second. In both cases, I let Shotwell run idle for a few seconds at the end of the import before closing it to ensure that everything had stabilised.

Next, I checked how many lines of data I had in each file:

$ wc -l mem-*.log

And truncated the longer file to the length of the shorter one, just to make sure I had the same number of data points for both.
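The truncation can be done by hand in an editor, but it is also easy to script. Here is a minimal sketch; truncate_to_shorter is a hypothetical helper name, and it rewrites both files in place so keep a copy if you want the full logs.

```shell
# truncate_to_shorter FILE_A FILE_B: cut both files down to the line
# count of the shorter one (hypothetical helper, edits files in place).
truncate_to_shorter() {
    # $(( ... )) normalises the wc output to a plain integer.
    a=$(( $(wc -l < "$1") ))
    b=$(( $(wc -l < "$2") ))
    if [ "$a" -lt "$b" ]; then n=$a; else n=$b; fi
    for f in "$1" "$2"; do
        # Keep only the first n lines of each file.
        head -n "$n" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

With the two logs from above, that would be truncate_to_shorter mem-trunk.log mem-patch.log.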

Creating the Graph

After that, I wanted to create a single graph that included four lines: VSZ and %MEM for both trunk and patch. And I wanted to output the result to a PNG file. Gnuplot can do all of this; you just need to know how to set its myriad of options. So here's the Gnuplot script in detail.

set term png small size 800,600
set output "mem-graph.png"

Gnuplot works with the concept of terminals. So the first line tells it to use the special terminal called png with a small font and a size of 800 pixels wide by 600 pixels high. The second line is fairly self-explanatory: output to the given file.

set ylabel "VSZ"
set y2label "%MEM"

The two sets of values I am interested in have very different ranges. VSZ is a size that ps reports in kibibytes (1024-byte units) and will have values in the hundreds of thousands if not millions, while %MEM is a percentage so will have a value somewhere between 0 and 100. So to make sure that both types of graphs fit in the output, I will use the ability that Gnuplot has to use left and right Y axes with different ranges: VSZ will go on the default Y axis (left, called y), while %MEM will go on the other one (right, called y2). So I set the labels for both.

set ytics nomirror
set y2tics nomirror in

As the right Y axis is not used by default, I need to enable it and to set where the tics go. To do that, I first disable the mirror option on the left Y axis and enable tics on the right Y axis by telling Gnuplot that their position will be in.

set yrange [0:*]
set y2range [0:*]

The last piece of setup is to customise the range on both axes. By default, Gnuplot will adjust the range so that there is as little white space as possible above or below the graph. But in this case, I want both sets of graphs to start at zero so that I can have a better idea of total memory used.

The next bit is quite long so I will start by explaining the instruction for a single graph before bringing all four together.

plot "mem-trunk.log" using 3 with lines axes x1y1 title "Trunk VSZ"

In the line above, I tell Gnuplot to take its data from the third column in the file called mem-trunk.log. The with lines section specifies that I want a line graph. The axes x1y1 specifies that I want it to be drawn against the first X axis and the first Y axis (the default, but here for completeness). And the last bit specifies what I want the title for this graph to be. Then it's just a case of plotting all four graphs in a single plot command separated by commas. So here's the full script:

set term png small size 800,600
set output "mem-graph.png"
set ylabel "VSZ"
set y2label "%MEM"
set ytics nomirror
set y2tics nomirror in
set yrange [0:*]
set y2range [0:*]
plot "mem-trunk.log" using 3 with lines axes x1y1 title "Trunk VSZ", \
     "mem-patch.log" using 3 with lines axes x1y1 title "Patch VSZ", \
     "mem-trunk.log" using 2 with lines axes x1y2 title "Trunk %MEM", \
     "mem-patch.log" using 2 with lines axes x1y2 title "Patch %MEM"

Make sure that there is absolutely no white space between the backslash characters and the end of lines in the plot command otherwise Gnuplot will complain. Save the script to a file called mem.gnuplot and run it:

$ gnuplot mem.gnuplot
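If Gnuplot does complain about the plot command, stray white space after a backslash is hard to spot by eye. A quick check can be sketched with grep; find_bad_continuations is a hypothetical helper name.

```shell
# find_bad_continuations FILE: print, with line numbers, every line that
# ends in a backslash followed by spaces or tabs. Succeeds only if at
# least one such line is found (hypothetical helper).
find_bad_continuations() {
    grep -n '\\[[:space:]][[:space:]]*$' "$1"
}
```

Running find_bad_continuations mem.gnuplot prints nothing when all the continuation lines are clean.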

And here is the output I got, which shows the improvement in memory usage between trunk and patch:

Memory usage graph
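The graph gives the overall shape, but it can be handy to have the peak values as numbers too. Here is a small awk sketch that reads the same three-column logs; peak_mem is a hypothetical helper name, and I did not use this for the original comparison.

```shell
# peak_mem LOGFILE: print the highest %MEM (column 2) and the highest
# VSZ (column 3) found in a log produced by the ps loop above.
peak_mem() {
    awk '
        $2 + 0 > mem { mem = $2 + 0 }   # track the largest %MEM seen
        $3 + 0 > vsz { vsz = $3 + 0 }   # track the largest VSZ seen
        END { printf "%s %d\n", mem, vsz }
    ' "$1"
}
```

Running peak_mem mem-trunk.log and peak_mem mem-patch.log one after the other gives two lines that can be compared directly.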

Ubuntu 10.10 Beta First Impressions

Ubuntu released the first Maverick beta a couple of days ago. As I had some time on my hands today (including the time to re-install Lucid if it all went pear shaped), I decided to upgrade my ThinkPad T42 so, as instructed, I typed this in a terminal:

update-manager -d

And here's how it went.

The Good

The upgrade took a few hours, was extremely smooth and just worked.
Everything that I've tried so far just works out of the box, no regressions (apart from a small glitch, see below).
I thought Ubuntu 10.04 was fast but 10.10 is even faster! Firefox and Evolution in particular feel snappier.
The new keyboard layout indicator is bigger and clearer.
Shotwell replaces F-Spot.

The Bad

Some old bugs like ThinkPad sound notifications still don't seem to have been fixed but from the bug report it's not trivial so hopefully it will get sorted at some point.
A bug for which I provided a patch a couple of months ago is back as librsvg has been updated to the latest version which doesn't contain the fix.

The Ugly

The default background really doesn't look good so the first thing I did was change to a different background image.

All in all, an excellent upgrade!