All articles tagged with “dissertation”

Presentation material for OSS2009 PC

Click here to get the latest copy of my dissertation essays and my OSS2009 presentation.

Hard Disc Corruption

After recently upgrading to Jaunty Jackalope, I thought everything had gone well. I found the UI much more responsive than Hardy Heron, and all the applications I used were running perfectly. That is, until I tried to check the data I was storing in MySQL.

I knew there was a problem when I tried to retrieve data through the Django ORM: qs.objects.all() returned rows, but qs.objects.count() returned 0. It turned out that the hard disc was corrupted. Luckily I had backups, but much of the manual donkey work I had been doing was lost, and I have to redo it all over again (that is, look through 80K log messages).

After restoring everything, I finally realized how much data I am dealing with. I downloaded the source code of over 200 Python- and C-based open source projects, which amounts to over 46GB! I have been parsing all these source files for data for the past couple of months. I hope this effort finally pays off and I graduate. I am kind of excited to look at the results of all this analysis, but first I have to complete the theoretical development part of my dissertation.

Dissertation Donkey Work Has Started!

It’s a boring part of my dissertation, but it must be done. After trying to automate the parsing of patch contributors from RCS logs and deciding the approach wasn’t reliable enough, I decided to use Django to display the log messages with suspected contributor names ten at a time. I sift through each batch of ten and approve the correct names.
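A minimal sketch of that review workflow, assuming the log messages are already loaded as plain strings. The regular expression SUSPECT_NAME and the helpers suspected_contributors and batches_of_ten are hypothetical illustrations, not the code actually used here; inside a Django view, django.core.paginator.Paginator(messages, 10) would handle the paging instead.

```python
import re
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical heuristic: a capitalised "First Last" name following "from",
# "by" or "per", which often introduces a patch author in free-form commit
# messages. It yields false positives and misses names, hence manual review.
SUSPECT_NAME = re.compile(r"\b(?:from|by|per)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)")


def suspected_contributors(message: str) -> List[str]:
    """Return every candidate contributor name found in one log message."""
    return SUSPECT_NAME.findall(message)


def batches_of_ten(messages: Iterable[str]) -> Iterator[List[Tuple[str, List[str]]]]:
    """Yield pages of up to ten (message, candidate names) pairs for approval."""
    it = iter(messages)
    while True:
        page = [(m, suspected_contributors(m)) for m in islice(it, 10)]
        if not page:
            return
        yield page
```

For example, suspected_contributors("Add index support, patch from Jane Doe.") gives ["Jane Doe"], and 13 messages come out as one page of ten followed by one page of three.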

Thank God Linux kernel development is done using git, which stores the author’s name, so I don’t have to parse the log messages. This saved me from sifting through 50K log messages. I will check out Wine too, to see whether its git history also captures the author information. But even with this, I still have 90K log messages to go through.
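Because git records the author separately from the commit message, the names can be read straight from the history. A minimal sketch, assuming a local clone and git on the PATH: in git log's --format string, %an and %ae are the author name and email placeholders and %x09 emits a tab. The helper names author_lines and count_authors are hypothetical.

```python
import subprocess
from collections import Counter
from typing import Dict, Iterable, List


def author_lines(repo_path: str) -> List[str]:
    """Run git log in repo_path and return one 'name<TAB>email' line per commit."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an%x09%ae"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return out.splitlines()


def count_authors(lines: Iterable[str]) -> Dict[str, int]:
    """Count commits per author, keyed by 'Name <email>'."""
    counts: Counter = Counter()
    for line in lines:
        if not line:
            continue  # skip blank lines
        name, _, email = line.partition("\t")
        counts[f"{name} <{email}>"] += 1
    return dict(counts)
```

Usage would look like count_authors(author_lines("path/to/linux")), where the path is a placeholder for wherever the clone lives. Keying on name plus email keeps two people with the same name apart, though one person using several addresses will be counted separately.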

To the Postgres committers out there, I appreciate what you’re doing, but let me say this: I hate you for making my life a living hell. Would it hurt to put “patch by” in front of the author’s name in the log message?
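With a consistent “patch by” convention, extraction would shrink to one regular expression instead of manual review. A minimal sketch, assuming a hypothetical format where “patch by” (any case) is followed by names separated by commas or “and” and terminated by a period or line break; PATCH_BY and patch_authors are illustrative names only.

```python
import re
from typing import List

# Hypothetical convention: "patch by" followed by author names, ending at
# the first period or newline.
PATCH_BY = re.compile(r"patch by\s+([^\n.]+)", re.IGNORECASE)


def patch_authors(message: str) -> List[str]:
    """Return the author names listed after every 'patch by' in a log message."""
    authors = []
    for match in PATCH_BY.finditer(message):
        # Split "A, B and C" into individual names.
        for name in re.split(r",\s*|\s+and\s+", match.group(1)):
            name = name.strip()
            if name:
                authors.append(name)
    return authors
```

For instance, patch_authors("Fix join planning bug.\nPatch by Jane Doe and John Roe.") returns ["Jane Doe", "John Roe"]. Names written with initials such as “J. Doe” would be cut at the period, so a real convention would need a stricter terminator.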