All articles, tagged with “dissertation”
After recently upgrading to Jaunty Jackalope, I thought everything had gone well. I found the UI very responsive compared to Hardy Heron, and all the applications I used were running perfectly. That is, until I tried to check the data I was storing in MySQL.
I knew there was a problem when I tried to retrieve data using the Django ORM: the queryset returned some data when I called qs.objects.all(), but qs.objects.count() returned 0. It turned out that the hard disk was corrupted. Luckily I had some backups, but much of the manual donkey work I had been doing was lost, and I have to redo it all over again (that is, look through 80K log messages).
After restoring everything, I finally realized how much data I am dealing with. I downloaded the source code of over 200 Python- and C-based open source projects, which amounts to over 46GB! I have been parsing all these source files for data for the past couple of months. I hope this effort finally pays off and I graduate. I am kind of excited to look at the results of all this analysis. But first I have to complete the theoretical development part of my dissertation.
It’s a boring part of my dissertation, but it must be done. After trying to automate the parsing of patch contributors from RCS logs and deciding the approach wasn’t very reliable, I decided to use Django to display the log messages, along with the suspected contributor names, ten at a time. I sift through these messages and approve them in batches of ten.
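For illustration, the "ten at a time" review can be sketched in plain Python. This is only a rough sketch, not the actual Django view: the `batches` helper and the sample messages are made up for the example, and in a Django app the built-in `Paginator` class would normally do this job.

```python
from itertools import islice


def batches(items, size=10):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical log messages; the real ones come from the RCS logs.
messages = [f"log message {n}" for n in range(1, 26)]

# Each page is what one review screen would show.
pages = list(batches(messages))
```

With 25 messages this gives two full pages of ten and a last page of five.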
Thank god Linux kernel development is done using git, which stores the author’s name, so I don’t have to parse the log messages. This saved me from sifting through 50K log messages. I will check out Wine too, to see whether git captures the author information there as well. But even with this, I still have 90K log messages to go through.
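Here is a small sketch of why git makes this easy, assuming the author names are read with `git log` and a custom format (`%an` is the author name, `%x09` a tab, `%s` the subject line). The function names and the sample output are invented for the example; this is not the script I actually ran.

```python
import subprocess
from collections import Counter

LOG_FORMAT = "--format=%an%x09%s"  # author name, a tab, then the subject line


def parse_log(output):
    """Split each 'author<TAB>subject' line into an (author, subject) pair."""
    pairs = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        author, _, subject = line.partition("\t")
        pairs.append((author, subject))
    return pairs


def read_log(repo_path):
    """Run git log in repo_path and return (author, subject) pairs."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", LOG_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)


# Made-up sample output, so the parsing can be tried without a repository.
sample = (
    "Alice Example\tFix null check\n"
    "Bob Example\tAdd driver\n"
    "Alice Example\tUpdate docs\n"
)
counts = Counter(author for author, _ in parse_log(sample))
```

Calling `read_log` on a local clone would return the same kind of pairs for every commit, with no guessing from the message text.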
To the Postgres committers out there, I appreciate what you’re doing, but let me say this: I hate you for making my life a living hell. Would it hurt to put “patch by” in front of the author’s name in the log message?
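To show what a consistent marker would buy, here is a minimal sketch that assumes every credited patch contained the words "patch by" followed by a capitalized name. The pattern, the function name and the sample messages are all hypothetical, and real log messages are much messier than this.

```python
import re

# "patch by" in any letter case, followed by one or more capitalized words.
PATCH_BY = re.compile(r"(?i:\bpatch\s+by)\s+([A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)")


def suspected_contributor(message):
    """Return the name after 'patch by', or None if the marker is absent."""
    match = PATCH_BY.search(message)
    return match.group(1) if match else None


# Made-up log messages for illustration.
credited = suspected_contributor("Fix planner crash. Patch by Alice Example.")
uncredited = suspected_contributor("Tweak docs, per report from Bob.")
```

The first message yields a name to approve, while the second yields nothing and still needs a human to read it.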