Archive for March 20, 2009

Encoding Blues

I’ve been working today on data collection for my dissertation and wrote some Python scripts to parse the logs of a number of FLOSS repositories and store the data in a Django model to make querying easier. I ran a script to collect the log messages for the year 2008, and everything seemed to be progressing fine. I could see the names of the projects my script was working on flying by on the screen, until it hit the Linux Kernel 2.6.

The activity on that project is absolutely enormous compared to the other projects in my data sample (which includes Wine and Django). The names that were flying up my screen simply stopped, as if we had hit a brick wall.

So I waited for a minute, then I got a flat tire:

  File "/usr/lib/python2.5/site-packages/django/utils/", line 77, in force_unicode
    raise DjangoUnicodeDecodeError(s, *e.args)
django.utils.encoding.DjangoUnicodeDecodeError: 'utf8' codec can't decode bytes in position 705-708: invalid data. You passed in '...Signed-off-by: Bj\xf6rn Steinbrink <> Signed-off-by: David S. Miller <>\n\n' (<type 'str'>)

I believe everything is configured for UTF-8 encoding on my end, but I suspect the problematic part of the string is Bj\xf6rn. Normally I would just replace the character, but since I’m dealing with 200+ projects and well over 100K commit messages, I don’t think that would be a good option.
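
For illustration, here is a minimal sketch of one common alternative to editing messages by hand: try UTF-8 first and fall back to another encoding when that fails. It assumes the stray byte is Latin-1 (ISO-8859-1), where \xf6 is "ö", and that may not hold for every commit in every repository. The function name decode_commit_message is hypothetical, and the sketch uses Python 3 syntax rather than the Python 2.5 shown in the traceback.

```python
def decode_commit_message(raw, fallback="latin-1"):
    """Decode a raw commit message, trying UTF-8 before a fallback encoding.

    Assumption: messages that are not valid UTF-8 are Latin-1. Latin-1 maps
    every byte to a character, so the fallback never raises, but it can
    produce the wrong characters if the real encoding was something else.
    """
    if isinstance(raw, str):
        # Already decoded text; nothing to do.
        return raw
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# The byte string from the error message: \xf6 alone is not valid UTF-8.
sample = b"Signed-off-by: Bj\xf6rn Steinbrink"
print(decode_commit_message(sample))
```

The decoded text could then be handed to the Django model as a unicode string instead of raw bytes.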

I hope this doesn’t take very long to fix.

Update on the situation: I spent an hour playing with encodings, only to concede in the end. I decided to simply modify the text. I don’t think I have the time for this.

Dumping PHP in favor of Python and Django — Part 1

We have been using PHP since we started in 2000. In 2004 we started looking for a CMS that would make our life easier. Our choice was Joomla!, which covered our needs pretty well at the time, and we deployed it in 2005. After a couple of years of using it we discovered many strange behaviors in the CMS, along with poor performance. The situation was annoying, which led us to look for an alternative. By the end of 2006 we started a CMS and framework comparison that took a couple of months and included TYPO3, Drupal, Joomla 1.5, Zope, and Plone. In 2007 we were about to choose between Drupal and Plone. I don’t remember how we discovered the Django framework (Thank God!), but we were very interested in its features and tracked the project until the end of 2007.

It is, I don’t feel like continuing my post now. I will probably continue it when I feel like doing so.

First Blogging Attempt

Hello world!

Hopefully this will speed up my dissertation writing process. I'll try to post news about my progress and what I am doing every now and then. Hopefully, someone will find it interesting. More importantly, if my adviser is reading this, I hope you find this a valuable tool for our communication.

Setting up ByteFlow was a breeze, and let me tell the world, Django and Python are amazing. No more Java crap for me.