All articles, tagged with “python”

Introductory Python Programming Sessions in Kuwait

KBSoft will be conducting a series of introductory sessions to acquaint programmers with new programming tools and to help improve their programming skills. The sessions are:

Sunday, Nov 21st 2010 7pm — 9pm: An Introduction to Revision Control with Git.

The goal of this session is to show how revision control systems can be an indispensable tool for programmers. Git will be the tool of choice for this session, and we will go through a number of exercises to show how useful it can be to both individual programmers and programming teams. We will also introduce a number of best practices for using Git and help the attendees get more familiar with the system.

Tuesday, Nov 23rd 2010 7pm — 9pm: An Introduction to the Python Programming Language.

The goal of this session is to introduce Python as a general-purpose programming language that can be used to solve most problems faced by programmers in Kuwait. There will be a number of exercises to introduce the language syntax and features, in addition to an overview of some of the useful packages in the standard library, language best practices, and how to set up a functional development environment.

Thursday, Nov 25th 2010 7pm — 9pm: An Introduction to Web Application Development with Django.

The goal of this session is to introduce Django as the tool of choice for web development. Our approach will be to contrast the Django development model with the common PHP model that most attendees are likely familiar with from exploring and learning PHP. We will introduce the main components of the Django framework and go through a simple exercise that should give attendees an appreciation of how useful and time-saving this framework can be.

The sessions will be held in Kuwait Information Technology Society (KITS, formerly KCS) in AlRawda. The building is at the very corner of AlRawda directly in front of AlJabriya and on the intersection of the 4th ring road with King Fahad Highway.


  • Understanding of at least a single programming language (e.g., php, vb, c, Java)

  • Laptop with the following installed to go through the exercises (no love will be shown for Windows users, you’re on your own ;) ):

  • Strongly recommended: bring your own internet connection, as the connection there might not be reliable

Making Sense of Unicode in Python

I finally managed to make sense of string encoding in Python. It seems we have two types of strings: unicode strings, written as u'unicode string', and regular byte strings (treated as ASCII by default), which are the standard Python strings.

The replace solution that I suggested in an earlier post kept failing with ASCII decoder errors, which I couldn’t explain at the time, given that I was encoding the strings using ‘utf-8’ like this: encodedstring = rawstring.encode('utf-8','replace')

I even tried out the excellent chardet module, which predicts what encoding is used for a string, thinking that I needed to identify the correct encoding and use it instead of utf-8. But as it turns out, rawstring is a plain byte string, and calling encode on it makes Python first decode it implicitly as ASCII, which is where the decoder errors came from. Before I could properly encode to utf-8, I needed to ensure that the string was a unicode Python string.

So I tried the following, but to no avail: encodedstring = unicode(rawstring).encode('utf-8','replace')
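The decode-first rule can be sketched roughly as follows. This is only an illustration in Python 3 terms, where bytes plays the role of the old str type and str the role of unicode; the sample bytes and variable names are made up for the example.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
raw = b"Bj\xc3\xb6rn"  # made-up sample: the UTF-8 bytes for "Bjorn" with an o-umlaut

# Decoding as ASCII (what Python 2 did implicitly) fails on non-ASCII bytes.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the encoding the bytes are really in gives proper text...
text = raw.decode("utf-8")

# ...and only text can be encoded back to bytes without surprises.
encoded = text.encode("utf-8")
```

The point is the order of operations: bytes are decoded once with the right codec, and encoding happens only on text.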

The solution to my problem lay in the codecs module. As it turns out, you need to convert your strings to unicode as early as possible in the lifetime of your program to avoid unexpected behavior. For me, that was the moment I read the byte streams from the log file I was trying to parse. What I did was:

import codecs
log_file = codecs.open(log_path, 'r', 'utf-8', 'replace')  # log_path: placeholder for the log file's path
for line in log_file:
    pass  # do some work on the decoded unicode line

line is now a unicode string decoded from UTF-8 bytes. I was finally able to parse log messages with weird international characters and store them using the Django ORM.
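Here is a small self-contained sketch of the same idea, decoding at the moment the file is read. It is not the script from this post: it uses io.open (the built-in open in Python 3) instead of codecs.open, and the helper name read_log_lines and the sample log contents are invented for the example.

```python
import io
import os
import tempfile


def read_log_lines(path, encoding="utf-8"):
    """Return the file's lines as text, replacing undecodable bytes with U+FFFD."""
    with io.open(path, "r", encoding=encoding, errors="replace") as handle:
        return [line.rstrip("\n") for line in handle]


# Build a tiny sample log: one valid UTF-8 line and one Latin-1 line.
fd, sample_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as out:
    out.write(b"Signed-off-by: Bj\xc3\xb6rn\n")
    out.write(b"Signed-off-by: Bj\xf6rn\n")

lines = read_log_lines(sample_path)
os.remove(sample_path)
```

The first line comes back intact, while the stray Latin-1 byte in the second line is turned into the U+FFFD replacement character instead of raising an error.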


Parsing XML using xml.etree was not as straightforward; I’ll keep that discussion for another time.

Solution to my Encoding Problem

When you analyse over 300K commit messages, changing problematic encodings gets a bit tiresome. This got me motivated to look for a solution and here it is:

obj.text = parsed_log_message.encode('utf-8','replace')

Saving obj will stop DjangoUnicodeDecodeError from squealing and replace problematic characters with a ‘?’. I guess this is part of the Zen of Python:

Errors should never pass silently. Unless explicitly silenced.
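For reference, here is roughly how the 'replace' error handler behaves in each direction. This is a Python 3 sketch with a made-up sample string, not the code from my script.

```python
text = u"Bj\u00f6rn"  # made-up sample text containing one non-ASCII character

# Encoding to a codec that cannot represent a character substitutes "?".
ascii_bytes = text.encode("ascii", "replace")

# UTF-8 can represent the character, so nothing gets replaced here.
utf8_bytes = text.encode("utf-8", "replace")

# Decoding invalid bytes substitutes U+FFFD (the replacement character) instead.
decoded = b"Bj\xf6rn".decode("utf-8", "replace")
```

Either way the data loss is explicit: the handler is something you ask for, which fits the "unless explicitly silenced" half of the quote.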

Django Evolution Gotcha

Django evolution is the closest thing to a steroid when it comes to enhancing the productivity of a Django developer working with RDBMSs. So close, it’s even got some nasty side effects if you rely on it too much.

I made the mistake of doing ./ reset app after changing a model structure for an app that was being tracked by django-evolution. Almost all ./ commands gave nasty errors whenever I tried to use them afterwards. So I removed the django-evolution app from my installed apps list, and got things working again.

Two days passed, and I fell into withdrawal from relying on this drug known as django evolution; I had to have it again. I installed it again after doing a ./ flush, and db-related management commands refused to work. Then it hit me. All I had to do was:

./ reset django_evolution

Things then got back to normal. I believe what happened is django evolution got out of sync with my db state after the reset that I did. This solution surely fixed it, but I lost all the evolution history of my database. If anyone out there knows a better way to fix this problem, and still maintain previous history, I would be thankful.

Encoding Blues

I’ve been working today on data collection for my dissertation and wrote some Python scripts to parse the logs of a number of FLOSS repositories and store the data in a Django model to make querying the data easier. So I ran a script to collect the log messages for the year 2008, and everything seemed to be progressing fine. You could see the names of the projects my script was working on flying by on the screen, until it hit the Linux Kernel 2.6.

The activity on the project is absolutely enormous compared to the other projects in my data sample (which includes Wine and Django). The names that were flying up my screen simply stopped as if we hit a brick wall.

So I waited for a minute, then I got a flat tire:

  File "/usr/lib/python2.5/site-packages/django/utils/", line 77, in force_unicode
    raise DjangoUnicodeDecodeError(s, *e.args)
django.utils.encoding.DjangoUnicodeDecodeError: 'utf8' codec can't decode bytes in position 705-708: invalid data. You passed in '...Signed-off-by: Bj\xf6rn Steinbrink <> Signed-off-by: David S. Miller <>\n\n' (<type 'str'>)

I believe everything is configured for UTF-8 encoding on my end, but I suspect this part of the string is problematic: Bj\xf6rn (the byte \xf6 is “ö” in Latin-1, but it is not valid UTF-8 on its own). I would normally replace the character, but since I’m dealing with 200+ projects and well over 100K commit messages, I don’t think this would be a good option.

I hope this doesn’t take very long to fix.

Update on situation: It took me an hour of playing with encodings, only to concede in the end. I decided to simply modify the text. I don’t think I have the time for this.
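One common way to handle bytes like Bj\xf6rn without editing messages by hand is to try UTF-8 first and fall back to another encoding. The sketch below is only an illustration under the assumption that the stray bytes are Latin-1; the function name decode_commit_message is made up, and this is not what I ended up doing here.

```python
def decode_commit_message(raw, fallback="latin-1"):
    """Decode bytes as UTF-8, falling back to another codec if that fails.

    The Latin-1 fallback is an assumption about the data; Latin-1 maps every
    byte to a character, so the fallback itself never raises.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


message = decode_commit_message(b"Signed-off-by: Bj\xf6rn Steinbrink")
```

The trade-off is that a wrong guess about the fallback encoding produces wrong characters silently, so it is worth spot-checking a few decoded messages.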

Dumping PHP in favor of Python and Django — Part 1

We have been using PHP since we started in 2000. In 2004 we began looking for a CMS that would make our lives easier. Our choice was Joomla!, which covered our needs pretty well at the time, and we deployed it in 2005. After a couple of years of using it, we had discovered many strange behaviors in the CMS, along with poor performance when it came to speed. The situation was annoying, which led us to look for an alternative. By the end of 2006 we had started a CMS and framework comparison, which took a couple of months and included TYPO3, Drupal, Joomla 1.5, Zope, and Plone. In 2007 we were about to choose between Drupal and Plone. I don’t remember how we discovered the Django framework (thank God!), but we were very interested in its features and kept tracking the project until the end of 2007.

It is, I don’t feel like continuing my post now. I will probably continue the post when I feel like doing so.