All articles, tagged with “django”

Introductory Python Programming Sessions in Kuwait

KBSoft will be conducting a series of introductory sessions designed to introduce programmers to new programming tools and to help improve their programming skills. The sessions are:

Sunday, Nov 21st 2010 7pm — 9pm: An Introduction to Revision Control with Git.

The goal of this session is to introduce how revision control systems can be an indispensable tool to programmers. Git will be the tool of choice for this session and we will go through a number of exercises to show how useful it can be to both individual programmers and programming teams. We will also be introducing a number of best practices for using using Git and help the attendees get more familiar with the system.

Tuesday, Nov 23rd 2010 7pm — 9pm: An Introduction to the Python Programming Language.

The goal of this session is to introduce python as a general purpose programming language that can be used to solve most problems faced by programmers in Kuwait. There will be a number of exercises to introduce the language syntax and features. In addition to an overview of some of the useful packages in the standard library, language best practices, and how to setup a functional development environment.

Thursday, Nov 25th 2010 7pm — 9pm: An Introduction to Web Application Development with Django.

The goal of this session is to introduce Django as the tool of choice for web development. Our approach will be to contrast the Django development model with that of the common PHP model that most attendees might be familiar with as they explored and learned PHP. We will be introducing the main components of the Django framework and go through a simple exercise that would give the users an appreciation of how useful and time-saving this framework can be.

The sessions will be held in Kuwait Information Technology Society (KITS, formerly KCS) in AlRawda. The building is at the very corner of AlRawda directly in front of AlJabriya and on the intersection of the 4th ring road with King Fahad Highway.

Requirements:

  • Understanding of at least a single programming language (e.g., php, vb, c, Java)

  • Laptop with the following installed to go through the exercises (No love will be shown for Windows users, your on your own ;) ):

  • Strongly recommended: bring your own internet connection, as the connection there might not be reliable

The Mythical Django Pony, a blessing?

On my way to have lunch on the first day of DjangoCon in Portland, I met Eric Holscher in one of the corridors of the hotel holding a pink unicorn. It seemed odd to me so I approached and asked him about what he had in his hands. He explained that this was the unofficial Django mascot, which was a pony. I didn’t make the obvious observation that what he had in his hands was a unicorn, but asked how this came to be. He explained that one of the core dev on the Django mailing list responded to one of the feature requests saying “no, you can’t have a Pony!” as a way of politely refusing the feature request.

I was surprised to be in a DjangoCon session two days later where Russell Keith-Magee talks about declined feature requests and how they are referred to as ponies, and suggests ways in which your features are likely to get accepted. He also explained the whole story behind the mythical Django pony (which is really a unicorn!).

Why am I bringing this up now? well as I write the concluding chapters of my dissertation, I notice an odd relationship between the number of modules present in a code base and the number of new contributors. The statistical model suggests that adding modules to a code base is associated with an increase in the number of new contributors. However, this relationship reverses itself for projects that have an above average number of modules. So adding modules when there are already a high number of modules results in fewer contributors joining the development effort over time. The same effect could be observed for average module size (measured in SLOC), where an increase in the average size of a module is associated with an increase in new contributors up to a certain point. Then, the relationship starts to reverse (Quadratic effect for those of you who are statistically inclined).

It dawned on me that one of the explanations for such an observation is that an increase in the number of modules or an increase in the average size of a module is a result of the increased complexity in the code base from adding or implementing a feature. Assuming that the projects in my sample are not mature to a point where there is no need for new contributors to join, then we can attribute the decrease or increase in numbers of new contributor to the balancing act of complexity where the community correctly decides to include just enough features as to not make the codebase overly complex for new participants and yet valuable enough for new members to start using and contributing to it (Please don’t bite my head off for implying causation here, I am just forwarding a hypothesis that seems to be supported by the data. It’s up to you whether to accept it or not).

So how does this relate to ponies? Well, I just might have put my finger on one of the things that makes Django unique, and that’s the core developers know how to play this balancing act by knowing when to include or refuse a new feature. This I believe is possible because there is what we can refer to as a Django philosophy in deciding which feature requests are considered ponies. This seems to be paying off as participation in Django is way off the charts. Don’t ask me what the Django philosophy is, as I have no idea. I am just observing its results. If someone out there thinks he knows what it is, or has a link, please do share.

Take away from this, at least for other FLOSS projects that want to learn from the Django community. Be clear on the goals you want to achieve with your project, and don’t be afraid of saying “No! you can’t have a pony!”. As for the Django community, you are already on the right track in trying to explain why features are refused. Keep at it!

Update: Here is the Django philosophy summed up quite eloquently by Dougal Matthews:

I think the philosophy is quite simple generally. “Does this need to be in the core?”

You can read his comment for more explanation.

Django … an outlier

While analyzing the development activity and code metrics for over 240 of the most actively developed FLOSS projects, guess which project popped out?

Yes Django! Its an outlier in terms of its activity. It’s influencing the results of my statistical analysis more than any other project as per the Cook’s distance diagnostic index. Let me bring your attention to the lonely dot that is close to the value 1 at the top right corner. I missed it at first, but noticed it when I looked at the sorted values.

This is telling us that at least among the sample that I have ,Python, C and C++ based actively developed FLOSS projects, Django (including its community) is quite unique.

I leave you with the graph of sorted Cook’s D values from my analysis:

Click to Enlarge Image

Contributor analysis

Some of you might find it interesting to know that over 900 unique contributors have participated in Django’s development since Jan 1st, 2007, as attributed by the svn log messages. I would say the community is very healthy especially if you compare it to other well known projects:

Django: 906
Pylons: 80
PyPy: 240
Linux Kernel: 4043
PostgreSQL: 150
Appache HTTP server: 118
SQLAlchemy: 36 (Contributors identified from trac tickets mentioned in svn log)
Python: 428

Note that I consider anonymous or guest contributors as a single contributor, so these number can be considered conservative if the project allows anonymous contributions.

How Great is Django’s Documentation?

One aspect of Django that never ceases to amaze me, is how well it is documented. I believe this aspect of the Django project got many of us to use it, me included. While doing some boring graduate donkey work, Django’s name popped out, and not surprisingly, as one of the highly documented code bases out there (only projects focusing on documentation came close!). So lets me take this opportunity to thank all those who might be reading this, who had a hand in making Django what it is (Thank you!)

The Lines of documentation to SLOC ratio is plotted against SLOC for 1st of Jan, 2009. Django has one of the highest ratios compared to other open source projects of similar size (See the top part of the Graph). Of course I could have plotted doc lines to SLOC, but then I have to show the deviation of Django from the regression line. This graph just makes it more obvious and is easier to plot:

Click to enlarge, and don’t forget to zoom in.

I also have a few questions should any Django contributors pass by. Any of you guys think that you might have over done it? Is it difficult to maintain the quality of the documentation and keep it up-to-date and did anything change after moving to a regular release schedule? Is it only contributors experienced with the Django code base, who are able to write documentation to ensure quality?

Since I also live in a world where ponies roam and correlation is always seen as causation, let me end this post by saying:

Good documentation is the cause of success for Django, and high ice cream consumption is the cause of increased deaths in neighborhoods with swimming pools during summer!

Django Code Base Modularity

Let me start first by defining what I mean by modularity. Modularity is how well source code files are arranged into groups that share maximum dependency (i.e. imports) within group and minimum dependency between groups.

Groups that share a high degree of dependency are said to be cohesive, and they usually serve a single function. When these cohesive groups have little dependencies between them the code base is said to be loosely coupled. When a code base is non-modular, then the whole group of source files share a high level of dependency between one another which makes the code base seem as a single monolithic unit.

This obsession with modularity and dependency graphs was actually sparked by Mark Ramm’s presentation in Djangocon. He had some rather excellent lessons learned for the community, but one part of his presentation stuck out for me where he compared django’s dependency graph with that of turbogears (around the 9th minute). I am no graph expert, but I am almost certain that eye balling graphs is not a good way to compare them or decide how well they are arranged. I think you now see where this is going.

I went ahead and generated the dependency graphs for both django trunk and turbogears trunk. For the fun of it, I also included other python based projects, CherryPy, SqlAlchemy and Genshi. Let me be clear on what I mean by dependency graph of trunk. I actually went through the whole trunk history of these projects and generated the dependency graph for each commit.

I ended up with a lot of graphs and eyeballing is certainly not a good way to compare them. As it turns out, the concept of modularity exists in graph theory and it matches the definition I just gave. I used a method by newman which identifies groups in graphs using a clustering method that attempts to maximize modularity. Modularity in graph theory is basically a characteristic of how a graph is partitioned.

When applying the method on a source dependency graph, the method groups files that share dependencies into groups (i.e. modules) and the identified groups would maximize the modularity of the graph. The identified modularity value from this method would be an upper limit for how modular the code base is. So without further ado, I give you the the result of the analysis where I calculated the modularity of the dependency graph after each commit, and averaged the values per month:

Modularity graphs

Some highlights

  • Django seems to have a good increasing trend (Django community, keep up the good work!).
  • Turbogears, what happened? this is Turbogears trunk btw so it’s V2.0, I think they should have listened to Mark Ramm’s presentation. Seems like something went wrong, maybe backwards compatibility?
  • I marked out the two highest jumps in Django’s modularity. I attributed the first to the Boulder sprint, since I couldn’t find any other significant news during April 2007. The second can be attributed to newform-admin branch merging into trunk.
  • If you are wondering where queryset-refactor is, look 3 points prior to merging of newforms-admin. I dont think it had an effect on modularity, any ideas why?
  • SQLAlchemy, well done guys! anyone worked on SQLAlchemy and can confirm that indeed their code is modular? I would appreciate any comments to confirm that there is some level of reliability in the method I am using (I need to graduate people).

I hope you find this all interesting. I’ll be sharing some more analysis about other FLOSS projects. I’m currently working on Pylons, Twisted, and Trac. I thought about doing Zope but my computer begged for mercy. Stay tuned!

Django Forms Gotcha

Well, I just wasted 4 good hours of my time thinking I was facing a bug. I set the prefix for a form while trying to set the initial value for for a field. Only the first instance of the instantiated form would include the initial value. This code illustrates my problem:

    from django import forms

    class MyForm(forms.Form):
        names = forms.CharField(required=False)
        id = forms.CharField(widget=forms.HiddenInput,required=False)


    print [MyForm({'id':y},prefix=y).as_p() for y in range(2)]

    #output:
    #
    #[u'<p><label for="id_names">Names:</label> <input type="text" name="names" #id="id_names" /><input type="hidden" name="id" value="0" id="id_id" /></p>',
    # u'<p><label for="id_1-names">Names:</label> <input type="text" name="1-names" #id="id_1-names" /><input type="hidden" name="1-id" id="id_1-id" /></p>']
    #
    # Notice how 2nd form instance doesnt have value=1 for hidden input field.

Solution was simple. Thanks to the guys at #django on freenode, all I needed to do was instantiate my form as such:

MyForm(initial={'id':y})

That fixed it for me. I was told that when using data=, I need to match the data key to field name including the prefix. With initial keyword, you dynamically set that.

Making Sense of Unicode in Python

I finally managed to make sense out of string encoding in Python. Seems like we have two types of strings, unicode that is identified as u’unicode string’, and regular ascii encoded strings, which are the standard python strings.

The replace solution which I suggested in an earlier post kept failing with ascii decoder errors, which I couldn’t explain at that time given that I was encoding the strings using ‘utf-8’ as such: encodedstring = rawstring.encode('utf-8','replace')

I even tried out the excellent chardet module that predicts what encoding is used for the strings thinking that I needed to identify the correct encoding and use it instead of utf-8. But as it turns out, rawstring is an ascii based python string and encoding it will also result in an ascii based python string. Before I could properly encode to utf-8 I needed to ensure that the string is a unicode python string.

So I tried the following, but to no avail: encodedstring = unicode(rawstring).encode('utf-8','replace')

The solution to my problem lied with the codecs module. As it turns out, you need to convert your strings to unicode as early as possible in the lifetime of your program to avoid undefined behavior. For me, it was the moment i read the byte streams from the log file I was trying to parse. What I did was:

import codecs file = codecs.open('utf-8','r','replace') for line in file.readline(): #do some work on unicde, utf-8 encoded line

line is now a unicode string encoded as utf-8. I was finally able to parse log messages with weird international characters and store them using django orm.

YAY!

Parsing xml using xml.etree was not as straight forward, I’ll keep that discussion for another time.

Solution to my Encoding Problem

When you analyse over 300K commit messages, changing problematic encodings gets a bit tiresome. This got me motivated to look for a solution and here it is:

obj.text = parsed_log_message.encode('utf-8','replace')

Saving obj will stop DjangoUnicodeDecodeError from squealing and replace problematic characters with a ‘?’. I guess this is part of the zen of Python:

Errors should never pass silently. Unless explicitly silenced.

Django Evolution Gotcha

Django evolution is the closest thing to a steroid when it comes to enhancing the productivity of a Django developer working with RDBMSs. So close, it’s even got some nasty side effects if you rely on it so much.

I made the mistake of doing ./manage.py reset app after changing a model structure for an app that was being tracked by django-evolution. Almost all ./manage.py commands gave nasty errors whenever I try to use them afterwords. So I removed the django-evolution app from my installed app list, and got things working again.

Two days pass, and I fall into withdrawal from relying on this drug known as django evolution, I had to have it again. I installed it again after doing a ./manage.py flush, and db related management commands refused to work. Then it hit me. All I had to do was:

./manage.py reset django_evolution

Things then got back to normal. I believe what happened is django evolution got out of sync with my db state after the reset that I did. This solution surely fixed it, but I lost all the evolution history of my database. If anyone out there knows a better way to fix this problem, and still maintain previous history, I would be thankful.