Solution to my Encoding Problem
When you analyse over 300K commit messages, fixing problematic encodings by hand gets a bit tiresome. That motivated me to look for a solution, and here it is:
obj.text = parsed_log_message.encode('utf-8','replace')
With this in place, saving obj no longer makes DjangoUnicodeDecodeError squeal, and any problematic characters are replaced with a ‘?’. I guess this is part of the zen of Python:
Errors should never pass silently. Unless explicitly silenced.
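For reference, here is a minimal sketch of how the 'replace' error handler behaves, written for Python 3 rather than the setup above. The helper names to_safe_utf8 and decode_lossy are made up for illustration and are not part of Django or of my code. Note that the handler substitutes ‘?’ when encoding, but the Unicode replacement character U+FFFD when decoding.

```python
def to_safe_utf8(message: str) -> bytes:
    """Encode text as UTF-8, swapping unencodable characters for '?'.

    Hypothetical helper: in Python 3 the only characters UTF-8 cannot
    encode are lone surrogates, which can show up in badly decoded input.
    """
    return message.encode("utf-8", "replace")


def decode_lossy(raw: bytes) -> str:
    """Decode bytes as UTF-8, swapping invalid bytes for U+FFFD.

    Hypothetical helper: handy when a commit message was stored in some
    other encoding (Latin-1, for example) and strict decoding would raise
    UnicodeDecodeError.
    """
    return raw.decode("utf-8", "replace")


if __name__ == "__main__":
    # A lone surrogate cannot be encoded, so it is replaced on the way out.
    print(to_safe_utf8("fix \udcff bug"))
    # 0xE9 is 'é' in Latin-1 but is not valid UTF-8 on its own.
    print(decode_lossy(b"caf\xe9 ok"))
```

Either way the replacement is lossy: the original character is gone for good, so this only makes sense when a readable approximation of the message is enough.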





