Unicode in Flask

    Flask has a few assumptions about your application (which you can change, of course) that give you basic and painless Unicode support:

    • the encoding for text on your website is UTF-8

    • encoding and decoding happen whenever you are talking over a protocol that requires bytes to be transmitted.

    So what does this mean to you?

    HTTP is based on bytes. That applies not only to the protocol itself but also to the system used to address documents on servers (so-called URIs or URLs). However, HTML, which is usually transmitted on top of HTTP, supports a large variety of character sets, and the one in use is announced in an HTTP header. To keep this from getting too complex, Flask just assumes that if you are sending Unicode out you want it to be UTF-8 encoded. Flask will do the encoding and set the appropriate headers for you.
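
    As a rough illustration of what that assumption amounts to, the sketch below encodes a Unicode string to UTF-8 by hand and builds the matching headers. The helper name encode_body is hypothetical and is not part of Flask; it only mimics the kind of work Flask takes off your hands.

```python
# -*- coding: utf-8 -*-


def encode_body(text, charset='utf-8'):
    """Hypothetical helper (not a Flask API): turn a Unicode string into
    bytes and describe the charset in the response headers."""
    body = text.encode(charset)
    headers = {
        # The charset parameter tells the client how to decode the bytes.
        'Content-Type': 'text/html; charset=' + charset,
        # Content-Length counts bytes, not characters.
        'Content-Length': str(len(body)),
    }
    return body, headers


body, headers = encode_body(u'Hänsel und Gretel')
```

    Note that the length is measured after encoding: the string has 17 characters, but the "ä" takes two bytes in UTF-8, so the body is 18 bytes long.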

    The same is true if you are talking to databases with the help of SQLAlchemy or a similar ORM system. Some databases have a protocol that already transmits Unicode, and if they do not, SQLAlchemy or your other ORM should take care of that.

    What does working with Unicode in Python 2.x mean?

    • as long as you are using ASCII code points only (basically numbers, some special characters, and Latin letters without umlauts or anything fancy) you can use regular string literals (such as 'Hello World').

    • if you need anything other than ASCII in a string, you have to mark it as a Unicode string by prefixing it with a lowercase u (like u'Hänsel und Gretel').

    • if you are using non-ASCII characters in your Python files you have to tell Python which encoding your file uses. Again, I recommend UTF-8 for this purpose. To tell the interpreter your encoding you can put the line # -*- coding: utf-8 -*- into the first or second line of your Python source file.

    • Jinja is configured to decode the template files from UTF-8. So make sure to tell your editor to save the file as UTF-8 there as well.
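
    Put together, the rules for source files look roughly like this. The variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# The line above is the encoding declaration; it has to sit on the first
# or second line of the source file, and the file has to be saved as UTF-8.

# ASCII only: a regular string literal is fine.
greeting = 'Hello World'

# Anything beyond ASCII: mark the literal as Unicode with a lowercase u.
title = u'Hänsel und Gretel'

# A Unicode string counts characters, so the "ä" is a single item here.
title_length = len(title)
```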

    If you are talking with a filesystem or something that is not really based on Unicode you will have to ensure that you decode properly when working with a Unicode interface. So for example if you want to load a file on the filesystem and embed it into a Jinja2 template you will have to decode it from the encoding of that file. Here the old problem that text files do not specify their encoding comes into play. So do yourself a favour and limit yourself to UTF-8 for text files as well.

    Anyway, to load such a file as Unicode you can use the built-in str.decode() method.
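
    A minimal sketch of such a loader follows. The function name read_file and its charset parameter are illustrative, and the sketch assumes the file is opened in binary mode so that read() hands back undecoded bytes (on Python 3 the same method is spelled bytes.decode()).

```python
import os
import tempfile


def read_file(filename, charset='utf-8'):
    # Read the raw bytes, then decode them into a Unicode string using
    # the encoding the file was saved in.
    with open(filename, 'rb') as f:
        return f.read().decode(charset)


# Usage: create a small UTF-8 file and load it back as Unicode.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'H\xc3\xa4nsel und Gretel')
    os.close(fd)
    text = read_file(path)
finally:
    os.remove(path)
```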

    To go from Unicode into a specific charset such as UTF-8 you can use the unicode.encode() method.
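
    The reverse direction can be sketched the same way. Again, write_file and its parameters are illustrative names, and the file is opened in binary mode so that the encoded bytes are written unchanged.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def write_file(filename, contents, charset='utf-8'):
    # Encode the Unicode string into bytes and write those bytes out.
    with open(filename, 'wb') as f:
        f.write(contents.encode(charset))


# Usage: write a Unicode string, then look at the bytes that were stored.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    write_file(path, u'Hänsel und Gretel')
    with open(path, 'rb') as f:
        raw = f.read()
finally:
    os.remove(path)
```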

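    Here are some common ways to set your editor to store files as UTF-8:

    • Emacs: either use an encoding cookie or put a setting such as (prefer-coding-system 'utf-8) into your .emacs file.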

    • Notepad++:

      • Go to Settings -> Preferences …

      • Select the “New Document/Default Directory” tab

      • Select “UTF-8 without BOM” as encoding

    It is also recommended to use the Unix newline format. You can select it in the same panel, but this is not a requirement.