Building Web-Applications with Python (and WSGI)

a (maybe not so) gentle introduction to the basics of WSGI

Building web apps in Python is actually quite easy!


from some_magical_place import app

@app.route('/')
def hello():
    return 'Hello World!'
 
app.run()

That's actually (almost) working code.

Flask web framework

If you want to get started quickly, run `pip install flask` and replace


from some_magical_place import app

with


from flask import Flask
app = Flask(__name__)

Flask!


from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello World!'
 
app.run()

That's working code.

Doing is fun...

...but it's obviously more fun if you actually know what you're doing.

So let's take a look at some of the magic behind that innocent-looking snippet of code.

Goal of today's talk

Understand the basic principles of HTTP and gateway interfaces and build a little web app in Python without any third-party libraries.

Disclaimer: don't expect anything that's production-ready. In reality you will want to rely on frameworks to do the "dirty work" of handling the HTTP stack for you.

Before we get our hands dirty...

...here's some theory.

HTTP and the principle of REST

HTTP is...

Communication between client and server via request and response.
Request and response each contain a header and a body.
Header: collection of key-value pairs
Body: pretty much any type of data, transferred as bytes (e.g. plain text, JSON, or BASE64-encoded binary)
Body may be empty; the header section of an HTTP/1.1 request cannot be (it must at least name the Host).
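To make the header/body split concrete, here is a minimal sketch that takes a raw HTTP message apart. The function name `parse_http_message` and the request for `example.com` are made up for illustration; real parsers handle many more corner cases.

```python
def parse_http_message(raw):
    """Split a raw HTTP message (bytes) into start line, headers and body."""
    # Header section and body are separated by one blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header line is a "Name: value" pair.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical request, only used to exercise the function.
raw_request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)
start_line, headers, body = parse_http_message(raw_request)
print(start_line)
print(headers)
print(body)
```

The body comes back as an empty byte string here, because a typical GET request carries headers only.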

Request types

Requests can be of different types (called HTTP methods or verbs)

GET: "get" a certain HTTP resource
POST: send data to HTTP resource
PUT: send data to specific location on server.
DELETE: delete data from specific location.
some more (HEAD, OPTIONS, PATCH, ...)
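As a rough illustration of how verb, resource and body go together, the sketch below builds (but never sends) four requests with the standard library's `urllib.request.Request`. The host `example.com` and the `/articles` resource are hypothetical.

```python
from urllib.request import Request

BASE = "http://example.com"  # hypothetical host, nothing is sent over the network

examples = [
    Request(BASE + "/articles"),  # no data, no explicit method: GET
    Request(BASE + "/articles", data=b'{"title": "hi"}', method="POST"),
    Request(BASE + "/articles/1", data=b'{"title": "hello"}', method="PUT"),
    Request(BASE + "/articles/1", method="DELETE"),
]

for req in examples:
    # get_method() is the verb, selector is the path part of the URL
    print(req.get_method(), req.selector, req.data)
```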

Response status codes

Server can respond with a number of status codes

2xx: everything OK.
3xx: look somewhere else.
4xx: you're doing it wrong.
5xx: sorry, I screwed up. Try again.
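The first digit of the code is what carries the meaning listed above. A small sketch, using `http.HTTPStatus` from the standard library for the reason phrases; the helper name `describe` and the class labels are made up for this example.

```python
from http import HTTPStatus

# The first digit selects the class of the response.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return 'code phrase (class)' for a known HTTP status code."""
    family = STATUS_CLASSES.get(code // 100, "unknown")
    # HTTPStatus(code) raises ValueError for codes it does not know.
    return "%d %s (%s)" % (code, HTTPStatus(code).phrase, family)


for code in (200, 301, 404, 500):
    print(describe(code))
```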

The principle of REST

HTTP is a stateless protocol.

"Stateless" means that all information necessary for communication is contained in request and response. The client cannot expect the server to remember what happened in a previous request.

Example request


$ curl -v http://www.google.com/?q=some_search
> GET /?q=some_search HTTP/1.1
> User-Agent: curl/7.27.0
> Host: www.google.com
> Accept: */*
>

Example response


< HTTP/1.1 200 OK
< Date: Fri, 13 Dec 2013 15:01:16 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
<
<!doctype html>
<html itemscope="" itemtype="http://schema.org/WebPage">
<head>
<meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description">[...]

What happens here is the following:

The client requests to GET the resource `/?q=some_search` using the protocol specification HTTP/1.1 at the host www.google.com, identifying itself as the program curl and accepting any content format Google may consider appropriate.
The client does not send any content other than what's specified in the request headers (this is the most common scenario for GET requests).
Google tells the client everything went OK and gives some caching information.
Google also informs us that it will deliver HTML, encoded in ISO-8859-1.
After the headers there is a blank line.
As content, Google delivers the search results page as HTML.

That is, in a nutshell, the way HTTP works (there's obviously more to know about that, see references). With that in mind we can move on to the idea of gateway interfaces.

CGI - the archetypal Gateway Interface

CGI (Common Gateway Interface) was devised to specify how a web server could deliver dynamic content to the client by calling executable scripts

server exposes cgi binary directory to client
client can request execution of cgi-scripts
cgi script can be any executable file
request body is read from stdin, response is written to stdout, logging information to stderr.
application can access request and server data via environment variables
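The points above can be sketched in a few lines of Python. `REQUEST_METHOD`, `PATH_INFO` and `CONTENT_LENGTH` are standard CGI variables; the function name `run_cgi` and the fake request at the bottom are assumptions made for this example, which simulates the web server with in-memory streams.

```python
import io


def run_cgi(environ, stdin, stdout):
    """Handle one CGI request: metadata from environ, body from stdin."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length else b""

    # A CGI response is: header lines, one blank line, then the content.
    text = "Content-Type: text/plain\r\n\r\n"
    text += "%s %s\n" % (method, path)
    text += "received %d bytes\n" % len(body)
    stdout.write(text.encode("utf-8"))


# Simulate what the web server would do for "POST /greet" with a 5-byte body.
fake_env = {"REQUEST_METHOD": "POST", "PATH_INFO": "/greet", "CONTENT_LENGTH": "5"}
fake_stdout = io.BytesIO()
run_cgi(fake_env, io.BytesIO(b"hello"), fake_stdout)
print(fake_stdout.getvalue().decode("utf-8"))
```

In a real CGI script the last lines would be replaced by a call along the lines of `run_cgi(os.environ, sys.stdin.buffer, sys.stdout.buffer)`.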

Advantages of CGI

simple and effective
any programming language supported by server OS can be used (Perl, Java, C++, Haskell, Python... even bash)
server stack is simple, web server and application communicate via IO-streams
communication over IO-streams is convenient in Unix-environments (pipe-and-filter architecture).
many languages have CGI convenience wrappers (Python: import cgi)
all major web servers support CGI

Drawbacks of CGI

Performance...:

effective does not imply efficient (unfortunately)
every request starts new instance
every instance is discarded after request
if a CGI script takes long to start up, every single request pays that cost and server performance suffers

Also: CGI has some security concerns, even though they can be worked around by a good server configuration.

Performance is a big problem

The performance issue is a dealbreaker for many modern web applications.

Large web frameworks can take seconds to start up.
Large web applications (e.g. Redmine) can take half a minute to answer the first request (depending on server performance).

CGI-based deployment is often infeasible.

"Preforking" Gateway Interfaces

Modern gateway interfaces "prefork" the application process.

Application process is started before first request hits.
Process lives on after request is processed.
App process may serve more than one request.
For concurrent requests the server has to spawn multiple application processes.
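A toy simulation of the difference, with no real processes involved: `load_application` stands in for an expensive startup (imports, configuration, database connections), and we simply count how often it runs. All names here are made up for the illustration.

```python
startups = 0


def load_application():
    """Stand-in for an expensive application startup."""
    global startups
    startups += 1

    def application(path):
        return "response for " + path

    return application


def serve_cgi_style(paths):
    # CGI: a fresh application instance is started for every request.
    return [load_application()(path) for path in paths]


def serve_preforked(paths):
    # Preforking: the application is started once, before the first request,
    # and the same instance answers all of them.
    application = load_application()
    return [application(path) for path in paths]


paths = ["/", "/about", "/contact"]

serve_cgi_style(paths)
cgi_startups = startups

startups = 0
serve_preforked(paths)
preforked_startups = startups

print(cgi_startups, preforked_startups)
```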

"Preforking" Gateway Interfaces

Examples:

WSGI (Python)
Rack (Ruby)
JSGI/Jack (JavaScript)
PSGI/Plack (Perl)
WAI (Haskell)

WSGI inspired Rack, which in turn inspired Jack and Plack

"Preforking" Gateway Interfaces

Application is dispatched via a callable, e.g. a function.
Callable takes a dictionary (or list of key-value pairs) of environment variables as parameter.
Callable returns HTTP status code, HTTP headers and content as response.
CGI communicates via pipe-and-filter, preforking gateway interfaces communicate via functional interface.

"Preforking" Gateway Interfaces

example in pseudo-code


function application(environment) {
  status = "200 OK";  // set http status 
  headers = [ "content-type: text/plain" ];  // set headers
  content = [ "Hello world!\n", "How are you today?\n" ]; // prepare content
   
  return (status, headers, content); // return http response
}

WSGI - The Web Server Gateway Interface

WSGI was specified to propose a standard interface between web servers and Python web applications.

Distinguishes between two sides: server/gateway and application/framework.
The server invokes a callable object that is provided by the application.

WSGI - the application side

application object is a callable object
app object accepts two arguments: the environment dictionary and a callable (start_response)
start_response takes (at least) two arguments
object can be anything that can be called "like a function"

Example:


def hello_world(environment, start_response):
    status = "200 OK"
    headers = [("content-type", "text/plain")]
    # WSGI bodies are byte strings (required on Python 3)
    content = [b"Hello world!\n", b"How are you today?\n"]

    start_response(status, headers)
    return content
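Since the application object only has to be callable, a class instance with a `__call__` method works just as well as a function. A minimal sketch: the class `Greeter` and the stand-in `fake_start_response` are invented for this example, and the app is called by hand instead of by a real server.

```python
class Greeter(object):
    """A WSGI application implemented as an instance with __call__."""

    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, environment, start_response):
        content = [self.greeting.encode("utf-8")]
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return content


recorded = {}


def fake_start_response(status, headers, exc_info=None):
    # Stand-in for the server side: just remember what the app reported.
    recorded["status"] = status
    recorded["headers"] = headers


app = Greeter("Hello world!\n")
environment = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
body = b"".join(app(environment, fake_start_response))
print(recorded["status"], body)
```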

WSGI - the gateway/server side

The gateway has to invoke the application callable for each request


import os


def cgi_gateway(application):
    # [...]
    headers_set = []  # filled in by start_response: [status, response_headers]

    def write(data):
        """ helper function used to write content to stdout """
        # [...]

    def start_response(status, response_headers, exc_info=None):
        """ may only be called once, unless exc_info is given """
        if exc_info:
            # [...] re-raise the original exception if headers were already sent
            pass
        elif headers_set:
            raise Exception('headers already set')

        headers_set[:] = [status, response_headers]
        return write

    # call application
    environ = dict(os.environ.items())
    result = application(environ, start_response)
    # write data (write() sends the headers before the first chunk)
    for data in result:
        if data:
            write(data)
    # send headers now if the body was empty
    # [...]

Code adapted (and heavily abridged) from PEP 333.
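For experimenting, the server side can be reduced even further. The following sketch is not the PEP's gateway: it calls an application once, in memory, and returns the raw HTTP response as bytes. The name `run_once` is made up, the legacy `write()` callable is left out, and `setup_testing_defaults` from `wsgiref.util` fills in the environment keys a real server would provide.

```python
from wsgiref.util import setup_testing_defaults


def run_once(application, method="GET", path="/"):
    """Call a WSGI application once and return the raw response bytes."""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path}
    setup_testing_defaults(environ)  # adds the remaining required keys
    headers_set = []

    def start_response(status, response_headers, exc_info=None):
        # Simplification: the write() callable from the spec is not returned.
        if headers_set and not exc_info:
            raise AssertionError("headers already set")
        headers_set[:] = [status, response_headers]

    result = application(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        # The spec asks the server to call close() if the result has one.
        if hasattr(result, "close"):
            result.close()

    status, response_headers = headers_set
    head = ["HTTP/1.1 " + status]
    head += ["%s: %s" % header for header in response_headers]
    return "\r\n".join(head).encode("iso-8859-1") + b"\r\n\r\n" + body


def hello_world(environment, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello world!\n"]


raw = run_once(hello_world)
print(raw.decode("iso-8859-1"))
```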

Running a WSGI-compatible application

(At least) two methods: wsgiref (for development) and uWSGI (for production).

wsgiref: Python reference implementation of WSGI, part of the standard library.


def hello_world(environment, start_response):
    """ hello world app as defined above """
    # [...]
    
if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    httpd = make_server('', 8000, hello_world)
    print("Serving on port 8000...")
    httpd.serve_forever()
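An application served this way gets everything it knows about the request from the environment dictionary. The sketch below reads the method, the path and the request body. The app name `echo` and the hand-built request are assumptions for this example; `setup_testing_defaults` supplies the keys a server would normally set.

```python
import io
from wsgiref.util import setup_testing_defaults


def echo(environment, start_response):
    """Report method, path and body size of the incoming request."""
    method = environment["REQUEST_METHOD"]
    path = environment.get("PATH_INFO", "/")
    # CONTENT_LENGTH may be missing or empty when there is no body.
    length = int(environment.get("CONTENT_LENGTH") or 0)
    payload = environment["wsgi.input"].read(length)

    text = "%s %s with %d bytes of body\n" % (method, path, len(payload))
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [text.encode("utf-8")]


# Hand-built request, standing in for what a server would pass in.
environ = {
    "REQUEST_METHOD": "PUT",
    "PATH_INFO": "/sniffles/0",
    "CONTENT_LENGTH": "4",
    "wsgi.input": io.BytesIO(b"true"),
}
setup_testing_defaults(environ)
reply = b"".join(echo(environ, lambda status, headers, exc_info=None: None))
print(reply.decode("utf-8"))
```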

uWSGI

uWSGI is a very advanced WSGI implementation and supports a lot of niceties, e.g. multiple application instances. It plays nicely with reverse proxy web servers, e.g. nginx.

You can get it very conveniently via pip.

Time to do something useful... -ish

We'll use what we have now learned to build a little webapp.

JSONStore. A web api to store and traverse a JSON object.

User stories for our JSONStore API

I want to be able to send and store a JSON document at the root url.
I want to be able to look at a subpart of the JSON document by entering a resource identifier corresponding to the document path.
I want to be able to modify parts of the JSON document by sending another JSON document to the resource identifier of the document part.

Examples - pt I


PUT /

Content:
{
    "sniffles": [
        {"name": "fred"},
        {"name": "irma"},
        {"name": "wobble"}
    ],
    "snuffle": {"name": "gundar"}
}


GET /sniffles/

Response:
[
        {"name": "fred"},
        {"name": "irma"},
        {"name": "wobble"}
]
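One possible way to map a resource identifier onto the document is to walk the parsed JSON one path segment at a time. A minimal sketch; the function name `lookup` is made up, and treating segments as list indices whenever the current node is a list is a design assumption.

```python
import json


def lookup(document, path):
    """Follow a URL path such as '/sniffles/0' into a parsed JSON document."""
    node = document
    for part in path.strip("/").split("/"):
        if part == "":
            continue  # the root path "/" has no segments
        if isinstance(node, list):
            node = node[int(part)]  # list positions are addressed by index
        else:
            node = node[part]  # objects are addressed by key
    return node


document = json.loads("""
{
    "sniffles": [{"name": "fred"}, {"name": "irma"}, {"name": "wobble"}],
    "snuffle": {"name": "gundar"}
}
""")
print(lookup(document, "/sniffles/"))
print(lookup(document, "/snuffle/name"))
```

Unknown keys or indices raise `KeyError` or `IndexError` here; a web app would typically translate those into a 404 response.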

Examples - pt II


GET /sniffles/0

Response:
{"name": "fred"}


PUT /sniffles/0/rank

Content:
"sniffle chief"


GET /sniffles/0

Response:
{
    "name": "fred",
    "rank": "sniffle chief"
}
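Putting the pieces together, the examples could be served by a WSGI application along these lines. This is a sketch under several assumptions, not the talk's implementation: the class `JSONStore` and the helper `request` are invented names, the document lives in memory only, a PUT answers with the value it stored, and the choice of 400/404/405 for errors is ours.

```python
import io
import json


class JSONStore(object):
    """In-memory JSON document exposed through GET and PUT."""

    def __init__(self):
        self.document = None

    def _walk(self, parts):
        node = self.document
        for part in parts:
            node = node[int(part)] if isinstance(node, list) else node[part]
        return node

    def _store(self, parts, value):
        if not parts:
            self.document = value  # PUT / replaces the whole document
            return
        parent = self._walk(parts[:-1])
        key = int(parts[-1]) if isinstance(parent, list) else parts[-1]
        parent[key] = value

    def _reply(self, start_response, status, payload):
        body = json.dumps(payload).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    def __call__(self, environment, start_response):
        method = environment["REQUEST_METHOD"]
        parts = [p for p in environment.get("PATH_INFO", "/").split("/") if p]
        if method not in ("GET", "PUT"):
            return self._reply(start_response, "405 Method Not Allowed",
                               {"error": "use GET or PUT"})
        try:
            if method == "PUT":
                length = int(environment.get("CONTENT_LENGTH") or 0)
                raw = environment["wsgi.input"].read(length)
                try:
                    value = json.loads(raw.decode("utf-8"))
                except ValueError:
                    return self._reply(start_response, "400 Bad Request",
                                       {"error": "invalid JSON"})
                self._store(parts, value)
            return self._reply(start_response, "200 OK", self._walk(parts))
        except (LookupError, TypeError, ValueError):
            return self._reply(start_response, "404 Not Found",
                               {"error": "no such path"})


def request(app, method, path, payload=b""):
    """Call the app directly, the way a gateway would (hypothetical helper)."""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(payload)),
               "wsgi.input": io.BytesIO(payload)}
    seen = []
    chunks = app(environ, lambda status, headers, exc_info=None: seen.append(status))
    return seen[0], json.loads(b"".join(chunks).decode("utf-8"))


store = JSONStore()
request(store, "PUT", "/", b'{"sniffles": [{"name": "fred"}], "snuffle": {"name": "gundar"}}')
request(store, "PUT", "/sniffles/0/rank", b'"sniffle chief"')
print(request(store, "GET", "/sniffles/0"))
```

To try it in a browser or with curl, an instance of the class could be handed to `make_server` from `wsgiref.simple_server` in place of `hello_world`.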

A word of advice when building your own web applications

Don't (!) build your web application using low-level Python libraries, unless you know what you're doing.

Good reasons to roll your own stack:

Learning more about WSGI and HTTP.
Your particular project has constraints no framework can satisfy (e.g. regulatory ISO-constraints).
You only need a very limited set of features you can oversee yourself.

Bad reasons to roll your own stack:

You're better than all those other web developers who built the existing frameworks.
You don't like the way framework XYZ has implemented a certain feature.
You don't trust other developers' code.

A number of good Python frameworks using WSGI

Flask (very lightweight, based on Werkzeug)
Django (also very good and full-featured)
Werkzeug (very basic WSGI-support layer, my favorite)
Tornado (good for asynchronous stuff)
web.py
CherryPy

Django is probably the most widely used of these. There are also tons of other web frameworks out there.

That's it.

Now it's time for questions.

References and Recommended Reading

Deploying CGI-applications with uWSGI:
http://uwsgi-docs.readthedocs.org/en/latest/CGI.html
Building CGI-applications with bash-script:
http://www.team2053.org/docs/bashcgi/index.html
Specification of CGI:
http://tools.ietf.org/html/rfc3875 (CGI Version 1.1)
Some general information on Python and web application:
http://docs.python.org/2/howto/webservers.html
Python PEP333 (WSGI specification):
http://www.python.org/dev/peps/pep-0333/
Getting started with WSGI by Armin Ronacher:
http://lucumr.pocoo.org/2007/5/21/getting-started-with-wsgi/