Understanding WSGI (1)

By affectalways, filed under python WSGI

2020-06-20 · about 1,955 words · estimated reading time 4 minutes

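What is WSGI?

In a real production environment, a Python application sits behind an HTTP server (such as Apache or Nginx). The question is how the HTTP server (referred to below simply as "the server") hands the requests it receives to the Python application. That is the job WSGI does.

WSGI (Web Server Gateway Interface) decouples the server (Apache, Nginx, etc.) from the Python application, so that Python developers only need to focus on developing the Python application.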

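Web server: that is, the HTTP server side, which receives user requests and returns responses. It is made up of two parts:

The server itself, such as Apache or Nginx

The Python application, which handles the business logic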

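HTTP Server Implementation

The server calls the Python application once for every request it receives. The server's responsibilities are:

Receive the HTTP request
Provide environ and the start_response callback, and pass both to the callable object
Call the callable object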
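The three duties above can be sketched without any networking. The helper below is a minimal illustration, not a real server: `call_app` and `demo_app` are hypothetical names, and `wsgiref.util.setup_testing_defaults` from the standard library fills environ with placeholder values instead of parsing a real HTTP request.

```python
from wsgiref.util import setup_testing_defaults


def call_app(app, path="/"):
    """Play the server role for one fake request and return the response parts."""
    # Step 1 stand-in: instead of parsing a real HTTP request, build environ
    # from testing defaults and set the request path ourselves.
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)

    captured = {}

    # Step 2: the callback the application uses to report status and headers.
    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = list(response_headers)
        # PEP 3333 requires start_response to return a write(body_data)
        # callable; this sketch supplies a no-op stand-in.
        return lambda data: None

    # Step 3: call the callable object and consume the iterable it returns.
    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], body


def demo_app(environ, start_response):
    # Hypothetical application used only to exercise call_app.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"you asked for " + environ["PATH_INFO"].encode("iso-8859-1")]


status, headers, body = call_app(demo_app, "/hello")
print(status, body)
```

A real server does the same three things, but builds environ from the incoming request and writes the status, headers, and body back to the client socket.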

The following is the example given in PEP 3333:

import os, sys

enc, esc = sys.getfilesystemencoding(), 'surrogateescape'

def unicode_to_wsgi(u):
    # Convert an environment variable to a WSGI "bytes-as-unicode" string
    return u.encode(enc, esc).decode('iso-8859-1')

def wsgi_to_bytes(s):
    return s.encode('iso-8859-1')


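def run_with_cgi(application):
    """Run one request through a WSGI application, CGI style.

    application: the callable object exposed by the Python application.
    """
    # Build environ, a dict holding the environment variables for one HTTP request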
    environ = {k: unicode_to_wsgi(v) for k,v in os.environ.items()}
    environ['wsgi.input']        = sys.stdin.buffer
    environ['wsgi.errors']       = sys.stderr
    environ['wsgi.version']      = (1, 0)
    environ['wsgi.multithread']  = False
    environ['wsgi.multiprocess'] = True
    environ['wsgi.run_once']     = True

    if environ.get('HTTPS', 'off') in ('on', '1'):
        environ['wsgi.url_scheme'] = 'https'
    else:
        environ['wsgi.url_scheme'] = 'http'

    headers_set = []
    headers_sent = []

    # Write the response to standard output
    def write(data):
        out = sys.stdout.buffer

        if not headers_set:
             raise AssertionError("write() before start_response()")

        elif not headers_sent:
             # Before the first output, send the stored headers
             status, response_headers = headers_sent[:] = headers_set
             out.write(wsgi_to_bytes('Status: %s\r\n' % status))
             for header in response_headers:
                 out.write(wsgi_to_bytes('%s: %s\r\n' % header))
             out.write(wsgi_to_bytes('\r\n'))

        out.write(data)
        out.flush()

    # Define the start_response callback
    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    # Re-raise original exception if headers sent
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None     # avoid dangling circular ref
        elif headers_set:
            raise AssertionError("Headers already set!")

        headers_set[:] = [status, response_headers]

        # Note: error checking on the headers should happen here,
        # *after* the headers are set.  That way, if an error
        # occurs, start_response can only be re-called with
        # exc_info set.

        return write

    result = application(environ, start_response)
    try:
        # Consume the iterable returned by the application
        for data in result:
            if data:    # don't send headers until body appears
                write(data)
        if not headers_sent:
            write(b'')  # send headers now if body was empty
    finally:
        if hasattr(result, 'close'):
            result.close()
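The standard library's wsgiref.handlers module ships handler classes that play the same role as run_with_cgi. As a rough sketch, the snippet below uses `BaseCGIHandler` with in-memory streams so the CGI-style output can be inspected; `hello_app` and the two `fake_*` streams are hypothetical names chosen for this illustration.

```python
import io
import sys
from wsgiref.handlers import BaseCGIHandler


def hello_app(environ, start_response):
    # Hypothetical application used only for this demonstration.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# In-memory stand-ins for the CGI process's stdin and stdout.
fake_stdin = io.BytesIO()
fake_stdout = io.BytesIO()

handler = BaseCGIHandler(
    fake_stdin,
    fake_stdout,
    sys.stderr,
    environ={"REQUEST_METHOD": "GET", "PATH_INFO": "/"},
)
handler.run(hello_app)

# The handler writes a "Status:" line, the headers, a blank line, then the body.
raw = fake_stdout.getvalue()
print(raw.decode("iso-8859-1"))
```

For an actual CGI script, `wsgiref.handlers.CGIHandler().run(app)` wires up the process's real stdin, stdout, and os.environ in the same way.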

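Middleware

Middleware is a functional component that sits between the HTTP server and the Python application.

From the HTTP server's point of view, middleware is the application; from the Python application's point of view, middleware is the HTTP server. Middleware is transparent to both sides: it processes each request received from the HTTP server and passes it along until it reaches the Python application, then returns the application's result to the HTTP server.

[Figure: a request passing from the HTTP server through the middleware to the Python application, and the response travelling back]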
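The "application to the server, server to the application" idea can be made concrete with a small sketch. `add_header_middleware`, `inner_app`, and `record` are hypothetical names; the middleware receives the server's start_response, and hands the wrapped application a replacement of its own.

```python
def add_header_middleware(app, name, value):
    """Wrap `app` so every response carries one extra header."""

    def middleware(environ, start_response):
        # Acting as the server: give the wrapped app our own start_response.
        def patched_start_response(status, response_headers, exc_info=None):
            response_headers = list(response_headers) + [(name, value)]
            # Acting as the application: call the real server's start_response.
            return start_response(status, response_headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


def inner_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hi"]


wrapped = add_header_middleware(inner_app, "X-Handled-By", "middleware")

# Drive the wrapped app by hand with a stand-in start_response.
seen = {}


def record(status, response_headers, exc_info=None):
    seen["status"], seen["headers"] = status, response_headers
    return lambda data: None


body = b"".join(wrapped({"REQUEST_METHOD": "GET"}, record))
print(seen["headers"], body)
```

Neither `inner_app` nor the caller needs to know the wrapper exists, which is what makes middleware stackable.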

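A middleware component can perform functions such as:

Routing a request to different Python applications based on its URL
Load balancing and forwarding user requests
Post-processing content, for example applying XSL stylesheets
Rate limiting requests and enforcing allowlists

PS: WSGI middleware reflects one of the tenets of the Unix philosophy: do one thing and do it well.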
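As an illustration of the first item in the list, URL-based routing, here is a minimal sketch of a dispatching middleware. `PathDispatcher`, `not_found`, `make_app`, and `run` are hypothetical names, and routing on the first path segment is just one simple policy among many.

```python
def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


class PathDispatcher:
    """Send each request to the app registered for its first path segment."""

    def __init__(self, routes, default=not_found):
        self.routes = routes      # e.g. {"blog": blog_app}
        self.default = default

    def __call__(self, environ, start_response):
        trimmed = environ.get("PATH_INFO", "").lstrip("/")
        first = trimmed.split("/", 1)[0]
        app = self.routes.get(first)
        if app is None:
            return self.default(environ, start_response)
        # Move the consumed segment from PATH_INFO to SCRIPT_NAME so the
        # target app sees paths relative to its own mount point.
        child = dict(environ)
        child["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + "/" + first
        child["PATH_INFO"] = trimmed[len(first):]
        return app(child, start_response)


def make_app(text):
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [text + b" " + environ["PATH_INFO"].encode("iso-8859-1")]
    return app


dispatcher = PathDispatcher({"blog": make_app(b"blog"), "shop": make_app(b"shop")})


def run(app, path):
    """Call `app` once with a bare-bones environ and return (status, body)."""
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)
        return lambda data: None

    body = b"".join(app({"PATH_INFO": path, "SCRIPT_NAME": ""}, start_response))
    return statuses[0], body


print(run(dispatcher, "/blog/post/1"))
print(run(dispatcher, "/missing"))
```

The dispatcher does only routing; anything else, such as rate limiting, would live in a separate middleware wrapped around it.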

The example below implements an exception-handling middleware (excerpted from another source):

from sys import exc_info
from traceback import format_tb

class ExceptionMiddleware(object):
    """The middleware we use."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        """Call the application and catch exceptions."""
        appiter = None
        # just call the application and send the output back
        # unchanged but catch exceptions
        try:
            appiter = self.app(environ, start_response)
            for item in appiter:
                yield item
        # if an exception occurs we get the exception information
        # and prepare a traceback we can render
        except Exception:
            e_type, e_value, tb = exc_info()
            traceback = ['Traceback (most recent call last):']
            traceback += format_tb(tb)
            traceback.append('%s: %s' % (e_type.__name__, e_value))
            # we might not have started a response by now. try
            # to start one with the status code 500 or ignore a
            # raised exception if the application already started one.
            try:
                start_response('500 INTERNAL SERVER ERROR', [
                               ('Content-Type', 'text/plain')])
            except Exception:
                pass
            # WSGI response bodies must be bytes on Python 3
            yield '\n'.join(traceback).encode('utf-8')
        finally:
            # wsgi applications might have a close function. If it exists
            # it *must* be called.
            if hasattr(appiter, 'close'):
                appiter.close()
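PEP 3333 also lets an error handler pass exc_info as the third argument of start_response, so the server can replace headers that were set but not yet sent, or re-raise if output has already begun. The sketch below shows that pattern under simplifying assumptions: `error_guard` and `broken_app` are hypothetical names, and only exceptions raised while calling the application (not while iterating its result) are handled.

```python
import sys


def error_guard(app):
    """Hypothetical middleware: turn an exception raised by app() into a 500."""

    def middleware(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # Passing exc_info tells the server this call replaces any
            # headers the failed application may already have set.
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/plain")],
                sys.exc_info(),
            )
            return [b"internal error"]

    return middleware


def broken_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    raise RuntimeError("boom")


# Stand-in start_response that records each call and whether exc_info was given.
calls = []


def start_response(status, headers, exc_info=None):
    calls.append((status, exc_info is not None))
    return lambda data: None


body = b"".join(error_guard(broken_app)({}, start_response))
print(calls, body)
```

With a server such as the run_with_cgi example, the second start_response call is accepted because exc_info is set and no headers have been sent yet.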

Python Application

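The Python application side must define a callable object, which can be any one of the following three:

function/method
class
instance with a __call__ method

The callable object must satisfy two conditions:

It accepts two arguments: environ (a dict holding the WSGI environment) and start_response (a function used to respond to the request, which passes the HTTP status and headers back to the server)
It returns an iterable

Key points:

environ and start_response are provided and implemented by the HTTP server

environ is a dict containing the environment variables

The Python application calls start_response before it returns (or, for a generator, before it yields the first body chunk)

start_response is itself a callable that takes two required arguments, status and response_headers, plus an optional third argument, exc_info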
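To make these points concrete, here is a minimal sketch of an application that reads a few standard keys from environ and reports its status and headers through start_response. `greet_app` and `record` are hypothetical names, and the environ passed in by hand stands in for what a real server would supply.

```python
from urllib.parse import parse_qs


def greet_app(environ, start_response):
    """Hypothetical app: greet the name given in the query string."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]

    body = ("%s: hello, %s" % (method, name)).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # start_response is called before any body data is handed back.
    start_response("200 OK", headers)
    return [body]


# Stand-in for the server: record what the application reports.
reported = {}


def record(status, response_headers, exc_info=None):
    reported["status"] = status
    reported["headers"] = dict(response_headers)
    return lambda data: None


chunks = greet_app({"REQUEST_METHOD": "GET", "QUERY_STRING": "name=WSGI"}, record)
print(reported["status"], b"".join(chunks))
```

The status is a string such as "200 OK", the headers are a list of (name, value) string tuples, and on Python 3 every body chunk is a bytes object.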

Implementing the callable object

1. function/method

def application(environ, start_response):
    # Call the start_response supplied by the server with its two arguments
    start_response('200 OK', [('Content-Type', 'application/json')])
    return []
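To see a function-style application answer a real HTTP request, it can be handed to the reference server in the standard library. This is a rough sketch for local experiments only: `serve` is a hypothetical helper, the host and port are arbitrary choices, and the application here returns a small plain-text body so there is something to look at.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"served by wsgiref\n"]


def serve(app, host="127.0.0.1", port=8000):
    """Serve `app` with the standard library's reference server until interrupted."""
    with make_server(host, port, app) as httpd:
        print("Listening on http://%s:%d/" % (host, port))
        httpd.serve_forever()
```

Calling `serve(application)` from a script and then opening the printed address in a browser should show the response body; Ctrl-C stops the server.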

2. class

class ApplicationClass(object):
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start_response = start_response

    def __iter__(self):
        self.start_response('200 OK', [('Content-Type', 'application/json')])
        # body chunks must be bytes on Python 3
        yield b"anything"

Usage

# environ and start_response are supplied by the server
for result in ApplicationClass(environ, start_response):
    do_something(result)

3. instance with a __call__ method

class ApplicationClass(object):
    def __init__(self):
        pass

    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-Type', 'application/json')])
        # body chunks must be bytes on Python 3
        yield b"anything"
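One way to check that all three forms really follow the protocol is the validator in the standard library's wsgiref.validate module, which wraps an application and raises AssertionError on violations. The sketch below is illustrative: `func_app`, `ClassApp`, `InstanceApp`, `str_app`, and `check` are hypothetical names, and `str_app` is deliberately wrong to show a text body being rejected.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator


def func_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"function"]


class ClassApp:
    def __init__(self, environ, start_response):
        self.start_response = start_response

    def __iter__(self):
        self.start_response("200 OK", [("Content-Type", "text/plain")])
        yield b"class"


class InstanceApp:
    def __call__(self, environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        yield b"instance"


def str_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return ["text, not bytes"]   # deliberately wrong on Python 3


def check(app):
    """Return the body if `app` passes wsgiref's checks, else the error text."""
    environ = {"QUERY_STRING": ""}
    setup_testing_defaults(environ)

    def start_response(status, headers, exc_info=None):
        return lambda data: None

    try:
        result = validator(app)(environ, start_response)
        try:
            return b"".join(result)
        finally:
            result.close()
    except AssertionError as exc:
        return "rejected: %s" % exc


# The class itself is the callable; for InstanceApp an instance is.
for candidate in (func_app, ClassApp, InstanceApp(), str_app):
    print(check(candidate))
```

Note the difference between the second and third forms: with the class form the server calls the class and iterates the new instance, while with the instance form one object is created up front and called once per request.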