django 源码解析 - 1

338次阅读  |  发布于2年以前

Django是一款经典的Python Web开发框架,也是最受欢迎的Python开源项目之一。不同于Flask框架,Django是高度集成的,可以帮助开发者快速搭建一个Web项目。从本周开始,我们一起进入Djaong项目的源码解析,加深对django的理解,熟练掌握python web开发。django采用的源码版本是: 4.0.0,我们首先采用概读法,大概了解django项目的创建和启动过程,包括下面几个部分:

django简单回顾

创建好虚拟环境,安装django的 4.0.0 版本。这个版本和官方的最新文档一致,而且有完整的中文翻译版本。我们可以跟随「快速安装指南」创建一个django项目,感受一下其魅力。首先使用 startproject 命令创建一个名叫 hello 的项目,django会在本地搭建一个基础的项目结构,进入hello项目后,可以直接使用 runserver 命令启动项目。

python3 -m django startproject hello
cd hello  && python3 manage.py runserver

django提供很好的模块化支持,可以利用它做大型web项目的开发。在project下的模块称为app,一个project可以包括多个app模块, 和flask的blue-print类似。我们继续使用 startapp 命令来创建一个叫做 api 的app。

python3 -m django startapp api

对api-app的 views.py,我们需要完善一下,增加下面内容:

from django.shortcuts import render

# Create your views here.

from django.http import HttpResponse


def index(request):
    return HttpResponse("Hello, Game404. You're at the index.")

然后再创建一个urls.py的模块文件,定义url和view的映射关系,填写:

from django.urls import path

from . import views

urlpatterns = [
    path('', views.index, name='index'),
]

完成api-app的实现后,我们需要在project中添加上自定义的api模块。这需要两步,先是在hello-project的setting.py中配置这个api-app:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'api',
]

再在hello-project的urls.py导入api-app中定义的url和视图:


from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('api/', include('api.urls')),
]

剩下的内容,都可以使用模版的标准实现。然后我们再次启动项目,访问下面路径:

➜  ~ curl http://127.0.0.1:8000/api/
Hello, Game404. You're at the index.%

我们基本完成了一个最简单的django项目创建和启动,接下来我们一起了解这个流程是如何实现的。

django项目概况

django项目源码大概包括下面一些包:

功能描述
apps django的app管理器
conf 配置信息,主要有项目模版和app模版等
contrib django默认提供的标准app
core django核心功能
db 数据库模型实现
dispatch 信号,用于模块解耦
forms 表单实现
http http协议和服务相关实现
middleware django提供的标准中间件
template && templatetags 模版功能
test 单元测试的支持
urls 一些url的处理类
utils 工具类
views 视图相关的实现

我也简单对比了一下django和flask的代码情况:

-------------------------------------------------------------------------------
Project                      files          blank        comment           code
-------------------------------------------------------------------------------
django                         716          19001          25416          87825
flask                           20           1611           3158           3587

仅从文件数量和代码行数可以看到,django是一个庞大的框架,不同于flask是一个简易框架,有700多个模块文件和近9万行代码。在阅读flask源码前,我们还需要先了解sqlalchemy,werkzeug等依赖框架,从pyproject.toml中的依赖项可发现,django默认就没有其它依赖,可以直接开始。这有点类似ios和Android系统,flask需要各种插件的支持,更为开放;django则是全部集成,使用默认的框架就已经可以处理绝大部分web开放需求了。

django命令脚手架

django提供了一系列脚手架命令,协助开发者创建和管理django项目。可以使用 help 参数查看命令清单:

python3 -m django --help

Type 'python -m django help <subcommand>' for help on a specific subcommand.

Available subcommands:

[django]
    check
    compilemessages
    createcachetable
    dbshell
    diffsettings
    dumpdata
    flush
    inspectdb
    loaddata
    makemessages
    makemigrations
    migrate
    runserver
    sendtestemail
    shell
    showmigrations
    sqlflush
    sqlmigrate
    sqlsequencereset
    squashmigrations
    startapp
    startproject
    test
    testserver

前面使用到的 startprojectstartapprunserver 三个命令是本篇重点介绍的命令,其它的二十多个命令我们在使用到的时候再行介绍。这是 概读法 的精髓,只关注主干,先建立全局视野,再逐步深入。

django模块的main函数在 __main__.py 中提供:

"""
Invokes django-admin when the django module is run as a script.

Example: python -m django check
"""
from django.core import management

if __name__ == "__main__":
    management.execute_from_command_line()

可以看到 django.core.management 模块提供脚手架的功能实现。同样在project的 manager.py 中也是通过调用management模块来实现项目启动:

#!/usr/bin/env python
"""Django's command-line utility for administrative tasks."""
import os
import sys

def main():
    """Run administrative tasks."""
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'hello.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        ...
    execute_from_command_line(sys.argv)

if __name__ == '__main__':
    main()

ManagementUtility的主要结构如下:

class ManagementUtility:
    """
    Encapsulate the logic of the django-admin and manage.py utilities.
    """
    def __init__(self, argv=None):
        # 命令行参数
        self.argv = argv or sys.argv[:]
        self.prog_name = os.path.basename(self.argv[0])
        if self.prog_name == '__main__.py':
            self.prog_name = 'python -m django'
        self.settings_exception = None

    def main_help_text(self, commands_only=False):
        """Return the script's main help text, as a string."""
        pass

    def fetch_command(self, subcommand):
        """
        Try to fetch the given subcommand, printing a message with the
        appropriate command called from the command line (usually
        "django-admin" or "manage.py") if it can't be found.
        """
        pass

    ...

    def execute(self):
        """
        Given the command-line arguments, figure out which subcommand is being
        run, create a parser appropriate to that command, and run it.
        """
        pass

一个好的命令行工具,离不开清晰的帮助输出。默认的帮助信息是调用main_help_text函数:

def main_help_text(self, commands_only=False):
    """Return the script's main help text, as a string."""

    usage = [
        "",
        "Type '%s help <subcommand>' for help on a specific subcommand." % self.prog_name,
        "",
        "Available subcommands:",
    ]
    commands_dict = defaultdict(lambda: [])
    for name, app in get_commands().items():
        commands_dict[app].append(name)
    style = color_style()
    for app in sorted(commands_dict):
        usage.append("")
        usage.append(style.NOTICE("[%s]" % app))
        for name in sorted(commands_dict[app]):
            usage.append("    %s" % name)
    # Output an extra note if settings are not properly configured
    if self.settings_exception is not None:
        usage.append(style.NOTICE(
            "Note that only Django core commands are listed "
            "as settings are not properly configured (error: %s)."
            % self.settings_exception))

    return '\n'.join(usage)

main_help_text主要利用了下面4个函数去查找命令清单:

def find_commands(management_dir):
    """
    Given a path to a management directory, return a list of all the command
    names that are available.
    """
    command_dir = os.path.join(management_dir, 'commands')
    return [name for _, name, is_pkg in pkgutil.iter_modules([command_dir])
            if not is_pkg and not name.startswith('_')]

def load_command_class(app_name, name):
    """
    Given a command name and an application name, return the Command
    class instance. Allow all errors raised by the import process
    (ImportError, AttributeError) to propagate.
    """
    module = import_module('%s.management.commands.%s' % (app_name, name))
    return module.Command()

def get_commands():
    commands = {name: 'django.core' for name in find_commands(__path__[0])}

    if not settings.configured:
        return commands

    for app_config in reversed(list(apps.get_app_configs())):
        path = os.path.join(app_config.path, 'management')
        commands.update({name: app_config.name for name in find_commands(path)})

    return commands

def call_command(command_name, *args, **options):
    pass

find_commands查找management/commands下的命令文件,load_command_class使用 import_module 这个动态加载模块的方法导入命令。

子命令的执行是通过fetch_command找到子命令,然后执行子命令的run_from_argv方法。

def execute(self):
    ...
    self.fetch_command(subcommand).run_from_argv(self.argv)

def fetch_command(self, subcommand):
    """
    Try to fetch the given subcommand, printing a message with the
    appropriate command called from the command line (usually
    "django-admin" or "manage.py") if it can't be found.
    """
    # Get commands outside of try block to prevent swallowing exceptions
    commands = get_commands()
    try:
        app_name = commands[subcommand]
    except KeyError:
        ...
    if isinstance(app_name, BaseCommand):
        # If the command is already loaded, use it directly.
        klass = app_name
    else:
        klass = load_command_class(app_name, subcommand)
    return klass

在django的core中包含了下面这些命令,多数都直接继承自BaseCommand:

BaseCommand的主要代码结构:

最重要的run_from_argv和execute方法都是命令的执行入口:

def run_from_argv(self, argv):
    ...
    parser = self.create_parser(argv[0], argv[1])

    options = parser.parse_args(argv[2:])
    cmd_options = vars(options)
    # Move positional args out of options to mimic legacy optparse
    args = cmd_options.pop('args', ())
    handle_default_options(options)
    try:
        self.execute(*args, **cmd_options)
    except CommandError as e:
        ...

def execute(self, *args, **options):
    output = self.handle(*args, **options)
    return output

handler方法则留给子类覆盖实现:

def handle(self, *args, **options):
    """
    The actual logic of the command. Subclasses must implement
    this method.
    """
    raise NotImplementedError('subclasses of BaseCommand must provide a handle() method')

startproject && startapp 命令实现

startprojectstartapp两个命令分别创建项目和app,都派生自TemplateCommand。主要功能实现都在TemplateCommand中。

# startproject
def handle(self, **options):
    project_name = options.pop('name')
    target = options.pop('directory')

    # Create a random SECRET_KEY to put it in the main settings.
    options['secret_key'] = SECRET_KEY_INSECURE_PREFIX + get_random_secret_key()

    super().handle('project', project_name, target, **options)

# startapp
def handle(self, **options):
    app_name = options.pop('name')
    target = options.pop('directory')
    super().handle('app', app_name, target, **options)

我们使用startproject命令后的项目结构大概如下:

├── hello
│   ├── __init__.py
│   ├── asgi.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
└── manage.py
这和conf下的project_templat

这和conf下的project_template目录中的模版文件一致:

.
├── manage.py-tpl
└── project_name
    ├── __init__.py-tpl
    ├── asgi.py-tpl
    ├── settings.py-tpl
    ├── urls.py-tpl
    └── wsgi.py-tpl

manage.py-tpl模版文件内容:

#!/usr/bin/env python
"""Django's command-line utility for administrative tasks."""
import os
import sys

def main():
    """Run administrative tasks."""
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', '{{ project_name }}.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        ....

if __name__ == '__main__':
    main()

可见startproject命令的功能是接收开发者输入的project_name,然后渲染到模版文件中,再生成项目文件。

TemplateCommand中模版处理的函数主要内容如下:

from django.template import Context, Engine

def handle(self, app_or_project, name, target=None, **options):
    ...
    base_name = '%s_name' % app_or_project
    base_subdir = '%s_template' % app_or_project
    base_directory = '%s_directory' % app_or_project
    camel_case_name = 'camel_case_%s_name' % app_or_project
    camel_case_value = ''.join(x for x in name.title() if x != '_')
    ...
    context = Context({
        **options,
        base_name: name,
        base_directory: top_dir,
        camel_case_name: camel_case_value,
        'docs_version': get_docs_version(),
        'django_version': django.__version__,
    }, autoescape=False)   
    ...
    template_dir = self.handle_template(options['template'],
                                            base_subdir)
    ...   
    for root, dirs, files in os.walk(template_dir):
        for filename in files:
            if new_path.endswith(extensions) or filename in extra_files:
                with open(old_path, encoding='utf-8') as template_file:
                    content = template_file.read()
                    template = Engine().from_string(content)
                    content = template.render(context)
                    with open(new_path, 'w', encoding='utf-8') as new_file:
                        new_file.write(content)

django.template.Engine如何工作,和模版引擎mako有什么区别,以后再行介绍,本章我们只需要了解即可。

runserver 命令实现

runserver提供了一个开发测试的http服务,帮助启动django项目,也是django使用频率最高的命令。django项目遵循wsgi规范,执行启动之前需要先查找wsgi-application。(如果对wsgi规范不了解的同学,欢迎翻看之前的文章)

def get_internal_wsgi_application():
    """
    Load and return the WSGI application as configured by the user in
    ``settings.WSGI_APPLICATION``. With the default ``startproject`` layout,
    this will be the ``application`` object in ``projectname/wsgi.py``.

    This function, and the ``WSGI_APPLICATION`` setting itself, are only useful
    for Django's internal server (runserver); external WSGI servers should just
    be configured to point to the correct application object directly.

    If settings.WSGI_APPLICATION is not set (is ``None``), return
    whatever ``django.core.wsgi.get_wsgi_application`` returns.
    """
    from django.conf import settings
    app_path = getattr(settings, 'WSGI_APPLICATION')
    if app_path is None:
        return get_wsgi_application()

    try:
        return import_string(app_path)
    except ImportError as err:
        ...

结合注释可以知道这里会载入开发者在project的settings中定义的application:

WSGI_APPLICATION = 'hello.wsgi.application'

默认情况下,自定义的wsgi-application是这样的:

import os

from django.core.wsgi import get_wsgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'hello.settings')

application = get_wsgi_application()

继续查看runserver的执行, 主要是 inner_run 函数:

from django.core.servers.basehttp import run

def inner_run(self, *args, **options):
    try:
        handler = self.get_handler(*args, **options)
        run(self.addr, int(self.port), handler,
            ipv6=self.use_ipv6, threading=threading, server_cls=self.server_cls)
    except OSError as e:
        ...
        # Need to use an OS exit because sys.exit doesn't work in a thread
        os._exit(1)
    except KeyboardInterrupt:
        ..
        sys.exit(0)

http服务的功能实现由 django.core.servers.basehttp 提供,完成http服务和wsgi之间的衔接,其架构图如下:

run函数的代码:

def run(addr, port, wsgi_handler, ipv6=False, threading=False, server_cls=WSGIServer):
    server_address = (addr, port)
    if threading:
        httpd_cls = type('WSGIServer', (socketserver.ThreadingMixIn, server_cls), {})
    else:
        httpd_cls = server_cls
    httpd = httpd_cls(server_address, WSGIRequestHandler, ipv6=ipv6)
    if threading:
        # ThreadingMixIn.daemon_threads indicates how threads will behave on an
        # abrupt shutdown; like quitting the server by the user or restarting
        # by the auto-reloader. True means the server will not wait for thread
        # termination before it quits. This will make auto-reloader faster
        # and will prevent the need to kill the server manually if a thread
        # isn't terminating correctly.
        httpd.daemon_threads = True
    httpd.set_app(wsgi_handler)
    httpd.serve_forever()

django的wsgi-application实现, http协议的实现,下一章再行详细介绍,我们也暂时跳过。在runserver中还有一个非常重要的功能: 自动重启服务 。如果我们修改了项目的代码,服务会自动重启,可以提高开发效率。

比如我们修改api的视图功能,随便增加几个字符,可以在控制台看到大概下面的输出:

# python manage.py runserver
Watching for file changes with StatReloader
Performing system checks...

System check identified no issues (0 silenced).
March 05, 2022 - 09:06:07
Django version 4.0, using settings 'hello.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
/Users/yoo/tmp/django/hello/api/views.py changed, reloading.
Watching for file changes with StatReloader
Performing system checks...

System check identified no issues (0 silenced).
March 05, 2022 - 09:12:59
Django version 4.0, using settings 'hello.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

runserver命令检测到 /hello/api/views.py 有修改,然后自动使用StatReloader重启服务。

inner_run有下面2种启动方式, 默认情况下使用autoreload启动:

def run(self, **options):
    """Run the server, using the autoreloader if needed."""
    ...
    use_reloader = options['use_reloader']

    if use_reloader:
        autoreload.run_with_reloader(self.inner_run, **options)
    else:
        self.inner_run(None, **options)

autoreload的继承关系如下:

class BaseReloader:
    pass

class StatReloader(BaseReloader):
    pass

class WatchmanReloader(BaseReloader):
    pass

def get_reloader():
    """Return the most suitable reloader for this environment."""
    try:
        WatchmanReloader.check_availability()
    except WatchmanUnavailable:
        return StatReloader()
    return WatchmanReloader()

当前版本优先使用WatchmanReloader实现,这依赖于pywatchman库,需要额外安装。否则使用StatReloader的实现,这个实现在之前介绍werkzeug中也有过介绍,本质上都是持续的监听文件的状态变化。

class StatReloader(BaseReloader):
    SLEEP_TIME = 1  # Check for changes once per second.

    def tick(self):
        mtimes = {}
        while True:
            for filepath, mtime in self.snapshot_files():
                old_time = mtimes.get(filepath)
                mtimes[filepath] = mtime
                if old_time is None:
                    logger.debug('File %s first seen with mtime %s', filepath, mtime)
                    continue
                elif mtime > old_time:
                    logger.debug('File %s previous mtime: %s, current mtime: %s', filepath, old_time, mtime)
                    self.notify_file_changed(filepath)

            time.sleep(self.SLEEP_TIME)
            yield

当文件有变化后,退出当前进程,并使用subprocess启动新的进程:

def trigger_reload(filename):
    logger.info('%s changed, reloading.', filename)
    sys.exit(3)

def restart_with_reloader():
    new_environ = {**os.environ, DJANGO_AUTORELOAD_ENV: 'true'}
    args = get_child_arguments()
    while True:
        p = subprocess.run(args, env=new_environ, close_fds=False)
        if p.returncode != 3:
            return p.returncode

app的setup过程

runserver命令在执行之前还有很重要的一步就是setup: 加载和初始化开发者自定义的app内容。这是在ManagementUtility的execute函数中开始的:

def execute(self):
    ...
    try:
        settings.INSTALLED_APPS
    except ImproperlyConfigured as exc:
        self.settings_exception = exc
    except ImportError as exc:
        self.settings_exception = exc
    ...

Settings中会载下面的一些模块,主要是INSTALLED_APPS:

class Settings:
    def __init__(self, settings_module):
        ...
        # store the settings module in case someone later cares
        self.SETTINGS_MODULE = settings_module

        mod = importlib.import_module(self.SETTINGS_MODULE)

        tuple_settings = (
            'ALLOWED_HOSTS',
            "INSTALLED_APPS",
            "TEMPLATE_DIRS",
            "LOCALE_PATHS",
        )
        self._explicit_settings = set()
        ...

INSTALLED_APPS在项目的setting中定义:

# Application definition

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'api',
]

这样django框架就完成了开发者自定义的内容动态加载。

小结

Django是一个高度集成的python web开发框架,支持模块化开发。django还提供了一系列脚手架命令,比如使用startproject和startapp协助创建项目和模块模版;使用runserver辅助测试和开发项目。django项目也符合wsgi规范,其http服务的启动中创建了WSGIServer,并且支持多线程模式。django作为一个框架,可以通过约定的配置文件setting动态加载开发者的业务实现。

小技巧

django命令支持智能提示。比如我们 runserver 命令时,不小心输入错误的把字母 u 打成了 i。命令会自动提醒我们,是不是想使用 runserver 命令:

python -m django rinserver
No Django settings specified.
Unknown command: 'rinserver'. Did you mean runserver?
Type 'python -m django help' for usage.

智能提示的功能对命令行工具很有帮助,一般的实现就是比较用户输入和已知命令的重合度,从而找到最接近的命令。这是「字符串编辑距离」算法的实际应用。个人认为理解场景,这会比死刷算法更有用,我偶尔会在面试的时候使用这个例子来观测面试人的算法水准。django这里直接使用python标准库difflib提供的实现:

from difflib import get_close_matches

possible_matches = get_close_matches(subcommand, commands)
sys.stderr.write('Unknown command: %r' % subcommand)
if possible_matches:
    sys.stderr.write('. Did you mean %s?' % possible_matches[0])

get_close_matches的使用示例:

>>> get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
['apple', 'ape']

对算法感兴趣的同学,可以自己进一步了解其实现细节。

另外一个小技巧是一个命名的异化细节。class 一般在很多开发语言中都是关键字,如果我们要定义一个class类型的变量名时候,避免关键字冲突。一种方法是使用 klass 替代:

# 
if isinstance(app_name, BaseCommand):
    # If the command is already loaded, use it directly.
    klass = app_name
else:
    klass = load_command_class(app_name, subcommand)

另一种是使用 clazz 替代:

if ( classes.length ) {
 while ( ( elem = this[ i++ ] ) ) {
  curValue = getClass( elem );
  ...
  if ( cur ) {
   j = 0;
   while ( ( clazz = classes[ j++ ] ) ) {

    // Remove *all* instances
    while ( cur.indexOf( " " + clazz + " " ) > -1 ) {
     cur = cur.replace( " " + clazz + " ", " " );
    }
   }
      ...
  }
 }
}

大家一般都用哪一种呢?

参考链接

Copyright© 2013-2020

All Rights Reserved 京ICP备2023019179号-8