Song Baohua: kvmalloc, or the Heaven-Reliant Sword and the Dragon-Slaying Saber Combined?

1093 reads  |  Published 4 years ago

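Have you ever agonized over whether to use kmalloc() or vmalloc()? You no longer need to, because the kernel now has an API called kvmalloc(), which can be thought of as kmalloc() and vmalloc() joined into a single weapon, like the Dragon-Slaying Saber and the Heaven-Reliant Sword combined.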

A great deal of kernel code now uses kvmalloc(), for example:

source/ipc/msg.c

static int newque(struct ipc_namespace *ns, struct ipc_params *params)
{
  struct msg_queue *msq;
  int retval;
  key_t key = params->key;
  int msgflg = params->flg;

  msq = kvmalloc(sizeof(*msq), GFP_KERNEL);
  if (unlikely(!msq))
    return -ENOMEM;

  ...
}

In earlier kernels (for example v4.0-rc7/source/ipc/msg.c), the same code looked like this:

static int newque(struct ipc_namespace *ns, struct ipc_params *params)
{
  struct msg_queue *msq;
  int id, retval;
  key_t key = params->key;
  int msgflg = params->flg;

  msq = ipc_rcu_alloc(sizeof(*msq));
  if (!msq)
    return -ENOMEM;

 ...

}

Here the memory is allocated with this function:

ipc_rcu_alloc(sizeof(*msq))

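So what is behind ipc_rcu_alloc()? It prepends a small RCU header to the request and gets the memory from ipc_alloc(), which looks like this: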

void *ipc_alloc(int size)
{
  void *out;
  if (size > PAGE_SIZE)
    out = vmalloc(size);
  else
    out = kmalloc(size, GFP_KERNEL);
  return out;
}

The logic is simple: requests larger than one page go to vmalloc(), and requests of one page or less go to kmalloc().

The implementation of kvmalloc(), by contrast, handles similar logic in a much smarter way:

void *kvmalloc_node(size_t size, gfp_t flags, int node)
{
  gfp_t kmalloc_flags = flags;
  void *ret;

  /*
   * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
   * so the given set of flags has to be compatible.
   */
  if ((flags & GFP_KERNEL) != GFP_KERNEL)
    return kmalloc_node(size, flags, node);

  /*
   * We want to attempt a large physically contiguous block first because
   * it is less likely to fragment multiple larger blocks and therefore
   * contribute to a long term fragmentation less than vmalloc fallback.
   * However make sure that larger requests are not too disruptive - no
   * OOM killer and no allocation failure warnings as we have a fallback.
   */
  if (size > PAGE_SIZE) {
    kmalloc_flags |= __GFP_NOWARN;

    if (!(kmalloc_flags & __GFP_RETRY_MAYFAIL))
      kmalloc_flags |= __GFP_NORETRY;
  }

  ret = kmalloc_node(size, kmalloc_flags, node);

  /*
   * It doesn't really make sense to fallback to vmalloc for sub page
   * requests
   */
  if (ret || size <= PAGE_SIZE)
    return ret;

  return __vmalloc_node_flags_caller(size, node, flags,
      __builtin_return_address(0));
}
EXPORT_SYMBOL(kvmalloc_node);

static inline void *kvmalloc(size_t size, gfp_t flags)
{
  return kvmalloc_node(size, flags, NUMA_NO_NODE);
}

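For requests larger than one page, kvmalloc() first tries kmalloc() with __GFP_NOWARN and, unless the caller passed __GFP_RETRY_MAYFAIL, __GFP_NORETRY; if that attempt fails, it falls back to vmalloc(). __GFP_NORETRY keeps kmalloc() from retrying over and over, or even invoking the OOM killer, when memory is short, and __GFP_NOWARN suppresses the allocation-failure warning, since a fallback is available anyway.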

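Conversely, if the requested size is at most one page, kvmalloc() keeps the plain kmalloc() behavior: __GFP_NORETRY is not set, and if the allocation still fails after retrying, there is no fallback to vmalloc(), because using vmalloc() for sub-page allocations makes little sense. Likewise, if the caller's flags do not include all of GFP_KERNEL (GFP_NOFS, for example), kvmalloc_node() calls kmalloc_node() directly and never falls back, since vmalloc() performs some internal allocations, such as page tables, with GFP_KERNEL.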

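Nothing is absolute. While we were still agonizing over kmalloc() versus vmalloc(), others went ahead and built kvmalloc(). Next to their invention, doesn't our hand-wringing make you want to crawl into a hole? Thinking is what matters most: let your imagination range wider, because passive learning only ever leaves you chasing other people's ideas.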
