iOS Crash Governance: Fixing Taobao's VisionKitCore Crash


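In this article we reverse-engineer the system, read the assembly, and work back toward the source code step by step, locating a system bug in WKWebView on iOS 16.0 up to (but not including) iOS 16.2. Apple has fixed the bug in newer versions, but for the huge installed base it still caused a daily average of 1200+ crash PV and 1000+ UV. We ultimately worked around the bug by hooking system behavior. The fix shipped in the Double 11 release of Mobile Taobao, and the crash count dropped to 0.

01 Background

Mobile Taobao's crash rate (Crash + Abort) has held at around x% for a year or two. This year the organization set a higher bar and asked us to push crashes down further. I took part in that effort and was responsible for several hard-to-diagnose cases, and was fortunate enough to locate and resolve some operating system bugs. This article records the investigation of the system bug that crashes in VisionKitCore, and the solution.

02 Crash information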

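Stack characteristics:

Notable Address characteristics:

Extra information (every occurrence was on an image-and-text detail page):

PS: the screenshots are watermarked and cannot be shown here. The extra information is the current-page information attached by our modified KSCrash.

Version characteristics:

Crash share: third among crashes that have a stack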

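This basic information is already enough to suggest that this is most likely an operating system bug. And because 念纪 had earlier cleaned up many crashes in business-code stacks, this hard case had climbed into the Top 3 of crashes with a stack, so it had to be dealt with.

03 Investigation

I first searched the Apple Developer Forums for this crash stack and found that, sure enough, someone had already reported it.

https://developer.apple.com/forums/thread/718305

Last year someone on the Apple forums reported that the bug is triggered in the WebView logic for long-pressing to copy an image, and one user replied that disabling the WKWebView long-press gesture avoids the crash (it actually does not). Based on this I ran some tests, and pulled an image-and-text detail page that a user had visited from our platform to try to reproduce the stack.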

WKWebView *webview = [[WKWebView alloc] initWithFrame:self.view.bounds];
[self.view addSubview:webview];
[webview loadRequest:[NSURLRequest requestWithURL:[NSURL URLWithString:@"https://url"]] ];

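The forum user said that disabling long press prevents the crash. In my tests, however, disabling long press only stops WKWebView from creating the selection box; the image-creation logic still runs. Mobile Taobao's WebView container already disables the default long-press selection box and only implements a save-image feature, so the fix from that thread cannot solve Mobile Taobao's bug.

I happened to have studied ARM64 assembly systematically this year, so this was a good chance to exercise that new knowledge and hunt for the bug from the bottom up. Abbreviated crash report: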

Incident Identifier: 9DAC8C95-D65D-4AA2-BF12-D36DC1A7F3B8
CrashReporter Key:   KSCrash2
Hardware Model:      iPhone14,2
Process:             Taobao4iPhone [20565]
Path:                /private/var/containers/Bundle/Application/36FBCF28-38AA-40B3-8234-EDAE1B3D6611/Taobao4iPhone.app/Taobao4iPhone
Identifier:          TBXDetailViewController|com.taobao.taobao4iphone
Version:             31863389 (10.27.40)
Code Type:           ARM-64
Parent Process:      ? [1]

Date/Time:           2023-09-11 21:32:18 +0800
Launch Time:         2023-09-11 21:27:01 +0800
OS Version:          iOS 16.1.1 (20B101)
Report Version:      104

Exception Type:  EXC_BAD_ACCESS
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000173fc8000
Exception Subtype: SIGSEGV
Triggered by Thread:  99

Thread 99 Crashed:
0   libsystem_platform.dylib        0x000000021350e930 0x21350e000 + 2352    _platform_memmove :96 (in libsystem_platform.dylib)
1   CoreGraphics                    0x00000001c8159988 0x1c80f3000 + 420232  _CGDataProviderCreateWithCopyOfData :20 (in CoreGraphics)
2   CoreGraphics                    0x00000001c8142648 0x1c80f3000 + 325192  _CGBitmapContextCreateImage :216 (in CoreGraphics)
3   VisionKitCore                   0x0000000208405ad0 0x2083fa000 + 47824   -[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:] :348 (in VisionKitCore)
4   VisionKitCore                   0x0000000208405880 0x2083fa000 + 47232   -[VKCRemoveBackgroundResult createCGImage] :156 (in VisionKitCore)
5   VisionKitCore                   0x000000020849da98 0x2083fa000 + 670360   __vk_cgImageRemoveBackgroundWithDownsizing_block_invoke :64 (in VisionKitCore)
6   VisionKitCore                   0x0000000208473a5c 0x2083fa000 + 498268   __63-[VKCRemoveBackgroundRequestHandler performRequest:completion:]_block_invoke.5 :436 (in VisionKitCore)
7   MediaAnalysisServices           0x0000000209847968 0x209840000 + 31080    __92-[MADService performRequests:onPixelBuffer:withOrientation:andIdentifier:completionHandler:]_block_invoke.38 :400 (in MediaAnalysisServices)
8   CoreFoundation                  0x00000001c65b8704 0x1c6544000 + 476932   __invoking___ :148 (in CoreFoundation)
9   CoreFoundation                  0x00000001c6564b6c 0x1c6544000 + 133996   -[NSInvocation invoke] :428 (in CoreFoundation)
10  Foundation                      0x00000001c09c5b08 0x1c0924000 + 662280   __NSXPCCONNECTION_IS_CALLING_OUT_TO_REPLY_BLOCK__ :16 (in Foundation)
11  Foundation                      0x00000001c0996ef0 0x1c0924000 + 470768   -[NSXPCConnection _decodeAndInvokeReplyBlockWithEvent:sequence:replyInfo:] :520 (in Foundation)
12  Foundation                      0x00000001c0f702e4 0x1c0924000 + 6603492  __88-[NSXPCConnection _sendInvocation:orArguments:count:methodSignature:selector:withProxy:]_block_invoke_5 :188 (in Foundation)
13  libxpc.dylib                    0x0000000213604f1c 0x2135e7000 + 122652   _xpc_connection_reply_callout :124 (in libxpc.dylib)
14  libxpc.dylib                    0x00000002135f7fb4 0x2135e7000 + 69556    _xpc_connection_call_reply_async :88 (in libxpc.dylib)
15  libdispatch.dylib               0x00000001cdb1e05c 0x1cdb1a000 + 16476    _dispatch_client_callout3 :20 (in libdispatch.dylib)
16  libdispatch.dylib               0x00000001cdb3bf58 0x1cdb1a000 + 139096   _dispatch_mach_msg_async_reply_invoke :344 (in libdispatch.dylib)
17  libdispatch.dylib               0x00000001cdb2556c 0x1cdb1a000 + 46444    _dispatch_lane_serial_drain :376 (in libdispatch.dylib)
18  libdispatch.dylib               0x00000001cdb26214 0x1cdb1a000 + 49684    _dispatch_lane_invoke :436 (in libdispatch.dylib)
19  libdispatch.dylib               0x00000001cdb30e10 0x1cdb1a000 + 93712    _dispatch_workloop_worker_thread :652 (in libdispatch.dylib)
20  libsystem_pthread.dylib         0x00000002135a3df8 0x2135a3000 + 3576     _pthread_wqthread :288 (in libsystem_pthread.dylib)
21  libsystem_pthread.dylib         0x00000002135a3b98 0x2135a3000 + 2968     _start_wqthread :8 (in libsystem_pthread.dylib)
Thread State:
     x8:0xe361afd13768009c     x9:0xe361afd13768009c     lr:0x00000001c8155de0     fp:0x000000016bcae040
    x10:0x0000000000000090    x12:0x0000000115204f70    x11:0x000000021cbc2268    x14:0x000000021dfff180
    x13:0x000000021dfff160    x16:0x000000021350e8d0    x15:0x00000000e781489a     sp:0x000000016bcadfb0
    x18:0x0000000000000000    x17:0x000000021e003320    x19:0x0000000000143c00   cpsr:0x0000000020001000
     pc:0x000000021350e930    x21:0x0000000000148000    x20:0x0000000173e846c8     x0:0x000000016fe8c6c8
    x23:0x000000016fe8c000     x1:0x0000000173fc8000    x22:0x000000016fe8c6c8     x2:0x00000000000002a8
    x25:0x0000000000000020     x3:0x000000016ffd0000    x24:0x000000021cbae570     x4:0x0000000003ff8000
    x27:0x0000000000000000     x5:0x0000000000000018    x26:0x0000000000000008     x6:0x000000000000002c
     x7:0x0000000000000000    x28:0x000000028040e180
Binary Images:
0x0000000104638000 - 0x000000010c70bfff Taobao4iPhone arm64  <23be6181e1c43ce9a6b37d61de01bab3> /private/var/containers/Bundle/Application/36FBCF28-38AA-40B3-8234-EDAE1B3D6611/Taobao4iPhone.app/Taobao4iPhone
0x000000021350e000 - 0x0000000213514ff3 libsystem_platform.dylib arm64e  <29a26364acef38c28b0ddb0dfca0bb65> /usr/lib/system/libsystem_platform.dylib
0x00000001c80f3000 - 0x00000001c8700ff3 CoreGraphics arm64e  <ffb3f1e74e3b3ff79d00be32c9d8133c> /System/Library/Frameworks/CoreGraphics.framework/CoreGraphics
0x00000002083fa000 - 0x0000000208500fff VisionKitCore arm64e  <ce997b5ba4b03818bba22d7f057bc3a2> /System/Library/PrivateFrameworks/VisionKitCore.framework/VisionKitCore
0x0000000209840000 - 0x000000020985dfff MediaAnalysisServices arm64e  <0c75ee56f3343b8ca96080651906e0dd> /System/Library/PrivateFrameworks/MediaAnalysisServices.framework/MediaAnalysisServices
0x00000001c6544000 - 0x00000001c6929fff CoreFoundation arm64e  <5cdc5d9ae5063740b64ebb30867b4f1b> /System/Library/Frameworks/CoreFoundation.framework/CoreFoundation
0x00000001c0924000 - 0x00000001c126dfff Foundation arm64e  <c431acb6fe043d28b6774de6e1c7d81f> /System/Library/Frameworks/Foundation.framework/Foundation


Notable Addresses:
memory near x0: 
    0x000000016fe8c678: 0000000000000000 0000000000000000 ................
    0x000000016fe8c688: 0000000000000000 0000000000000000 ................
    0x000000016fe8c698: 0000000000000000 0000000000000000 ................
    0x000000016fe8c6a8: 0000000000000000 0000000000000000 ................
    0x000000016fe8c6b8: 0000000000000000 0000000000000000 ................
  ->0x000000016fe8c6c8: 878787a3aaaaaac5 a9a9a9c5b5b5b5d2 ................
    0x000000016fe8c6d8: cbcbcbe7d5d5d5f0 d6d6d6f1d4d4d4f1 ................
    0x000000016fe8c6e8: d4d4d4f1d6d6d6f4 d8d8d8f7d8d8d8f8 ................
      [0xf8d8d8d8f7d8d8d8:  [objc_object: NSString()]]
    0x000000016fe8c6f8: d9d9d9fadadadafb dbdbdbfbdcdcdcfc ................
    0x000000016fe8c708: dededefddfdfdffe dfdfdffedfdfdffe ................
    0x000000016fe8c718: dfdfdffedfdfdffe dfdfdfffe0e0e0ff ................
    0x000000016fe8c728: e0e0e0ffe0e0e0ff e0e0e0ffdfdfdfff ................
      [0xffe0e0e0ffe0e0e0:  [objc_object: NSString(NATY2cRJ)]]
      [0xffdfdfdfffe0e0e0:  [objc_object: NSString(Hh-e2cRJ)]]
    0x000000016fe8c738: e0e0e0ffe0e0e0ff e0e0e0ffe0e0e0ff ................
      [0xffe0e0e0ffe0e0e0:  [objc_object: NSString(NATY2cRJ)]]
      [0xffe0e0e0ffe0e0e0:  [objc_object: NSString(NATY2cRJ)]]
    0x000000016fe8c748: e0e0e0ffe0e0e0ff e0e0e0ffe0e0e0ff ................
      [0xffe0e0e0ffe0e0e0:  [objc_object: NSString(NATY2cRJ)]]
      [0xffe0e0e0ffe0e0e0:  [objc_object: NSString(NATY2cRJ)]]
    0x000000016fe8c758: e0e0e0ffe0e0e0ff e0e0e0ffe0e0e0ff ................
      [0xffe0e0e0ffe0e0e0:  [objc_object: NSString(NATY2cRJ)]]
      [0xffe0e0e0ffe0e0e0:  [objc_object: NSString(NATY2cRJ)]]
  memory near x1: 
    0x0000000173fc7fb0: 0000000000000000 0000000000000000 ................
    0x0000000173fc7fc0: 0000000000000000 0000000000000000 ................
    0x0000000173fc7fd0: 0000000000000000 0000000000000000 ................
    0x0000000173fc7fe0: 0000000000000000 0000000000000000 ................
    0x0000000173fc7ff0: 0000000000000000 0000000000000000 ................
  memory near x3: 
    0x000000016ffcffb0: 0000000000000000 0000000000000000 ................
    0x000000016ffcffc0: 0000000000000000 0000000000000000 ................
    0x000000016ffcffd0: 0000000000000000 0000000000000000 ................
    0x000000016ffcffe0: 0000000000000000 0000000000000000 ................
    0x000000016ffcfff0: 0000000000000000 0000000000000000 ................
  ->0x000000016ffd0000: 0000000000000000 0000000000000000 ................
    0x000000016ffd0010: 0000000000000000 0000000000000000 ................
    0x000000016ffd0020: 0000000000000000 0000000000000000 ................
    0x000000016ffd0030: 0000000000000000 0000000000000000 ................
    0x000000016ffd0040: 0000000000000000 0000000000000000 ................
    0x000000016ffd0050: 0000000000000000 0000000000000000 ................
    0x000000016ffd0060: 0000000000000000 0000000000000000 ................
    0x000000016ffd0070: 0000000000000000 0000000000000000 ................
    0x000000016ffd0080: 0000000000000000 0000000000000000 ................
    0x000000016ffd0090: 0000000000000000 0000000000000000 ................

Image-and-text detail link:
https://xxxx.xx.com

Analyzing the assembly of the key functions

The call stack is:

0   libsystem_platform.dylib        0x00000001fb27a930 _platform_memmove :96 (in libsystem_platform.dylib)
1   CoreGraphics                    0x00000001afec1988 _CGDataProviderCreateWithCopyOfData :20 (in CoreGraphics)
2   CoreGraphics                    0x00000001afeaa648 _CGBitmapContextCreateImage :216 (in CoreGraphics)
3   VisionKitCore                   0x00000001f0171ad0 -[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:] :348 (in VisionKitCore)
4   VisionKitCore                   0x00000001f0171880 -[VKCRemoveBackgroundResult createCGImage] :156 (in VisionKitCore)
5   VisionKitCore                   0x00000001f0209a98 __vk_cgImageRemoveBackgroundWithDownsizing_block_invoke :64 (in VisionKitCore)
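  1. Background knowledge: the ARM64 calling convention and parameter-passing rules (https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-general-purpose-registers)

For this article you only need to know that x0..x7 are the general-purpose registers used to pass arguments in a function call, holding the 1st through 8th scalar arguments.

  2. Symbolication: you must use exactly the same OS version as the one where the problem occurred. Luckily 万瑜 had a phone running iOS 16.1.1.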

Analysis of _platform_memmove in libsystem_platform.dylib

__platform_memmove: (x0: dest, x1: src, x2: count)
00000001d3d628d0  sub  x3, x0, x1 ; x3 = x0 - x1 
00000001d3d628d4  cmp  x3, x2 ; x3 < x2?
00000001d3d628d8  b.lo  0x1d3d62aa0 ; looks like a check for whether the tail of src overlaps dest; not taken in this case
00000001d3d628dc  mov  x3, x0 ; x3 = x0
00000001d3d628e0  cmp  x2, #0x40 ; x2 - 0x40?
00000001d3d628e4  b.lo  0x1d3d62a7c ; is count less than 0x40? not in this case
00000001d3d628e8  sub  x4, x1, x0 ; x4 = x1 - x0
00000001d3d628ec  cmp  x4, x2 ; x4 - x2; looks like a check for whether the tail of dest overlaps src
00000001d3d628f0  b.lo  0x1d3d629b4 ; not taken either
00000001d3d628f4  cmp  x2, #0x4, lsl #12 ; is count less than 0x4000?
00000001d3d628f8  b.lo  0x1d3d62958 ; not in this case either
00000001d3d628fc  add  x3, x3, #0x20
00000001d3d62900  and  x3, x3, #0xffffffffffffffe0


00000001d3d62904  ldnp  q2, q3, [x1]
00000001d3d62908  sub  x5, x3, x0
00000001d3d6290c  add  x1, x1, x5
00000001d3d62910  ldnp  q0, q1, [x1]
00000001d3d62914  add  x1, x1, #0x20

00000001d3d62918  sub  x2, x2, x5
00000001d3d6291c  stnp  q2, q3, [x0]
00000001d3d62920  subs  x2, x2, #0x40
00000001d3d62924  b.ls  0x1d3d62940
00000001d3d62928  stnp  q0, q1, [x3]
00000001d3d6292c  add  x3, x3, #0x20
00000001d3d62930  ldnp  q0, q1, [x1]
; crash site (frame 0, offset +96): x1 here is 0x0000000173fc8000

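In the Notable Addresses dump, the memory leading up to x1 is all 0000 (as is the memory just before x0).

The analysis shows that __platform_memmove is a fairly typical memmove/memcpy implementation with some head and tail overlap checks. At the moment of the crash, the address in x1 points to a block of data that has gone wrong. Let's keep looking.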

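Here I found that the address _CGDataProviderCreateWithCopyOfData jumps to is _create_protected_copy. (⊙o⊙)…

Oddly, this function does not appear in the crash stack at all. What's more, I could not find any b, br, or bl to _platform_memmove inside _create_protected_copy. Is something wrong with this stack?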

CoreGraphics`create_protected_copy:
->  0x19c8a1cd0 <+0>:   pacibsp 
    0x19c8a1cd4 <+4>:   sub    sp, sp, #0xa0
    0x19c8a1cd8 <+8>:   stp    x24, x23, [sp, #0x60]
    0x19c8a1cdc <+12>:  stp    x22, x21, [sp, #0x70]
    0x19c8a1ce0 <+16>:  stp    x20, x19, [sp, #0x80]
    0x19c8a1ce4 <+20>:  stp    x29, x30, [sp, #0x90]
    0x19c8a1ce8 <+24>:  add    x29, sp, #0x90
    0x19c8a1cec <+28>:  mov    x21, #0x0
    0x19c8a1cf0 <+32>:  cbz    x0, 0x19c8a1e5c           ; <+396>
    0x19c8a1cf4 <+36>:  mov    x19, x1
    0x19c8a1cf8 <+40>:  cbz    x1, 0x19c8a1e5c           ; <+396>
    0x19c8a1cfc <+44>:  mov    x20, x0
    0x19c8a1d00 <+48>:  adrp   x8, 341894
    0x19c8a1d04 <+52>:  ldr    x8, [x8, #0xaa0]
    0x19c8a1d08 <+56>:  ldr    x8, [x8]
    0x19c8a1d0c <+60>:  cmp    x8, x19
    0x19c8a1d10 <+64>:  b.ls   0x19c8a1d48               ; <+120>
    0x19c8a1d14 <+68>:  mov    x0, #0x0
    0x19c8a1d18 <+72>:  mov    x1, x20
    0x19c8a1d1c <+76>:  mov    x2, x19
    0x19c8a1d20 <+80>:  ldp    x29, x30, [sp, #0x90]
    0x19c8a1d24 <+84>:  ldp    x20, x19, [sp, #0x80]
    0x19c8a1d28 <+88>:  ldp    x22, x21, [sp, #0x70]
    0x19c8a1d2c <+92>:  ldp    x24, x23, [sp, #0x60]
    0x19c8a1d30 <+96>:  add    sp, sp, #0xa0
    0x19c8a1d34 <+100>: autibsp 
    0x19c8a1d38 <+104>: eor    x16, x30, x30, lsl #1
    0x19c8a1d3c <+108>: tbz    x16, #0x3e, 0x19c8a1d44   ; <+116>
    0x19c8a1d40 <+112>: brk    #0xc471
    0x19c8a1d44 <+116>: b      0x1a076cd80
    0x19c8a1d48 <+120>: neg    x9, x8
    0x19c8a1d4c <+124>: and    x22, x9, x20
    0x19c8a1d50 <+128>: add    x10, x19, x20
    0x19c8a1d54 <+132>: add    x8, x10, x8
    0x19c8a1d58 <+136>: sub    x8, x8, #0x1
    0x19c8a1d5c <+140>: and    x8, x8, x9
    0x19c8a1d60 <+144>: sub    x21, x8, x22
    0x19c8a1d64 <+148>: mov    x0, #0x0
    0x19c8a1d68 <+152>: mov    x1, x21
    0x19c8a1d6c <+156>: mov    w2, #0x3
    0x19c8a1d70 <+160>: mov    w3, #0x1002
    0x19c8a1d74 <+164>: mov    w4, #0x36000000
    0x19c8a1d78 <+168>: mov    x5, #0x0
    0x19c8a1d7c <+172>: bl     0x1a076e5c0
    0x19c8a1d80 <+176>: cmn    x0, #0x1
    0x19c8a1d84 <+180>: b.eq   0x19c8a1e58               ; <+392>
    0x19c8a1d88 <+184>: mov    x23, x0
    0x19c8a1d8c <+188>: adrp   x24, 341894
    0x19c8a1d90 <+192>: ldr    x24, [x24, #0xa80]
    0x19c8a1d94 <+196>: ldr    w0, [x24]
    0x19c8a1d98 <+200>: mov    x1, x22
    0x19c8a1d9c <+204>: mov    x2, x21
    0x19c8a1da0 <+208>: mov    x3, x23
    0x19c8a1da4 <+212>: bl     0x1a076ecd0
    0x19c8a1da8 <+216>: sub    x8, x20, x22
    0x19c8a1dac <+220>: add    x22, x8, x23
    0x19c8a1db0 <+224>: cbz    w0, 0x19c8a1de0           ; <+272>
    0x19c8a1db4 <+228>: adrp   x8, 1405
    0x19c8a1db8 <+232>: add    x8, x8, #0xed9            ; "copy_read_only"
    0x19c8a1dbc <+236>: stp    x8, x0, [sp]
    0x19c8a1dc0 <+240>: adrp   x1, 1405
    0x19c8a1dc4 <+244>: add    x1, x1, #0xeba            ; "%s: vm_copy failed: status %d."
    0x19c8a1dc8 <+248>: mov    w0, #0x0
    0x19c8a1dcc <+252>: bl     0x19cb1ffcc               ; CGLog
    0x19c8a1dd0 <+256>: mov    x0, x22
    0x19c8a1dd4 <+260>: mov    x1, x20
    0x19c8a1dd8 <+264>: mov    x2, x19
    0x19c8a1ddc <+268>: bl     0x19cc48f80               ; symbol stub for: memcpy
    0x19c8a1de0 <+272>: ldr    w0, [x24]
    0x19c8a1de4 <+276>: mov    x1, x22
    0x19c8a1de8 <+280>: mov    x2, x19
    0x19c8a1dec <+284>: mov    w3, #0x1
    0x19c8a1df0 <+288>: mov    w4, #0x1
    0x19c8a1df4 <+292>: bl     0x1a076ecf0
    0x19c8a1df8 <+296>: cbz    x22, 0x19c8a1e58          ; <+392>
    0x19c8a1dfc <+300>: cmp    x22, x20
    0x19c8a1e00 <+304>: b.eq   0x19c8a1d14               ; <+68>
    0x19c8a1e04 <+308>: movi.2d v0, #0000000000000000
    0x19c8a1e08 <+312>: stp    q0, q0, [sp, #0x30]
    0x19c8a1e0c <+316>: stp    q0, q0, [sp, #0x10]
    0x19c8a1e10 <+320>: str    x21, [sp, #0x18]
    0x19c8a1e14 <+324>: adrp   x16, -67
    0x19c8a1e18 <+328>: add    x16, x16, #0x544          ; vm_allocator_deallocate
    0x19c8a1e1c <+332>: paciza x16
    0x19c8a1e20 <+336>: stp    x16, xzr, [sp, #0x48]
    0x19c8a1e24 <+340>: add    x1, sp, #0x10
    0x19c8a1e28 <+344>: mov    x0, #0x0
    0x19c8a1e2c <+348>: bl     0x1a076cb00
    0x19c8a1e30 <+352>: mov    x20, x0
    0x19c8a1e34 <+356>: mov    x0, #0x0
    0x19c8a1e38 <+360>: mov    x1, x22
    0x19c8a1e3c <+364>: mov    x2, x19
    0x19c8a1e40 <+368>: mov    x3, x20
    0x19c8a1e44 <+372>: bl     0x1a076cdb0
    0x19c8a1e48 <+376>: mov    x21, x0
    0x19c8a1e4c <+380>: mov    x0, x20
    0x19c8a1e50 <+384>: bl     0x1a076d200
    0x19c8a1e54 <+388>: b      0x19c8a1e5c               ; <+396>
    0x19c8a1e58 <+392>: mov    x21, #0x0
    0x19c8a1e5c <+396>: mov    x0, x21
    0x19c8a1e60 <+400>: ldp    x29, x30, [sp, #0x90]
    0x19c8a1e64 <+404>: ldp    x20, x19, [sp, #0x80]
    0x19c8a1e68 <+408>: ldp    x22, x21, [sp, #0x70]
    0x19c8a1e6c <+412>: ldp    x24, x23, [sp, #0x60]
    0x19c8a1e70 <+416>: add    sp, sp, #0xa0
    0x19c8a1e74 <+420>: retab

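In the end I asked a senior colleague and filled in a gap in my knowledge. The x86_64 calling convention requires that a function call push the address of the instruction following the call (the return address) onto the stack, so walking the stack is enough to recover the correct call stack.

The ARM64 architecture, however, keeps the return address in the LR register. If the current function needs to call other functions itself, it has to save LR in its prologue. This is the boilerplate you often see at the start of a function: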

WKCopy`+[ViewController load]:
    0x100b0c000 <+0>:  sub    sp, sp, #0x20 // grow the stack
    0x100b0c004 <+4>:  stp    x29, x30, [sp, #0x10] // save the old fp and lr on the stack
    0x100b0c008 <+8>:  add    x29, sp, #0x10 // fp points to the new frame base
  do something
    0x100b0c020 <+32>: ldp    x29, x30, [sp, #0x10] // restore the old fp and lr
    0x100b0c024 <+36>: add    sp, sp, #0x20 // shrink the stack
    0x100b0c028 <+40>: ret    // return to the caller

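Not every function uses the stack, though; such functions are called frameless functions. For example, memset, memmove, and memcpy usually work by taking a source address, loading a chunk of data into registers, storing it from the registers to the destination address, and advancing the address until the requested length has been covered.

ARM64 also has branch instructions that do not return, such as b/br, which are commonly used in stubs.

In tail-call situations, to skip an unnecessary return (when a function sees that there is no need to come back after calling the next one), a plain b is also used as an optimization. The most common example is objc_msgSend, which both uses tail-call optimization and is a frameless function.

When the process crashes, KSCrash unwinds the call stack. If a function is frameless, the rules need some extra care. Specifically: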

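  1. For the function that crashed, use the pc address directly to obtain the last stack frame and its starting range.
  2. Walk to the previous frame: load fp and lr from the frame record at x29 (ldp fp, lr, [x29]) and use lr to work out the frame.
  3. Repeat step 2. When lr reaches 0, the thread's start function has been reached, so stop.

See KSStackCursor for the code.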

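There is one scenario, frameless function + b + frameless function crash, in which the stack appears to lose frames. In this article's case two frames are lost, for these reasons:

  1. memmove is reached through a tail call, so the function that makes the tail call drops out of the stack. This is expected.
  2. _platform_memmove is a frameless function, so it has no logic to save a frame. The lr read from the stack actually belongs to _create_protected_copy's frame; because the function itself has no frame, its own lr is lost. For a function like this, you can look at the address in the lr register to find the calling function.

So the real call stack in this case is: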

0   libsystem_platform.dylib        0x00000001fb27a930 _platform_memmove :96 (in libsystem_platform.dylib)
  lost frame 2: _memcpy
  lost frame 1: _create_protected_copy
1   CoreGraphics                    0x00000001afec1988 _CGDataProviderCreateWithCopyOfData :20 (in CoreGraphics)
2   CoreGraphics                    0x00000001afeaa648 _CGBitmapContextCreateImage :216 (in CoreGraphics)
3   VisionKitCore                   0x00000001f0171ad0 -[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:] :348 (in VisionKitCore)
4   VisionKitCore                   0x00000001f0171880 -[VKCRemoveBackgroundResult createCGImage] :156 (in VisionKitCore)
5   VisionKitCore                   0x00000001f0209a98 __vk_cgImageRemoveBackgroundWithDownsizing_block_invoke :64 (in VisionKitCore)

Continuing:

VisionKitCore`-[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:]:

VisionKitCore`-[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:]:
    0x1dcb51974 <+0>:   cbz    x0, 0x1dcb51b98           ; <+548>
    0x1dcb51978 <+4>:   pacibsp 
    0x1dcb5197c <+8>:   sub    sp, sp, #0x90
    0x1dcb51980 <+12>:  stp    d11, d10, [sp, #0x10]
    0x1dcb51984 <+16>:  stp    d9, d8, [sp, #0x20]
    0x1dcb51988 <+20>:  stp    x28, x27, [sp, #0x30]
    0x1dcb5198c <+24>:  stp    x26, x25, [sp, #0x40]
    0x1dcb51990 <+28>:  stp    x24, x23, [sp, #0x50]
    0x1dcb51994 <+32>:  stp    x22, x21, [sp, #0x60]
    0x1dcb51998 <+36>:  stp    x20, x19, [sp, #0x70]
    0x1dcb5199c <+40>:  stp    x29, x30, [sp, #0x80]
    0x1dcb519a0 <+44>:  add    x29, sp, #0x80
    0x1dcb519a4 <+48>:  fmov   d8, d3
    0x1dcb519a8 <+52>:  fmov   d9, d2
    0x1dcb519ac <+56>:  fmov   d10, d1
    0x1dcb519b0 <+60>:  fmov   d11, d0
    0x1dcb519b4 <+64>:  mov    x19, x2
    0x1dcb519b8 <+68>:  mov    x0, x2
    0x1dcb519bc <+72>:  mov    w1, #0x1
    0x1dcb519c0 <+76>:  bl     0x1dda62ce0               ; CVPixelBufferLockBaseAddress(pixelBuffer, 1)
    0x1dcb519c4 <+80>:  mov    x0, x19
    0x1dcb519c8 <+84>:  bl     0x1dda62c90               ; CVPixelBufferGetBaseAddress
    0x1dcb519cc <+88>:  mov    x21, x0                   ; x21 = baseAddress
    0x1dcb519d0 <+92>:  mov    x0, x19
    0x1dcb519d4 <+96>:  bl     0x1dda62cc0               ; CVPixelBufferGetPixelFormatType(pixelBuffer)
    0x1dcb519d8 <+100>: mov    x1, x0                    ; x1 = pixelFormatType
    0x1dcb519dc <+104>: mov    x0, #0x0
    0x1dcb519e0 <+108>: bl     0x1dda62d20               ; CVPixelFormatDescriptionCreateWithPixelFormatType(NULL (allocator), pixelFormatType)
    0x1dcb519e4 <+112>: bl     0x1dda632f0               ; autorelease
    0x1dcb519e8 <+116>: mov    x20, x0                   ; x0: __NSFrozenDictionaryM = pixelFormatDict
    0x1dcb519ec <+120>: cbz    x0, 0x1dcb51b20           ; <+428>
    0x1dcb519f0 <+124>: cbz    x21, 0x1dcb51b5c          ; <+488> nil check on baseAddress
    0x1dcb519f4 <+128>: fcvtmu x22, d9                   ; float to unsigned integer: x22 = d9 = d2 = width
    0x1dcb519f8 <+132>: fcvtmu x23, d8                   ; x23 = d8 = d3 = height
    0x1dcb519fc <+136>: adrp x8, 64224
    0x1dcb51a00 <+140>: ldr x8, [x8, #0x958]
    0x1dcb51a04 <+144>: ldr x2, [x8]
    0x1dcb51a08 <+148>: mov x0, x20
    0x1dcb51a0c <+152>: bl 0x1dcc41360  ; pixelFormatDict[BitsPerBlock], value is 32; objc_msgSend$objectForKeyedSubscript:
    0x1dcb51a10 <+156>: bl 0x1dda632f0  ; autorelease
    0x1dcb51a14 <+160>: mov x24, x0
    0x1dcb51a18 <+164>: bl 0x1dcc3ee00  ; [x0 integerValue]; objc_msgSend$integerValue
    0x1dcb51a1c <+168>: mov x25, x0  ; x25 = bitsPerBlock = 32
    0x1dcb51a20 <+172>: bl 0x1dda63450  ; release x0
    0x1dcb51a24 <+176>: adrp x2, 102753
    0x1dcb51a28 <+180>: add x2, x2, #0x9e0; @"BitsPerComponent"
    0x1dcb51a2c <+184>: mov x0, x20
    0x1dcb51a30 <+188>: bl 0x1dcc41360  ; pixelFormatDict[BitsPerComponent]; objc_msgSend$objectForKeyedSubscript:
    0x1dcb51a34 <+192>: bl 0x1dda632f0  ; autorelease
    0x1dcb51a38 <+196>: mov x24, x0
    0x1dcb51a3c <+200>: bl 0x1dcc3ee00; objc_msgSend$integerValue
    0x1dcb51a40 <+204>: mov x26, x0  ; x26 = bitsPerComponent = 8 (8 bits per channel)
    0x1dcb51a44 <+208>: bl 0x1dda63450  ; release x0
    0x1dcb51a48 <+212>: fmov d0, d11
    0x1dcb51a4c <+216>: fmov d1, d10
    0x1dcb51a50 <+220>: fmov d2, d9
    0x1dcb51a54 <+224>: fmov d3, d8
    0x1dcb51a58 <+228>: bl 0x1dda62af0  ; CGRectGetMinX
    0x1dcb51a5c <+232>: fcvtmu x24, d0  ; x24 = minX
    0x1dcb51a60 <+236>: fmov d0, d11
    0x1dcb51a64 <+240>: fmov d1, d10
    0x1dcb51a68 <+244>: fmov d2, d9
    0x1dcb51a6c <+248>: fmov d3, d8
    0x1dcb51a70 <+252>: bl 0x1dda62b00  ; CGRectGetMinY
    0x1dcb51a74 <+256>: fcvtmu x27, d0  ; x27 = minY
    0x1dcb51a78 <+260>: lsr x8, x25, #3  ; shift right by 3: x8 = 32 / 8 = 4 bytes per pixel
    0x1dcb51a7c <+264>: madd   x21, x8, x24, x21  ; x21 = (4 * minX) + baseAddress
    0x1dcb51a80 <+268>: mov    x0, x19
    0x1dcb51a84 <+272>: bl     0x1dda62ca0  ; CVPixelBufferGetBytesPerRow(pixelBuffer)
    0x1dcb51a88 <+276>: madd   x21, x0, x27, x21  ; x21 = (bytesPerRow * minY) + x21
    0x1dcb51a8c <+280>: adrp   x8, 64219
    0x1dcb51a90 <+284>: ldr    x8, [x8, #0xb60]
    0x1dcb51a94 <+288>: ldr    x0, [x8]
    0x1dcb51a98 <+292>: bl     0x1dda627e0  ; CGColorSpaceCreateWithName(kCGColorSpaceSRGB)
    0x1dcb51a9c <+296>: mov    x24, x0  ; x24 = <CGColorSpace 0x28121fe40> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1)
    0x1dcb51aa0 <+300>: mov    x0, x19
    0x1dcb51aa4 <+304>: bl     0x1dda62ca0  ; CVPixelBufferGetBytesPerRow(pixelBuffer)
    0x1dcb51aa8 <+308>: mov    x4, x0  ; x4 = bytesPerRow
    0x1dcb51aac <+312>: mov    x0, x21
    0x1dcb51ab0 <+316>: mov    x1, x22
    0x1dcb51ab4 <+320>: mov    x2, x23
    0x1dcb51ab8 <+324>: mov    x3, x26
    0x1dcb51abc <+328>: mov    x5, x24
    0x1dcb51ac0 <+332>: mov    w6, #0x2002
    0x1dcb51ac4 <+336>: bl     0x1dda62700  ; CGBitmapContextCreate(data = x21, width, height, bitsPerComponent (8), bytesPerRow, colorSpace, 0x2002)
    0x1dcb51ac8 <+340>: mov    x21, x0  ; x0 = bitmap context
    0x1dcb51acc <+344>: bl     0x1dda62710  ; _CGBitmapContextCreateImage


// 
    0x1dcb51ad0 <+348>: mov    x22, x0
    0x1dcb51ad4 <+352>: mov    x0, x21
    0x1dcb51ad8 <+356>: bl     0x1dda62850
    0x1dcb51adc <+360>: mov    x0, x24
    0x1dcb51ae0 <+364>: bl     0x1dda62810
    0x1dcb51ae4 <+368>: mov    x0, x19
    0x1dcb51ae8 <+372>: mov    w1, #0x1
    0x1dcb51aec <+376>: bl     0x1dda62d10
    0x1dcb51af0 <+380>: bl     0x1dda63410
    0x1dcb51af4 <+384>: mov    x0, x22
    0x1dcb51af8 <+388>: ldp    x29, x30, [sp, #0x80]
    0x1dcb51afc <+392>: ldp    x20, x19, [sp, #0x70]
    0x1dcb51b00 <+396>: ldp    x22, x21, [sp, #0x60]
    0x1dcb51b04 <+400>: ldp    x24, x23, [sp, #0x50]
    0x1dcb51b08 <+404>: ldp    x26, x25, [sp, #0x40]
    0x1dcb51b0c <+408>: ldp    x28, x27, [sp, #0x30]
    0x1dcb51b10 <+412>: ldp    d9, d8, [sp, #0x20]
    0x1dcb51b14 <+416>: ldp    d11, d10, [sp, #0x10]
    0x1dcb51b18 <+420>: add    sp, sp, #0x90
    0x1dcb51b1c <+424>: retab  
    0x1dcb51b20 <+428>: adrp   x8, 76725
    0x1dcb51b24 <+432>: ldr    x0, [x8, #0xae8]
    0x1dcb51b28 <+436>: adrp   x8, 176
    0x1dcb51b2c <+440>: add    x8, x8, #0xc38            ; "pixelFormatDict"
    0x1dcb51b30 <+444>: str    x8, [sp]
    0x1dcb51b34 <+448>: adrp   x2, 176
    0x1dcb51b38 <+452>: add    x2, x2, #0xbb4            ; "((pixelFormatDict) != nil)"
    0x1dcb51b3c <+456>: adrp   x3, 176
    0x1dcb51b40 <+460>: add    x3, x3, #0xbcf            ; "-[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:]"
    0x1dcb51b44 <+464>: adrp   x6, 102753
    0x1dcb51b48 <+468>: add    x6, x6, #0x9c0            ; @"Expected non-nil value for '%s'"
    0x1dcb51b4c <+472>: mov    w4, #0x0
    0x1dcb51b50 <+476>: mov    w5, #0x0
    0x1dcb51b54 <+480>: bl     0x1dcc3d080               ; objc_msgSend$handleFailedAssertWithCondition:functionName:simulateCrash:showAlert:format:
    0x1dcb51b58 <+484>: cbnz   x21, 0x1dcb519f4          ; <+128>
    0x1dcb51b5c <+488>: adrp   x8, 76725
    0x1dcb51b60 <+492>: ldr    x0, [x8, #0xae8]
    0x1dcb51b64 <+496>: adrp   x8, 176
    0x1dcb51b68 <+500>: add    x8, x8, #0xc65            ; "bufferBaseAddress"
    0x1dcb51b6c <+504>: str    x8, [sp]
    0x1dcb51b70 <+508>: adrp   x2, 176
    0x1dcb51b74 <+512>: add    x2, x2, #0xc48            ; "((bufferBaseAddress) != nil)"
    0x1dcb51b78 <+516>: adrp   x3, 176
    0x1dcb51b7c <+520>: add    x3, x3, #0xbcf            ; "-[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:]"
    0x1dcb51b80 <+524>: adrp   x6, 102753
    0x1dcb51b84 <+528>: add    x6, x6, #0x9c0            ; @"Expected non-nil value for '%s'"
    0x1dcb51b88 <+532>: mov    w4, #0x0
    0x1dcb51b8c <+536>: mov    w5, #0x0
    0x1dcb51b90 <+540>: bl     0x1dcc3d080               ; objc_msgSend$handleFailedAssertWithCondition:functionName:simulateCrash:showAlert:format:
    0x1dcb51b94 <+544>: b      0x1dcb519f4               ; <+128>
    0x1dcb51b98 <+548>: ret

Pseudocode:

CVPixelBufferLockBaseAddress(pixelBuffer, 1)
baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
formatType = CVPixelBufferGetPixelFormatType(pixelBuffer)
pixelFormatDict = CVPixelFormatDescriptionCreateWithPixelFormatType(NULL, formatType)
if baseAddress == nil || pixelFormatDict == nil {
    // failed-assertion path (handleFailedAssertWithCondition:...)
}

bitsPerBlock = [pixelFormatDict[BitsPerBlock] integerValue]          // 32
bitsPerComponent = [pixelFormatDict[BitsPerComponent] integerValue]  // 8

width = floor(cropRect.width)
height = floor(cropRect.height)
minX = CGRectGetMinX(cropRect)
minY = CGRectGetMinY(cropRect)
data = baseAddress + (bitsPerBlock / 8) * minX
bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
data = data + bytesPerRow * minY
colorSpace = CGColorSpaceCreateWithName(kCGColorSpaceSRGB)
bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
bitmap = CGBitmapContextCreate(data, width, height, bitsPerComponent, bytesPerRow, colorSpace, 0x2002)

CGBitmapContextCreateImage(bitmap)

VisionKitCore`-[VKCRemoveBackgroundResult createCGImage]:

VisionKitCore`-[VKCRemoveBackgroundResult createCGImage]:
    0x1dcb517e4 <+0>:   pacibsp 
    0x1dcb517e8 <+4>:   sub    sp, sp, #0x60
    0x1dcb517ec <+8>:   stp    d11, d10, [sp, #0x10]
    0x1dcb517f0 <+12>:  stp    d9, d8, [sp, #0x20]
    0x1dcb517f4 <+16>:  stp    x22, x21, [sp, #0x30]
    0x1dcb517f8 <+20>:  stp    x20, x19, [sp, #0x40]
    0x1dcb517fc <+24>:  stp    x29, x30, [sp, #0x50]
    0x1dcb51800 <+28>:  add    x29, sp, #0x50
    0x1dcb51804 <+32>:  mov    x20, x0
    0x1dcb51808 <+36>:  bl     0x1dcc41b00               ; objc_msgSend$pixelBuffer
    0x1dcb5180c <+40>:  mov    x19, x0 ; <CVPixelBuffer 0x283708dc0 width=1179 height=1825 bytesPerRow=4736 pixelFormat=BGRA iosurface=0x280214df0 poolName=CoreVideo attributes={
    0x1dcb51810 <+44>:  mov    x0, x20; <VKCRemoveBackgroundResult: 0x282b3cd90>
    0x1dcb51814 <+48>:  bl     0x1dcc3ac00               ; objc_msgSend$cropRect
        // crop rect: d0 = 69.08203125 d1 = 349.31640625 d2 = 1045.44140625 d3 = 741.40625
    0x1dcb51818 <+52>:  fmov   d11, d0
    0x1dcb5181c <+56>:  fmov   d10, d1
    0x1dcb51820 <+60>:  fmov   d9, d2
    0x1dcb51824 <+64>:  fmov   d8, d3
    0x1dcb51828 <+68>:  cbz    x19, 0x1dcb51888; <+164>  is pixelBuffer nil?
    0x1dcb5182c <+72>:  fmov   d0, d11
    0x1dcb51830 <+76>:  fmov   d1, d10
    0x1dcb51834 <+80>:  fmov   d2, d9
    0x1dcb51838 <+84>:  fmov   d3, d8
    0x1dcb5183c <+88>:  bl     0x1dcb4cecc               ; VKMRectHasArea
    0x1dcb51840 <+92>:  cbz    w0, 0x1dcb51888 ; <+164> does the rect have area?
    0x1dcb51844 <+96>:  mov    w22, #0x5241
    0x1dcb51848 <+100>: movk   w22, #0x4247, lsl #16
    0x1dcb5184c <+104>: mov    x0, x19
    0x1dcb51850 <+108>: bl     0x1dda62d00; CVPixelBufferRetain , CVPixelBufferUnlockBaseAddress
    0x1dcb51854 <+112>: mov    x0, x19
    0x1dcb51858 <+116>: bl     0x1dda62cc0; CVPixelBufferGetPixelFormatType
    0x1dcb5185c <+120>: cmp    w0, w22; branch away if the pixel format differs
    0x1dcb51860 <+124>: b.ne   0x1dcb518e4  CVPixelBufferGetWidth ; <+256>
    0x1dcb51864 <+128>: mov    x0, x20
    0x1dcb51868 <+132>: mov    x2, x19
    0x1dcb5186c <+136>: fmov   d0, d11
    0x1dcb51870 <+140>: fmov   d1, d10
    0x1dcb51874 <+144>: fmov   d2, d9
    0x1dcb51878 <+148>: fmov   d3, d8
        x0: self, x1: selector, x2: pixelBuffer
        d0 = 69.08203125
        d1 = 349.31640625
        d2 = 1045.44140625
        d3 = 741.40625
    0x1dcb5187c <+152>: bl     0x1dcb51974               ; -[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:]




->  0x1dcb51880 <+156>: mov    x20, x0
    0x1dcb51884 <+160>: b      0x1dcb5194c               ; <+360>
    0x1dcb51888 <+164>: adrp   x8, 76725
    0x1dcb5188c <+168>: ldr    x20, [x8, #0xae8]
    0x1dcb51890 <+172>: fmov   d0, d11
    0x1dcb51894 <+176>: fmov   d1, d10
    0x1dcb51898 <+180>: fmov   d2, d9
    0x1dcb5189c <+184>: fmov   d3, d8
    0x1dcb518a0 <+188>: bl     0x1dcbd1834               ; VKMUIStringForRect
    0x1dcb518a4 <+192>: bl     0x1dda632f0
    0x1dcb518a8 <+196>: mov    x21, x0
    0x1dcb518ac <+200>: stp    x19, x0, [sp]
    0x1dcb518b0 <+204>: adrp   x2, 176
    0x1dcb518b4 <+208>: add    x2, x2, #0xaf3            ; "__objc_no"
    0x1dcb518b8 <+212>: adrp   x3, 176
    0x1dcb518bc <+216>: add    x3, x3, #0xafd            ; "-[VKCRemoveBackgroundResult createCGImage]"
    0x1dcb518c0 <+220>: adrp   x6, 102753
    0x1dcb518c4 <+224>: add    x6, x6, #0x9a0            ; @"CreateCGImage is buffer incorrect, buffer: %@, cropRect:%@"
    0x1dcb518c8 <+228>: mov    x0, x20
    0x1dcb518cc <+232>: mov    w4, #0x0
    0x1dcb518d0 <+236>: mov    w5, #0x0
    0x1dcb518d4 <+240>: bl     0x1dcc3d080               ; objc_msgSend$handleFailedAssertWithCondition:functionName:simulateCrash:showAlert:format:
    0x1dcb518d8 <+244>: bl     0x1dda63420
    0x1dcb518dc <+248>: mov    x20, #0x0
    0x1dcb518e0 <+252>: b      0x1dcb51954               ; <+368>
    0x1dcb518e4 <+256>: mov    x21, x0
    0x1dcb518e8 <+260>: adrp   x8, 76725
    0x1dcb518ec <+264>: ldr    x20, [x8, #0xae8]
    0x1dcb518f0 <+268>: mov    w0, #0x5241
    0x1dcb518f4 <+272>: movk   w0, #0x4247, lsl #16
    0x1dcb518f8 <+276>: bl     0x1dcbd1b80               ; VKMUIStringForCVPixelBufferType
    0x1dcb518fc <+280>: bl     0x1dda632f0
    0x1dcb51900 <+284>: mov    x22, x0
    0x1dcb51904 <+288>: mov    x0, x21
    0x1dcb51908 <+292>: bl     0x1dcbd1b80               ; VKMUIStringForCVPixelBufferType
    0x1dcb5190c <+296>: bl     0x1dda632f0
    0x1dcb51910 <+300>: mov    x21, x0
    0x1dcb51914 <+304>: stp    x22, x0, [sp]
    0x1dcb51918 <+308>: adrp   x2, 176
    0x1dcb5191c <+312>: add    x2, x2, #0xaf3            ; "__objc_no"
    0x1dcb51920 <+316>: adrp   x3, 176
    0x1dcb51924 <+320>: add    x3, x3, #0xafd            ; "-[VKCRemoveBackgroundResult createCGImage]"
    0x1dcb51928 <+324>: adrp   x6, 102753
    0x1dcb5192c <+328>: add    x6, x6, #0x980            ; @"Pixel format for createCGImage is incorrect, expected: %@, received: %@. Bailing"
    0x1dcb51930 <+332>: mov    x0, x20
    0x1dcb51934 <+336>: mov    w4, #0x0
    0x1dcb51938 <+340>: mov    w5, #0x0
    0x1dcb5193c <+344>: bl     0x1dcc3d080               ; objc_msgSend$handleFailedAssertWithCondition:functionName:simulateCrash:showAlert:format:
    0x1dcb51940 <+348>: bl     0x1dda63420
    0x1dcb51944 <+352>: bl     0x1dda63430
    0x1dcb51948 <+356>: mov    x20, #0x0
    0x1dcb5194c <+360>: mov    x0, x19
    0x1dcb51950 <+364>: bl     0x1dda62cf0
    0x1dcb51954 <+368>: mov    x0, x20
    0x1dcb51958 <+372>: ldp    x29, x30, [sp, #0x50]
    0x1dcb5195c <+376>: ldp    x20, x19, [sp, #0x40]
    0x1dcb51960 <+380>: ldp    x22, x21, [sp, #0x30]
    0x1dcb51964 <+384>: ldp    d9, d8, [sp, #0x20]
    0x1dcb51968 <+388>: ldp    d11, d10, [sp, #0x10]
    0x1dcb5196c <+392>: add    sp, sp, #0x60
    0x1dcb51970 <+396>: retab

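Pseudocode:

pixelBuffer = [result pixelBuffer]
cropRect = [result cropRect]
if (pixelBuffer == nil || !VKMRectHasArea(cropRect)) {
    // log a failed assertion and return NULL
}
CVPixelBufferRetain(pixelBuffer)
if (CVPixelBufferGetPixelFormatType(pixelBuffer) != 'BGRA') {
    // log a failed assertion and return NULL
}
// result is the VKCRemoveBackgroundResult instance; the crash stack passes through this call
image = [result _createCGImageFromBGRAPixelBuffer:pixelBuffer cropRect:cropRect]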
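
The pixel-format comparison in the listing is easier to follow once the immediate is decoded. The sketch below rebuilds the value that `mov w22, #0x5241` and `movk w22, #0x4247, lsl #16` leave in w22 and prints it as a four-character code. The helper names `mov_movk` and `fourcc_to_string` are made up for illustration and are not part of any Apple API.

```c
#include <stdint.h>

/* Hypothetical helper: combine the low half written by `mov`
   with the high half written by `movk ..., lsl #16`. */
static uint32_t mov_movk(uint16_t low, uint16_t high) {
    return (uint32_t)low | ((uint32_t)high << 16);
}

/* Hypothetical helper: spell a four-character code as text,
   most significant byte first. `out` must hold 5 bytes. */
static void fourcc_to_string(uint32_t code, char out[5]) {
    out[0] = (char)((code >> 24) & 0xFF);
    out[1] = (char)((code >> 16) & 0xFF);
    out[2] = (char)((code >> 8) & 0xFF);
    out[3] = (char)(code & 0xFF);
    out[4] = '\0';
}
```

Calling `mov_movk(0x5241, 0x4247)` gives 0x42475241, which `fourcc_to_string` spells as "BGRA". This matches the `pixelFormat=BGRA` shown in the buffer description at <+40>, so the `cmp w0, w22` at <+120> checks that the buffer is 32-bit BGRA.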

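From the logic above, this is how the system implements long-press inside WKWebView:

WKWebView, working across processes, crops an image out of a bitmap and passes it to VisionKitCore. VisionKit then reads the buffer for that region directly and creates an image for further processing. Why this crashes is hard to determine at this point, because the bitmap object was created much earlier and only fails when it is consumed here. It may have been released too early, it may be a dangling pointer, or the read may be out of bounds. So I looked for clues elsewhere.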
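
To make the out-of-bounds hypothesis concrete, here is a minimal sketch of the kind of sanity check one could run against the values captured in the debugger. It assumes 4 bytes per BGRA pixel. The helpers `crop_rect_fits` and `row_stride_fits` are hypothetical and are not VisionKit or CoreVideo functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when the crop rectangle has area
   and lies entirely inside a buffer of the given pixel size. */
static bool crop_rect_fits(double x, double y, double w, double h,
                           size_t buf_width, size_t buf_height) {
    if (!(w > 0.0) || !(h > 0.0) || x < 0.0 || y < 0.0) {
        return false;
    }
    return x + w <= (double)buf_width && y + h <= (double)buf_height;
}

/* Hypothetical helper: true when one row of 4-byte BGRA pixels
   fits within the buffer's row stride (bytesPerRow). */
static bool row_stride_fits(size_t buf_width, size_t bytes_per_row) {
    return buf_width * 4 <= bytes_per_row;
}
```

With the values from the listing (crop rect 69.08, 349.32, 1045.44, 741.41 against a 1179 x 1825 buffer with bytesPerRow 4736), both checks pass. At least in that debug capture, the rectangle itself looks valid, which is consistent with the cause not being visible from this frame alone.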

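04 Comparing OS versions

Production data shows that the crash no longer occurs on iOS 16.2 and later, so this may well be a system bug that Apple fixed quietly. I found a few devices on newer versions to experiment with.

iOS 16.2

After a long press in the webview, the x1 passed to __vk_cgImageRemoveBackgroundWithDownsizing_block_invoke is nil. I also set symbolic breakpoints on every VKCRemoveBackgroundResult symbol, and none of them was hit during a long press. The behavior is completely different from the iOS 16.1.1 device.

iOS 17

iOS 17 is different again: VisionKitCore`-[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:] now creates the image by calling vk_cgImageFromPixelBuffer inside VisionKit directly.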

VisionKitCore`-[VKCRemoveBackgroundResult _createCGImageFromBGRAPixelBuffer:cropRect:]:
->  0x204396388 <+0>:   cbz    x0, 0x2043963f8           ; <+112>
    0x20439638c <+4>:   pacibsp 
    0x204396390 <+8>:   stp    d11, d10, [sp, #-0x40]!
    0x204396394 <+12>:  stp    d9, d8, [sp, #0x10]
    0x204396398 <+16>:  stp    x20, x19, [sp, #0x20]
    0x20439639c <+20>:  stp    x29, x30, [sp, #0x30]
    0x2043963a0 <+24>:  add    x29, sp, #0x30
    0x2043963a4 <+28>:  fmov   d8, d3
    0x2043963a8 <+32>:  fmov   d9, d2
    0x2043963ac <+36>:  fmov   d10, d1
    0x2043963b0 <+40>:  fmov   d11, d0
    0x2043963b4 <+44>:  mov    x0, x1
    0x2043963b8 <+48>:  bl     0x20444d4e8               ; vk_cgImageFromPixelBuffer
    0x2043963bc <+52>:  mov    x19, x0
    0x2043963c0 <+56>:  fmov   d0, d11
    0x2043963c4 <+60>:  fmov   d1, d10
    0x2043963c8 <+64>:  fmov   d2, d9
    0x2043963cc <+68>:  fmov   d3, d8
    0x2043963d0 <+72>:  bl     0x206acc070
    0x2043963d4 <+76>:  mov    x20, x0
    0x2043963d8 <+80>:  mov    x0, x19
    0x2043963dc <+84>:  bl     0x206acc110
    0x2043963e0 <+88>:  mov    x0, x20
    0x2043963e4 <+92>:  ldp    x29, x30, [sp, #0x30]
    0x2043963e8 <+96>:  ldp    x20, x19, [sp, #0x20]
    0x2043963ec <+100>: ldp    d9, d8, [sp, #0x10]
    0x2043963f0 <+104>: ldp    d11, d10, [sp], #0x40
    0x2043963f4 <+108>: retab  
    0x2043963f8 <+112>: ret

iOS 16.1.1

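On iOS 16.1.1, x1 always has a value when the block is invoked, so the call path is taken.

What is this image actually used for?

It appears to draw a low-resolution thumbnail; its purpose is not obvious.

Looking further:

It appears to call back into WebKit. WebKit is open source, so the next step is to read its code.

Find the WebKit version that ships on the affected device:

The code is in ImageAnalysisUtilities.mm (https://github.com/WebKit/WebKit/blob/releases/Apple/Safari-16.1-iOS-16.1.1/Source/WebKit/Platform/cocoa/ImageAnalysisUtilities.mm)

It looks like image recognition, but that is not certain yet, so I searched for its callers. GitHub can now search symbols directly.

This largely confirms that it does object recognition on images, and that there is an extra check: if there is no image, it returns.

WebContextMenuProxyMac.mm (https://github.com/WebKit/WebKit/blob/main/Source/WebKit/UIProcess/mac/WebContextMenuProxyMac.mm#L334)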

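05 Solutions

The findings so far lead to some preliminary conclusions. This is a feature new in iOS 16: image recognition. On iOS 16 the system Photos app supports long-press subject lifting, and the system also enabled the same feature for every image inside WKWebView.

  1. Every version in iOS 16.0..<16.2 carries this latent bug. It is not caused by app developers.
  2. _memmove. platformmemory are very low-level, heavily used APIs, so the problem is very unlikely to be there.
  3. Most likely the cause is how WKWebView uses the API, or a bug in VisionKit's subject-lifting code. With several asynchronous hops plus XPC scheduling in between, it is hard to confirm which.

First solution

It occurred to me that since this is default behavior, removing the behavior might be enough. The call stack above also shows that when -[VKCRemoveBackgroundResult createCGImage] creates the image for recognition, the system already handles the nil case without crashing. So I only need to make the method return no image.

I wrote a demo to test hooking this behavior, using a photo of my cat at home from a while back.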

#import <objc/runtime.h>

- (void)viewDidLoad {
    [super viewDidLoad];
    // Default configuration (assumed; the original snippet does not show how config is built).
    WKWebViewConfiguration *config = [[WKWebViewConfiguration alloc] init];
    WKWebView *webview = [[WKWebView alloc] initWithFrame:self.view.bounds configuration:config];
    [self.view addSubview:webview];
    [webview loadRequest:[NSURLRequest requestWithURL:[NSURL URLWithString:@"https://www.valiantcat.cn/dsn.html"]]];
    // Button that installs the hook on demand, so behavior can be compared before and after.
    UIButton *button = [UIButton buttonWithType:UIButtonTypeCustom];
    button.frame = CGRectMake(0, 0, 300, 200);
    [button setTitle:@"Tap to hook" forState:UIControlStateNormal];
    [button setTitleColor:UIColor.redColor forState:UIControlStateNormal];
    [self.view addSubview:button];
    button.center = self.view.center;
    [button addTarget:self action:@selector(hook) forControlEvents:UIControlEventTouchUpInside];
}

- (void)hook {
    Class class = objc_getClass("VKCRemoveBackgroundResult");
    SEL selector = sel_registerName("createCGImage");
    Method m = class_getInstanceMethod(class, selector);
    if (class == Nil || m == NULL) {
        return; // private class or method not present on this OS version
    }
    const char *type = method_getTypeEncoding(m);
    // Blocks passed to imp_implementationWithBlock receive self but not _cmd.
    IMP newImp = imp_implementationWithBlock(^CGImageRef(id self) {
        return NULL; // never hand an image back to the caller
    });
    IMP oldImp = class_replaceMethod(class, selector, newImp, type);
    NSLog(@"%p", oldImp);
}

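After the hook, long-pressing an image no longer offers subject lifting.

Based on this, the approach looked feasible. I checked with the Detail and container teams: they do not customize WKWebView's default behavior, so the change should have little impact on Mobile Taobao's business. I prepared to ship it.

Just before release, however, I found that Scan and Pailitao in Taobao use VisionKit, which made this hook risky. I was stuck again.

What the diff shows

Then it occurred to me: the code is open source and the bug exists only in iOS 16.0..<iOS 16.2, so I could look at how Apple quietly fixed it. Sure enough, there were clues. In several places where images are copied, the fix changes how the image length and size are handled. (Forcing different arguments into this function at a symbolic breakpoint did not reproduce the same crash.) Still, this diff makes it more likely that the bug comes from WKWebView rather than VisionKit.

Diff link: https://github.com/WebKit/WebKit/compare/releases/Apple/Safari-16.1-iOS-16.1.2...releases/Apple/Safari-16.2-iOS-16.2?diff=split

Second solution

I went back to investigating from the WKWebView side, triggering a long press and looking through the stack for useful information.

Reading the code showed that this is a feature new in iOS 16, and the source also shows how the gesture is added.

It turns out that before iOS 16, WKWebView had only one gesture: a long press brought up the save-image menu.

From iOS 16 on, WKWebView adds two gestures that compete for the user's long press.

I set a symbolic breakpoint on -[WKContentView imageAnalysisGestureDidBegin:] and attached the command thread return to cut the method short. As expected, the timeout path was hit. The code shows that the menu built on timeout has no copySubject entry.

WKContentViewInteraction.mm (https://github.com/WebKit/WebKit/blob/releases/Apple/Safari-16.1-iOS-16.1.1/Source/WebKit/UIProcess/ios/WKContentViewInteraction.mm): once subject recognition succeeds, the menu includes CopySubject.

So the new solution is to hook the image-recognition part of WKWebView's long-press gesture.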

static void hook2(void) {
    Class class = objc_getClass("WKContentView");
    SEL selector = sel_registerName("imageAnalysisGestureDidBegin:");
    Method m = class_getInstanceMethod(class, selector);
    if (class == Nil || m == NULL) {
        return; // nothing to hook on this OS version
    }
    const char *type = method_getTypeEncoding(m);
    IMP newImp = imp_implementationWithBlock(^void(id self, UILongPressGestureRecognizer *ges) {
        // Do nothing: skip image analysis so the VisionKitCore path is never entered.
    });
    class_replaceMethod(class, selector, newImp, type);
}

void hookStart(void) {
    // Only hook on iOS 16.0..<16.2, the versions that carry the bug.
    if (@available(iOS 16.0, *)) {
        if (@available(iOS 16.2, *)) {
            return;
        } else {
            hook2();
        }
    }
}
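
The nested @available checks in hookStart amount to a half-open version range. As a minimal sketch of that predicate in plain C, assuming the caller already has the OS major and minor numbers (the helper name `in_affected_range` is hypothetical):

```c
#include <stdbool.h>

/* Hypothetical helper: true for iOS versions in the half-open
   range [16.0, 16.2), the versions the hook targets. */
static bool in_affected_range(int major, int minor) {
    return major == 16 && minor >= 0 && minor < 2;
}
```

The patch number is ignored, so iOS 16.1.1 maps to (16, 1) and falls inside the range, while 16.2 and 17.0 fall outside and are left unhooked.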

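Production results

Hooking the long-press gesture disables WKWebView's built-in subject lifting and text OCR, which raised concerns about user complaints. We therefore implemented the hook in Mobile Taobao's safety-cushion SDK and rolled the fix out gradually. During the gradual rollout in 10.28.11, crashes dropped from 500+ to 67 (the hook takes effect on cold start, so there is a lag). This confirmed that the fix works, and no complaints came in. After full rollout, crashes in Mobile Taobao builds carrying the hook dropped to essentially zero, and the bug is fixed for good. That is 1200+ fewer crashes per day, across 1000+ affected devices.

06 Summary

Stability work is a long-term effort. Thanks to earlier work by colleagues, crashes in app code were largely resolved, and operating-system bugs gradually surfaced and climbed the ranking. At first I had no confidence that I could solve a system bug. But by applying what I had learned and peeling the problem back layer by layer, I located it step by step, and system crashes no longer intimidate me. Thanks to my colleagues for sharing their experience and guidance during the investigation. If anything in this analysis is unclear or wrong, discussion and corrections are welcome.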

07 References

  1. iOS app crashed on iOS 16 https://developer.apple.com/forums/thread/718305
  2. The-ABI-for-ARM-64-bit-Architecture https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-general-purpose-registers
  3. WebKit https://github.com/WebKit/WebKit
