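Hands-On: Recovering JS Protected by V8 bytenode (V8 Bytecode Analysis Notes)
V8 bytecode analysis: a short write-up of the extra problems I ran into while building on earlier work by others ~(definitely not because I can't pad this out into a long article)~. (Corrections are welcome if anything is wrong or missing; I wrote this long after finishing the work.)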
0x00 Preface
I was handed a JS file to reverse engineer: start.js.
Target environment:
- Node.js: 16.14.0
- Corresponding V8: 9.4.146.24-node.20 (flag hash ed0ab240)
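The code cache format is tied to the exact V8 build, so it helps to confirm what the target runtime reports before choosing a d8 version. A minimal sketch, assuming it is run with the same Node.js binary as the target; `describeRuntime` is a hypothetical helper name, while `process.versions` and `v8.cachedDataVersionTag()` are standard Node.js APIs:

```javascript
const v8 = require('v8');

// Hypothetical helper: collect the version facts that matter for cachedData.
function describeRuntime() {
  return {
    node: process.versions.node, // Node.js release string
    v8: process.versions.v8,     // embedded V8 version string
    // Integer derived from the V8 version, command-line flags and CPU features.
    cacheTag: v8.cachedDataVersionTag(),
  };
}

console.log(describeRuntime());
```

If the reported V8 version differs from the one the cache was produced with, the cache should be expected not to load in an unpatched build.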
The core code is as follows:
const vm = require('vm');
const v8 = require('v8');
const zlib = require('zlib');
const fs = require('fs');
const path = require('path');
const Module = require('module');
v8.setFlagsFromString('--no-lazy');
v8.setFlagsFromString('--no-flush-bytecode');
global.generateScript=function(cachedData, filename) {
  cachedData = zlib.brotliDecompressSync(cachedData);
  fixBytecode(cachedData);
  const length = readSourceHash(cachedData);
  let dummyCode = '';
  if (length > 1) {
    dummyCode = '"' + '\u200b'.repeat(length - 2) + '"';
  }
  const script = new vm.Script(dummyCode, { cachedData, filename });
  if (script.cachedDataRejected) {
    throw new Error('');
  }
  return script;
}
global.compileCode = function(javascriptCode, compress) {
  const script = new vm.Script(javascriptCode, { produceCachedData: true });
  let bytecodeBuffer = (script.createCachedData && script.createCachedData.call) ? script.createCachedData() : script.cachedData;
  if (compress) bytecodeBuffer = zlib.brotliCompressSync(bytecodeBuffer);
  return bytecodeBuffer;
};
global.fixBytecode = function(bytecodeBuffer) {
  const dummyBytecode = compileCode('');
  dummyBytecode.subarray(12, 16).copy(bytecodeBuffer, 12);
};
global.readSourceHash = function(bytecodeBuffer) {
  return bytecodeBuffer.subarray(8, 12).reduce((sum, number, power) => sum += number * Math.pow(256, power), 0);
};
try {
  Module._extensions['.jsc'] = function(fileModule, filename) {
    const data = fs.readFileSync(filename, 'utf8')
    const bytecodeBuffer = Buffer.from(data, 'base64');
    const script = generateScript(bytecodeBuffer, filename);

    function require(id) {
      return fileModule.require(id);
    }
    require.resolve = function(request, options) {
      return Module._resolveFilename(request, fileModule, false, options);
    };
    if (process.main) {
      require.main = process.main;
    }
    require.extensions = Module._extensions;
    require.cache = Module._cache;
    const compiledWrapper = script.runInThisContext({
      filename: filename,
      lineOffset: 0,
      columnOffset: 0,
      displayErrors: true
    });
    const dirname = path.dirname(filename);
    const args = [ fileModule.exports, require, fileModule, filename, dirname, process, global ];
    return compiledWrapper.apply(fileModule.exports, args);
  };
} catch (ex) {
  console.error('xrequire:' + ex.message);
}
require("${codeScript}")
A bit of searching showed that this is the V8 cachedData / bytenode scheme: the .jsc file is a base64-encoded, brotli-compressed V8 code cache, loaded through vm.Script with a placeholder source of matching length.
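The loader's helpers only touch two header fields: bytes 8..12, which it treats as the source length, and bytes 12..16, which it overwrites with the value taken from a freshly compiled empty script. A minimal sketch of reading those fields, assuming the little-endian layout implied by `readSourceHash`. `readHeaderFields` and `makeDummySource` are hypothetical names, the header below is synthetic, and calling bytes 12..16 the flag hash follows bytenode's usage rather than anything verified here:

```javascript
// Hypothetical helper: read the two header fields the loader cares about.
function readHeaderFields(buf) {
  return {
    sourceLength: buf.readUInt32LE(8), // same value readSourceHash(buf) computes
    flagHash: buf.readUInt32LE(12),    // the 4 bytes fixBytecode overwrites
  };
}

// Hypothetical helper: build a placeholder source of exactly `length` chars,
// mirroring the dummyCode logic in generateScript.
function makeDummySource(length) {
  return length > 1 ? '"' + '\u200b'.repeat(length - 2) + '"' : '';
}

// Synthetic 16-byte header for illustration only (not real cachedData).
const header = Buffer.alloc(16);
header.writeUInt32LE(1234, 8);
header.writeUInt32LE(0xdeadbeef, 12);

const fields = readHeaderFields(header);
console.log(fields.sourceLength, fields.flagHash.toString(16));
console.log(makeDummySource(fields.sourceLength).length);
```

The placeholder only has to match the recorded length, which is why a string literal padded with zero-width spaces is enough for the loader.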
0x01 First attempt: View8
Once I had located the bytecode, I started with View8:
https://github.com/suleram/View8
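Then I ran the decompiler with the matching 9.4.146.24.exe.
Problems found:
- It printed a little code, then crashed and exited on its own.
- Because d8 crashed, View8 exported only 23 outer functions; the key functions were almost all missing.
My first reaction was that the project had gone unmaintained for too long, so I went looking on GitHub for another one.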
0x02 Second attempt: jsc2js
Next I looked at jsc2js:
https://github.com/xqy2006/jsc2js
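This repository is newer and also has a patch + CI setup.
I applied its patches to v8 9.4.146.24, but the result was about the same as in the first round.
At this point I had pretty much lost it ~(time to lie down for a bit)~:
Bytecode is hard to hook to begin with, and the off-the-shelf tools are unstable.
So I read through the related material: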
- https://www.aynakeya.com/articles/ctf/a-quick-guide-to-disassemble-v8-bytecode/
- https://rce.moe/2025/01/07/v8-bytecode-decompiler/
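V8 bytecode is simply a blob of internal data that V8 serializes itself.
To get results reliably, you have to go back to the V8 source and change the output logic there. V8 versions differ a lot at the bytecode level, especially in opcodes, operand semantics, and register layout.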
0x03 Third attempt: pulling the V8 repository
@echo off
set PATH=E:\Dev\SDKs\depot_tools;%PATH%
set DEPOT_TOOLS_WIN_TOOLCHAIN=0

mkdir v8_941
cd v8_941

echo solutions = [{ > .gclient
echo "name": "v8", >> .gclient
echo "url": "https://chromium.googlesource.com/v8/v8.git@9.4.146.24", >> .gclient
echo "deps_file": "DEPS", >> .gclient
echo "managed": False, >> .gclient
echo "custom_deps": {}, >> .gclient
echo }] >> .gclient

git clone --depth=1 --branch 9.4.146.24 https://chromium.googlesource.com/v8/v8.git v8
gclient sync -D --no-history
0x04 Patch + build arguments
Apply the patches first, then build d8 on its own:
cd /d <dir>\v8_941\v8
python ..\..\apply_patches_v8_94.py .
gn gen out/release
ninja -C out/release d8
Build arguments:
dcheck_always_on = false
is_clang = false
is_component_build = false
is_debug = false
target_cpu = "x64"
use_custom_libcxx = false
v8_monolithic = true
v8_use_external_startup_data = false
v8_static_library = true
v8_enable_disassembler = true
v8_enable_object_print = true
treat_warnings_as_errors = false
v8_enable_pointer_compression = false
v8_enable_31bit_smis_on_64bit_arch = false
v8_enable_lite_mode = false
v8_enable_i18n_support = true
v8_enable_webassembly = true
0x05 The modification phase
Real men take V8 head-on. I won't paste every diff; here are the problems and the reasoning behind the fixes, hehe~
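0x06 Problem 1: cachedData deserialization is rejected
By default, CodeSerializer::Deserialize strictly checks magic/version/flags/hash/checksum/source hash.
If any one of these checks fails, it rejects the cache outright and returns an empty object.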
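The rejection is visible from plain Node.js through `vm.Script`'s `cachedDataRejected` property, without touching V8. A minimal sketch, assuming a stock Node.js runtime that produces and consumes the cache in the same process; the sample source is made up for illustration:

```javascript
const vm = require('vm');

// Compile a tiny script and serialize its code cache.
const source = 'function add(a, b) { return a + b; } add(1, 2);';
const cachedData = new vm.Script(source).createCachedData();

// Placeholder with the same length as the original source:
// this is the trick the loader relies on.
const sameLength = new vm.Script(' '.repeat(source.length), { cachedData });

// Source with a different length: the source check should fail.
const otherLength = new vm.Script(source + ' ', { cachedData });

console.log('same length rejected: ', sameLength.cachedDataRejected);
console.log('other length rejected:', otherLength.cachedDataRejected);
```

The mismatched-length case is expected to report a rejection. Whether the same-length placeholder is accepted depends on the checks in the V8 version in use, which is exactly what the patches in this section switch off.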
src/snapshot/code-serializer.cc:
@@ SerializedCodeData::SanityCheck
- SanityCheckResult result = SanityCheckWithoutSource();
- if (result != CHECK_SUCCESS) return result;
- ...
- return CHECK_SUCCESS;
+ return SerializedCodeData::SanityCheckResult::CHECK_SUCCESS;
@@ SerializedCodeData::SanityCheckWithoutSource
- if (this->size_ < kHeaderSize) return INVALID_HEADER;
- uint32_t magic_number = GetMagicNumber();
- if (magic_number != kMagicNumber) return MAGIC_NUMBER_MISMATCH;
- ...
- if (Checksum(ChecksummedContent()) != c) return CHECKSUM_MISMATCH;
- return CHECK_SUCCESS;
+ return SerializedCodeData::SanityCheckResult::CHECK_SUCCESS;
src/snapshot/deserializer.cc:
@@ Deserializer<IsolateT>::Deserializer
- CHECK_EQ(magic_number_, SerializedData::kMagicNumber);
+ /*
+ CHECK_EQ(magic_number_, SerializedData::kMagicNumber);
+ */
@@ ReadSingleBytecodeData
+ std::fprintf(stderr, "[FATAL] Unknown serializer bytecode: 0x%02x\n", data);
0x07 Problem 2: stack overflow during disassembly/printing
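This is the main reason View8 could not print anything earlier. The call chain is:
- BytecodeArray::Disassemble
- prints the constant pool
- hits a SharedFunctionInfo
- SharedFunctionInfoPrint
- calls Disassemble again
- the depth keeps stacking up until the stack overflows
- Change: TLS guard + SEH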
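The guard idea can be shown outside V8 with a toy model. This is a sketch in JavaScript, not the patched C++: `inDisasm` stands in for the thread-local counter, the objects are made-up stand-ins for SharedFunctionInfo and its constant pool, and the SEH wrapper is not modelled at all.

```javascript
// Toy model: each "function" has a name and a constant pool that may
// reference other functions (cycles included).
let inDisasm = 0; // plays the role of the thread-local guard counter

function disassemble(fn, out) {
  out.push(`bytecode of ${fn.name}`);
  for (const entry of fn.constants) {
    printObject(entry, out);
  }
}

function printObject(obj, out) {
  if (obj && typeof obj === 'object' && Array.isArray(obj.constants)) {
    if (inDisasm > 0) {
      // Already inside a disassembly: print a short reference, do not recurse.
      out.push(`<SharedFunctionInfo ${obj.name}>`);
      return;
    }
    inDisasm++;
    try {
      disassemble(obj, out);
    } finally {
      inDisasm--;
    }
    return;
  }
  out.push(String(obj));
}

// Two functions that reference each other through their constant pools.
const outer = { name: 'outer', constants: [42] };
const inner = { name: 'inner', constants: ['str', outer] };
outer.constants.push(inner);

const lines = [];
printObject(outer, lines);
console.log(lines.join('\n'));
```

Without the counter check, the two mutually referencing functions would recurse until the stack ran out, which mirrors the crash described above. With it, nested functions are only named and have to be visited separately.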
src/diagnostics/objects-printer.cc:
thread_local int g_in_bytecode_disasm = 0;
...
  ++g_in_bytecode_disasm;
  hbc->Disassemble(*(c->os));
  --g_in_bytecode_disasm;
@@ SharedFunctionInfoPrint
- PrintSourceCode(os);
+ // PrintSourceCode(os);
+ int exc = SehWrapCall(DoBcDisasm, &ctx);
+ if (exc != 0) { os << "<BytecodeArray Disassemble CRASHED ...>"; }
src/objects/objects.cc:
extern thread_local int g_in_bytecode_disasm;
void SafePrintSharedFunctionInfo(...);
void SafePrintFixedArray(...);
...
  case SHARED_FUNCTION_INFO_TYPE:
    if (g_in_bytecode_disasm > 0) { break; }
    SafePrintSharedFunctionInfo(shared, os);
  case FIXED_ARRAY_TYPE:
    SafePrintFixedArray(FixedArray::cast(*this), os);
- Correspondingly: the d8 entry point is changed to a flat BFS traversal
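A toy model of the flat BFS traversal, written in JavaScript rather than V8's C++. `collectFunctions` and the sample objects are hypothetical; the array and `Set` stand in for the `std::deque` and `std::unordered_set` used in the d8 patch. Each function is visited exactly once, so nested or cyclic references cannot deepen the call stack:

```javascript
// Visit every function once, breadth-first, instead of letting the
// printer recurse into nested functions.
function collectFunctions(root) {
  const queue = [root];         // stands in for std::deque
  const seen = new Set([root]); // stands in for std::unordered_set
  const order = [];

  while (queue.length > 0) {
    const fn = queue.shift();
    order.push(fn.name);
    for (const entry of fn.constants) {
      const isFunction =
        entry !== null && typeof entry === 'object' && Array.isArray(entry.constants);
      if (isFunction && !seen.has(entry)) {
        seen.add(entry);
        queue.push(entry);
      }
    }
  }
  return order;
}

const leaf = { name: 'leaf', constants: [1] };
const a = { name: 'a', constants: [leaf] };
const b = { name: 'b', constants: ['x', leaf] };
const root = { name: 'root', constants: [a, b, 3.14] };
leaf.constants.push(root); // cycle back to the root

console.log(collectFunctions(root));
```

The `seen` set is what keeps shared and cyclic references from being queued twice; the real patch keys it on object addresses instead of JavaScript object identity.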
src/d8/d8.cc:
void Shell::LoadBytecode(...)
std::deque<i::Handle<i::SharedFunctionInfo>> queue;
std::unordered_set<i::Address> seen;
while (!queue.empty()) { ... }
global_template->Set(isolate, "loadBytecode", FunctionTemplate::New(isolate, LoadBytecode));
0x08 Stability
Fixing Handle lifetimes and bytecode iteration stability
- i::HandleScope inner_scope(isolateInternal);
+ // No inner HandleScope here: child handles stored in queue/all_sfis
+ // must survive across iterations. outer_scope keeps them all alive.
...
- i::BytecodeArray handle_storage = *hbca;
- i::Handle<i::BytecodeArray> handle(
-     reinterpret_cast<i::Address*>(&handle_storage));
- i::interpreter::BytecodeArrayIterator iterator(handle);
+ // Use hbca directly; it's a proper Handle rooted in print_scope.
+ i::interpreter::BytecodeArrayIterator iterator(hbca);
...
+ // Re-derive base_address each iteration (GC-safe)
+ i::Address base_address = hbca->GetFirstBytecodeAddress();
Debug visibility + SFI enqueue condition
+ printf("[DBG] root SFI ptr = 0x%p\n", reinterpret_cast<void*>(root->ptr()));
+ printf("[DBG] root HasBytecodeArray = %d\n", root_has_bc);
...
+ printf("[DBG] cp[%d] raw=0x%p smi=%d\n", cp_index, reinterpret_cast<void*>(obj.ptr()), obj.IsSmi());
...
- if (obj.IsSharedFunctionInfo()) {
+ if (!obj.IsSmi() && obj.IsSharedFunctionInfo()) {
Constant pool readability improvements
const int kMaxLiteralElementsToPrint = 1024;
std::function<void(i::Object, int)> print_compact_obj;
...
if (value.IsArrayBoilerplateDescription()) { ... }
if (value.IsFixedArray()) { ... }
if (value.IsFixedDoubleArray()) { ... }
...
print_compact_obj(obj, 0);
Other
src/objects/string.cc:
- if (len > kMaxShortPrintLength) {
+ // if (len > kMaxShortPrintLength) {
...
- accumulator->Add("%c", c);
+ accumulator->Add("\\u%04x", c);
0x09 Turning the decompiled output into readable JS (first pass)
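- Feed the output to jsc2js/View8 (you may need to tweak this by hand; I couldn't be bothered to paste it). If I remember correctly, the constant pool also needs some handling?
I went back and forth on this for three days. I knew nothing at the start and just ground it out, asking about and learning whatever I needed along the way, hahaha. Things really are convenient these days.
Done writing, I'm out.
Some of this information may already be out of date.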









