Chapter 11 Project: Dynamic String Builder

Overview

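This project turns the raw allocator patterns from the previous chapter into a focused utility: a dynamic string builder that can stitch together reports, logs, and templates without scattering []u8 bookkeeping throughout your code. By wrapping std.ArrayList(u8), we keep amortized O(1) appends, expose growth metrics for debugging, and make it easy to hand ownership to the caller once the buffer is ready; see chapter 10 and array_list.zig.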

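Real programs rarely run on a single allocator, so we also stress-test the builder against stack buffers, arenas, and a general-purpose allocator. The result is a pattern you can drop into CLIs, templating tasks, or logging subsystems whenever you need flexible but explicit string assembly; see heap.zig.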

Learning Goals

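  • Build a reusable StringBuilder wrapper that tracks growth events while relying on std.ArrayList(u8) for storage; see string_builder.zig.
  • Drive the builder through std.io.GenericWriter so that formatted printing composes with plain appends; see writer.zig.
  • Choose between stack buffers, arenas, and heap allocators for dynamic text workflows using std.heap.stackFallback.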

Builder Blueprint

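The core utility lives in string_builder.zig: a thin struct that stores the caller's allocator, a std.ArrayList(u8) buffer, and a handful of helpers for appending, formatting, and growth telemetry. Every operation goes through the allocator you choose, so handing the builder a different allocator immediately changes its behavior.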
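string_builder.zig itself is not reproduced in this chapter, so the following is only a minimal sketch of the growth-telemetry idea, not the chapter's implementation. To stay independent of std.ArrayList API differences between Zig releases, it manages a raw buffer directly instead of wrapping std.ArrayList(u8). MiniBuilder, the 16-byte starting capacity, and the doubling policy are assumptions made for illustration; the Stats field names mirror the snapshot output printed later in this chapter.

```zig
const std = @import("std");

/// Field names mirror the snapshot output shown in this chapter.
const Stats = struct {
    length: usize,
    capacity: usize,
    growth_events: usize,
};

/// Hypothetical stand-in for StringBuilder: owns a raw buffer and counts
/// how many times that buffer had to grow.
const MiniBuilder = struct {
    allocator: std.mem.Allocator,
    buf: []u8 = &[_]u8{},
    len: usize = 0,
    growth_events: usize = 0,

    fn init(allocator: std.mem.Allocator) MiniBuilder {
        return .{ .allocator = allocator };
    }

    fn deinit(self: *MiniBuilder) void {
        self.allocator.free(self.buf);
        self.* = undefined;
    }

    /// Appends bytes, growing the buffer when needed.
    /// Assumes `bytes` does not alias the builder's own buffer.
    fn append(self: *MiniBuilder, bytes: []const u8) !void {
        const needed = self.len + bytes.len;
        if (needed > self.buf.len) {
            // Assumed policy: start at 16 bytes, then double until it fits.
            var new_cap: usize = @max(self.buf.len, 16);
            while (new_cap < needed) new_cap *= 2;
            self.buf = try self.allocator.realloc(self.buf, new_cap);
            self.growth_events += 1;
        }
        @memcpy(self.buf[self.len..][0..bytes.len], bytes);
        self.len = needed;
    }

    fn snapshot(self: *const MiniBuilder) Stats {
        return .{
            .length = self.len,
            .capacity = self.buf.len,
            .growth_events = self.growth_events,
        };
    }
};

pub fn main() !void {
    // A fixed scratch buffer keeps the demo free of heap setup.
    var scratch: [256]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&scratch);

    var builder = MiniBuilder.init(fba.allocator());
    defer builder.deinit();

    try builder.append("hello");
    try builder.append(", world");

    const stats = builder.snapshot();
    std.debug.print("len={d} cap={d} growth={d}\n", .{ stats.length, stats.capacity, stats.growth_events });
}
```

Because every growth goes through the stored allocator, swapping the FixedBufferAllocator for an arena or a general-purpose allocator changes where the bytes live without touching append.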

Rendering a Structured Summary

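To see the builder in action, the following program composes a short report, captures a snapshot of length, capacity, and growth events, and returns an owned slice to the caller. The builder defers cleanup to defer builder.deinit(), so the surrounding scope stays leak-free even after toOwnedSlice moves the buffer out.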

Zig
const std = @import("std");
const builder_mod = @import("string_builder.zig");
const StringBuilder = builder_mod.StringBuilder;

pub fn main() !void {
    // Initialize a general-purpose allocator with leak detection
    // This allocator tracks all allocations and reports leaks on deinit
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer {
        if (gpa.deinit() == .leak) std.log.err("leaked allocations detected", .{});
    }
    const allocator = gpa.allocator();

    // Create a StringBuilder with 64 bytes of initial capacity
    // Pre-allocating reduces reallocation overhead for known content size
    var builder = try StringBuilder.initCapacity(allocator, 64);
    defer builder.deinit();

    // Build report header using basic string concatenation
    try builder.append("Report\n======\n");
    try builder.append("source: dynamic builder\n\n");

    // Define structured data for report generation
    // Each item represents a category with its count
    const items = [_]struct {
        name: []const u8,
        count: usize,
    }{
        .{ .name = "widgets", .count = 7 },
        .{ .name = "gadgets", .count = 13 },
        .{ .name = "doodads", .count = 2 },
    };

    // Obtain a writer interface for formatted output
    // This allows using std.fmt.format-style print operations
    var writer = builder.writer();
    for (items, 0..) |item, index| {
        // Format each item as a numbered list entry with name and count
        try writer.print("* {d}. {s}: {d}\n", .{ index + 1, item.name, item.count });
    }

    // Capture allocation statistics before adding summary
    // Snapshot preserves metrics for analysis without affecting builder state
    const snapshot = builder.snapshot();
    try writer.print("\nsummary: appended {d} entries\n", .{items.len});

    // Transfer ownership of the constructed string to caller
    // After this call, builder is reset and cannot be reused without re-initialization
    const result = try builder.toOwnedSlice();
    defer allocator.free(result);

    // Display the generated report alongside allocation statistics
    std.debug.print("{s}\n---\n{any}\n", .{ result, snapshot });
}
Run
Shell
$ zig run builder_core.zig
Output
Shell
Report
======
source: dynamic builder

* 1. widgets: 7
* 2. gadgets: 13
* 3. doodads: 2

summary: appended 3 entries

---
.{ .length = 88, .capacity = 224, .growth_events = 1 }

snapshot() is cheap enough to sprinkle throughout your code whenever you need to confirm that a given workload stays within a particular capacity envelope.

Allocators in Action

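The allocator defines how the builder behaves under pressure: stackFallback gives very fast stack writes until the buffer overflows, an arena lets you free an entire generation in one shot, and the GPA keeps leak detection in play. This section shows how the same builder code adapts to different allocation strategies.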

Stack Buffer with an Arena Safety Net

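Here we wrap the builder in a stack-backed allocator that falls back to an arena once its 256 bytes of scratch space run out. The output shows the small report staying within the stack buffer, while the larger report spills into the arena and grows four times; see chapter 10.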

Zig
const std = @import("std");
const builder_mod = @import("string_builder.zig");
const StringBuilder = builder_mod.StringBuilder;
const Stats = builder_mod.Stats;

/// Container for a generated report and its allocation statistics
const Report = struct {
    text: []u8,
    stats: Stats,
};

/// Builds a text report with random sample data
/// Demonstrates StringBuilder usage with various allocator strategies
fn buildReport(allocator: std.mem.Allocator, label: []const u8, sample_count: usize) !Report {
    // Initialize StringBuilder with the provided allocator
    var builder = StringBuilder.init(allocator);
    defer builder.deinit();

    // Write report header
    try builder.append("label: ");
    try builder.append(label);
    try builder.append("\n");

    // Initialize PRNG with a seed that varies based on sample_count
    // Ensures reproducible but different sequences for different report sizes
    var prng = std.Random.DefaultPrng.init(0x5eed1234 ^ @as(u64, sample_count));
    const random = prng.random();

    // Generate random sample data and accumulate totals
    var total: usize = 0;
    var writer = builder.writer();
    for (0..sample_count) |i| {
        // Each sample represents a random KiB allocation between 8-64
        const chunk = random.intRangeAtMost(u32, 8, 64);
        total += chunk;
        try writer.print("{d}: +{d} KiB\n", .{ i, chunk });
    }

    // Write summary line with aggregated statistics
    try writer.print("total: {d} KiB across {d} samples\n", .{ total, sample_count });

    // Capture allocation statistics before transferring ownership
    const stats = builder.snapshot();

    // Transfer ownership of the built string to the caller
    const text = try builder.toOwnedSlice();
    return .{ .text = text, .stats = stats };
}

pub fn main() !void {
    // Arena allocator will reclaim all allocations at once when deinit() is called
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    // Small report: 256-byte stack buffer should be sufficient
    // stackFallback tries stack first, falls back to arena if needed
    var fallback_small = std.heap.stackFallback(256, arena.allocator());
    const small_allocator = fallback_small.get();
    const small = try buildReport(small_allocator, "stack-only", 6);
    defer small_allocator.free(small.text);

    // Large report: 256-byte stack buffer will overflow, forcing arena allocation
    // Demonstrates fallback behavior when stack space is insufficient
    var fallback_large = std.heap.stackFallback(256, arena.allocator());
    const large_allocator = fallback_large.get();
    const large = try buildReport(large_allocator, "needs-arena", 48);
    defer large_allocator.free(large.text);

    // Display both reports with their allocation statistics
    // Stats show each report's final length, capacity, and growth events
    std.debug.print("small buffer ->\n{s}stats: {any}\n\n", .{ small.text, small.stats });
    std.debug.print("large buffer ->\n{s}stats: {any}\n", .{ large.text, large.stats });
}
Run
Shell
$ zig run allocator_fallback.zig
Output
Shell
small buffer ->
label: stack-only
0: +40 KiB
1: +16 KiB
2: +13 KiB
3: +31 KiB
4: +44 KiB
5: +9 KiB
total: 153 KiB across 6 samples
stats: .{ .length = 115, .capacity = 128, .growth_events = 1 }

large buffer ->
label: needs-arena
0: +35 KiB
1: +29 KiB
2: +33 KiB
3: +14 KiB
4: +33 KiB
5: +20 KiB
6: +36 KiB
7: +21 KiB
8: +11 KiB
9: +58 KiB
10: +22 KiB
11: +53 KiB
12: +21 KiB
13: +41 KiB
14: +30 KiB
15: +20 KiB
16: +10 KiB
17: +39 KiB
18: +46 KiB
19: +59 KiB
20: +33 KiB
21: +8 KiB
22: +30 KiB
23: +22 KiB
24: +28 KiB
25: +32 KiB
26: +48 KiB
27: +50 KiB
28: +61 KiB
29: +53 KiB
30: +30 KiB
31: +27 KiB
32: +42 KiB
33: +24 KiB
34: +32 KiB
35: +58 KiB
36: +60 KiB
37: +27 KiB
38: +40 KiB
39: +17 KiB
40: +50 KiB
41: +50 KiB
42: +42 KiB
43: +54 KiB
44: +61 KiB
45: +10 KiB
46: +25 KiB
47: +50 KiB
total: 1695 KiB across 48 samples
stats: .{ .length = 618, .capacity = 1040, .growth_events = 4 }

stackFallback(N, allocator) permits only one call to .get() per instance; spin up a fresh fallback wrapper when you need multiple concurrent builders.

Growth Planning

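The builder records how many times its capacity changes, which is ideal for profiling the difference between appending blindly and pre-sizing once. The next example shows both paths producing identical text, with the planned version holding growth to a single reallocation.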

Pre-sizing vs Naive Appends

Zig
const std = @import("std");
const builder_mod = @import("string_builder.zig");
const StringBuilder = builder_mod.StringBuilder;
const Stats = builder_mod.Stats;

/// Container for built string and its allocation statistics
const Result = struct {
    text: []u8,
    stats: Stats,
};

/// Calculates the total byte length of all string segments
/// Used to pre-compute capacity requirements for efficient allocation
fn totalLength(parts: []const []const u8) usize {
    var sum: usize = 0;
    for (parts) |segment| sum += segment.len;
    return sum;
}

/// Builds a formatted string without pre-allocating capacity
/// Demonstrates the cost of incremental growth through multiple reallocations
/// Separators are spaces, with newlines every 8th segment
fn buildNaive(allocator: std.mem.Allocator, parts: []const []const u8) !Result {
    // Initialize with default capacity (0 bytes)
    // Builder will grow dynamically as content is appended
    var builder = StringBuilder.init(allocator);
    defer builder.deinit();

    for (parts, 0..) |segment, index| {
        // Each append may trigger reallocation if capacity is insufficient
        try builder.append(segment);
        if (index + 1 < parts.len) {
            // Insert newline every 8 segments, space otherwise
            const sep = if ((index + 1) % 8 == 0) "\n" else " ";
            try builder.append(sep);
        }
    }

    // Capture allocation statistics showing multiple growth operations
    const stats = builder.snapshot();
    const text = try builder.toOwnedSlice();
    return .{ .text = text, .stats = stats };
}

/// Builds a formatted string with pre-calculated capacity
/// Demonstrates capacity planning: growth is confined to one up-front reservation
/// Produces identical output to buildNaive but with fewer growth events
fn buildPlanned(allocator: std.mem.Allocator, parts: []const []const u8) !Result {
    var builder = StringBuilder.init(allocator);
    defer builder.deinit();

    // Calculate exact space needed: all segments plus separator count
    // Separators: n-1 for n parts (no separator after last segment)
    const separators = if (parts.len == 0) 0 else parts.len - 1;
    // Pre-allocate all required capacity in a single allocation
    try builder.ensureUnusedCapacity(totalLength(parts) + separators);

    for (parts, 0..) |segment, index| {
        // Append operations never reallocate due to pre-allocation
        try builder.append(segment);
        if (index + 1 < parts.len) {
            // Insert newline every 8 segments, space otherwise
            const sep = if ((index + 1) % 8 == 0) "\n" else " ";
            try builder.append(sep);
        }
    }

    // Capture statistics showing a single growth event from the up-front reservation
    const stats = builder.snapshot();
    const text = try builder.toOwnedSlice();
    return .{ .text = text, .stats = stats };
}

pub fn main() !void {
    // Initialize leak-detecting allocator to verify proper cleanup
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer {
        if (gpa.deinit() == .leak) std.log.err("leaked allocations detected", .{});
    }
    const allocator = gpa.allocator();

    // Sample data: 32 entries (24 Greek letters plus 8 astronomy terms)
    // Large enough to demonstrate multiple reallocations in naive approach
    const segments = [_][]const u8{
        "alpha",
        "beta",
        "gamma",
        "delta",
        "epsilon",
        "zeta",
        "eta",
        "theta",
        "iota",
        "kappa",
        "lambda",
        "mu",
        "nu",
        "xi",
        "omicron",
        "pi",
        "rho",
        "sigma",
        "tau",
        "upsilon",
        "phi",
        "chi",
        "psi",
        "omega",
        "aurora",
        "borealis",
        "cosmos",
        "nebula",
        "quasar",
        "pulsar",
        "singularity",
        "zenith",
    };

    // Build string without capacity planning
    // Stats will show multiple allocations and growth operations
    const naive = try buildNaive(allocator, &segments);
    defer allocator.free(naive.text);

    // Build string with exact capacity pre-allocation
    // Stats will show a single growth event from the up-front reservation
    const planned = try buildPlanned(allocator, &segments);
    defer allocator.free(planned.text);

    // Compare allocation statistics side-by-side
    // Demonstrates the efficiency gain from capacity planning
    std.debug.print(
        "naive -> {any}\n{s}\n\nplanned -> {any}\n{s}\n",
        .{ naive.stats, naive.text, planned.stats, planned.text },
    );
}
Run
Shell
$ zig run growth_comparison.zig
Output
Shell
naive -> .{ .length = 186, .capacity = 320, .growth_events = 2 }
alpha beta gamma delta epsilon zeta eta theta
iota kappa lambda mu nu xi omicron pi
rho sigma tau upsilon phi chi psi omega
aurora borealis cosmos nebula quasar pulsar singularity zenith

planned -> .{ .length = 186, .capacity = 320, .growth_events = 1 }
alpha beta gamma delta epsilon zeta eta theta
iota kappa lambda mu nu xi omicron pi
rho sigma tau upsilon phi chi psi omega
aurora borealis cosmos nebula quasar pulsar singularity zenith

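Growth counts depend on allocator policy: switching to a fixed buffer or an arena changes when capacity expands. When comparing profiles, track both the stats and the allocator you chose.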
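To make the allocator dependence concrete, here is a small sketch that sends the same pair of requests to two allocators. A builder's growth is ultimately just another allocation request, so a fixed buffer that cannot satisfy it fails outright, while an arena asks its backing allocator for more. fitsTwice is a hypothetical probe written for this illustration, and the 32-byte and 24-byte sizes are arbitrary.

```zig
const std = @import("std");

/// Hypothetical probe: reports whether two back-to-back requests of
/// `size` bytes both succeed on the given allocator.
fn fitsTwice(allocator: std.mem.Allocator, size: usize) bool {
    const first = allocator.alloc(u8, size) catch return false;
    defer allocator.free(first);
    const second = allocator.alloc(u8, size) catch return false;
    allocator.free(second);
    return true;
}

pub fn main() void {
    // Fixed buffer: 32 bytes of scratch and nothing behind it, so the
    // second 24-byte request cannot fit in the remaining 8 bytes.
    var scratch: [32]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&scratch);
    std.debug.print("fixed buffer fits twice: {}\n", .{fitsTwice(fba.allocator(), 24)});

    // Arena: the same requests succeed because the arena can obtain
    // more memory from its backing allocator.
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    std.debug.print("arena fits twice: {}\n", .{fitsTwice(arena.allocator(), 24)});
}
```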

Notes & Caveats

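  • toOwnedSlice hands ownership to the caller; remember to free the slice with the same allocator you passed to StringBuilder.
  • stackFallback resets its scratch buffer whenever .get() is called; for persistent reuse, hold on to the returned allocator instead of calling .get() again.
  • reset() clears the contents but retains capacity, so prefer it in hot paths that rebuild strings in a tight loop.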
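The first two notes can be sketched together without the builder itself. In this illustration renderGreeting is a hypothetical helper that returns a caller-owned slice via std.fmt.allocPrint, standing in for toOwnedSlice; the 64-byte scratch size is arbitrary.

```zig
const std = @import("std");

/// Hypothetical helper: returns a slice the caller now owns.
fn renderGreeting(allocator: std.mem.Allocator, name: []const u8) ![]u8 {
    return std.fmt.allocPrint(allocator, "hello, {s}!", .{name});
}

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    // Call .get() once and keep the resulting allocator around.
    var fallback = std.heap.stackFallback(64, arena.allocator());
    const allocator = fallback.get();

    const text = try renderGreeting(allocator, "builder");
    // Free with the same allocator that produced the slice.
    defer allocator.free(text);

    std.debug.print("{s}\n", .{text});
}
```

Passing the allocator value down the call chain, rather than the fallback wrapper, keeps the one-.get()-per-instance rule easy to follow.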

Exercises

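  • Extend StringBuilder with an appendFormat(comptime fmt, args) helper powered by std.io.Writer.Allocating, then compare its allocations against repeated writer.print calls.
  • Build a CLI that streams JSON records into the builder, switching between GPA and arena allocators via a command-line flag; see chapter 5.
  • Emit a Markdown report to disk by piping the builder into std.fs.File.writer(), and verify that the final slice matches the bytes written; see chapter 6 and fs.zig.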

Alternatives & Edge Cases

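  • Very large strings can allocate gigabytes of memory; guard your inputs, or stream to disk once length crosses a safe threshold.
  • When composing multiple builders, share one arena or GPA so the ownership chain stays simple and leak detection stays accurate.
  • If latency matters more than allocations, emit directly to a buffered writer and use the builder only for the parts that genuinely need random-access edits; see chapter 9.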
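One way to guard against oversized strings is to check the projected length before each append. The sketch below assumes a hypothetical 1 MiB ceiling (max_report_bytes) and a hypothetical checkedLength helper; neither is part of the chapter's StringBuilder, and the right threshold depends on the application.

```zig
const std = @import("std");

/// Hypothetical ceiling for an in-memory report; tune per application.
const max_report_bytes: usize = 1 << 20;

/// Returns the length after appending `incoming` bytes, or an error if
/// that would cross the ceiling or overflow usize.
fn checkedLength(current: usize, incoming: usize) error{ReportTooLarge}!usize {
    const total = std.math.add(usize, current, incoming) catch return error.ReportTooLarge;
    if (total > max_report_bytes) return error.ReportTooLarge;
    return total;
}

pub fn main() !void {
    // A small append stays under the ceiling.
    const next = try checkedLength(618, 64);
    std.debug.print("next length: {d}\n", .{next});

    // One byte past the ceiling is rejected before any allocation happens.
    if (checkedLength(max_report_bytes, 1)) |_| {
        std.debug.print("unexpectedly accepted\n", .{});
    } else |err| {
        std.debug.print("rejected: {s}\n", .{@errorName(err)});
    }
}
```

A caller that receives error.ReportTooLarge can then flush what it has to disk and start a fresh buffer instead of growing further.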
