Skip to content

buffer: improve performance of multiple Buffer operations#61871

Merged
nodejs-github-bot merged 11 commits intonodejs:mainfrom
thisalihassan:buffer-perf-improvements
Mar 27, 2026
Merged

buffer: improve performance of multiple Buffer operations#61871
nodejs-github-bot merged 11 commits intonodejs:mainfrom
thisalihassan:buffer-perf-improvements

Conversation

@thisalihassan
Copy link
Copy Markdown
Contributor

@thisalihassan thisalihassan commented Feb 17, 2026

Multiple performance improvements to Buffer operations, verified with benchmarks (15-30 runs, comparing old vs new binaries built from same tree).

Buffer.copyBytesFrom() (+100-210%)
Avoid intermediate TypedArrayPrototypeSlice allocation by calculating byte offsets directly into the source TypedArray's underlying ArrayBuffer.

Buffer.prototype.fill("t", "ascii") (+26-37%)

ASCII indexOf (+14-46%)
Call indexOfString directly for ASCII encoding instead of first converting the search value to a Buffer via fromStringFast and then calling indexOfBuffer. ASCII and Latin-1 share the same byte values for characters 0-127.

swap16/32/64 (+3-38%)
Add V8 Fast API C++ functions (FastSwap16/32/64) alongside the existing slow path. Largest gains at len=256 (+35%).

Benchmark results

Key results (15-30 runs, *** = p < 0.001):

Benchmark Improvement
copyBytesFrom (offset, Uint8Array, len=256) +210% ***
copyBytesFrom (offset+length, Uint8Array, len=256) +206% ***
swap16 len=256 +38% ***
fill("t", "ascii") size=8192 +37% ***
indexOf ASCII 'Alice' +46% ***
indexOf ASCII '@' +31% ***
fill("t", "ascii") size=65536 +26% ***
swap64 len=768 aligned +12% ***

Attaching Details below:
detail.pdf -- visual breakdown
buffer-benchmark-all-rebased.csv -- raw benchmark CSV
compare-R-output.txt-- full output

@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

Review requested:

  • @nodejs/performance

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Feb 17, 2026
@thisalihassan thisalihassan force-pushed the buffer-perf-improvements branch from d2ba38f to 495feb5 Compare February 17, 2026 21:41
Comment on lines +1210 to +1217
void FastSwap16(Local<Value> receiver,
Local<Value> buffer_obj,
// NOLINTNEXTLINE(runtime/references)
FastApiCallbackOptions& options) {
HandleScope scope(options.isolate);
ArrayBufferViewContents<char> buffer(buffer_obj);
CHECK(nbytes::SwapBytes16(const_cast<char*>(buffer.data()), buffer.length()));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fast callbacks are non-identical to the conventional callbacks they shadow.

  • The existing callbacks validate their argument and throws to JS if invalid, whereas your fast callbacks hard-crash the process. It might be better to validate in the JS layer, then use the same unwrapping logic on both sides.
  • Your fast callback cannot have a different return convention to the conventional callback. You will need to remove the return value from the conventional callback.

Copy link
Copy Markdown
Member

@ChALkeR ChALkeR Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was removing the validation instead of moving it to js intentional?

Also fast paths can keep the validation and fall back to slow i think, so we can still validate on the src side?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fast paths can keep the validation and fall back to slow i think, so we can still validate on the src side?

This is outdated, the fast API doesn't use fallback any more (since Node.js v23.x).

However, any validation in a JS wrapper should be shadowed by a CHECK or DCHECK in the C++ binding.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SPREAD_BUFFER_ARG already contains CHECK((val)->IsArrayBufferView()) on all Slow & Fast swap methods

Comment on lines +1210 to +1217
void FastSwap16(Local<Value> receiver,
Local<Value> buffer_obj,
// NOLINTNEXTLINE(runtime/references)
FastApiCallbackOptions& options) {
HandleScope scope(options.isolate);
ArrayBufferViewContents<char> buffer(buffer_obj);
CHECK(nbytes::SwapBytes16(const_cast<char*>(buffer.data()), buffer.length()));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fast callbacks should include debug tracking and call tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing these I will update the code

@Renegade334 Renegade334 added performance Issues and PRs related to the performance of Node.js. needs-benchmark-ci PR that need a benchmark CI run. labels Feb 17, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 92.39130% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.69%. Comparing base (7547e79) to head (85fb7f1).
⚠️ Report is 36 commits behind head on main.

Files with missing lines Patch % Lines
src/node_buffer.cc 72.00% 0 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #61871      +/-   ##
==========================================
- Coverage   91.60%   89.69%   -1.91%     
==========================================
  Files         337      676     +339     
  Lines      140745   206763   +66018     
  Branches    21802    39605   +17803     
==========================================
+ Hits       128925   185452   +56527     
- Misses      11595    13444    +1849     
- Partials      225     7867    +7642     
Files with missing lines Coverage Δ
lib/buffer.js 99.16% <100.00%> (+12.78%) ⬆️
src/node_buffer.cc 68.19% <72.00%> (ø)

... and 459 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@thisalihassan thisalihassan force-pushed the buffer-perf-improvements branch 4 times, most recently from 1395d2f to 01ba74f Compare February 17, 2026 23:42
FastApiCallbackOptions& options) {
TRACK_V8_FAST_API_CALL("buffer.swap16");
HandleScope scope(options.isolate);
ArrayBufferViewContents<char> buffer(buffer_obj);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayBufferViewContents is wrong here, as buffer.data() may be a stack-allocated copy of the byte data rather than the data itself. SPREAD_BUFFER_ARG is the correct macro to use here, as per the conventional callback.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Renegade334 replaced ArrayBufferViewContents with SPREAD_BUFFER_ARG in all three fast swap callbacks

Copy link
Copy Markdown
Member

@ChALkeR ChALkeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For toHex, wait until #61609, which improves native perf significantly (more than Uint8Array.prototype.toHex)

See also #60249 (comment)

} else if (value.length === 1) {
// Fast path: If `value` fits into a single byte, use that numeric value.
if (normalizedEncoding === 'utf8') {
if (normalizedEncoding === 'utf8' || normalizedEncoding === 'ascii') {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, ascii behaves exactly like latin1
Unsure if by design or accidentally

Copy link
Copy Markdown
Contributor Author

@thisalihassan thisalihassan Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is safe by design I am just extending the existing single byte numeric optimization to cover ASCII, since the guard already constrains it to the valid ASCII range.

@thisalihassan
Copy link
Copy Markdown
Contributor Author

Note on toBase64 / toBase64url:

I also tried replacing the C++ base64Slice/base64urlSlice bindings with V8's Uint8Array.prototype.toBase64() (similar to the toHex change) but it caused a 35-54% regression across all buffer sizes so I reverted base64/base64url and kept only the toHex optimization which showed a clear +26-37% win.

@ChALkeR
Copy link
Copy Markdown
Member

ChALkeR commented Feb 18, 2026

@thisalihassan toHex doesn't show a win anymore with nbytes update which should soon land (as it landed in nbytes)

Instead, it's ~3x slower.

See nodejs/nbytes#12

@thisalihassan
Copy link
Copy Markdown
Contributor Author

thisalihassan commented Feb 18, 2026

Hi @ChALkeR thanks for flagging I was not aware. I benchmarked the nibble approach locally and it's indeed a much bigger win (~3x vs my ~30% with toHex). Reverted the toHex path entirely the other changes in this PR are unaffected.

Should I include the nbytes nibble HexEncode optimization in this PR or keep them as separate PRs?

PS: One test is failing /test/parallel/test-debugger-restart-message.js I believe it's known mac issue and unrelated to my changes

@thisalihassan
Copy link
Copy Markdown
Contributor Author

For toHex, wait until #61609, which improves native perf significantly (more than Uint8Array.prototype.toHex)

See also #60249 (comment)

Hi @ChALkeR I see that your changes landed, congratualtions on that huge win. Is there benchmark-ci that we can run on this PR? I can rebase it

@thisalihassan
Copy link
Copy Markdown
Contributor Author

thisalihassan commented Feb 24, 2026

I re-ran the benchmark against the latest main (which contains the nibble improvement) and found out there are few regressions caused by my code:

buffers/buffer-indexof.js n=50000 type='buffer' encoding='undefined' search='aaaaaaaaaaaaaaaaa' *** -9.41 %
buffers/buffer-indexof.js n=50000 type='string' encoding='ascii' search='aaaaaaaaaaaaaaaaa' *** -10.93
buffers/buffer-indexof.js n=50000 type='string' encoding='latin1' search='aaaaaaaaaaaaaaaaa' *** -9.44 %
buffers/buffer-indexof.js n=50000 type='string' encoding='utf8' search='aaaaaaaaaaaaaaaaa' *** -9.18 %

I tried fixing this regression but unavoidable

Attaching Details below:
detail.pdf -- visual breakdown

buffer-benchmark-all-rebased.csv -- raw benchmark CSV

@thisalihassan
Copy link
Copy Markdown
Contributor Author

thisalihassan commented Feb 24, 2026

compare-R-output.txt-- full output from Rscript benchmark/compare.R (shared as a file to avoid cluttering the PR comment).

@thisalihassan thisalihassan force-pushed the buffer-perf-improvements branch from 59f5d09 to 48abfc5 Compare February 27, 2026 23:09
@thisalihassan
Copy link
Copy Markdown
Contributor Author

PS: I resolved some conflicts

@thisalihassan
Copy link
Copy Markdown
Contributor Author

Hi @ChALkeR @anonrig @Renegade334 is this PR ready to land? can you also please check the latest benchmarks i have posted, I have compiled PDF for this benchmark in one my comments above for more detail

Screenshot 2026-03-01 at 2 29 23 AM Screenshot 2026-03-01 at 2 29 40 AM Screenshot 2026-03-01 at 2 29 54 AM Screenshot 2026-03-01 at 2 28 57 AM

Copy link
Copy Markdown
Member

@Qard Qard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, but one small nit.

@thisalihassan thisalihassan force-pushed the buffer-perf-improvements branch from 1d16ad4 to f1d938b Compare March 21, 2026 04:33
@thisalihassan thisalihassan force-pushed the buffer-perf-improvements branch from f1d938b to 7c8b68d Compare March 25, 2026 09:52
@aduh95
Copy link
Copy Markdown
Contributor

aduh95 commented Mar 25, 2026

Can you rename the other const len = this.length / const len = TypedArrayPrototypeGetLength(this) to bufferLength please?

@thisalihassan
Copy link
Copy Markdown
Contributor Author

thisalihassan commented Mar 25, 2026

Can you rename the other const len = this.length / const len = TypedArrayPrototypeGetLength(this) to bufferLength please?

Done! Hoping this would be the last of the changes :')

Another reference spotted @aduh95

Buffer.prototype.subarray = function subarray(start, end) {
  const srcLength = this.length;

should also be?

Buffer.prototype.subarray = function subarray(start, end) {
  const srcLength = TypedArrayPrototypeGetLength(this);

@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

@thisalihassan
Copy link
Copy Markdown
Contributor Author

this test-runner.test-run-watch-run-duration is the only one failing on the CI

@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

@aduh95 aduh95 added commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. labels Mar 27, 2026
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Mar 27, 2026
@nodejs-github-bot nodejs-github-bot merged commit 9e0dc8b into nodejs:main Mar 27, 2026
70 of 71 checks passed
@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

Landed in 9e0dc8b

aduh95 pushed a commit that referenced this pull request Mar 28, 2026
- copyBytesFrom: calculate byte offsets directly instead of
  slicing into an intermediate typed array
- toString('hex'): use V8 Uint8Array.prototype.toHex() builtin
- fill: add single-char ASCII fast path
- indexOf: use indexOfString directly for ASCII encoding
- swap16/32/64: add V8 Fast API functions

PR-URL: #61871
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
aduh95 pushed a commit that referenced this pull request Mar 28, 2026
- copyBytesFrom: calculate byte offsets directly instead of
  slicing into an intermediate typed array
- toString('hex'): use V8 Uint8Array.prototype.toHex() builtin
- fill: add single-char ASCII fast path
- indexOf: use indexOfString directly for ASCII encoding
- swap16/32/64: add V8 Fast API functions

PR-URL: #61871
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
aduh95 pushed a commit that referenced this pull request Mar 28, 2026
- copyBytesFrom: calculate byte offsets directly instead of
  slicing into an intermediate typed array
- toString('hex'): use V8 Uint8Array.prototype.toHex() builtin
- fill: add single-char ASCII fast path
- indexOf: use indexOfString directly for ASCII encoding
- swap16/32/64: add V8 Fast API functions

PR-URL: #61871
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

author ready PRs that have at least one approval, no pending requests for changes, and a CI started. buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. needs-benchmark-ci PR that need a benchmark CI run. needs-ci PRs that need a full CI run. performance Issues and PRs related to the performance of Node.js.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants