Optimize Plug.Conn.Utils.validate_utf8!/3 #1311
There are a few weird things. The fact that it reports no allocation makes me think most of the work is happening in C, so we probably can't measure memory. I also worry that the comparison with the output binary is going to be expensive. My suggestion would be to optimize the current code with the same techniques used here: elixir-lang/elixir#15255. Also remember to measure other languages as input, such as Japanese or Korean, which can be Unicode-heavy.
Nice suggestion.

```elixir
Mix.install([
  {:benchee, "~> 1.0"}
])

defmodule Original do
  def validate_utf8!(binary, exception, context) when is_binary(binary) do
    do_validate_utf8!(binary, exception, context)
  end

  defp do_validate_utf8!(<<_::utf8, rest::bits>>, exception, context) do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<byte, _::bits>>, exception, context) do
    raise exception, "invalid UTF-8 on #{context}, got byte #{byte}"
  end

  defp do_validate_utf8!(<<>>, _exception, _context) do
    :ok
  end
end

defmodule Current do
  # 56-bit SWAR guard: all 7 bytes are ASCII (< 128)
  defguardp ascii_swar?(w)
            when Bitwise.band(w, 0x80808080808080) == 0

  def validate_utf8!(binary, exception, context) when is_binary(binary) do
    do_validate_utf8!(binary, exception, context)
  end

  defp do_validate_utf8!(<<w::56, b, rest::binary>>, exception, context)
       when b <= 127 and ascii_swar?(w) do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<b, rest::binary>>, exception, context) when b <= 127 do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<_::utf8, rest::binary>>, exception, context) do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<byte, _::binary>>, exception, context) do
    raise exception, "invalid UTF-8 on #{context}, got byte #{byte}"
  end

  defp do_validate_utf8!(<<>>, _exception, _context) do
    :ok
  end
end

small_valid = "hello"
medium_valid = "hello world, this is a beautiful day! Elixir is awesome, and Plug is great."
large_valid = String.duplicate("Lorem ipsum dolor sit amet, consectetur adipiscing elit. ", 200)
invalid_string = "hello \xff world"

exception = RuntimeError
context = "test"

Benchee.run(
  %{
    "original (main branch)" => fn input ->
      try do
        Original.validate_utf8!(input, exception, context)
      rescue
        _ -> :ok
      end
    end,
    "current (optimization14)" => fn input ->
      try do
        Current.validate_utf8!(input, exception, context)
      rescue
        _ -> :ok
      end
    end
  },
  inputs: %{
    "Small Valid" => small_valid,
    "Medium Valid" => medium_valid,
    "Large Valid" => large_valid,
    "Invalid" => invalid_string
  },
  time: 2,
  memory_time: 2
)
```

Results:

```
Operating System: Linux
CPU Information: AMD Ryzen 7 8845HS w
Number of Available Cores: 16
Available memory: 54.72 GB
Elixir 1.20.0
Erlang 29.0.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: Invalid, Large Valid, Medium Valid, Small Valid
Estimated total run time: 48 s
Excluding outliers: false

Benchmarking current (optimization14) with input Invalid ...
Benchmarking current (optimization14) with input Large Valid ...
Benchmarking current (optimization14) with input Medium Valid ...
Benchmarking current (optimization14) with input Small Valid ...
Benchmarking original (main branch) with input Invalid ...
Benchmarking original (main branch) with input Large Valid ...
Benchmarking original (main branch) with input Medium Valid ...
Benchmarking original (main branch) with input Small Valid ...
Calculating statistics...
Formatting results...

##### With input Invalid #####
Name                             ips      average   deviation     median     99th %
original (main branch)        4.68 M    213.77 ns   ±2102.38%     191 ns     351 ns
current (optimization14)      2.21 M    452.24 ns   ±1748.98%     321 ns    2405 ns

Comparison:
original (main branch)        4.68 M
current (optimization14)      2.21 M - 2.12x slower +238.47 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)             336 B
current (optimization14)           336 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Large Valid #####
Name                             ips      average   deviation     median     99th %
current (optimization14)    269.24 K      3.71 μs     ±24.51%    3.62 μs    6.62 μs
original (main branch)       49.96 K     20.02 μs     ±25.64%   18.43 μs   37.86 μs

Comparison:
current (optimization14)    269.24 K
original (main branch)       49.96 K - 5.39x slower +16.30 μs

Memory usage statistics:

Name                        Memory usage
current (optimization14)            40 B
original (main branch)              40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Medium Valid #####
Name                             ips      average   deviation     median     99th %
current (optimization14)     13.32 M     75.06 ns   ±3616.55%      70 ns     100 ns
original (main branch)        5.65 M    177.04 ns   ±1392.26%     161 ns     320 ns

Comparison:
current (optimization14)     13.32 M
original (main branch)        5.65 M - 2.36x slower +101.98 ns

Memory usage statistics:

Name                        Memory usage
current (optimization14)            40 B
original (main branch)              40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Small Valid #####
Name                             ips      average   deviation     median     99th %
original (main branch)       17.61 M     56.80 ns   ±4628.88%      50 ns      90 ns
current (optimization14)     15.39 M     64.98 ns   ±4332.95%      60 ns      80 ns

Comparison:
original (main branch)       17.61 M
current (optimization14)     15.39 M - 1.14x slower +8.18 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B
```
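For readers unfamiliar with the SWAR trick used in `Current`: a byte is ASCII exactly when its high bit (0x80) is clear, so AND-ing a 56-bit word with `0x80808080808080` (one 0x80 per byte) yields zero only when all seven bytes are ASCII. The `<<w::56, b, rest::binary>>` clause then checks the eighth byte `b` with an ordinary comparison, so each recursive call consumes 8 bytes instead of 1. A standalone sketch comparing the word-at-a-time check with a byte-by-byte loop; `SwarDemo`, `ascii7?/1`, and `ascii7_bytewise?/1` are hypothetical names used only for illustration.

```elixir
defmodule SwarDemo do
  import Bitwise

  # One 0x80 per byte: the high bit of each byte in a 56-bit (7-byte) word.
  @high_bits 0x80808080808080

  # Word-at-a-time: read the first 7 bytes as one big-endian integer and
  # test all seven high bits with a single AND.
  def ascii7?(<<w::56, _rest::binary>>), do: band(w, @high_bits) == 0

  # Byte-at-a-time reference: every one of the first 7 bytes is < 128.
  def ascii7_bytewise?(<<bytes::binary-size(7), _rest::binary>>) do
    for <<b <- bytes>>, reduce: true do
      acc -> acc and b < 128
    end
  end
end

SwarDemo.ascii7?("hello, world")
# => true ("hello, " is all ASCII)

SwarDemo.ascii7?("héllo, world")
# => false ("é" is encoded as 0xC3 0xA9, both with the high bit set)
```

The 7-byte width is presumably deliberate: on a 64-bit BEAM, integers at or above 2^59 no longer fit in an immediate (small) integer, so matching `<<w::64>>` could allocate a bignum for some inputs, whereas any 56-bit value stays immediate.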
@josevalim, would a hybrid approach fit this codebase, routing binaries below 12 bytes to the original implementation and using SWAR for larger ones? Or is that not the direction you would go?

```elixir
if byte_size(binary) < 12 do
  do_validate_utf8_small!(binary, exception, context)
else
  do_validate_utf8_swar!(binary, exception, context)
end
```
We can add a tiny branch, yes. That's cheap. Also, please test large, medium, and small inputs in Japanese and Korean. Thank you!
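One detail worth keeping in mind when choosing the Japanese and Korean inputs: the proposed `byte_size(binary) < 12` cutoff counts bytes, not characters. Hiragana and precomposed Hangul syllables each encode to three bytes in UTF-8, so a five-character greeting such as "こんにちは" or "안녕하세요" is 15 bytes and would be routed to the SWAR branch rather than the small one. A quick sketch to check this; the `Widths` module and its functions are hypothetical, and `path/1` only mirrors the proposed dispatch condition.

```elixir
defmodule Widths do
  # {characters, bytes} for a UTF-8 string.
  def measure(string), do: {String.length(string), byte_size(string)}

  # UTF-8 byte width of each code point in the string.
  def code_point_widths(string), do: for(<<c::utf8 <- string>>, do: byte_size(<<c::utf8>>))

  # Mirrors the proposed cutoff: inputs under 12 *bytes* take the small path.
  def path(string) when byte_size(string) < 12, do: :small
  def path(_string), do: :swar
end

Widths.measure("hello")            # => {5, 5}
Widths.measure("こんにちは")        # => {5, 15}
Widths.measure("안녕하세요")        # => {5, 15}
Widths.code_point_widths("こ한a")  # => [3, 3, 1]
Widths.path("こんにちは")           # => :swar
```

So for these "small" inputs the proposed hybrid does one failed 8-byte match and then runs the same per-character loop as `Original`.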
Bench:

```elixir
Mix.install([
  {:benchee, "~> 1.0"}
])

defmodule Original do
  def validate_utf8!(binary, exception, context) when is_binary(binary) do
    do_validate_utf8!(binary, exception, context)
  end

  defp do_validate_utf8!(<<_::utf8, rest::bits>>, exception, context) do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<byte, _::bits>>, exception, context) do
    raise exception, "invalid UTF-8 on #{context}, got byte #{byte}"
  end

  defp do_validate_utf8!(<<>>, _exception, _context) do
    :ok
  end
end

defmodule Current do
  # 56-bit SWAR guard: all 7 bytes are ASCII (< 128)
  defguardp ascii_swar?(w)
            when Bitwise.band(w, 0x80808080808080) == 0

  def validate_utf8!(binary, exception, context)

  def validate_utf8!(<<binary::binary>>, exception, context) do
    if byte_size(binary) < 12 do
      do_validate_utf8_small!(binary, exception, context)
    else
      do_validate_utf8_swar!(binary, exception, context)
    end
  end

  # SWAR loop
  defp do_validate_utf8_swar!(<<w::56, b, rest::bits>>, exception, context)
       when b <= 127 and ascii_swar?(w) do
    do_validate_utf8_swar!(rest, exception, context)
  end

  defp do_validate_utf8_swar!(rest, exception, context) do
    do_validate_utf8_small!(rest, exception, context)
  end

  # Small loop (identical to original character loop)
  defp do_validate_utf8_small!(<<_::utf8, rest::bits>>, exception, context) do
    do_validate_utf8_small!(rest, exception, context)
  end

  defp do_validate_utf8_small!(<<byte, _::bits>>, exception, context) do
    raise exception, "invalid UTF-8 on #{context}, got byte #{byte}"
  end

  defp do_validate_utf8_small!(<<>>, _exception, _context) do
    :ok
  end
end

small_valid = "hello"
medium_valid = "hello world, this is a beautiful day! Elixir is awesome, and Plug is great."
large_valid = String.duplicate("Lorem ipsum dolor sit amet, consectetur adipiscing elit. ", 200)
invalid_string = "hello \xff world"

jp_small = "こんにちは"
jp_medium = "日本語と韓国語のテキストをテストしています。これは中くらいの長さ of テキストです。"
jp_large = String.duplicate("日本語の長いテキストのテスト。すべての文字が正しく検証される必要があります。", 100)

kr_small = "안녕하세요"
kr_medium = "한국어와 일본어 텍스트를 테스트하고 있습니다. 이것은 중간 크기입니다."
kr_large = String.duplicate("한국어 긴 텍스트 테스트. 모든 글자가 올바르게 검증되어야 합니다.", 100)

exception = RuntimeError
context = "test"

Benchee.run(
  %{
    "original (main branch)" => fn input ->
      try do
        Original.validate_utf8!(input, exception, context)
      rescue
        _ -> :ok
      end
    end,
    "current (optimization14)" => fn input ->
      try do
        Current.validate_utf8!(input, exception, context)
      rescue
        _ -> :ok
      end
    end
  },
  inputs: %{
    "Small Valid" => small_valid,
    "Medium Valid" => medium_valid,
    "Large Valid" => large_valid,
    "Invalid" => invalid_string,
    "Japanese Small" => jp_small,
    "Japanese Medium" => jp_medium,
    "Japanese Large" => jp_large,
    "Korean Small" => kr_small,
    "Korean Medium" => kr_medium,
    "Korean Large" => kr_large
  },
  time: 2,
  memory_time: 2
)
```

Results on noisy system:

```
Operating System: Linux
CPU Information: AMD Ryzen 7 8845HS w
Number of Available Cores: 16
Available memory: 54.72 GB
Elixir 1.20.0
Erlang 29.0.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: Invalid, Japanese Large, Japanese Medium, Japanese Small, Korean Large, Korean Medium, Korean Small, Large Valid, Medium Valid, Small Valid
Estimated total run time: 2 min
Excluding outliers: false

##### With input Invalid #####
Name                             ips      average   deviation     median     99th %
current (optimization14)      5.40 M    185.29 ns   ±2477.54%     160 ns     311 ns
original (main branch)        5.32 M    188.13 ns   ±2422.84%     170 ns     301 ns

Comparison:
current (optimization14)      5.40 M
original (main branch)        5.32 M - 1.02x slower +2.84 ns

Memory usage statistics:

Name                        Memory usage
current (optimization14)           336 B
original (main branch)             336 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Japanese Large #####
Name                             ips      average   deviation     median     99th %
original (main branch)       46.31 K     21.59 μs     ±26.55%   21.34 μs   25.05 μs
current (optimization14)     45.91 K     21.78 μs     ±19.96%   21.90 μs   26.17 μs

Comparison:
original (main branch)       46.31 K
current (optimization14)     45.91 K - 1.01x slower +0.188 μs

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Japanese Medium #####
Name                             ips      average   deviation     median     99th %
original (main branch)        3.65 M    273.63 ns   ±1198.27%     261 ns     371 ns
current (optimization14)      3.56 M    281.00 ns   ±1282.27%     270 ns     400 ns

Comparison:
original (main branch)        3.65 M
current (optimization14)      3.56 M - 1.03x slower +7.36 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Japanese Small #####
Name                             ips      average   deviation     median     99th %
original (main branch)       14.61 M     68.43 ns   ±3783.09%      60 ns      90 ns
current (optimization14)     13.79 M     72.50 ns   ±3613.47%      70 ns     100 ns

Comparison:
original (main branch)       14.61 M
current (optimization14)     13.79 M - 1.06x slower +4.07 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Korean Large #####
Name                             ips      average   deviation     median     99th %
original (main branch)       60.84 K     16.44 μs     ±12.28%   16.25 μs   20.00 μs
current (optimization14)     60.73 K     16.47 μs     ±20.90%   16.10 μs   19.85 μs

Comparison:
original (main branch)       60.84 K
current (optimization14)     60.73 K - 1.00x slower +0.0297 μs

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Korean Medium #####
Name                             ips      average   deviation     median     99th %
original (main branch)        4.35 M    229.97 ns   ±1229.31%     220 ns     330 ns
current (optimization14)      4.22 M    236.85 ns   ±1245.50%     221 ns     331 ns

Comparison:
original (main branch)        4.35 M
current (optimization14)      4.22 M - 1.03x slower +6.88 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Korean Small #####
Name                             ips      average   deviation     median     99th %
original (main branch)       14.48 M     69.04 ns   ±3687.47%      60 ns      90 ns
current (optimization14)     13.66 M     73.22 ns   ±3567.65%      70 ns     100 ns

Comparison:
original (main branch)       14.48 M
current (optimization14)     13.66 M - 1.06x slower +4.18 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Large Valid #####
Name                             ips      average   deviation     median     99th %
current (optimization14)    376.97 K      2.65 μs     ±27.37%    2.58 μs    4.52 μs
original (main branch)       54.50 K     18.35 μs     ±14.69%   18.11 μs   22.23 μs

Comparison:
current (optimization14)    376.97 K
original (main branch)       54.50 K - 6.92x slower +15.70 μs

Memory usage statistics:

Name                        Memory usage
current (optimization14)            40 B
original (main branch)              40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Medium Valid #####
Name                             ips      average   deviation     median     99th %
current (optimization14)     16.35 M     61.15 ns   ±5096.65%      50 ns      90 ns
original (main branch)        6.91 M    144.69 ns   ±1886.01%     131 ns     251 ns

Comparison:
current (optimization14)     16.35 M
original (main branch)        6.91 M - 2.37x slower +83.55 ns

Memory usage statistics:

Name                        Memory usage
current (optimization14)            40 B
original (main branch)              40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Small Valid #####
Name                             ips      average   deviation     median     99th %
current (optimization14)     22.40 M     44.64 ns   ±5934.40%      40 ns      60 ns
original (main branch)       21.91 M     45.63 ns   ±5393.68%      40 ns      70 ns

Comparison:
current (optimization14)     22.40 M
original (main branch)       21.91 M - 1.02x slower +1.00 ns

Memory usage statistics:

Name                        Memory usage
current (optimization14)            40 B
original (main branch)              40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**
```
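Reading the hybrid `Current` above, `do_validate_utf8_swar!/3` hands off to `do_validate_utf8_small!/3` on the first 8-byte chunk that is not pure ASCII, and the small loop never calls back into the SWAR loop. From that point on, the rest of the binary is validated one code point at a time, which is consistent with the Japanese and Korean numbers tracking `Original` closely. For mostly-ASCII payloads with an occasional accented or CJK character early on, it also gives up the fast path for everything after that character. One possible variant, sketched here under the assumption that retrying the fast path after each multibyte code point is worth an extra failed match, validates one code point and then tries the 8-byte path again (close to what the first `Current` module did on every iteration, combined with the small-input cutoff). `ReentrantSwar`, `swar!/3`, and `small!/3` are hypothetical names; this variant has not been benchmarked here, and on CJK-heavy text the extra match attempt per character could make it slower than the current hybrid.

```elixir
defmodule ReentrantSwar do
  # Hypothetical variant of the hybrid validator: after validating a
  # non-ASCII code point, it retries the 8-byte ASCII fast path.
  import Bitwise

  # True when all 7 bytes packed into the 56-bit integer `w` are ASCII (< 128).
  defguardp ascii_swar?(w) when band(w, 0x80808080808080) == 0

  def validate_utf8!(binary, exception, context) when is_binary(binary) do
    if byte_size(binary) < 12 do
      small!(binary, exception, context)
    else
      swar!(binary, exception, context)
    end
  end

  # Fast path: the next 8 bytes are all ASCII, so skip them in one step.
  defp swar!(<<w::56, b, rest::binary>>, exception, context)
       when b <= 127 and ascii_swar?(w) do
    swar!(rest, exception, context)
  end

  # Otherwise consume one valid code point and go back to the fast path.
  defp swar!(<<_::utf8, rest::binary>>, exception, context) do
    swar!(rest, exception, context)
  end

  # Empty input or an invalid byte: let the per-code-point loop
  # return :ok or raise with the offending byte.
  defp swar!(binary, exception, context), do: small!(binary, exception, context)

  # Per-code-point loop, same shape as the original implementation.
  defp small!(<<_::utf8, rest::binary>>, exception, context),
    do: small!(rest, exception, context)

  defp small!(<<byte, _::binary>>, exception, context),
    do: raise(exception, "invalid UTF-8 on #{context}, got byte #{byte}")

  defp small!(<<>>, _exception, _context), do: :ok
end

# Usage (any exception module that accepts a message works here).
ReentrantSwar.validate_utf8!(String.duplicate("Olá, São Paulo! ", 20), ArgumentError, "body")
# => :ok
```

Because the fast path only ever skips bytes that are already known to be valid ASCII, the fallback reaches `small!/3` at exactly the byte where `<<_::utf8>>` fails, so the error message should name the same byte as `Original`.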
💚 💙 💜 💛 ❤️
Assisted by: Antigravity CLI : Gemini Flash 3.5
The optimized code is faster for valid input, while the original code is faster for invalid input; invalid input should be rare in real-world applications.
Bench:
Benchmark results (noisy system):