
Optimize Plug.Conn.Utils.validate_utf8!/3 #1311

Merged
josevalim merged 5 commits into elixir-plug:main from preciz:optimization14 on Jun 15, 2026

preciz (Contributor) commented Jun 15, 2026

Assisted by: Antigravity CLI (Gemini Flash 3.5)

The optimized code is faster for valid input, while the original code is faster for invalid input. Invalid input should be rare in real-world applications.

Benchmark:

Mix.install([
  {:benchee, "~> 1.0"}
])

defmodule Original do
  def validate_utf8!(binary, exception, context) when is_binary(binary) do
    do_validate_utf8!(binary, exception, context)
  end

  defp do_validate_utf8!(<<_::utf8, rest::bits>>, exception, context) do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<byte, _::bits>>, exception, context) do
    raise exception, "invalid UTF-8 on #{context}, got byte #{byte}"
  end

  defp do_validate_utf8!(<<>>, _exception, _context) do
    :ok
  end
end

defmodule Optimized do
  def validate_utf8!(binary, exception, context) when is_binary(binary) do
    case :unicode.characters_to_binary(binary) do
      ^binary ->
        :ok

      {_, _, <<byte, _::binary>>} ->
        raise exception, "invalid UTF-8 on #{context}, got byte #{byte}"
    end
  end
end

small_valid = "hello"
medium_valid = "hello world, this is a beautiful day! Elixir is awesome, and Plug is great."
large_valid = String.duplicate("Lorem ipsum dolor sit amet, consectetur adipiscing elit. ", 200)
invalid_string = "hello \xff world"

exception = RuntimeError
context = "test"

Benchee.run(
  %{
    "original" => fn input ->
      try do
        Original.validate_utf8!(input, exception, context)
      rescue
        _ -> :ok
      end
    end,
    "optimized" => fn input ->
      try do
        Optimized.validate_utf8!(input, exception, context)
      rescue
        _ -> :ok
      end
    end
  },
  inputs: %{
    "Small Valid" => small_valid,
    "Medium Valid" => medium_valid,
    "Large Valid" => large_valid,
    "Invalid" => invalid_string
  },
  time: 2,
  memory_time: 2
)
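For context on the `Optimized` module above: Erlang's `:unicode.characters_to_binary/1` returns the converted binary on success, `{:error, converted, rest}` when it hits an invalid byte, and `{:incomplete, converted, rest}` when the input ends in the middle of a multi-byte sequence. Because both failure tuples carry the unconverted remainder as their third element, the single `{_, _, <<byte, _::binary>>}` clause covers both. The standalone script below is an illustrative sketch (not part of the PR) that exercises each shape; the `truncated` input is an assumption chosen to trigger the `:incomplete` case.

```elixir
valid = "hello"
invalid = "hello \xff world"
# "hi" followed by the first two bytes of a three-byte sequence ("こ" is E3 81 93).
truncated = <<"hi", 0xE3, 0x81>>

# Valid UTF-8: the result equals the input, so `^binary` matches in Optimized.
"hello" = :unicode.characters_to_binary(valid)

# Invalid byte: converted prefix, then the rest starting at the offending byte.
{:error, "hello ", <<0xFF, _::binary>>} = :unicode.characters_to_binary(invalid)

# Truncated sequence at the end: reported as :incomplete rather than :error.
{:incomplete, "hi", <<0xE3, 0x81>>} = :unicode.characters_to_binary(truncated)

# Both failure tuples are matched by the same pattern used in Optimized.
for input <- [invalid, truncated] do
  {_, _, <<byte, _::binary>>} = :unicode.characters_to_binary(input)
  IO.puts("invalid UTF-8, got byte #{byte}")
end
```

One consequence of this design is that a truncated trailing sequence is reported with the same message format as any other invalid byte.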

Benchmark results (noisy system):

Operating System: Linux
CPU Information: AMD Ryzen 7 8845HS w
Number of Available Cores: 16
Available memory: 54.72 GB
Elixir 1.20.0
Erlang 29.0.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: Invalid, Large Valid, Medium Valid, Small Valid
Estimated total run time: 48 s
Excluding outliers: false

##### With input Invalid #####
Name                ips        average  deviation         median         99th %
original         4.95 M      202.09 ns  ±2269.53%         180 ns         331 ns
optimized        2.10 M      475.80 ns  ±1401.32%         381 ns        1042 ns

Comparison:
original         4.95 M
optimized        2.10 M - 2.35x slower +273.71 ns

Memory usage statistics:

Name         Memory usage
original            336 B
optimized           456 B - 1.36x memory usage +120 B

**All measurements for memory usage were the same**

##### With input Large Valid #####
Name                ips        average  deviation         median         99th %
optimized      326.15 K        3.07 μs   ±212.89%        2.77 μs        5.64 μs
original        42.42 K       23.57 μs    ±40.51%       20.03 μs       43.54 μs

Comparison:
optimized      326.15 K
original        42.42 K - 7.69x slower +20.51 μs

Memory usage statistics:

Name         Memory usage
optimized             0 B
original             40 B - ∞ x memory usage +40 B

**All measurements for memory usage were the same**

##### With input Medium Valid #####
Name                ips        average  deviation         median         99th %
optimized       10.89 M       91.85 ns  ±8689.99%          60 ns         130 ns
original         6.15 M      162.55 ns  ±1687.46%         150 ns         291 ns

Comparison:
optimized       10.89 M
original         6.15 M - 1.77x slower +70.71 ns

Memory usage statistics:

Name         Memory usage
optimized             0 B
original             40 B - ∞ x memory usage +40 B

**All measurements for memory usage were the same**

##### With input Small Valid #####
Name                ips        average  deviation         median         99th %
optimized       18.57 M       53.84 ns  ±9579.56%          40 ns          61 ns
original        17.31 M       57.76 ns  ±6189.30%          41 ns          91 ns

Comparison:
optimized       18.57 M
original        17.31 M - 1.07x slower +3.92 ns

Memory usage statistics:

Name         Memory usage
optimized             0 B
original             40 B - ∞ x memory usage +40 B

josevalim (Member) commented:

There are a few weird things. The fact that it reports no allocation makes me think most of the work is happening in C, so we probably can't measure memory. I also worry that comparing against the output binary is going to be expensive.

My suggestion would be to optimize the current code with the same techniques used here: elixir-lang/elixir#15255

Also remember to measure inputs in other languages, such as Japanese or Korean, which can be Unicode-heavy.
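A quick way to see why such inputs stress a byte-oriented fast path: every kana and Hangul syllable in the short strings used later in this thread ("こんにちは", "안녕하세요") is three bytes in UTF-8, and every one of those bytes has its high bit set, so none of them can be part of an ASCII run. An illustrative check, not part of the PR:

```elixir
for text <- ["hello", "こんにちは", "안녕하세요"] do
  bytes = :binary.bin_to_list(text)
  # A byte >= 0x80 can never be part of an ASCII run.
  all_high_bit = Enum.all?(bytes, &(&1 >= 0x80))

  IO.puts(
    "#{text}: #{String.length(text)} characters, #{byte_size(text)} bytes, " <>
      "all bytes >= 0x80: #{all_high_bit}"
  )
end
```

Both non-Latin strings come out as 5 characters and 15 bytes, with every byte at or above 0x80.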

preciz (Contributor, Author) commented Jun 15, 2026

Nice suggestion.
For small valid inputs the original implementation seems to be faster, so there is a tradeoff, but for medium and large inputs the new version is much faster.

Mix.install([
  {:benchee, "~> 1.0"}
])

defmodule Original do
  def validate_utf8!(binary, exception, context) when is_binary(binary) do
    do_validate_utf8!(binary, exception, context)
  end

  defp do_validate_utf8!(<<_::utf8, rest::bits>>, exception, context) do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<byte, _::bits>>, exception, context) do
    raise exception, "invalid UTF-8 on #{context}, got byte #{byte}"
  end

  defp do_validate_utf8!(<<>>, _exception, _context) do
    :ok
  end
end

defmodule Current do
  # 56-bit SWAR guard: all 7 bytes are ASCII (< 128)
  defguardp ascii_swar?(w)
            when Bitwise.band(w, 0x80808080808080) == 0

  def validate_utf8!(binary, exception, context) when is_binary(binary) do
    do_validate_utf8!(binary, exception, context)
  end

  defp do_validate_utf8!(<<w::56, b, rest::binary>>, exception, context)
       when b <= 127 and ascii_swar?(w) do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<b, rest::binary>>, exception, context) when b <= 127 do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<_::utf8, rest::binary>>, exception, context) do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<byte, _::binary>>, exception, context) do
    raise exception, "invalid UTF-8 on #{context}, got byte #{byte}"
  end

  defp do_validate_utf8!(<<>>, _exception, _context) do
    :ok
  end
end

small_valid = "hello"
medium_valid = "hello world, this is a beautiful day! Elixir is awesome, and Plug is great."
large_valid = String.duplicate("Lorem ipsum dolor sit amet, consectetur adipiscing elit. ", 200)
invalid_string = "hello \xff world"

exception = RuntimeError
context = "test"

Benchee.run(
  %{
    "original (main branch)" => fn input ->
      try do
        Original.validate_utf8!(input, exception, context)
      rescue
        _ -> :ok
      end
    end,
    "current (optimization14)" => fn input ->
      try do
        Current.validate_utf8!(input, exception, context)
      rescue
        _ -> :ok
      end
    end
  },
  inputs: %{
    "Small Valid" => small_valid,
    "Medium Valid" => medium_valid,
    "Large Valid" => large_valid,
    "Invalid" => invalid_string
  },
  time: 2,
  memory_time: 2
)
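The `ascii_swar?/1` guard above reads 7 bytes as one 56-bit unsigned integer and ANDs it with `0x80808080808080`, which has only the high bit of each byte set; the result is zero exactly when all 7 bytes are below 128. Together with the separate `b <= 127` check, each pass through the fast clause skips 8 ASCII bytes at once. The thread does not say why 56 bits are used rather than 64; a plausible reason is that a 56-bit value always fits in a BEAM small (immediate) integer on 64-bit systems, whereas a 64-bit value may not. A standalone sketch of the mask check (the `ascii_chunk` name is illustrative):

```elixir
import Bitwise

# High bit of each of the 7 bytes in a 56-bit word.
mask = 0x80808080808080

# Illustrative helper: true when all 7 bytes of the chunk are ASCII (< 128).
ascii_chunk = fn <<w::56>> -> band(w, mask) == 0 end

# "hello w" is 7 ASCII bytes.
true = ascii_chunk.("hello w")

# "héllo!" is also 7 bytes, but "é" is encoded as 0xC3 0xA9 (high bits set).
false = ascii_chunk.("héllo!")
```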

Results:

Operating System: Linux
CPU Information: AMD Ryzen 7 8845HS w
Number of Available Cores: 16
Available memory: 54.72 GB
Elixir 1.20.0
Erlang 29.0.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: Invalid, Large Valid, Medium Valid, Small Valid
Estimated total run time: 48 s
Excluding outliers: false

Benchmarking current (optimization14) with input Invalid ...
Benchmarking current (optimization14) with input Large Valid ...
Benchmarking current (optimization14) with input Medium Valid ...
Benchmarking current (optimization14) with input Small Valid ...
Benchmarking original (main branch) with input Invalid ...
Benchmarking original (main branch) with input Large Valid ...
Benchmarking original (main branch) with input Medium Valid ...
Benchmarking original (main branch) with input Small Valid ...
Calculating statistics...
Formatting results...

##### With input Invalid #####
Name                               ips        average  deviation         median         99th %
original (main branch)          4.68 M      213.77 ns  ±2102.38%         191 ns         351 ns
current (optimization14)        2.21 M      452.24 ns  ±1748.98%         321 ns        2405 ns

Comparison:
original (main branch)          4.68 M
current (optimization14)        2.21 M - 2.12x slower +238.47 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)             336 B
current (optimization14)           336 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Large Valid #####
Name                               ips        average  deviation         median         99th %
current (optimization14)      269.24 K        3.71 μs    ±24.51%        3.62 μs        6.62 μs
original (main branch)         49.96 K       20.02 μs    ±25.64%       18.43 μs       37.86 μs

Comparison:
current (optimization14)      269.24 K
original (main branch)         49.96 K - 5.39x slower +16.30 μs

Memory usage statistics:

Name                        Memory usage
current (optimization14)            40 B
original (main branch)              40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Medium Valid #####
Name                               ips        average  deviation         median         99th %
current (optimization14)       13.32 M       75.06 ns  ±3616.55%          70 ns         100 ns
original (main branch)          5.65 M      177.04 ns  ±1392.26%         161 ns         320 ns

Comparison:
current (optimization14)       13.32 M
original (main branch)          5.65 M - 2.36x slower +101.98 ns

Memory usage statistics:

Name                        Memory usage
current (optimization14)            40 B
original (main branch)              40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Small Valid #####
Name                               ips        average  deviation         median         99th %
original (main branch)         17.61 M       56.80 ns  ±4628.88%          50 ns          90 ns
current (optimization14)       15.39 M       64.98 ns  ±4332.95%          60 ns          80 ns

Comparison:
original (main branch)         17.61 M
current (optimization14)       15.39 M - 1.14x slower +8.18 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

preciz changed the title from "Optimize and simplify Plug.Conn.Utils.validate_utf8!/3" to "Optimize Plug.Conn.Utils.validate_utf8!/3" on Jun 15, 2026
preciz (Contributor, Author) commented Jun 15, 2026

@josevalim Would a hybrid approach fit this codebase, where binaries under 12 bytes are routed to the original implementation and larger binaries use the SWAR (SIMD within a register) loop? Or is that not the direction you would go?

    if byte_size(binary) < 12 do
      do_validate_utf8_small!(binary, exception, context)
    else
      do_validate_utf8_swar!(binary, exception, context)
    end

josevalim (Member) commented:

Yes, we can do a tiny branch; that's cheap. Also, please test large, medium, and small inputs in Japanese and Korean. Thank you!

preciz (Contributor, Author) commented Jun 15, 2026

Benchmark:

Mix.install([
  {:benchee, "~> 1.0"}
])

defmodule Original do
  def validate_utf8!(binary, exception, context) when is_binary(binary) do
    do_validate_utf8!(binary, exception, context)
  end

  defp do_validate_utf8!(<<_::utf8, rest::bits>>, exception, context) do
    do_validate_utf8!(rest, exception, context)
  end

  defp do_validate_utf8!(<<byte, _::bits>>, exception, context) do
    raise exception, "invalid UTF-8 on #{context}, got byte #{byte}"
  end

  defp do_validate_utf8!(<<>>, _exception, _context) do
    :ok
  end
end

defmodule Current do
  # 56-bit SWAR guard: all 7 bytes are ASCII (< 128)
  defguardp ascii_swar?(w)
            when Bitwise.band(w, 0x80808080808080) == 0

  def validate_utf8!(binary, exception, context)

  def validate_utf8!(<<binary::binary>>, exception, context) do
    if byte_size(binary) < 12 do
      do_validate_utf8_small!(binary, exception, context)
    else
      do_validate_utf8_swar!(binary, exception, context)
    end
  end

  # SWAR loop
  defp do_validate_utf8_swar!(<<w::56, b, rest::bits>>, exception, context)
       when b <= 127 and ascii_swar?(w) do
    do_validate_utf8_swar!(rest, exception, context)
  end

  defp do_validate_utf8_swar!(rest, exception, context) do
    do_validate_utf8_small!(rest, exception, context)
  end

  # Small loop (identical to original character loop)
  defp do_validate_utf8_small!(<<_::utf8, rest::bits>>, exception, context) do
    do_validate_utf8_small!(rest, exception, context)
  end

  defp do_validate_utf8_small!(<<byte, _::bits>>, exception, context) do
    raise exception, "invalid UTF-8 on #{context}, got byte #{byte}"
  end

  defp do_validate_utf8_small!(<<>>, _exception, _context) do
    :ok
  end
end

small_valid = "hello"
medium_valid = "hello world, this is a beautiful day! Elixir is awesome, and Plug is great."
large_valid = String.duplicate("Lorem ipsum dolor sit amet, consectetur adipiscing elit. ", 200)
invalid_string = "hello \xff world"

jp_small = "こんにちは"
jp_medium = "日本語と韓国語のテキストをテストしています。これは中くらいの長さ of テキストです。"
jp_large = String.duplicate("日本語の長いテキストのテスト。すべての文字が正しく検証される必要があります。", 100)

kr_small = "안녕하세요"
kr_medium = "한국어와 일본어 텍스트를 테스트하고 있습니다. 이것은 중간 크기입니다."
kr_large = String.duplicate("한국어 긴 텍스트 테스트. 모든 글자가 올바르게 검증되어야 합니다.", 100)

exception = RuntimeError
context = "test"

Benchee.run(
  %{
    "original (main branch)" => fn input ->
      try do
        Original.validate_utf8!(input, exception, context)
      rescue
        _ -> :ok
      end
    end,
    "current (optimization14)" => fn input ->
      try do
        Current.validate_utf8!(input, exception, context)
      rescue
        _ -> :ok
      end
    end
  },
  inputs: %{
    "Small Valid" => small_valid,
    "Medium Valid" => medium_valid,
    "Large Valid" => large_valid,
    "Invalid" => invalid_string,
    "Japanese Small" => jp_small,
    "Japanese Medium" => jp_medium,
    "Japanese Large" => jp_large,
    "Korean Small" => kr_small,
    "Korean Medium" => kr_medium,
    "Korean Large" => kr_large
  },
  time: 2,
  memory_time: 2
)

Results on a noisy system:

Operating System: Linux
CPU Information: AMD Ryzen 7 8845HS w
Number of Available Cores: 16
Available memory: 54.72 GB
Elixir 1.20.0
Erlang 29.0.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 2 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: Invalid, Japanese Large, Japanese Medium, Japanese Small, Korean Large, Korean Medium, Korean Small, Large Valid, Medium Valid, Small Valid
Estimated total run time: 2 min
Excluding outliers: false

##### With input Invalid #####
Name                               ips        average  deviation         median         99th %
current (optimization14)        5.40 M      185.29 ns  ±2477.54%         160 ns         311 ns
original (main branch)          5.32 M      188.13 ns  ±2422.84%         170 ns         301 ns

Comparison:
current (optimization14)        5.40 M
original (main branch)          5.32 M - 1.02x slower +2.84 ns

Memory usage statistics:

Name                        Memory usage
current (optimization14)           336 B
original (main branch)             336 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Japanese Large #####
Name                               ips        average  deviation         median         99th %
original (main branch)         46.31 K       21.59 μs    ±26.55%       21.34 μs       25.05 μs
current (optimization14)       45.91 K       21.78 μs    ±19.96%       21.90 μs       26.17 μs

Comparison:
original (main branch)         46.31 K
current (optimization14)       45.91 K - 1.01x slower +0.188 μs

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Japanese Medium #####
Name                               ips        average  deviation         median         99th %
original (main branch)          3.65 M      273.63 ns  ±1198.27%         261 ns         371 ns
current (optimization14)        3.56 M      281.00 ns  ±1282.27%         270 ns         400 ns

Comparison:
original (main branch)          3.65 M
current (optimization14)        3.56 M - 1.03x slower +7.36 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Japanese Small #####
Name                               ips        average  deviation         median         99th %
original (main branch)         14.61 M       68.43 ns  ±3783.09%          60 ns          90 ns
current (optimization14)       13.79 M       72.50 ns  ±3613.47%          70 ns         100 ns

Comparison:
original (main branch)         14.61 M
current (optimization14)       13.79 M - 1.06x slower +4.07 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Korean Large #####
Name                               ips        average  deviation         median         99th %
original (main branch)         60.84 K       16.44 μs    ±12.28%       16.25 μs       20.00 μs
current (optimization14)       60.73 K       16.47 μs    ±20.90%       16.10 μs       19.85 μs

Comparison:
original (main branch)         60.84 K
current (optimization14)       60.73 K - 1.00x slower +0.0297 μs

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Korean Medium #####
Name                               ips        average  deviation         median         99th %
original (main branch)          4.35 M      229.97 ns  ±1229.31%         220 ns         330 ns
current (optimization14)        4.22 M      236.85 ns  ±1245.50%         221 ns         331 ns

Comparison:
original (main branch)          4.35 M
current (optimization14)        4.22 M - 1.03x slower +6.88 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Korean Small #####
Name                               ips        average  deviation         median         99th %
original (main branch)         14.48 M       69.04 ns  ±3687.47%          60 ns          90 ns
current (optimization14)       13.66 M       73.22 ns  ±3567.65%          70 ns         100 ns

Comparison:
original (main branch)         14.48 M
current (optimization14)       13.66 M - 1.06x slower +4.18 ns

Memory usage statistics:

Name                        Memory usage
original (main branch)              40 B
current (optimization14)            40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Large Valid #####
Name                               ips        average  deviation         median         99th %
current (optimization14)      376.97 K        2.65 μs    ±27.37%        2.58 μs        4.52 μs
original (main branch)         54.50 K       18.35 μs    ±14.69%       18.11 μs       22.23 μs

Comparison:
current (optimization14)      376.97 K
original (main branch)         54.50 K - 6.92x slower +15.70 μs

Memory usage statistics:

Name                        Memory usage
current (optimization14)            40 B
original (main branch)              40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Medium Valid #####
Name                               ips        average  deviation         median         99th %
current (optimization14)       16.35 M       61.15 ns  ±5096.65%          50 ns          90 ns
original (main branch)          6.91 M      144.69 ns  ±1886.01%         131 ns         251 ns

Comparison:
current (optimization14)       16.35 M
original (main branch)          6.91 M - 2.37x slower +83.55 ns

Memory usage statistics:

Name                        Memory usage
current (optimization14)            40 B
original (main branch)              40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**

##### With input Small Valid #####
Name                               ips        average  deviation         median         99th %
current (optimization14)       22.40 M       44.64 ns  ±5934.40%          40 ns          60 ns
original (main branch)         21.91 M       45.63 ns  ±5393.68%          40 ns          70 ns

Comparison:
current (optimization14)       22.40 M
original (main branch)         21.91 M - 1.02x slower +1.00 ns

Memory usage statistics:

Name                        Memory usage
current (optimization14)            40 B
original (main branch)              40 B - 1.00x memory usage +0 B

**All measurements for memory usage were the same**
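Reading the merged `Current` module, its catch-all `do_validate_utf8_swar!/3` clause hands the remainder to `do_validate_utf8_small!/3`, which never returns to the SWAR loop. Also, the `< 12` byte threshold sends "hello" (5 bytes) straight to the small loop but "こんにちは" (15 bytes) to the SWAR entry point. The hypothetical helper below (`SwarPrefix.swar_bytes/1`, not part of Plug) mirrors the SWAR clause's pattern and guard and reports how many leading bytes it would skip before falling through. It is a sketch for reasoning about the numbers above; the thread itself does not analyze them.

```elixir
defmodule SwarPrefix do
  import Bitwise

  # Hypothetical helper (not part of Plug): returns how many leading bytes the
  # SWAR clause of `do_validate_utf8_swar!/3` would skip before falling through
  # to the character-by-character loop, which then handles the rest.
  def swar_bytes(binary) when is_binary(binary), do: count(binary, 0)

  # Same pattern and guard as the SWAR clause: 7 ASCII bytes plus one more.
  defp count(<<w::56, b, rest::binary>>, acc)
       when b <= 127 and band(w, 0x80808080808080) == 0 do
    count(rest, acc + 8)
  end

  # Anything else (a non-ASCII byte among the next 8, or fewer than 8 bytes
  # left) ends the fast path for good.
  defp count(_rest, acc), do: acc
end

large_valid = String.duplicate("Lorem ipsum dolor sit amet, consectetur adipiscing elit. ", 200)
# Hypothetical mixed input: 7 ASCII bytes, two kanji, then 64 ASCII bytes.
mixed = "Hello, 世界. " <> String.duplicate("a", 64)

IO.inspect(SwarPrefix.swar_bytes(large_valid), label: "Large Valid")
IO.inspect(SwarPrefix.swar_bytes("こんにちは"), label: "Japanese Small")
IO.inspect(SwarPrefix.swar_bytes(mixed), label: "mixed")
```

For the all-ASCII Large Valid input (57 bytes × 200 = 11,400 bytes) the helper returns 11400, so the whole binary goes through the fast clause. For "こんにちは" it returns 0, and the same holds for the Japanese and Korean medium and large inputs, which all start with a multi-byte character. That is consistent with their timings above being within noise of the original. The `mixed` string also returns 0 because the lead byte of "世" falls within the first 8 bytes, so its 64 trailing ASCII bytes are validated one character at a time.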

josevalim merged commit 9153fb7 into elixir-plug:main on Jun 15, 2026 (2 checks passed)
josevalim (Member) commented:

💚 💙 💜 💛 ❤️
