perf(files): keep systemtag and filecache joins index-friendly #60932

Open: solracsf wants to merge 2 commits into master from fix/filecache-systemtag-join-objectid-index

@solracsf solracsf commented Jun 2, 2026

Summary

The system tag search joins compared filecache.fileid (bigint) against systemtag_object_mapping.objectid (varchar). On MySQL/MariaDB this implicitly coerces the objectid string column to a number, which makes any index on objectid unusable and turns the join into a full/BNL scan of the mapping table.

Comparing as strings instead (objectid = CAST(fileid AS CHAR)) keeps the index usable. On a 100k-row mapping table the systemtagmap access dropped from ~50k estimated rows per outer row (100k actually read) to 1, with identical results (objecttype is filtered to 'files', where objectid is always the decimal fileid). The change is applied to both the search query and the tag-usage count query.
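
To illustrate why the numeric comparison cannot use the index while the string comparison can, here is a minimal Python sketch. It is not Nextcloud or MySQL code: the sorted list stands in for a string-ordered index on objectid, the sample values are made up, and `float()` only approximates the floating-point comparison MySQL/MariaDB applies when a string column meets an integer.

```python
import bisect

# Hypothetical objectid values as stored in a varchar column. An index keeps
# them in string (lexicographic) order, not numeric order.
index = sorted(["7", "42", "042", "4.2e1", "100", "9"])


def numeric_matches(index, fileid):
    """Numeric comparison: every entry has to be converted and checked,
    because strings that are numerically equal are not adjacent in the index."""
    return [s for s in index if float(s) == fileid]


def string_lookup(index, fileid):
    """String comparison: a single binary search for the decimal form of fileid,
    analogous to objectid = CAST(fileid AS CHAR)."""
    key = str(fileid)
    pos = bisect.bisect_left(index, key)
    return pos < len(index) and index[pos] == key


if __name__ == "__main__":
    print(index)
    print(numeric_matches(index, 42))
    print(string_lookup(index, 42))
```

Three different strings equal 42 numerically and they are separated by "100" in string order, so no single index range covers them. Since objectid for objecttype 'files' is always the plain decimal fileid, the string lookup returns the same rows without that scan.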

Also, strip a lone to_char() string cast when parsing partitioned join conditions (the previous branch duplicated the int-cast check and was dead code) so sharded setups extract the join columns correctly on Oracle.
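
As a rough illustration of the idea (not the actual PHP implementation in this PR), stripping a lone to_char() wrapper before extracting join columns could look like the following Python sketch. The function names, the regular expression, and the sample condition string are all hypothetical.

```python
import re

# Hypothetical pattern: a single to_char(...) wrapper around one column
# reference, with no nested parentheses inside.
_TO_CHAR = re.compile(r"^\s*to_char\(\s*([^()]+?)\s*\)\s*$", re.IGNORECASE)


def strip_string_cast(expr: str) -> str:
    """Return the wrapped column if expr is a lone to_char() cast, else expr."""
    match = _TO_CHAR.match(expr)
    return match.group(1) if match else expr.strip()


def join_columns(condition: str) -> tuple[str, str]:
    """Split a simple 'a = b' join condition and unwrap string casts on each side."""
    left, right = condition.split("=", 1)
    return strip_string_cast(left), strip_string_cast(right)


if __name__ == "__main__":
    print(join_columns("systemtagmap.objectid = to_char(filecache.fileid)"))
```

Expressions that contain more than a single cast are left untouched in this sketch, which mirrors the "lone cast" wording above.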


Metrics

systemtag_object_mapping access (the row that changes):

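|        | type | key     | key_len | ref        | est. rows | actual rows read / probe |
|--------|------|---------|---------|------------|-----------|--------------------------|
| before | ref  | PRIMARY | 258     | const      | 50,432    | 100,000.0                |
| after  | ref  | PRIMARY | 516     | const,func | 1         | 0.1                      |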

Before, only the objecttype prefix of the primary key is usable, so each of the 1,111 matched files scans
the entire mapping table. After, the (objecttype, objectid) prefix gives a direct lookup.
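
The key_len and row figures can be cross-checked with a little arithmetic. The sketch below assumes objecttype and objectid are both NOT NULL VARCHAR(64) columns in a utf8mb4 table (4 bytes per character plus a 2-byte length prefix per key part); those column definitions are an assumption and are not stated in this PR. The probe count and per-probe row counts are taken from the ANALYZE output.

```python
# Assumptions (not stated in the PR): both key parts are VARCHAR(64) NOT NULL, utf8mb4.
VARCHAR_LEN = 64
BYTES_PER_CHAR = 4   # utf8mb4 worst case
LENGTH_PREFIX = 2    # bytes counted per VARCHAR key part


def key_len(key_parts: int) -> int:
    """Bytes of the primary key prefix used when `key_parts` leading columns match."""
    return key_parts * (VARCHAR_LEN * BYTES_PER_CHAR + LENGTH_PREFIX)


PROBES = 1111  # matched filecache rows, from the ANALYZE output

rows_read_before = PROBES * 100_000   # r_rows = 100000.00 per probe
rows_read_after = round(PROBES * 0.10)  # r_rows = 0.10 per probe

if __name__ == "__main__":
    print(key_len(1), key_len(2))
    print(rows_read_before, rows_read_after)
```

Under these assumptions, a key_len of 258 corresponds to one key part (objecttype only) and 516 to two (objecttype and objectid), which matches the before/after rows in the table.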

End-to-end (1,111 files, warm cache, best of 3 runs, identical 1,111 rows returned both ways):

| before     | after |
|------------|-------|
| ~18,000 ms | ~3 ms |

These numbers come from a synthetic dataset (100k tag mappings) on a warm buffer pool. Absolute timings will differ in production,
but the access-type / rows-examined improvement (from a full mapping-table scan per file to a single index lookup)
is the durable result.

############################################################
## EXPLAIN — file search joining system tags (1111 files)
############################################################

>>> BEFORE (fileid = objectid : bigint vs varchar):
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+-------+-------+------------------------------------+
| id   | select_type | table        | type  | possible_keys                                                                                                    | key          | key_len | ref   | rows  | Extra                              |
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+-------+-------+------------------------------------+
|    1 | SIMPLE      | filecache    | range | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_name_hash,fs_storage_path_prefix | fs_name_hash | 1003    | NULL  | 1111  | Using index condition; Using where |
|    1 | SIMPLE      | systemtagmap | ref   | PRIMARY,systag_by_objectid,systag_objecttype                                                                     | PRIMARY      | 258     | const | 50432 | Using where; Using index           |
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+-------+-------+------------------------------------+

>>> AFTER (objectid = CAST(fileid AS CHAR)):
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+------------+------+------------------------------------+
| id   | select_type | table        | type  | possible_keys                                                                                                    | key          | key_len | ref        | rows | Extra                              |
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+------------+------+------------------------------------+
|    1 | SIMPLE      | filecache    | range | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_name_hash,fs_storage_path_prefix | fs_name_hash | 1003    | NULL       | 1111 | Using index condition; Using where |
|    1 | SIMPLE      | systemtagmap | ref   | PRIMARY,systag_by_objectid,systag_objecttype                                                                     | PRIMARY      | 516     | const,func | 1    | Using where; Using index           |
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+------------+------+------------------------------------+

############################################################
## ANALYZE — ACTUAL rows read (r_rows) on the same search
############################################################

>>> BEFORE:
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+-------+-------+-----------+----------+------------+------------------------------------+
| id   | select_type | table        | type  | possible_keys                                                                                                    | key          | key_len | ref   | rows  | r_rows    | filtered | r_filtered | Extra                              |
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+-------+-------+-----------+----------+------------+------------------------------------+
|    1 | SIMPLE      | filecache    | range | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_name_hash,fs_storage_path_prefix | fs_name_hash | 1003    | NULL  | 1111  | 1111.00   |    50.00 |     100.00 | Using index condition; Using where |
|    1 | SIMPLE      | systemtagmap | ref   | PRIMARY,systag_by_objectid,systag_objecttype                                                                     | PRIMARY      | 258     | const | 50432 | 100000.00 |   100.00 |       0.00 | Using where; Using index           |
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+-------+-------+-----------+----------+------------+------------------------------------+

>>> AFTER:
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+------------+------+---------+----------+------------+------------------------------------+
| id   | select_type | table        | type  | possible_keys                                                                                                    | key          | key_len | ref        | rows | r_rows  | filtered | r_filtered | Extra                              |
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+------------+------+---------+----------+------------+------------------------------------+
|    1 | SIMPLE      | filecache    | range | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_name_hash,fs_storage_path_prefix | fs_name_hash | 1003    | NULL       | 1111 | 1111.00 |    50.00 |     100.00 | Using index condition; Using where |
|    1 | SIMPLE      | systemtagmap | ref   | PRIMARY,systag_by_objectid,systag_objecttype                                                                     | PRIMARY      | 516     | const,func | 1    | 0.10    |   100.00 |     100.00 | Using where; Using index           |
+------+-------------+--------------+-------+------------------------------------------------------------------------------------------------------------------+--------------+---------+------------+------+---------+----------+------------+------------------------------------+

Checklist

AI (if applicable)

  • The content of this PR was partly or fully generated using AI

Signed-off-by: Git'Fellow <12234510+solracsf@users.noreply.github.com>
@solracsf solracsf added this to the Nextcloud 35 milestone Jun 2, 2026
@solracsf solracsf added the "3. to review", "performance 🚀", and "feature: database" labels Jun 2, 2026
Signed-off-by: Git'Fellow <12234510+solracsf@users.noreply.github.com>
@solracsf solracsf marked this pull request as ready for review June 2, 2026 13:29
@solracsf solracsf requested a review from a team as a code owner June 2, 2026 13:29
@solracsf solracsf requested review from Altahrim, ArtificialOwl, come-nc and icewind1991 and removed request for a team June 2, 2026 13:29
@solracsf solracsf self-assigned this Jun 2, 2026