[core] Support btree global index with embedded file metadata#7563
lilei1128 wants to merge 2 commits into apache:master
Conversation
Hi, thanks for this PR!
I'm just wondering whether it's necessary to reuse the current BTree codebase. For example:
- In the btree index build topology, the current implementation decides the partition number from the records-per-range setting and splits ranges by partition, which may not be suitable for your case.
- Also, it seems that the BTREE_WITH_FILE_META option will create a totally different index type compared to BTree.
The "with-file-meta" option is NOT a completely different index type: it is the existing BTree index with ManifestEntry data embedded in the index files.
For the first question, you're right that the parallelism logic is designed for the key index.
To skip manifest reads, we need two capabilities:
If we don't reuse BTree:
For capability 2, alternatives like manifest caching still require manifest reads. That's why reusing BTree makes sense:
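To make the "extension, not a new index type" argument concrete, here is a minimal sketch of what an index entry with an embedded payload could look like. The class and field names (IndexEntry, fileMetaBytes) are illustrative assumptions, not Paimon's actual classes; the point is only that the BTree key/value shape stays the same and the value gains an optional serialized ManifestEntry.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a BTree index entry whose value is extended with an
// embedded file-metadata payload (names are illustrative, not Paimon APIs).
public class IndexEntry implements Serializable {
    final byte[] key;            // indexed column value (the BTree sort key)
    final String fileName;       // data file the key occurs in
    final byte[] fileMetaBytes;  // pre-serialized ManifestEntry; null when
                                 // with-file-meta is disabled

    IndexEntry(byte[] key, String fileName, byte[] fileMetaBytes) {
        this.key = key;
        this.fileName = fileName;
        this.fileMetaBytes = fileMetaBytes;
    }

    // With the payload embedded, a range scan over the index can hand back
    // ManifestEntry bytes directly, skipping the manifest read.
    boolean canSkipManifest() {
        return fileMetaBytes != null;
    }

    public static void main(String[] args) {
        byte[] k = "k1".getBytes(StandardCharsets.UTF_8);
        IndexEntry plain = new IndexEntry(k, "f1.parquet", null);
        IndexEntry withMeta = new IndexEntry(k, "f1.parquet", new byte[] {1, 2, 3});
        System.out.println(plain.canSkipManifest());    // false
        System.out.println(withMeta.canSkipManifest()); // true
    }
}
```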
int partitionNum = Math.max((int) (range.count() / recordsPerRange), 1);
partitionNum = Math.min(partitionNum, maxParallelism);

// Pre-serialize ManifestEntries for file-meta index (if withFileMeta is enabled)
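The partition-count logic quoted above can be demonstrated in isolation: partitions grow with the range's record count but are clamped to [1, maxParallelism]. The concrete recordsPerRange and maxParallelism values below are illustrative, not project defaults.

```java
// Standalone demo of the quoted partition-count computation.
public class PartitionNumDemo {
    static int partitionNum(long rangeCount, long recordsPerRange, int maxParallelism) {
        // at least 1 partition, at most maxParallelism
        int partitionNum = Math.max((int) (rangeCount / recordsPerRange), 1);
        return Math.min(partitionNum, maxParallelism);
    }

    public static void main(String[] args) {
        // 10M records at 1M records per range -> 10 partitions
        System.out.println(partitionNum(10_000_000L, 1_000_000L, 128)); // 10
        // a tiny range still gets 1 partition
        System.out.println(partitionNum(500L, 1_000_000L, 128));       // 1
        // a huge range is capped at maxParallelism
        System.out.println(partitionNum(1_000_000_000L, 1_000_000L, 128)); // 128
    }
}
```

This also shows why the reviewer's concern applies: the partition count is derived purely from record counts per key range, which fits a key index but not necessarily a file-metadata payload.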
It seems that the key (i.e. the filename) is just used for deduplication? Can I imagine this index actually as a Range to Collection<ManifestEntry> index?
Yes, currently fileName is mainly for deduplication; the runtime path does not do fileName point lookups yet.
This is a follow-up optimization point.
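The reviewer's mental model can be sketched as a map from a key range to a fileName-deduplicated collection of ManifestEntry payloads. Everything here (RangeIndexView, the string-keyed ranges, byte[] payloads) is an illustrative assumption to show the dedup-by-fileName behavior, not Paimon's actual data structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch: the index viewed as Range -> Collection<ManifestEntry>,
// where fileName is used only to deduplicate entries that several
// keys in the same range point at.
public class RangeIndexView {
    // range label -> (fileName -> serialized ManifestEntry)
    static final Map<String, Map<String, byte[]>> index = new TreeMap<>();

    static void add(String range, String fileName, byte[] manifestEntry) {
        index.computeIfAbsent(range, r -> new LinkedHashMap<>())
             // putIfAbsent implements the dedup-by-fileName the discussion notes
             .putIfAbsent(fileName, manifestEntry);
    }

    public static void main(String[] args) {
        add("[0,100)", "f1.parquet", new byte[] {1});
        add("[0,100)", "f1.parquet", new byte[] {1}); // duplicate file, ignored
        add("[0,100)", "f2.parquet", new byte[] {2});
        System.out.println(index.get("[0,100)").size()); // 2 distinct files
    }
}
```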
Hi @lilei1128, thanks for the contribution! Do you have some benchmark on this PR? I am curious about the performance comparison between the file-meta-based and rowid-based approaches in a big-data scenario.
Hi, this is my test result on Mac: range queries perform better than point queries, and the effect would be better if the data were on OSS/S3. Appendix:
val spark = SparkSession.builder().appName("GlobalIndexPerfTest").getOrCreate()
println("=" * 60)
// ==================== Configuration ====================
// ==================== 1. Create test tables ====================
// Drop old tables
// Table without index
// Plain btree index table
// with-file-meta index table
println(s"Created 3 test tables")
// ==================== 2. Write data ====================
val writeStartTime = System.currentTimeMillis()
// ...
val writeEndTime = System.currentTimeMillis()
// ==================== 4. Create indexes ====================
println("Creating btree index on perf_test_btree...")
println("Creating btree index with file-meta on perf_test_with_meta...")
println("Indexes created successfully")
// ==================== 5. Data statistics ====================
val countNoIndex = spark.sql("SELECT COUNT(*) FROM perf_test_no_index").collect()(0)(0)
println(s"Total records: $countNoIndex")
// ==================== 6. Performance test ====================
// Generate random query values
case class QueryResult(...)
def runQuery(table: String, queryId: Int): (Long, Long) = { ... }
def runBenchmark(tableName: String, scenario: String): Seq[QueryResult] = { ... }
// Run the tests
// ==================== 7. Result analysis ====================
case class Statistics(...)
def calcStats(results: Seq[QueryResult]): Statistics = { ... }
val statsNoIndex = calcStats(resultsNoIndex)
def printStats(stats: Statistics, baseline: Statistics): Unit = {
  // | ${stats.scenario}$improvement ...
}
printStats(statsNoIndex, null)
// Performance improvement comparison
val improvementBtree = (statsNoIndex.avgMs - statsBtree.avgMs) / statsNoIndex.avgMs * 100
println(f"""...""")
// ==================== 8. Save results ====================
val reportFile = new java.io.PrintWriter(new java.io.File("/tmp/global_index_perf_report.txt"))
println(s"Report saved to: /tmp/global_index_perf_report.txt")
// ==================== Done ====================
val spark = SparkSession.builder().appName("GlobalIndexQueryTest").getOrCreate()
println("=" * 60)
// ==================== Configuration ====================
// ==================== 1. Confirm table info ====================
spark.sql("USE paimon")
val countNoIndex = spark.sql("SELECT COUNT(*) FROM perf_test_no_index").collect()(0)(0)
// ==================== 2. Warm-up queries ====================
for (i <- 1 to NUM_WARMUP) { ... }
// ==================== 3. Performance test ====================
val maxId = countNoIndex.asInstanceOf[Long]
case class QueryResult(...)
def runQuery(table: String, queryId: Int): (Long, Long) = { ... }
def runBenchmark(tableName: String, scenario: String): Seq[QueryResult] = { ... }
// Run the tests
// ==================== 4. Result analysis ====================
case class Statistics(...)
def calcStats(results: Seq[QueryResult]): Statistics = { ... }
val statsNoIndex = calcStats(resultsNoIndex)
def printStats(stats: Statistics, baseline: Statistics): Unit = {
  // | ${stats.scenario}$improvement ...
}
printStats(statsNoIndex, null)
// Performance improvement comparison
val improvementBtree = (statsNoIndex.avgMs - statsBtree.avgMs) / statsNoIndex.avgMs * 100
println(f"""...""")
// ==================== 5. Range query test ====================
def runRangeQuery(table: String, low: Int, high: Int): (Long, Long) = { ... }
val rangeTests = Seq(...)
println("\nRange Query Results:")
rangeTests.foreach { case (low, high) => ... }
// ==================== Done ====================

Purpose
Add a with-file-meta option for the btree global index to embed ManifestEntry
data directly in index files, enabling manifest-skip query planning.
Key changes:
Embedded ManifestEntry data lets query planning skip manifest reads, with staleness detection via fileIO.exists()
When enabled, query planning reads only:
See: https://cwiki.apache.org/confluence/display/PAIMON/PIP-41%3A+Introduce+FilePath+Global+Index+And+Optimizations+For+Lookup+In+Append+Table
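The staleness detection mentioned in the key changes can be sketched as follows: an embedded ManifestEntry is trusted only if the data file it points at still exists, otherwise the planner falls back to a normal manifest read. This sketch uses java.nio's Files.exists as a stand-in for Paimon's fileIO.exists(); the method and class names are illustrative assumptions.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hedged sketch of the fileIO.exists()-based staleness check:
// a deleted or compacted-away data file means the embedded index
// entry is stale and the manifest must be consulted instead.
public class StalenessCheck {
    static boolean embeddedMetaUsable(Path dataFile) {
        return Files.exists(dataFile);
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("paimon-demo", ".parquet");
        System.out.println(embeddedMetaUsable(f)); // true: file present, embedded meta usable
        Files.delete(f);
        System.out.println(embeddedMetaUsable(f)); // false: stale, fall back to manifest read
    }
}
```

The trade-off this implies: planning saves a manifest read in the common case at the cost of one existence check per candidate file.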
Tests
CI