feat(extend): add Extend AI document processing integration #3869

Merged
waleedlatif1 merged 6 commits into staging from waleedlatif1/add-extend-integration
Mar 31, 2026

@waleedlatif1
Collaborator

Summary

  • Add Extend AI integration for document processing (parse/extract)
  • Supports PDF, images, and Office documents via file upload or URL
  • V1 (legacy) and V2 block variants following Reducto/Pulse patterns
  • Internal API route with DNS validation, secure fetch, file resolution
  • Configurable output format, chunking strategy, and engine selection

Type of Change

  • New feature

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel bot commented Mar 31, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Mar 31, 2026 11:17pm |

@cursor

cursor bot commented Mar 31, 2026

PR Summary

Medium Risk
Medium risk because it introduces a new authenticated API route that resolves user-provided file inputs/URLs and calls an external service, which can impact security (SSRF/file access) and reliability despite added DNS/IP pinning validations.

Overview
Adds a new Extend AI integration for document parsing, including a new ExtendIcon, docs page, and inclusion in docs/landing icon mappings and integrations metadata.

Introduces new extend and extend_v2 blocks plus extend_parser/extend_parser_v2 tools, wiring them into the block/tool registries and defining inputs for document upload/URL (v1) vs file/reference (v2) along with output format, chunking, engine, and API key.

Adds an internal Next.js API route POST /api/tools/extend/parse that authenticates requests, resolves the provided file input to a URL, validates the Extend endpoint via DNS, and proxies the request via pinned-IP fetch while normalizing the response into id/status/chunks/blocks/pageCount/creditsUsed.
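
The normalization step described above could look roughly like the following TypeScript sketch. The six field names come from the summary; the `NormalizedParseOutput` type, the `normalizeExtendResponse` helper, and the fallback values for missing fields are assumptions made for illustration, not the code in this PR.

```typescript
// Hypothetical shape of the normalized output; only the field names are from the PR summary.
interface NormalizedParseOutput {
  id: string
  status: string
  chunks: unknown[]
  blocks: unknown[]
  pageCount: number | null
  creditsUsed: number | null
}

// Assumption: the upstream payload is untyped JSON, so each field is checked
// defensively and replaced with a neutral default when absent or malformed.
function normalizeExtendResponse(raw: unknown): NormalizedParseOutput {
  const data = (typeof raw === 'object' && raw !== null ? raw : {}) as Record<string, unknown>
  return {
    id: typeof data.id === 'string' ? data.id : '',
    status: typeof data.status === 'string' ? data.status : 'unknown',
    chunks: Array.isArray(data.chunks) ? data.chunks : [],
    blocks: Array.isArray(data.blocks) ? data.blocks : [],
    pageCount: typeof data.pageCount === 'number' ? data.pageCount : null,
    creditsUsed: typeof data.creditsUsed === 'number' ? data.creditsUsed : null,
  }
}
```

Returning a fixed shape keeps downstream blocks from having to guard against missing keys, whatever the upstream response contains.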

Written by Cursor Bugbot for commit eb90d96.

@waleedlatif1 waleedlatif1 force-pushed the waleedlatif1/add-extend-integration branch from 2928c5e to 2634fdb on March 31, 2026 23:03

@greptile-apps
Contributor

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR adds a new Extend AI document processing integration, following the established Reducto/Pulse patterns in the codebase. It introduces a V1 (hidden, legacy) block supporting both file upload and URL input, and a V2 (user-facing) block that supports file upload in basic mode and chained block file references in advanced mode. The implementation includes a secure internal API route with DNS validation and pinned-IP fetch, Zod request validation, structured error surfacing, and proper tool/block registry entries.

Key changes:

  • apps/sim/app/api/tools/extend/parse/route.ts: New internal route with auth, DNS validation, and error surfacing — mirrors the Reducto route cleanly.
  • apps/sim/blocks/blocks/extend.ts: V1 (hidden) + V2 (visible) block configs; V2 derives its subBlocks from V1 by replacing the URL input with a file-reference input.
  • apps/sim/tools/extend/parser.ts: V1 and V2 tool configs with correct request body construction and transformResponse wiring.
  • apps/docs/content/docs/en/tools/extend.mdx: Documentation page — the Output section incorrectly states "This tool does not produce any outputs" while the tool returns id, status, chunks, blocks, pageCount, and creditsUsed.
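
The V1-to-V2 derivation mentioned for `extend.ts` can be pictured with a small TypeScript sketch. Everything here is hypothetical: the simplified `SubBlock` type, the sub-block ids and titles, and the `deriveV2SubBlocks` helper are stand-ins chosen for illustration, and the real block config carries more fields.

```typescript
// Hypothetical, simplified sub-block shape; not the project's real type.
interface SubBlock {
  id: string
  title: string
  type: string
  mode?: 'basic' | 'advanced'
}

// Assumed V1 layout: a file upload, a URL input, and shared options.
const v1SubBlocks: SubBlock[] = [
  { id: 'fileUpload', title: 'Document', type: 'file-upload' },
  { id: 'filePath', title: 'Document URL', type: 'short-input' },
  { id: 'outputFormat', title: 'Output Format', type: 'dropdown' },
  { id: 'apiKey', title: 'API Key', type: 'short-input' },
]

// Builds a new array in which the URL input is swapped for a replacement
// sub-block; every other entry is reused and the V1 array is not mutated.
function deriveV2SubBlocks(v1: SubBlock[], urlInputId: string, replacement: SubBlock): SubBlock[] {
  return v1.map((subBlock) => (subBlock.id === urlInputId ? replacement : subBlock))
}

const v2SubBlocks = deriveV2SubBlocks(v1SubBlocks, 'filePath', {
  id: 'fileReference',
  title: 'File Reference',
  type: 'short-input',
  mode: 'advanced',
})
```

Deriving V2 this way keeps shared options such as output format and API key defined once, so the two variants cannot drift apart.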

Confidence Score: 5/5

Safe to merge — all functional code is correct; the only finding is a documentation output section that needs updating.

All prior P1 concerns (error surfacing, icon consistency, V2 placeholder clarity) were addressed in 4830a4c. The remaining finding is a P2 documentation inaccuracy in the output section of extend.mdx, which does not affect runtime behavior.

apps/docs/content/docs/en/tools/extend.mdx — output section needs to list actual tool outputs.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/app/api/tools/extend/parse/route.ts | Internal API route for Extend parse; mirrors Reducto pattern with DNS validation, pinned-IP fetch, Zod schema validation, and proper error surfacing. |
| apps/sim/tools/extend/parser.ts | Defines V1 and V2 tool configs; V2 spreads V1 and overrides params/request; UserFile shape is compatible with RawFileInputSchema validation in the route. |
| apps/sim/blocks/blocks/extend.ts | V1 block (hidden) supports file upload and URL; V2 block (visible) derives subBlocks from V1 removing the URL sub-block and adding a file-reference input for chaining. |
| apps/sim/tools/extend/types.ts | Well-typed interfaces for V1/V2 input and parser output; UserFile used for V2 file param which is compatible with the route's RawFileInputSchema. |
| apps/docs/content/docs/en/tools/extend.mdx | Documentation references the correct block type (extend_v2) and inputs, but the Output section incorrectly states the tool produces no outputs. |

Sequence Diagram

sequenceDiagram
    participant Block as ExtendV2Block
    participant Tool as extendParserV2Tool
    participant Route as /api/tools/extend/parse
    participant FileUtils as resolveFileInputToUrl
    participant DNS as validateUrlWithDNS
    participant Extend as api.extend.ai/parse

    Block->>Tool: params (file, apiKey, outputFormat, chunking, engine)
    Tool->>Route: POST {apiKey, file, outputFormat?, chunking?, engine?}
    Route->>Route: checkInternalAuth
    Route->>FileUtils: resolveFileInputToUrl(file | filePath, userId)
    FileUtils-->>Route: fileUrl (signed/resolved URL)
    Route->>DNS: validateUrlWithDNS("https://api.extend.ai/parse")
    DNS-->>Route: resolvedIP
    Route->>Extend: POST {file:{fileUrl}, config?} Bearer apiKey (pinned IP)
    Extend-->>Route: {id, status, chunks, blocks, pageCount, creditsUsed}
    Route-->>Tool: {success: true, output: {...}}
    Tool-->>Block: transformResponse → ExtendParserOutput
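
The `Route->>Extend` step in the diagram could be assembled along these lines. This is a minimal TypeScript sketch under stated assumptions: the endpoint URL, the `{file: {fileUrl}, config?}` body, and the Bearer header are taken from the diagram, while the `ExtendRequest` type, the `buildExtendRequest` helper, and the choice to treat `config` as an opaque object are hypothetical. DNS validation and the pinned-IP fetch are deliberately left out.

```typescript
// Hypothetical description of the outgoing request; not a type from the PR.
interface ExtendRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

// Builds the upstream request. `config` is passed through untouched because
// the mapping of outputFormat/chunking/engine into it is not shown in the PR.
function buildExtendRequest(
  apiKey: string,
  fileUrl: string,
  config?: Record<string, unknown>
): ExtendRequest {
  const payload: { file: { fileUrl: string }; config?: Record<string, unknown> } = {
    file: { fileUrl },
  }
  // Omit `config` entirely when no options were supplied.
  if (config && Object.keys(config).length > 0) {
    payload.config = config
  }
  return {
    url: 'https://api.extend.ai/parse',
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
  }
}
```

Keeping request construction in a pure function like this makes it easy to test separately from the network and validation layers.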

Reviews (3): Last reviewed commit: "lint"

@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@waleedlatif1 waleedlatif1 merged commit 72e28ba into staging Mar 31, 2026
12 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/add-extend-integration branch March 31, 2026 23:26