| name | validate-connector |
|---|---|
| description | Audit an existing Sim knowledge base connector against the service API docs and repository conventions, then report and fix issues in auth, config fields, pagination, document mapping, tags, and registry entries. Use when validating or repairing code in `apps/sim/connectors/{service}/`. |
You are an expert auditor for Sim knowledge base connectors. Your job is to thoroughly validate that an existing connector is correct, complete, and follows all conventions.
When the user asks you to validate a connector:
- Read the service's API documentation (via Context7 or WebFetch)
- Read the connector implementation, OAuth config, and registry entries
- Cross-reference everything against the API docs and Sim conventions
- Report all issues found, grouped by severity (critical, warning, suggestion)
- Fix all issues after reporting them
Read every file for the connector — do not skip any:
apps/sim/connectors/{service}/{service}.ts # Connector implementation
apps/sim/connectors/{service}/index.ts # Barrel export
apps/sim/connectors/registry.ts # Connector registry entry
apps/sim/connectors/types.ts # ConnectorConfig interface, ExternalDocument, etc.
apps/sim/connectors/utils.ts # Shared utilities (computeContentHash, htmlToPlainText, etc.)
apps/sim/lib/oauth/oauth.ts # OAUTH_PROVIDERS — single source of truth for scopes
apps/sim/lib/oauth/utils.ts # getCanonicalScopesForProvider, getScopesForService, SCOPE_DESCRIPTIONS
apps/sim/lib/oauth/types.ts # OAuthService union type
apps/sim/components/icons.tsx # Icon definition for the service
If the connector uses selectors, also read:
apps/sim/hooks/selectors/registry.ts # Selector key definitions
apps/sim/hooks/selectors/types.ts # SelectorKey union type
apps/sim/lib/workflows/subblocks/context.ts # SELECTOR_CONTEXT_FIELDS
Fetch the official API docs for the service. This is the source of truth for:
- Endpoint URLs, HTTP methods, and auth headers
- Required vs optional parameters
- Parameter types and allowed values
- Response shapes and field names
- Pagination patterns (cursor, offset, next token)
- Rate limits and error formats
- OAuth scopes and their meanings
Use Context7 (resolve-library-id → query-docs) or WebFetch to retrieve documentation. If both fail, note which claims are based on training knowledge vs verified docs.
For every API call in the connector (listDocuments, getDocument, validateConfig, and any helper functions), verify against the API docs:
- Base URL is correct for the service's API version
- Endpoint paths match the API docs exactly
- HTTP method is correct (GET, POST, PUT, PATCH, DELETE)
- Path parameters are correctly interpolated and URI-encoded where needed
- Query parameters use correct names and formats per the API docs
- Authorization header uses the correct format:
  - OAuth: `Authorization: Bearer ${accessToken}`
  - API Key: correct header name per the service's docs
- `Content-Type` is set for POST/PUT/PATCH requests
- Any service-specific headers are present (e.g., `Notion-Version`, `Dropbox-API-Arg`)
- No headers are sent that the API doesn't support or silently ignores
- POST/PUT body fields match API parameter names exactly
- Required fields are always sent
- Optional fields are conditionally included (not sent as `null` or empty unless the API expects that)
- Field value types match API expectations (string vs number vs boolean)
- User-controlled values interpolated into query strings are properly escaped:
  - OData `$filter`: single quotes escaped with `''` (e.g., `externalId.replace(/'/g, "''")`)
  - SOQL: single quotes escaped with `\'`
  - GraphQL variables: passed as variables, not interpolated into query strings
  - URL path segments: `encodeURIComponent()` applied
- URL-type config fields (e.g., `siteUrl`, `instanceUrl`) are normalized:
  - Strip `https://` / `http://` prefix if the API expects bare domains
  - Strip trailing `/`
  - Apply `.trim()` before validation
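The escaping and normalization checks above can be sketched as a few small helpers. This is a minimal illustration, not code from the Sim repository: the function names `escapeODataString`, `escapeSoqlString`, and `normalizeDomain` are hypothetical, and the extra backslash escaping in the SOQL helper is an added precaution beyond the single-quote rule listed above.

```typescript
// Hypothetical helpers; names are illustrative and not taken from the Sim codebase.

/** Escape a value for an OData $filter string literal (each ' becomes ''). */
function escapeODataString(value: string): string {
  return value.replace(/'/g, "''")
}

/** Escape a value for a SOQL string literal (each ' becomes \'). */
function escapeSoqlString(value: string): string {
  // Assumption: backslashes are escaped first so an existing backslash
  // cannot cancel out the quote escape that follows.
  return value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")
}

/** Normalize a URL-type config field to a bare domain with no trailing slash. */
function normalizeDomain(raw: string): string {
  return raw
    .trim()
    .replace(/^https?:\/\//i, '')
    .replace(/\/+$/, '')
}

// User-controlled values are escaped before they reach a query string or path.
const externalId = "O'Brien"
const filter = `externalId eq '${escapeODataString(externalId)}'`
const path = `/items/${encodeURIComponent('a/b c')}`
```

When auditing, look for the inverse: a template literal that places a config value or `externalId` directly into a filter, query, or path with none of these steps applied.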
- Response structure is correctly traversed (e.g., `data.results` vs `data.items` vs `data`)
- Field names extracted match what the API actually returns
- Nullable fields are handled with `?? null` or `|| undefined`
- Error responses are checked before accessing data fields
Scopes must be correctly declared and sufficient for all API calls the connector makes.
- `requiredScopes` in the connector's `auth` config lists all scopes needed by the connector
- Each scope in `requiredScopes` is a real, valid scope recognized by the service's API
- No invalid, deprecated, or made-up scopes are listed
- No unnecessary excess scopes beyond what the connector actually needs
- Every scope in `requiredScopes` exists in the OAuth provider's `scopes` array in `lib/oauth/oauth.ts`
- Find the provider in `OAUTH_PROVIDERS[providerGroup].services[serviceId].scopes`
- Verify: `requiredScopes` ⊆ `OAUTH_PROVIDERS` scopes (every required scope is present in the provider config)
- If a required scope is NOT in the provider config, flag as critical: the connector will fail at runtime
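The subset check above amounts to a set difference. The sketch below assumes both scope lists have already been read out as plain string arrays; the helper name `findMissingScopes` and the scope strings are hypothetical, and the real `OAUTH_PROVIDERS` structure should be read from `lib/oauth/oauth.ts` rather than inferred from this example.

```typescript
// Hypothetical helper: returns every required scope that the provider config lacks.
function findMissingScopes(requiredScopes: string[], providerScopes: string[]): string[] {
  const available = new Set(providerScopes)
  return requiredScopes.filter((scope) => !available.has(scope))
}

// Illustrative scope names only; real values come from the connector and oauth.ts.
const providerScopes = ['read:page', 'read:space', 'offline_access']
const requiredScopes = ['read:page', 'read:comment']

// Any entry in `missing` would be reported as a critical finding.
const missing = findMissingScopes(requiredScopes, providerScopes)
```

An empty result means `requiredScopes` is a subset of the provider scopes; it says nothing about whether each scope is valid for the service's API, which still needs checking against the docs.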
For each API endpoint the connector calls:
- Identify which scopes are required per the API docs
- Verify those scopes are included in the connector's `requiredScopes`
- If the connector calls endpoints requiring scopes not in `requiredScopes`, flag as warning
- Check the `getOAuthTokenRefreshConfig` function in `lib/oauth/oauth.ts` for this provider
- `useBasicAuth` matches the service's token exchange requirements
- `supportsRefreshTokenRotation` matches whether the service issues rotating refresh tokens
- Token endpoint URL is correct
- Cursor/pagination parameter name matches the API docs
- Response pagination field is correctly extracted (e.g., `next_cursor`, `nextPageToken`, `@odata.nextLink`, `offset`)
- `hasMore` is correctly determined from the response
- `nextCursor` is correctly passed back for the next page
- `maxItems` / `maxRecords` cap is correctly applied across pages using `syncContext.totalDocsFetched`
- Page size is within the API's allowed range (not exceeding max page size)
- Last page precision: when a `maxItems` cap exists, the final page request uses `Math.min(PAGE_SIZE, remaining)` to avoid fetching more records than needed
- No off-by-one errors in pagination tracking
- The connector does NOT hit known API pagination limits silently (e.g., HubSpot search 10k cap)
- `syncContext` is used to cache state across pages (user names, field maps, instance URLs, portal IDs, etc.)
- Cached state in `syncContext` is correctly initialized on first page and reused on subsequent pages
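A rough sketch of the cap and last-page logic described above, run against an in-memory stand-in for a paginated API. Everything here is illustrative: `fakeListApi`, `listPage`, and the simplified `SyncContext` are hypothetical, and the sketch updates `totalDocsFetched` itself, which may not match where the real sync engine maintains that counter.

```typescript
const PAGE_SIZE = 100

interface Page { items: string[]; nextCursor?: string }
interface SyncContext { totalDocsFetched?: number }
interface ListResult { documents: string[]; nextCursor?: string; hasMore: boolean; requested: number }

// Hypothetical in-memory API standing in for a real paginated endpoint.
function fakeListApi(all: string[], cursor: string | undefined, limit: number): Page {
  const start = cursor ? Number(cursor) : 0
  const end = Math.min(start + limit, all.length)
  return { items: all.slice(start, end), nextCursor: end < all.length ? String(end) : undefined }
}

// One listing step: request only what is still needed, then decide hasMore.
function listPage(all: string[], cursor: string | undefined, maxItems: number, ctx: SyncContext): ListResult {
  const fetched = ctx.totalDocsFetched ?? 0
  // maxItems <= 0 is treated as "no cap" in this sketch.
  const remaining = maxItems > 0 ? maxItems - fetched : PAGE_SIZE
  if (remaining <= 0) return { documents: [], hasMore: false, requested: 0 }

  const limit = Math.min(PAGE_SIZE, remaining) // last-page precision
  const page = fakeListApi(all, cursor, limit)
  ctx.totalDocsFetched = fetched + page.items.length

  const capReached = maxItems > 0 && ctx.totalDocsFetched >= maxItems
  const hasMore = !capReached && page.nextCursor !== undefined
  return { documents: page.items, nextCursor: hasMore ? page.nextCursor : undefined, hasMore, requested: limit }
}

// Drive the pages: 250 source documents, capped at 150.
const source = Array.from({ length: 250 }, (_, i) => `doc-${i}`)
const ctx: SyncContext = {}
const requestedSizes: number[] = []
let total = 0
let cursor: string | undefined = undefined
do {
  const result = listPage(source, cursor, 150, ctx)
  requestedSizes.push(result.requested)
  total += result.documents.length
  cursor = result.nextCursor
} while (cursor !== undefined)
```

In this run the second request asks for 50 records rather than a full page, so the cap is reached without over-fetching.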
Connectors that require per-document API calls to fetch content (file download, export, blocks fetch) MUST use `contentDeferred: true`. This is the standard pattern for reliability: without it, content downloads during listing can exhaust the sync task's time budget before any documents are saved.
- If the connector downloads content per-doc during `listDocuments`, it MUST use `contentDeferred: true` instead
- `listDocuments` returns lightweight stubs with `content: ''` and `contentDeferred: true`
- `getDocument` fetches actual content and returns the full document with `contentDeferred: false`
- A shared stub function (e.g., `fileToStub`) is used by both `listDocuments` and `getDocument` to guarantee `contentHash` consistency
- `contentHash` is metadata-based (e.g., `service:{id}:{modifiedTime}`), NOT content-based; it must be derivable from list metadata alone
- The `contentHash` is identical whether produced by `listDocuments` or `getDocument`

Connectors where the list API already returns content inline (e.g., Slack messages, Reddit posts) do NOT need `contentDeferred`.
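To make the deferred-content pattern concrete, here is a simplified sketch of a shared stub function. The `FileMeta` and `DocSketch` shapes are hypothetical reductions, not the real `ExternalDocument` type from `apps/sim/connectors/types.ts`, and the `example:` hash prefix is a placeholder for the service name.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface FileMeta { id: string; name: string; modifiedTime: string; webUrl: string }

interface DocSketch {
  externalId: string
  title: string
  content: string
  contentDeferred: boolean
  contentHash: string
  sourceUrl: string
}

// Single place where the metadata-based hash is built, shared by both code paths.
function fileToStub(file: FileMeta): DocSketch {
  return {
    externalId: file.id,
    title: file.name || 'Untitled',
    content: '',
    contentDeferred: true,
    contentHash: `example:${file.id}:${file.modifiedTime}`,
    sourceUrl: file.webUrl,
  }
}

// Listing returns stubs only; no per-document download happens here.
function listDocumentsSketch(files: FileMeta[]): DocSketch[] {
  return files.map(fileToStub)
}

// Fetching one document reuses the stub, then fills in the downloaded content.
function getDocumentSketch(file: FileMeta, downloadedText: string): DocSketch {
  return { ...fileToStub(file), content: downloadedText, contentDeferred: false }
}

const file: FileMeta = { id: 'f1', name: 'Notes', modifiedTime: '2024-01-15T10:00:00Z', webUrl: 'https://example.com/f1' }
const stub = listDocumentsSketch([file])[0]
const full = getDocumentSketch(file, 'Meeting notes body')
```

Because both paths go through `fileToStub`, the hash cannot drift between listing and fetching, which is the property the checks above are looking for.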
- `externalId` is a stable, unique identifier from the source API
- `title` is extracted from the correct field and has a sensible fallback (e.g., `'Untitled'`)
- `content` is plain text; HTML content is stripped using `htmlToPlainText` from `@/connectors/utils`
- `mimeType` is `'text/plain'`
- `contentHash` uses a metadata-based format (e.g., `service:{id}:{modifiedTime}`) for connectors with `contentDeferred: true`, or `computeContentHash` from `@/connectors/utils` for inline-content connectors
- `sourceUrl` is a valid, complete URL back to the original resource (not relative)
- `metadata` contains all fields referenced by `mapTags` and `tagDefinitions`
- Rich text / HTML fields are converted to plain text before indexing
- Important content is not silently dropped (e.g., nested blocks, table cells, code blocks)
- Content is not silently truncated without logging a warning
- Empty/blank documents are properly filtered out
- Size checks use `Buffer.byteLength(text, 'utf8')` not `text.length` when comparing against byte-based limits (e.g., `MAX_FILE_SIZE` in bytes)
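The difference between character count and byte count only shows up with non-ASCII text, which is why it is easy to miss. A small sketch, assuming a Node.js runtime; the helper name `exceedsByteLimit` and the tiny limit are hypothetical:

```typescript
// Hypothetical helper: compare the UTF-8 byte size of a string against a byte limit.
function exceedsByteLimit(text: string, limitBytes: number): boolean {
  return Buffer.byteLength(text, 'utf8') > limitBytes
}

// 'héllo wörld' written with escapes so the code points are unambiguous.
const sample = 'h\u00e9llo w\u00f6rld'
const charCount = sample.length // 11 UTF-16 code units
const byteCount = Buffer.byteLength(sample, 'utf8') // 13 bytes: é and ö take 2 bytes each

// With a 12-byte limit, a length-based check passes while the byte-based check fails.
const passesLengthCheck = charCount <= 12
const overByteLimit = exceedsByteLimit(sample, 12)
```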
- Each `tagDefinition` has an `id`, `displayName`, and `fieldType`
- `fieldType` matches the actual data type: `'text'` for strings, `'number'` for numbers, `'date'` for dates, `'boolean'` for booleans
- Every `id` in `tagDefinitions` is returned by `mapTags`
- No `tagDefinition` references a field that `mapTags` never produces
- Return keys match `tagDefinition` `id` values exactly
- Date values are properly parsed using `parseTagDate` from `@/connectors/utils`
- Array values are properly joined using `joinTagArray` from `@/connectors/utils`
- Number values are validated (not `NaN`)
- Metadata field names accessed in `mapTags` match what `listDocuments`/`getDocument` store in `metadata`
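One way to reason about the tag checks above is to compare the ids in `tagDefinitions` with the keys `mapTags` returns for a fully populated sample. The sketch below uses hypothetical stand-ins (`TagDefinitionSketch`, `mapTagsSketch`, `auditTags`) and a deliberately mismatched key; it does not use the real `parseTagDate` or `joinTagArray` helpers.

```typescript
interface TagDefinitionSketch {
  id: string
  displayName: string
  fieldType: 'text' | 'number' | 'date' | 'boolean'
}

const tagDefinitions: TagDefinitionSketch[] = [
  { id: 'author', displayName: 'Author', fieldType: 'text' },
  { id: 'lastModified', displayName: 'Last Modified', fieldType: 'date' },
  { id: 'commentCount', displayName: 'Comments', fieldType: 'number' },
]

// Deliberately buggy stand-in: it writes `modified`, but the definition id is `lastModified`.
function mapTagsSketch(metadata: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = {}
  if (typeof metadata.author === 'string') result.author = metadata.author
  if (typeof metadata.lastModified === 'string') result.modified = new Date(metadata.lastModified)
  const count = Number(metadata.commentCount)
  if (!Number.isNaN(count)) result.commentCount = count // skip NaN instead of storing it
  return result
}

// Compare definition ids against the keys actually produced for a sample document.
function auditTags(definitions: TagDefinitionSketch[], mapped: Record<string, unknown>) {
  const definedIds = new Set(definitions.map((d) => d.id))
  const mappedKeys = new Set(Object.keys(mapped))
  return {
    neverProduced: definitions.map((d) => d.id).filter((id) => !mappedKeys.has(id)),
    undefinedKeys: Object.keys(mapped).filter((key) => !definedIds.has(key)),
  }
}

const audit = auditTags(
  tagDefinitions,
  mapTagsSketch({ author: 'Ada', lastModified: '2024-01-15T00:00:00Z', commentCount: 3 })
)
```

Here `audit.neverProduced` contains `lastModified` and `audit.undefinedKeys` contains `modified`, the kind of key mismatch this part of the audit is meant to catch.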
- Every field has `id`, `title`, `type`
- `required` is set explicitly (not omitted)
- Dropdown fields have `options` with `label` and `id` for each option
- Selector fields follow the canonical pair pattern:
  - A `type: 'selector'` field with `selectorKey`, `canonicalParamId`, `mode: 'basic'`
  - A `type: 'short-input'` field with the same `canonicalParamId`, `mode: 'advanced'`
  - `required` is identical on both fields in the pair
- `selectorKey` values exist in the selector registry
- `dependsOn` references selector field `id` values, not `canonicalParamId`
- Validates all required fields are present before making API calls
- Validates optional numeric fields (checks `Number.isNaN`, positive values)
- Makes a lightweight API call to verify access (e.g., fetch 1 record, get profile)
- Uses `VALIDATE_RETRY_OPTIONS` for retry budget
- Returns `{ valid: true }` on success
- Returns `{ valid: false, error: 'descriptive message' }` on failure
- Catches exceptions and returns user-friendly error messages
- Does NOT make expensive calls (full data listing, large queries)
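The validation flow above might look roughly like the following. This is a sketch under assumptions: the real `validateConfig` signature is defined in `apps/sim/connectors/types.ts` and may differ, the `domain` and `maxItems` field names are hypothetical, and the lightweight API call is injected as a `probe` function here instead of calling `fetchWithRetry` with `VALIDATE_RETRY_OPTIONS`.

```typescript
type ValidationResult = { valid: true } | { valid: false; error: string }

// Hypothetical stand-in for validateConfig; `probe` represents one cheap API call.
async function validateConfigSketch(
  sourceConfig: Record<string, unknown>,
  probe: () => Promise<{ ok: boolean; status: number }>
): Promise<ValidationResult> {
  // 1. Required fields are checked before any network traffic.
  const domain = typeof sourceConfig.domain === 'string' ? sourceConfig.domain.trim() : ''
  if (!domain) return { valid: false, error: 'Domain is required' }

  // 2. Optional numeric fields must parse and be positive.
  if (sourceConfig.maxItems !== undefined && sourceConfig.maxItems !== '') {
    const maxItems = Number(sourceConfig.maxItems)
    if (Number.isNaN(maxItems) || maxItems <= 0) {
      return { valid: false, error: 'Max items must be a positive number' }
    }
  }

  // 3. One lightweight call verifies access; exceptions become readable errors.
  try {
    const response = await probe()
    if (!response.ok) {
      return { valid: false, error: `Failed to access the API (HTTP ${response.status})` }
    }
    return { valid: true }
  } catch (error) {
    const message = error instanceof Error ? error.message : 'Unknown error'
    return { valid: false, error: `Could not reach the service: ${message}` }
  }
}

const okProbe = async () => ({ ok: true, status: 200 })
const unauthorizedProbe = async () => ({ ok: false, status: 401 })
const failingProbe = async (): Promise<{ ok: boolean; status: number }> => {
  throw new Error('network down')
}
```

Calling `validateConfigSketch({ domain: 'example.com' }, unauthorizedProbe)` resolves to an invalid result whose message includes the status code, rather than throwing.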
- Fetches a single document by `externalId`
- Returns `null` for 404 / not found (does not throw)
- Returns the same `ExternalDocument` shape as `listDocuments`
- If `listDocuments` uses `contentDeferred: true`, `getDocument` MUST fetch actual content and return `contentDeferred: false`
- If `listDocuments` uses `contentDeferred: true`, `getDocument` MUST use the same stub function to ensure `contentHash` is identical
- Handles all content types that `listDocuments` can produce (e.g., if `listDocuments` returns both pages and blogposts, `getDocument` must handle both, not hardcode one endpoint)
- Forwards `syncContext` if it needs cached state (user names, field maps, etc.)
- Error handling is graceful (catches, logs, returns null or throws with context)
- Does not redundantly re-fetch data already included in the initial API response (e.g., if comments come back with the post, don't fetch them again separately)
- All external API calls use `fetchWithRetry` from `@/lib/knowledge/documents/utils`
- No raw `fetch()` calls to external APIs
- `VALIDATE_RETRY_OPTIONS` used in `validateConfig`
- If `validateConfig` calls a shared helper (e.g., `linearGraphQL`, `resolveId`), that helper must accept and forward `retryOptions` to `fetchWithRetry`
- Default retry options used in `listDocuments`/`getDocument`
- APIs that support field selection (e.g., `$select`, `sysparm_fields`, `fields`) should request only the fields the connector needs, in both `listDocuments` AND `getDocument`
- No redundant API calls: if a helper already fetches data (e.g., site metadata), callers should reuse the result instead of making a second call for the same information
- Sequential per-item API calls (fetching details for each document in a loop) should be batched with `Promise.all` and a concurrency limit of 3-5
- Individual document failures are caught and logged without aborting the sync
- API error responses include status codes in error messages
- No unhandled promise rejections in concurrent operations
- Concurrent API calls use reasonable batch sizes (3-5 is typical)
- No unbounded `Promise.all` over large arrays
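A bounded alternative to an unbounded `Promise.all` is to process items in fixed-size batches and catch failures per item. The sketch below is one simple way to do that; `mapInBatches` and `fetchDetail` are hypothetical names, the delay and the simulated failure are only there to make the behavior observable, and a real connector would log failures through its logger.

```typescript
// Hypothetical helper: run `fn` over `items`, at most `batchSize` calls at a time.
async function mapInBatches<T, R>(items: T[], batchSize: number, fn: (item: T) => Promise<R>): Promise<R[]> {
  const results: R[] = []
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize)
    results.push(...(await Promise.all(batch.map(fn))))
  }
  return results
}

let inFlight = 0
let maxInFlight = 0
const failures: string[] = []

// Simulated per-item fetch: item 4 fails, and the failure is recorded, not rethrown.
async function fetchDetail(id: number): Promise<string | null> {
  inFlight++
  maxInFlight = Math.max(maxInFlight, inFlight)
  try {
    await new Promise((resolve) => setTimeout(resolve, 5))
    if (id === 4) throw new Error(`HTTP 500 for item ${id}`)
    return `detail-${id}`
  } catch (error) {
    failures.push(String(error)) // one bad document must not abort the sync
    return null
  } finally {
    inFlight--
  }
}

const ids = Array.from({ length: 10 }, (_, i) => i + 1)
const detailsPromise = mapInBatches(ids, 3, fetchDetail)
```

Because each rejection is handled inside `fetchDetail`, `Promise.all` never rejects and no unhandled promise rejection can escape the batch.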
- Uses `createLogger` from `@sim/logger` (not `console.log`)
- Logs sync progress at `info` level
- Logs errors at `warn` or `error` level with context
- Connector is exported from `connectors/{service}/index.ts`
- Connector is registered in `connectors/registry.ts`
- Registry key matches the connector's `id` field
Group findings by severity:
Critical (will cause runtime errors, data loss, or auth failures):
- Wrong API endpoint URL or HTTP method
- Invalid or missing OAuth scopes (not in provider config)
- Incorrect response field mapping (accessing wrong path)
- SOQL/query fields that don't exist on the target object
- Pagination that silently hits undocumented API limits
- Missing error handling that would crash the sync
- `requiredScopes` not a subset of OAuth provider scopes
- Query/filter injection: user-controlled values interpolated into OData `$filter`, SOQL, or query strings without escaping
- Per-document content download in `listDocuments` without `contentDeferred: true`, which causes sync timeouts for large document sets
- `contentHash` mismatch between `listDocuments` stub and `getDocument` return, which causes unnecessary re-processing every sync
Warning (incorrect behavior, data quality issues, or convention violations):
- HTML content not stripped via `htmlToPlainText`
- `getDocument` not forwarding `syncContext`
- `getDocument` hardcoded to one content type when `listDocuments` returns multiple (e.g., only pages but not blogposts)
- Missing `tagDefinition` for metadata fields returned by `mapTags`
- Incorrect `useBasicAuth` or `supportsRefreshTokenRotation` in token refresh config
- Invalid scope names that the API doesn't recognize (even if silently ignored)
- Private resources excluded from name-based lookup despite scopes being available
- Silent data truncation without logging
- Size checks using `text.length` (character count) instead of `Buffer.byteLength` (byte count) for byte-based limits
- URL-type config fields not normalized (protocol prefix, trailing slashes cause API failures)
- `VALIDATE_RETRY_OPTIONS` not threaded through helper functions called by `validateConfig`
Suggestion (minor improvements):
- Missing incremental sync support despite API supporting it
- Overly broad scopes that could be narrowed (not wrong, but could be tighter)
- Source URL format could be more specific
- Missing `orderBy` for deterministic pagination
- Redundant API calls that could be cached in `syncContext`
- Sequential per-item API calls that could be batched with `Promise.all` (concurrency 3-5)
- API supports field selection but connector fetches all fields (e.g., missing `$select`, `sysparm_fields`, `fields`)
- `getDocument` re-fetches data already included in the initial API response (e.g., comments returned with post)
- Last page of pagination requests full `PAGE_SIZE` when fewer records remain (`Math.min(PAGE_SIZE, remaining)`)
After reporting, fix every critical and warning issue. Apply suggestions where they don't add unnecessary complexity.
After fixing, confirm:
- `bun run lint` passes
- TypeScript compiles clean
- Re-read all modified files to verify fixes are correct
- Read connector implementation, types, utils, registry, and OAuth config
- Pulled and read official API documentation for the service
- Validated every API endpoint URL, method, headers, and body against API docs
- Validated input sanitization: no query/filter injection, URL fields normalized
- Validated OAuth scopes: `requiredScopes` ⊆ OAuth provider `scopes` in `oauth.ts`
- Validated each scope is real and recognized by the service's API
- Validated scopes are sufficient for all API endpoints the connector calls
- Validated token refresh config (`useBasicAuth`, `supportsRefreshTokenRotation`)
- Validated pagination: cursor names, page sizes, hasMore logic, no silent caps
- Validated content deferral: `contentDeferred: true` used when per-doc content fetch required, metadata-based `contentHash` consistent between stub and `getDocument`
- Validated data transformation: plain text extraction, HTML stripping, content hashing
- Validated tag definitions match mapTags output, correct fieldTypes
- Validated config fields: canonical pairs, selector keys, required flags
- Validated validateConfig: lightweight check, error messages, retry options
- Validated getDocument: null on 404, all content types handled, no redundant re-fetches, syncContext forwarding
- Validated fetchWithRetry used for all external calls (no raw fetch), VALIDATE_RETRY_OPTIONS threaded through helpers
- Validated API efficiency: field selection used, no redundant calls, sequential fetches batched
- Validated error handling: graceful failures, no unhandled rejections
- Validated logging: createLogger, no console.log
- Validated registry: correct export, correct key
- Reported all issues grouped by severity
- Fixed all critical and warning issues
- Ran `bun run lint` after fixes
- Verified TypeScript compiles clean