FF: restructure and edit #3312

Draft
kaitlynmichael wants to merge 13 commits into main from DOC-6576

@kaitlynmichael
Contributor

@kaitlynmichael kaitlynmichael commented May 12, 2026

Note

Low Risk
Low-risk documentation-only restructure, but several new pages contain empty front matter (e.g., missing title/linkTitle) which could affect site navigation/build output.

Overview
Moves the provider/secret-provider/workspace CLI documentation out of streaming.md into new dedicated pages: a full register-providers.md guide and a smaller manage-workspace.md reference.

Adds several new FeatureForm docs stubs (concepts.md, configure-auth.md, define-and-deploy-features.md, query-data.md, reference.md, serve-features.md, update-features.md) that currently only include front matter placeholders, and leaves streaming.md effectively empty aside from its header.

Reviewed by Cursor Bugbot for commit b5c5666. Bugbot is set up for automated code reviews on this repo.

@github-actions
Contributor

github-actions Bot commented May 12, 2026

DOC-6576

@kaitlynmichael kaitlynmichael marked this pull request as draft May 12, 2026 17:51
@jit-ci

jit-ci Bot commented May 12, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

---

Redis Feature Form supports multiple providers, secrets provider management, and workspaces.

Emptied page leaves stale title and broken link

Medium Severity

All body content was moved out of streaming.md to register-providers.md and manage-workspace.md, but the frontmatter still says `title: Providers and workspaces` and `description: Build stream-backed features with Kafka, streaming transformations, and Redis serving.` This renders as an empty page with a misleading title in site navigation. Additionally, overview.md links to this file as "Connect providers", and that link now lands on a blank page. The frontmatter needs updating (or the file needs a redirect/removal).

description:
linkTitle:
weight: 1
---

Page with content has empty frontmatter fields

Medium Severity

manage-workspace.md contains real body content (workspace management commands moved from streaming.md) but its title, description, and linkTitle frontmatter fields are all blank. Unlike the other new placeholder files that are intentionally empty stubs, this page has substantive content. The missing frontmatter means it will render with no page title in navigation and no heading, and register-providers.md already links to it.

Comment thread content/develop/ai/featureform/register-providers.md Outdated
Collaborator

@dwdougherty dwdougherty left a comment

Apart from the issues already identified by Bugbot, LGTM.

Comment thread content/develop/ai/featureform/manage-workspace.md Outdated
Comment thread content/develop/ai/featureform/register-providers.md Outdated
Comment thread content/develop/ai/featureform/register-providers.md Outdated
Comment thread content/develop/ai/featureform/register-providers.md Outdated
Comment thread content/develop/ai/featureform/register-providers.md Outdated
Comment thread content/develop/ai/featureform/register-providers.md Outdated
Comment thread content/develop/ai/featureform/register-providers.md Outdated
Comment thread content/develop/ai/featureform/register-providers.md Outdated
@kaitlynmichael kaitlynmichael requested a review from epps May 14, 2026 19:28

A workspace is a self-contained environment in Feature Form. Each one owns its own resource graph, providers, secret references, and serving metadata. Nothing is shared between workspaces.

Use workspaces to keep environments such as dev, staging, and prod separate, or to give independent teams their own area on a shared deployment. Two workspaces can connect to the same external Postgres database and remain fully isolated, because each workspace tracks its own resources.
Contributor

@epps epps Jun 1, 2026

I caution against the phrasing "... their own area on a shared deployment ..." because, while it's correct, I wonder whether it would be better to make clear exactly how this works.

A more apt framing would be something to the effect that a workspace can be viewed as a single tenant inside a multi-tenant system, à la SaaS architecture. This makes it clearer that compute resources allocated to a single deployment of Feature Form are shared across workspaces, even if workspaces have the effect of making it appear as if they are separate deployments (e.g. everything is scoped to workspaces, including RBAC bindings, secret providers, providers, etc.).


To create, inspect, update, or delete workspaces, see [Manage workspaces]({{< relref "/develop/ai/featureform/manage-workspace" >}}).

## The resource graph
Contributor

You decide if you think it's necessary to mention this at the concepts level, but it might be helpful to explain that Feature Form's planner converts the resource graph—which is a logical, declarative representation of a workspace's feature engineering pipeline—into a task DAG that is executed by Feature Form such that the actual state of the workspace is reflected in the users' data infrastructure (e.g. Snowflake).
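
As a rough illustration of the planner idea in the comment above, the sketch below orders a small declarative graph so that each resource comes after the resources it depends on. It uses Python's standard `graphlib` only; the resource names and the `plan_tasks` helper are hypothetical and are not Feature Form's planner or API.

```python
from graphlib import TopologicalSorter

# Hypothetical resource graph: each resource maps to the resources it depends on.
# The names are illustrative and do not come from the PR or from Feature Form.
resource_graph = {
    "dataset:transactions": set(),
    "transformation:daily_spend": {"dataset:transactions"},
    "feature:spend_7d": {"transformation:daily_spend"},
    "label:churned": {"dataset:transactions"},
    "training_set:churn": {"feature:spend_7d", "label:churned"},
    "feature_view:customer": {"feature:spend_7d"},
}


def plan_tasks(graph):
    """Return resource names in an order where dependencies come first."""
    return list(TopologicalSorter(graph).static_order())


task_order = plan_tasks(resource_graph)
```

A real planner would also have to diff desired state against actual state and handle failures; this sketch only shows the ordering step.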

Contributor

There's also the Feature Form catalog to mention on this point: it shows where logical resources (e.g. datasets) exist as physical tables in providers (e.g. Snowflake as an offline provider and/or Redis as an online provider).


A graph is built from seven resource types. New users encountering Feature Form for the first time benefit from learning these as a vocabulary list — every other concept on this page builds on them.

- **Entities** identify the real-world objects features describe, such as a `customer` or `order`. Other resources join on the entity's key column.
Contributor

I might make it clear that "... entity's key column ..." isn't a column name per se but rather the primary key (PK) of the entity regardless of table naming conventions.

For example, the "customer" entity in the example below could have different column names in various tables (e.g. customer_id, user_id, id, etc.), but so long as the values in these columns all point to the same business entity, Feature Form will use the column names provided to manage joining datasets, feature views, etc. to create a unified view of the entity in whatever resource references it.
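
To make the point about key values versus column names concrete, here is a minimal sketch in plain Python. The tables, the `entity_key_columns` mapping, and both helper functions are hypothetical; they only illustrate joining on a shared entity key when each table names the column differently, and do not reflect Feature Form's actual API.

```python
# Hypothetical tables that store the same customer under different column names.
orders = [{"customer_id": 7, "total": 30.0}, {"customer_id": 9, "total": 12.5}]
profiles = [{"user_id": 7, "plan": "pro"}, {"user_id": 9, "plan": "free"}]

# Per-dataset declaration of which column holds the customer entity's key.
entity_key_columns = {"orders": "customer_id", "profiles": "user_id"}


def index_by_entity(rows, key_column):
    """Index rows by the entity key value, regardless of the column's name."""
    return {row[key_column]: row for row in rows}


def join_on_entity(left, left_key, right, right_key):
    """Inner-join two row lists on the shared entity key values."""
    right_index = index_by_entity(right, right_key)
    joined = []
    for row in left:
        key = row[left_key]
        if key in right_index:
            merged = {"customer": key}
            merged.update({k: v for k, v in row.items() if k != left_key})
            merged.update(
                {k: v for k, v in right_index[key].items() if k != right_key}
            )
            joined.append(merged)
    return joined


customer_view = join_on_entity(
    orders, entity_key_columns["orders"],
    profiles, entity_key_columns["profiles"],
)
```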

- **Datasets** point at an existing table, view, or file on an offline store and make it visible to the graph. The data itself stays where it lives; Feature Form just registers a handle to it.
Contributor

"... in an offline store ..." sounds more common than "... on an offline store ..."


- **Transformations** produce new datasets from existing ones, expressed as SQL or as a Spark job. A transformation describes the shape of the output; the compute that runs it is supplied by a provider.
Contributor

"... or PySpark ..." is more accurate given SQL and Python (i.e. PySpark) are the 2 query languages Feature Form supports.

Currently, SQL is the only query language Feature Form supports, because PySpark support is still under development, so perhaps we should mark PySpark as something that will be included in a future release.

Comment on lines +42 to +45
- **Features** are entity-keyed values that get served at inference time. A feature attaches to a column of a dataset, optionally applies an aggregation (such as `SUM` over a 7-day window), and declares which provider owns its computation.
- **Labels** look like features but feed offline training rather than online serving. They carry the value a model is trying to predict.
- **Training sets** join one or more features with a label on the entity key, so an offline training job reads a single time-aligned table instead of stitching things together by hand.
- **Feature views** are the online serving interface for a group of features. They are the only resource that downstream applications and model services interact with directly.
Contributor

I think we should tighten the wording around these 4 resource types and how they relate:

  • Feature - The most common definition of a machine learning feature is "... an individual measurable property or characteristic of a data set ...". I would reword this to align better with Feature Form's domain vocabulary and say "A feature in machine learning is a measurable characteristic of an entity used as an input to a model" or alternatively "A feature is a named, measurable characteristic of an entity, computed from one or more data sources and used as input to machine learning models."
  • Label - I'm not convinced the "... look[s] like features but feed offline training ..." best defines labels in the context of Feature Form's domain model. I might suggest instead "A label in machine learning is the known target value or outcome associated with a training example, which a model is trained to predict." Labels are exclusively used in model training as they are the dataset responsible for grounding a model's prediction based on its input features in the reality of the outcome the model is learning to predict.
  • Training set - This definition is nearly perfect (i.e. it conveys that it represents a single dataset derived from N features and 1 label for the purpose of training a model), but I would insist on some tweaks given how crucial the temporal aspect of training sets is: "Training sets join one or more features with a label by entity key and time, so an offline training job reads a single point-in-time-correct table instead of stitching inputs together by hand." The "point-in-time correct" part is crucial because data leakage (i.e. features that don't actually align with their label value's time) leads to models that are trained on events that never actually occurred in reality.
  • Feature View - I suggest something along the following lines: "Feature views group related features for an entity behind a shared definition, materialization policy, and serving contract. They are the primary interface model-serving systems use to retrieve feature values for inference in production."
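
The point-in-time requirement described for training sets can be sketched in a few lines of plain Python. This is an illustration only, under the assumption that each label row should see the latest feature value recorded at or before the label's timestamp; the data, `value_as_of`, and `build_training_rows` are hypothetical and are not how Feature Form implements training sets.

```python
from bisect import bisect_right

# Hypothetical feature history per entity: (timestamp, value), sorted by timestamp.
feature_history = {
    "c1": [(1, 10.0), (5, 25.0), (9, 40.0)],
    "c2": [(3, 7.0)],
}

# Hypothetical label events: (entity, timestamp, label value).
labels = [("c1", 6, 1), ("c1", 2, 0), ("c2", 1, 0)]


def value_as_of(history, ts):
    """Return the latest feature value observed at or before ts, else None."""
    times = [t for t, _ in history]
    i = bisect_right(times, ts)
    return history[i - 1][1] if i else None


def build_training_rows(labels, feature_history):
    """Join each label with the feature value that was known at label time."""
    rows = []
    for entity, ts, label in labels:
        feature = value_as_of(feature_history.get(entity, []), ts)
        rows.append({"entity": entity, "ts": ts, "feature": feature, "label": label})
    return rows


training_rows = build_training_rows(labels, feature_history)
```

The label at time 6 for `c1` pairs with the value recorded at time 5, never the later value from time 9, which is the leakage the comment warns about. A label with no earlier feature value gets `None` rather than a value from the future.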

)
```

### Definitions files and `ff apply` {#definitions-files-and-ff-apply}
Contributor

I might mention that ff is the Feature Form CLI tool that supports all the operations necessary for creating/updating:

  • RBAC role bindings
  • workspaces
  • resources
  • serving
  • etc.


To register providers in a workspace, see [Register providers]({{< relref "/develop/ai/featureform/register-providers" >}}).

## Secrets and secret references
Contributor

I could make the case for moving this section before the provider section given logically secret-provider registration is a prerequisite for provider registration.


Feature Form never stores plaintext credentials in the graph. A provider configuration carries a secret reference. Feature Form resolves it through a registered secret provider when the credential is needed.
Contributor

Nit: "Feature Form never stores credentials in any form in a workspace's graph metadata; instead, provider configuration carries a secret reference only." Or something to that effect. I just want it to be clear that we're not saying, "Feature Form hashes credentials with a secure salt to keep credentials safe ..." and are instead making it clear that Feature Form exclusively relies on references to secrets that must be configured and managed by users.
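
The reference-only model described here might look roughly like the following sketch. Everything in it is hypothetical (`SecretRef`, `InMemorySecretProvider`, `resolve_config`, and the sample values); it shows stored configuration holding a pointer to a secret and resolving it only when needed, and makes no claim about Feature Form's real classes or CLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretRef:
    """Pointer to a secret: which secret provider holds it and under what key."""
    provider: str
    key: str


class InMemorySecretProvider:
    """Stand-in for an external secret store; hypothetical, for illustration only."""

    def __init__(self, secrets):
        self._secrets = dict(secrets)

    def resolve(self, key):
        return self._secrets[key]


secret_providers = {"vault-dev": InMemorySecretProvider({"pg/password": "s3cr3t"})}

# What graph metadata would hold: a reference, never the credential itself.
provider_config = {
    "host": "db.internal",
    "password": SecretRef("vault-dev", "pg/password"),
}


def resolve_config(config, providers):
    """Resolve secret references at the moment a connection is needed."""
    return {
        name: providers[value.provider].resolve(value.key)
        if isinstance(value, SecretRef)
        else value
        for name, value in config.items()
    }


resolved = resolve_config(provider_config, secret_providers)
```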

@@ -0,0 +1,142 @@
---
title: Manage workspaces
description: Create, verify access to, monitor, and delete Redis Feature Form workspaces with the ff CLI.
Contributor

I'm not sure if this page is meant to only show the CLI, but if that isn't the intention, I might at least mention that the dashboard also has the ability to manage workspaces.

The tasks on this page require one of two roles:

- A global admin (`global_admin`) creates workspaces and grants access.
- A workspace admin (`workspace_admin`) verifies their access, runs health checks, and updates or deletes the workspace.
Contributor

We're currently debating removing the workspace.delete permission from workspace_admin, so I might advise we remove "... or deletes ..." here.

3 participants