[RFC]: Expanding the Vega engine to support the analysis of multidimensional data

### Full name

Sachin Pangal

### University status

Yes

### University name

Walchand Institute of Technology

### University program

Computer Science & Engineering

### Expected graduation

2027

### Short biography

I’m a third-year Computer Science undergrad who likes building things and figuring out how they actually work. I mainly work with JavaScript and Node.js, and through freelancing I’ve built production-level features that made me comfortable working on real systems.

Outside of tutorial hell, freelancing, and hackathons, I’ve spent a lot of time just writing code on my own, which has helped me focus on building cleaner and more scalable codebases.

I’ve worked with JavaScript and TypeScript, along with C/C++ and Python. I’ve also explored things like WebSockets, WebRTC, Turborepo, Next.js, and some DevOps practices. Over time, I’ve developed a habit of writing modular, clean code, usually structuring things in an MVC-style way so it stays maintainable.

### Timezone

Indian Standard Time (GMT+5:30)

### Contact details

email: sachinprogramming62@gmail.com, github: [Sachinn-64](https://github.com/Sachinn-64)

### Platform

Mac

### Editor

I prefer using VS Code because it’s clean and minimal, which helps me stay focused without unnecessary distractions. At the same time, it has a huge ecosystem of extensions, so I can easily customize it based on whatever I’m working on.

### Programming experience

I started coding around the age of 14, with Python as my first language. That early exposure got me curious about how things work under the hood. Later, I picked up C, which gave me a strong foundation in problem-solving. From there, I gradually moved into full-stack development, working with JavaScript and TypeScript to build end-to-end applications.

Here are some of the projects I’ve worked on:

1. [Trinetra](https://sachinn-64.github.io/Trinetra/): An AI-powered surveillance and crowd management system for large-scale events like Mahakumbh. It uses facial recognition, real-time GPS tracking, crowd density heatmaps, and disaster prediction to ensure safety. The platform includes a mobile app (SOS, ambulance booking, offline navigation) and an admin dashboard for monitoring and emergency coordination. Built with React Native, FastAPI, YOLO, FaceNet, Kafka, and Docker, it’s designed to scale to a large number of users.

2. [CureConnect](https://sachinn-64.github.io/CureConnect/): It is an AI-powered telemedicine platform designed for rural and low-bandwidth areas. It enables video consultations via WebRTC, offline chat with doctors, and uses machine learning to help analyze ECGs, X-rays, and skin conditions. It also includes a smart chatbot, IVR-based emergency support, medicine delivery, and a tool to check eligibility for government healthcare schemes, all in one place.

3. [Whisplore](https://github.com/Sachinn-64/Whisplore): It is a React Native app for discovering and sharing hidden gems in Gwalior like romantic rooftops, serene lakesides, and creative street corners. Users can explore spots via a GPS-based interactive map, upload photos, and rate places on vibe, safety, and uniqueness. Built with Node.js, MongoDB, and Cloudinary.


### JavaScript experience

JavaScript was one of the few things that just felt easy while learning. It made sense quickly, and I found myself actually enjoying working with it. I've used it across almost all my projects, from building robust APIs to full-stack applications. Going deeper into things like the V8 engine and contributing to open-source projects, including stdlib, only strengthened my foundation.

Async/await and Promises are among my preferred features, as they simplify asynchronous logic and make the code feel much more readable.

One limitation I’ve encountered is the lack of type safety. This can allow subtle bugs to slip in, but TypeScript addresses this effectively and has become a part of my workflow.

### Node.js experience

I’ve spent a good amount of time working with Node.js, mainly building backend systems and APIs that actually get used in real-world scenarios. I started with small projects, but over time they turned into more serious work, where I had to think beyond just getting things to work.

Some of the systems I’ve built handle 1000+ monthly users, which pushed me to focus more on writing reliable code, thinking about performance, and designing APIs that can scale without breaking under load. That shift from just building features to building systems that hold up in production has been a really valuable learning experience for me.

### C/Fortran experience

I picked up C, which gave me a strong foundation in problem-solving, data structures, and core concepts. Working with C also introduced me to areas like computer graphics, helping me understand low-level programming more deeply. I’ve been able to apply this knowledge in my contributions to stdlib, especially in stats-related modules.

I’m currently exploring Fortran and building a solid foundation, with the goal of improving further through future contributions.

### Interest in stdlib

I was naturally drawn to stdlib as its tech stack closely aligns with my background. Around the same time, a friend introduced me to the ecosystem, which encouraged me to explore it further. While doing so, I also came across a [podcast](https://www.inspiringcomputing.com/2107763/episodes/16662797-exploring-stdlib-javascript-s-answer-to-technical-computing) that sparked my interest in the project’s broader vision.

What stood out to me is the idea of bringing scientific computing into JavaScript, extending it beyond typical web use into something much more powerful.

I found the ndarray support really interesting, since it makes working with multidimensional data much more efficient, something you don’t usually see in JavaScript. I also liked the built-in REPL with integrated help and examples. It makes experimenting and learning feel very natural, almost like having a small interactive numerical environment right inside JavaScript.

I also appreciate how active and supportive the maintainers are, and how structured and well-maintained the codebase is.

### Version control

Yes

### Contributions to stdlib

Here are all the contributions I’ve made so far:

**[Merged Work](https://github.com/stdlib-js/stdlib/pulls?q=is%3Apr+author%3ASachinn-64+is%3Amerged+)** 

- Added C and JS implementations to `stats/strided` routines
- Added JS implementations across `stats/*` modules
- Added JS implementations for `stats/base/ndarray`
- Added JS implementations for `stats/array`
- Implemented **`number/float64/base/sub3`** and **`complex/float32/base/add3`**
- Added tests and assertions for plot-related modules
- Refactored setters in the plot module
- Improved and updated documentation across `stats/*`
- Performed cleanup, refactoring, and fixed inconsistencies

[Open work](https://github.com/stdlib-js/stdlib/pulls/Sachinn-64)

- I have a few pull requests open—some are under review, and a few need changes.

[Reviewed](https://github.com/stdlib-js/stdlib/pulls?q=is%3Apr+involves%3ASachinn-64+-author%3ASachinn-64) Work

- I’ve also contributed by reviewing a few pull requests.

### stdlib showcase

[stdlib-data-playground](https://github.com/Sachinn-64/stdlib-showcase): I built a data playground web app where users can upload CSV, TSV, or JSON datasets, preview the data, and instantly compute numeric summaries using stdlib-js. It also lets you create interactive Vega-Lite charts, along with a Noisy Data Explorer that visualizes sine waves with Gaussian noise and moving mean smoothing rendered as a scatter plot.

List of stdlib packages used: 

- `@stdlib/stats/base/mean`
- `@stdlib/stats/base/stdev`
- `@stdlib/stats/base/min`
- `@stdlib/stats/base/max`
- `@stdlib/random/base/normal`
- `@stdlib/ndarray/array`
- `@stdlib/stats/incr/mmean`

The app is built with React, TypeScript, and an Express backend. Summary statistics such as mean, median, and standard deviation are computed with stdlib-js, and the charts produced by the Noisy Data Explorer can be downloaded as PNG, SVG, or JSON via Vega-Lite.

Note: The project is deployed at [stdlib-showcase](https://stdlib-showcase-frontend.vercel.app/). However, I recommend running it locally, since the deployment uses free-tier services.

### Goals

**Abstract**

My goal is to implement the remaining Vega components required to programmatically generate commonly used chart types such as bar, line, scatter, histogram, and column charts. This includes building core elements like the value-reference system, essential mark types, a complete legend system, signal bindings for interactivity, layouts, and key data transforms such as binning, aggregation, filtering, and stacking. The end objective is to enable the transformation of raw data into complete Vega specifications for these charts using stdlib constructors.

**Main Goal**

- Implement the complete **value-reference system** that maps data fields to visual properties, which is the foundation every mark encoding depends on.
- Implement all **11 specialized Vega mark types** (`rect`, `line`, `symbol`, `rule`, `text`, `area`, `arc`, `path`, `shape`, `image`, `trail`), each inheriting from the existing `mark/base/ctor`, to enable bar charts, column charts, line charts, scatter plots, and histograms.
- Implement a complete **legend system** (`legend/ctor` plus type and orientation enumerations) with full `toJSON()` support, needed for scatter plots and multi-series charts.
- Implement the **essential data transforms**: `bin`, `aggregate`, and `extent` (histograms), `stack` (stacked bar/column), `collect`, `filter`, and `formula`, all inheriting from a new `transform/base/ctor`.
- Implement **`plot/vega/layout/ctor`** — Grid layout for group mark composition.
- Implement **signal bindings** (`signal/bind/ctor`) for interactive chart elements.
- Write **end-to-end chart examples** for all target chart types (bar, column, line, scatter, histogram) demonstrating the full pipeline from data to Vega specification using only stdlib constructors.
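The inheritance pattern behind the mark goals can be sketched with plain ES2015 classes. `BaseMark` and `RectMark` below are hypothetical stand-ins and do not reflect the actual `mark/base/ctor` API; the only thing taken from Vega itself is that a mark definition is a JSON object with a `type` field plus optional `name`, `from`, and `encode` fields.

```javascript
'use strict';

// Hypothetical stand-in for a base mark constructor (NOT the actual stdlib API).
class BaseMark {
  constructor( type, options ) {
    const opts = options || {};
    this.type = type;
    this.name = opts.name;
    this.from = opts.from;
    this.encode = opts.encode;
  }

  // Serialize to a plain Vega mark definition, omitting unset fields.
  toJSON() {
    const out = { 'type': this.type };
    for ( const key of [ 'name', 'from', 'encode' ] ) {
      if ( this[ key ] !== void 0 ) {
        out[ key ] = this[ key ];
      }
    }
    return out;
  }
}

// A specialized mark only fixes the `type`; everything else is inherited.
class RectMark extends BaseMark {
  constructor( options ) {
    super( 'rect', options );
  }
}

const mark = new RectMark({
  'name': 'bars',
  'from': { 'data': 'table' },
  'encode': {
    'enter': {
      'x': { 'scale': 'xscale', 'field': 'category' },
      'width': { 'scale': 'xscale', 'band': 1 },
      'y': { 'scale': 'yscale', 'field': 'amount' },
      'y2': { 'scale': 'yscale', 'value': 0 }
    }
  }
});

// JSON.stringify invokes toJSON(), so the output is a plain Vega mark object.
console.log( JSON.stringify( mark, null, 2 ) );
```

Keeping serialization in the base class means each specialized mark only has to add its own properties and validation.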

**Supporting Goals**

- Implement corresponding **assertions** for all new constructors: `is-value-reference`, `is-field-value`, `is-color-value`, `is-gradient-value`, `is-legend`, `is-legend-type`, `is-transform`, `is-binding`, etc.
- Add tests, benchmarks, and examples to each new module.
- Add documentation for each new module.
- Improve test coverage and add benchmarks/examples for **existing** `plot/vega` packages.
- If time permits, contribute documentation improvements for existing plot packages.
- If time permits, I would also like to explore building a Plot CLI.

**_Note_**: Both the main and supporting goals can progress in parallel, with the main goals remaining the primary focus.
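To give a feel for the assertion utilities listed above, here is a minimal sketch of what an `is-value-reference` style predicate could check. It operates on plain objects rather than constructor instances, and the accepted key list is an assumption built from the base and field properties named in this proposal; the real assertion may well test something different.

```javascript
'use strict';

// Assumed set of recognized keys (taken from the property lists in this proposal).
const KNOWN_KEYS = [
  'value', 'signal', 'field', 'scale', 'band',
  'exponent', 'mult', 'offset', 'round'
];

// Hypothetical predicate: true for a non-empty plain object using only known keys.
function isValueReference( v ) {
  if ( v === null || typeof v !== 'object' || Array.isArray( v ) ) {
    return false;
  }
  const keys = Object.keys( v );
  if ( keys.length === 0 ) {
    return false;
  }
  return keys.every( ( k ) => KNOWN_KEYS.includes( k ) );
}

console.log( isValueReference( { 'scale': 'xscale', 'field': 'category' } ) );
// => true

console.log( isValueReference( { 'colour': 'red' } ) );
// => false
```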

**Approach**

I’ll begin with **value references**, since they act as the bridge between raw data and visual output. Every visual property — like position, size, or color — depends on them, so building this layer first makes everything that follows much easier to structure.
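A simplified sketch of how a value reference turns a datum into a visual value may help here. The `resolve` function is hypothetical and exists only for illustration. It follows my reading of the Vega documentation, where the scale is applied first, then `exponent`, `mult`, `offset`, and `round`; it ignores signals, `band`, and nested references, and it treats scales as plain functions.

```javascript
'use strict';

// Hypothetical resolver: NOT part of stdlib or Vega, illustration only.
function resolve( ref, datum, scales ) {
  // Start from a data field lookup or a literal value.
  let v = ( ref.field !== void 0 ) ? datum[ ref.field ] : ref.value;

  // Apply the named scale (modelled here as a plain function).
  if ( ref.scale !== void 0 ) {
    v = scales[ ref.scale ]( v );
  }
  // Post-scale modifiers, applied in order.
  if ( ref.exponent !== void 0 ) {
    v = Math.pow( v, ref.exponent );
  }
  if ( ref.mult !== void 0 ) {
    v *= ref.mult;
  }
  if ( ref.offset !== void 0 ) {
    v += ref.offset;
  }
  if ( ref.round ) {
    v = Math.round( v );
  }
  return v;
}

// A toy linear scale mapping the domain [0, 100] to the pixel range [200, 0].
const scales = {
  'yscale': ( x ) => 200 - ( 2 * x )
};
const datum = { 'category': 'A', 'amount': 28 };

console.log( resolve( { 'scale': 'yscale', 'field': 'amount' }, datum, scales ) );
// => 144

console.log( resolve( { 'value': 10, 'mult': 2, 'offset': 0.4, 'round': true }, datum, scales ) );
// => 20
```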

Next, I’ll move on to **marks**, which define how data is actually drawn. Once marks are in place, I can then build **legends and layouts**, as they depend on marks to organize and explain the visuals.
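The relationship between marks and legends can be illustrated with plain Vega JSON for a scatter plot fragment. No stdlib constructors are involved, and the data set name `points` and its fields `u`, `v`, and `group` are made up for the example. The `missingScales` helper is likewise hypothetical; it only shows the kind of cross-component consistency check that becomes possible once marks and legends exist as structured objects.

```javascript
'use strict';

const fragment = {
  'scales': [
    { 'name': 'x', 'type': 'linear', 'domain': { 'data': 'points', 'field': 'u' }, 'range': 'width' },
    { 'name': 'y', 'type': 'linear', 'domain': { 'data': 'points', 'field': 'v' }, 'range': 'height' },
    { 'name': 'color', 'type': 'ordinal', 'domain': { 'data': 'points', 'field': 'group' }, 'range': 'category' }
  ],
  'marks': [
    {
      'type': 'symbol',
      'from': { 'data': 'points' },
      'encode': {
        'enter': {
          'x': { 'scale': 'x', 'field': 'u' },
          'y': { 'scale': 'y', 'field': 'v' },
          'fill': { 'scale': 'color', 'field': 'group' },
          'size': { 'value': 64 }
        }
      }
    }
  ],
  'legends': [
    { 'type': 'symbol', 'fill': 'color', 'title': 'Group', 'orient': 'right' }
  ]
};

// Hypothetical helper: scale names referenced by marks or legends but never defined.
function missingScales( spec ) {
  const defined = new Set( spec.scales.map( ( s ) => s.name ) );
  const used = new Set();
  for ( const mark of spec.marks ) {
    for ( const ref of Object.values( mark.encode.enter ) ) {
      if ( ref.scale !== void 0 ) {
        used.add( ref.scale );
      }
    }
  }
  for ( const legend of spec.legends ) {
    // In Vega, these legend properties hold the name of the scale to visualize.
    for ( const channel of [ 'fill', 'stroke', 'size', 'shape', 'opacity' ] ) {
      if ( legend[ channel ] !== void 0 ) {
        used.add( legend[ channel ] );
      }
    }
  }
  return Array.from( used ).filter( ( name ) => !defined.has( name ) );
}

console.log( missingScales( fragment ) );
// => []
```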

Finally, I’ll implement **data transforms**. Since transforms operate on data before it reaches the rendering layer, they can be added once the visual pipeline is stable.
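To show what the transform constructors ultimately need to serialize, the sketch below first lists a histogram pipeline in plain Vega JSON and then emulates its effect in ordinary JavaScript. The emulation is deliberately simplified: it uses a fixed bin step, whereas Vega derives a step from parameters such as `maxbins`, and the helper names `extent`, `bin`, and `countByBin` are hypothetical.

```javascript
'use strict';

// Declarative form: the transform array a histogram data set would carry.
const transforms = [
  { 'type': 'extent', 'field': 'v', 'signal': 'ext' },
  { 'type': 'bin', 'field': 'v', 'extent': { 'signal': 'ext' }, 'maxbins': 10 },
  { 'type': 'aggregate', 'groupby': [ 'bin0', 'bin1' ], 'fields': [ 'bin0' ], 'ops': [ 'count' ], 'as': [ 'count' ] }
];

// Simplified emulation of each step (illustration only).
function extent( data, field ) {
  const values = data.map( ( d ) => d[ field ] );
  return [ Math.min( ...values ), Math.max( ...values ) ];
}

function bin( data, field, start, step ) {
  return data.map( ( d ) => {
    const bin0 = start + ( Math.floor( ( d[ field ] - start ) / step ) * step );
    return Object.assign( {}, d, { 'bin0': bin0, 'bin1': bin0 + step } );
  });
}

function countByBin( data ) {
  const groups = new Map();
  for ( const d of data ) {
    const g = groups.get( d.bin0 ) || { 'bin0': d.bin0, 'bin1': d.bin1, 'count': 0 };
    g.count += 1;
    groups.set( d.bin0, g );
  }
  return Array.from( groups.values() );
}

const data = [ 1, 2, 2, 3, 7, 8 ].map( ( v ) => ( { 'v': v } ) );
const step = 5; // fixed step (assumption)
const ext = extent( data, 'v' );
const start = Math.floor( ext[ 0 ] / step ) * step;
const histogram = countByBin( bin( data, 'v', start, step ) );

console.log( transforms.map( ( t ) => t.type ).join( ' -> ' ) );
// => extent -> bin -> aggregate

console.log( histogram );
// => [ { bin0: 0, bin1: 5, count: 4 }, { bin0: 5, bin1: 10, count: 2 } ]
```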

While planning, I also analyzed common chart types and identified shared dependencies between them. For instance, the **rect mark** and **aggregate transform** are used across bar, column, and histogram charts. By implementing these early, I can unlock multiple chart types together, making the development process more efficient and avoiding redundant work.

_Below, I’ve provided a table that outlines which modules are required for each chart type, giving a clear view of dependencies and implementation priorities._


Chart | Marks to Implement | Transforms to Implement | Other Modules
-- | -- | -- | --
Bar Chart | mark/rect/ctor | transform/aggregate/ctor, transform/stack/ctor | legend/ctor, layout/ctor, value/color/ctor
Column Chart | mark/rect/ctor | transform/aggregate/ctor | layout/ctor
Line Chart | mark/line/ctor | transform/collect/ctor | value/field/ctor
Scatter Plot | mark/symbol/ctor | transform/filter/ctor, transform/loess/ctor | legend/ctor, value/gradient/ctor
Histogram | mark/rect/ctor | transform/bin/ctor, transform/aggregate/ctor, transform/extent/ctor | value/base/ctor
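The prioritization argument can be checked mechanically. The sketch below encodes the table above as a plain object and ranks modules by how many chart types need them; the `rankModules` helper is hypothetical and exists only for this illustration.

```javascript
'use strict';

// The dependency table, transcribed row by row.
const requirements = {
  'Bar Chart': [ 'mark/rect/ctor', 'transform/aggregate/ctor', 'transform/stack/ctor', 'legend/ctor', 'layout/ctor', 'value/color/ctor' ],
  'Column Chart': [ 'mark/rect/ctor', 'transform/aggregate/ctor', 'layout/ctor' ],
  'Line Chart': [ 'mark/line/ctor', 'transform/collect/ctor', 'value/field/ctor' ],
  'Scatter Plot': [ 'mark/symbol/ctor', 'transform/filter/ctor', 'transform/loess/ctor', 'legend/ctor', 'value/gradient/ctor' ],
  'Histogram': [ 'mark/rect/ctor', 'transform/bin/ctor', 'transform/aggregate/ctor', 'transform/extent/ctor', 'value/base/ctor' ]
};

// Count how many chart types depend on each module, most shared first.
function rankModules( reqs ) {
  const counts = {};
  for ( const modules of Object.values( reqs ) ) {
    for ( const name of modules ) {
      counts[ name ] = ( counts[ name ] || 0 ) + 1;
    }
  }
  return Object.keys( counts )
    .map( ( name ) => ( { 'name': name, 'charts': counts[ name ] } ) )
    .sort( ( a, b ) => ( b.charts - a.charts ) || a.name.localeCompare( b.name ) );
}

const ranked = rankModules( requirements );

for ( const entry of ranked.filter( ( e ) => e.charts > 1 ) ) {
  console.log( entry.name + ': ' + entry.charts );
}
// => mark/rect/ctor: 3
// => transform/aggregate/ctor: 3
// => layout/ctor: 2
// => legend/ctor: 2
```

The two modules shared by three chart types are the ones worth landing first.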




### Why this project?

I became interested in stdlib’s plot work while exploring the plot/vega module, where I saw the potential to build Vega visualizations programmatically without writing raw JSON. This approach really stood out to me, as it brings a level of structured, programmatic visualization to JavaScript that isn’t commonly seen in the ecosystem.

What excites me most is the practicality: charts like bar, line, scatter, and histograms are used everywhere, and each component directly enables these use cases. It’s also rewarding to see the output of this work directly, whether it’s a chart rendering on the screen or being used by real users.

Through my contributions so far, working on the axis constructor, adding tests, and getting familiar with the Vega spec, I’ve built a solid grasp of the codebase. This project feels like a natural next step to build on that, contribute something meaningful, and keep learning.

### Qualifications

This project demands a solid grasp of JavaScript, the Vega visualization grammar, and stdlib’s internal architecture, all of which I’ve been developing through practical contributions and in-depth exploration of the Vega ecosystem.

While working with the plot/vega module, I spent time studying the Vega specification in depth—understanding value references, different mark types, legends, and the transform system. I also mapped how these pieces come together for common chart types, which helped me connect the Vega JSON structure with stdlib’s constructor-based approach.

I’m comfortable with JavaScript and Node.js, with experience in object-oriented design, event-driven architectures, and writing modular, well-tested code, which prepares me to contribute effectively to this project.

### Prior art

The plot/vega module in stdlib already has a strong foundation, with components like the spec builder, signals, scales, axes, base marks, and data constructors in place. This significantly reduces the initial workload. However, key pieces required to generate complete charts, such as value references, specialized mark types, legends, and data transforms, are still missing. This project focuses on bridging that gap to enable end-to-end support for common chart types.

This area has been explored in other ecosystems through tools like [Altair](https://altair-viz.github.io/) and [Observable Plot](https://observablehq.com/plot/). However, these either abstract away the Vega specification or do not produce portable Vega outputs.

stdlib takes a different approach by closely mirroring the Vega grammar through modular constructors, allowing fine-grained, programmatic control over visualization building in JavaScript. Extending this to full chart support would make it a unique offering in the JS ecosystem.

### Commitment

Given my familiarity with the codebase and understanding of the proposed work, I plan to start contributing even before the official coding period. During the community bonding phase, I will engage with mentors, explore the codebase, and begin initial tasks.

I have exams from May 17th to May 30th, during which I will dedicate 25 hours per week. After this period, I will increase my involvement to 40 hours per week. Post midterm evaluation, I will continue contributing at a steady pace of 30 hours per week.

I intend to remain actively involved with the community and continue contributing beyond the GSoC period.

### Schedule

Assuming a 12-week schedule:

- **Community Bonding Period**: 
    - Implement **`plot/vega/value/base/ctor`**: Base value-reference class.
        - Properties: `value`, `signal`, `scale`, `band`, `exponent`, `mult`, `offset`, `round`.
    - Implement **`plot/vega/value/field/ctor`**: Per-datum field lookups.
        - Properties: `field`, `datum`, `group`, `parent`, `level`.
    - Implement assertions: `is-value-reference`, `is-field-value`.
    - **Existing module contributions:** additional tests for `signal/ctor`, `scale/base/ctor`, and `mark/base/ctor`.
    - **Tests & docs** for both constructors.

- **Week 1**: Value-Reference: ColorValue + GradientValue
    - Implement **`plot/vega/value/color/ctor`**: Per-channel color specification.
        - Color spaces: RGB, HSL, LAB, HCL.
    - Implement **`plot/vega/value/gradient/ctor`**: Scale-based gradient fill specification.
    - Implement assertions: `is-color-value`, `is-gradient-value`, `is-color-space`.

- **Week 2**: Core Marks: Rect + Line + Symbol + Path
    - Implement **`plot/vega/mark/rect/ctor`** — Bar charts, column charts, histograms.
    - Implement **`plot/vega/mark/line/ctor`** — Line charts.
    - Implement **`plot/vega/mark/symbol/ctor`** — Scatter plots.
    - Implement **`plot/vega/mark/path/ctor`** — Arbitrary SVG paths.

- **Week 3**: Supporting Marks: Rule + Text + Area + Arc + Shape
    - Implement **`plot/vega/mark/text/ctor`**: Labels, titles, annotations.
    - Implement **`plot/vega/mark/area/ctor`**: Filled area charts.
    - Implement **`plot/vega/mark/rule/ctor`**: Axis ticks, grid lines, reference lines.
    - Implement **`plot/vega/mark/arc/ctor`**: Pie/donut charts.
    - Implement **`plot/vega/mark/shape/ctor`**: Cartographic marks.

- **Week 4**: Legend Constructor
    - Implement **`plot/vega/legend/ctor`** — Full legend constructor.
    - Implement **`plot/vega/legend/types`** — Enum: `symbol`, `gradient`, `discrete`.
    - Implement **`plot/vega/legend/orientations`** — Enum: `left`, `right`, `top`, `bottom`, etc.

- **Week 5**: Legend Completion + Assertions
    - Implement legend `toJSON()`, edge cases, builder integration.
    - Implement assertions: `is-legend`, `is-legend-array`, `is-legend-type`, `is-legend-orientation`, `is-encode`, `is-production-rule`, `is-expression`.

- **Week 6**: (midterm) Remaining Marks + Signal Binding
    - Implement **`plot/vega/mark/image/ctor`** — Embedded images.
    - Implement **`plot/vega/mark/trail/ctor`** — Variable-width lines.
    - Implement **`plot/vega/signal/bind/ctor`** — Input bindings.
    - Implement assertion: `is-binding`.

- **Week 7**: Vega Layouts
    - Implement **`plot/vega/layout/ctor`** — Grid layout for group mark composition.
    - Implement assertion: `is-layout`.

- **Week 8**: Transforms: Research + Base Constructor
    - Implement **`plot/vega/transform/base/ctor`** — Base transform class.
    - Implement **`plot/vega/transform/stack/ctor`**: Stacked bar/column layouts.
    - Implement assertions: `is-transform`, `is-transform-array`.

- **Week 9**: Transforms: Bin + Collect + Extent
    - Implement **`plot/vega/transform/bin/ctor`** — Numeric binning for histograms.
    - Implement **`plot/vega/transform/collect/ctor`** — Sort data streams for line charts.
    - Implement **`plot/vega/transform/extent/ctor`** — Min/max computation.

- **Week 10**: Chart Type Integration (Bar + Column)
    - Wire up & validate working bar chart spec using builder + rect mark + band scale + axes.
    - Wire up & validate working column chart spec.

- **Week 11**: Chart Type Integration (Scatter + Histogram) + Advanced Transform
     - Implement **`plot/vega/transform/sequence/ctor`** — Generate numeric sequences.
     - Wire up & validate working scatter plot spec using builder + symbol mark + linear scales + axes.
     - Wire up & validate working histogram spec using builder + rect mark + bin transform + linear scale + axes.
              
- **Week 12**: Final Submission + Remaining Transforms + Documentation
     - Write documentation for all chart type examples.
     - Final documentation review ensuring all README files, JSDoc comments, and type definitions are complete.
     - Create a tracking issue documenting any remaining work for post-GSoC continuation.

- IF TIME PERMITS :) 
     - Implement advanced transforms such as:
         - `plot/vega/transform/density/ctor` — Probability distributions
         - `plot/vega/transform/loess/ctor` — Scatterplot smoothing
         - `plot/vega/transform/heatmap/ctor` — Heatmap images
         - `plot/vega/transform/cross/ctor` — Cross-product
         - `plot/vega/transform/isocontour/ctor` — Contour lines

**_Extras_**: If I get blocked while mentors are reviewing PRs, I’ll shift focus to other productive work, such as contributing to documentation or exploring high-priority tasks.

- **Post GSoC**: I intend to stay active in the stdlib community after the program ends. The tracking issue created in Week 12 will outline the remaining work. I'll continue picking these items up as a regular contributor and help with reviews and documentation for the existing plot packages. I also plan to work on the Plot CLI.
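To make the Week 10 integration milestone concrete, the sketch below shows the kind of plain Vega specification that the bar chart example should ultimately emit. It is written as a raw object literal rather than with stdlib constructors, the `barChartSpec` helper and the `barColor` signal are hypothetical names chosen for this illustration, and the structure follows the standard Vega bar chart layout (band scale on x, linear scale on y, one `rect` mark).

```javascript
'use strict';

// Hypothetical helper: wraps raw values in a complete Vega bar chart spec.
function barChartSpec( values ) {
  return {
    '$schema': 'https://vega.github.io/schema/vega/v5.json',
    'width': 400,
    'height': 200,
    'padding': 5,
    // Signal with an input binding (the Week 6 deliverable) driving the bar color.
    'signals': [
      {
        'name': 'barColor',
        'value': 'steelblue',
        'bind': { 'input': 'select', 'options': [ 'steelblue', 'firebrick', 'seagreen' ] }
      }
    ],
    'data': [
      { 'name': 'table', 'values': values }
    ],
    'scales': [
      { 'name': 'xscale', 'type': 'band', 'domain': { 'data': 'table', 'field': 'category' }, 'range': 'width', 'padding': 0.1 },
      { 'name': 'yscale', 'type': 'linear', 'domain': { 'data': 'table', 'field': 'amount' }, 'nice': true, 'range': 'height' }
    ],
    'axes': [
      { 'orient': 'bottom', 'scale': 'xscale' },
      { 'orient': 'left', 'scale': 'yscale' }
    ],
    'marks': [
      {
        'type': 'rect',
        'from': { 'data': 'table' },
        'encode': {
          'enter': {
            'x': { 'scale': 'xscale', 'field': 'category' },
            'width': { 'scale': 'xscale', 'band': 1 },
            'y': { 'scale': 'yscale', 'field': 'amount' },
            'y2': { 'scale': 'yscale', 'value': 0 }
          },
          'update': {
            'fill': { 'signal': 'barColor' }
          }
        }
      }
    ]
  };
}

const spec = barChartSpec([
  { 'category': 'A', 'amount': 28 },
  { 'category': 'B', 'amount': 55 },
  { 'category': 'C', 'amount': 43 }
]);

console.log( JSON.stringify( spec, null, 2 ) );
```

A spec of this shape can be pasted into the online Vega editor for visual validation, which is one way the integration weeks could be checked.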

Notes:

- The community bonding period is a 3 week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to set up your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project proposal.
- Usually, even week 1 deliverables include some code.
- By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
- By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
- During the final week, you'll be submitting your project.


### Related issues

Here we go https://github.com/stdlib-js/google-summer-of-code/issues/8.

### Checklist

- [x] I have read and understood the [Code of Conduct](https://github.com/stdlib-js/stdlib/blob/develop/CODE_OF_CONDUCT.md).
- [x] I have read and understood the application materials found in this repository.
- [x] I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
- [x] I have read and understood the [patch requirement](https://github.com/stdlib-js/google-summer-of-code/blob/main/README.md#patch-requirement) which is necessary for my application to be considered for acceptance.
- [x] I have read and understood the [stdlib showcase requirement](https://github.com/stdlib-js/google-summer-of-code/blob/main/README.md#showcase-requirement) which is necessary for my application to be considered for acceptance.
- [x] The issue name begins with `[RFC]:` and succinctly describes your proposal.
- [x] I understand that, in order to apply to be a GSoC contributor, I must submit my final application to <https://summerofcode.withgoogle.com/> **before** the submission deadline.
