
[RFC]: Expanding the Vega engine to support the analysis of multidimensional data #206

@Sachinn-64

Full name

Sachin Pangal

University status

Yes

University name

Walchand Institute of Technology

University program

Computer Science & Engineering

Expected graduation

2027

Short biography

I’m a third-year Computer Science undergrad who likes building things and figuring out how they actually work. I mainly work with JavaScript and Node.js, and through freelancing I’ve built production-level features that made me comfortable working on real systems.

Outside of tutorial hell, freelancing, and hackathons, I’ve spent a lot of time just writing code on my own, which has helped me focus on building cleaner and more scalable codebases.

I’ve worked with JavaScript and TypeScript, along with C/C++ and Python. I’ve also explored things like WebSockets, WebRTC, Turborepo, Next.js, and some DevOps practices. Over time, I’ve developed a habit of writing modular, clean code, usually structuring things in an MVC-style way so it stays maintainable.

Timezone

Indian Standard Time (GMT+5:30)

Contact details

email: sachinprogramming62@gmail.com, github: Sachinn-64

Platform

Mac

Editor

I prefer using VS Code because it’s clean and minimal, which helps me stay focused without unnecessary distractions. At the same time, it has a huge ecosystem of extensions, so I can easily customize it based on whatever I’m working on.

Programming experience

I started coding around the age of 14, with Python as my first language. That early exposure got me curious about how things work under the hood. Later, I picked up C, which gave me a strong foundation in problem-solving. From there, I gradually moved into full-stack development, working with JavaScript and TypeScript to build end-to-end applications.

Here are some of the projects I’ve worked on:

  1. Trinetra: An AI-powered surveillance and crowd management system for large-scale events like Mahakumbh. It uses facial recognition, real-time GPS tracking, crowd density heatmaps, and disaster prediction to ensure safety. The platform includes a mobile app (SOS, ambulance booking, offline navigation) and an admin dashboard for monitoring and emergency coordination. Built with React Native, FastAPI, YOLO, FaceNet, Kafka, and Docker, it’s designed to scale to a large number of users.

  2. CureConnect: It is an AI-powered telemedicine platform designed for rural and low-bandwidth areas. It enables video consultations via WebRTC, offline chat with doctors, and uses machine learning to help analyze ECGs, X-rays, and skin conditions. It also includes a smart chatbot, IVR-based emergency support, medicine delivery, and a tool to check eligibility for government healthcare schemes, all in one place.

  3. Whisplore: It is a React Native app for discovering and sharing hidden gems in Gwalior like romantic rooftops, serene lakesides, and creative street corners. Users can explore spots via a GPS-based interactive map, upload photos, and rate places on vibe, safety, and uniqueness. Built with Node.js, MongoDB, and Cloudinary.

JavaScript experience

JavaScript was one of the few things that just felt easy while learning. It made sense quickly, and I found myself actually enjoying working with it. I've used it across almost all my projects, from building robust APIs to full-stack applications. Going deeper into things like the V8 engine and contributing to open-source projects, including stdlib, only strengthened my foundation.

Async/await and Promises are among my preferred features, as they simplify asynchronous logic and make the code much more readable.

One limitation I’ve encountered is the lack of type safety. This can allow subtle bugs to slip in, but TypeScript addresses this effectively and has become a part of my workflow.

Node.js experience

I’ve spent a good amount of time working with Node.js, mainly building backend systems and APIs that actually get used in real-world scenarios. I started with small projects, but over time they turned into more serious work, where I had to think beyond just getting things to work.

Some of the systems I’ve built handle 1000+ monthly users, which pushed me to focus more on writing reliable code, thinking about performance, and designing APIs that can scale without breaking under load. That shift from just building features to building systems that hold up in production has been a really valuable learning experience for me.

C/Fortran experience

I picked up C, which gave me a strong foundation in problem-solving, data structures, and core concepts. Working with C also introduced me to areas like computer graphics, helping me understand low-level programming more deeply. I’ve been able to apply this knowledge in my contributions to stdlib, especially in stats-related modules.

I’m currently exploring Fortran and building a solid foundation, with the goal of improving further through future contributions.

Interest in stdlib

I was naturally drawn to stdlib as its tech stack closely aligns with my background. Around the same time, a friend introduced me to the ecosystem, which encouraged me to explore it further. While doing so, I also came across a podcast that sparked my interest in the project’s broader vision.

What stood out to me is the idea of bringing scientific computing into JavaScript, extending it beyond typical web use into something much more powerful.

I found the ndarray support really interesting, since it makes working with multidimensional data much more efficient, something you don’t usually see in JavaScript. I also liked the built-in REPL with integrated help and examples. It makes experimenting and learning feel very natural, almost like having a small interactive numerical environment right inside JavaScript.

I also appreciate how active and supportive the maintainers are, and how structured and well-maintained the codebase is.

Version control

Yes

Contributions to stdlib

Here are all the contributions I’ve made so far:

Merged Work

  • Added C and JS implementations to stats/strided routines
  • Added JS implementations across stats/* modules
  • Added JS implementations for stats/base/ndarray
  • Added JS implementations for stats/array
  • Implemented number/float64/base/sub3 and complex/float32/base/add3
  • Added tests and assertions for plot-related modules
  • Refactored setters in the plot module
  • Improved and updated documentation across stats/*
  • Performed cleanup, refactoring, and fixed inconsistencies

Open work

  • I have a few pull requests open—some are under review, and a few need changes.

Reviewed Work

  • I’ve also contributed by reviewing a few pull requests.

stdlib showcase

stdlib-data-playground: I built a data playground web app where users can upload CSV, TSV, or JSON datasets, preview the data, and instantly compute numeric summaries using stdlib-js. It also lets users create interactive Vega-Lite charts and includes a Noisy Data Explorer that visualizes sine waves with Gaussian noise and moving-mean smoothing, rendered as a scatter plot.

List of stdlib packages used:

  • @stdlib/stats/base/mean
  • @stdlib/stats/base/stdev
  • @stdlib/stats/base/min
  • @stdlib/stats/base/max
  • @stdlib/random/base/normal
  • @stdlib/ndarray/array
  • @stdlib/stats/incr/mmean

The app is built with React, TypeScript, and an Express backend. The numeric summaries include the mean, median, and standard deviation, all computed using stdlib-js. The Noisy Data Explorer generates sine-wave data with Gaussian noise, applies smoothing, and lets you visualize the result and download it as PNG, SVG, or JSON via Vega-Lite.
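The Noisy Data Explorer idea can be sketched in a few lines. This is a minimal, dependency-free illustration and not the showcase's actual code: the helper names (`mulberry32`, `gaussian`, `movingMean`, `noisySine`), the window size, and the noise level are all assumptions. In the showcase, `@stdlib/random/base/normal` and `@stdlib/stats/incr/mmean` would take the place of the hand-written noise generator and moving mean.

```javascript
// Small seeded PRNG (mulberry32) so that the sketch is reproducible.
function mulberry32( seed ) {
	let a = seed >>> 0;
	return function rand() {
		a = ( a + 0x6D2B79F5 ) >>> 0;
		let t = a;
		t = Math.imul( t ^ ( t >>> 15 ), t | 1 );
		t ^= t + Math.imul( t ^ ( t >>> 7 ), t | 61 );
		return ( ( t ^ ( t >>> 14 ) ) >>> 0 ) / 4294967296;
	};
}

// Box-Muller transform: turns two uniform draws into one normal draw.
function gaussian( rand, mu, sigma ) {
	const u1 = 1.0 - rand(); // in (0,1], which avoids log(0)
	const u2 = rand();
	const z = Math.sqrt( -2.0 * Math.log( u1 ) ) * Math.cos( 2.0 * Math.PI * u2 );
	return mu + ( sigma * z );
}

// Trailing moving mean; until the window fills, average the values seen so far.
function movingMean( values, W ) {
	const out = [];
	let sum = 0.0;
	for ( let i = 0; i < values.length; i++ ) {
		sum += values[ i ];
		if ( i >= W ) {
			sum -= values[ i - W ];
		}
		out.push( sum / Math.min( i + 1, W ) );
	}
	return out;
}

// Sample two periods of a sine wave and add Gaussian noise to each point.
function noisySine( n, sigma, seed ) {
	const rand = mulberry32( seed );
	const out = [];
	for ( let i = 0; i < n; i++ ) {
		const x = ( 4.0 * Math.PI * i ) / ( n - 1 );
		out.push( { x: x, y: Math.sin( x ) + gaussian( rand, 0.0, sigma ) } );
	}
	return out;
}

const points = noisySine( 200, 0.3, 42 );
const smoothed = movingMean( points.map( ( p ) => p.y ), 15 );
```

The `points` and `smoothed` arrays are the two series a scatter plot of this kind would display.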

Note: This is the deployed URL of the project stdlib-showcase. However, I recommend running it locally, as it’s deployed using free-tier services.

Goals

Abstract

My goal is to implement the remaining Vega components required to programmatically generate commonly used chart types such as bar, line, scatter, histogram, and column charts. This includes building core elements like the value-reference system, essential mark types, a complete legend system, signal bindings for interactivity, layouts, and key data transforms such as binning, aggregation, filtering, and stacking. The end objective is to enable the transformation of raw data into complete Vega specifications for these charts using stdlib constructors.

Main Goal

  • Implement the complete value-reference system that maps data fields to visual properties, which is the foundation every mark encoding depends on.
  • Implement all 11 specialized Vega mark types (rect, line, symbol, rule, text, area, arc, path, shape, image, trail), each inheriting from the existing mark/base/ctor, to enable bar charts, column charts, line charts, scatter plots, and histograms.
  • Implement a complete legend system (legend/ctor plus type and orientation enumerations) with full toJSON() support, needed for scatter plots and multi-series charts.
  • Implement the essential data transform types: bin, aggregate, and extent (histograms), stack (stacked bar/column), collect, filter, and formula, all inheriting from a new transform/base/ctor.
  • Implement plot/vega/layout/ctor — Grid layout for group mark composition.
  • Implement signal bindings (signal/bind/ctor) for interactive chart elements.
  • Write end-to-end chart examples for all target chart types (bar, column, line, scatter, histogram) demonstrating the full pipeline from data to Vega specification using only stdlib constructors.
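To make the end goal concrete, here is a sketch of the kind of Vega specification the bar chart pipeline should ultimately emit. It is written as a plain object and does not use stdlib's constructors; the function name `barChartSpec`, the dataset name `table`, and the field names `category` and `amount` are assumptions made for illustration.

```javascript
// Build a minimal Vega (v5) bar chart specification as a plain object.
function barChartSpec( values ) {
	return {
		'$schema': 'https://vega.github.io/schema/vega/v5.json',
		'width': 400,
		'height': 200,
		'padding': 5,
		'data': [
			{ 'name': 'table', 'values': values }
		],
		'scales': [
			{
				'name': 'xscale',
				'type': 'band',
				'domain': { 'data': 'table', 'field': 'category' },
				'range': 'width',
				'padding': 0.05
			},
			{
				'name': 'yscale',
				'type': 'linear',
				'domain': { 'data': 'table', 'field': 'amount' },
				'nice': true,
				'range': 'height'
			}
		],
		'axes': [
			{ 'orient': 'bottom', 'scale': 'xscale' },
			{ 'orient': 'left', 'scale': 'yscale' }
		],
		'marks': [
			{
				'type': 'rect',
				'from': { 'data': 'table' },
				'encode': {
					'enter': {
						// Each entry below is a value reference.
						'x': { 'scale': 'xscale', 'field': 'category' },
						'width': { 'scale': 'xscale', 'band': 1 },
						'y': { 'scale': 'yscale', 'field': 'amount' },
						'y2': { 'scale': 'yscale', 'value': 0 }
					}
				}
			}
		]
	};
}

const barSpec = barChartSpec([
	{ 'category': 'A', 'amount': 28 },
	{ 'category': 'B', 'amount': 55 },
	{ 'category': 'C', 'amount': 43 }
]);
```

Every object under `encode.enter` is a value reference, which is why that system comes first in the list of goals.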

Supporting Goals

  • Implement corresponding assertions for all new constructors: is-value-reference, is-field-value, is-color-value, is-gradient-value, is-legend, is-legend-type, is-transform, is-binding, etc.
  • Add tests, benchmarks, and examples to each new module.
  • Add documentation for each new module.
  • Improve test coverage and add benchmarks/examples for existing plot/vega packages.
  • If time permits, contribute documentation improvements for existing plot packages.
  • If time permits, I would also like to explore building a Plot CLI.

Note: Both the main and supporting goals can progress in parallel, while keeping the main goals as the primary focus.
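As a rough idea of what the enumeration-backed assertions could look like, the sketch below builds membership predicates from lists of allowed values. The helper `enumPredicate` is hypothetical and is not an existing stdlib function. The legend types are the ones named in the schedule, and the orientation list is my reading of the values accepted by Vega's legend `orient` property.

```javascript
// Build a predicate which tests membership in a fixed list of allowed strings.
function enumPredicate( values ) {
	const allowed = new Set( values );
	return function isMember( v ) {
		return typeof v === 'string' && allowed.has( v );
	};
}

// Legend types as listed in the schedule (assumption: the final enum may differ).
const LEGEND_TYPES = [ 'symbol', 'gradient', 'discrete' ];

// Legend orientations (assumption: taken from Vega's legend `orient` values).
const LEGEND_ORIENTATIONS = [
	'left',
	'right',
	'top',
	'bottom',
	'top-left',
	'top-right',
	'bottom-left',
	'bottom-right',
	'none'
];

const isLegendType = enumPredicate( LEGEND_TYPES );
const isLegendOrientation = enumPredicate( LEGEND_ORIENTATIONS );
```

For example, `isLegendType( 'symbol' )` is `true`, while `isLegendOrientation( 'center' )` is `false`.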

Approach

I’ll begin with value references, since they act as the bridge between raw data and visual output. Every visual property — like position, size, or color — depends on them, so building this layer first makes everything that follows much easier to structure.
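A minimal sketch of how the value-reference layer might be shaped, assuming a base class that accepts the properties listed in the schedule and a field subclass that adds `field`. The class names, the validation rules, and the error messages are illustrative assumptions, not the final stdlib API.

```javascript
// Properties accepted by the base value reference (as listed in the schedule).
const BASE_KEYS = [ 'value', 'signal', 'scale', 'band', 'exponent', 'mult', 'offset', 'round' ];

class ValueReference {
	constructor( options, allowed ) {
		const ok = allowed || BASE_KEYS;
		this._props = {};
		for ( const key of Object.keys( options || {} ) ) {
			if ( !ok.includes( key ) ) {
				throw new TypeError( 'invalid option. Unrecognized property: `' + key + '`.' );
			}
			this._props[ key ] = options[ key ];
		}
	}

	// Return a plain object so that `JSON.stringify` emits the Vega form.
	toJSON() {
		return Object.assign( {}, this._props );
	}
}

// Hypothetical subclass for per-datum lookups: same properties plus `field`.
class FieldValue extends ValueReference {
	constructor( options ) {
		super( options, BASE_KEYS.concat( [ 'field' ] ) );
		if ( this._props.field === void 0 ) {
			throw new TypeError( 'invalid option. Must provide a `field` property.' );
		}
	}
}

const xRef = new FieldValue( { 'scale': 'xscale', 'field': 'category' } );
const xJSON = JSON.stringify( xRef );
```

Serializing `xRef` yields the same `{ "scale": ..., "field": ... }` object a hand-written Vega encoding would contain, so marks can accept these instances wherever an encoding channel is expected.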

Next, I’ll move on to marks, which define how data is actually drawn. Once marks are in place, I can then build legends and layouts, as they depend on marks to organize and explain the visuals.
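The mark layer could then follow an inheritance pattern along these lines. The `BaseMark` class here is a stand-in written for this sketch: the real interface of the existing mark/base/ctor is not shown in this proposal, so the constructor arguments and the chainable `encode` method are assumptions.

```javascript
// Stand-in base class (hypothetical; not the existing `mark/base/ctor`).
class BaseMark {
	constructor( type, options ) {
		const opts = options || {};
		this.type = type;
		this.name = opts.name;
		this.from = opts.from;
		this._encode = {};
	}

	// Merge channels into an encoding set such as `enter` or `update`.
	encode( set, channels ) {
		this._encode[ set ] = Object.assign( {}, this._encode[ set ], channels );
		return this;
	}

	toJSON() {
		const out = { 'type': this.type };
		if ( this.name !== void 0 ) {
			out.name = this.name;
		}
		if ( this.from !== void 0 ) {
			out.from = { 'data': this.from };
		}
		if ( Object.keys( this._encode ).length > 0 ) {
			out.encode = this._encode;
		}
		return out;
	}
}

// Specialized marks only fix the mark type; mark-specific channels would go here.
class RectMark extends BaseMark {
	constructor( options ) {
		super( 'rect', options );
	}
}

const bars = new RectMark( { 'name': 'bars', 'from': 'table' } )
	.encode( 'enter', {
		'x': { 'scale': 'xscale', 'field': 'category' },
		'y': { 'scale': 'yscale', 'field': 'amount' },
		'y2': { 'scale': 'yscale', 'value': 0 }
	})
	.encode( 'update', { 'fill': { 'value': 'steelblue' } } );

const barsJSON = JSON.parse( JSON.stringify( bars ) );
```

Keeping the shared serialization in the base class means each of the 11 specialized marks only needs to declare its type and validate its own channels.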

Finally, I’ll implement data transforms. Since transforms operate on data before it reaches the rendering layer, they can be added once the visual pipeline is stable.
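Transforms can reuse the same pattern: a base class owns the `type` and serialization, and each subclass validates its own parameters. The sketch below uses the stack transform as an example. The class names are hypothetical, and the allowed `offset` values are the ones I understand Vega's stack transform to accept.

```javascript
// Hypothetical base transform: stores a type and a bag of parameters.
class BaseTransform {
	constructor( type, params ) {
		this.type = type;
		this.params = Object.assign( {}, params );
	}

	// Vega transforms are flat objects: `type` followed by the parameters.
	toJSON() {
		return Object.assign( { 'type': this.type }, this.params );
	}
}

// Offsets accepted by Vega's stack transform (assumption from the Vega docs).
const STACK_OFFSETS = [ 'zero', 'center', 'normalize' ];

class StackTransform extends BaseTransform {
	constructor( params ) {
		const p = params || {};
		if ( p.offset !== void 0 && !STACK_OFFSETS.includes( p.offset ) ) {
			throw new TypeError( 'invalid option. `offset` must be one of: ' + STACK_OFFSETS.join( ', ' ) + '.' );
		}
		super( 'stack', p );
	}
}

const stack = new StackTransform( { 'groupby': [ 'category' ], 'field': 'amount' } );
const stackJSON = JSON.stringify( stack );
```

The resulting object can be placed directly in a dataset's `transform` array, ahead of the marks that read the stacked output fields.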

While planning, I also analyzed common chart types and identified shared dependencies between them. For instance, the rect mark and aggregate transform are used across bar, column, and histogram charts. By implementing these early, I can unlock multiple chart types together, making the development process more efficient and avoiding redundant work.

Below, I’ve provided a table that outlines which modules are required for each chart type, giving a clear view of dependencies and implementation priorities.

| Chart | Marks to Implement | Transforms to Implement | Other Modules |
| --- | --- | --- | --- |
| Bar Chart | mark/rect/ctor | transform/aggregate/ctor, transform/stack/ctor | legend/ctor, layout/ctor, value/color/ctor |
| Column Chart | mark/rect/ctor | transform/aggregate/ctor | layout/ctor |
| Line Chart | mark/line/ctor | transform/collect/ctor | value/field/ctor |
| Scatter Plot | mark/symbol/ctor | transform/filter/ctor, transform/loess/ctor | legend/ctor, value/gradient/ctor |
| Histogram | mark/rect/ctor | transform/bin/ctor, transform/aggregate/ctor, transform/extent/ctor | value/base/ctor |
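The prioritization argument can be checked mechanically. The sketch below encodes the table as a plain object and counts how many chart types need each module; the variable and function names are illustrative.

```javascript
// Module requirements per chart type, transcribed from the table above.
const requirements = {
	'Bar Chart': [ 'mark/rect/ctor', 'transform/aggregate/ctor', 'transform/stack/ctor', 'legend/ctor', 'layout/ctor', 'value/color/ctor' ],
	'Column Chart': [ 'mark/rect/ctor', 'transform/aggregate/ctor', 'layout/ctor' ],
	'Line Chart': [ 'mark/line/ctor', 'transform/collect/ctor', 'value/field/ctor' ],
	'Scatter Plot': [ 'mark/symbol/ctor', 'transform/filter/ctor', 'transform/loess/ctor', 'legend/ctor', 'value/gradient/ctor' ],
	'Histogram': [ 'mark/rect/ctor', 'transform/bin/ctor', 'transform/aggregate/ctor', 'transform/extent/ctor', 'value/base/ctor' ]
};

// Return modules needed by more than one chart type, most shared first.
function sharedModules( reqs ) {
	const counts = new Map();
	for ( const chart of Object.keys( reqs ) ) {
		for ( const mod of reqs[ chart ] ) {
			counts.set( mod, ( counts.get( mod ) || 0 ) + 1 );
		}
	}
	return Array.from( counts, ( [ module, count ] ) => ( { module, count } ) )
		.filter( ( entry ) => entry.count > 1 )
		.sort( ( a, b ) => {
			if ( b.count !== a.count ) {
				return b.count - a.count;
			}
			// Break ties alphabetically so that the order is deterministic.
			return ( a.module < b.module ) ? -1 : 1;
		});
}

const shared = sharedModules( requirements );
```

Here `shared` lists mark/rect/ctor and transform/aggregate/ctor first (three chart types each), followed by layout/ctor and legend/ctor (two each), which matches the reasoning for implementing the rect mark and aggregate transform early.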

Why this project?

I became interested in stdlib’s plot work while exploring the plot/vega module, where I saw the potential to build Vega visualizations programmatically without writing raw JSON. This approach really stood out to me, as it brings a level of structured, programmatic visualization to JavaScript that isn’t commonly seen in the ecosystem.

What excites me most is the practicality: charts like bar, line, scatter, and histograms are used everywhere, and each component directly enables these use cases. It’s also rewarding to see the output of this work directly, whether it’s a chart rendering on the screen or being used by real users.

Through my contributions so far, working on the axis constructor, adding tests, and getting familiar with the Vega spec, I’ve built a solid grasp of the codebase. This project feels like a natural next step to build on that, contribute something meaningful, and keep learning.

Qualifications

This project demands a solid grasp of JavaScript, the Vega visualization grammar, and stdlib’s internal architecture, all of which I’ve been developing through practical contributions and in-depth exploration of the Vega ecosystem.

While working with the plot/vega module, I spent time studying the Vega specification in depth—understanding value references, different mark types, legends, and the transform system. I also mapped how these pieces come together for common chart types, which helped me connect the Vega JSON structure with stdlib’s constructor-based approach.

I’m comfortable with JavaScript and Node.js, with experience in object-oriented design, event-driven architectures, and writing modular, well-tested code, which prepares me to contribute effectively to this project.

Prior art

The plot/vega module in stdlib already has a strong foundation, with components like the spec builder, signals, scales, axes, base marks, and data constructors in place. This significantly reduces the initial workload. However, key pieces required to generate complete charts such as value references, specialized mark types, legends, and data transforms are still missing. This project focuses on bridging that gap to enable end-to-end support for common chart types.

This area has been explored in other ecosystems through tools like Altair and Observable Plot. However, these either abstract away the Vega specification or do not produce portable Vega outputs.

stdlib takes a different approach by closely mirroring the Vega grammar through modular constructors, allowing fine-grained, programmatic control over visualization building in JavaScript. Extending this to full chart support would make it a unique offering in the JS ecosystem.

Commitment

Given my familiarity with the codebase and understanding of the proposed work, I plan to start contributing even before the official coding period. During the community bonding phase, I will engage with mentors, explore the codebase, and begin initial tasks.

I have exams from May 17th to May 30th, during which I will dedicate 25 hours per week. After this period, I will increase my involvement to 40 hours per week. Post midterm evaluation, I will continue contributing at a steady pace of 30 hours per week.

I intend to remain actively involved with the community and continue contributing beyond the GSoC period.

Schedule

Assuming a 12-week schedule:

  • Community Bonding Period:

    • Implement plot/vega/value/base/ctor: base value-reference class.
      • Properties: value, signal, scale, band, exponent, mult, offset, round.
    • Implement plot/vega/value/field/ctor: per-datum field lookups.
      • Properties: field, datum, group, parent, level.
    • Implement assertions: is-value-reference, is-field-value.
    • Existing module contributions: additional tests for signal/ctor, scale/base/ctor, and mark/base/ctor.
    • Tests and docs for both constructors.
  • Week 1: Value-Reference: ColorValue + GradientValue

    • Implement plot/vega/value/color/ctor: per-channel color specification.
      • Color spaces: RGB, HSL, LAB, HCL.
    • Implement plot/vega/value/gradient/ctor: gradient value specification.
    • Implement assertions: is-color-value, is-gradient-value, is-color-space.
  • Week 2: Core Marks: Rect + Line + Symbol + Path

    • Implement plot/vega/mark/rect/ctor — Bar charts, column charts, histograms.
    • Implement plot/vega/mark/line/ctor — Line charts.
    • Implement plot/vega/mark/symbol/ctor — Scatter plots.
    • Implement plot/vega/mark/path/ctor — Arbitrary SVG paths.
  • Week 3: Supporting Marks: Rule + Text + Area + Arc + Shape

    • Implement plot/vega/mark/rule/ctor: axis ticks, grid lines, reference lines.
    • Implement plot/vega/mark/text/ctor: labels, titles, annotations.
    • Implement plot/vega/mark/area/ctor: filled area charts.
    • Implement plot/vega/mark/arc/ctor: pie/donut charts.
    • Implement plot/vega/mark/shape/ctor: cartographic marks.
  • Week 4: Legend Constructor

    • Implement plot/vega/legend/ctor: full legend constructor.
    • Implement plot/vega/legend/types: enum of symbol, gradient, discrete.
    • Implement plot/vega/legend/orientations: enum of left, right, top, bottom, etc.
  • Week 5: Legend Completion + Assertions

    • Implement legend toJSON(), edge cases, builder integration.
    • Implement assertions: is-legend, is-legend-array, is-legend-type, is-legend-orientation, is-encode, is-production-rule, is-expression.
  • Week 6: (midterm) Remaining Marks + Signal Binding

    • Implement plot/vega/mark/image/ctor — Embedded images.
    • Implement plot/vega/mark/trail/ctor — Variable-width lines.
    • Implement plot/vega/signal/bind/ctor — Input bindings.
    • Implement assertion: is-binding.
  • Week 7: Vega Layouts

    • Implement plot/vega/layout/ctor — Grid layout for group mark composition.
    • Implement assertion: is-layout.
  • Week 8: Transforms: Research + Base Constructor

    • Implement plot/vega/transform/base/ctor — Base transform class.
    • Implement plot/vega/transform/stack/ctor — Stacked bar/column layouts.
    • Implement assertions: is-transform, is-transform-array.
  • Week 9: Transforms: Bin + Collect + Extent

    • Implement plot/vega/transform/bin/ctor — Numeric binning for histograms.
    • Implement plot/vega/transform/collect/ctor — Sort data streams for line charts.
    • Implement plot/vega/transform/extent/ctor — Min/max computation.
  • Week 10: Chart Type Integration (Bar + Column)

    • Wire up & validate working bar chart spec using builder + rect mark + band scale + axes.
    • Wire up & validate working column chart spec.
  • Week 11: Chart Type Integration (Scatter + Histogram) + Advanced Transform

    • Implement plot/vega/transform/sequence/ctor — Generate numeric sequences.
    • Wire up & validate working scatter plot spec using builder + symbol mark + linear scales + axes.
    • Wire up & validate working histogram spec using builder + rect mark + bin transform + linear scale + axes.
  • Week 12: Final Submission + Remaining Transforms + Documentation

    • Write documentation for all chart type examples.
    • Final documentation review ensuring all README files, JSDoc comments, and type definitions are complete.
    • Create a tracking issue documenting any remaining work for post-GSoC continuation.
  • IF TIME PERMITS :)

    • Implement advanced transforms such as:
      • plot/vega/transform/density/ctor — Probability distributions
      • plot/vega/transform/loess/ctor — Scatterplot smoothing
      • plot/vega/transform/heatmap/ctor — Heatmap images
      • plot/vega/transform/cross/ctor — Cross-product
      • plot/vega/transform/isocontour/ctor — Contour lines
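As a target for the Week 11 histogram integration, here is a sketch of a complete histogram specification that combines the extent, bin, and aggregate transforms with a signal binding. It is a plain object and does not use stdlib's constructors. The function name `histogramSpec`, the dataset names, the signal names, and the slider bounds are assumptions. The sketch relies on Vega's default `bin0`/`bin1` output fields for the bin transform and on the aggregate transform's default `count` output.

```javascript
// Build a minimal Vega (v5) histogram specification for a numeric field.
function histogramSpec( values, field ) {
	return {
		'$schema': 'https://vega.github.io/schema/vega/v5.json',
		'width': 400,
		'height': 200,
		'padding': 5,
		'signals': [
			// A range slider bound to the maximum number of bins.
			{
				'name': 'maxbins',
				'value': 20,
				'bind': { 'input': 'range', 'min': 5, 'max': 50, 'step': 1 }
			}
		],
		'data': [
			{ 'name': 'points', 'values': values },
			{
				'name': 'binned',
				'source': 'points',
				'transform': [
					// Compute [min, max] and expose it as the `ext` signal.
					{ 'type': 'extent', 'field': field, 'signal': 'ext' },
					// Assign each datum to a bin spanning [bin0, bin1).
					{ 'type': 'bin', 'field': field, 'extent': { 'signal': 'ext' }, 'maxbins': { 'signal': 'maxbins' } },
					// Count the data in each bin (default op; output field `count`).
					{ 'type': 'aggregate', 'groupby': [ 'bin0', 'bin1' ] }
				]
			}
		],
		'scales': [
			{ 'name': 'xscale', 'type': 'linear', 'range': 'width', 'domain': { 'data': 'binned', 'fields': [ 'bin0', 'bin1' ] } },
			{ 'name': 'yscale', 'type': 'linear', 'range': 'height', 'domain': { 'data': 'binned', 'field': 'count' }, 'zero': true, 'nice': true }
		],
		'axes': [
			{ 'orient': 'bottom', 'scale': 'xscale' },
			{ 'orient': 'left', 'scale': 'yscale' }
		],
		'marks': [
			{
				'type': 'rect',
				'from': { 'data': 'binned' },
				'encode': {
					'update': {
						'x': { 'scale': 'xscale', 'field': 'bin0', 'offset': 1 },
						'x2': { 'scale': 'xscale', 'field': 'bin1' },
						'y': { 'scale': 'yscale', 'field': 'count' },
						'y2': { 'scale': 'yscale', 'value': 0 }
					}
				}
			}
		]
	};
}

const histSpec = histogramSpec( [ { 'u': 0.1 }, { 'u': 0.4 }, { 'u': 0.45 }, { 'u': 0.9 } ], 'u' );
```

Validating the integration then amounts to checking that the constructor-built specification serializes to an object of this shape.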

Extras: If I get blocked while mentors are reviewing PRs, I’ll shift focus to other productive work, such as contributing to documentation or exploring high-priority tasks.

  • Post GSoC: I intend to stay active in the stdlib community after the program ends. The tracking issue created in Week 12 will outline the remaining work. I'll continue picking up these tasks as a regular contributor and help with reviews and documentation for the existing plot packages. I also plan to work on the Plot CLI.

Notes:

  • The community bonding period is a 3-week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to set up your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project proposal.
  • Usually, even week 1 deliverables include some code.
  • By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
  • By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
  • During the final week, you'll be submitting your project.

Related issues

Here we go #8.

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • I have read and understood the stdlib showcase requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes your proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.

Labels: 2026 (2026 GSoC proposal), received feedback (a proposal which has received feedback), rfc (project proposal)
