[RFC]: Extending Level 2 and Level 3 BLAS routines for linear algebra #205

@MeKaustubh07

Description

Full name

Kaustubh Patange

University status

Yes

University name

Walchand Institute of Technology

University program

Computer Science & Engineering

Expected graduation

2027

Short biography

I am a third-year B.Tech student in Computer Science and Engineering with a deep passion for software development and the open-source world. I am driven by a strong curiosity to understand how systems function under the hood. I have a keen interest in mathematics, problem-solving, and experimenting with AI. I have worked across the full stack using tools such as Next.js, PostgreSQL, and TypeScript, and I am comfortable building and understanding end-to-end systems.

My programming experience includes languages such as Python, C, C++, JavaScript, and TypeScript, along with tools and frameworks like React, Docker, and Firebase, and database tooling such as Prisma and PostgreSQL. I enjoy exploring how systems work under the hood and continuously try to improve my understanding of both high-level design and low-level implementation.

Apart from academics, I have a keen interest in mathematics, algorithms, and problem-solving. I also like experimenting with AI tools and automation, and I actively contribute to open source as a way to learn and build meaningful software.

Timezone

Indian Standard Time (IST), UTC+5:30

Contact details

Email: kaustubh.mp007@gmail.com, GitHub: MeKaustubh07

Platform

Mac

Editor

I prefer VS Code because it offers a lightweight yet highly customizable environment that adapts well to different workflows. Its extension ecosystem allows me to handle everything from development to debugging in one place. It lets me stay focused while still giving me the tools I need to work efficiently across different technologies.

Programming experience

My programming journey has been exciting from the very beginning. I started by building a strong foundation through C and C++, which helped me understand core programming concepts. To deepen my understanding, I focused on data structures and algorithms, regularly practicing problem-solving and improving my logical thinking.
As I grew more confident in problem-solving, I transitioned into full-stack development, where I began working on real-world applications and practical use cases. This shift allowed me to understand how different parts of a system come together and how to build scalable and maintainable solutions.
Here are some of my real-world projects:

CourseHive: A full-featured platform for creating and selling courses. Users can purchase, view, and interact with online courses, while administrators get tools to manage course content, users, and data.
This project emphasizes scalability, role-based authentication, and real-time collaboration features. It is built with a MERN + TypeScript stack, with a focus on security and performance.
Admin End: Create and manage courses with structured modules; upload lecture videos, notes, and papers to Cloudinary; manage users, purchases, and payments; monitor platform analytics and course engagement.
Secure authentication and role-based access control.
User End: Browse and purchase courses securely; stream video lectures online with smooth playback; download materials and past exam papers; attempt online test series with instant results; track purchased courses and progress.

AthleteApp: A comprehensive, AI-powered sports management and performance tracking platform. It is designed to serve athletes, coaches, and sports administrators (such as government bodies or the sports ministry), and it focuses heavily on accessibility, gamified progression, safety (injury management), and transparent compliance.
Tech Stack
1. Frontend: Next.js 15 (React 19), styled with Tailwind CSS v4 and Framer Motion for animations. Charts are handled by Recharts.
2. Backend Support: Next.js API routes combined with a dedicated Python FastAPI backend for heavy AI workloads.
3. Database & Auth: Google Firebase / Firestore.
4. Real-time Communication: Socket.io integration for instant community chat and mentoring.

JavaScript experience

I started learning JavaScript during my early high school years. Since my first programming languages were C and C++, and C++ had already introduced me to object-oriented programming, the transition to JavaScript was relatively smooth and intuitive. I later deepened my understanding of JavaScript through full-stack development and structured learning resources, which helped me build a strong, industry-relevant skill set.

Over time, I’ve worked extensively with JavaScript while building real-world applications, including handling complex backend logic, cross-framework integration, and dependency management. These experiences have helped me develop strong intuition and practical fluency in the language.

JavaScript has allowed me to work seamlessly across both frontend and backend environments, enabling full-stack development with a single language. My least favorite aspect is its loosely typed nature, which can sometimes lead to unexpected bugs if not handled carefully. This is one of the reasons I prefer using TypeScript in larger projects for better type safety and maintainability.

Node.js experience

My initial experience with Node.js was both challenging and fascinating. I was introduced to it through a full-stack development course, and while it felt a bit confusing at first, consistent hands-on practice helped me build a strong understanding over time.

By actively working with Node.js in multiple projects, I became comfortable with backend development concepts such as handling APIs, managing asynchronous operations, and structuring server-side logic. This practical exposure allowed me to develop confidence and fluency in using Node.js effectively.

C/Fortran experience

C was my first programming language, and I have developed a solid understanding of its core concepts through regular practice, DSA problem-solving, and working with the stdlib codebase. I also have experience building intermediate-level projects, which helped me understand performance considerations and low-level behavior.

My Fortran experience is more recent, gained through contributing to stdlib PRs, and I am steadily improving as I work more with its implementation patterns and logic. While I am still building fluency, I am comfortable understanding existing Fortran code and translating its logic into practical implementations. I am confident that with continued contributions, my proficiency in Fortran will grow quickly.

Interest in stdlib

Open source is one of the most effective ways to develop a deeper and more practical understanding of the software development ecosystem, as it allows you to work on projects with real-world impact. With that in mind, I set a goal to contribute seriously to open source and began exploring various organizations across different domains to get started.

After gaining some initial experience and becoming comfortable with GitHub workflows, I chose to continue my journey with stdlib. Given my interest in algebra and my current development goals, stdlib stood out as a natural choice for active contribution.

Moreover, the consistency of activity within stdlib and its well-structured codebase make it an excellent environment for learning and contributing. The maintainers and mentors are approachable and supportive, which makes it easier for new contributors to get involved. The clear structure and strong communication within the community significantly lower the barrier to entry and help contributors grow effectively.

Version control

Yes

Contributions to stdlib

Over the past few months, I have contributed to stdlib across several namespaces.
Here is a combined list of the pull requests I have created:

Merged PRs

  1. Added structured package data for math/base/special/*
  2. Added JS implementation for stats/base/ndarray/* packages
  3. Added JS implementation for blas/ext/base/ndarray/* packages
  4. Added C implementation for blas/ext/base/ndarray/* packages

Open PRs

  1. Added JS and C implementation for blas/ext/base/* packages
  2. Added C and Fortran implementation for blas/base/* packages
  3. Added JS and C implementation for blas/ext/base/ndarray/* packages

Reviewed PRs

  1. #6019 : Added JS and C implementation for blas/ext/base/dnancusumkbn2
  2. #987 : feat: add C and Fortran implementation for blas/base/srotg
  3. #6613 : feat: add blas/base/zdotu
  4. #4473 : feat: add blas/base/cdotu

stdlib showcase

Linear-Algebra Playground
GitHub
It is a modern web application built to visually demonstrate the power of @stdlib/blas. The project bridges the gap between low-level mathematical code and visual understanding. Instead of static documentation, it provides a dynamic, browser-based playground where users can input real data and instantly see how standard Basic Linear Algebra Subprograms (BLAS) routines compute linear algebra operations under the hood. All calculations are strictly powered by stdlib’s native mathematical packages.

dgemm.pdf
Attached is my blog-style write-up on the @stdlib/blas/base/* packages, focusing on the dgemm architecture: the three-layer design, stride-based layout handling, and cache optimization strategies. Based on that, I have also built an interactive DGEMM visualization in my Linear-Algebra Playground, alongside other related features, that demonstrates how the operation maps across all three API layers.

Goals

The primary goal is to systematically drive the existing tracking issue, Add BLAS bindings and implementations for linear algebra, toward completion. This issue has been active for over two years, during which significant progress has been made, particularly in JavaScript implementations.

Throughout the proposed timeline, my primary focus will be on contributing within the blas/base namespace, ensuring steady and meaningful progress toward closing the gap in implementations.

Main Goals
-- Review and complete existing draft/open PRs, identifying blockers and resolving pending issues
-- Prioritize JavaScript implementations as the foundational layer
-- Extend existing packages by adding corresponding C and Fortran implementations
-- Introduce WebAssembly implementations for existing C/Fortran functions where applicable

Approach

When adding a new routine under the tracking issue, I will follow the package structure already used across the namespace. Take ddot as an example: it computes the dot product of two vectors, $x$ and $y$. Its default signature matches the classic Fortran BLAS specification:

var z = ddot( N, x, strideX, y, strideY );

Instead of implementing the routine in a single file, the package splits it across two files: ndarray.js and main.js.
At the lowest level, stdlib needs full control over memory addressing in order to handle non-contiguous arrays safely, so the actual math is implemented in ndarray.js. This engine takes explicit offsets (starting indices) alongside the strides.

@stdlib/blas/base/ddot/lib/ndarray.js 
// Operates purely on raw arrays, explicit strides, and explicit offsets
function ddot( N, x, strideX, offsetX, y, strideY, offsetY ) {
    var ix = offsetX; 
    var iy = offsetY;
    var dot = 0.0;
    
    for ( var i = 0; i < N; i++ ) {
        dot += x[ix] * y[iy];
        ix += strideX; 
        iy += strideY;
    }
    return dot;
}
module.exports = ddot;

Because manually calculating starting offsets for negative strides is error-prone, main.js acts as a protective wrapper. It takes the classic 5-parameter signature, computes the correct starting offset for each vector, and then delegates the work to the ndarray.js engine.

@stdlib/blas/base/ddot/lib/ddot.js (or main.js)
// Auto-calculates offsets and maintains backward compatibility
var strided = require('./ndarray.js');
var stride2offset = require('@stdlib/strided/base/stride2offset');

function ddot_classic( N, x, strideX, y, strideY ) {
    // Dynamically calculate safe starting indices
    var ix = stride2offset( N, strideX ); 
    var iy = stride2offset( N, strideY );
    
    return strided( N, x, strideX, ix, y, strideY, iy ); 
}
module.exports = ddot_classic;

This "Dual-API" design pattern is the architectural standard across the @stdlib/blas/base/* namespace. main.js maintains strict fidelity to the original CBLAS API, ensuring direct compatibility for users coming from the Python, Fortran, and C ecosystems. The BLIS-style ndarray.js interface exists to handle a deeper problem: ndarrays are arbitrary multi-dimensional views on linear memory buffers that may not be contiguous. Libraries such as NumPy typically have to copy data before calling into BLAS when an array is non-contiguous in the trailing dimension, whereas the BLIS-style interface natively supports non-unit trailing strides and avoids that extra data movement.
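
To make the role of offsets concrete, here is a minimal, self-contained sketch. It re-implements both entry points in plain JavaScript so that it runs without stdlib installed. The local `stride2offset` is a stand-in written under the assumption that the stdlib helper returns `0` for positive strides and `(1-N)*stride` otherwise, and the names `ddotNdarray` and `ddotClassic` are illustrative only, not stdlib exports.

```javascript
'use strict';

// Local stand-in for the stdlib helper (assumed behavior, see the note above):
function stride2offset( N, stride ) {
    return ( stride > 0 ) ? 0 : ( 1 - N ) * stride;
}

// ndarray-style engine: explicit strides and explicit offsets.
function ddotNdarray( N, x, strideX, offsetX, y, strideY, offsetY ) {
    var dot = 0.0;
    var ix = offsetX;
    var iy = offsetY;
    var i;
    for ( i = 0; i < N; i++ ) {
        dot += x[ ix ] * y[ iy ];
        ix += strideX;
        iy += strideY;
    }
    return dot;
}

// Classic 5-parameter wrapper: derives the offsets from the strides.
function ddotClassic( N, x, strideX, y, strideY ) {
    var ox = stride2offset( N, strideX );
    var oy = stride2offset( N, strideY );
    return ddotNdarray( N, x, strideX, ox, y, strideY, oy );
}

var x = new Float64Array( [ 1.0, 2.0, 3.0 ] );
var y = new Float64Array( [ 4.0, 5.0, 6.0 ] );

// Reading y backwards: a stride of -1 must start at index 2, not 0.
var reversed = ddotClassic( 3, x, 1, y, -1 ); // 1*6 + 2*5 + 3*4

// A 2x3 row-major matrix stored in one linear buffer:
var A = new Float64Array( [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 ] );

// Dot product of column 1 ([2,5]) with column 2 ([3,6]), read in place:
// a stride of 3 steps one row down, and the offset selects the column.
var cols = ddotNdarray( 2, A, 3, 1, A, 3, 2 ); // 2*3 + 5*6

console.log( reversed, cols );
```

The second call is the case the classic signature cannot express directly: both operands live in the same buffer, start at different indices, and are never copied.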

Supporting Goals
-- Work towards adding the blas/base/ndarray/* packages

Approach

The signature progressively generalizes from basic to fully flexible. blas/base/ddot (Layer 1) takes 5 parameters: N, plus an array and a stride per vector, with each offset computed internally from the stride. The .ndarray export (Layer 2) takes 7 parameters, adding an explicit offset per vector so data can start anywhere in the buffer. The ndarray wrapper blas/base/ndarray/ddot (Layer 3) takes a single array of ndarray objects and extracts all raw values internally.

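// Ex: @stdlib/blas/base/ddot/lib/ndarray.js (simplified)
// Receives raw arrays, explicit strides, and explicit starting offsets
function ddot( N, x, sx, offsetX, y, sy, offsetY ) {
    var dot = 0.0;
    var ix = offsetX;
    var iy = offsetY;
    var m;
    var i;

    if ( N <= 0 ) {
        return dot;
    }
    // Fast path: unrolled loop for contiguous memory (both strides equal 1)
    if ( sx === 1 && sy === 1 ) {
        // Handle the N % 5 leftover elements first
        m = N % 5;
        for ( i = 0; i < m; i++ ) {
            dot += x[ ix+i ] * y[ iy+i ];
        }
        // Then perform 5 multiplies per iteration
        for ( i = m; i < N; i += 5 ) {
            dot += ( x[ ix+i ]*y[ iy+i ] ) + ( x[ ix+i+1 ]*y[ iy+i+1 ] ) + ( x[ ix+i+2 ]*y[ iy+i+2 ] ) + ( x[ ix+i+3 ]*y[ iy+i+3 ] ) + ( x[ ix+i+4 ]*y[ iy+i+4 ] );
        }
        return dot;
    }
    // General path: walk through memory using arbitrary strides
    for ( i = 0; i < N; i++ ) {
        dot += x[ ix ] * y[ iy ];
        ix += sx;
        iy += sy;
    }
    return dot;
}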

Layer 1 is a convenience API that computes offsets from strides and delegates to Layer 2. Layer 2 is the computation engine: it receives raw arrays with explicit strides and offsets and performs the actual math (with loop unrolling for unit strides). Layer 3 is a thin wrapper that extracts the buffer, stride, and offset from ndarray objects and feeds them to Layer 2. The chain is always Layer 3 -> Layer 2 -> result; Layer 1 is a sibling of Layer 3 and also calls Layer 2.

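// Ex: @stdlib/blas/base/ndarray/ddot/lib/main.js
// Receives an array of high-level ndarray objects
var numelDimension = require( '@stdlib/ndarray/base/numel-dimension' );
var getStride = require( '@stdlib/ndarray/base/stride' );
var getOffset = require( '@stdlib/ndarray/base/offset' );
var getData = require( '@stdlib/ndarray/base/data-buffer' );
var strided = require( '@stdlib/blas/base/ddot' ).ndarray;

function ddot( arrays ) {
    var x = arrays[ 0 ];
    var y = arrays[ 1 ];

    // Unpacks object metadata and calls the math engine (strided)
    return strided(
        numelDimension( x, 0 ),   // N (Length)
        getData( x ),             // Raw buffer for x
        getStride( x, 0 ),        // Stride for x
        getOffset( x ),           // Starting offset for x
        getData( y ),             // Raw buffer for y
        getStride( y, 0 ),        // Stride for y
        getOffset( y )            // Starting offset for y
    );
}
module.exports = ddot;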

Layer 1 provides backward compatibility with the classic BLAS API. Its only job is computing offsets from strides using stride2offset(N, stride), because negative strides need a non-zero starting index. It does no math itself.

Layer 2 does the actual computation. It uses the explicit offsets as starting positions, then walks through the arrays using strides. When both strides equal 1, it uses loop unrolling (processing 5 elements per iteration) for better performance. This layer is callable independently when you have raw typed arrays and know your offsets.

Layer 3 bridges the ndarray ecosystem to the strided BLAS world. It uses four helper functions to unpack each ndarray object: getData() for the buffer, getStride() for the stride, getOffset() for the starting index, and numelDimension() for the length. For functions with auxiliary parameters (like dgemm's transpose flags), it additionally resolves enum integers to strings using resolveStr().
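
The relationship between the three layers can be sketched end to end as follows. This is a simplified illustration rather than stdlib source: ndarrays are modeled as plain objects with hypothetical `data`, `shape`, `strides`, and `offset` fields instead of real stdlib ndarray instances, and the names `layer1`, `layer2`, and `layer3` exist only for this example.

```javascript
'use strict';

// Layer 2: computation engine (raw buffers, explicit strides and offsets).
function layer2( N, x, sx, ox, y, sy, oy ) {
    var dot = 0.0;
    var i;
    for ( i = 0; i < N; i++ ) {
        dot += x[ ox + ( i*sx ) ] * y[ oy + ( i*sy ) ];
    }
    return dot;
}

// Layer 1: classic signature; offsets are derived from the strides.
function layer1( N, x, sx, y, sy ) {
    var ox = ( sx > 0 ) ? 0 : ( 1 - N ) * sx;
    var oy = ( sy > 0 ) ? 0 : ( 1 - N ) * sy;
    return layer2( N, x, sx, ox, y, sy, oy );
}

// Layer 3: unpacks (hypothetical) ndarray-like objects and calls Layer 2.
function layer3( arrays ) {
    var x = arrays[ 0 ];
    var y = arrays[ 1 ];
    return layer2( x.shape[ 0 ], x.data, x.strides[ 0 ], x.offset, y.data, y.strides[ 0 ], y.offset );
}

var buf = new Float64Array( [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 ] );

// Two one-dimensional views over the same buffer:
var xv = { 'data': buf, 'shape': [ 3 ], 'strides': [ 2 ], 'offset': 0 }; // elements 1, 3, 5
var yv = { 'data': buf, 'shape': [ 3 ], 'strides': [ 2 ], 'offset': 1 }; // elements 2, 4, 6

var viaLayer3 = layer3( [ xv, yv ] );
var viaLayer2 = layer2( 3, buf, 2, 0, buf, 2, 1 );

// Layer 1 has no offset argument, so y is passed as a subarray view (still no copy):
var viaLayer1 = layer1( 3, buf, 2, buf.subarray( 1 ), 2 );

console.log( viaLayer1, viaLayer2, viaLayer3 );
```

All three calls compute 1*2 + 3*4 + 5*6 over the same memory. They differ only in who is responsible for working out where each vector starts.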

Why this project?

A dedicated, project-based goal like this is exactly what I need to advance my open-source journey. My strong interest in the mathematics of linear algebra makes the BLAS namespace a natural and exciting choice for me.

What excites me most, however, is the opportunity to build on the groundwork I've already laid. Having actively contributed to the @stdlib/blas namespace, I am already past the initial learning curve. I am now familiar with the project’s rigorous standards: its conventions, testing methodologies, API design patterns, and C/JavaScript bindings.

My decision to pursue this specific project is not spontaneous; it is the culmination of over five months of consistent, dedicated contributions within the @stdlib/blas and @stdlib/stats namespaces. During this time, I have immersed myself in the codebase, implementing various ndarray kernels and native C add-ons across different levels of complexity.

Qualifications

My primary qualification for executing this proposal is my extensive, hands-on experience developing directly within the @stdlib/blas ecosystem. I have dedicated myself to learning the technical concepts, algorithmic nuances, and structural conventions required to build high-performance numerical routines for stdlib.

I am well suited to this project because my past contributions span the critical layers of the BLAS infrastructure, including @stdlib/blas/base/*, @stdlib/blas/ext/base/*, and @stdlib/blas/ext/base/ndarray. This has given me a comprehensive understanding of both the technical bottlenecks and the non-technical expectations.
The number of contributions I have already made under these specific namespaces should give me a distinct edge.

Prior art

The foundational work for expanding stdlib's linear algebra capabilities, tracked in Issue #2039 (Add BLAS bindings and implementations for linear algebra), was laid by previous contributors. Initial cross-level implementations by aman-095 have been continued and refined over the past year by ShabiShett07. Alongside their substantial progress, numerous other contributors have also dedicated their time to pushing this large issue steadily toward completion. Their collective groundwork provides a structured path and serves as a highly valuable reference for me.

Primary references are:

Netlib BLAS
Serves as the original reference for mathematical correctness and standard API signatures.

OpenBLAS
Provides a modern, optimized alternative implementation that can serve as a reference for deeper insights.

NumPy and SciPy
Act as the primary ground truth for generating test fixtures and validating multidimensional array behavior.

Commitment

My commitment to this project is absolute. From concluding my internships to planning around the academic calendar, I am prepared to commit at least 30-40 hours per week exclusively to stdlib. Because this is my current focus, I am highly flexible and willing to take on any supplementary work.

I would also like to note that I have academic examinations scheduled from May 17th to May 30th. During this period, my availability will be reduced to approximately 20–25 hours per week. However, I will ensure continuity by planning my work in advance and maintaining consistent communication. Outside of this timeframe, I will be fully available and committed to the proposed schedule.

Furthermore, my involvement with stdlib is not bound by the timeline. My recent months of prior contribution were not just preparation for this, but the beginning of a long-term investment. I am fully committed to remaining an active, reliable maintainer and contributor within the stdlib community long after the summer concludes.

Schedule

Implementation Strategy
My goal for the timeline is straightforward: I have divided it into 5 structured phases that together aim to drive the tracking issue toward completion.

To keep detailed track of every routine, I have created a comprehensive design document:

BLAS-2039.pdf

The timeline and implementation plan above are based solely on a Google Sheet that I created from scratch.
I first visited every individual PR linked from issue #2039, and then:

1. Gained a high-level understanding of each package and noted a description of how it works
2. Marked its current implementation status
3. Rated each implementation (JS + C + Fortran) by difficulty out of 5
4. Derived a priority from those ratings
5. Arranged the packages in priority order

I have followed this process for all 146 packages (JS + C + Fortran) in #2039.

Assuming a 12 week schedule,

  • Community Bonding Period:
    During the community bonding period, I will focus on establishing a productive working relationship with my mentors and other contributors. I will also use this time to audit all existing Draft and Open PRs across the tracking issue (#2039), separating those that require minor fixes (lint errors, test coverage gaps) from those needing a full standards refresh. This triage directly feeds into my Phase 1 execution plan. Additionally, I will begin addressing low-hanging PRs, such as fixing lint failures and improving test coverage, so that the coding period begins with a clean and unblocked pipeline.

  • Week 1 - Week 2 : LAPACK-Blocking and Level 1 Routines
    Phase 1 : Resolve every BLAS routine that blocks LAPACK (Issue #2464) and complete all remaining Level 1 routines across every precision type (real-valued single/double, complex single, complex double). This is feasible in two weeks because the majority of these packages have existing Draft/Open PRs that are near-ready; they primarily need a standards refresh, test coverage fixes, or C add-on additions.

Strategic Approach: Prioritizing Foundational PRs & ndarray Wrappers
To ensure a level-wise, systematic expansion of the @stdlib/blas/base ecosystem, it is critical to address the foundational layers first. Proceeding otherwise risks severe dependency issues, duplication of architectural flaws, and complex merge conflicts. Therefore, my immediate priority during the initial phase of the timeline will be to refactor, finalize, and push the currently pending OPEN/DRAFT packages across the finish line.

However, to maintain steady momentum while those foundational packages are undergoing maintainer review and awaiting merge approval, I propose working concurrently on the @stdlib/blas/base/ndarray/* packages. This dual-track approach ensures that development under the @stdlib/blas/base/* namespace progresses continuously without being bottlenecked by review cycles during the crucial early weeks of the program.

  • Week 3 - Week 4 : Level 2 Real-Valued: Double + Single Precision
    Phase 2 : Complete the remaining real-valued Level 2 routines: double precision in Week 3, then single precision in Week 4. Every single-precision package has a double-precision counterpart completed in Week 3, enabling direct templating and rapid development.

  • Week 5 - Week 6 : Level 3 Real-Valued: Double + Single Precision
    Phase 3 : Complete all real-valued Level 3 matrix-matrix operations. These are the most complex BLAS routines, but all have active PRs. Double precision comes first in Week 5 (it is a LAPACK dependency), followed in Week 6 by the single-precision counterparts, templated directly from the finalized double-precision implementations.

  • Week 7 - Week 8 : (midterm) Level 2 Complex: Double + Single Precision.
    - Summarize the work through Phase 3, re-evaluate the approach, address mistakes, and gather feedback from the mentors.
    - Phase 4 : Given the scope, implementing all complex double- and single-precision L2/L3 routines within the timeframe may compromise quality. Instead, I will prioritize the most crucial, foundational complex L2 operations first (core matrix-vector multiplications such as zgbmv and cgbmv). Building these cornerstone packages carefully will establish the test fixtures and structural templates needed to implement the remaining complex L2 routines quickly afterwards.

  • Week 9 - Week 10 : Level 3 Complex: Double + Single Precision.
    Phase 5 : Given the scale and complexity of the Level 3 complex packages, I will first prioritize the most essential operations (e.g., cgemm, zgemm), which can serve as strong reference implementations for templating the remaining routines, including the symmetric variants and the Hermitian variants unique to complex types. Given the architectural complexity involved, completing all 18 complex packages within this timeframe may be optimistic, and the scope will be adjusted as needed to ensure high-quality, maintainable implementations.

  • Week 11 - Week 12 : (Buffer Period) : Final Integration & Spillover.
    Work on the remaining complex L2 and L3 packages. Address any review comments or CI failures from Phases 3–5. Add C implementations for any packages where only JS has landed. Update the tracking-issue checkboxes for every completed routine.

  • Final Week : Submit the project and finalize its documentation. Compile a comprehensive final report that summarizes the milestones achieved and explicitly outlines the technical roadmap for adding the remaining routines, while gathering feedback from the mentors.
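
The idea of "templating" single-precision routines from their double-precision counterparts (Weeks 3 to 6) can be pictured with the sketch below. It is only an illustration under stated assumptions: the factory `makeGemv`, its argument order, and the use of `Math.fround` to emulate single-precision rounding are choices made for this example. In stdlib itself the single- and double-precision packages are separate sources, and this simplified kernel omits transposition, special-casing of `alpha` and `beta`, and parameter validation.

```javascript
'use strict';

// Builds a simplified `y = alpha*A*x + beta*y` kernel for one precision.
// `round` is applied after every arithmetic step (identity for double precision).
function makeGemv( round ) {
    return function gemv( M, N, alpha, A, sa1, sa2, oa, x, sx, ox, beta, y, sy, oy ) {
        var sum;
        var iy;
        var i;
        var j;
        for ( i = 0; i < M; i++ ) {
            sum = 0.0;
            for ( j = 0; j < N; j++ ) {
                sum = round( sum + round( A[ oa + ( i*sa1 ) + ( j*sa2 ) ] * x[ ox + ( j*sx ) ] ) );
            }
            iy = oy + ( i*sy );
            y[ iy ] = round( round( alpha * sum ) + round( beta * y[ iy ] ) );
        }
        return y;
    };
}

function identity( v ) {
    return v;
}

// The same kernel body, instantiated once per precision:
var dgemv = makeGemv( identity );
var sgemv = makeGemv( Math.fround );

// 2x3 matrix in row-major order, so the strides are (3, 1):
var Ad = new Float64Array( [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 ] );
var xd = new Float64Array( [ 1.0, 1.0, 1.0 ] );
var yd = new Float64Array( [ 1.0, 1.0 ] );
dgemv( 2, 3, 2.0, Ad, 3, 1, 0, xd, 1, 0, 1.0, yd, 1, 0 );

// Single-precision inputs built from the same values:
var As = new Float32Array( Ad );
var xs = new Float32Array( xd );
var ys = new Float32Array( [ 1.0, 1.0 ] );
sgemv( 2, 3, 2.0, As, 3, 1, 0, xs, 1, 0, 1.0, ys, 1, 0 );

console.log( yd, ys );
```

Only the array types and the rounding step differ between the two instantiations, which is why a finished double-precision package is a useful starting point for its single-precision sibling. The tests, benchmarks, and fixtures would follow the same pattern.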

Notes:

  • The community bonding period is a 3 week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to set up your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project proposal.
  • Usually, even week 1 deliverables include some code.
  • By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
  • By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
  • During the final week, you'll be submitting your project.

Related issues

[RFC]: Add BLAS bindings and implementations for linear algebra (tracking issue) #2039
[Idea]: add BLAS bindings and implementations for linear algebra #36

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • I have read and understood the stdlib showcase requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes your proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.

Metadata

Assignees

No one assigned

    Labels

    2026 (2026 GSoC proposal), received feedback (a proposal which has received feedback), rfc (project proposal)
