add first draft of wikipedia article #21105

Draft
gene-bordegaray wants to merge 5 commits into apache:main from gene-bordegaray:issue-21076-wikipedia-draft

@gene-bordegaray
Contributor

@gene-bordegaray gene-bordegaray commented Mar 23, 2026

Which issue does this PR close?

  1. Go to this page
  2. Click Edit source
  3. Paste dev/wiki/apache-datafusion.wikitext
  4. Click Show preview

@github-actions github-actions bot added the development-process Related to development process of DataFusion label Mar 23, 2026
Contributor

@alamb alamb left a comment

Thank you @gene-bordegaray -- this looks great. I left some suggestions on how to make some of this language tighter.

Maybe we can wait a few days more and then submit to the wikipedia editors 🤔

@alamb alamb changed the title from "add first draft of article" to "add first draft of wikipedia article" Mar 23, 2026
@gene-bordegaray
Contributor Author

Also, a side note: I wanted to add the DF logo, but my account needs to be verified first (I think it will be in a day or two) 😅

| website = {{URL|https://datafusion.apache.org/}}
}}

'''Apache DataFusion''' is an [[open-source software|open-source]], embeddable analytical query engine written in [[Rust (programming language)|Rust]], built on [[Apache Arrow]]'s columnar memory format.<ref name="sigmod-paper">{{cite journal |last1=Lamb |first1=Andrew |last2=Shen |first2=Yijie |last3=Heres |first3=Daniel |last4=Chakraborty |first4=Jayjeet |last5=Kabak |first5=Mehmet Ozan |last6=Hsieh |first6=Liang-Chi |last7=Sun |first7=Chao |title=Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine |journal=Proceedings of the 2024 International Conference on Management of Data |year=2024 |doi=10.1145/3626246.3653368}}</ref><ref name="intro-docs">{{cite web |title=Introduction |url=https://datafusion.apache.org/user-guide/introduction.html |website=Apache DataFusion |publisher=Apache Software Foundation |access-date=2026-03-22}}</ref> It provides [[SQL]] and DataFrame interfaces for analytical query execution and is designed to be used as a library by developers building databases, query engines, and analytical tools, rather than as a standalone database server.<ref name="sigmod-paper" /><ref name="intro-docs" /> The project originated in 2017, was donated to the [[Apache Arrow]] project in 2019, and became a top-level project of the [[Apache Software Foundation]] in 2024.<ref name="donation-post">{{cite web |title=DataFusion: A Rust-native Query Engine for Apache Arrow |url=https://datafusion.apache.org/blog/2019/02/04/datafusion-donation/ |website=Apache DataFusion Blog |publisher=Apache Software Foundation |date=2019-02-04 |access-date=2026-03-22}}</ref><ref name="asf-tlp">{{cite web |title=Apache Software Foundation Announces New Top-Level Project Apache DataFusion |url=https://news.apache.org/foundation/entry/apache-software-foundation-announces-new-top-level-project-apache-datafusion |website=The ASF Blog |publisher=Apache Software Foundation |date=2024-06-11 |access-date=2026-03-22}}</ref>

It provides [[SQL]] and DataFrame interfaces for analytical query execution and is designed to be used as a library by developers building databases, query engines, and analytical tools, rather than as a standalone database server.

I think we can improve this so that it better introduces DataFusion and what makes it unique. Here's what I suggest:

Often described as the "LLVM for Databases," [Source 1] Apache DataFusion is a modular, Arrow-native query engine library designed for embedding into custom systems rather than operating as a monolithic standalone server [Source 2 and 3]. This high-performance Rust framework provides a composable foundation, allowing developers to precisely extend query planning and vectorized execution to meet unique architectural requirements. [Source 2 and 3]

Source 1: https://midas.bu.edu/assets/slides/andrew_lamb_slides.pdf (cc @alamb )

Sources 2 and 3 (these are the first two references): {{cite journal |last1=Lamb |first1=Andrew |last2=Shen |first2=Yijie |last3=Heres |first3=Daniel |last4=Chakraborty |first4=Jayjeet |last5=Kabak |first5=Mehmet Ozan |last6=Hsieh |first6=Liang-Chi |last7=Sun |first7=Chao |title=Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine |journal=Proceedings of the 2024 International Conference on Management of Data |year=2024 |doi=10.1145/3626246.3653368}} {{cite web |title=Introduction |url=https://datafusion.apache.org/user-guide/introduction.html |website=Apache DataFusion |publisher=Apache Software Foundation |access-date=2026-03-22}}

Contributor Author

I don't know if we should add the "LLVM for databases" description, mostly because its primary source is not the strongest (a slide deck) and it doesn't appear in the other sources, such as the SIGMOD paper or other coverage.

I was reviewing the Wikipedia guidelines, and they advise against anything promotional unless it is well cited, so this may get flagged.

https://en.wikipedia.org/wiki/Wikipedia:Verifiability

| website = {{URL|https://datafusion.apache.org/}}
}}

'''Apache DataFusion''' is an [[open-source software|open-source]], embeddable analytical query engine written in [[Rust (programming language)|Rust]], built on [[Apache Arrow]]'s columnar memory format.<ref name="sigmod-paper">{{cite journal |last1=Lamb |first1=Andrew |last2=Shen |first2=Yijie |last3=Heres |first3=Daniel |last4=Chakraborty |first4=Jayjeet |last5=Kabak |first5=Mehmet Ozan |last6=Hsieh |first6=Liang-Chi |last7=Sun |first7=Chao |title=Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine |journal=Proceedings of the 2024 International Conference on Management of Data |year=2024 |doi=10.1145/3626246.3653368}}</ref><ref name="intro-docs">{{cite web |title=Introduction |url=https://datafusion.apache.org/user-guide/introduction.html |website=Apache DataFusion |publisher=Apache Software Foundation |access-date=2026-03-22}}</ref> It provides [[SQL]] and DataFrame interfaces for analytical query execution and is designed to be used as a library by developers building databases, query engines, and analytical tools, rather than as a standalone database server.<ref name="sigmod-paper" /><ref name="intro-docs" /> The project originated in 2017, was donated to the [[Apache Arrow]] project in 2019, and became a top-level project of the [[Apache Software Foundation]] in 2024.<ref name="donation-post">{{cite web |title=DataFusion: A Rust-native Query Engine for Apache Arrow |url=https://datafusion.apache.org/blog/2019/02/04/datafusion-donation/ |website=Apache DataFusion Blog |publisher=Apache Software Foundation |date=2019-02-04 |access-date=2026-03-22}}</ref><ref name="asf-tlp">{{cite web |title=Apache Software Foundation Announces New Top-Level Project Apache DataFusion |url=https://news.apache.org/foundation/entry/apache-software-foundation-announces-new-top-level-project-apache-datafusion |website=The ASF Blog |publisher=Apache Software Foundation |date=2024-06-11 |access-date=2026-03-22}}</ref>

I believe the project will continue to grow, so we could add this at the end:

Apache DataFusion now sees over one million monthly downloads. [cite crates.io source]

Source: https://crates.io/search?q=datafusion

Contributor

We could also say "as of March 2026, DataFusion saw one million monthly downloads" if we wanted to ensure the statement remained accurate.

Contributor Author

Yeah, I think this is great, definitely with the third-party source 👍

Development

Successfully merging this pull request may close these issues.

Write a wikipedia article for Apache DataFusion
