[Feature] Aule-attention #1182

@GreenShadows

Description

Feature Summary

Hardware-agnostic FlashAttention implementation: no compilation required, works on any GPU.

Detailed Description

Would it be possible to implement this? The current flash attention implementation in SD.cpp degrades performance badly on AMD GPUs.

https://github.com/AuleTechnologies/Aule-Attention

Aule-attention provides a drop-in FlashAttention implementation that works across all major GPU vendors without requiring compilation at install time. It automatically selects the optimal backend for the available hardware (see the sketch after the list below for the computation every backend must reproduce):

- Triton: for AMD ROCm and NVIDIA CUDA (training and inference)
- Vulkan: for Intel, Apple, AMD consumer GPUs, and any Vulkan-capable device (inference)
- CPU: NumPy fallback for systems without GPU support
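
For context, below is a minimal NumPy sketch of plain scaled-dot-product attention, i.e. the result that any of the listed backends (Triton, Vulkan, CPU) has to reproduce up to floating-point tolerance. This is not aule-attention's actual API; the function name `reference_attention` and the tolerance check are illustrative assumptions only.

```python
import numpy as np

def reference_attention(q, k, v, causal=False):
    """Plain O(n^2)-memory scaled dot-product attention.

    Shapes: q, k, v are (n_heads, seq_len, head_dim).
    A FlashAttention backend must match this output numerically.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = q @ k.transpose(0, 2, 1) * scale        # (h, n, n) attention logits
    if causal:
        n = scores.shape[-1]
        mask = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)     # block attention to future tokens
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                               # (h, n, d) attended values

# Hypothetical correctness check against a backend's output:
h, n, d = 8, 64, 64
q, k, v = (np.random.randn(h, n, d).astype(np.float32) for _ in range(3))
out = reference_attention(q, k, v)
# np.testing.assert_allclose(backend_out, out, rtol=1e-3, atol=1e-3)
```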

Alternatives you considered

No response

Additional context

No response
