Labels: enhancement (New feature or request)
Description
Feature Summary
Hardware-agnostic FlashAttention implementation: no compilation required, and it works on any GPU.
Detailed Description
Would it be possible to implement this? The current FlashAttention implementation used in SD.cpp degrades performance terribly on AMD.
https://github.com/AuleTechnologies/Aule-Attention
Aule-Attention provides a drop-in FlashAttention implementation that works across all major GPU vendors without requiring compilation at install time. It automatically selects the optimal backend for the hardware (see the sketch after this list):
- Triton: for AMD ROCm and NVIDIA CUDA (training and inference)
- Vulkan: for Intel, Apple, AMD consumer GPUs, and any Vulkan-capable device (inference)
- CPU: NumPy fallback for systems without GPU support
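For illustration only, here is a minimal C++ sketch of the kind of runtime backend dispatch described above. The enum, probe functions, and their results are hypothetical placeholders, not Aule-Attention's actual API or SD.cpp code; a real integration would query ROCm/CUDA/Vulkan availability instead.

```cpp
// Hypothetical sketch of runtime attention-backend selection.
// Names and probes are made up for illustration; not Aule-Attention's real interface.
#include <cstdio>

enum class AttnBackend { Triton, Vulkan, Cpu };

// Placeholder capability probes; a real integration would detect devices here.
bool has_triton_gpu()    { return false; } // assume no ROCm/CUDA device for this demo
bool has_vulkan_device() { return true;  } // assume a Vulkan-capable GPU is present

AttnBackend pick_attention_backend() {
    if (has_triton_gpu())    return AttnBackend::Triton; // AMD ROCm / NVIDIA CUDA
    if (has_vulkan_device()) return AttnBackend::Vulkan; // Intel, Apple, consumer AMD, ...
    return AttnBackend::Cpu;                             // last-resort CPU fallback
}

int main() {
    switch (pick_attention_backend()) {
        case AttnBackend::Triton: std::puts("attention backend: Triton"); break;
        case AttnBackend::Vulkan: std::puts("attention backend: Vulkan"); break;
        case AttnBackend::Cpu:    std::puts("attention backend: CPU fallback"); break;
    }
    return 0;
}
```

The point of the sketch is only the selection order (Triton, then Vulkan, then CPU) that the list above describes; how SD.cpp would actually expose or wire up such a dispatcher is up to the maintainers.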
Alternatives you considered
No response
Additional context
No response
Puiching-Memory