CARE-Transformer

Official PyTorch implementation of the paper "CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction" (CVPR 2025).

[Figures: CARE Transformer overview and architecture details]

Data Preparation

  1. Download the ImageNet-1K dataset files (ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar) and place them in the folder ./data.

  2. Run the following command to preprocess the downloaded files:

    cd ./data && bash extract_ILSVRC.sh
  3. Verify that the extracted dataset matches the following ImageFolder layout (a programmatic check is sketched below the tree):

    DATA_PATH/
    ├── train/
    │   ├── n01440764/
    │   │   ├── n01440764_10026.JPEG
    │   │   └── ...
    │   └── ...
    └── val/
        ├── n01440764/
        │   ├── ILSVRC2012_val_00000293.JPEG
        │   └── ...
        └── ...
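
A quick way to verify the layout is torchvision's ImageFolder; a minimal sketch, where DATA_PATH is a placeholder for your dataset root:

    # Count classes and images per split; no transforms are needed for this.
    from torchvision import datasets

    for split in ("train", "val"):
        ds = datasets.ImageFolder(f"DATA_PATH/{split}")
        print(f"{split}: {len(ds.classes)} classes, {len(ds)} images")

    # ImageNet-1K should report 1000 classes per split,
    # 1,281,167 train images and 50,000 val images.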
    

Environment Preparation

Before running the code, please install the packages required by this repository with the following commands.

conda create -n care python=3.8 && conda activate care
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
cd CARETrans && pip install -r requirement.txt
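
To confirm the environment was created correctly, a quick check against the pinned versions:

    # Sanity check of the pinned versions inside the "care" environment.
    import torch
    import torchvision

    print("torch:", torch.__version__)              # expected: 2.0.1
    print("torchvision:", torchvision.__version__)  # expected: 0.15.2
    print("CUDA available:", torch.cuda.is_available())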

Training

  • Use the following command to train your CARE Transformer models. We train CARE Transformers on 12 NVIDIA RTX 4090 GPUs; you can adjust the number of GPUs and other hyperparameters in the script cmd/train.sh according to your hardware and computational resources. If you change the GPU count, see the learning-rate note after the command below.
bash cmd/train.sh
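
Changing the number of GPUs changes the effective batch size. A common rule of thumb (not necessarily the policy used by cmd/train.sh) is to rescale the learning rate linearly with the effective batch size; a sketch, where all base values are placeholder assumptions:

    # Linear learning-rate scaling rule (Goyal et al., 2017).
    # The base values are placeholders -- read the real ones from cmd/train.sh.
    BASE_LR = 1e-3        # assumed learning rate for the reference setup
    BASE_GPUS = 12        # reference setup quoted in this README
    BATCH_PER_GPU = 128   # assumed per-GPU batch size

    def scaled_lr(num_gpus: int) -> float:
        """Scale the LR in proportion to the effective (global) batch size."""
        effective = num_gpus * BATCH_PER_GPU
        reference = BASE_GPUS * BATCH_PER_GPU
        return BASE_LR * effective / reference  # reduces to num_gpus / BASE_GPUS here

    print(scaled_lr(4))   # e.g. moving from 12 GPUs down to 4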

Evaluation

  • To evaluate the accuracy of models on the ImageNet-1K dataset, please use the following command:
bash cmd/eval.sh 
  • To evaluate the GMACs, parameters, and GPU latency of models, please use the following command (a minimal GPU-timing sketch appears before the results table below):
bash benchmark.sh
  • To evaluate the latency of models on a mobile device, first convert the model from PyTorch to the .mlmodel format, for example:
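
A minimal conversion sketch using coremltools; the stand-in network and output path are assumptions, so substitute the CARE model built from this repository with weights loaded from ./ckpt:

    import torch
    import coremltools as ct
    from torchvision.models import resnet18

    # Stand-in network -- replace with a CARE model from this repo,
    # with its checkpoint loaded from ./ckpt.
    model = resnet18().eval()

    example = torch.rand(1, 3, 224, 224)      # standard ImageNet input size
    traced = torch.jit.trace(model, example)  # TorchScript trace for conversion

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example.shape)],
        convert_to="neuralnetwork",           # this backend saves as .mlmodel
    )
    mlmodel.save("care.mlmodel")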

Then, following EfficientFormer and coreml-performance, use coreml/coreml-performance to evaluate the model in Xcode.

Note that our pretrained checkpoints are provided on Google Drive. Download them and place the folder ./ckpt under ./CARETrans.
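
For reference, the RTX 4090 latency column below corresponds to a timing loop along these lines (a sketch; the stand-in model, input size, and iteration counts are assumptions):

    import torch
    from torchvision.models import resnet18

    # Stand-in network -- replace with a CARE model loaded from ./ckpt.
    model = resnet18().eval().cuda()
    x = torch.rand(1, 3, 224, 224, device="cuda")

    with torch.no_grad():
        for _ in range(50):       # warm-up to stabilize clocks and caches
            model(x)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(200):
            model(x)
        end.record()
        torch.cuda.synchronize()  # wait for all queued kernels to finish
        print("mean latency: %.2f ms" % (start.elapsed_time(end) / 200))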

The results should be close to the following:

Method    Type      GMACs   Params (M)   iPhone 13 (ms)   Intel i9 (ms)   RTX 4090 (ms)   Top-1 Acc (%)
MLLA-T    LA+CONV   4.2     25.0         5.1              21.3            51.5            83.5
CARE-S0   LA+CONV   0.7     7.3          1.1              4.3             9.8             78.4
CARE-S1   LA+CONV   1.0     9.6          1.4              6.6             14.2            80.1
CARE-S2   LA+CONV   1.9     19.5         2.0              9.4             20.4            82.1

BibTeX

If you find CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction useful, please cite:

@inproceedings{zhou2025care,
  title={CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction},
  author={Zhou, Yuan and Xu, Qingshan and Cui, Jiequan and Zhou, Junbao and Zhang, Jing and Hong, Richang and Zhang, Hanwang},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={20135--20145},
  year={2025}
}

Acknowledgment

Our code is based on pytorch-image-models, poolformer, ConvNeXt, inceptionnext, and metaformer.
