A meta-dataset for few-shot image classification

group fairness, few-shot learning, meta-learning, continual learning, transfer learning, image classification

About

Stylized Meta-Album (SMA) is a new meta-dataset consisting of 24 datasets (12 content datasets and 12 stylized datasets), designed to support studies on out-of-distribution generalization and related topics. SMA is constructed by applying style transfer techniques to 12 subject classification datasets (the content datasets). It provides a diverse and extensive set of 4800 groups, combining various subjects (objects, plants, animals, human actions, textures) with multiple styles. This scale facilitates rigorous and expansive research that traditional datasets, which offer limited group variability and/or class diversity, cannot support. SMA is a continuously growing meta-dataset. See our datasets in the Datasets section.

We repurposed datasets that were generously made available by Meta-Album; see the credits page. All datasets are free to use for academic purposes, provided that proper credit is given. For your convenience, you may cite our paper, which references all original creators.

We have summarized the data generation in the flow diagram below:

License

Stylized Meta-Album is released under a CC BY-NC 4.0 license, permitting non-commercial use for research purposes, provided that you cite us. Additionally, redistributed datasets have their own licenses; see the credits page. All resources made available through this website are provided “as is”. The curators of Stylized Meta-Album (and their home institutions and their sponsors) who have worked on its preparation, this website, and the code provided to read and process data and run baseline methods make no warranties concerning the licensed material, including fitness for any purpose, non-infringement, absence of defects or errors, and accuracy, and they decline any liability for losses or other possible consequences that may arise from using such material.
This briefly summarizes the terms of the CC BY-NC 4.0 license and the disclaimer that the license includes.

Recommended use

The recommended use of Stylized Meta-Album is to conduct fundamental research on machine learning algorithms and to run benchmarks, particularly in group fairness, few-shot learning, meta-learning, continual learning, transfer learning, and image classification.

Code

We provide code in our GitHub repository for:

  1. Data processing
  2. Data formatting
  3. Quality control
  4. Stylized Meta-Album use cases

Visit our GitHub repository for more details.

Datasets

We list in the tables below the data statistics of the SMA datasets. We chose 12 datasets from Meta-Album (the content datasets). We scraped a style dataset from the internet and stylized the content datasets with it to obtain 12 stylized datasets. Each instance in these datasets is a 256x256-pixel image. Both the content and stylized datasets are available in 2 versions:

  1. Stylized Meta-Album Extended: all classes and all images per class
  2. Stylized Meta-Album Mini: same as Stylized Meta-Album Extended, but with only 40 examples randomly sampled for each class (hence the datasets are class-balanced).
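
The class-balanced sampling behind the Mini version can be sketched as follows. This is a minimal illustration, not the code from the SMA repository: the function name `sample_mini`, the fixed seed, and the choice to raise an error when a class has fewer than 40 examples are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_mini(labels, per_class=40, seed=0):
    """Return sorted indices of a class-balanced subset.

    Hypothetical helper: draws `per_class` examples uniformly at random
    (without replacement) from every class found in `labels`.
    """
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Assumption: a class that is too small is treated as an error.
        if len(indices) < per_class:
            raise ValueError(f"class {label!r} has only {len(indices)} examples")
        chosen.extend(rng.sample(indices, per_class))
    return sorted(chosen)


# Toy label list standing in for one dataset's per-image class labels.
labels = ["cat"] * 100 + ["dog"] * 55 + ["owl"] * 40
mini_indices = sample_mini(labels)
```

Because every class contributes the same number of examples, the resulting subset is class-balanced regardless of how imbalanced the Extended version is.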

Stylized Datasets

| Dataset ID | Domain | # Classes | # Images |
|---|---|---|---|
| APL_STY | Vehicles | 400 | 204,100 |
| AWA_STY | Large Animals | 400 | 437,820 |
| BRD_STY | Large Animals | 400 | 90,620 |
| DOG_STY | Large Animals | 400 | 84,880 |
| INS2_STY | Small Animals | 400 | 600,000 |
| MED_LF_STY | Plant Diseases | 400 | 27,900 |
| PLT_DOC_STY | Plant Diseases | 400 | 42,620 |
| PLT_NET_STY | Plants | 400 | 600,000 |
| RESISC_STY | Remote Sensing | 400 | 280,000 |
| RSICB_STY | Remote Sensing | 400 | 373,820 |
| SPT_STY | Human Actions | 400 | 70,220 |
| TEX_DTD_STY | Manufacturing | 400 | 48,000 |

Content Datasets (Meta-Album Datasets)

| Dataset ID | Domain | # Classes | # Images |
|---|---|---|---|
| APL | Vehicles | 20 | 10,205 |
| AWA | Large Animals | 20 | 21,891 |
| BRD | Large Animals | 20 | 4,531 |
| DOG | Large Animals | 20 | 4,244 |
| INS2 | Small Animals | 20 | 30,000 |
| MED_LF | Plant Diseases | 20 | 1,395 |
| PLT_DOC | Plant Diseases | 20 | 2,135 |
| PLT_NET | Plants | 20 | 30,000 |
| RESISC | Remote Sensing | 20 | 14,000 |
| RSICB | Remote Sensing | 20 | 18,691 |
| SPT | Human Actions | 20 | 3,511 |
| TEX_DTD | Manufacturing | 20 | 2,400 |
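
To make the relationship between the two tables concrete, the sketch below enumerates groups as (class, style) pairs. It rests on an inference rather than a stated fact: since each content dataset has 20 classes and each stylized dataset lists 400, we assume 20 styles per dataset (400 / 20). The class and style names are placeholders invented for the illustration.

```python
from itertools import product

NUM_DATASETS = 12          # stated in the text
CLASSES_PER_DATASET = 20   # from the content datasets table
NUM_STYLES = 20            # assumption: inferred as 400 / 20, not stated in the text


def enumerate_groups(classes, styles):
    """Return every (class, style) combination, i.e. one entry per group."""
    return list(product(classes, styles))


# Placeholder names; the real class and style names are not listed here.
classes = [f"class_{i}" for i in range(CLASSES_PER_DATASET)]
styles = [f"style_{j}" for j in range(NUM_STYLES)]

groups = enumerate_groups(classes, styles)
groups_per_dataset = len(groups)                   # 20 * 20
total_groups = NUM_DATASETS * groups_per_dataset   # 12 * 400
```

Under this assumption, the per-dataset count matches the 400 listed for each stylized dataset, and the total matches the 4800 groups mentioned in the About section.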

Citation

If you use Stylized Meta-Album, please cite the following papers:

  @inproceedings{stylized-meta-album-2024,
    title={Stylized Meta-Album: Multi-domain computer vision meta-dataset},
    author={Mussard, Romain and Gauffre, Aurélien and Ullah, Ihsan and Khuong, Thanh Gia Hieu and Amini, Massih-Reza and Hosoya, Lisheng Sun},
    booktitle={Journal of Data-centric Machine Learning Research (DMLR)},
    url = {https://stylized-meta-album.github.io/},
    year = {2024}
  }

  @inproceedings{meta-album-2022,
    title={Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification},
    author={Ullah, Ihsan and Carrion, Dustin and Escalera, Sergio and Guyon, Isabelle M and Huisman, Mike and Mohr, Felix and van Rijn, Jan N and Sun, Haozhe and Vanschoren, Joaquin and Vu, Phan Anh},
    booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    url = {https://meta-album.github.io/},
    year = {2022}
  }
            
