---
license: cc-by-nc-4.0
base_model: Mitsua/swin-base-multi-fractal-1k
pipeline_tag: image-classification
library_name: transformers
---
# Model Card for Mitsua Japanese Tagger
Swin Transformer model for Japanese image tagging, trained from scratch in the following steps:
- Pre-trained on 1 million color fractal images generated by a formula-driven process. The pre-trained checkpoint is published as Swin Base Multi Fractal 1k.
- Finetuned on opt-in licensed data, openly licensed data, and public domain data.

No pretrained foundation model was used for training, and neither unlicensed data nor the outputs of AI models trained on unlicensed data (such as AI-generated images) were used as training data. This model is licensed under CC BY-NC and may be freely used for non-commercial, research, and educational purposes. For commercial use, please contact us: info [at] elanmitsua.com
## Model Details
- Developed by: ELAN MITSUA Project / Abstract Engine
- Model type: Multi-label image classification
- License: CC BY-NC 4.0
- For commercial use, please contact us: info [at] elanmitsua.com
## Usage

```python
from transformers import pipeline

pipe = pipeline("image-classification", model="Mitsua/mitsua-japanese-tagger")
# This is a multi-label tagger, so apply a sigmoid instead of the default softmax
ret = pipe("test.jpg", function_to_apply="sigmoid", top_k=100)
print(ret)
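Because this is a multi-label model, each score is an independent sigmoid probability rather than part of a softmax distribution, so tags are selected by thresholding. A minimal post-processing sketch (the 0.3 threshold is an illustrative assumption, not an official recommendation):

```python
def extract_tags(results, threshold=0.3):
    """Keep labels whose sigmoid score meets the threshold.

    `results` is the list of {"label": ..., "score": ...} dicts
    returned by the transformers image-classification pipeline.
    """
    return [r["label"] for r in results if r["score"] >= threshold]

# Example with dummy pipeline-style output:
dummy = [
    {"label": "猫", "score": 0.92},
    {"label": "屋外", "score": 0.41},
    {"label": "犬", "score": 0.05},
]
print(extract_tags(dummy))  # → ['猫', '屋外']
```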
## Official Public Characters
We have obtained official permission to train on these Japanese fictional characters. The dataset includes official images and fan art from opt-in contributors.
### Mitsua Contributors Credit (Opt-in)
- 霧太郎/HAnS N Erhard, pikurusu39, Hussini, 灯坂アキラ, ムスビイト, ネセヨレワ, 亞襲, E-Ken, とまこ, Nr. N, RI-YAnks, mkbt, 夢観士, 最中亜梨香/中森あか, KIrishusei, 長岡キヘイ, username_Kk32056, 相生創, amabox, 柊 華久椰, nog, 加熱九真, 嘯(しゃお), 夢前黎, みきうさぎ, るな, テラ リソース / Tera Resource (素材系サークル), 力ナディス, 野々村のの, とあ, Roach=Jinx, ging ging.jpeg, 莉子, 毛玉, 寝てる猫, ぽーたー, やえした みえ, mizuchi, 262111, 乙幡皇斗羽, とどめの35番, 明煉瓦, ゆう, 桐生星斗(投稿物生成物使用自由), WAYA, rcc, ask, L, 弐人, Sulphuriy, 602e, 石川すゐす, cha, 中屋, IRICOMIX, 琵來山まろり(画像加工可), とりとめ, 鏡双司, えれいた, mariedoi, あると, aaa05302, netai98, らどん, 脂質, ろすえん, 善良, つあ🌠, UranosEBi, YR, lenbrant, 長谷川, 輝竜司 / citrocube, 詩原るいか, 末広うた, 翠泉, 月波 清火, ゆぬ, 駒込ぴぺっこ, 原動機, ふわふわわ
Honorific titles are omitted.
## Training Data
Our dataset is a mix of opt-in licensed data and openly licensed data. Pre-filtering based on metadata and captions is applied to exclude potentially rights-infringing, harmful, or NSFW data. For this pre-filtering, we built a 146,041-word database containing artist names, celebrity names, fictional character names, trademarks, and bad words, based on Wikidata (licensed under CC0). Face-blurring is applied as a pre-processing step.
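The metadata pre-filtering step above can be sketched as a blocklist lookup. The entries and the substring-matching rule below are illustrative assumptions; the actual 146,041-word database and its matching logic are not published in this card:

```python
# Hypothetical blocklist entries (artist names, trademarks, bad words, ...)
BLOCKLIST = {"some artist name", "some trademark"}

def passes_prefilter(caption: str, metadata: str) -> bool:
    """Return True if neither the caption nor the metadata
    contains any blocked term (case-insensitive)."""
    text = f"{caption} {metadata}".lower()
    return not any(term in text for term in BLOCKLIST)

print(passes_prefilter("a cat on a sofa", "photo, outdoor"))  # → True
print(passes_prefilter("poster by Some Artist Name", ""))     # → False
```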
- "Mitsua Likes" Dataset: Our licensed data from opt-in contributors
- Contributors Credit (Attribution)
- All training data can be browsed on our Discord server "Mitsua Contributors"
- All contributors were screened upon entry and all submitted images were human verified.
- An AI-generated content detector is used to exclude potentially AI-generated images.
- "3R" and "3RG" licensed images and their captions are used to train this model.
- Poly Haven HDRI images licensed under CC0 are used to augment background composition.
- Localized Narratives (CC BY 4.0)
- Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, and Vittorio Ferrari, "Connecting Vision and Language with Localized Narratives" ECCV (Spotlight), 2020
- A subset of images licensed under CC BY 2.0 are used for training.
- In total, 642,789 images are used for training. All attributions are found here.
- STAIR Captions (CC BY 4.0)
- Yuya Yoshikawa, Yutaro Shigeto, and Akikazu Takeuchi, “STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset”, Annual Meeting of the Association for Computational Linguistics (ACL), Short Paper, 2017.
- A subset of images licensed under CC BY 2.0 or CC BY-SA 2.0 are used for training.
- In total, 26,164 images are used for training. All attributions are found here.
- Wikidata Dataset (CC0)
- We built a new dataset based on Wikidata structured data licensed under CC0 and Wikimedia Commons CC0 / public domain images.
- We used the "depicts" and "made from material" properties for training.
- To verify public-domain status, Wikimedia Commons category tags and the Wikidata artist property were used together.
- Only images that are in the public domain in all of the country of origin, Japan, the EU, and the United States were used.
- In total, 267,573 images are used for training. All attributions are found here.
- Even when a dataset itself is CC-licensed, we did not use an image contained in it if the image is not properly licensed, is based on unauthorized use of copyrighted works, or is synthetic output of other pretrained models.
- English captions are translated into Japanese using the ElanMT model, which is trained solely on openly licensed corpora.
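The public-domain check described above amounts to requiring PD status in every relevant jurisdiction at once. A minimal sketch of that conjunction (the jurisdiction keys and the per-jurisdiction boolean input are assumptions for illustration; the actual check is derived from Commons category tags and Wikidata artist properties):

```python
JURISDICTIONS = ["country_of_origin", "japan", "eu", "united_states"]

def usable_as_public_domain(pd_status: dict) -> bool:
    """pd_status maps jurisdiction -> bool (is the work PD there?).
    An image is used only if it is PD in *all* jurisdictions;
    a missing entry is treated as not-PD."""
    return all(pd_status.get(j, False) for j in JURISDICTIONS)

print(usable_as_public_domain({j: True for j in JURISDICTIONS}))  # → True
print(usable_as_public_domain({"japan": True, "eu": True}))       # → False
```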
## Disclaimer
- Disclaimer: The recognition results may be inaccurate, harmful, or biased. This model was developed to investigate the performance achievable using only a relatively small amount of licensed data, and is not suitable for use cases requiring high recognition accuracy. Under Section 5 of the CC BY-NC 4.0 License, ELAN MITSUA Project / Abstract Engine is not responsible for any direct or indirect loss caused by the use of this model.