https://huggingface.co/google/siglip-large-patch16-256 with ONNX weights to be compatible with Transformers.js.

Usage (Transformers.js)

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

npm i @xenova/transformers

Example: Zero-shot image classification w/ Xenova/siglip-large-patch16-256:

import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-large-patch16-256');
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url, ['2 cats', '2 dogs'], {
    hypothesis_template: 'a photo of {}',
});
console.log(output);
// [
//   { score: 0.3086719512939453, label: '2 cats' },
//   { score: 0.0000623430241830647, label: '2 dogs' }
// ]

Example: Compute text embeddings with SiglipTextModel.

import { AutoTokenizer, SiglipTextModel } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/siglip-large-patch16-256');
const text_model = await SiglipTextModel.from_pretrained('Xenova/siglip-large-patch16-256');

// Run tokenization
const texts = ['a photo of 2 cats', 'a photo of 2 dogs'];
const text_inputs = tokenizer(texts, { padding: 'max_length', truncation: true });

// Compute embeddings
const { pooler_output } = await text_model(text_inputs);
// Tensor {
//   dims: [ 2, 768 ],
//   type: 'float32',
//   data: Float32Array(1536) [ ... ],
//   size: 1536
// }

Example: Compute vision embeddings with SiglipVisionModel.

import { AutoProcessor, SiglipVisionModel, RawImage} from '@xenova/transformers';

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/siglip-large-patch16-256');
const vision_model = await SiglipVisionModel.from_pretrained('Xenova/siglip-large-patch16-256');

// Read image and run processor
const image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
const image_inputs = await processor(image);

// Compute embeddings
const { pooler_output } = await vision_model(image_inputs);
// Tensor {
//   dims: [ 1, 768 ],
//   type: 'float32',
//   data: Float32Array(768) [ ... ],
//   size: 768
// }

Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named onnx).

Downloads last month
12
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The HF Inference API does not support zero-shot-image-classification models for transformers.js library.

Model tree for Xenova/siglip-large-patch16-256

Quantized
(2)
this model