Cloudflare Docs
Workers AI

uform-gen2-qwen-500m

Beta

Model ID: @cf/unum/uform-gen2-qwen-500m

UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.


Properties

Task Type: Image-to-Text

API Schema

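The following schemas are based on JSON Schema. The input is either the raw image as a binary string, or an object carrying the image bytes as an array of numbers together with an optional prompt and token limit.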

Input JSON Schema
{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "image": {
          "type": "array",
          "items": {
            "type": "number"
          }
        },
        "prompt": {
          "type": "string"
        },
        "max_tokens": {
          "type": "integer",
          "default": 512
        }
      }
    }
  ]
}
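
As an illustration of the object form of the input schema, the following is a minimal sketch of building a request payload from raw image bytes. The `UformInput` interface and the `buildUformInput` helper are hypothetical names introduced here, not part of any Cloudflare API; the only facts taken from the schema are the three property names, their types, and the `max_tokens` default of 512.

```typescript
// Hypothetical type mirroring the object branch of the input schema.
interface UformInput {
  image: number[];      // image bytes, one number per byte
  prompt?: string;      // optional question or instruction
  max_tokens?: number;  // integer; the schema default is 512
}

// Hypothetical helper: converts raw bytes into the object form of the input.
function buildUformInput(
  bytes: Uint8Array,
  prompt?: string,
  maxTokens: number = 512,
): UformInput {
  // The schema declares max_tokens as an integer, so reject fractions early.
  if (!Number.isInteger(maxTokens)) {
    throw new RangeError("max_tokens must be an integer");
  }
  const input: UformInput = { image: Array.from(bytes) };
  if (prompt !== undefined) {
    input.prompt = prompt;
  }
  input.max_tokens = maxTokens;
  return input;
}
```

In a Worker, a payload like this would typically be passed to the model through the AI binding, for example `env.AI.run("@cf/unum/uform-gen2-qwen-500m", input)`, assuming the binding is named `AI` in your configuration.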
Output JSON Schema
{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "description": {
      "type": "string"
    }
  }
}
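
The output schema does not mark `description` as required, so a caller may want to check the response before using it. The sketch below shows one way to do that; `extractDescription` is a hypothetical helper name, and the choice to return `undefined` when the field is absent is an assumption, not documented behavior.

```typescript
// Hypothetical helper: reads the `description` field from a model response.
// Returns undefined when the field is absent, since the schema does not
// list it as required.
function extractDescription(raw: unknown): string | undefined {
  if (typeof raw !== "object" || raw === null) {
    throw new TypeError("expected a JSON object");
  }
  const description = (raw as Record<string, unknown>).description;
  if (description === undefined) {
    return undefined;
  }
  if (typeof description !== "string") {
    throw new TypeError("description must be a string");
  }
  return description;
}
```

For example, `extractDescription({ description: "A dog on a beach" })` returns the caption string, while `extractDescription({})` returns `undefined`.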