uform-gen2-qwen-500m

Beta

Model ID: @cf/unum/uform-gen2-qwen-500m

UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering (VQA). The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.

Properties

Task Type: Image-to-Text

API Schema

The following schemas are based on JSON Schema.

Input JSON Schema

{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "image": {
          "type": "array",
          "items": {
            "type": "number"
          }
        },
        "prompt": {
          "type": "string"
        },
        "max_tokens": {
          "type": "integer",
          "default": 512
        }
      }
    }
  ]
}
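
As an illustration of the object branch of the input schema, the following minimal sketch builds a request body in Python. The helper name `build_input` is hypothetical, and the sketch assumes that the `image` array holds the raw image bytes as integers in the range 0 to 255; the schema itself only says the items are numbers. How the body is sent to the model is left out.

```python
import json


def build_input(image_bytes: bytes, prompt: str, max_tokens: int = 512) -> dict:
    """Build a request body for the object branch of the input schema.

    Assumption: "image" carries the raw image bytes as integers (0-255).
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    return {
        "image": list(image_bytes),  # bytes -> list of ints
        "prompt": prompt,
        "max_tokens": max_tokens,    # schema default is 512
    }


# Example: the first four bytes of a PNG file stand in for a real image.
payload = build_input(b"\x89PNG", "Describe this image.")
body = json.dumps(payload)  # JSON text ready to use as a request body
```

The other branch of the schema accepts the image as a binary string instead, in which case no JSON wrapper is needed.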

Output JSON Schema

{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "description": {
      "type": "string"
    }
  }
}
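
A response matching the output schema can be read as sketched below. The helper name `extract_description` is hypothetical, and the sketch assumes the response body is exactly the object described by the schema, with no surrounding envelope; if the API wraps the result in another object, unwrap it first.

```python
import json


def extract_description(raw: str) -> str:
    """Return the "description" string from a JSON response body.

    Assumption: the body is the bare object from the output schema.
    """
    data = json.loads(raw)
    description = data.get("description") if isinstance(data, dict) else None
    if not isinstance(description, str):
        raise ValueError('response has no string "description" field')
    return description


# Example with a hand-written response body (not real model output).
caption = extract_description('{"description": "a cat on a sofa"}')
```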
