LISA-AVS: LISA 7B Model Finetuned on AVS-Bench Dataset

Note: This is an adapted version of the online demo for LISA, in which the LISA model (7B) is finetuned from scratch with data from AVS-Bench (Search-TTA).

If multiple users are using the demo at the same time, their requests are placed in a queue, which may cause some delay.

Note: Different prompts can lead to significantly varied results.

Note: Please standardize your input text prompts to avoid ambiguity, and check that the punctuation of the input is correct.

Usage:
 (1) To have LISA segment something, input a prompt like: "Can you segment xxx in this image?" or "What is xxx in this image? Please output segmentation mask.";
 (2) To have LISA output an explanation, input a prompt like: "What is xxx in this image? Please output segmentation mask and explain why.";
 (3) To obtain language output only, enter your prompt as you would for a current multi-modal LLM (e.g., LLaVA).
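
The prompt patterns above can be kept consistent with a small helper. This is only a sketch: `TEMPLATES` and `build_prompt` are hypothetical names invented for illustration and are not part of LISA or this demo. It fills the "xxx" slot of the templates listed above and normalizes whitespace so that prompts stay standardized.

```python
# Hypothetical helper for building standardized prompts.
# The names below are illustrative and not part of the LISA codebase.
TEMPLATES = {
    # Usage (1): segmentation only
    "segment": "Can you segment {target} in this image?",
    # Usage (2): segmentation plus an explanation
    "segment_explain": (
        "What is {target} in this image? "
        "Please output segmentation mask and explain why."
    ),
}


def build_prompt(target: str, mode: str = "segment") -> str:
    """Fill the chosen template with a whitespace-normalized target phrase."""
    # Collapse repeated spaces and strip leading/trailing whitespace.
    target = " ".join(target.split())
    if not target:
        raise ValueError("target must not be empty")
    if mode not in TEMPLATES:
        raise ValueError(f"unknown mode: {mode!r}")
    return TEMPLATES[mode].format(target=target)


if __name__ == "__main__":
    print(build_prompt("the  red car"))
    print(build_prompt("the bird", mode="segment_explain"))
```

The resulting string can then be pasted into the demo's text box. For usage (3), no template is needed, since the prompt is free-form.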
We hope you enjoy our work!