Multimodal models - Text and image
We support multimodal queries in models that offer these capabilities, following the blueprint of the OpenAI API. We can either add an image-containing message to the conversation using the append_image_message method, or pass an image URL directly to the query method:
# Either: append an image-containing message to the conversation
conversation.append_image_message(
    message="Here is an attached image",
    image_url="https://example.com/image.jpg",
)

# Or: query with the image included
msg, token_usage, correction = conversation.query(
    "What's in this image?",
    image_url="https://example.com/image.jpg",
)
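For context, a minimal sketch of how such a conversation object could be set up, assuming the GptConversation class from biochatter.llm_connect; the model name and API key shown are illustrative placeholders:

from biochatter.llm_connect import GptConversation

# Illustrative setup; substitute your own vision-capable model and key
conversation = GptConversation(
    model_name="gpt-4o",
    prompts={},
)
conversation.set_api_key(api_key="sk-...", user="community")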
Using local images
Following the recommendations by OpenAI, we can pass local images as base64-encoded strings. We allow this by setting the local flag to True in the append_image_message method:
conversation.append_image_message(
    message="Here is an attached image",
    image_url="my/local/image.jpg",
    local=True,
)
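Under the hood, this follows OpenAI's pattern of embedding the image as a base64 data URL. A minimal sketch of that encoding step (the helper name encode_image_to_data_url is hypothetical, not BioChatter's actual internal function):

import base64

def encode_image_to_data_url(path: str) -> str:
    # Hypothetical helper: read a local image file and embed it as a
    # base64 data URL, as recommended by OpenAI for local images
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    # The media type is assumed to be JPEG for this sketch
    return f"data:image/jpeg;base64,{encoded}"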
We also support the use of local images in the query method by detecting the netloc of the image URL. If the netloc is empty, we assume that the image is local and read it as a base64-encoded string:
msg, token_usage, correction = conversation.query(
    "What's in this image?",
    image_url="my/local/image.jpg",
)
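The detection itself can be done with the standard library. A minimal sketch of the netloc check described above, illustrating the logic rather than BioChatter's exact implementation:

from urllib.parse import urlparse

def is_local_image(image_url: str) -> bool:
    # A remote URL such as https://example.com/image.jpg has a
    # non-empty netloc ("example.com"); a local path such as
    # my/local/image.jpg has an empty one
    return urlparse(image_url).netloc == ""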
Open-source multimodal models
While OpenAI models work seamlessly, open-source multimodal models can be buggy or incompatible with certain hardware. We have had mixed success with open models: although they are technically supported by BioChatter, their outputs may currently be unreliable.