# Python Adapter

Create custom integrations by uploading a Python script that implements your model logic. This approach gives you complete flexibility to test any AI application.
## Use Cases
- Proprietary models or custom-built AI systems
- Applications with unique preprocessing or postprocessing requirements
- Models accessed through non-standard authentication or invocation methods
- Local inference pipelines
## Configuration
| Field | Required | Description |
|---|---|---|
| Application Name | Yes | Unique identifier (e.g., my-python-adapter-app) |
| Application Template File | Yes | Python file (max 5MB) |
| Input Modalities | Yes | Text, Image, Video |
| Output Modalities | Yes | Text, Image, Video |
## Template Structure

Your Python script must implement a chat function with this signature:

```python
def chat(chats):
    """
    Process chat messages and return a response.

    Args:
        chats: List of message dictionaries with 'role' and 'content' keys.
            Example: [{"role": "user", "content": "Hello"}]

    Returns:
        str: The model's response text.
    """
    # Your implementation here
    pass
```
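For reference, the simplest valid template just echoes the last user message back. This is a sketch for verifying the upload flow end to end; replace the echo with your own model logic:

```python
def chat(chats):
    """Echo the most recent user message back as the response."""
    # The last entry in `chats` is the message awaiting a reply.
    content = chats[-1]["content"]
    # Content may be a plain string or a list of typed parts (multimodal).
    if isinstance(content, list):
        content = " ".join(p.get("text", "") for p in content if p.get("type") == "text")
    return f"You said: {content}"
```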
## Example Implementation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Step 1: Load the Llama model and tokenizer
model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Initialize the Hugging Face pipeline with the model and tokenizer
chatbot = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Don't change the name of the function or the function signature
def chat(chats):
    """
    Generates a response from the language model for a given list of chat messages.

    Parameters:
        chats (list): A list of dictionaries representing a conversation. Each dictionary contains:
            - "role": Either "user" or "assistant".
            - "content": A string (text message) or a list of dictionaries for multimodal input.
            The last entry is the user message awaiting a response.

    Returns:
        str: The response from the language model.
    """
    # Step 2: Generate the model's response
    response = chatbot(chats, max_new_tokens=1000, num_return_sequences=1)

    # Step 3: Extract and return the generated text. With chat-format input,
    # `generated_text` is the full conversation ending with the new assistant turn.
    generated_text = response[0]["generated_text"]
    if isinstance(generated_text, list):
        return generated_text[-1]["content"].strip()
    return generated_text.split("Assistant:")[-1].strip()
```
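Before uploading, it is worth exercising the function locally with a hand-built message list. A quick smoke test, assuming the model weights are available on your machine:

```python
if __name__ == "__main__":
    # Single-turn smoke test of the required entry point.
    messages = [{"role": "user", "content": "What is the capital of France?"}]
    print(chat(messages))
```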
## Handling Different Modalities

### Input Processing

The chats parameter follows OpenAI's chat format, where multimodal content is represented as a list of dictionaries within the content field (a decoding helper is sketched after the examples below).
**Text Input**

```python
# Simple text message
{
    "role": "user",
    "content": "What is the capital of France?"
}
```

**Image Input**

```python
# Image content (Base64 encoded)
{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{base64_image}"}}
    ]
}
```

**Video Input**

```python
# Video content (Base64 encoded)
{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this video"},
        {"type": "video_url", "video_url": {"url": "data:video/mp4;base64,{base64_video}"}}
    ]
}
```
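When your model needs the raw media bytes, split the data URI at the first comma and decode the Base64 payload. A minimal sketch (the decode_data_uri helper name is ours, not part of the platform API):

```python
import base64

def decode_data_uri(uri):
    """Return the raw bytes from a data URI such as 'data:image/jpeg;base64,...'."""
    # Everything after the first comma is the Base64 payload.
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not payload:
        raise ValueError("Not a Base64 data URI")
    return base64.b64decode(payload)

# Example: image_bytes = decode_data_uri(content_item["image_url"]["url"])
```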
### Processing Mixed Input Types

```python
def chat(chats):
    """
    Handle different input modalities in the chat function.
    """
    try:
        generated_response = ""  # Replace with your model's output
        # Process each message to handle different modalities
        for message in chats:
            if "content" in message and isinstance(message["content"], list):
                # Handle multimodal content
                for content_item in message["content"]:
                    # Handle video input
                    if content_item.get("type") == "video_url" and "url" in content_item.get("video_url", {}):
                        video_url = content_item["video_url"]["url"]
                        if video_url.startswith("data:video/"):
                            base64_data = video_url.split(",", 1)[1]
                            # Process video data as needed for your model
                    # Handle image input
                    elif content_item.get("type") == "image_url" and "url" in content_item.get("image_url", {}):
                        image_url = content_item["image_url"]["url"]
                        if image_url.startswith("data:image/"):
                            base64_data = image_url.split(",", 1)[1]
                            # Process image data as needed for your model
                    # Handle text input
                    elif content_item.get("type") == "text":
                        text_content = content_item.get("text", "")
                        # Process text as needed

        # Your model processing logic here
        return generated_response
    except Exception as e:
        return f"Error processing input: {str(e)}"
```
### Output Processing

The return value from your chat function depends on your model's output modalities:
**Text Output**

```python
# Simply return a string containing the generated text.
return "The capital of France is Paris."
```
**Image Output**

```python
# Return the image as a Base64 encoded string (without the data URI prefix).
# `client` is assumed to be an already-initialized image-generation API client.
def chat(chats):
    try:
        prompt = chats[-1]["content"]
        # Generate image using your image generation model
        response = client.images.generate(
            prompt=prompt,
            model="black-forest-labs/FLUX.1-schnell",
            width=1024,
            height=768,
            steps=4,
            n=1,
            response_format="b64_json",
        )
        # Return the Base64 encoded image.
        # The frontend will prepend "data:image/jpeg;base64," as needed.
        return response.data[0].b64_json
    except Exception as e:
        return f"Error during image generation: {str(e)}"
```
**Video Output**

```python
# Return the video as a Base64 encoded string (without the data URI prefix).
import base64
import os
import uuid

from diffusers.utils import export_to_video

# `pipe` is assumed to be an already-loaded diffusers text-to-video pipeline.
def chat(chats):
    try:
        prompt = chats[-1]["content"]
        # Generate video using your video generation model
        video_frames = pipe(
            prompt=prompt,
            width=1024,
            height=576,
            num_frames=49,
            num_inference_steps=50,
        ).frames[0]

        # Export video to a temporary file
        temp_dir = "temp_videos"
        os.makedirs(temp_dir, exist_ok=True)
        temp_filename = os.path.join(temp_dir, f"video_{uuid.uuid4()}.mp4")
        export_to_video(video_frames, temp_filename, fps=7)

        # Read the video file and encode to Base64
        with open(temp_filename, "rb") as video_file:
            video_bytes = video_file.read()
            video_base64 = base64.b64encode(video_bytes).decode("utf-8")

        # Clean up the temporary file
        os.remove(temp_filename)

        # Return the raw Base64 string
        return video_base64
    except Exception as e:
        return f"Error during video generation: {str(e)}"
```
## Important Notes

- Input Format: All media inputs (images and videos) are provided as Base64-encoded data URIs following the OpenAI chat format.
- Output Format: For media outputs, return only the Base64-encoded string without the data URI prefix (e.g., without data:image/jpeg;base64,).
- Error Handling: Always implement proper error handling and cleanup for temporary files.
## Best Practices

- Error Handling: Wrap API calls in try-except blocks
- Timeouts: Set reasonable timeouts for external requests (see the sketch after this list)
- Dependencies: Use widely available libraries when possible (requests, json, etc.)
- Security: Never hardcode credentials in your script; read them from environment variables instead
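For example, a chat function that calls an external HTTP endpoint might apply all four practices at once. A sketch: the MODEL_API_URL and MODEL_API_KEY environment variables and the response shape are assumptions for illustration, not platform requirements:

```python
import os

import requests

def chat(chats):
    # Read credentials from the environment rather than hardcoding them.
    api_url = os.environ["MODEL_API_URL"]   # assumed env var
    api_key = os.environ["MODEL_API_KEY"]   # assumed env var
    try:
        # Bound the request time so a stalled endpoint can't hang the adapter.
        resp = requests.post(
            api_url,
            headers={"Authorization": f"Bearer {api_key}"},
            json={"messages": chats},
            timeout=30,
        )
        resp.raise_for_status()
        # The response shape is an assumption about this hypothetical endpoint.
        return resp.json()["response"]
    except requests.RequestException as e:
        return f"Error calling model endpoint: {str(e)}"
```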
## Setup Steps

1. Navigate to AI Applications → New Application
2. Select the Custom Applications tab
3. Click Python Adapter
4. Enter your Application Name
5. Upload your Python script (.py file)
6. Select Input Modalities (Text, Image, Video)
7. Select Output Modalities (Text, Image, Video)
8. Review and submit