OpenAI Compatible Server

We host a deployment of ScalarLM on TensorWave for testing. You can access it at the https://llama8btensorwave.cray-lm.com endpoint.

For example, to submit a chat completion request to it:

curl https://llama8btensorwave.cray-lm.com/v1/openai/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }'
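
Because the endpoint is OpenAI-compatible, you can also point the official openai Python client at it. This is a minimal sketch, assuming the /v1/openai prefix serves the standard chat completions API and that no real API key is required (the client still expects a placeholder value):

from openai import OpenAI

# Point the client at the ScalarLM deployment's OpenAI-compatible prefix.
# The api_key is a placeholder; adjust if your deployment enforces auth.
client = OpenAI(
    base_url="https://llama8btensorwave.cray-lm.com/v1/openai",
    api_key="unused",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)

print(response.choices[0].message.content)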

Using the Python client

You can also use the Python client to interact with the ScalarLM server.

import scalarlm

scalarlm.api_url = "https://llama8btensorwave.cray-lm.com"

def get_dataset():
    dataset = []

    count = 4

    for i in range(count):
        dataset.append(f"What is {i} + {i}?")

    return dataset


llm = scalarlm.SupermassiveIntelligence()

dataset = get_dataset()

results = llm.generate(prompts=dataset)

print(results)

Batching Support

ScalarLM supports batching through the Python client. Notice in the example above that a list of prompts is provided to the llm.generate call.

ScalarLM will automatically distribute mini-batches of requests to inference GPUs and keep them fully utilized.

The requests are queued by ScalarLM, so you can submit very large numbers of queries. The parallelism and back pressure will automatically be handled by the ScalarLM client and server queues.
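
For example, a single generate call can carry a much larger batch, which ScalarLM splits into mini-batches internally. This sketch assumes results come back one per prompt, in submission order:

import scalarlm

scalarlm.api_url = "https://llama8btensorwave.cray-lm.com"

llm = scalarlm.SupermassiveIntelligence()

# Submit a large batch in one call; the client and server queues
# handle parallelism and back pressure automatically.
prompts = [f"What is {i} + {i}?" for i in range(1000)]

results = llm.generate(prompts=prompts)

# Assumes one result per prompt, aligned with the input order.
for prompt, result in zip(prompts, results):
    print(prompt, "->", result)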

Image Batching Support

You can also use the Python client to submit multi-modal (text + image) requests to ScalarLM. The example below downloads a few sample images, encodes them as base64 strings, and submits them alongside a text prompt:

import scalarlm

import base64
import requests
from io import BytesIO
from PIL import Image

scalarlm.api_url = "https://llama70b.cray-lm.com"


def download_image(url):
    """Download an image from a URL and return it as a PIL Image."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Raise an exception for bad status codes

        # Convert bytes to PIL Image
        return Image.open(BytesIO(response.content))
    except Exception as e:
        print(f"Error downloading image from {url}: {e}")
        return None


def pil_to_base64(pil_image, format="PNG"):
    # Create an in-memory bytes buffer
    buffer = BytesIO()

    # Save the image to the buffer in the specified format
    pil_image.save(buffer, format=format)

    # Get the bytes from the buffer
    img_bytes = buffer.getvalue()

    # Encode the bytes as base64
    return base64.b64encode(img_bytes).decode("utf-8")


def get_image(index):
    # Single-image input inference
    if index < len(images) and images[index] is not None:
        return pil_to_base64(images[index], format="PNG")
    else:
        print(f"Invalid index {index} or image is None")
        return None


# Example URLs - replace these with your actual image URLs
image_urls = [
    "https://picsum.photos/300/200?random=1",
    "https://picsum.photos/300/200?random=2",
    "https://picsum.photos/300/200?random=3",
    "https://picsum.photos/300/200?random=4",
]

# Download all images before building the dataset
images = []
print("Downloading images...")

for i, url in enumerate(image_urls):
    print(f"Downloading image {i+1} from {url}")
    img = download_image(url)
    images.append(img)

    if img:
        print(f"✓ Image {i+1} downloaded successfully - Size: {img.size}")
    else:
        print(f"✗ Failed to download image {i+1}")


def get_dataset():
    dataset = []

    count = 4

    for i in range(count):
        dataset.append({"text": "What is in this image?", "image": get_image(i)})

    return dataset


llm = scalarlm.SupermassiveIntelligence()

dataset = get_dataset()

results = llm.generate(prompts=dataset)

print(results)