Workers Bindings

Workers

Workers provides a serverless execution environment that allows you to create new applications or augment existing ones.

To use Workers AI with Workers, you must create a Workers AI binding. Bindings allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform. You can create bindings on the Cloudflare dashboard or by updating your wrangler.toml file.

To bind Workers AI to your Worker, add the following to the end of your wrangler.toml file:

[ai]
binding = "AI" # available in your Worker as env.AI
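
With that binding in place, a Worker can reach the model through env.AI. The following is a minimal sketch rather than a definitive implementation: the AiBinding and Env interfaces are hand-written assumptions that describe only what this example uses, and the JSON response shape is chosen for illustration.

```typescript
// Minimal shape of the binding used in this sketch (an assumption; real
// projects usually take this type from their Workers type definitions).
interface AiBinding {
  run(model: string, options: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding; // matches binding = "AI" in wrangler.toml
}

const worker = {
  // The request is unused here; every call asks the same question.
  async fetch(_request: Request, env: Env): Promise<Response> {
    const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "What is the origin of the phrase 'Hello, World'",
    });
    // Serialize whatever the model returned as a JSON body.
    return new Response(JSON.stringify(answer), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```

Because the handler only depends on env.AI.run(), it can be exercised locally by passing a stub object in place of the real binding.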

Pages Functions

Pages Functions allow you to build full-stack applications with Cloudflare Pages by executing code on the Cloudflare network. Functions are Workers under the hood.

To configure a Workers AI binding in your Pages Function, you must use the Cloudflare dashboard. Refer to Workers AI bindings for instructions.
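
Once the binding is configured, a Pages Function reads it from the context object instead of a second handler argument. The sketch below assumes the binding was named AI in the dashboard; the PagesContext interface, the q query parameter, and the file location functions/api/ask.ts are hypothetical names used only for illustration.

```typescript
// Hypothetical file: functions/api/ask.ts
// Hand-written minimal types (assumptions) covering only what is used here.
interface AiBinding {
  run(model: string, options: Record<string, unknown>): Promise<unknown>;
}

interface PagesContext {
  request: Request;
  env: { AI: AiBinding }; // binding name chosen in the dashboard
}

export async function onRequest(context: PagesContext): Promise<Response> {
  // Take the prompt from ?q=..., falling back to a fixed question.
  const url = new URL(context.request.url);
  const prompt =
    url.searchParams.get("q") ??
    "What is the origin of the phrase 'Hello, World'";

  const answer = await context.env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    prompt,
  });

  return new Response(JSON.stringify(answer), {
    headers: { "content-type": "application/json" },
  });
}
```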

Methods

async env.AI.run()

async env.AI.run() runs a model. It takes the model name as the first parameter and an options object as the second parameter.

// Non-streaming: resolves once the full response is available.
const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
  prompt: "What is the origin of the phrase 'Hello, World'",
});

// Streaming: with stream: true, the result can be returned directly as an event-stream body.
const stream = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
  prompt: "What is the origin of the phrase 'Hello, World'",
  stream: true,
});
return new Response(stream, {
  headers: { "content-type": "text/event-stream" },
});
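
A client that receives the text/event-stream response has to pull the text fragments out of the event lines. The helper below is a sketch under a stated assumption about the wire format: each event is a single "data: <json>" line whose JSON carries a string "response" field, and the stream ends with "data: [DONE]". The function name extractSseText is hypothetical.

```typescript
// Extracts text fragments from a block of server-sent events.
// Assumption: events look like `data: {"response":"..."}` and the stream
// is terminated by `data: [DONE]`.
function extractSseText(sseText: string): string[] {
  const fragments: string[] = [];
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and other fields
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    try {
      const parsed = JSON.parse(payload) as { response?: unknown };
      if (typeof parsed.response === "string") {
        fragments.push(parsed.response);
      }
    } catch {
      // Not valid JSON (for example, an event split across network chunks): skip it.
    }
  }
  return fragments;
}

const sample =
  'data: {"response":"Hello"}\n\n' +
  'data: {"response":", World"}\n\n' +
  "data: [DONE]\n\n";

console.log(extractSseText(sample).join("")); // prints "Hello, World"
```

A production reader would also buffer partial lines between chunks; this sketch skips them instead to stay short.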

Parameters

  • model string required

    • The model to run.

    Supported options (properties of the object passed as the second parameter)

    • prompt string optional
      • Text prompt for text generation (maxLength: 131072, minLength: 1).
    • raw boolean optional
      • If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
    • stream boolean optional
      • If true, the response is streamed back incrementally using server-sent events (SSE).
    • max_tokens number optional
      • The maximum number of tokens to generate in the response.
    • temperature number optional
      • Controls the randomness of the output; higher values produce more random results (maximum: 5, minimum: 0).
    • top_p number optional
      • Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses (maximum: 2, minimum: 0).
    • top_k number optional
      • Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises (maximum: 50, minimum: 1).
    • seed number optional
      • Random seed for reproducibility of the generation (maximum: 9999999999, minimum: 1).
    • repetition_penalty number optional
      • Penalty for repeated tokens; higher values discourage repetition (maximum: 2, minimum: 0).
    • frequency_penalty number optional
      • Decreases the likelihood of the model repeating the same lines verbatim (maximum: 2, minimum: 0).
    • presence_penalty number optional
      • Increases the likelihood of the model introducing new topics (maximum: 2, minimum: 0).
    • messages { role: "user" | "assistant" | "system" | "tool" | (string & NonNullable<unknown>); content: string; name?: string; }[] optional
      • An array of message objects representing the conversation history.
    • tools { type: "function" | (string & NonNullable<unknown>); function: { name: string; description: string; parameters?: { type: "object" | (string & NonNullable<unknown>); properties: { [key: string]: { type: string; description?: string; }; }; required: string[]; }; }; }[] optional
      • A list of tools available for the assistant to use.
    • functions { name: string; code: string; }[] optional
      • A list of functions available for the assistant to use.
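
The options above can be combined in a single object. The sketch below builds a messages-based request with a few sampling parameters and checks the numeric values against the ranges listed above before the call is made. The checkRanges helper, the RANGES table, and the ChatMessage type are hypothetical names introduced for illustration; whether the service rejects or clamps out-of-range values is not covered here.

```typescript
type Role = "system" | "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string;
  name?: string;
}

// [minimum, maximum] pairs copied from the parameter list above.
const RANGES: Record<string, [number, number]> = {
  temperature: [0, 5],
  top_p: [0, 2],
  top_k: [1, 50],
  seed: [1, 9999999999],
  repetition_penalty: [0, 2],
  frequency_penalty: [0, 2],
  presence_penalty: [0, 2],
};

// Returns one message per out-of-range or non-numeric value; empty if all is well.
function checkRanges(options: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    const value = options[name];
    if (value === undefined) continue; // every sampling option is optional
    if (typeof value !== "number" || !(value >= min && value <= max)) {
      problems.push(`${name} must be a number between ${min} and ${max}`);
    }
  }
  return problems;
}

const messages: ChatMessage[] = [
  { role: "system", content: "You are a concise assistant." },
  { role: "user", content: "What is the origin of the phrase 'Hello, World'" },
];

const options = {
  messages,
  max_tokens: 256,
  temperature: 0.2,
  top_k: 40,
  seed: 42,
};

const problems = checkRanges(options);
console.log(problems.length === 0 ? "options look valid" : problems.join("; "));

// Inside a Worker, the object would then be passed as the second argument:
// const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", options);
```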