Function is primarily designed to compile AI inference functions to run on-device. We will walk through the general
workflow required to compile these functions.
Defining an AI Function
Let’s begin with a function that classifies an image, returning the label along with a confidence score. To do so, we will use the MobileNet v2 model from torchvision:
from PIL import Image
from torch import argmax, inference_mode, softmax, randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torchvision.transforms import functional as F

weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights)
model.eval()

@inference_mode()
def predict(image: Image.Image) -> tuple[str, float]:
    """Classify an image."""
    # Preprocess
    image = image.convert("RGB")
    image = F.resize(image, 224)
    image = F.center_crop(image, 224)
    image_tensor = F.to_tensor(image)
    normalized_tensor = F.normalize(
        image_tensor,
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    # Run model
    logits = model(normalized_tensor[None])
    scores = softmax(logits, dim=1)
    idx = argmax(scores, dim=1)
    score = scores[0, idx].item()
    label = weights.meta["categories"][idx]
    # Return
    return label, score
The code above has nothing to do with Function. It is plain PyTorch code.
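Because it is plain Python, you can sanity-check the predictor locally before bringing in Function. Continuing from the snippet above (the image path is just a placeholder):
# Classify a local image with the plain PyTorch predictor (path is illustrative)
image = Image.open("cat.jpg")
label, score = predict(image)
print(f"{label}: {score:.3f}")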
Compiling the AI Function
There are a few steps needed to prepare an AI function for compilation. In this section, the required changes to the code above are highlighted.
Decorating the Function
First, apply the @compile decorator to the function to prepare it for compilation:
from fxn import compile
from PIL import Image
from torch import argmax, inference_mode, softmax, randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torchvision.transforms import functional as F

weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights)
model.eval()

@compile(
    tag="@yusuf/classify-image",
    description="Classify an image with AI."
)
@inference_mode()
def predict(image: Image.Image) -> tuple[str, float]:
    """Classify an image."""
    # Preprocess
    image = image.convert("RGB")
    image = F.resize(image, 224)
    image = F.center_crop(image, 224)
    image_tensor = F.to_tensor(image)
    normalized_tensor = F.normalize(
        image_tensor,
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    # Run model
    logits = model(normalized_tensor[None])
    scores = softmax(logits, dim=1)
    idx = argmax(scores, dim=1)
    score = scores[0, idx].item()
    label = weights.meta["categories"][idx]
    # Return
    return label, score
Defining the Compiler Sandbox
Depending on how you run AI inference, you will likely have to install libraries (e.g. PyTorch) and/or upload model weights. To do so, create a Sandbox:
from fxn import compile, Sandbox
from PIL import Image
from torch import argmax, inference_mode, softmax, randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torchvision.transforms import functional as F

weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights)
model.eval()

@compile(
    tag="@yusuf/classify-image",
    description="Classify an image with AI.",
    sandbox=Sandbox().pip_install("torch", "torchvision")
)
@inference_mode()
def predict(image: Image.Image) -> tuple[str, float]:
    """Classify an image."""
    # Preprocess
    image = image.convert("RGB")
    image = F.resize(image, 224)
    image = F.center_crop(image, 224)
    image_tensor = F.to_tensor(image)
    normalized_tensor = F.normalize(
        image_tensor,
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    # Run model
    logits = model(normalized_tensor[None])
    scores = softmax(logits, dim=1)
    idx = argmax(scores, dim=1)
    score = scores[0, idx].item()
    label = weights.meta["categories"][idx]
    # Return
    return label, score
Specifying an Inference Backend
Let’s use the ONNX Runtime inference backend to run the AI model:
from fxn import compile, Sandbox
from fxn.beta import ONNXInferenceMetadata
from PIL import Image
from torch import argmax, inference_mode, softmax, randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torchvision.transforms import functional as F

weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights)
model.eval()

@compile(
    tag="@yusuf/classify-image",
    description="Classify an image with AI.",
    sandbox=Sandbox().pip_install("torch", "torchvision"),
    metadata=[
        ONNXInferenceMetadata(
            model=model,
            model_args=(randn(1, 3, 224, 224),)
        )
    ]
)
@inference_mode()
def predict(image: Image.Image) -> tuple[str, float]:
    """Classify an image."""
    # Preprocess
    image = image.convert("RGB")
    image = F.resize(image, 224)
    image = F.center_crop(image, 224)
    image_tensor = F.to_tensor(image)
    normalized_tensor = F.normalize(
        image_tensor,
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    # Run model
    logits = model(normalized_tensor[None])
    scores = softmax(logits, dim=1)
    idx = argmax(scores, dim=1)
    score = scores[0, idx].item()
    label = weights.meta["categories"][idx]
    # Return
    return label, score
Compiling the Function
Now, compile the function using the Function CLI:
# Compile the AI function
$ fxn compile --overwrite ai.py
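Once compilation succeeds, the predictor can be invoked from the Function Python client. The following is a minimal sketch, assuming the client exposes a predictions.create API as in its README; check the Function client documentation for the exact invocation:
from fxn import Function
from PIL import Image

# Create a Function client
fxn = Function()

# Invoke the compiled predictor by its tag (assumed API; tag matches the @compile decorator above)
prediction = fxn.predictions.create(
    tag="@yusuf/classify-image",
    inputs={ "image": Image.open("cat.jpg") }
)
print(prediction.results)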
Inference Backends
Function supports a fixed set of backends for running AI inference. You must opt in to using an inference backend
for a specific model by providing inference metadata. The provided metadata will allow the Function compiler to
lower the inference operation to native code.
Supported Backends
Below are supported inference metadata types:
Use the ONNXInferenceMetadata metadata type to compile a PyTorch nn.Module to ONNX:
from fxn.beta import ONNXInferenceMetadata

# Create ONNX inference metadata
metadata = ONNXInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,)
)
Use the ONNXRuntimeInferenceSessionMetadata metadata type to compile an ONNX Runtime InferenceSession and run it with ONNX Runtime:
from fxn.beta import ONNXRuntimeInferenceSessionMetadata

# Create ONNX Runtime inference session metadata
metadata = ONNXRuntimeInferenceSessionMetadata(
    session=ort_session,
    model_path="/path/to/model.onnx"
)
The model must exist at the provided model_path in the compiler sandbox.
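For reference, here is a minimal sketch of how the ort_session and ONNX model file above could be produced with plain PyTorch and ONNX Runtime; pytorch_model, example_input, and the output path are placeholders:
import torch
import onnxruntime as ort

# Export the PyTorch model to an ONNX file (path is illustrative)
torch.onnx.export(pytorch_model, (example_input,), "model.onnx")

# Create an ONNX Runtime inference session over the exported model
ort_session = ort.InferenceSession("model.onnx")
When compiling, the exported file must then be available at the model_path you pass in the metadata, inside the compiler sandbox.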
Use the TensorRTInferenceMetadata metadata type to compile a PyTorch nn.Module to TensorRT:
from fxn.beta import TensorRTInferenceMetadata

# Create TensorRT inference metadata
metadata = TensorRTInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,),
    cuda_arch="sm_100",
    precision="fp16"
)
The TensorRT inference backend is only available on Linux and Windows devices with compatible Nvidia GPUs.
Target CUDA Architectures
TensorRT engines must be compiled for specific target CUDA architectures. Below are CUDA architectures that our compiler supports:
| CUDA Architecture | GPU Family |
| --- | --- |
| sm_80 | Ampere (e.g. A100) |
| sm_86 | Ampere |
| sm_87 | Ampere |
| sm_89 | Ada Lovelace (e.g. L40S) |
| sm_90 | Hopper (e.g. H100) |
| sm_100 | Blackwell (e.g. B200) |
TensorRT Inference Precision
TensorRT allows for specifying the inference engine’s precision. Below are supported precision modes:
| Precision | Notes |
| --- | --- |
| fp32 | 32-bit single precision inference. |
| fp16 | 16-bit half precision inference. |
| int8 | 8-bit quantized integer inference. |
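For example, to build an engine for a Hopper GPU (e.g. H100) with half precision inference, combine entries from the two tables above (pytorch_model and example_input are placeholders):
from fxn.beta import TensorRTInferenceMetadata

# Target Hopper GPUs (sm_90) with 16-bit half precision inference
metadata = TensorRTInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,),
    cuda_arch="sm_90",
    precision="fp16"
)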
Use the CoreMLInferenceMetadata metadata type to compile a PyTorch nn.Module to CoreML:
from fxn.beta import CoreMLInferenceMetadata

# Create CoreML inference metadata
metadata = CoreMLInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,)
)
The CoreML inference backend is only available on iOS, macOS, and visionOS devices.
Use the LiteRTInferenceMetadata metadata type to compile a PyTorch nn.Module to LiteRT:
from fxn.beta import LiteRTInferenceMetadata

# Create LiteRT inference metadata
metadata = LiteRTInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,)
)
The LiteRT inference backend is only available on Android devices.
Use the QnnInferenceMetadata metadata type to compile a PyTorch nn.Module to a Qualcomm QNN context binary:
from fxn.beta import QnnInferenceMetadata

# Create QNN inference metadata
metadata = QnnInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,),
    backend="gpu",
    quantization=None
)
The QNN inference backend is only available on Android and Windows devices with Qualcomm processors.
QNN Hardware Backends
QNN requires that a hardware device backend be specified ahead of time. Below are supported backends:
| Backend | Notes |
| --- | --- |
| cpu | Reference aarch64 CPU backend. |
| gpu | Adreno GPU backend, accelerated by OpenCL. |
| htp | Hexagon NPU backend. |
QNN Model Quantization
When using the htp backend, you must specify a model quantization mode as the Hexagon NPU only supports running integer-quantized models. Below are supported quantization modes:
| Quantization | Notes |
| --- | --- |
| w8a8 | Weights and activations are quantized to uint8. |
| w8a16 | Weights are quantized to uint8 while activations are quantized to uint16. |
| w4a8 | Weights are quantized to uint4 while activations are quantized to uint8. |
| w4a16 | Weights are quantized to uint4 while activations are quantized to uint16. |
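For example, to run on the Hexagon NPU with 8-bit weights and 16-bit activations (pytorch_model and example_input are placeholders):
from fxn.beta import QnnInferenceMetadata

# The htp backend requires an explicit quantization mode
metadata = QnnInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,),
    backend="htp",
    quantization="w8a16"
)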
Use the OpenVINOInferenceMetadata metadata type to compile a PyTorch nn.Module to OpenVINO IR:
from fxn.beta import OpenVINOInferenceMetadata

# Create OpenVINO inference metadata
metadata = OpenVINOInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,)
)
At runtime, the OpenVINO IR will be used for inference with the OpenVINO toolkit.
The OpenVINO inference backend is only available on Linux and Windows x86_64
devices with Intel processors.
A single model can be lowered to use multiple inference backends. Simply provide multiple metadata instances that refer to the model.
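For example, the image classifier from the walkthrough could be lowered to both the ONNX and CoreML backends by passing two metadata instances that reference the same model. A sketch, reusing the model and sample input from the walkthrough:
from fxn.beta import CoreMLInferenceMetadata, ONNXInferenceMetadata
from torch import randn

# Both metadata instances refer to the same model; pass this list to the
# `metadata` argument of the @compile decorator.
metadata = [
    ONNXInferenceMetadata(
        model=model,
        model_args=(randn(1, 3, 224, 224),)
    ),
    CoreMLInferenceMetadata(
        model=model,
        model_args=(randn(1, 3, 224, 224),)
    )
]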
Request a Backend
We are always looking to add support for new inference backends. If there is an inference backend you would like to see supported in Function, please reach out to us.