Function is primarily designed to compile AI inference functions to run on-device. We will walk through the general
workflow required to compile these functions.
Defining an AI Function
Let’s begin with a function that classifies an image, returning the label along with a confidence score. To do so, we will use the MobileNet v2 model from torchvision:
from PIL import Image
from torch import argmax, inference_mode, softmax, randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torchvision.transforms import functional as F

weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights)
model.eval()

@inference_mode()
def predict(image: Image.Image) -> tuple[str, float]:
    """Classify an image."""
    # Preprocess
    image = image.convert("RGB")
    image = F.resize(image, 224)
    image = F.center_crop(image, 224)
    image_tensor = F.to_tensor(image)
    normalized_tensor = F.normalize(
        image_tensor,
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    # Run model
    logits = model(normalized_tensor[None])
    scores = softmax(logits, dim=1)
    idx = argmax(scores, dim=1)
    score = scores[0, idx].item()
    label = weights.meta["categories"][idx]
    # Return
    return label, score
The code above has nothing to do with Function. It is plain PyTorch code.
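Because it is plain Python, you can sanity-check the predictor locally before bringing in Function. Continuing from the snippet above (the image path is just a placeholder):
# Classify a local image with the plain PyTorch predictor (path is illustrative)
image = Image.open("cat.jpg")
label, score = predict(image)
print(f"{label}: {score:.3f}")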
Compiling the AI Function
There are a few steps needed to prepare an AI function for compilation. In this section, the required changes to the code above are highlighted.
Decorating the Function
First, apply the @compile decorator to the function to prepare it for compilation:
from fxn import compile
from PIL import Image
from torch import argmax, inference_mode, softmax, randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torchvision.transforms import functional as F

weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights)
model.eval()

@compile(
    tag="@yusuf/classify-image",
    description="Classify an image with AI."
)
@inference_mode()
def predict(image: Image.Image) -> tuple[str, float]:
    """Classify an image."""
    # Preprocess
    image = image.convert("RGB")
    image = F.resize(image, 224)
    image = F.center_crop(image, 224)
    image_tensor = F.to_tensor(image)
    normalized_tensor = F.normalize(
        image_tensor,
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    # Run model
    logits = model(normalized_tensor[None])
    scores = softmax(logits, dim=1)
    idx = argmax(scores, dim=1)
    score = scores[0, idx].item()
    label = weights.meta["categories"][idx]
    # Return
    return label, score
Defining the Compiler Sandbox
Depending on how you run AI inference, you will likely have to install libraries (e.g. PyTorch) and/or upload model weights. To do so, create a Sandbox:
from fxn import compile, Sandbox
from PIL import Image
from torch import argmax, inference_mode, softmax, randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torchvision.transforms import functional as F

weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights)
model.eval()

@compile(
    tag="@yusuf/classify-image",
    description="Classify an image with AI.",
    sandbox=Sandbox().pip_install("torch", "torchvision")
)
@inference_mode()
def predict(image: Image.Image) -> tuple[str, float]:
    """Classify an image."""
    # Preprocess
    image = image.convert("RGB")
    image = F.resize(image, 224)
    image = F.center_crop(image, 224)
    image_tensor = F.to_tensor(image)
    normalized_tensor = F.normalize(
        image_tensor,
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    # Run model
    logits = model(normalized_tensor[None])
    scores = softmax(logits, dim=1)
    idx = argmax(scores, dim=1)
    score = scores[0, idx].item()
    label = weights.meta["categories"][idx]
    # Return
    return label, score
Specifying an Inference Backend
Let’s use the ONNX Runtime inference backend to run the AI model:
from fxn import compile, Sandbox
from fxn.beta import ONNXInferenceMetadata
from PIL import Image
from torch import argmax, inference_mode, softmax, randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torchvision.transforms import functional as F

weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights)
model.eval()

@compile(
    tag="@yusuf/classify-image",
    description="Classify an image with AI.",
    sandbox=Sandbox().pip_install("torch", "torchvision"),
    metadata=[
        ONNXInferenceMetadata(
            model=model,
            model_args=(randn(1, 3, 224, 224),)
        )
    ]
)
@inference_mode()
def predict(image: Image.Image) -> tuple[str, float]:
    """Classify an image."""
    # Preprocess
    image = image.convert("RGB")
    image = F.resize(image, 224)
    image = F.center_crop(image, 224)
    image_tensor = F.to_tensor(image)
    normalized_tensor = F.normalize(
        image_tensor,
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    # Run model
    logits = model(normalized_tensor[None])
    scores = softmax(logits, dim=1)
    idx = argmax(scores, dim=1)
    score = scores[0, idx].item()
    label = weights.meta["categories"][idx]
    # Return
    return label, score
Compiling the Function
Now, compile the function using the Function CLI:
# Compile the AI function
$ fxn compile --overwrite ai.py
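Once compilation succeeds, the predictor can be invoked from the Function Python client. The following is a minimal sketch, assuming the client exposes a predictions.create API as in its README; check the Function client documentation for the exact invocation:
from fxn import Function
from PIL import Image

# Create a Function client
fxn = Function()

# Invoke the compiled predictor by its tag (assumed API; tag matches the @compile decorator above)
prediction = fxn.predictions.create(
    tag="@yusuf/classify-image",
    inputs={ "image": Image.open("cat.jpg") }
)
print(prediction.results)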
Inference Backends
Function supports a fixed set of backends for running AI inference. You must opt in to using an inference backend
for a specific model by providing inference metadata. The provided metadata will allow the Function compiler to
lower the inference operation to native code.
Supported Backends
Below are supported inference metadata types:
Use the ONNXInferenceMetadata metadata type to compile a PyTorch nn.Module to ONNX:
from fxn.beta import ONNXInferenceMetadata

# Create ONNX inference metadata
metadata = ONNXInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,)
)
Use the ONNXRuntimeInferenceSessionMetadata metadata type to compile an ONNX Runtime InferenceSession and run it with ONNX Runtime:
from fxn.beta import ONNXRuntimeInferenceSessionMetadata

# Create ONNX Runtime inference session metadata
metadata = ONNXRuntimeInferenceSessionMetadata(
    session=ort_session,
    model_path="/path/to/model.onnx"
)
The model must exist at the provided model_path in the compiler sandbox.
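For reference, here is a minimal sketch of how the ort_session and ONNX model file above could be produced with plain PyTorch and ONNX Runtime; pytorch_model, example_input, and the output path are placeholders:
import torch
import onnxruntime as ort

# Export the PyTorch model to an ONNX file (path is illustrative)
torch.onnx.export(pytorch_model, (example_input,), "model.onnx")

# Create an ONNX Runtime inference session over the exported model
ort_session = ort.InferenceSession("model.onnx")
When compiling, the exported file must then be available at the model_path you pass in the metadata, inside the compiler sandbox.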
Use the TensorRTInferenceMetadata metadata type to compile a PyTorch nn.Module to TensorRT:
from fxn.beta import TensorRTInferenceMetadata

# Create TensorRT inference metadata
metadata = TensorRTInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,),
    cuda_arch="sm_100",
    precision="fp16"
)
The TensorRT inference backend is only available on Linux and Windows devices with compatible Nvidia GPUs.
Target CUDA Architectures
TensorRT engines must be compiled for specific target CUDA architectures. Below are CUDA architectures that our compiler supports:
| CUDA Architecture | GPU Family |
| --- | --- |
| sm_80 | Ampere (e.g. A100) |
| sm_86 | Ampere |
| sm_87 | Ampere |
| sm_89 | Ada Lovelace (e.g. L40S) |
| sm_90 | Hopper (e.g. H100) |
| sm_100 | Blackwell (e.g. B200) |
TensorRT Inference Precision
TensorRT allows for specifying the inference engine’s precision. Below are supported precision modes:
| Precision | Notes |
| --- | --- |
| fp32 | 32-bit single precision inference. |
| fp16 | 16-bit half precision inference. |
| int8 | 8-bit quantized integer inference. |
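For example, to build an engine for a Hopper GPU (e.g. H100) with half precision inference, combine entries from the two tables above (pytorch_model and example_input are placeholders):
from fxn.beta import TensorRTInferenceMetadata

# Target Hopper GPUs (sm_90) with 16-bit half precision inference
metadata = TensorRTInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,),
    cuda_arch="sm_90",
    precision="fp16"
)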
Use the CoreMLInferenceMetadata metadata type to compile a PyTorch nn.Module to CoreML:
from fxn.beta import CoreMLInferenceMetadata

# Create CoreML inference metadata
metadata = CoreMLInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,)
)
The CoreML inference backend is only available on iOS, macOS, and visionOS devices.
Use the LiteRTInferenceMetadata metadata type to compile a PyTorch nn.Module to LiteRT:
from fxn.beta import LiteRTInferenceMetadata

# Create LiteRT inference metadata
metadata = LiteRTInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,)
)
The LiteRT inference backend is only available on Android devices.
Use the QnnInferenceMetadata metadata type to compile a PyTorch nn.Module to a Qualcomm QNN context binary:
from fxn.beta import QnnInferenceMetadata

# Create QNN inference metadata
metadata = QnnInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,),
    backend="gpu",
    quantization=None
)
The QNN inference backend is only available on Android and Windows devices with Qualcomm processors.
QNN Hardware Backends
QNN requires that a hardware device backend be specified ahead of time. Below are supported backends:
| Backend | Notes |
| --- | --- |
| cpu | Reference aarch64 CPU backend. |
| gpu | Adreno GPU backend, accelerated by OpenCL. |
| htp | Hexagon NPU backend. |
QNN Model Quantization
When using the htp backend, you must specify a model quantization mode as the Hexagon NPU only supports running integer-quantized models. Below are supported quantization modes:
| Quantization | Notes |
| --- | --- |
| w8a8 | Weights and activations are quantized to uint8. |
| w8a16 | Weights are quantized to uint8 while activations are quantized to uint16. |
| w4a8 | Weights are quantized to uint4 while activations are quantized to uint8. |
| w4a16 | Weights are quantized to uint4 while activations are quantized to uint16. |
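For example, to run on the Hexagon NPU with 8-bit weights and 16-bit activations (pytorch_model and example_input are placeholders):
from fxn.beta import QnnInferenceMetadata

# The htp backend requires an explicit quantization mode
metadata = QnnInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,),
    backend="htp",
    quantization="w8a16"
)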
Use the OpenVINOInferenceMetadata metadata type to compile a PyTorch nn.Module to OpenVINO IR:
from fxn.beta import OpenVINOInferenceMetadata

# Create OpenVINO inference metadata
metadata = OpenVINOInferenceMetadata(
    model=pytorch_model,
    model_args=(example_input,)
)
At runtime, the OpenVINO IR will be used for inference with the OpenVINO toolkit.
The OpenVINO inference backend is only available on Linux and Windows x86_64
devices with Intel processors.
A single model can be lowered to use multiple inference backends. Simply provide multiple metadata instances that refer to the model.
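For example, the image classifier from the walkthrough could be lowered to both the ONNX and CoreML backends by passing two metadata instances that reference the same model. A sketch, reusing the model and sample input from the walkthrough:
from fxn.beta import CoreMLInferenceMetadata, ONNXInferenceMetadata
from torch import randn

# Both metadata instances refer to the same model; pass this list to the
# `metadata` argument of the @compile decorator.
metadata = [
    ONNXInferenceMetadata(
        model=model,
        model_args=(randn(1, 3, 224, 224),)
    ),
    CoreMLInferenceMetadata(
        model=model,
        model_args=(randn(1, 3, 224, 224),)
    )
]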
Request a Backend
We are always looking to add support for new inference backends. If there is an inference backend you would like to see supported in Function, please reach out to us.