Function is primarily designed to compile AI inference functions to run on-device. We will walk through the general
workflow required to compile these functions.
Let’s begin with a function that classifies an image, returning
the label along with a confidence score. To do so, we will use the MobileNet v2 model from
torchvision:
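Here is a minimal sketch of such a function. It uses torchvision's bundled preprocessing transforms and ImageNet labels; the function name and signature are illustrative rather than prescribed by Function.

```python
from PIL import Image
import torch
from torchvision.models import MobileNet_V2_Weights, mobilenet_v2

weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights).eval()
preprocess = weights.transforms()       # resize, center-crop, normalize
labels = weights.meta["categories"]     # ImageNet class names

def classify_image(image: Image.Image) -> tuple[str, float]:
    """Classify an image, returning the predicted label and its confidence score."""
    with torch.inference_mode():
        logits = model(preprocess(image).unsqueeze(0))  # add a batch dimension
        probabilities = logits.softmax(dim=1)[0]
        score, index = probabilities.max(dim=0)
    return labels[index.item()], score.item()
```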
Depending on how you run AI inference, you will likely have to install libraries (e.g. PyTorch) and/or upload model
weights. To do so, create a Sandbox:
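For our classifier, the sandbox needs PyTorch and torchvision installed, and optionally the model weights uploaded. The sketch below assumes the `Sandbox` builder exposes `pip_install` and `upload_file` methods and that the sandbox is passed to the `@compile` decorator; check the SDK reference for the exact API.

```python
from fxn import compile, Sandbox
from PIL import Image

# Sketch only: `pip_install` and `upload_file` are assumed builder methods.
sandbox = (
    Sandbox()
    .pip_install("torch", "torchvision")   # install inference libraries
    .upload_file("mobilenet_v2.pt")        # upload model weights, if you have any
)

@compile(
    tag="@example/classify-image",         # illustrative tag
    description="Classify an image with MobileNet v2.",
    sandbox=sandbox,
)
def classify_image(image: Image.Image) -> tuple[str, float]:
    ...
```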
Function supports a fixed set of backends for running AI inference. You must opt in to using an inference backend
for a specific model by providing inference metadata. This metadata allows the Function compiler to
lower the inference operation to native code.
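As a sketch, opting our model into a backend could look like the following. The metadata class name (`ONNXInferenceMetadata`) and its parameters are illustrative assumptions; consult the backend reference for the exact types.

```python
from fxn.beta import ONNXInferenceMetadata   # assumed class name and module path
import torch

# Describe the model and example inputs so the compiler can trace and lower it.
metadata = ONNXInferenceMetadata(
    model=model,                               # the MobileNet v2 module from above
    model_args=[torch.randn(1, 3, 224, 224)],  # example input used for tracing/export
)
```

The metadata instance is then handed to the compiler, for example through a `metadata` argument on the `@compile` decorator (again an assumption; see the compiler reference).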
When using the `htp` backend, you must specify a model quantization mode, as the Hexagon NPU only supports
running integer-quantized models. Below are the supported quantization modes:
| Quantization | Notes |
| --- | --- |
| `w8a8` | Weights and activations are quantized to `uint8`. |
| `w8a16` | Weights are quantized to `uint8` while activations are quantized to `uint16`. |
| `w4a8` | Weights are quantized to `uint4` while activations are quantized to `uint8`. |
| `w4a16` | Weights are quantized to `uint4` while activations are quantized to `uint16`. |
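For example, a sketch of metadata targeting the `htp` backend with `w8a16` quantization might look like this; the `QnnInferenceMetadata` class and its `backend` and `quantization` parameters are assumed names, so check the backend reference for the exact API.

```python
from fxn.beta import QnnInferenceMetadata   # assumed class name and module path
import torch

htp_metadata = QnnInferenceMetadata(
    model=model,                               # the MobileNet v2 module from above
    model_args=[torch.randn(1, 3, 224, 224)],  # example input used for tracing/export
    backend="htp",                             # target the Hexagon NPU
    quantization="w8a16",                      # uint8 weights, uint16 activations
)
```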
Additional inference backends are coming soon 🤫.
A single model can be lowered to use multiple inference backends. Simply provide multiple metadata instances
that refer to the same model.
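For instance, the sketch below opts the same model into both a general-purpose backend and the Hexagon NPU by listing both metadata instances; the class names are the same assumptions as above.

```python
import torch
from fxn.beta import ONNXInferenceMetadata, QnnInferenceMetadata   # assumed class names

example_input = torch.randn(1, 3, 224, 224)

metadata = [
    # Lower inference with a general-purpose backend...
    ONNXInferenceMetadata(model=model, model_args=[example_input]),
    # ...and also with the Hexagon NPU on supported Qualcomm devices.
    QnnInferenceMetadata(
        model=model,
        model_args=[example_input],
        backend="htp",
        quantization="w8a16",
    ),
]
```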
We are always looking to add support for new inference backends, so if there is an inference backend you would like to
see supported in Function, please reach out to us.