Iterating through results as they become available.
Function supports consuming the partial results of a prediction as they are made available by the predictor.
Use the fxn.predictions.stream function to consume a prediction stream:
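Below is a minimal sketch of consuming a stream. It assumes an authenticated Function client (here called fxn); the helper name stream_text and the idea of reading the first element of each partial prediction's results are illustrative, not part of the official API.

```python
# Sketch: consume a prediction stream, yielding the first result of
# each partial prediction as it arrives. `fxn` is assumed to be an
# authenticated Function client; the tag is supplied by the caller.
def stream_text(fxn, tag: str, **inputs):
    """Yield each partial prediction's first result as it is produced."""
    for prediction in fxn.predictions.stream(tag=tag, inputs=inputs):
        yield prediction.results[0]
```

With an LLM predictor, each yielded value would typically be the next generated token or text fragment, ready to append to a UI.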
You can use prediction streaming to implement text generation user interfaces for LLMs.
Streaming in Function is designed to be intuitive. We fully separate how a prediction function is implemented (i.e. eager vs. generator functions) from how it is consumed (i.e. creating vs. streaming predictions). Consider these two predictors:
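The two predictors below are hypothetical sketches: an eager predictor returns its result in one shot, while a generator predictor yields partial results as it goes. They are renamed here so both fit in one snippet; in practice each file would define its own predict function.

```python
# eager.py (sketch): an eager predictor returns a single result.
def predict_eager(sentence: str) -> str:
    return sentence.upper()

# generator.py (sketch): a generator predictor yields partial results.
def predict_streaming(sentence: str):
    for word in sentence.split():
        yield word
```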
You can choose how to consume a prediction function depending on what works best for your user experience.
Here are the results of creating vs. streaming each function at runtime:
Creating Predictions with eager.py
In this case, the single prediction is returned:
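The behavior can be simulated in plain Python (this is an illustration of the semantics, not the real client): creating a prediction with an eager predictor simply returns its one result.

```python
# eager.py sketch: a single-shot predictor.
def predict(sentence: str) -> str:
    return sentence.upper()

# Simulated "create" semantics: one call, one prediction.
def create(predictor, *args):
    return predictor(*args)

print(create(predict, "hello"))  # HELLO
```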
Creating Predictions with generator.py
In this case, the Function client will consume all partial predictions yielded by the predictor, then return only the last one:
Streaming Predictions with eager.py
In this case, the Function client will return a prediction stream with the single prediction returned by the predictor:
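Simulated semantics (illustration only): streaming an eager predictor yields a one-item stream holding its single prediction.

```python
# eager.py sketch: a single-shot predictor.
def predict(sentence: str) -> str:
    return sentence.upper()

# Simulated "stream" semantics: a single-element stream.
def stream(predictor, *args):
    yield predictor(*args)

print(list(stream(predict, "hi")))  # ['HI']
```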
Streaming Predictions with generator.py
In this case, the Function client will provide a prediction stream containing all partial predictions yielded by the predictor:
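Simulated semantics (illustration only): streaming a generator predictor forwards each partial result as soon as the predictor yields it.

```python
# generator.py sketch: yields partial results.
def predict(sentence: str):
    for word in sentence.split():
        yield word

# Simulated "stream" semantics: forward every partial result.
def stream(predictor, *args):
    yield from predictor(*args)

print(list(stream(predict, "a b c")))  # ['a', 'b', 'c']
```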