I’m building an app that uses a couple of LLMs. I can go into more detail, but in effect when the user posts a message, a quick response is generated using a fairly fast LLM, and then a better response is generated using a more powerful (and much slower) LLM.
What I’d like to do is something like:
async function* onMessage({ message }) {
  const fastResponse = await FastLLM(message)
  yield fastResponse
  const slowResponse = await SlowLLM(message, fastResponse)
  yield slowResponse
}
It’s important that SlowLLM is dependent on the results of FastLLM.
Originally I wanted to use the @stream directive, but it seems that Apollo Client doesn’t support it. And then I wanted to use @defer (since there will always be exactly 2 responses), but it wasn’t clear to me how to have slowResponse get the value of fastResponse.
In the example for @defer, there is:
export const fastField = async () => {
  return 'I am fast'
}

export const slowField = async (_, { waitFor = 5000 }) => {
  logger.debug('waiting on slowField')
  await wait(waitFor)
  return 'I am slow'
}
but what I’d really like is
export const fastField = async () => {
  return 7 // e.g. use the fast LLM here
}

export const slowField = async (_, { waitFor = 5000 }) => {
  logger.debug('waiting on slowField')
  await wait(waitFor)
  return context.fastField * 2 // Or something - get the result of the fast query.
}
One thought: is it possible for a serverless function to stream a response with Transfer-Encoding: chunked? The docs suggest not, but perhaps there’s a way?
Hi @articulatehat! You’re correct that Apollo Client does not yet support @stream, but it does support @defer.
I have some documentation here: Realtime | RedwoodJS Docs, and also here: https://redwoodjs.com/docs/realtime#slow-and-fast-field-defer-example, as you saw.
But I’m confused. You want the slowField to return fastField?
Do you mean you want to query for both? Like:
query SlowAndFastFieldWithDefer {
  ... on Query @defer {
    slowField
  }
  fastField
}
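On the client side, Apollo Client delivers the deferred field incrementally: the initial payload contains fastField, and slowField is merged into data when it arrives. Here’s a minimal sketch of consuming that (assuming Apollo Client 3.7+ with a link setup that supports incremental delivery; the component and placeholder text are made up):

import { gql, useQuery } from '@apollo/client'

const QUERY = gql`
  query SlowAndFastFieldWithDefer {
    ... on Query @defer {
      slowField
    }
    fastField
  }
`

const Responses = () => {
  const { data } = useQuery(QUERY)
  if (!data) return <p>Loading…</p>
  return (
    <>
      {/* fastField arrives in the initial payload */}
      <p>{data.fastField}</p>
      {/* slowField stays undefined until the deferred payload is merged in */}
      <p>{data.slowField ?? 'Still thinking…'}</p>
    </>
  )
}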
Ohh…
It’s important that SlowLLM is dependent on the results of FastLLM.
Ok, so you need some sort of chaining.
LLM 1 (fast) → returns “hot dog”
LLM 2 (slow) → takes that and builds on it, e.g. “generate an image of a hot dog”
You might want to look at Langbase to offload the LLM prompts and pipe them together, then make a single API call to Langbase from your service.
Also, the fast call can maybe just be a service: return its result (“hot dog”) in a normal query field, and have the deferred slow field call that same service and then chain on it.
So I think you just want one deferred resolver (the slow one, which calls the service) plus another plain async field.
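Something along these lines, as a rough sketch (the message argument and the fastLlm/slowLlm helpers are placeholders for however you call the two models):

import { fastLlm, slowLlm } from 'src/lib/llm' // hypothetical wrappers around the two models

// Plain async field: resolves quickly with the fast model's answer.
export const fastField = async (_, { message }) => {
  return fastLlm(message)
}

// Deferred field: gets the fast answer first via the same helper,
// then chains the slow model on its result.
export const slowField = async (_, { message }) => {
  const fastResponse = await fastLlm(message)
  return slowLlm(message, fastResponse)
}

If running the fast model twice is too expensive, you could memoize fastLlm per message (or per request) so both fields share the one call.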
Yeah - that’s along the lines of what I ended up with. In my real use case I’m generating audio as well, and want to return the two audio files to the client.
You might also consider Inngest for AI workflows; see Case Study - Aomni.
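For reference, the same chaining written as an Inngest function might look roughly like this (a sketch only: the event name, IDs, and the fastLlm/slowLlm helpers are made up for illustration):

import { Inngest } from 'inngest'
import { fastLlm, slowLlm } from 'src/lib/llm' // hypothetical wrappers around the two models

const inngest = new Inngest({ id: 'llm-chat' })

export const chainLlms = inngest.createFunction(
  { id: 'chain-llms' },
  { event: 'app/message.posted' },
  async ({ event, step }) => {
    // Each step.run is checkpointed and retried independently by Inngest.
    const fastResponse = await step.run('fast-llm', () => fastLlm(event.data.message))
    const slowResponse = await step.run('slow-llm', () =>
      slowLlm(event.data.message, fastResponse)
    )
    return { fastResponse, slowResponse }
  }
)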