A Fast Path to Functional RAG Agents
Building an AI agent that’s aware of your docs or support pages is one of the most useful and practical things you can do with LLMs right now, and it’s a perfect use case for showing how Midio can integrate various services and connect easily to a frontend.
In this guide, we’ll walk through how to build a complete retrieval-augmented generation (RAG) system that:
- Crawls and structures web content using Firecrawl
- Stores and retrieves scraped data using OpenAI’s vector store API
- Uses Midio to orchestrate the flow, from extraction to response
- Delivers it all through a frontend built with Lovable
Here's what we're building: Midio makes it incredibly easy to iterate quickly, with a live, always-up-to-date editor that shows request traces in real time.
A common approach for RAG systems is the FTI architecture (Feature, Training, Inference). It divides the system into three main parts:
- Feature pipeline - prepares our data for use in the RAG system.
- Training pipeline - fine-tunes our model for specific needs (this step is sometimes skipped if the raw model performs well enough, which it does in our case).
- Inference pipeline - gathers relevant data based on user requests and generates responses.
In this guide, we skip the training step but create a simple feature pipeline using Firecrawl to gather data and OpenAI's vector store to chunk and store data for retrieval. We also build a basic inference pipeline to retrieve relevant data from the vector store and use an LLM to create a formatted response.
NOTE: Firecrawl offers generous free credits to get started, so sign up on their website if you want to follow along.
Firecrawl is a fantastic service for gathering data from any website, returning it in an easy-to-understand format for LLMs. They recently introduced the new Extract API, which we'll use to gather our data. Midio has a Firecrawl package available, which we first add using the package manager.
The Extract API is asynchronous, so the first thing we do is start an extraction job. We can specify the format of the returned data by using the `Generate JSON Schema` node, passing the generated schema into the `schema` input of the `Extract` node.
We then check its progress every 5 seconds in a loop until completion (notice the green arrow looping back to `Get Extraction` from the `Delay` node).
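For reference, here's the same flow sketched as direct HTTP calls in TypeScript. The endpoint paths and response fields are assumptions based on Firecrawl's documented v1 Extract API, so treat this as a sketch rather than the exact Midio graph:

```typescript
// Sketch of the extract-and-poll flow as direct HTTP calls (Node 18+,
// built-in fetch). Endpoint paths and response fields are assumptions
// based on Firecrawl's documented v1 Extract API.
const FIRECRAWL = "https://api.firecrawl.dev/v1";
const headers = {
  Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
  "Content-Type": "application/json",
};

async function extractDocs(urls: string[], schema: object) {
  // Start the asynchronous extraction job with our generated JSON schema.
  const start = await fetch(`${FIRECRAWL}/extract`, {
    method: "POST",
    headers,
    body: JSON.stringify({ urls, schema }),
  });
  const { id } = await start.json();

  // Poll every 5 seconds until the job finishes, mirroring the Delay loop.
  while (true) {
    const res = await fetch(`${FIRECRAWL}/extract/${id}`, { headers });
    const job = await res.json();
    if (job.status === "completed") return job.data;
    if (job.status === "failed") throw new Error("Extraction failed");
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}
```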
First, we create a new vector store using the `Create Vector Store` function from the `open-ai` package (also added via the package manager). This returns an ID, which we must retain for later use. The simplest way is to copy it and store it securely as an environment variable, allowing retrieval using the `Get Environment Variable` function.
The vector store supports various chunking strategies, but we use the default 'auto' strategy by leaving that input blank. The attributes field is also left blank.
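As a rough code equivalent (a sketch: the store name is illustrative, and older versions of the openai SDK expose this as `openai.beta.vectorStores.create`):

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Create the vector store once and keep its ID around, e.g. as an
// environment variable, for the rest of the pipeline.
const store = await openai.vectorStores.create({
  name: "midio-docs", // illustrative name
  // chunking_strategy is omitted, so the default "auto" strategy applies
});
console.log(store.id); // save this, e.g. as VECTOR_STORE_ID
```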
Now we can add data to our vector store. The OpenAI API requires two steps:
1. Upload data as a file.
2. Connect the uploaded file to the vector store.
In Midio, this is straightforward.
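Outside Midio, the same two steps look roughly like this with the openai SDK (the file name is illustrative, and in older SDK versions the vector store call lives under `openai.beta.vectorStores`):

```typescript
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();
const vectorStoreId = process.env.VECTOR_STORE_ID!; // the ID we saved earlier

// Step 1: upload one extracted page as a file.
const file = await openai.files.create({
  file: fs.createReadStream("extracted-page.json"), // illustrative file name
  purpose: "assistants",
});

// Step 2: attach the uploaded file to the vector store, which chunks
// and embeds it so it becomes searchable.
await openai.vectorStores.files.create(vectorStoreId, { file_id: file.id });
```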
Next, we connect these two nodes to our Firecrawl workflow, ensuring that once the extraction completes, it automatically creates and uploads one file per extracted page and adds it to the vector store.
If we want regular extractions, we can use a `Schedule` node, configuring it to run daily, for example.
The inference pipeline generates responses based on user queries, typically questions about the service we extracted documentation from—in this case, Midio.
Our pipeline includes two steps:
1. Retrieve data from the vector store.
2. Use an LLM to format a response based on the retrieved data and the user's query.
There are two important details here:
- User queries are passed directly into the vector store (this is not necessarily an optimal approach, but it works well enough in our case).
- Search results are converted into a string and combined with the original query in a template using XML tags (we do this to help the LLM distinguish between our data and the user's query).
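The exact template from the project isn't reproduced here, but it has roughly this shape, where `{search_results}` and `{user_query}` stand for the injected values (tag names are illustrative):

```
<context>
{search_results}
</context>

<question>
{user_query}
</question>
```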
This template is then provided to the `Chat Complete` node together with a system message.
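For reference, here is the whole inference step sketched in TypeScript. The search call, model name, and system message are illustrative assumptions, not the exact ones from the Midio project (in older openai SDK versions, search lives under `openai.beta.vectorStores`):

```typescript
import OpenAI from "openai";

const openai = new OpenAI();
const vectorStoreId = process.env.VECTOR_STORE_ID!;

async function answer(question: string) {
  // Step 1: pass the user's query straight to the vector store search.
  const results = await openai.vectorStores.search(vectorStoreId, {
    query: question,
  });

  // Flatten the search results into a single context string.
  const context = results.data
    .map((hit) => hit.content.map((part) => part.text).join("\n"))
    .join("\n---\n");

  // Step 2: combine context and question with XML tags and ask the model.
  const prompt = `<context>\n${context}\n</context>\n\n<question>\n${question}\n</question>`;

  const completion = await openai.chat.completions.create({
    model: "gpt-4o", // illustrative model choice
    messages: [
      {
        role: "system",
        // Illustrative system message, not the exact one from the project.
        content:
          "You answer questions about Midio using only the information in " +
          "<context>. If the context does not contain the answer, say so.",
      },
      { role: "user", content: prompt },
    ],
  });
  return completion.choices[0].message.content;
}
```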
The inference pipeline is simple yet effective.
The last step before creating our Lovable frontend is connecting our inference pipeline to an HTTP endpoint.
This is also simple in Midio. We use two nodes: an `Endpoint` and a `Respond to HTTP Request`, placing our inference flow between them.
We configure the endpoint to accept POST requests to the `/ask` route, parse the JSON body, and feed the query into our inference pipeline. The response is piped directly into the `body` input of the `Respond to HTTP Request` node.
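From a client's point of view, the contract is a plain JSON POST. A minimal sketch (the URL and the `query` field name are assumptions to adapt to your setup):

```typescript
// Minimal client call to the /ask endpoint. The URL and the `query`
// field name are assumptions; adjust them to match your deployment.
async function ask(question: string): Promise<string> {
  const res = await fetch("https://your-midio-app.example.com/ask", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: question }),
  });
  return await res.text(); // the body is the generated answer
}
```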
Before testing, we need to handle one final issue: CORS. Since our backend and frontend are on different domains, explicitly allowing requests from other domains is necessary. A simple solution is allowing requests from any domain by returning headers like these (the exact set may vary) in the `Respond to HTTP Request` node:
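```
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, OPTIONS
Access-Control-Allow-Headers: Content-Type
```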
Additionally, it's beneficial to handle preflight requests by adding an OPTIONS handler that returns the headers above.
Finally, we create a frontend to interact with our RAG application. Lovable makes it easy to quickly build frontends that connect to REST endpoints.
To try this yourself, you can launch Lovable with a prompt like the one below, which is often sufficient to create this simple app after a couple of back-and-forth edits.
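The original prompt isn't reproduced here; something along these lines should work (illustrative, with a placeholder URL):

```
Build a minimal chat interface for asking questions about Midio.
It should send POST requests to https://your-midio-app.example.com/ask
with a JSON body like {"query": "..."} and render the plain-text answer.
```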
The Midio project is available as a template when you create a new project. It also contains some additional features, like a question history and simple user tracking.
If you haven’t already, you can sign up for the Midio beta.