Hook
Building a Slack chatbot is easy — keeping it running 24/7 without a dedicated server, and making it actually smart about your internal knowledge base, is where most projects fall apart.
Why It Matters
A chatbot that only works when your laptop is open isn't a chatbot — it's a demo. And a bot that just pattern-matches keywords instead of understanding context frustrates users fast. This project shows how to combine Domo's AI Workflow engine with Replit's always-on hosting to ship a RAG-powered Slack assistant on a budget. It unlocks the ability to point a bot at your own KB articles, embed and similarity-search them at query time, and return GPT-generated answers grounded in your actual documentation — all without standing up any infrastructure yourself.
What You'll Learn
- Set up a Python Slack bot using socket mode to listen for slash commands without a public webhook endpoint
- Implement retrieval-augmented generation (RAG) by embedding KB articles in Domo Filesets and running cosine similarity search at query time
- Orchestrate the full question → retrieve → generate → reply pipeline using a Domo Workflow with AI agent tasks
- Wire Domo GPT into the workflow so answers are grounded in the top-5 retrieved articles rather than hallucinated
- Deploy and host the Python bot on Replit so it stays live without managing a server or cron job
How the RAG Pipeline Actually Works
The architecture has three distinct layers that are worth understanding separately before you wire them together.
Slack → Domo Workflow. The Python bot runs in Replit using socket mode — no public URL required. When a user fires /quest what is an adrenaline dataflow?, the socket listener catches the event and POSTs a payload to a Domo Workflow carrying the channel ID, message ID, question text, Slack token, and user ID.
Domo Workflow → RAG. Inside the workflow, an AI agent task called Find KB Article receives the question text, embeds it (converting it to a vector of floats), then compares it against pre-embedded KB articles stored in a Domo Fileset. Embedding lets you match semantically similar content even when the exact words don't overlap — the workflow retrieves the top 5 closest articles by vector distance.
Domo GPT → Threaded Reply. Those 5 articles are passed as context into a Domo GPT call: "Use these articles to answer this question." The generated answer is sent back to Slack as a threaded reply to the original message using the token and IDs captured at step one. The result is a response like "An adrenaline dataflow is a high-performance data transformation tool for massive datasets in Domo..." — accurate, cited, and delivered in-thread.
Hosting on Replit. Replit keeps the Python process alive between requests for a few dollars a month. No EC2, no Docker, no ops overhead. For a community tool or internal assistant, it's the right tradeoff: cheap, always-on, and easy to iterate on directly in the browser.
The pattern here — Slack event → workflow orchestration → vector search → grounded LLM response — is reusable for any internal knowledge base. Swap the Domo KB articles for your own docs, adjust the slash command, and the same pipeline applies.



