Show HN: GuMCP – Open-source MCP servers, hosted for free
18 by murb | 3 comments on Hacker News.
Hello! We open-sourced all our current MCP servers for platforms like Slack, Google Sheets, Linear, and Perplexity, and will be contributing a few more integrations to the project every day. Problems we're hoping to solve:
- Many people are creating MCP servers for the same apps. They're scattered across different repos but are flavors of the same thing. We're making one standardized mono project for all MCP servers.
- Startups are charging to host MCP servers, which blocks tons of people from playing around with MCP casually. We're hosting them for free.
- Non-technical people should be able to use MCP without needing to learn how to clone a repo and set up a venv. We're enabling a one-click integration for anyone who wants to use the free hosted service.
The plan is to keep contributing until we have an MCP server for basically every useful app anyone could want.
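For anyone curious what one of these servers involves, here is a minimal sketch using the official MCP TypeScript SDK. This is not GuMCP's actual code (its servers live in the repo); the tool name and logic are illustrative stand-ins for what a real integration like Google Sheets would do.

```typescript
// Minimal MCP server sketch using the official TypeScript SDK.
// Illustrative only -- GuMCP's actual servers live in its repo.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "sheets-demo", version: "0.1.0" });

// Hypothetical tool: a real Google Sheets server would call the Sheets
// API here; this stub just echoes the request so the sketch is runnable.
server.tool(
  "read_range",
  { spreadsheetId: z.string(), range: z.string() },
  async ({ spreadsheetId, range }) => ({
    content: [
      { type: "text", text: `Would read ${range} from ${spreadsheetId}` },
    ],
  })
);

// Expose the server over stdio so an MCP client (e.g. Claude Desktop)
// can launch and talk to it.
const transport = new StdioServerTransport();
await server.connect(transport);
```

A hosted service like the one described would run this process server-side and expose it over a remote transport instead of stdio, which is what removes the clone-a-repo step for non-technical users.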
Wednesday, March 26, 2025
New top story on Hacker News: Botswana Successfully Launches First Satellite, Botsat-1
Botswana Successfully Launches First Satellite, Botsat-1
11 by vinnyglennon | 1 comment on Hacker News.
Thursday, March 20, 2025
New top story on Hacker News: Show HN: AgentKit – JavaScript Alternative to OpenAI Agents SDK with Native MCP
Show HN: AgentKit – JavaScript Alternative to OpenAI Agents SDK with Native MCP
35 by tonyhb | 9 comments on Hacker News.
Hi HN! I’m Tony, co-founder of Inngest. I wanted to share AgentKit, our TypeScript multi-agent library that we’ve been cooking and testing with some early users in prod for months. Although OpenAI’s Agents SDK has launched since, we think an agent framework should offer more deterministic and flexible routing, work with multiple model providers, embrace MCP (for rich tooling), and support the unstoppable and growing community of TypeScript AI developers by enabling a smooth transition to production use cases. This is why we are building AgentKit, and we’re really excited about it for a few reasons.
Firstly, it’s simple. We embrace the KISS principles championed by Anthropic and Hugging Face by letting you gradually add autonomy to your AgentKit program using four primitives:
- Agents: LLM calls that can be combined with prompts, tools, and native MCP support.
- Networks: a simple way to get Agents to collaborate with a shared State, including handoff.
- State: combines conversation history with a fully typed state machine, used in routing.
- Routers: where the autonomy lives, from code-based to LLM-based (e.g. ReAct) orchestration.
The routers are where the magic happens, and they allow you to build deterministic, reliable, testable agents. AgentKit routing works as follows: the network calls itself in a loop, using a router to inspect the State and determine which agent to call next. The returned agent runs, then optionally updates state data using its tools. On the next loop, the network inspects state data and conversation history and determines which new agent to run (a sketch of this loop appears below). This fully typed state-machine routing lets you deterministically build agents using any of the effective agent patterns, which means your code is easy to read, edit, understand, and debug. It also makes handoff incredibly easy: you define when agents should hand off to each other using regular code and state (or by calling an LLM in the router for AI-based routing). This is similar to the OpenAI Agents SDK but easier to manage, plan, and build.
Then come the local development and moving-to-production capabilities. AgentKit is compatible with Inngest’s tooling, meaning you can test agents using Inngest’s local DevServer, which provides traces, inputs, outputs, replay, tool and MCP inputs and outputs, and (soon) a step-over debugger so you can easily understand and visually see what's happening in the agent loop. In production, you can also optionally combine AgentKit with Inngest for fault-tolerant execution: each agent’s LLM call is wrapped in a step, and tools can use multiple steps to incorporate things like human-in-the-loop. This gives you native orchestration, observability, and out-of-the-box scale.
In the documentation you will find an AgentKit SWE-bench example and multiple Coding Agent examples. It’s fully open source under the Apache 2 license. If you want to get started:
- npm: npm i @inngest/agent-kit
- GitHub: https://ift.tt/SsMORIL
- Docs: https://ift.tt/5Qs6BTo
We’re excited to finally launch AgentKit; let us know what you think!
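The routing loop described above is easy to see in plain code. The sketch below is hypothetical and does not use AgentKit's actual API; the types and agent names are ours. It only illustrates the mechanism: a router is an ordinary function over typed state, and the network re-runs it until no agent is returned.

```typescript
// Hypothetical sketch of the routing loop described above -- not
// AgentKit's actual API. A plain function inspects typed state and
// picks the next agent (or stops), making handoff deterministic.
interface State {
  history: string[];                             // conversation history
  data: { planned?: boolean; coded?: boolean };  // typed state machine
}

interface Agent {
  name: string;
  run: (state: State) => Promise<void>; // an LLM call that updates state via tools
}

// The router is regular code: easy to read, test, and debug.
function router(state: State, agents: Record<string, Agent>): Agent | undefined {
  if (!state.data.planned) return agents.planner;
  if (!state.data.coded) return agents.coder;
  return undefined; // nothing left to do -- the network stops
}

// The network calls itself in a loop until the router yields no agent.
async function runNetwork(state: State, agents: Record<string, Agent>) {
  for (let next = router(state, agents); next; next = router(state, agents)) {
    await next.run(state); // agent runs, optionally updating state.data
  }
  return state;
}
```

Swapping the `if` chain for an LLM call is what the post means by LLM-based routing: the loop structure stays the same, only the decision inside the router changes.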
Wednesday, March 19, 2025
New top story on Hacker News: Show HN: We built an agentic image editor that preserves the original structure
Show HN: We built an agentic image editor that preserves the original structure
8 by sakofchit | 5 comments on Hacker News.
Hi everyone, I’ve been experimenting with an app where you can edit images in your camera roll simply by tweaking your photo’s metadata (changing location/time), and our agent will contextually regenerate the photo in that place and time in one shot. There's no prompting involved. One of the hardest problems we’ve seen with these AI image editing/creation tools is that they struggle to preserve the subjects of the original image (faces, genders, number of people, bodies, animals, etc.), and I think we’ve gotten a step closer to making it feel realistic. The gallery has some examples that people have been regenerating: https://ift.tt/TQxIOb1 Here’s a demo: https://ift.tt/yXbx63Y Feel free to DM me on Twitter ( https://twitter.com/sakofchit ) if you’d like to try out the TestFlight in the meantime. Would love to know what y'all think!
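Concretely, "tweaking your photo's metadata" means editing the EXIF location and timestamp fields, which then act as the conditioning signal. A rough sketch of reading those fields with the exifr library; the app's actual pipeline isn't public, so the regeneration step is a hypothetical stub.

```typescript
// Sketch: read the location/time EXIF fields that would drive a
// regeneration like the one described above. Uses the exifr library;
// the "regenerate" step is the app's own model, stubbed out here.
import exifr from "exifr";

async function conditioningSignal(photoPath: string) {
  const tags = await exifr.parse(photoPath);
  return {
    latitude: tags?.latitude,        // decimal degrees, derived from GPS tags
    longitude: tags?.longitude,
    takenAt: tags?.DateTimeOriginal, // capture timestamp
  };
}

// Hypothetical: the user edits these values, and an image model is
// conditioned on the new place & time plus the original subjects.
```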
Sunday, March 16, 2025
New top story on Hacker News: Show HN: 10 teams are racing to build a pivotal tracker replacement
Show HN: 10 teams are racing to build a pivotal tracker replacement
3 by jFriedensreich | 0 comments on Hacker News.
A lot has changed since the shutdown of Pivotal Tracker was discussed here. As there were no viable alternatives, it seems a new project popped up every month. With the last month before sunsetting approaching, it's getting exciting to see who will make it in time, who stays in the race, and what the differentiating features of the projects will be.
Friday, March 14, 2025
New top story on Hacker News: Show HN: Pi Labs – AI scoring and optimization tools for software engineers
Show HN: Pi Labs – AI scoring and optimization tools for software engineers
11 by achintms | 0 comments on Hacker News.
Hey HN, after years building some of the core AI and NLU systems in Google Search, we decided to leave and build outside. Our goal was to put the advanced ML and DS techniques we’ve been using into the hands of all software engineers, so that everyone can build AI and Search apps at the same level of performance and sophistication as the big labs.
This was a hard technical challenge, but we were very inspired by the MVC architecture for web development. The intuition there is that when a data model changes, its view gets auto-updated. We built a similar architecture for AI. On one side is a scoring system, which encapsulates in a set of metrics what’s good about the AI application. On the other side is a set of optimizers that “compile” against this scorer: prompt optimization, data filtering, synthetic data generation, supervised learning, RL, etc. The scoring system can be calibrated using developer, user, or rater feedback, and once it’s updated, all the optimizers get recompiled against it. The result is a setup that makes it easy to incrementally improve the quality of your AI in a tight feedback loop: you update your scorers, they auto-update your optimizers, your app gets better, you see that improvement in interpretable scores, and then you repeat, progressing from simpler to more advanced optimizers and from off-the-shelf to calibrated scorers.
We would love your feedback on this approach. https://build.withpi.ai has a set of playgrounds to help you quickly build a scorer and multiple optimizers; no sign-in required. https://code.withpi.ai has the API reference and Notebook links. Finally, we have a Loom demo [1].
More technical details:
Scorers: Our scoring system has three key differences from the common LLM-as-a-judge pattern. First, rather than a single label or metric from an LLM judge, our scoring system is represented as a tunable tree of metrics, with 20+ dimensions that get combined into a final (non-linear) weighted score. The tree structure makes scores easily interpretable (just look at the breakdown by dimension), extensible (just add/remove a dimension), and adjustable (just re-tune the weights). Training the scoring system with labeled/preference data adjusts the weights. You can automate this process with user feedback signals, resulting in a tight feedback loop. Second, our scoring system handles natural-language dimensions (great for free-form, qualitative questions requiring NLU) alongside quantitative dimensions (like computations over dates or doc length, which can be provided in Python) in the same tree. When calibrating with your labeled or preference data, the scorer learns how to balance these. Third, for natural-language scoring, we use specialized smaller encoder models rather than autoregressive models. Encoders are a natural fit for scoring, as they are faster and cheaper to run, easier to fine-tune, and more suitable architecturally (bi-directional attention with a regression or classification head) than similarly sized decoder models. For example, we can score 20+ dimensions in sub-100ms, making it possible to use scoring everywhere from evaluation to agent orchestration to reward modeling.
Optimizers: We took the most salient ML techniques and reformulated them as optimizers against our scoring system, e.g. for DSPy the scoring system acts as its validator; for GRPO it acts as the reward model. We’re keen to hear the community’s feedback on which techniques to add next.
Overall stack: Playgrounds: Next.js and Vercel. AI: RunPod and GCP for training GPUs, TRL for training algos, ModernBERT and Llama as base models, GCP and Azure for 4o and Anthropic calls.
We’d love your feedback and perspectives; our team will be around to answer questions and discuss. If there’s a lot of interest, happy to host a live session!
- Achint, co-founder of Pi Labs
[1] https://ift.tt/TSn2KBs
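The "tunable tree of metrics" idea can be sketched in a few lines. This is our reading of the description above, not Pi's actual API: the type names, the tanh squashing, and the example dimensions are all assumptions, chosen only to show how leaves (qualitative or quantitative) combine under learned weights into one interpretable score.

```typescript
// Sketch of a tunable tree of metrics: leaves score one dimension,
// inner nodes combine children with weights into a final (here
// soft-capped, hence non-linear) score. Names and the squashing
// choice are our assumptions, not Pi's actual API.
type Metric =
  | { kind: "leaf"; name: string; score: (output: string) => number } // 0..1
  | { kind: "node"; name: string; weights: number[]; children: Metric[] };

function evaluate(m: Metric, output: string): number {
  if (m.kind === "leaf") return m.score(output);
  const weighted = m.children
    .map((child, i) => m.weights[i] * evaluate(child, output))
    .reduce((a, b) => a + b, 0);
  return Math.tanh(weighted); // non-linear combine; per-child breakdown stays readable
}

// Qualitative (would call an encoder model) and quantitative (plain
// code) dimensions live in the same tree, as the post describes:
const scorer: Metric = {
  kind: "node",
  name: "overall",
  weights: [0.7, 0.3], // calibration with preference data would re-tune these
  children: [
    { kind: "leaf", name: "helpfulness", score: () => 0.8 /* encoder-model stub */ },
    { kind: "leaf", name: "length_ok", score: (o) => (o.length < 2000 ? 1 : 0) },
  ],
};

console.log(evaluate(scorer, "example model output"));
```

Calibration then amounts to fitting the `weights` arrays against labeled or preference data, which is why re-tuning the scorer leaves the tree, and its interpretability, intact.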
New top story on Hacker News: Show HN: OCR Benchmark Focusing on Automation
Show HN: OCR Benchmark Focusing on Automation
3 by prats226 | 0 comments on Hacker News.
The OCR/document extraction field has seen a lot of action recently with releases like Mistral OCR, Andrew Ng's agentic document processing, etc. There are also several benchmarks for OCR, but they all test for something slightly different, which makes good comparison of models very hard. To give an example, some models like Mistral OCR only try to convert a document to markdown format; you have to use another LLM on top of it to get the final result. Some VLMs directly give structured information like key fields from documents such as invoices, but you have to either add business rules on top or use an LLM-as-a-judge kind of system to get a sense of which outputs need to be manually reviewed and which can be taken as correct. No benchmark attempts to measure the actual rate of automation you can achieve. We have tried to solve this problem with a benchmark that applies only to documents/use cases where you are looking for automation, and it tries to measure the end-to-end automation level of different models or systems. We have collected a dataset of documents like invoices that fit processes where automation is needed, as opposed to documents that are more copilot in nature, where you would need to chat with the document. We have also annotated these documents and published the dataset and repo so it can be extended. Here is the writeup: https://ift.tt/qyvACdU Dataset: https://ift.tt/sEQR058 GitHub: https://ift.tt/o8Bk4F0 Looking for suggestions on how this benchmark can be improved further.
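One way to read "rate of automation" (our interpretation, not necessarily the benchmark's exact scoring code): a document counts as automated only if every extracted field matches ground truth, so it never reaches a human. A sketch under that assumption shows why this differs from per-field accuracy:

```typescript
// Sketch of an end-to-end automation metric, under our reading of the
// post: a document is "automated" only if every field the system
// extracts matches ground truth, so it never needs manual review.
type Fields = Record<string, string>;

function documentAutomated(predicted: Fields, truth: Fields): boolean {
  return Object.keys(truth).every(
    (k) => predicted[k]?.trim() === truth[k].trim()
  );
}

function automationRate(
  pairs: Array<{ predicted: Fields; truth: Fields }>
): number {
  const automated = pairs.filter((p) =>
    documentAutomated(p.predicted, p.truth)
  ).length;
  return automated / pairs.length;
}

// E.g. a model that misses one date on 3 of 100 invoices scores 0.97
// here even if its per-field accuracy is ~0.999 -- which is exactly
// the gap between OCR benchmarks and automation benchmarks.
```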
Wednesday, March 12, 2025
New top story on Hacker News: Experiment with Gemini 2.0 Flash native image generation
Experiment with Gemini 2.0 Flash native image generation
14 by meetpateltech | 0 comments on Hacker News.
Tuesday, March 4, 2025
New top story on Hacker News: Show HN: Time travel debugging AI for more reliable vibe coding
Show HN: Time travel debugging AI for more reliable vibe coding
15 by bhackett | 5 comments on Hacker News.
Hi HN, I'm the CEO at https://replay.io . We've been building a time travel debugger for web apps for several years now (previous HN post: https://ift.tt/hULuFNX ) and are combining our tech with AI to automate the debugging process. AIs are really good at writing code but really bad at debugging: it's amazing to use Claude to prompt an app into existence, and pretty frustrating when that app doesn't work right and Claude is all thumbs fixing the problem. The basic reason for this is a lack of context. People can use devtools to understand what's going on in an app, but AIs struggle here. With a recording of the app, its behavior becomes a giant database that can be queried using RAG. We've been giving Claude tools to explore and understand what happens in a Replay recording, from basic stuff like seeing console messages to more advanced analysis of React, control dependencies, and dataflow (a sketch of this tool pattern appears below). For now this is behind a chat API ( https://ift.tt/8q5TBhe ). We recently launched Nut ( https://nut.new ) as an open-source project which uses this tech for building apps through prompting (vibe coding), similar to e.g. https://bolt.new and https://v0.dev . We want Nut to fix bugs effectively (cracking nuts, so to speak) and are working to make it a reliable tool for building complete, production-grade apps. It's been pretty neat to see Nut fixing bugs that totally stump the AI otherwise. Each of the problems below has a short video, but you can also load the associated project and try it yourself.
- Exception thrown from a catch block unmounts the entire app: https://ift.tt/RWPmpFS
- A settings button doesn't work because its modal component isn't always created: https://ift.tt/WI3jvGC
- An icon is really tiny due to sizing constraints imposed by other elements: https://ift.tt/zRwHnjq
- Loading doesn't finish due to a problem initializing responsive UI state: https://ift.tt/14Nw6dI
- Infinite rendering loop caused by a missing useCallback: https://ift.tt/ocQsHne
Nut is completely free. You get some free uses or can add an API key, and we're also offering unlimited free access for folks who can give us feedback we'll use to improve Nut. Email me at hi@replay.io if you're interested. For now Nut is best suited for building frontends, but we'll be rolling out more full-stack features in the next few weeks. I'd love to know what you think!
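The "giving Claude tools over a recording" pattern looks roughly like the sketch below. The tool name, schema, and backing function are hypothetical, not Replay's actual API; the schema shape follows the standard LLM tool-use convention of a name, description, and JSON Schema for inputs.

```typescript
// Hypothetical sketch of the pattern described above: expose a
// recording query as a tool an LLM can call. Tool name, schema, and
// backing function are illustrative -- not Replay's actual API.
const consoleMessagesTool = {
  name: "get_console_messages",
  description:
    "Return console messages logged in the recording between two points in time.",
  input_schema: {
    type: "object" as const,
    properties: {
      recordingId: { type: "string" },
      startTime: { type: "number", description: "ms from start of recording" },
      endTime: { type: "number" },
    },
    required: ["recordingId"],
  },
};

// Hypothetical handler: in the real system this would query the
// recording database; here the backend call is only declared, so the
// shape of the dispatch loop stays clear.
async function handleToolCall(name: string, input: Record<string, unknown>) {
  if (name === "get_console_messages") {
    return queryRecording(input); // assumed backend call
  }
  throw new Error(`unknown tool: ${name}`);
}

declare function queryRecording(
  input: Record<string, unknown>
): Promise<unknown>;
```

Richer tools (React component trees, control dependencies, dataflow) would follow the same shape with different schemas, which is what turns the recording into a queryable database for the model.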
New top story on Hacker News: Show HN: Open-source Deep Research across workplace applications
Show HN: Open-source Deep Research across workplace applications
8 by yuhongsun | 1 comments on Hacker News.
I’ve been using deep research on OpenAI and Perplexity, and it’s been just amazing at gathering data across a lot of related and chained searches. Just earlier today, I asked “What are some marquee tech companies / hot startups (not including the giants like FAAMG, Samsung, Nvidia, etc.)?” It’s a pretty involved question, and looking up “marquee tech startups” or "hot tech startups" on Google gave me nothing useful. Deep research on both ChatGPT and Perplexity gave really high-quality responses, with ChatGPT leaning toward slightly larger scaleups and Perplexity leaning more toward up-and-coming companies. Given how useful AI research agents are across the internet, we decided to build an open-source equivalent for the workplace, since a ton of questions at work also cannot be easily resolved with a single search. Onyx supports deep research connected to company applications like Google Drive, Salesforce, SharePoint, GitHub, Slack, and 30+ others. For example, an engineer may want to know “What’s happening with the verification email failure?” Onyx’s AI agent would first figure out what it needs to answer this question: what is the cause of the failure, what has been done to address it, has this come up before, and what’s the latest status on the issue. The agent would run parallel searches through Confluence, email, Slack, and GitHub to get the answers, then combine them to build a coherent overview. If the agent finds that there was a technical blocker that will delay the resolution, it will adjust mid-flight and research further to get more context on the blocker. Here’s a video demo I recorded: https://www.youtube.com/watch?v=drvC0fWG4hE If you want to get started with the GitHub repo, you can check out our guides at https://docs.onyx.app . Or, to play with it without needing to deploy anything, you can go to https://ift.tt/osKmLXF P.S. There are a lot of cool technical details behind building a system like this, so I’ll continue the conversation in the comments.
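The fan-out step described above (decompose the question, query every connector in parallel, then synthesize) reduces to a small amount of orchestration code. This sketch is ours, with hypothetical types and names; Onyx's actual implementation is in its repo.

```typescript
// Sketch of the fan-out step described above: run the sub-questions
// against several connectors in parallel, then hand the combined
// results to a synthesis step. Names are hypothetical -- Onyx's
// actual implementation is in its repo.
interface SearchResult {
  source: string;     // e.g. "confluence", "slack", "github"
  snippets: string[];
}

type Connector = (query: string) => Promise<SearchResult>;

async function research(
  subQuestions: string[],
  connectors: Connector[]
): Promise<SearchResult[]> {
  // Every (sub-question, connector) pair runs concurrently.
  const tasks = subQuestions.flatMap((q) => connectors.map((c) => c(q)));
  return Promise.all(tasks);
}

// A synthesis step (an LLM call in the real system) would then merge
// the results into one coherent answer -- and, as the post describes,
// kick off another research round if it finds a gap like the blocker.
```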