Tree Driven Interaction – Vision and Upcoming Features

Tight on time? Here’s a pre-configured Chat-GPT chatbot with this blog post already loaded, so you can explore the topic conversationally.

If you’re a heavy Chat-GPT user frustrated at how awkward and time-consuming it is to find and reuse past conversations, then this post is for you. I’m sharing my progress in developing a tree-oriented GPT chat interface and knowledge base: a promising alternative to Chat-GPT that seeks to keep past conversations from rapidly becoming infeasible to maintain at scale.

The goal is to create a system where conversations are an asset to invest in, build on top of, and even connect together over time, rather than mostly disposable vehicles not worth the time to revisit and dig through. The AI assistant should not just reply and perform actions, but also collaborate with the user to link together branches of the conversation tree and iterate on assets for future reuse, as a first-class feature rather than a manual afterthought.

In the following sections, I’ll elaborate on the reasons for this project, a practical example, the remaining work to be done, and why this approach represents a meaningful advancement in our interaction with AI.

The Problem

March 2023 marked a significant advancement in AI with OpenAI’s release of GPT-4. In my eyes, its standout feature was the ability to generate functional code from simple language requests, hinting at a future where AI could act as not just a simple script writer, but a comprehensive software engineering partner.

However, interacting with GPT-4 through Chat-GPT presents practical challenges. Its ability to let the user edit past user messages and fork into new branches in the conversation, while initially straightforward, quickly leads to a convoluted mess. The only way to navigate between branches is to manually scroll and find each forking point and toggle the branch index until you find the one you want. If you can’t remember the exact series of “left and right turns” to get to the message you’re looking for then it might as well be gone.

This branching, essential for efficiently iterating on code and text assets without polluting the conversation with discarded drafts, can become unmanageable after even a dozen branches. The only way around this problem in the Chat-GPT interface is to manually collect the best parts of the different branches, paste them into a text editor, stitch them together with explanatory context sprinkled in, carry the result into a new conversation, and then do it all over again with the next series of iterations. With no way to link to or reference specific messages from other conversations, working on larger-scale programming projects with Chat-GPT can involve so much manual curation that it often feels like you’re just trading time spent writing code for time spent herding conversations.

The Pitfalls of Fully Automated Systems

OpenAI’s API offers more direct and customizable access to GPT-4, but at the cost of having to build or find an interface, as well as pay a reasonable fee for every interaction. Tools like BabyAGI and AutoGPT have emerged, aiming to streamline the AI programming process. Yet, these tools often overestimate GPT-4’s capabilities, assuming it can autonomously handle complex, multi-layered tasks from minimal inputs. They tend to end up in super-expensive, brute-force feedback loops of detecting something isn’t coded right and then pivoting sideways rather than forwards right into another issue.

While certainly innovative, they fail to reach the nuanced understanding and adaptability required for engineering software to solve more abstract and novel tasks.

My Approach – Tree Driven Interaction

Recognizing these gaps, my project proposes a blend of intuitive chat interaction and the dynamic growth potential seen in more autonomous systems. The focus is on a unified conversation tree that evolves organically, guiding the AI assistant to build a repository of composable behavioral elements. Drawing on principles from tree traversal and version control systems like Git, this approach empowers the AI to expand its capabilities without overwhelming users with manual navigation.

The remainder of this post explores the early version of this interface and its implications for future AI interactions.

Practical Demonstration: Building a Joke Writer Interface

Although this tool is being built with an eye towards writing software, like Chat-GPT it’s useful for any text generation task. For simplicity, I’ll walk through a contrived example: starting from a new, blank conversation, we’ll build a reusable joke-writing conversation stub by leveraging a previously built system message writer conversation stub. This sounds confusing at first, so let’s take things step-by-step.

(see what I did there? 😉)

Animation demonstrating key tree-driven interactions

User starts a new conversation.
User searches for and finds a system message writer conversation stub built previously and navigates to its address by clicking on the emoji SHA.
- “emoji SHA” refers to a clickable, visual way of representing a message address so that we don’t need to get headaches from countless strings of 898b87f978d879c978a…
User verifies the stub, then returns to the orchestration conversation.
User requests the creation of a ‘system message for writing dry jokes’ via the system message writer stub.
The system processes the request, generates the joke writer system message, and returns a reply containing it.
User initiates a new conversation using the generated joke writer system message.
User navigates to the new conversation and requests jokes.
Jokes are generated, proving to the user they can now save and reuse this stub for future joke generation.

The sequence depicted in the animation showcases the system’s capability to handle complex, multi-step interactions in a streamlined manner. By utilizing the emoji SHA link and meticulously implemented pushState navigation, the user efficiently locates and verifies the necessary conversation stubs without losing their place in the orchestration conversation. The indirect method of leveraging and appending to conversation stubs by their message addresses illustrates the system’s unique approach to AI interaction. It’s not just about sending and receiving messages; it’s about activating specific functions that extend the system’s capabilities.
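The post doesn’t describe how an emoji SHA is actually derived, so the following is only a guess at the general idea: map each leading hex digit of a message address to an emoji so the address becomes short and recognizable. The palette, the `emojiSha` name, and the four-digit default are all assumptions made for illustration.

```typescript
// Hypothetical palette: one emoji per hexadecimal digit (0 through f).
const EMOJI_PALETTE: string[] = [
  "🍎", "🚀", "🌵", "🎲", "🐙", "🔔", "🍩", "🌙",
  "🦊", "🎈", "⚡", "🧩", "🍋", "🐢", "🎧", "🌊",
];

// Render the first `length` hex digits of a message address as emoji.
function emojiSha(address: string, length = 4): string {
  const prefix = address.slice(0, length).toLowerCase();
  if (prefix.length < length || !/^[0-9a-f]+$/.test(prefix)) {
    throw new Error(`Expected at least ${length} hex digits, got "${address}"`);
  }
  return Array.from(prefix, (digit) => EMOJI_PALETTE[parseInt(digit, 16)]).join("");
}

console.log(emojiSha("898b87f978d879c978a")); // prints "🦊🎈🦊🧩"
```

Four hex digits only distinguish 65,536 addresses, so a label like this is a visual aid for humans; the full address would still be what the link points at.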

This design creates a cohesive user experience, where multiple tasks are orchestrated within a single conversation path. It’s akin to operating a central dashboard, where diverse functions and tasks are controlled and monitored from one point. This not only simplifies the interaction process but also ensures continuity and coherence in the conversation history.

Deep Dive: The Theoretical Underpinnings

In abstract terms, our system can be understood through one fundamental operation: taking a conversation path and a new user reply as inputs, and producing a new branch in the conversation tree as the output.

Initial Operation: (prior_conversation_path, new_user_reply) => (prior_conversation_path_appended_with_two_new_messages)
Focused Operation: If we hold prior_conversation_path fixed and disregard the newly produced conversation path, we get a more streamlined version: (new_user_reply) => (new_assistant_reply)
Generalized Operation: In its most abstract form, this can be seen as: (anything representable in text) => (anything representable in text)
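To make the first of these operations concrete, here is a minimal TypeScript sketch. It is not the project’s actual code: the `Message` shape, the `ConversationTree` class, and the `Responder` callback (a stand-in for a call to GPT-4) are all assumptions made for illustration.

```typescript
// Hypothetical data model; the real project's types are not described in this post.
type Role = "system" | "user" | "assistant";

interface Message {
  id: number;
  parentId: number | null; // null marks the root of a conversation
  role: Role;
  text: string;
}

// Stand-in for a model call: reads a conversation path, returns the assistant's reply.
type Responder = (path: Message[]) => string;

class ConversationTree {
  private messages = new Map<number, Message>();
  private nextId = 1;

  add(parentId: number | null, role: Role, text: string): Message {
    if (parentId !== null && !this.messages.has(parentId)) {
      throw new Error(`Unknown parent message: ${parentId}`);
    }
    const message: Message = { id: this.nextId++, parentId, role, text };
    this.messages.set(message.id, message);
    return message;
  }

  // Follow parent links from a message up to the root, then reverse into reading order.
  pathTo(id: number): Message[] {
    const path: Message[] = [];
    let current = this.messages.get(id);
    while (current !== undefined) {
      path.push(current);
      current = current.parentId === null ? undefined : this.messages.get(current.parentId);
    }
    return path.reverse();
  }

  // (prior_conversation_path, new_user_reply) => path appended with two new messages.
  reply(tipId: number | null, userText: string, respond: Responder): Message[] {
    const user = this.add(tipId, "user", userText);
    const assistant = this.add(user.id, "assistant", respond(this.pathTo(user.id)));
    return this.pathTo(assistant.id);
  }
}

// Replying twice from the same tip forks the tree into two sibling branches.
const tree = new ConversationTree();
const echo: Responder = (path) => "You said: " + (path[path.length - 1]?.text ?? "");
const root = tree.add(null, "system", "You are a helpful assistant.");
const branchA = tree.reply(root.id, "Write a dry joke.", echo);
const branchB = tree.reply(root.id, "Write a pun.", echo);
console.log(branchA.length, branchB.length); // prints "3 3"
```

Because every message stores only its parent, a branch is fully identified by the id of its last message, which is what makes addressing a single message enough to recover its whole conversation path.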

This abstract representation illustrates the system’s extraordinary flexibility. Any type of data or request can be processed, and the system can generate a wide range of responses. This is made possible by the capabilities of GPT-4, which excels at interpreting and generating diverse formats and domains of data.

The system uses tree traversal algorithms and cosine similarity of embeddings for message lookups. These mechanisms allow for quick retrieval of relevant messages and branches, essential in a system where data can become extensive and layered. It’s a practical approach that enhances the user’s ability to interact with and leverage the AI system effectively. The goal is to make AI interactions seamless and productive, enabling users to achieve more with less effort.
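As a rough illustration of the lookup side, the sketch below ranks stored messages by cosine similarity against a query embedding. The `IndexedMessage` shape, the `topMatches` helper, the addresses, and the tiny three-dimensional vectors are all made up for the example; real embeddings would come from an embedding model and have far more dimensions.

```typescript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical index entry: a message address paired with its embedding.
interface IndexedMessage {
  address: string;
  embedding: number[];
}

// Score every indexed message against the query and keep the k best.
function topMatches(query: number[], index: IndexedMessage[], k: number) {
  return index
    .map((m) => ({ address: m.address, score: cosineSimilarity(query, m.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors standing in for real embeddings.
const messageIndex: IndexedMessage[] = [
  { address: "msg-jokes", embedding: [0.9, 0.1, 0.0] },
  { address: "msg-sql", embedding: [0.0, 0.2, 0.9] },
  { address: "msg-system-writer", embedding: [0.6, 0.7, 0.1] },
];
console.log(topMatches([1, 0, 0], messageIndex, 2).map((m) => m.address));
```

A linear scan like this is fine for a small tree; a larger message store would likely want a vector index instead, but that is beyond what the post describes.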

Soon, with planned enhancements, the system will be able to serve as a dynamic behavioral repository as well as a knowledge base: it will generate and execute runnable functions in addition to storing information. These functions will be invoked through their unique addresses, similar to how one might reference a specific commit in Git, and may be composed of other address-referenced functions. This feature will add a significant layer of adaptability, allowing the system to grow and evolve as more knowledge and behavior are accumulated.

Future Plans and Closing Thoughts

[Edit: The project launched with all below described features and more on Nov 29th, 2023! Thanks for everyone’s support. You can see the launch video here]

My project is at a pivotal stage, in part thanks to my involvement in the Backdrop Build mini-incubator. This program fosters innovative AI and blockchain applications, providing resources, support, and a community of like-minded developers as they ramp up to the initial release of their respective projects. By the end of this program, which concludes in early December, I aim to integrate significant enhancements into the system, the two most significant of which are described below.

The prize I’ve had my eye on since first getting lost in a Chat-GPT branching maze is dynamic function synthesis and storage. This feature is about generating executable functions that can be serialized, stored in messages, and invoked within the system by their address. Alongside the function body, I plan to store a list of the addresses of other function-messages that the new function depends on and has access to, leading to a crude but hopefully workable functional programming paradigm of sorts. My plan is to experiment with functions that operate on standardized inputs and outputs to minimize the need to communicate complex, heterogeneous interfaces to the AI assistant. I’m tentatively planning on using RxJS Observables of type Observable<string>, since that seems to align well with text being the lifeblood of GPT conversations, and since RxJS is already used extensively throughout the codebase for gracefully handling all the real-time interactions.
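Since this feature is still on the drawing board, the following is only a sketch of how address-referenced functions with dependencies might fit together, not the planned implementation. For brevity it uses plain string-to-string functions rather than RxJS Observables, and the `FunctionMessage` shape, the `FunctionStore` class, and the `fn-...` addresses are all hypothetical.

```typescript
// Hypothetical stored function-message: a serialized body plus the addresses it may call.
interface FunctionMessage {
  address: string;
  body: string; // JavaScript source of a function body with parameters (input, deps)
  dependencies: string[];
}

type TextFn = (input: string) => string;

class FunctionStore {
  private messages = new Map<string, FunctionMessage>();

  save(message: FunctionMessage): void {
    this.messages.set(message.address, message);
  }

  // Build a callable for an address, resolving its dependencies recursively.
  resolve(address: string, inProgress: Set<string> = new Set()): TextFn {
    const message = this.messages.get(address);
    if (message === undefined) {
      throw new Error(`No function stored at ${address}`);
    }
    if (inProgress.has(address)) {
      throw new Error(`Circular dependency at ${address}`);
    }
    const nested = new Set(inProgress).add(address);
    const deps: Record<string, TextFn> = {};
    for (const dep of message.dependencies) {
      deps[dep] = this.resolve(dep, nested);
    }
    // Caution: this executes stored source text; a real system would need sandboxing.
    const compiled = new Function("input", "deps", message.body) as (
      input: string,
      deps: Record<string, TextFn>,
    ) => string;
    return (input) => compiled(input, deps);
  }
}

const store = new FunctionStore();
store.save({ address: "fn-upper", body: "return input.toUpperCase();", dependencies: [] });
store.save({
  address: "fn-shout",
  body: "return deps['fn-upper'](input) + '!';",
  dependencies: ["fn-upper"],
});
console.log(store.resolve("fn-shout")("hello")); // prints "HELLO!"
```

Restricting each function to the dependencies it declares keeps the composition explicit: a function can only call what its message lists, which is the property the dependency list is meant to provide.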

The recent, infamous OpenAI Dev Day introduction of Threads in OpenAI’s Assistants API is a significant boon for the potential of the AI chat interface I’m developing. Threads, combined with the much longer context window and smarter JSON capabilities of the new GPT-4-turbo model, make effectively unbounded conversations possible. A conversation tree can’t have paths that are too long to feed into GPT, so having larger contexts and automatic conversation sampling is incredibly important. However, some aspects of these new features are currently very limited; at the time of writing, for example, there is no support for streaming. It will likely be a game-time decision whether I simply swap in GPT-4-turbo for GPT-4 or go for a more rigorous restructuring to accommodate the new APIs for Assistants, Threads, and Runs.

Although the tool is already publicly available, it feels incomplete without these features so I do want to make sure to polish it off before sharing it more widely. That being said, I will need early testers so please do get in contact with me if you’re a frequent Chat-GPT user and interested in lending feedback.

As I approach the soft release of the tool, the journey so far has been enlightening, and the potential that lies ahead is truly exciting. This project is more than just a tool; it’s an experiment in redefining how we interact with AI. I believe making conversations with AI more intuitive, adaptable, and dynamic can transform our engagement with this frontier of technology. I look forward to sharing the next version of this interface with you and exploring the new possibilities it will unlock.
