Tree Driven Interaction – Vision and Upcoming Features

Tight on time? Here’s a pre-configured chat bot via Chat-GPT with the blog post already loaded, so you can explore the topic conversationally.

If you’re a heavy Chat-GPT user frustrated at how awkward and time-consuming it is to find and reuse past conversations, this post is for you. I’m sharing my progress in developing a tree-oriented GPT chat interface and knowledge base – a promising alternative to Chat-GPT that seeks to avoid past conversations rapidly becoming infeasible to maintain at scale.

The goal is to create a system where conversations are an asset to invest in, build on top of, and even connect together over time, rather than mostly disposable vehicles not worth the time to revisit and dig through. The AI assistant shouldn’t just reply and perform actions; it should collaborate with the user to link together branches of the tree of conversations and iterate on assets for future reuse – as a first-class feature rather than a manual afterthought.

In the following sections, I’ll elaborate on the reasons for this project, a practical example, the remaining work to be done, and why this approach represents a meaningful advancement in our interaction with AI.

The Problem

March 2023 marked a significant advancement in AI with OpenAI’s release of GPT-4. In my eyes, its standout feature was the ability to generate functional code from simple language requests, hinting at a future where AI could act as not just a simple script writer, but a comprehensive software engineering partner.

However, interacting with GPT-4 through Chat-GPT presents practical challenges. Its ability to let the user edit past user messages and fork the conversation into new branches, while initially straightforward, quickly leads to a convoluted mess. The only way to navigate between branches is to manually scroll to each forking point and toggle the branch index until you find the one you want. If you can’t remember the exact series of “left and right turns” to get to the message you’re looking for, then it might as well be gone.

This branching, essential for efficiently iterating on code and text assets without polluting the conversation with discarded drafts, can become unmanageable after even a dozen branches. The only workaround in the Chat-GPT interface is to manually collect the best parts of the different branches, copy and paste them into a text editor, stitch them together with context peppered in by hand, carry the result into a new conversation, and do it all over again for the next series of iterations. With no way to link to or reference specific messages from other conversations, working on larger-scale programming projects with Chat-GPT can involve so much manual curation that you’re just trading time spent writing code for time spent herding conversations.

The Pitfalls of Fully Automated Systems

OpenAI’s API offers more direct and customizable access to GPT-4, but at the cost of having to build or find an interface, as well as paying a fee for every interaction. Tools like BabyAGI and AutoGPT have emerged, aiming to streamline the AI programming process. Yet these tools often overestimate GPT-4’s capabilities, assuming it can autonomously handle complex, multi-layered tasks from minimal inputs. They tend to end up in super-expensive, brute-force feedback loops: detecting something isn’t coded right, then pivoting sideways into another issue rather than forward.

While certainly innovative, they fail to reach the nuanced understanding and adaptability required for engineering software to solve more abstract and novel tasks.

My Approach – Tree Driven Interaction

Recognizing these gaps, my project proposes a blend of intuitive chat interaction and the dynamic growth potential seen in more autonomous systems. The focus is on a unified conversation tree that evolves organically, guiding the AI assistant to build a repository of composable behavioral elements. Drawing on principles from tree traversal and version control systems like Git, this approach empowers the AI to expand its capabilities without overwhelming users with manual navigation.

The remainder of this post explores the early version of this interface and its implications for future AI interactions.

Practical Demonstration: Building a Joke Writer Interface

Although this tool is being built with an eye towards writing software, like Chat-GPT it’s useful for any text generation task. For simplicity, I’ll walk through a contrived example: building a reusable joke-writing conversation stub by leveraging a previously built system message writer conversation stub, starting from a new, blank conversation. This sounds confusing at first, so let’s take things step-by-step.

(see what I did there? 😉)

Animation demonstrating key tree-driven interactions
  1. User starts a new conversation.
  2. User searches for and finds a system message writer conversation stub built previously and navigates to its address by clicking on the emoji SHA.
    • “emoji SHA” refers to a clickable, visual way of representing a message address so that we don’t need to get headaches from countless strings of 898b87f978d879c978a…
  3. User verifies the stub, then returns to the orchestration conversation.
  4. User requests the creation of a ‘system message for writing dry jokes’ via the system message writer stub.
  5. The system processes the request, generates the joke writer system message, and returns a reply containing it.
  6. User initiates a new conversation using the generated joke writer system message.
  7. User navigates to the new conversation and requests jokes.
  8. Jokes are generated, proving to the user they can now save and reuse this stub for future joke generation.

The sequence depicted in the animation showcases the system’s capability to handle complex, multi-step interactions in a streamlined manner. By utilizing the emoji SHA link and meticulously implemented pushState navigation, the user efficiently locates and verifies the necessary conversation stubs without losing their place in the orchestration conversation. The indirect method of leveraging and appending to conversation stubs by their message addresses illustrates the system’s unique approach to AI interaction. It’s not just about sending and receiving messages; it’s about activating specific functions that extend the system’s capabilities.
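
As an illustrative sketch of the emoji SHA idea (the emoji table, glyph count, and function name here are made up for the example, not the actual implementation):

```ts
// Illustrative only: map bytes of a hex digest onto a fixed emoji table so a
// long message address becomes a short, recognizable glyph string.
const EMOJI = ["🍎", "🌲", "🚀", "🐢", "🎲", "🔑", "🌊", "⚡"]; // a real table would have 256 entries

function emojiSha(hexDigest: string, glyphs = 5): string {
  const out: string[] = [];
  for (let i = 0; i < glyphs; i++) {
    const byte = parseInt(hexDigest.slice(i * 2, i * 2 + 2), 16); // one byte per glyph
    out.push(EMOJI[byte % EMOJI.length]);
  }
  return out.join("");
}

emojiSha("898b87f978d879c978a"); // "🌲🐢⚡🌲🍎" – far easier to recognize and click than raw hex
```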

This design creates a cohesive user experience, where multiple tasks are orchestrated within a single conversation path. It’s akin to operating a central dashboard, where diverse functions and tasks are controlled and monitored from one point. This not only simplifies the interaction process but also ensures continuity and coherence in the conversation history.

Deep Dive: The Theoretical Underpinnings

In abstract terms, our system can be understood through one fundamental operation: taking a conversation path and a new user reply as inputs, and producing a new branch in the conversation tree as the output.

  • Initial Operation: (prior_conversation_path, new_user_reply) => (prior_conversation_path_appended_with_two_new_messages)
  • Focused Operation: For a fixed prior_conversation_path, if we disregard the newly produced conversation path, we get a more streamlined version: (new_user_reply) => (new_assistant_reply)
  • Generalized Operation: In its most abstract form, this can be seen as: (anything representable in text) => (anything representable in text)
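
To make those three views concrete, here’s a quick type-level sketch in TypeScript (the Message and ConversationPath types are hypothetical stand-ins, not the project’s actual code):

```ts
type Message = { role: "system" | "user" | "assistant"; content: string };
type ConversationPath = Message[];

// Initial operation: append the user reply plus the assistant's answer,
// yielding a new branch of the tree.
type InitialOp = (path: ConversationPath, newUserReply: string) => ConversationPath;

// Focused operation: with the path fixed and the produced path disregarded,
// only the assistant's reply remains.
type FocusedOp = (newUserReply: string) => string;

// Generalized operation: anything representable in text in, anything
// representable in text out.
type GeneralizedOp = (input: string) => string;
```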

This abstract representation illustrates the system’s extraordinary flexibility. Any type of data or request can be processed, and the system can generate a wide range of responses. This is made possible by the capabilities of GPT-4, which excels at interpreting and generating diverse formats and domains of data.

The system uses tree traversal algorithms and cosine similarity of embeddings for message lookups. These mechanisms allow for quick retrieval of relevant messages and branches, essential in a system where data can become extensive and layered. It’s a practical approach that enhances the user’s ability to interact with and leverage the AI system effectively. The goal is to make AI interactions seamless and productive, enabling users to achieve more with less effort.
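
As a sketch of what such a lookup boils down to (the message store shape here is illustrative, not the system’s real schema):

```ts
// Rank stored messages by cosine similarity to a query embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredMessage { address: string; text: string; embedding: number[] }

function closestMessages(query: number[], store: StoredMessage[], k = 5): StoredMessage[] {
  return [...store]
    .sort((a, b) => cosineSimilarity(query, b.embedding) - cosineSimilarity(query, a.embedding))
    .slice(0, k);
}
```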

Soon, with planned enhancements, the system will be able to serve as not just a knowledge base but a dynamic behavioral repository – not just storing information but also generating and executing runnable functions. These functions will be invoked through their unique addresses, similar to how one might reference a specific commit in Git, and may be composed of other address-referenced functions. This will add a significant layer of adaptability, allowing the system to grow and evolve as more knowledge and behavior accumulate.

Future Plans and Closing Thoughts

[Edit: The project launched with all below described features and more on Nov 29th, 2023! Thanks for everyone’s support. You can see the launch video here]

My project is at a pivotal stage, in part thanks to my involvement in the Backdrop Build mini-incubator. This program fosters innovative AI and blockchain applications, providing resources, support, and a community of like-minded developers as they ramp up to the initial releases of their respective projects. By the end of the program, concluding in early December, I aim to integrate significant enhancements into the system, the two most significant of which are described below.

The prize I’ve had my eye on since first getting lost in a Chat-GPT branching maze is dynamic function synthesis and storage. This feature is about generating executable functions that can be serialized, stored in messages, and invoked within the system by their addresses. Alongside the function body, I plan to store a list of other function-messages, referenced by their respective addresses, that the new function depends on and has access to – leading to a crude but hopefully workable functional programming paradigm of sorts. My plan is to experiment with functions that operate on standardized inputs and outputs, to minimize the need to communicate complex, heterogeneous interfaces to the AI assistant. I’m tentatively planning on using RxJS Observables of type string, since that aligns well with text being the lifeblood of GPT conversations, and RxJS is already used extensively throughout the codebase for gracefully handling all the real-time interactions.
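
As a rough sketch of what that standardized interface might look like (the names here are illustrative, not the project’s actual API):

```ts
import { of, Observable } from "rxjs";
import { map } from "rxjs/operators";

// Every stored function shares one shape: Observable<string> in,
// Observable<string> out.
type StoredFn = (input: Observable<string>) => Observable<string>;

// Composing two address-referenced functions is then just piping one
// stream into the next.
const compose = (f: StoredFn, g: StoredFn): StoredFn => (input) => g(f(input));

// Example: a trivial pair of functions and their composition.
const uppercase: StoredFn = (s) => s.pipe(map((t) => t.toUpperCase()));
const exclaim: StoredFn = (s) => s.pipe(map((t) => t + "!"));

compose(uppercase, exclaim)(of("hello")).subscribe(console.log); // "HELLO!"
```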

The recent, infamous OpenAI Dev Day introduction of Threads in OpenAI’s Assistants API is a significant boon for the potential of the AI chat interface I’m developing. Threads, combined with the super-long context window and smarter JSON capabilities of the new GPT-4-turbo model, enable effectively unbounded conversations. A conversation tree can’t have paths which are too long to feed into GPT, so larger contexts and automatic conversation sampling are incredibly important. However, some aspects of these new features are currently very limited – at the time of writing there is no support for streaming – so it will likely be a game-time decision whether I simply swap in GPT-4-turbo for GPT-4, or go for a more rigorous restructuring to accommodate the new APIs for Assistants, Threads, and Runs.

Although the tool is already publicly available, it feels incomplete without these features, so I want to polish it off before sharing it more widely. That said, I will need early testers, so please do get in contact with me if you’re a frequent Chat-GPT user interested in lending feedback.

As I approach the soft release of the tool, the journey so far has been enlightening, and the potential that lies ahead is truly exciting. This project is more than just a tool; it’s an experiment towards redefining how we interact with AI. I believe making conversations with AI more intuitive, adaptable, and dynamic can transform our engagement with this frontier edge of technology. I look forward to sharing the next version of this interface with you and exploring the new possibilities it will unlock.

Mandelbrot Explorer in Pico-8

Here’s a link to immediately play the game/simulator/explorer described in this post and here’s the source code in case you’re interested in exploring that.

Controls are standard keys/buttons for Pico-8:

  • up/down/left/right arrows to move around
  • if on the computer, “Z” and “X” keys to zoom in/out
  • if on mobile, “O” and “X” buttons to zoom in/out
  • A hidden (and confusing) settings-adjustment mode can be chosen from the menu (accessible via the “-” shaped start button on mobile or the “enter” key on the computer) under the “settings” option, in which case up/down adjust the cutoff and left/right adjust the iteration count. Choose “navigate” from the menu to go back to navigation.

I wanted to learn more about fractal geometry and complex numbers, mainly in hopes of being able to use them in simulated game universes, so I figured a fun first pass would be building a Mandelbrot generator in my favorite game framework, Pico-8.

There are a lot of reasons why this is a bad idea:

  • It’s slow; in fact, Lua expressions are intentionally throttled so it runs at the same speed on every platform.
  • It does not give any direct GPU access; all written expressions get executed through the Lua interpreter.
  • All numbers are represented by 32 bits – 16 bits for the whole part and 16 bits for the fractional part, eg 0x0000.0000 to 0xffff.ffff – which means you can only zoom in somewhere around 1000x before the math breaks down from being unable to divide the region the camera covers into pixel-sized units for computation (see the quick arithmetic after this list).
  • Using arrow keys and Z/X for exploring an equation isn’t very intuitive.
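
To put a rough number on that zoom limit: the smallest representable increment is 0x0000.0001 = 2^-16 ≈ 0.000015. Assuming the default view spans roughly 3 units of the complex plane across 128 pixels, each pixel covers 3 / (128 × zoom) units, which shrinks below 2^-16 once the zoom passes about 3 × 65536 / 128 ≈ 1500× – at which point adjacent pixels can no longer be given distinct coordinates, which lines up with the ~1000x figure.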

However, it does let me slap it on a static page and rapidly iterate without having to learn a single new tool or library, so for that reason I’m a huge fan of it for doing prototyping and exploration.

The Responsive Tortoise

Not having access to the GPU (not to mention having every expression artificially slowed down) means we’re not going to be able to calculate every one of the 128×128 pixels at 60 frames per second. What we can do, though, is calculate and draw some of the pixels each frame, and after a number of frames the picture will be complete.

The naive way to approach this would be to start at the top left and work across each row, and that would be fine except you wouldn’t have a clue what you’re looking at until around half the pixels had been calculated and half the screen was visible. That’s a problem when you want to navigate around quickly and restart the draw process every time you move.

Instead, what I did was start with an extremely low-resolution render (8×8 pixels), which meant only 64 coordinates to calculate – a lot by hand, but doable in 1/60 of a second by a computer, even one as slow as Pico-8. At that point, after the first frame, you have some hint of what the final image might be. From there it continues to increase the resolution until it fills out the native 128×128, which takes many frames but usually only about 5-10 seconds – not very long to wait.

The trick was figuring out how to gradually keep increasing the resolution without wasting work in the process. Basically, I use the top-left corner of each oversized pixel to determine the color of the whole region that oversized pixel covers, and then I redraw over the other parts of the oversized pixel with progressively smaller pixels until every one of the 128×128 spots on the screen has had exactly one calculation done for its coordinate (sketched below). This is difficult to explain, but if you watch the animation or play the game and keep an eye on the top-left corner of each mega-pixel as it enhances the image, you’ll see its color never gets replaced.
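
Here’s a minimal sketch of that refinement schedule, in TypeScript rather than Pico-8’s Lua (mandelbrotColor and fillRect are hypothetical stand-ins for the real per-coordinate calculation and drawing routines):

```ts
const SIZE = 128;

// Stand-ins (hypothetical) for the real routines: a per-coordinate
// escape-time calculation and a block fill on the screen.
const mandelbrotColor = (x: number, y: number): number => (x ^ y) % 16;
const fillRect = (x: number, y: number, size: number, color: number): void => {
  /* draw a size×size block at (x, y) */
};

// One unit of work per yield; the draw loop drains a fixed budget of these
// per frame. Block sizes go 16 → 8 → 4 → 2 → 1, i.e. 8×8 resolution up to
// the native 128×128.
function* refinementPasses(): Generator<void> {
  for (let block = 16; block >= 1; block /= 2) {
    for (let y = 0; y < SIZE; y += block) {
      for (let x = 0; x < SIZE; x += block) {
        // Skip coordinates already computed at a coarser pass: they were the
        // top-left corner of a larger block, so their color stays put.
        if (block < 16 && x % (block * 2) === 0 && y % (block * 2) === 0) continue;
        fillRect(x, y, block, mandelbrotColor(x, y));
        yield;
      }
    }
  }
}
```

Taken over all the passes, every one of the 128×128 coordinates gets computed exactly once, which is the whole point.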

Breaking the 0x0000.0001 Barrier

I thought this would be an interesting opportunity to build up my bit-management math skills and see if I could view things at a smaller scale than the above-described version could. The only way this is possible in Pico-8 is by representing a number with a list of numbers – eg, to represent 64 bits of data in a world made of only 32-bit numbers, you can simply use two 32-bit numbers and split the data between them.

This took some doing to figure out, but I ended up with a polynomial kind of representation: the first number is multiplied by 1, the second number is multiplied by 0.5^32, the third number is multiplied by 0.5^64, so if your numbers were 3, 4, 5 then your represented number would be 3*1 + 4*0.5^32 + 5*0.5^64. We of course can’t actually multiply this out in the program, since the result would be too small, but we don’t need to – we can keep representing it in parts as we do the math, with normal old polynomial addition (add all similar multipliers together) and multiplication (multiply all multipliers of each side against all multipliers of the other side) rules, as sketched below.
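
Here’s that polynomial bookkeeping as a small sketch (TypeScript for readability, ordinary floats instead of Pico-8’s fixed point, and no carry handling – carries are dealt with in the overflow section below):

```ts
// A number is a list of limbs: [a0, a1, a2] means a0 + a1*0.5^32 + a2*0.5^64.
type BigFixed = number[];

// Polynomial addition: limbs of equal weight add together.
function add(a: BigFixed, b: BigFixed): BigFixed {
  const out: BigFixed = [];
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    out[i] = (a[i] ?? 0) + (b[i] ?? 0);
  }
  return out;
}

// Polynomial multiplication: limb i times limb j contributes at weight i + j,
// because 0.5^(32i) * 0.5^(32j) = 0.5^(32(i+j)).
function mul(a: BigFixed, b: BigFixed): BigFixed {
  const out: BigFixed = new Array(a.length + b.length - 1).fill(0);
  for (let i = 0; i < a.length; i++) {
    for (let j = 0; j < b.length; j++) {
      out[i + j] += a[i] * b[j];
    }
  }
  return out;
}
```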

Turns out this works quite nicely with the whole complex number thing, because complex numbers are multi-part too, eg, 5+6i. So the real and imaginary components of the complex number each end up with a list of sub-numbers to represent arbitrary precision.

And to avoid leaving anything out: my examples here talk about 32 bits, but there are actually only 31, since 1 bit is for positive/negative and screws everything up if you try to use it when leveraging the native addition and multiplication. My code worked around that, but it’s too nitty-gritty to get into here, so I’m just going to gloss over that part.

Dealing with Overflows

Unfortunately, we still have to deal with overflows. If you multiply 0x0.001 by 0x0.001 you get 0x0.000001, but we can only represent numbers as small as 0x0.0001, so the result we’d get back would be 0x0000 with no hint of what kind of overflow we had. If we knew the overflow, we could carry it over, shift it way to the left (so the bits sit at the most significant, not least significant, spot), and add it to the number representing the next level of bits.

After a lot of brainstorming, I figured out how to do this. In the above example, for multiplication, figuring out the carry we want to send to the “right” (ie, to the number representing the next less significant set of bits) is a multistep process:

  • Shift both numbers 8 bits to the left (0x0.001 becomes 0x0.1)
  • Multiply them together (0x0.1 squared becomes 0x0.01)
  • Shift the result 16 bits to the left (0x0.01 becomes 0x100.0)
  • Add that to the next lower bit range (so it’s now 0x0.0 + 0x100.0 * 0.5^32)

A similar but opposite process calculates the value carried left, to the next more significant bit range.
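
Here’s the carry-right computation as a runnable sketch, emulating Pico-8’s 16.16 fixed point on raw integer values in TypeScript (shl and fixMul stand in for the native shift and truncating multiply):

```ts
const FRAC_BITS = 16;
const shl = (raw: number, n: number) => raw * 2 ** n;                          // left shift
const fixMul = (a: number, b: number) => Math.floor((a * b) / 2 ** FRAC_BITS); // truncating 16.16 multiply

// Carry sent right when multiplying a * b: the shifts total 8 + 8 + 16 = 32
// bits, re-expressing the lost low bits in units of the next limb (0.5^32).
function carryRight(a: number, b: number): number {
  return shl(fixMul(shl(a, 8), shl(b, 8)), 16);
}

const x = 0x0010; // raw bits of 0x0.001
console.log(fixMul(x, x));                    // 0 – the native product truncates away
console.log(carryRight(x, x) === 0x1000000);  // true – ie 0x100.0, added to the next limb
```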

Impact on Speed

As you might imagine, converting the relevant variables to this complex polynomial representation induced an explosion of extra calculations and had a DRASTIC impact on speed, making the same high-level calculations as before basically impossibly slow. I spent a bit of time trying to limit various things to get the speed vaguely bearable, but it was still way too slow to reasonably navigate anywhere.

I did my best on the math, but taking a look at a known coordinate suggested that while it was indeed mostly working, there were still a few glitches to work out… It seemed like something around my representation of positive/negative wasn’t quite right. I’m tempted to keep working on it, but given the speed the utility is essentially zilch, so I think I’ll abandon the project here and move on to greener pastures.

If you’re curious about the complex number library I made (although I recommend you not try to use it) you can find the code in a separate branch (infinite_zoom) here and if you want to try this super slow, glitchy monstrosity you can play it with this link.

Divided Single-Player Colored-Tile Prototype History

This post mainly catalogues the series of changes I waded through in game prototyping adventures for a month or two, mostly for my own purposes. If you’d like to skip to playing the final result, it’s here.

This is a list of all the significant, stable versions of a game prototype I occasionally worked on during evenings for a month or two. I adjusted the exported index.html to optionally take a blob query param pointing to a git SHA; it then loads the index.js corresponding to that point in the code history.

TL;DR – I can make playable links to past code!

I found myself backtracking too much while working on a more complex, multiplayer version of this game, so I decided to spend some time with a single-player prototype version for awhile, letting me quickly shift the rules around and get a feel for what works and what doesn’t without a week of coding for each trial. Each of these chunks of changes mostly corresponds to less than a full day of work. The first link, with the 7-digit hex text, is a link to play the game at that version; the “browse” or “compare” links take you to the source code diff corresponding to the described changes.
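
For the curious, the loader boils down to something along these lines (a simplified sketch – the exact path scheme here is illustrative):

```ts
// Read ?blob=<sha> and load the matching historical build of index.js.
const sha = new URLSearchParams(window.location.search).get("blob");
const script = document.createElement("script");
script.src = sha ? `blobs/${sha}/index.js` : "index.js"; // hypothetical path layout
document.body.appendChild(script);
```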

The eventual goal of this project is much more complex than these prototypes suggest. Their purpose is to focus on the gameplay possibilities that come out of a world made of tiles that can be one of two colors, or neither, along with creatures made of the same color energy as the tiles (referred to from here on as “mobs”) that have some sort of “conservation of energy” relationship with the color in the tiles. The player is an interloper of sorts who can blend in with either color harmoniously or fight against the current chaotically.

In the eventual game there will be more robust goals and interactions between systems (this colored tile system would be one of many of those) but for the purpose of these prototypes consider your goal to be moving down to the bottom of the grid without getting surrounded. For the first handful of them that might be a bit too easy, so alternatively try arranging tile colors into a certain pattern. Basically just move around the game universe and get a feel for it.

Controls: up, down, left, right – moves selection cursor
mobile: “O” button to select a move, “X” button to change targeted direction (later prototypes only)
pc: “Z” key to select a move, “X” key to change targeted direction (later prototypes only)

7a6b4b5 (browse) – first js export

  • moving only, no interaction with room
  • no sounds
  • ugly colors
  • no mobs

8a0326f (compare) – added mobs and pathfinding-derived moves

  • mobs added that move towards you and block you
  • they move to where you were, not where you’re going, so they’re too easy to avoid
  • they spawn all around a tile when you pick it up so if you pick up a lone tile they completely surround you
  • if you know how the game works there’s no challenge; if you don’t, it’s too suddenly punishing
  • you only pick up/drop a tile when you choose to, no advantage to holding color
  • you can move over any tile you want while not holding color
  • can’t hold more than one of a color
  • can’t kill mobs in any way
  • mobs only chase you when you’re holding color

05d9748 (compare) – added sound effects

  • mobs don’t move if you don’t move
  • sound effects which ended up sticking around for awhile
  • pathfinding animations for mob and player, they all move at the same time although the mobs still move towards where the player was
  • pick up and drop automatically when moving
  • only spawn 1 mob per color action
  • better mob colors

520888b (compare) – more aggressive mobs, more sfx, animation improvements

  • cursor selection visual tweaks
  • all mobs chase you when gray
  • mobs cancel out with each other when adjacent
  • add slight quadratic easing for movement
  • make mobs appear to come out of your avatar when spawning
  • add sfx for movement, failing to move the cursor, and mobs canceling out
  • player moves slower while no color
  • 2 mobs spawn from every color action

dd916d7 (compare) – mobs chase player, back to 1 mob spawning, selection visual tweaks

  • mobs go to where player moves to, not where player moved from
  • 1 mob spawns for each color action
  • circular selection icon
  • clearer avatar shape/color to make seeing the background tile easier and the avatar color change more obvious

92a02a7 (compare) – huge scrolling map, mobs cancel out from spawning

  • map extended below the screen
  • a panning camera that follows the avatar added
  • when spawning mobs via color actions, prioritize any mobs that can cancel out as targets automatically
  • because of how many enemies there are and poor optimizations, there are moments of noticeable lag when the enemies are pathfinding
  • note – shooting one’s way out of being surrounded is nice in this version since the mobs don’t move unless you do so you can blow open a hole and escape even in a crowd… would be interesting to see what adding user-aimed shots to this would be like

8b45d30 (compare) – player/tiles can have a stack of color rather than just 1, mobs can move when player doesn’t

  • Disable mobs cancelling out on their own (need to “shoot” new mobs at old mobs to kill them)
  • Simplify camera movement
  • Allow mobs to move when player waits (unless the player is immediately adjacent)
  • Make player and tiles able to accumulate color – dropping n accumulated color drops it all at once and spawns n mobs
  • Add particle effects to indicate a high stack of color
  • when spawning mobs, if there isn’t room for them then don’t spawn them even though you “spent” color in the process
  • note – it’s too easy to kill endless mobs safely by making sure you’re holding color and cycling pick up and drop until there are none left

3943fc6 (compare) – balance tweaks to fix previous version being too trivial

  • Make player unable to pick up color without moving – prevents pickup/drop cycling to clear nearby enemies trivially
  • Fix a few bugs around particle emitting
  • Make mobs not block mobs/player of the same color
  • Adjust player and mob max move distances for balance
  • Make a separate variable for how far away a mob can be activated by a player, distinct from how far they can move (ie, they can now move towards a player even if they can’t quite reach it yet)

d2de1e8 (compare) – Add experimental mouse support

  • Difficult to detect whether the player has a mouse or a finger
  • Types of interface for mouse and finger are very different and it’s a lot of extra maintenance and design to keep them both in-line
  • Eventually removed in a later commit due to new features that were too tedious to make work with a mouse, but sticks around until then

06f3cb4 (compare) – Add color-stacked enemies and a death screen

  • Make 4 different enemies representing color stacks of 1, 2, 3, 4+
  • When shooting out more than 1 color, it comes out as one large mob rather than n mobs of size 1
  • Add a “DEAD – click to try again” death screen where the enemies scramble around your dead body aimlessly
  • Target the largest nearby enemy when shooting out mobs so you can kill the biggest one quickly
  • Start the player at 3,3 instead of 0,0
  • note – targeting the biggest enemy ended up making it too tedious and complex to clear groups of enemies

0485c63 (compare) – smarter mobs, more restricted gray movement, dropped color only results in mobs

  • As gray, you can only move onto the edge of a cluster of color – once pathfinding walks over a color tile it stops searching
  • When dropping color it does not turn back into a tile, solving the cycling exploit without the confusion (previously it was solved by not being able to pick up a tile without moving) – also tends to make the total amount of color shift in forms rather than get created or destroyed
  • Mobs now prioritize targets (in descending order of priority – nearby enemy players, nearest enemy mob, larger friendly player, nearest friendly mob larger than us)
  • If a mob has no viable targets, wander aimlessly
  • When a stack of color is held, only fire one color instance each turn rather than a big shot of 4 color
  • Fix death screen not being centered after moving the camera
  • note – at this point I’m starting to try to polish existing behavior to a “local maxima” of good gameplay to wrap up the prototype project

b8bb143 (compare) – Simplify and clean up code, make mobs prioritize player targets similarly to mob targets, choose which direction to shoot

  • Mobs used to have separate prioritization for targeting mobs vs players but that’s now been flattened to make players treated similarly to other mobs to give more of a “being among equals” feel
  • Direction to “fire” new mobs is now explicit – shows you what color it will be, prioritizes a nearby opposite color, and allows you to change the selection with button 2
  • Mouse support was finally removed to avoid having to support fire-direction with the mouse somehow
  • Lots of code removal and cleanup now that a lot of behavior has been properly retired

74ac27e (compare) – Targeting visual tweaks, go-like capturing, HUD-like indication of color stack or vulnerable (gray), friend-boosting

  • Made targeting graphics a bit clearer about what’s about to happen
  • Enemies can now be not only canceled out by “firing”, but also surrounded like Go stones. When surrounded, they revert to tiles
  • Visual indicator of how many colors you have or if you’re vulnerable (gray) since it’s way easier to die while gray
  • Now possible to fire at friend mobs (ie, the same color) which makes them bigger. Usually this is undesirable, but I needed to add it to avoid a possible situation where you can land between friendly mobs and not have any possible direction to shoot or move