Why?
Yeah, yet another wrapper around OpenAI and Anthropic APIs. What's different?
Glim takes a very pragmatic approach, optimizing for immediate developer productivity and for managing complexity.
Specifically, goals include:
- Make it easy to iterate on prompts separately from iterating on the code
- Allow changing the LLM model as easily as possible, without being stuck with a lowest-common-denominator feature set
- Make it as easy as possible to send requests asynchronously, without having to think about concurrency more than necessary (see the sketch below)
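As a rough illustration of the last goal, here is a minimal sketch, assuming responses are fetched lazily via request#response; the exact request API may differ:

require 'glim_ai'

glim = GlimContext.new

# Create several requests; the prompt text is just an example.
requests = ["Ruby", "Python", "Go"].map do |lang|
  req = glim.request(llm_name: "gpt-3.5-turbo")
  req.prompt = "In one sentence, describe the #{lang} programming language."
  req
end

# Reading req.response collects the completions without any explicit
# concurrency code in the caller.
requests.each { |req| puts req.response.completion }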
There are a number of convenience features:
- Responses are cached, so that if your code has a bug and you run it again, it's faster and doesn't cost you anything
- Easy to determine token usage and cost
- Template language for prompts (ERB); see the sketch after this list
- Tools for including files in the prompt and extracting them from the response
- Convenient handling of OpenAI "functions"
- Smart handling of rate limits
- Logging of requests and responses
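For example, here is a minimal sketch of using an ERB template as a prompt; the template text and variable name are made up, and Glim's actual template API may differ:

require 'erb'
require 'glim_ai'

# Hypothetical template; a real template would typically live in its own file.
template_text = "Summarize the following text in one sentence:\n<%= text %>"
prompt = ERB.new(template_text).result_with_hash(
  text: "Ruby is a dynamic, open source programming language."
)

glim = GlimContext.new
req = glim.request(llm_name: "gpt-3.5-turbo")
req.prompt = prompt
puts req.response.completion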
Status
Still lots to do! Feedback appreciated. Specifically:
- Is the spec template idea sound in general?
- Are there any Ruby conventions I'm violating?
Getting Started
Install the gem:
gem install glim_ai
cp sample.env .env
bin/setup
Add your API keys to your copy of .env.
With that, you're good to go. Example code:
require 'glim_ai'
glim = GlimContext.new
req = glim.request(llm_name: "gpt-3.5-turbo")
req.prompt = "Who came up with the phrase 'Hello World?'"
puts req.response.completion
puts "Cost = $ #{req.response.total_cost}"
Or see the examples.
A word about paths
There is a lot going on, with logs, the cache, generated code, templates, etc. When working on the gem, what currently works is running everything from the project root, i.e.
ruby examples/glim_demo/glim_demo.rb
Paths can be set in .env.
Design Details
A GlimRequest represents a request to an LLM to perform a task. GlimRequest itself contains functionality and parameters that are common to all supported LLMs (see the sketch after this list):
- parameters like temperature, top_p, etc.
- the name of the LLM to be used
- code for handling ERB templates
- token counting and cost estimation code
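For example (a minimal sketch; the temperature= accessor is an assumption and may differ from the actual GlimRequest interface):

require 'glim_ai'

glim = GlimContext.new
req = glim.request(llm_name: "gpt-3.5-turbo")
req.temperature = 0.2          # common parameter; accessor name assumed
req.prompt = "List three practical uses of the Fibonacci sequence."

puts req.response.completion
puts req.response.total_cost   # cost estimate, as in the Getting Started example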
To support functionality that is specific to some LLM APIs, there is, for each supported LLM API, a GlimRequestDetails class that is instantiated dynamically based on llm_name; any methods missing from GlimRequest are then delegated to it.
So each GlimRequest can hold a reference to a GlimRequestDetails object, to which it delegates any methods it does not handle itself. The GlimRequest, potentially with support from its GlimRequestDetails object, has one key responsibility: after it is created, it must at all times be able to provide a request_hash, a Hash that contains all of the data that needs to be sent to the LLM's API in order to submit the request.
Thus, whenever the user modifies the GlimRequest or the GlimRequestDetails, the internal request_hash must be updated to stay consistent.
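A simplified sketch of this delegation pattern, using method_missing (illustrative only, not the gem's actual implementation):

class GlimRequest
  attr_reader :request_details   # e.g. a ChatRequestDetails or AnthropicRequestDetails

  def initialize(request_details = nil)
    @request_details = request_details
  end

  # Forward any unknown method to the details object, if it responds to it.
  def method_missing(name, *args, &block)
    if request_details&.respond_to?(name)
      request_details.send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    request_details&.respond_to?(name, include_private) || super
  end
end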
There is one tricky situation that is a bit annoying, but we decided to be pragmatic about it and tolerate some awkwardness: if you change the llm for a GlimRequest to one that requires a different GlimRequestDetails class, the GlimRequestDetails will be replaced and any data in it is lost.
For example, when changing from "gpt-3.5-turbo" (ChatRequestDetails) to "claude-instant-1" (AnthropicRequestDetails), the output_schema or function_object will of course be deleted. This works by the GlimRequest creating a new AnthropicRequestDetails instance; as it is created, it is responsible for making sure that the request_hash is accurate. In the other direction, changing from Claude to GPT, a new ChatRequestDetails instance is similarly created.
Above we described that (and how) a GlimRequest can always provide a request_hash object. This hash is used for generating the cache key: if the hashes are identical, we don't need to contact the LLM API again, which saves time and money. The corresponding GlimResponse class can call GlimRequest#request_hash to obtain the necessary data; it is then responsible for sending the request off to the LLM, as well as interpreting the response and making it accessible to the user in a convenient way.
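As an illustration, the cache key could be derived roughly like this (a sketch, assuming the key is a digest of the serialized request_hash; the actual derivation in Glim may differ):

require 'digest'
require 'json'

def cache_key_for(request_hash)
  # Identical request_hash contents yield the same key, so a cached
  # response can be reused instead of calling the LLM API again.
  Digest::SHA256.hexdigest(JSON.generate(request_hash))
end

key = cache_key_for(
  "model" => "gpt-3.5-turbo",
  "messages" => [{ "role" => "user", "content" => "Hello" }]
)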
There is one additional feature that is related: For each GlimRequest, there is a log directory, in which at any time there are several files that represent the content of the GlimRequest:
- generic_request_params: temperature, llm_name, etc
- prompt
- template_text (if a template was used)
- request_hash
And for ChatRequestDetails, also:
- messages: the array of messages, up to and including the message that will be sent
- output_schema.json
Once a response has been created, the directory also contains:
- raw_response.json: the exact response as received when making the LLM API call
- completion.txt: just the completion that was generated by the LLM for this request
License
The gem will be available as open source under the terms of the MIT License.
TODO
- write a better README
Features to add / change
- Autocompress older parts of chat conversations
- AICallable:
  - more data types: array, boolean?
  - allow changing the ai_name for the function, not just the args; GPT seems to look at the names more than the descriptions
- Better name for request#response that makes it clear this is async?
  - comments appreciated
- Support "continue" prompting, especially for Claude; 2k tokens is not much
  - need to figure out if there is a way to get Claude to plan its responses to make them longer (ask Luke)
- Web view on glim_log / the inputs and outputs for the LLM? Make use of anomalies that are in the files; analyze them automatically
(8) Token healing? https://github.com/guidance-ai/guidance
TODO
- store extracted fields and files in glim_log/...../
- extract files as prefab prompt -- code -- non-code
- extract_field: have glim generate code
- show completion for anthropic - unclear when it's missing?
- support replicate