<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>SystemArchitecture on Pavan Chavali</title><link>https://pchavali09.github.io/tags/systemarchitecture/</link><description>Recent content in SystemArchitecture on Pavan Chavali</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Wed, 08 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://pchavali09.github.io/tags/systemarchitecture/index.xml" rel="self" type="application/rss+xml"/><item><title>Experimenting with TOON: A 40% Reduction in LLM Tokens?</title><link>https://pchavali09.github.io/posts/toon-world/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://pchavali09.github.io/posts/toon-world/</guid><description>&lt;img src="https://pchavali09.github.io/posts/toon-world/toon-cover-image" alt="Experimenting with TOON: A 40% Reduction in LLM Tokens?" style="max-width: 100%; margin-bottom: 20px;" /&gt;&lt;br /&gt;&lt;p&gt;I recently looked at the GCP bill for the &lt;strong&gt;&amp;ldquo;Revenue Radar&amp;rdquo;&lt;/strong&gt; agent I built (the one I documented in my &lt;em&gt;&amp;ldquo;Beyond &amp;lsquo;Hello World&amp;rsquo;&amp;rdquo;&lt;/em&gt; deep dive), and the usage costs were a significant, unexpected reality check.&lt;/p&gt;
&lt;p&gt;The Python code was clean. The logic was sound. But the sheer volume of JSON I was shoving into Gemini&amp;rsquo;s context window for every single RAG retrieval was burning through credits like a startup burning through VC cash in 2021.&lt;/p&gt;</description></item></channel></rss>