#73
Sarvam, Made with Google, Grok 2, Anthropic Caching, Agents on the rise, MoonEcho, PanelsDesu, router, Base32, Team Rocket, Whisky to Mercury, Long-term selfish and more
👋🏻 Welcome to the 73rd!
We are back to shipping on Monday mornings because weekends were busy and a shit ton of things happened over the week.
Let’s dive in. (this one is a long edition, no! don’t, shh!)
📰 Read #73 on Substack for the best formatting
What’s happening 📰
📆 Applications are now open for YC's first-ever fall batch, which will begin in San Francisco on August 29th — with Demo Day in early December. Apply now if you have the ideas to build anon!
🗞️ Crypto x AI goes brr with Polymarket and Perplexity AI partnering up to show news summaries on Polymarket. Bet with facts.
✨ AGI Digest
⚓️ OSS Model Drops
𝕏 xAI announced the beta release of Grok-2 and Grok-2-mini, which sit right at the top alongside GPT-4o, Llama-3-405b, and Claude-3.5-Sonnet, with some very impressive benchmark scores, even claiming SoTA on the MATH and MathVista benchmarks. It has been on LMSYS for a while under the name sus-column-r and beats even 3.5-Sonnet and GPT-4o on human preference on the leaderboard. Both new Grok models are currently available to Twitter’s Premium and Premium+ users, along with a collab with Black Forest Labs for image generation via their FLUX.1 models (which can be quite unhinged BTW). There’s an API to be released soon as well.
No word about open-sourcing it for now. BTW, Grok-1 was open-sourced ~4 months after its release, so maybe this will follow the same route? We speculate this because Musk has repeatedly dissed OAI for not open-sourcing its models.
🏺 Nous Research released Nous Hermes 3, the first chat fine-tunes of the Llama 3.1 herd of models, primarily focusing on being unlocked, uncensored, and highly steerable. The model aims to excel at roleplaying, agentic tasks, reliable function calling, multi-turn chats, long context coherence, etc. Nous also partnered with Lambda Labs to provide Hermes 3 via their API.
🪷 India’s Sarvam AI had a bunch of releases:
Sarvam-2B: A 2B parameter LLM pre-trained from scratch on 10 Indic languages
Shuka v1: An audio-LLM that natively understands audio in Indic languages. Built using Saaras v1 as an encoder and Llama3-8B-Instruct as the decoder.
AI Legal: An AI-assisted workbench designed for lawyers to enhance their capabilities with features such as regulatory chat, document drafting, redaction, and data extraction.
And a bunch of different multilingual voice-enabled AI agents, starting at ₹1 / min.
💇 Nvidia released Llama-3.1-Minitron 4B, pruned and distilled from Llama-3.1 8B, which saw a 16% increase in MMLU performance compared to training a Minitron-4B from scratch. Their blog also goes into detail on which techniques they used, what worked well, and what did not.
👨🏫 InternLM added a 1.8B and a 20B model to the 2.5 family. The 20B aims to compete with Gemma-27B and the 1.8B with Qwen2-1.5B. Though InternLM claims its models have better reasoning and tool usage, there is not yet much community consensus on that.
🍦 Service Updates
♊️ How do you spot whether a big tech company is serious about AI these days? SoTA research? No. OSS releases? Nope. Better developer tools and ecosystem? Nada. Repeating “AI” enough times in their keynotes? Hell Yeah!
And boy, Google is damn serious about AI, given the myriad updates it packed into its Pixel 9 launch event. The on-device Tensor G4 chip does much of the heavy lifting to make the magic possible by running inference on the models locally. There is a lot of new stuff coming, but Gemini Live was particularly interesting for us (because it is available to test on non-Pixel devices even now). Though it is certainly impressive in how low its latency is, we feel we are kinda spoiled by the expressiveness of OAI’s new ChatGPT Advanced Voice Mode (which is still behind a thick waitlist though!)
💬 OpenAI released yet another GPT-4o endpoint, chatgpt-4o-latest, which will track the 4o model in ChatGPT and is optimized for chat. The one released last week (gpt-4o-2024-08-06) is optimized for API usage (e.g. function calling, instruction following). This is also evident in aider's code editing benchmark, which BTW continues the trend of each subsequent model in every GPT family getting nerfed a little in coding abilities.
💾 Following in Gemini's and DeepSeek's footsteps, Anthropic introduced Prompt Caching in its API as well, making subsequent requests in a multi-turn conversation significantly cheaper! The TTL of the cache is 5 mins since the last cache hit though, so you need to be wise with your API strategy.
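To make the "5 mins since the last cache hit" bit concrete, here is a toy model of a sliding TTL. It is only a sketch: SlidingTTLCache and request are hypothetical names we made up for illustration, nothing here calls Anthropic's API, and the real service decides hits and misses on its side.

```python
class SlidingTTLCache:
    """Toy model: an entry expires `ttl_seconds` after it was last written or hit."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._last_used = {}  # prompt prefix -> time (in seconds) of last write or hit

    def request(self, prefix, now):
        """Return True on a cache hit. Every request, hit or miss, restarts the timer."""
        last = self._last_used.get(prefix)
        hit = last is not None and (now - last) <= self.ttl
        self._last_used[prefix] = now  # a miss writes the entry, a hit refreshes it
        return hit


cache = SlidingTTLCache()
timeline = [0, 200, 450, 800]  # seconds at which the same prefix is sent
results = [cache.request("system-prompt", t) for t in timeline]
print(results)
```

The request at 450s still hits because only 250s have passed since the previous hit, while the one at 800s misses after a 350s gap. So under this model, keeping requests less than five minutes apart is what keeps the cheap path alive.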
🤓 Technical LLM Agents are on the rise
👩🔬 Sakana AI released an “AI Scientist”, a fully automated pipeline for end-to-end paper generation. Given a broad research direction and a simple initial codebase to start from, it can perform idea generation, literature search, experiment planning, experiment iterations, figure generation, manuscript writing, and reviewing to produce insightful papers (RIP underpaid RAs).
💻 Cosine.sh announced Genie, yet another “AI software engineer that tops the SWE-Bench”, achieving an impressive 30% score on the SWE-Bench and 50.6% in the SWE-Bench-Lite. They use a fine-tuned OpenAI model internally and not much can be said beyond that since it’s still behind a waitlist and closed-source :/
🧑💻 Salesforce DEI is (yet again) an “AI software engineer”, this time by Salesforce AI. The difference between it and Genie mentioned above is that it is open-source (they promise the code will be released soon) and much more detailed and interpretable. It does not top the SWE-Bench-Lite, but hey, we would any day trade some drop in performance for a more configurable and open system that we can tinker around with!
🔐 0x Digest
💳 Metamask announced Card in partnership with Mastercard and Baanx. The card is in pilot and available to selected users in the EU and UK only. They accidentally claimed to be the first such thing, a claim users later trashed because Algorand’s Pera Wallet had done it before. This is a big move anyway, as it allows you to instantly convert crypto to fiat and use it wherever Mastercard is accepted.
🪙 Coinbase tweeted "cbBTC" and is probably planning to build a huge BTC economy on the Base network.
🍋 Celestia is going to have its first major upgrade, "Lemongrass". There is a great breakdown in “Inside Celestia’s Lemongrass Upgrade: Key Features and Enhancements”. The upgrade has 5 CIPs, the major ones being "CIP-14: Interchain Accounts", enabling one chain to control an account on another through IBC, and "CIP-20: Disabling the Blobstream Module" in favor of the new and better Blobstream X.
🗺️ Countries x Crypto
Binance's “situationship” with Indian regulators took an interesting turn: Binance has re-entered the Indian market, and its app and website are available to users again. (Too relatable, with people blocking and unblocking you for no reason.)
An interesting case of an employee not being able to claim the tokens in their compensation after getting terminated led to the Dubai court recognizing crypto as a valid salary payment under an employment contract. No matter how we got here, progress is progress.
🛠️ Dev & Design Digest
Remember Ladybird (we covered it in #67)? Fireship just covered it too, with a few more memes and in their usual style.
🌭 The next version (v3) of node-canvas, a Cairo-backed Canvas implementation for Node.js, works in Bun. In v3 they use N-API [1] instead of the V8 C++ API, which is a big change and also makes it easier for Bun to support.
📇 TS Expert Matt Pocock shares “Why I don't like enums?”, explaining how weirdly they are implemented in TypeScript and why you should avoid using them. There are three types of enums in TS: "number", "string" and "inferred". Try not to use any other than "string", since string enums look almost the same once transpiled, while the other two have their nuances.
🎮 Diablo is fully playable in your browser. YES! You read that right. You can play it here and check the source code here. This is big because it unlocks more things that can be done with the game: once you bring something to the browser, people fork it and do wonders. And yes, of course it's Wasm magic.
What brings us to awe 😳
🌕 Hainbach, an electronic musician from Berlin, found out about Moon bouncing (Earth–Moon–Earth communication) and left no stone unturned: they went to a radio telescope in Dwingeloo and played around with the Moon, sending it a ton of noises and getting the echoes back. They also collaborated with AudioThing to make MoonEcho, a free plugin (I wonder if this is what she meant by love you to the moon and back?!?)
🔎 How Google Search ranking works is an in-depth analysis of how Google's complex ranking system works and how the different components like Twiddlers and NavBoost influence search results.
💵 Cringey, but true: How Uber tests payments in production. The article dives into how, crazily enough, we can test something multiple times before rolling it out and it still breaks in production. Uber reduces the chances of that by testing in prod with a set of users, especially in sensitive cases like payments.
🖨️ Inkjet printers might be scamming you big time because they use the more expensive color inks even for b&w prints and underreport cartridge contents.
Today I (we) Learnt 📑
🗃️ Gestalt Principles are laws of human perception that describe how humans group similar elements, recognize patterns, and simplify complex images.
🌡️ The thermometer is essentially the same as those used today, except that it was filled with brandy rather than mercury.
😕 Linux’s Base 32 file encoding is not the same as Base 32 as you might see it in a math class. The first guess would be that Base 32 might use characters “0, 1, 2, …, 9, A, B, C, …, V”, BUT, actually it uses symbols “A, B, C, …, Z, 2, 3, 4, 5, 6, and 7” [as per RFC 3548]. This has been done keeping in mind that the “purpose of base 32 encoding is to render binary data in a way that is human readable”.
🚀 Most of the Pokémon characters are named after real people. Jessie and James (yes, from Team Rocket) are named after Jesse Woodson James (notorious outlaw) [Reddit Thread]. (really fun story of how we stumbled upon this though)
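The Base 32 mix-up above is easy to check with Python's standard library. A minimal sketch: base64.b32encode is the stdlib function that uses the RFC alphabet, while to_math_base32 is a hypothetical helper we wrote only to show the "math class" digits for comparison.

```python
import base64

RFC_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"   # alphabet from the RFC
MATH_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # what a math class would use


def to_math_base32(n):
    """Write a non-negative integer in positional base 32 with digits 0-9, A-V."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 32)
        digits.append(MATH_ALPHABET[remainder])
    return "".join(reversed(digits))


# Encoding bytes: 5 bits per symbol, padded with "=" to a multiple of 8 symbols.
encoded = base64.b32encode(b"hi").decode("ascii")
print(encoded)               # NBUQ====
print(to_math_base32(31))    # V
print(to_math_base32(32))    # 10
```

Note that the two are different jobs as well as different alphabets: the encoding turns arbitrary bytes into text, while the math version writes a single number in base 32.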
🤝 You have read ~50% of Nibble, the following section brings tools out from the wild.
What we have been trying 🔖
👾 Focumon: Turn your daily goals into a multiplayer adventure! Focus on co-working or co-studying.
🖌️ PanelsDesu: Search manga panels with vibes alone (yes fancy word for RAG).
📺 YTCH: Watch YouTube channels together as if it were a TV and out of your control.
🗣️ voice-writing-electron: A real-time, instant dictation desktop application built on Electron that uses Whisper and Groq under the hood.
Builders’ Nest 🛠️
📦 quickjs: A TypeScript package to execute JavaScript and TypeScript code in a WebAssembly QuickJS sandbox.
🔀 router: A framework-agnostic, tiny URL router for the Nano Stores state manager.
🕸️ six-degrees-of-torvalds: Discover how you or any other GitHub user connects to Linus Torvalds through shared repositories.
🔎 AutoRAG: Find the optimal RAG pipeline for your data.
Meme of the week 😌
Off-topic reads/watches 🧗
🗓️ Mediocrity and perfectionism by Seth Godin. It briefly covers how it’s surprising to realize they’re the same.
🤝 Ben Horowitz on the ideal founding team for a startup, saying “The best thing to do is to have two people who know how to do both, but at least one person who’s world-class in each."
🌱 Long-term selfish by Seth Godin. Everyone is selfish, but are you being short-term selfish? If so, pivot to being long-term selfish and thinking for all.
Wisdom Bits 👀
“Beware of overconcern for money, or position, or glory. Someday you will meet a man who cares for none of these things. Then you will know how poor you are.”
― Rudyard Kipling
Wallpaper of the week 🌁
🌌 Grab the week’s wallpaper at wow.nibbles.dev
Weekly Standup 🫠
Nibbler P had a good week, but seeing the same dark grey monsoon skies laden with heavy rain clouds every day is a bit gloomy. He binged on Erased to add some colors to the boredom and revamped his portfolio a little (once again).
Nibbler A had a long off-screen week, a bunch of travel, and some window shopping with friends and family. He did some catch-up with anime and played some chess (helping a friend, lol). He is prepping for the coming work week.
If you liked what you just read, recommend us to a friend who’d love this too 👇🏻
[1] N-API (pronounced N as in the letter, followed by API) is an API for building native Addons. It is independent of the underlying JavaScript runtime (for example, V8) and is maintained as part of Node.js itself. This API will be Application Binary Interface (ABI) stable across versions of Node.js.