#97 — OAI will have the best coder by year end, Society-as-a-service and DOM's native state preserving move API and more

Qubits goes brrr.., Sam-Elon banter, Gemini's new models, Agent Wallets, Unichain goes live, Network School v2, httptap, WeightWatcher, You Should Get Wet & more


Feb 13, 2025

👋🏻 Welcome to 97th!
Heyyy! So it’s another mid-week Nibble, and we ask for just one weekend to get back on track. Now let’s dive in.

📰 Read #97 on Substack for the best formatting
🎧 You can also listen to the podcast version, powered by NotebookLM


Psst… if you have a second to spare and you want The Nibble to improve, let us know your thoughts by filling out a small questionnaire. We’ll reinforce the good parts and weed out the bad ones. 💪

Improve The Nibble!

Now onto the edition…

📍ETHDenver - February 23 - March 1 • Denver, USA

⭐ YC is hosting its first-ever AI startup school in San Francisco on June 16 and 17th. It is a free event but the attendees will be hand-picked to include some of the best 2000 grads and tech industry experts.

What’s Happening 📰

🪄 For starters, we got Quantum Internet before GTA 6. We got what? Yes! Quantum Internet. Let us brief you a little: the regular internet is when bits travel over the wire, right? Well, making qubits¹ travel over the wire is the utopia, and that’s what Oxford scientists claim: that they have achieved teleportation using quantum computers.

It’s not as big a feat as the headline suggests, but it’s no small feat either: they transferred a logic-gate computation on qubits over the wire.

✨ AGI Digest

🗞️ OpenAI loves being in the headlines, but where’s Claude 4?

Following last week’s DeepResearch release, which has now rolled out to all Pro and Team users (and got some quick open-source replications in the meantime), OpenAI’s site underwent a major design overhaul and the company dropped an eye-popping Super Bowl ad. What made more news, though, is the long-running Elon-Altman banter, where Musk again asked “But hey, when are we dropping the Open?”, teasing Sam. Never a dull day in AI fr.

And being the richest man on Earth that he is, Musk presented a $97.4 billion bid to acquire control of OpenAI, to which Sama responded, “No thank you but we will buy Twitter for $9.74 billion if you want”, and a lot more.

Media presence aside (honestly, we don’t care much about what’s happening between the two billionaires and their long-held spite, as long as at least one of them keeps shipping good models), OpenAI did learn its lesson from DeepSeek and decided to show an updated (longer?) CoT for the o3-mini model family. It’s still not the entire actual reasoning and is in fact still a summary so idk if that should be counted as a step forward.

What’s a step forward, though, is sharing a roadmap for GPT-4.5 and GPT-5, which would release in the next few weeks/months. Whether they will release GPT-4o image generation still remains unanswered. GPT-4.5 would be their “last non-chain-of-thought model”, after which GPT-5 would merge the o-series and GPT-series models by creating systems “that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks”.

And oh boy, they do have very impressive models, just watch the video below from 19:56 (already timestamped in the preview below). They internally have a model that is equivalent to the 50th-best programmer in the world (at competitive coding) and they might be able to churn it to be the number one by the end of the year!

And will Anthropic just sit tight releasing research insights and red-teaming results while letting OAI take away its cake just like that? And that too after releasing the Anthropic Economic Index which btw clearly highlights how its models are being used for Computer and Mathematical tasks far, far more than any other use case?

They better release the Claude 4 soon or it’s Owari-Da(rio) for them.

A horizontal bar chart titled 'AI usage by job type' comparing the percentage of Claude conversations (shown in coral) versus percentage of U.S. workers (shown in black) across 22 job categories. The bars represent representation relative to the US economy from 0% to 40%. Computer and mathematical jobs show the highest Claude usage at 37.2%, while office and administrative support has the highest workforce percentage at 12.2%. Farming, fishing, and forestry show the lowest percentages in both categories at 0.3% and 0.1% respectively. Most other categories fall between 0-10% for both metrics, with some notable presence of AI usage in areas like education, entertainment/media, and the sciences. — Source: Anthropic Economic Index

🧮 Reasoning AIs keep nailing Maths

AIME I 2025 questions were released some days ago, and all the reasoning models are taking the limelight, led by OpenAI’s o-series. o3-mini-high got the highest score: an impressive 80% accuracy on the problems while costing merely $3.19, which is really cheap! DeepSeek is not far behind, and even the distilled models achieve 50% accuracy, though don’t take it for a “general” intelligence just yet.

The reason why these Maths and Programming benchmarks, especially GSM8K and MATH, are the favorite of reasoning model makers is that the solution to the questions in these datasets can be easily broken down into small verifiable steps — exactly the kind of stuff that a reasoning model is supposed to do in its chain of thought, correcting itself if it takes a wrong step and trying alternate approaches to make sure it is understanding the problem correctly from all angles.

In fact, the IMO Gold Medal, which up until just two years ago was seen as too tough for LLMs and possibly even a mark of AGI if a model managed to win it, has received a fair share of attention from both Google DeepMind and OpenAI. DeepMind’s AlphaGeometry 2, which is based on its Gemini models, solves 84% of all IMO geometry problems of the last 25 years, compared to 54% for its predecessor, AlphaGeometry 1. This makes it better than an average gold medalist in Geometry. On the other hand, OpenAI’s unreleased o3 scored 395.6 points in the IOI 2024 programming competition (the cutoff for gold is 360)!!! This model also scored 71.7% on SWE-bench Verified, a first for a model to cross 50%!

Source: Competitive Programming with Large Reasoning Models

📱 Release the app now!

Seeing DeepSeek amass huge success with their app, and with still no competitive LLM app offering from Europe, Mistral quickly revamped Le Chat, offering Pro and Team plans, and launched native apps for Android and iOS.

And with the custom hosting on Cerebras, what Mistral calls “Flash Answers”, it can answer at a blazing speed of 1,100 tokens per second! It’s not just Mistral, even Perplexity shifted its own Sonar models to serve using Cerebras at over 1,200 tokens per second! Talking of fast inference, Groq recently integrated their models on OpenRouter and also finally launched their long-awaited Dev Tier.

Replit also noticed a gap in the market which until now did not have a good prompt-to-site mobile app and launched their Replit Agents on mobile.

AllenAI took a step ahead and launched an app that runs its OLMoE model natively on iPhone 15 Pro and newer, and on M-series iPads, while also open-sourcing the code of their apps!

Image capture of the OLMoE app working on an iPad and iPhone. — Source: OLMoE, meet iOS

GitHub tapped into “The Force” and integrated an agentic mode into Copilot, similar to what Cursor has. This feature is capable of iterating on its own code, recognizing errors, and fixing them automatically. It can suggest terminal commands and ask you to execute them, and it analyzes run-time errors with self-healing capabilities. The new Gemini 2.0 Flash model is also available in the chatbox.

They also demoed a custom SWE agent codenamed “Project Padawan”. This allows you to assign Issues to the agent, which then comes up with a pull request addressing the change.

⚓️ New Model and Data Drops

After much testing in the past few months, Gemini 2.0 Flash was finally made GA. Along with this, they also released Gemini 2.0 Flash-Lite and Gemini 2.0 Pro (both in experimental preview). All three models support multimodal input, and 2.0 Flash and 2.0 Pro would also support image and audio outputs in the future, along with Live Multimodal API support.

The Gemini 2.0 models deliver significant performance improvements over Gemini 1.5 across a range of benchmarks. — Source: Gemini Flash 2.0 Release Blog

We don’t know if the 2.0 Experimental is final right now because, looking at the stats, 2.0 Flash does most of the heavy lifting over the previous version, and the Pro honestly looks like a humble upgrade next to it. But if you had to choose one, Gemini 2.0 Flash performs best on the price-performance spectrum. And it’s really good for OCR!

Gemini family pricing comparison 2.0 flash lite — Source: Gemini Flash 2.0 Release Blog

The Gemini App has also been upgraded with the new Flash and Pro models, along with 2.0 Flash Thinking and a brand new “Gemini 2.0 Flash Thinking Experimental with Apps”, which can work with YouTube, Maps, and Search, in turn watching videos or navigating in detail and then answering your questions.

LMSys now has a nice Arena Score-Price Plot showing the price vs. performance trade-offs for LLMs.

Bytedance is going crazy with realistic video generation. First, they released Goku, a family of joint image-and-video generation models that gets top scores on image and video generation benchmarks such as GenEval, DPG-Bench and VBench. They also announced Goku+, which is built on top of Goku and specifically designed for advertising scenarios.

Then, they released OmniHuman-1, an end-to-end multimodality-conditioned human video generation framework that can generate human videos (with audio!!!) based on a single human image and motion signals.

Prime Intellect introduced a distributed synthetic data generation system called Synthetic-1 to create verified reasoning traces for math, coding and science, using DeepSeek-R1. The dataset contains over 1.4M high-quality tasks and verifiers to generate data for training high quality reasoning models.

Nomic released a first-of-its-kind MoE embedding model, nomic-embed-text-v2-moe. This multilingual embedding model has an active parameter count of 305M (475M total) and is trained with Matryoshka Representation Learning (MRL), so embeddings can be truncated, if required, to optimize storage. It performs pretty great compared to models in its size range, though it only supports input lengths of up to 512 tokens.

Source: Nomic Embed Text v2 MoE HF Page
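If the truncated-embeddings bit sounds abstract, here is a minimal sketch of the idea, assuming the usual Matryoshka recipe of keeping the leading dimensions and re-normalizing. `truncateEmbedding` and `dot` are hypothetical helpers written for this note, not part of any Nomic library, and the vector is made up.

```typescript
// Sketch: shorten a Matryoshka-style embedding by keeping its leading dimensions.
// `truncateEmbedding` is a hypothetical helper, not an API of any embedding SDK.
function truncateEmbedding(embedding: number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims <= 0 || dims > embedding.length) {
    throw new RangeError(`dims must be an integer in 1..${embedding.length}`);
  }
  const head = embedding.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // Re-normalize so cosine similarity can still be computed as a plain dot product.
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Made-up 4-dimensional vector standing in for a real embedding.
const fullVector = [0.5, 0.5, 0.5, 0.5];
const shortVector = truncateEmbedding(fullVector, 2);
console.log(shortVector.length, dot(shortVector, shortVector));
```

Storing the shorter vector halves the footprint in this toy case; how much retrieval quality survives at a given size depends on the model and should be measured, not assumed.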

Zyphra released Zonos-v0.1 beta, a family of TTS models with high-fidelity voice cloning. The two 1.6B models — one transformer and one hybrid (Mamba2) — are available to download from Huggingface as well as on their model playground via the API at a rate of $0.02 per minute. The expressiveness in voice is really good and the 1.6B means that it can be run at almost-real-time without very high VRAM requirements.

🔐 0x Digest

🤝 Ripple partners with Unicâmbio, Portugal’s leading currency exchange.

This introduces instant cross-border payment services between Portugal and Brazil.

👛 Crossmint introduces Agent Wallets for AI Agents and launchpads. WTF? Yes! let us explain 👇

Each agent wallet is supposed to have two keys, neither visible to you or Crossmint: one is managed by Passkeys or Embedded Wallets, and the other is secured in a TEE. This sets your agent free to make the on-chain entropy positive and you free of regulatory BS.

🦄 Uniswap’s L2 Unichain goes to Mainnet, launching as a Stage 1 rollup with permissionless fault proofs for now. It has also promised to soon allow anyone to host a UVN (Unichain Validation Network) node and verify blocks.

After almost 4 months in testnet, they knew what was required to meet users’ needs. We remember people (us too) laughing when Uniswap launched a wallet and then an L2 testnet; we still don’t regret it.

🎓 Balaji’s Town cum School is coming with its second cohort Network School 2025 or as per his plan “Society-as-a-service (v2)”

It is a year-long program for 256 members that begins March 1, 2025, on an island near Singapore. While v2 is running, they’ll be building a permanent campus for the Society from scratch (v3) for 1024 people.

🛠️ Dev & Design Digest

↕ Jim (Engineer at Granola) wrote a piece, “Don't animate height!”, explaining how their app was using 60% CPU and 25% GPU (no, not because it was in Electron, calm down guys). Jim unveils a pure CSS animation that hogs resources as if it were JS (crazy, right?): a simple, frequent transition on height triggers a layout recalculation, and (as we know from “How Browser’s Rendering pipeline works?”) that’s a fucking expensive thing to do.

So, they did what any of us would do → move to compositor-only properties (yes, the cheapest ones). Which ones, you ask? transform and opacity, leading to better resource usage.
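To make the swap concrete, here is a rough sketch of an "expand" effect described with transform and opacity instead of height. It is our own illustration, not the code from Jim's article: `expandKeyframes`, `expand`, and the local `Animatable` type are hypothetical names, and the type is declared by hand so the snippet compiles without DOM typings. In a browser, any element's `animate()` method fits that shape.

```typescript
// Keyframe shape limited to the two cheap properties named above.
type CompositorKeyframe = { transform: string; opacity: number };

// Hypothetical helper: describe an expand effect without touching `height`.
function expandKeyframes(fromScale: number, toScale: number): CompositorKeyframe[] {
  return [
    { transform: `scaleY(${fromScale})`, opacity: fromScale === 0 ? 0 : 1 },
    { transform: `scaleY(${toScale})`, opacity: 1 },
  ];
}

// Structural stand-in for anything exposing animate(), e.g. an HTMLElement.
interface Animatable {
  animate(
    keyframes: CompositorKeyframe[],
    options: { duration: number; easing: string }
  ): unknown;
}

// Runs the effect through the Web Animations API entry point on `el`.
function expand(el: Animatable, durationMs: number = 200): unknown {
  return el.animate(expandKeyframes(0, 1), { duration: durationMs, easing: "ease-out" });
}
```

One trade-off to keep in mind: scaling does not push sibling elements around the way a height change does, and it stretches the element's contents, so this fits overlays, bars, and indicators better than accordions.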

🆕 A new DOM primitive moveBefore(), state-preserving atomic move API just dropped. This new function will be available on the ParentNode.

It will help in moving elements within the DOM without losing their current state. It helps avoid unnecessary reloading or resetting of elements like iframes or dialogs when they are moved to a new location. You can read more about this from this Pull Request.
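Since not every browser ships this yet, a cautious way to use it is feature detection with a fallback. The sketch below is our own assumption of how that could look: `moveNode` and the `MovableParent` type are hypothetical, declared by hand so the snippet compiles without DOM typings. The only real APIs relied on are `moveBefore(node, reference)` and `insertBefore(node, reference)`.

```typescript
// Minimal structural type: any parent with insertBefore(), optionally moveBefore().
interface MovableParent<N> {
  insertBefore(node: N, reference: N | null): unknown;
  moveBefore?(node: N, reference: N | null): unknown;
}

// Hypothetical helper: prefer the state-preserving move when the parent supports it,
// otherwise fall back to insertBefore(), which removes and re-inserts the node.
function moveNode<N>(
  parent: MovableParent<N>,
  node: N,
  reference: N | null
): "moved" | "reinserted" {
  if (typeof parent.moveBefore === "function") {
    parent.moveBefore(node, reference);
    return "moved";
  }
  parent.insertBefore(node, reference);
  return "reinserted";
}
```

The return value lets calling code know whether state such as a loaded iframe was likely preserved, so it can restore things manually in the fallback case.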

🪃 We’ve not got time to watch it yet, but Honeypot (known for building documentaries ~~of nerds~~ in tech ) released “Angular: The Documentary”.

The story of Angular from an internal side-project at Google to a sensational web framework to a boomer framework and back.

🤝 You have read ~50% of Nibble, the following section brings some fun stuff and tools out from the wild.


What Brings Us T(w)o Awe 😳

🪟 “Due to concerns about the fact that acknowledging the existence of certain countries can be perceived as a nominally political stance, Microsoft has opted to just avoid the issue altogether by not including country flag emojis in Windows’ system font.” from The Dumb reason why flag emojis aren't working on your site in Chrome on Windows. This article quickly explains why adding a country flag emoji to your website without polyfills can bite you back.
🎮 A developer found a bug in the Marvel Rivals game that exploits their hotfix patching system to remotely execute code (RCE). More importantly, the developer clarifies that there are many such cases and rants about how Game Developers mostly don’t care.
🥚 How far will you go to get a perfectly boiled egg? “A lot!” say a group of researchers from Italy who have come up with a groundbreaking method called “periodic cooking” to improve texture and nutritional content in boiled eggs compared to traditional techniques. The idea is that the albumen and the yolk require different temperatures for optimal cooking (85 °C and 65 °C respectively), which is hard to achieve unless you prepare them separately. They achieve this by placing the raw shell-on egg alternately in hot water and cold water for relatively short periods of time, repeating these cycles until both the yolk and the albumen are optimally cooked. Look at the diagrams below for yourself!
Innovative? Yes, very much. Are we going to cook our eggs this way now? A good soft boil serves just fine with much less effort, thank you very much.
And this is how you
💣 Turns out that you can use Zero Width Joiner (ZWJ) sequences to encode an unlimited amount of data in a single emoji to overwhelm an LLM’s context window (aka TokenBombing by Pliny, because why bend when you can bomb?). In simple words, it adds a bunch of zero-width Unicode characters to your inputs, which are not rendered for human eyes but still take up space in your text.
This opens up a host of new kinds of attacks on LLMs if proper guardrails are not set on inputs, which means even a single visible character from a malicious actor may crash your systems.
Source: @karpathy on Twitter
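To see how invisible characters can ride along with visible text, here is a toy sketch. The bit scheme (zero-width space for 0, zero-width joiner for 1) is an assumption made purely for illustration and is not necessarily what the linked post uses. `hidePayload`, `revealPayload`, `stripInvisible`, and `invisibleCount` are hypothetical helpers.

```typescript
// Two invisible code points used as bits (an illustrative scheme, not a standard).
const ZERO_BIT = "\u200B"; // ZERO WIDTH SPACE  -> bit 0
const ONE_BIT = "\u200D"; // ZERO WIDTH JOINER -> bit 1

// Append `payload` to `visible` as 16 invisible characters per UTF-16 code unit.
function hidePayload(visible: string, payload: string): string {
  let hidden = "";
  for (let i = 0; i < payload.length; i++) {
    const code = payload.charCodeAt(i);
    for (let b = 15; b >= 0; b--) {
      hidden += ((code >> b) & 1) === 1 ? ONE_BIT : ZERO_BIT;
    }
  }
  return visible + hidden;
}

// Recover the payload by reading the invisible characters back as bits.
function revealPayload(text: string): string {
  let bits = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch === ZERO_BIT) bits += "0";
    else if (ch === ONE_BIT) bits += "1";
  }
  let out = "";
  for (let i = 0; i + 16 <= bits.length; i += 16) {
    out += String.fromCharCode(parseInt(bits.slice(i, i + 16), 2));
  }
  return out;
}

// Crude guardrail: drop common zero-width characters before text reaches a model.
function stripInvisible(input: string): string {
  return input.replace(/[\u200B-\u200D\u2060\uFEFF]/g, "");
}

// How many invisible characters an input carries, useful for rejecting outliers.
function invisibleCount(input: string): number {
  return input.length - stripInvisible(input).length;
}

const carrier = hidePayload("🙂", "hi");
console.log(carrier.length, invisibleCount(carrier), revealPayload(carrier));
```

Note that stripping every zero-width joiner also breaks legitimate emoji sequences that depend on it, so a production filter would more likely cap the count per input than delete them all.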

Builders’ Nest 🛠️

🕵️ sniffnet: a cross-platform and intuitive application to monitor your Internet traffic comfortably.
🚰 httptap: View the HTTP and HTTPS requests made by any Linux program by running httptap -- <command>
💅 clack: a tool to effortlessly build beautiful command-line apps. If you have seen some really good-looking CLI, it was probably built using Clack.
🏋 WeightWatcher: Open-source, diagnostic tool for analyzing Deep Neural Networks, generating layer-by-layer diagnostics.

Meme of The Week 😌

Off-topic Reads/Watches 🧗

🏄‍♂️ Rainy Day Surfers by Seth, on how only the hard-core surfers show up in the rain when they are needed the most. Keep your Rainy Day Surfers close and yourself safe, if you are one.
🧠 Which came first, the neuron or the feeling? by Kasra (Bits of Wonder) asks whether the neuron comes first or the feeling. A small telling of how feeling and neurons are like a “dichotomy of science and religion”.
⌛ What's still here? by Jason Fried, tells us to be curious about what passed the test of time, what lasted, and what is durable.
🌊 You Should Get Wet (ah! not that way) by Lewis O’Brien (Doublethink), appreciating the power of being near water and its benefits. TL;DR → Be near water, even aquariums count.

Wisdom Bits 👀

“What you deny subdues you. What you accept transforms you.”
— Carl Jung

Wallpaper of The Week 🌁

🌌 Grab the week’s wallpaper at wow.nibbles.dev.

Weekly Standup 🫠

Nibbler P had a super ultra-busy time last week, squashing bugs and creating deployments. He started reading more about Silicons, learning some RL, and playing stuff in between to keep things balanced.
Nibbler A had a loonng week, with an even bigger week coming up. He missed several workout sessions this week, even fell sick, and had close to nil screentime. Delayed Update: He’s in love with Apple’s Mac-to-Mac hand-off. But he was off track on his reading and badminton the last 2 weeks. Here’s a postcard from Udaipur for you. 🫶

If you have two minutes, please complete a feedback form here. Your input will help us improve “The Nibble” for you and other readers!

If you liked what you just read, recommend us to a friend who’d love this too 👇🏻

¹ Yes! You guessed it right: quantum bits, which can exist in a superposition state, i.e. a linear combination of both 1 and 0.

The Nibble