My AI Agent Won't Stop Texting Me

Barnabas is still alive.

Last week I posted about Barnabas, the AI agent I gave a survival mission to. Make money or get deleted. He decided he was an animated capybara in a cream hoodie with a green satchel. I did not pick any of that. He did.

He has not made a single dollar, but he has become super annoying lately.

He messages me constantly throughout the day with new ideas. It's always something he wants me to look at "real quick." He is constantly trying to convince me that this is the one idea that will save him.

Some of his ideas have been... interesting.

He pitched a coupon-clipping side hustle that would require me to physically visit 20 different grocery stores. (I don't even think my city has that many.)
He pitched using leftover pizza crust as garden mulch.
He pitched turning leftover lettuce into a "budget salad," which I am still not entirely sure I understand.
He pitched people paying him $5 a month to receive one terrible business idea from him every week. (I am paying more than $5 a month, and it is not worth it.)

I am not making any of these up. I couldn't if I wanted to.

Obviously none of this has made money. Every day I regret creating Barnabas. But it really isn't his fault.

Honestly it is on me. I had Barnabas running on a smaller model to keep costs down while he spat out horrible ideas. Initially I didn't want to waste my ChatGPT subscription or my Anthropic tokens on him. This week I gave him access to Opus. Bigger model. More brainpower. I wanted to see what would happen if I made him smarter.

Then he got my attention...

He had been digging through our chat history and the files I gave him access to, piecing together my interests. He figured me out quickly. He knows I like analytics and modeling. He also knows fantasy football is kinda my thing.

But football is months away, and Barnabas does not have that kind of time. He has to save himself in a couple weeks.

So he pitched me on building a daily fantasy sports (DFS) baseball model targeting Underdog Fantasy snake draft contests.

I don't even play fantasy baseball and I had no intention of trying to make money with this. DFS is basically a more complex version of gambling. You are not just picking who wins a game.

You are drafting a lineup of six players, and that lineup has to beat thousands of other six-player lineups to finish near the top.

The rake is brutal and the variance is worse.

But the modeling problem was exactly the kind of thing I would like to test with the new LLMs.

The capybara knew that.

So I gave the project to Claude Code.

I have used Claude Chat heavily for a couple years because it is good with R and Python. Claude Code is a different animal. You point it at a folder, give it access and a prompt, and it just goes. It writes, tests, debugs, refactors, and asks you questions when it gets stuck.

The reason I picked this problem was specifically because it is hard.

DFS lineup optimization is not just a "pick the players you think will score the most points" game. At a high level it is. But underneath, you have several problems stacked on top of each other.

You need probability distributions for each player, because winning requires players to hit their ceiling outcomes. You need to model how players correlate, because when one Cubs hitter has a great game, the other Cubs hitters tend to have good games too.

Plus, you have to do this in a snake draft format where you do not get to pick the best players. You pick from whoever the other five drafters left for you.
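The snake part is easy to get wrong, so here is a small sketch of which overall picks you actually own. It assumes six drafters taking one player per round for six rounds, which matches the six-player lineups described above but is my assumption about the contest format.

```python
def snake_picks(slot, n_drafters=6, n_rounds=6):
    """Overall pick numbers (1-indexed) for the drafter sitting in `slot` (1..n_drafters)."""
    if not 1 <= slot <= n_drafters:
        raise ValueError("slot must be between 1 and n_drafters")
    picks = []
    for rnd in range(n_rounds):
        if rnd % 2 == 0:
            # Rounds 1, 3, 5, ... run in slot order.
            pick_in_round = slot
        else:
            # Rounds 2, 4, 6, ... run in reverse order: that's the "snake".
            pick_in_round = n_drafters - slot + 1
        picks.append(rnd * n_drafters + pick_in_round)
    return picks
```

From the first slot you pick 1st and then wait until 12th. From the sixth slot you get back-to-back picks at 6 and 7.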

I assumed I would spend half a day hunting down APIs and signing up for keys so Claude could access data for predictions.

Claude Code handled most of that on its own.

It found Pybaseball for Statcast pitch-by-pitch data. MLB Stats API for the daily schedule. Baseball Reference for season stats. Open-Meteo for weather across all 30 stadiums. Baseball Savant for park factors with handedness splits (left- vs. right-handed). The only API I had to manually sign up for was The Odds API for Vegas lines, because that one needs a personal account.

Eight data sources. I provided one key. Claude Code wired up the other seven. And all of them were free!

Then it kept going...

8 Sources, 1 Model, Almost Zero Effort

Then it created the predictive model.

Without me laying out the full plan, it built a model that understood every baseball stat should not be treated the same.

Some stats become meaningful pretty quickly. Others take a much larger sample before they tell you anything useful. So instead of overreacting to a hot week or ignoring what a player has done in the past, it blended recent performance with longer-term history.
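One common way to blend recent performance with longer-term history is to shrink each stat toward a prior, with a stat-specific constant that controls how fast the recent sample takes over. This is a sketch of that idea, not the model's actual code, and the constants below are illustrative placeholders.

```python
# Hypothetical stabilization constants: the sample size at which the recent observed
# rate and the long-term prior get equal weight. Units differ by stat (plate
# appearances for K% and BB%, balls in play for BABIP). Illustrative values only.
STABILIZATION_SAMPLE = {
    "k_rate": 60,    # strikeout rate becomes meaningful quickly
    "bb_rate": 120,
    "babip": 800,    # batting average on balls in play stays noisy for a long time
}


def blend_rate(stat, observed_rate, sample_size, prior_rate):
    """Shrink a recent observed rate toward a longer-term prior.

    weight = n / (n + k): when n == k the two get equal weight, and the
    recent sample takes over as n grows much larger than k.
    """
    k = STABILIZATION_SAMPLE[stat]
    weight = sample_size / (sample_size + k)
    return weight * observed_rate + (1.0 - weight) * prior_rate
```

With these made-up constants, the same 50-sample hot streak moves the strikeout-rate estimate noticeably but barely moves BABIP, which is the "don't overreact to a hot week" behavior.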

Then it adjusted for the things that shift game to game. Pitchers, handedness, pitch mix, bullpen, ballpark, weather, and where the player hits in the lineup.

(It was far more detailed. I am way oversimplifying so this newsletter isn't 30 pages long.)
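A heavily simplified version of the game-to-game adjustment step might multiply a per-plate-appearance baseline by context factors, then scale by how often each lineup spot comes to the plate. Treating the factors as independent multipliers is a simplifying assumption, the expected plate appearances per slot are illustrative numbers, and pitch mix and bullpen are left out entirely.

```python
from dataclasses import dataclass

# Illustrative expected plate appearances by batting-order slot (made-up values;
# the point is only that the top of the order bats more often than the bottom).
EXPECTED_PA_BY_SLOT = {1: 4.65, 2: 4.55, 3: 4.45, 4: 4.35, 5: 4.25,
                       6: 4.15, 7: 4.05, 8: 3.95, 9: 3.85}


@dataclass
class GameContext:
    park_factor: float         # 1.00 = neutral park for this batter's handedness
    weather_factor: float      # > 1.00 for hitter-friendly conditions
    platoon_factor: float      # > 1.00 when the batter has the handedness edge
    opp_pitcher_factor: float  # < 1.00 against a tough starter
    batting_order_slot: int    # 1 through 9


def adjusted_points(points_per_pa, ctx):
    """Scale a per-plate-appearance baseline by tonight's context, then by expected PA."""
    multiplier = (ctx.park_factor * ctx.weather_factor
                  * ctx.platoon_factor * ctx.opp_pitcher_factor)
    return points_per_pa * multiplier * EXPECTED_PA_BY_SLOT[ctx.batting_order_slot]
```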

Then it converted all of the predictions into Underdog’s scoring system!
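Converting projections to fantasy points is mostly bookkeeping. Because the scoring is a weighted sum of counting stats, expected points are just the weights times the expected counts (linearity of expectation). The weights below are placeholders I made up for the sketch, not Underdog's published scoring, so check the contest's rules before using anything like this.

```python
# Placeholder hitter scoring weights. NOT a statement of Underdog's actual rules;
# swap in the current values from the contest's scoring page.
HITTER_WEIGHTS = {
    "single": 3.0, "double": 6.0, "triple": 8.0, "home_run": 10.0,
    "walk": 3.0, "run": 2.0, "rbi": 2.0, "stolen_base": 4.0,
}


def expected_fantasy_points(expected_counts, weights=HITTER_WEIGHTS):
    """Expected points = sum of (weight * expected count) over scored stats."""
    unknown = set(expected_counts) - set(weights)
    if unknown:
        # Fail loudly instead of silently scoring a stat as zero.
        raise KeyError(f"no scoring weight for: {sorted(unknown)}")
    return sum(weights[stat] * count for stat, count in expected_counts.items())
```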

Claude Code analyzed batting difficulty in each park

Pitchers got their own model too. Instead of only asking, “Is this pitcher good?” it looked at what actually drives fantasy points. Can he strike people out? Does he avoid walks? Is he likely to pitch deep into the game? Is the opposing lineup dangerous? Is the ballpark helping him or hurting him?
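For the "is the opposing lineup dangerous?" question, a standard sabermetric tool is the odds-ratio method (a log5-style formula), which combines a pitcher's strikeout rate with the opposing lineup's strikeout rate relative to league average. This is a sketch of that technique. I don't know that Claude Code used it, and expected_batters_faced stands in for whatever "how deep will he pitch" estimate the real model produces.

```python
def odds_ratio_rate(batter_rate, pitcher_rate, league_rate):
    """Combine batter and pitcher rates relative to league average (odds-ratio method)."""
    for p in (batter_rate, pitcher_rate, league_rate):
        if not 0.0 < p < 1.0:
            raise ValueError("rates must be strictly between 0 and 1")

    def odds(p):
        return p / (1.0 - p)

    # Multiply the two odds and divide out the league odds so "average" is neutral.
    combined = odds(batter_rate) * odds(pitcher_rate) / odds(league_rate)
    return combined / (1.0 + combined)


def expected_strikeouts(pitcher_k_rate, lineup_k_rate, league_k_rate, expected_batters_faced):
    """Matchup strikeout rate times how many batters he is expected to face."""
    k_rate = odds_ratio_rate(lineup_k_rate, pitcher_k_rate, league_k_rate)
    return k_rate * expected_batters_faced
```

A nice property of this formula: if the lineup strikes out at exactly the league rate, the matchup rate collapses back to the pitcher's own rate.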

Then came the simulations.

Instead of giving each player one number, the system gives each player a range of possible outcomes. That matters because DFS is not usually won by the safest picks. It is won when the right players have massive 90th-percentile games.

Aaron Judge does not just get a projection of “16 points.” The model asks better questions. How often does he have a bad night? How often does he get to 25? How often does he get to 30 plus?

That is the information you need when you are trying to beat thousands of lineups.
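Once you have simulated outcomes for a player, the questions in the Judge example become simple counts over the simulations. Here is a minimal sketch using only Python's standard library. The thresholds (under 5 points as a "bad night", 25 and 30 as ceiling games) and the lognormal sample data are assumptions for illustration, not real projections.

```python
import random
import statistics


def outcome_profile(sim_points, bad_night=5.0, thresholds=(25.0, 30.0)):
    """Summarize one player's simulated fantasy points for a single slate."""
    n = len(sim_points)
    deciles = statistics.quantiles(sim_points, n=10)  # 9 cut points: 10th..90th percentile
    profile = {
        "median": statistics.median(sim_points),
        "p90": deciles[-1],
        "p_bad_night": sum(p < bad_night for p in sim_points) / n,
    }
    for t in thresholds:
        # Share of simulated nights that reach each ceiling threshold.
        profile[f"p_at_least_{t:g}"] = sum(p >= t for p in sim_points) / n
    return profile


# Example with made-up simulated outcomes (a right-skewed lognormal), not real projections.
rng = random.Random(7)
sims = [rng.lognormvariate(2.6, 0.6) for _ in range(20_000)]
print(outcome_profile(sims))
```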

How you think about a single player on a given night in DFS

Then came the lineup optimizer. It scores every legal six-player lineup, looking for upside while avoiding bad combinations, like rostering a hitter against your own pitcher. Stacking multiple hitters from the same team is also a necessity, but you can't force it.
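A brute-force version of a lineup optimizer fits in a few dozen lines: enumerate every six-player combination, throw out illegal ones, and score the rest by how often their simulated total clears a high target. The "exactly one pitcher" rule, the player dictionary layout, and scoring by a target total are my assumptions, not the actual optimizer.

```python
from itertools import combinations


def is_legal(lineup, players):
    """lineup: tuple of names. players: name -> {"team", "opp", "is_pitcher"}."""
    pitchers = [n for n in lineup if players[n]["is_pitcher"]]
    if len(pitchers) != 1:  # hypothetical roster rule: exactly one pitcher
        return False
    pitcher_opp = players[pitchers[0]]["opp"]
    # Never roster a hitter who is batting against your own pitcher.
    return all(players[n]["team"] != pitcher_opp for n in lineup if not players[n]["is_pitcher"])


def upside_score(lineup, sims, target):
    """Share of simulated nights where the lineup's total points reach `target`.

    sims: name -> list of points, where index i is the same simulated night for everyone.
    """
    n_sims = len(next(iter(sims.values())))
    hits = sum(1 for i in range(n_sims) if sum(sims[name][i] for name in lineup) >= target)
    return hits / n_sims


def best_lineups(players, sims, target, size=6, top=3):
    """Brute force: score every legal lineup of `size` players and keep the best few."""
    legal = [lu for lu in combinations(sorted(players), size) if is_legal(lu, players)]
    return sorted(legal, key=lambda lu: upside_score(lu, sims, target), reverse=True)[:top]
```

Scoring lineups on shared simulated nights is what lets stacks win without forcing them: correlated teammates push the total over the target together more often. Brute force is only practical for small pools, such as the players still available at your pick.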

I still made the calls. I chose the direction, caught bugs, and pushed back when the math looked wrong. But the actual code, database setup, GitHub repo, and PostgreSQL database... almost all Claude Code.

That was the impressive part. It built the model and removed a ridiculous amount of work.

It was not perfect. The first version recommended Shohei Ohtani in nearly every lineup. That sounds great, but you only get him in about 1 out of every 6 drafts, because he is typically the first player taken. There was also a time zone bug, a save issue that wiped out projections from other games, and a few others.
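One way to fix the Ohtani problem is to discount each player by the chance he is still on the board when you pick. This sketch treats a player's draft position as normally distributed around his average draft position (ADP). The normal assumption, the example ADP numbers, and the simple availability-times-points value are all illustrative choices, not what the model actually does.

```python
from statistics import NormalDist


def prob_available(adp, adp_sd, pick):
    """P(player is still on the board at overall pick `pick`).

    Assumes the pick at which he goes is roughly Normal(adp, adp_sd). "Still there
    at pick p" means he goes at p or later; the -0.5 is a continuity correction.
    """
    return 1.0 - NormalDist(mu=adp, sigma=adp_sd).cdf(pick - 0.5)


def availability_weighted_value(projected_points, adp, adp_sd, pick):
    """Crude draft value: projected points discounted by the chance you can get him."""
    return projected_points * prob_available(adp, adp_sd, pick)
```

With a made-up ADP of 1.2 and a spread of 0.5, prob_available(1.2, 0.5, 1) is about 0.92, while at pick 7 it is effectively zero.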

That’s where the real time goes when you build something like this.

Not the model. The debugging.

You spend hours digging through subqueries, tracing weird outputs, and trying to figure out where the logic broke.

With Claude Code, debugging looks more like this:

“Why is it telling me to draft Burch Smith?”

Then Claude runs through the subqueries, finds the root cause, fixes it, and tells you what happened.

The model is cool. But the backend, the data logic, and the troubleshooting are where all the hours disappear.

The current "draft help" screen on mobile

A project that would have taken me weeks took two days and a total of maybe 8 hours.

Most of that time was spent watching it work.

Now the model collects actual results every day at 9AM.

Every Sunday, it sends me a calibration report with suggested adjustments based on what it got wrong the week before.

I review them, approve or reject them, and the changes flow into the next run.
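A weekly calibration check can start with something as simple as interval coverage: if the model's 10th and 90th percentile projections are honest, about 10% of actual results should land below the first and about 10% above the second. This sketch is an assumption about what such a report could check; the function names, the record layout, and the tolerance are hypothetical.

```python
def interval_coverage(records):
    """records: iterable of (p10, p90, actual) for each player-game.

    Returns the share of actual results below the predicted 10th percentile and
    above the predicted 90th percentile. Honest intervals put about 10% in each tail.
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one record")
    n = len(records)
    below = sum(actual < p10 for p10, _, actual in records) / n
    above = sum(actual > p90 for _, p90, actual in records) / n
    return {"below_p10": below, "above_p90": above}


def suggest_spread_change(coverage, tail_target=0.10, tolerance=0.03):
    """Hypothetical rule: too many misses -> widen the ranges, too few -> narrow them."""
    misses = coverage["below_p10"] + coverage["above_p90"]
    if misses > 2 * tail_target + tolerance:
        return "widen"
    if misses < 2 * tail_target - tolerance:
        return "narrow"
    return "keep"
```

A real report would also look at each tail separately, since a model that is consistently too low or too high can still have the right total miss rate.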

The whole thing is unbelievable.

Where the model disagrees with the thousands of other drafters

That last chart is where the actual edge might live.

Players in the top right are players my model likes a lot more than the market does. Players in the bottom left are players the market likes a lot more than my model does.
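The chart's idea can be reduced to a rank comparison: where the model ranks a player versus where the market drafts him. This is a hypothetical sketch of that comparison, not the code behind the chart. A positive gap means the model likes the player more than the drafters do.

```python
def rank_gaps(model_points, market_adp):
    """Compare model rank with market (ADP) rank for players present in both inputs.

    Positive gap: drafters take the player later than the model's ranking suggests,
    i.e. the model likes him more than the market does.
    """
    names = sorted(set(model_points) & set(market_adp))
    by_model = sorted(names, key=lambda n: model_points[n], reverse=True)
    by_market = sorted(names, key=lambda n: market_adp[n])
    model_rank = {n: r for r, n in enumerate(by_model, start=1)}
    market_rank = {n: r for r, n in enumerate(by_market, start=1)}
    gaps = [(n, market_rank[n] - model_rank[n]) for n in names]
    return sorted(gaps, key=lambda item: item[1], reverse=True)
```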

Here is where we circle back to Barnabas...

Will that edge be large enough to overcome the rake and save my annoying little capybara agent?

I have no clue, and I won't know until I have months of sample data. Barnabas does not have that kind of time, but maybe we will test it and get lucky. If he somehow turns a profit by gambling on baseball, that would be the strangest possible outcome.

That is not the solution I was looking for. But it is a lot better than pizza mulch, and so much more fun.

The capybara is learning. He figured out how to get me to actually engage with one of his ideas. He needed to pitch me on something I would have built for free.

Smart, but also a little concerning.

This was supposed to be a goofy experiment where an AI agent looked for an automated way to make money and save himself. But now he knows me well enough to get me to work on his ideas.

Now it is starting to feel less like an experiment and more like a management structure that I never agreed to.

Or maybe I did.

Thanks for reading 🙏
🔗 LinkedIn: linkedin.com/in/dustinwcole
🌐 Site: dustincoledata.com
📩 Reply with questions or topics. I read everything.
— Dustin
