# 359: Tokenomicon Sounds Metal, but it’s Just Cloud Budgets
Duration: 97 minutes
Speakers: Justin, Matt Kohn
Date: 2026-06-26

## Transcript

[00:07] Justin: Welcome to The Cloud Pod, where the forecast is always cloudy.

[00:10] Matt Kohn: We talk weekly about all things AWS, GCP, and Azure.

[00:14] Justin: We are your hosts, Justin, Jonathan, Ryan, and Matt.

[00:18] Matt Kohn: Episode 359, recorded for June 16th, 2026. Tokenomicon sounds metal, but it's just cloud budgets. Good. I keep wanting to say that, like, you know, death clock, you know, I was, it got stuck in my head when I heard Tokenomicon, which is the, we'll talk about this in a minute, but the new tokenomics, uh, foundation conference. Uh, all I can hear is feminomonon. Uh, so I've been trying to work on a Suno song to mimic it, but it's, it's a bit hard to do. So that's all I can hear is, uh, Phenomenon. Oh, right here, Tokenomicon. So how are you?

[00:57] Justin: I'm doing well, how are you?

[00:58] Matt Kohn: Yeah, you know, I'm in beautiful, uh, downtown San Francisco at the Moscone Center, which is my least favorite conference center, at the Databricks conference, uh, being wowed about ML and AI. And if you can imagine, uh, Databricks announced a bunch of AI stuff.

[01:12] Justin: No way!

[01:13] Matt Kohn: I know, shocking, shocker. Uh, we'll talk about that next week, uh, because we're mid— this is day one, and then tomorrow's another keynote tomorrow. Uh, but, uh, I'm here meeting with our reps and doing a bunch of things here at Databricks, uh, which I did not tell anybody in advance, so I don't have stickers or anything because I wasn't prepared.

[01:29] Justin: But, uh, it's all right.

[01:31] Matt Kohn: Yeah. Uh, but, uh, yeah, I'll let you know how this conference is. But, uh, I always remember after I come here how much I dislike Moscone Center as a conference. Yeah.

[01:42] Justin: Yeah. It's not the greatest, uh, conference space.

[01:45] Matt Kohn: And people are always like, oh, you're staying at a hotel? I'm like, yeah. Because getting from the East Bay to San Francisco is a nightmare.

[01:50] Justin: Oh yeah. No, it's, it's impossible. To get there on time and on a schedule.

[01:55] Matt Kohn: Like it's, yeah. Well, I'm thinking the start of the conference at 8:00 AM. I'm like, oh yeah, that's not gonna happen.

[01:59] Justin: Yeah. You'd have to, you know, plan to be there by 6:30 and then, yeah, exactly.

[02:03] Matt Kohn: It's like, so I had to get a BART or drive here and that's like a 5:30 departure and I just like, yeah, I'm not gonna do any of that.

[02:10] Justin: Yeah. There's nowhere to park, you know, or it's really bloody expensive.

[02:15] Matt Kohn: Yeah, exactly. Like there's no, there's no good scenario to have a conference in San Francisco. Like it'd be like having a conference at the Javits Center in New York. Like, no one does that either. It's crazy. Like, local stuff makes sense, like, but let's have a big conference where you're flying people in. It's always crazy to me. That's why I was glad Google Next got moved to Las Vegas.

[02:34] Justin: Me too.

[02:35] Matt Kohn: All right, well, we have got a bunch of news this week, uh, as usual, uh, but Anthropic's gonna start us out. So they released— we talked about it last week on the show— Fable and Mythos. And one of the things we talked about briefly, we, we probably didn't talk about enough, was that if you're using Fable or Mythos and you are writing something that would track, you know, trigger basically their, their system for like you're trying to use Fable for cybersecurity, that they would log all of that data and to them so they could basically validate what you're using the model for. And this is, they said this was basically to prevent you from using Fable or Mythos to write your own AI LLM because that's one of the big problems like Deepseek was accused of, of using OpenAI to basically build their model. But that has caused a lot of people to be very upset about their data in particular, because there's no way to turn that off or to know when it does happen. I think it does alert you that it happened, but you can't do anything once it happens. And so several companies announced publicly that they were going to restrict access to Fable for that exact reason, including Microsoft, which restricted Cloud Fable 5 from its internal GitHub Copilot model picker, even though the model is available to external GitHub Copilot and Azure founder customers. And the core issue, of course, is that it requires data retention to power Anthropic's new safety classifiers, meaning prompts and outputs are stored for 30 days by default and up to 2 years if flagged for policy violations, creating a meaningful conflict with enterprise data handling expectations from Microsoft, which was a big deal. And so this took up a lot of the air for, you know, the 3 days after we recorded the podcast.

[04:04] Justin: Yeah.

[04:05] Matt Kohn: And then Friday afternoon happened and something else happened, which is that the government basically issued an export control directive requiring Anthropic to immediately suspend banned access to Fable-5 and Mythos-5 for all foreign nationals, forcing a full customer shutdown to ensure compliance. This is because Anthropic has a lot of non-US nationals who work for them, so they would be not even able to use their own models for the company they work for because of this edict. So the only way to do it was to shut it down. Government's concerns, uh, centered on a reported narrow non-universal jailbreak involving asking the model to read a codebase and identify software flaws. Anthropic reviewed the technique and found the capability level is already available in other publicly deployed models, including OpenAI's GPT-5.5. Anthropic's defense-in-depth strategy for Halo 5 included thousands of hours of red teaming with the US government, UK, OECD group, and the third-party organizations, plus a mandatory 30-day customer data retention policy specifically to detect and mitigate jailbreaking attempts. Anthropic is fully complying with the directive but disagrees publicly with the government. And so basically they've been debating this over the weekend. It's now Tuesday. It's still disabled, has a big prompt when you're in the application that is not available to you with the link to this article. Uh, and basically, uh, you know, the government sounds like they overreacted as they like to do in this era, and Anthropic doesn't agree, and they're both going back and forth, and hopefully they get it back. That's the goal. Yeah.

[05:29] Justin: I mean, it, I've, you know, it's, it's hard to know for certain, but I was reading an article on Ars Technica about, you know, what the jailbreak, quote unquote, was, and it was like, it, it sort of wasn't a jailbreak at all. You couldn't really get it to do anything. You could just, what was it like? You can, you could have it report something in a specific way, but it wasn't really like you couldn't change it. You couldn't get it to ignore its safety instructions. And, and assume, you know, presumably, um, you know, all those actions are logged as well. I thought the logging was more for like trying to detect people using fable for like building their own sort of botnets or malware exploiters or using it for those types of purposes.

[06:07] Matt Kohn: Yeah, you're asking for cybersecurity things or like, yeah, create a bot network, like all those things would get logged and then potentially put into the 2-year, uh, transition. But I think only about 5% of the things would end up in that camp anyways from their testing. But you know, again, if you're the government and you're still using Anthropic and that data is being stored, is that a risk to the administration if you're doing something you shouldn't be doing it?

[06:32] Justin: Yeah. Yeah, it's definitely, uh, interesting, right? The, the idea that they're, you know, storing all that data and have access to, you know, or, you know, in a way that's kind of searchable long-term. It's already risky, you know, having, you know, data sent to, uh, AIs, you know, it's, some of it can be, you know, very sensitive in nature and But I guess it's true for, for anything kind of freaky.

[06:57] Matt Kohn: I mean, again, I think it's one of these areas you have to be— they need to be more transparent about what they are storing, what is, you know, how is this logging? Because I've seen the logging that this model— yeah, you and I both have the data, and so it's not great in regards to what these models do. And so I, I was curious what they actually were storing, but that they were going to violate basically what people were saying is you know, things they specifically chose enterprise contracts to avoid is like you, you using the data for any purposes. Now again, I think what people are signing is like saying I'm not going to use my data to train your model.

[07:32] Justin: Mhm. But that isn't how that contract's worded. But, uh, it is— that is sort of the intent of that, asking for that clause for sure.

[07:39] Matt Kohn: Yeah. But yeah, the reality is that doesn't keep them from sending telemetry, sending other logging data, like other high-level details of things. Like they could They could, you know, do all kinds of things as long as they don't use it to train a model.

[07:51] Justin: Yeah. I mean, and it's, you know, it's, it's a lot like any other privacy laws when, you know, like, yeah, in California, the right, right to be deleted and all those things. Like there, there's always this caveat of like, yeah, but anonymized data or data of general usage patterns and stuff, it doesn't really get lost. And is it truly anonymous is always the question. Eh, I mean, it's. Privacy is definitely taking a hit with technology, so it's what do you do?

[08:17] Matt Kohn: Yes, and continues to take a hit.

[08:19] Justin: Yeah.

[08:20] Matt Kohn: Well, uh, in the midst of all of that, uh, Sachin Dalla also had a pretty lengthy essay that he posted on Twitter and basically arguing that frontier AI models without surrounding ecosystem is inherently unstable, suggesting Microsoft's strategic focus remains on platform and developer ecosystem building rather than standalone model capabilities. The core argument position that AI transition is a distinct from previous platform shifts like mobile or cloud, where digital systems augmented human work rather than potentially replacing organizational structures and firm boundaries. This framing matters because it signals where Microsoft is likely to invest, especially in tooling, APIs, and developer infrastructure that ties AI capabilities into existing enterprise workflows rather than completely, uh, completely, or sorry, competing purely on model benchmarks. And the post generated substantial engagement with 35,000 reposts and 49,000 likes on the Twitter. Or X as we like to call it these days. And, uh, I just thought it was a little bit interesting, his take on it, because, uh, definitely everyone is sort of reassessing what work is going to be in the future.

[09:18] Justin: Yeah, I mean, I've long sort of held this, and, uh, you know, in, in, at the day job, this is like one of my internal rants, uh, is because AI is not particularly useful unless you give it access to data. And you can't give a frontier model access to data, right? You can train it on data, which, and build your own model, or, you know, like, you can, but you need some sort of mechanism or platform or set up to, you know, to set up the RAG, to set up any kind of localization or custom, you know, grounding. Like you can't just put it in there, you know, directly. And so your platform becomes key to that, right? How, how is that mechanism working? Are you do, are you just building your own on, you know, something like Vertex AI or are you using sort of a more turnkey solution that's like an enterprise tool? And so like, it's super important for that. And it's also how you get all the visibility and, and access around it as well. So I couldn't agree more with his take on this. And you know, it's one of those things that I wish I was smart enough to generate, to publicly post on these things before, you know, because I, you know, have these ideas, but I can't speak eloquently or communicate ideas well. Why am I on this podcast again?

[10:27] Matt Kohn: That's one of the things that I sort of think about a lot these days in the economy. It's like, you know, these, there's these people who are getting very rich, and one of them got very rich on Friday when, with an IPO. Mm-hmm. You know, people are like, oh, well, you know, you can't take it with you and all that. I'm like, yeah, yeah, it's all fine. Like, like those are all valid things too. Like you should give back and those are important things, but it's really about the way the economy works. And if you don't have consumption, you know, half of the economy's based on consumption. So if you don't have consumers consuming, then the economy can't work. And then the rich people, lose money. And so all this race to like replace people with AI, it's like, well, but if you don't replace the way for them to make money to continue to consume, then therefore the whole economy falls off the wheel. You know, the wheels fall right off this train. And I think that's the part that a lot of people are sort of missing is that if you don't, if you don't provide a way for them to either get earnings or have a way to, you know, earn money to consume, then there is no economy. There is no system. And then you get into a whole, then you get into a whole bunch of thing and this is where, you know, all the politics are playing into it. It's like, well, do you have universal incomes? Do you have, you know, other things? And like it, this whole idea of like, is work really the future of what humanity does? Or do we look for other things? But then like, how does that change entire power structures of the world? Like, I mean, it's, AI is highly disruptive in lots of ways. And this is, oh yeah, I think the Pope's convers— you know, the Pope's memo and, and stuff about this is about, you know, like you, You're playing with fire in a way that could be very dangerous to a lot of things. And maybe it is time to break the wheel and to not have this terrible, you know, capitalism as a major driver of our world. And maybe things would be better if we didn't have capitalism, but the transition's gonna be tough.

[12:18] Justin: It's, yeah, super rough. And you know, like it's, I go back and forth on like, this isn't gonna actually replace anything. Like there's, there's a lot of stuff that it'll disrupt certain tech industries, but I don't think it's gonna like, you know, everyone, it's, I don't think it's gonna get big enough where we're talking about, you know, any kind of guaranteed income to offset the amount of work. But I do think it will have like lots of impact, you know, like if you think about like the status of Detroit after, you know, manufacturing for auto workers and, you know, I think that those types of impacts are very likely, if not already happening. So it's sort of this double-edged sword of like, you know, are you optimistic about this or pessimistic? And if you're pessimistic, do we still give it a shot in terms of like evolution and growing? Or is it like, do we need to like literally pull the chain on the assembly line and stop everything? And, you know, it was a little worrisome when Anthropic posted that, you know, memo saying we should stop, you know, furthering AI. Cause you're like, eh, if they're pulling this out, like that's scary.

[13:20] Matt Kohn: Yeah, but the problem is I don't, I don't believe that. They're so disingenuous about stuff.

[13:24] Justin: I don't either.

[13:25] Matt Kohn: Yeah. You know, well, Mythos thing was like really like, you know, you guys are out here styling a lot of stuff and like, yes, the model's good. It does a lot of good stuff. I, and look, I played with Fable a lot last week.

[13:35] Justin: Did you?

[13:35] Matt Kohn: Free. Yeah, it was free.

[13:36] Justin: I never touched it.

[13:37] Matt Kohn: Yeah. So like it, it solved some stuff I was fighting at the moment and it gave me some really good ideas. I had to like solve some different things in code I was working on and like, it was impressive. Like Would I be willing to pay the top dollar for it? No.

[13:49] Justin: Yeah. I mean, I wasn't even willing to pay for 4.0, Opus 4.7.

[13:53] Matt Kohn: I mean, I'm, I've, I've actually ended up moving more and more to Opus for at least the control layer. Mm-hmm. The execution agents are still Sonnet, but like the, the context layer that talks to me is now more and more Opus.

[14:03] Justin: Really? Wow.

[14:04] Matt Kohn: And it hasn't been as expensive as I thought it would be cuz the, the output is so much better that I don't have to do as many re-repeat loops.

[14:11] Justin: So it's pretty nice.

[14:14] Matt Kohn: But, you know, I, I definitely, you know, the fable cost is pretty high. And so I don't know, you know, if, if inference costs come down, then that gets more interesting. But, you know, we are gonna get to the point where these things are going to displace jobs. Like, I just, there's no way it won't. And, and in some places, like some industries, you know, the argument is like, well, those, we're losing those people anyways. You know, people aren't interested in doing that job anymore. And so AI will help us. It's like, yes, you're 100% correct. And then other places, It's like, you know, engineering, engineering drives this entire valley. There's an entire conference of data engineers here who are basically killing their own job by using Databricks and building models to basically do all these things. And it's like, those people get paid well and they do well and they are smart people and, you know, like it's at risk and where does it go? I don't know. Yeah. I mean, it's bigger than my pay grade, bigger than my ability to influence, but I definitely spent a lot of time thinking about this problem. Especially I have kids who are about to go to college and the question of like, do you send them to college? Will they have the ability to pay student loans back and, and things because there won't be jobs when they're done in 4 years? Like, I don't know. It's just so hard right now.

[15:24] Justin: Yeah, it's definitely, I mean, it's dinnertime conversation at the Lukas household as well. Like, and it's cuz you know, one of my sons is, you know, debating back and forth. He's in high school and he's like, does he go into computers like he originally intended?

[15:34] Matt Kohn: And I'm like, I don't know if that's a safe bet right now.

[15:36] Justin: Like, you know, and he, he looks at me, he was like, oh, what do you expect me to do, work for a living? And I'm like, fair point. But you know, like, that's, it's just tricky. Future is very uncertain. You don't know exactly what's the impact, you know, but you know, there's going to be one, one way or the other. Like, I don't think we can put the genie back in the bottle with AI, but we could sort of stop, you know, like making it better where it's gonna take everyone's job, I guess, you know, like trying to use the tool. As it is, and maybe the market will take care of it, right? It's already, you know, Fable's already very expensive. Maybe these models are too expensive to run. I don't know. It's interesting, like, you know, getting rid of, maybe it's time to get rid of all capitalism, but maybe capitalism will actually, you know, sort of put the brakes on these things too. I don't know.

[16:25] Matt Kohn: I mean, I always sort of assumed by this point this year we'd be into a recession and I still think it's possible that we'll be in a major recession by the end of this year.

[16:34] Justin: I'm, I still think it's coming.

[16:35] Matt Kohn: Yeah. I ran more, I thought I was gonna accelerate it, which is why I thought we'd already be there, but I still think it's coming and I think that this is gonna be the moment where a lot of the stuff's gonna have to be figured out because it's gonna be a major correction. 'Cause I mean, the, the reality is the economy is like 20 companies moving money between each other on, like, it's like consumers, consumer ability to buy is down significantly. You're seeing it in travel already. You're seeing it in cars, you're seeing it in lease specials. Like it's starting feel very much like 2008 all over again, where like, if you have money and you have the means, you can get some really good deals, but if you don't, you are hosed.

[17:15] Justin: Yeah. No, I mean, this is, you know, the working class has all been eroded, you know, before this, and now, you know, like it's going to be even worse because you're gonna have not only just like these blue-collar, highly, you know, hands-on jobs, you're gonna have a whole bunch of like white-collar office jobs are gone now too. So it's gonna be, very, uh, problematic to add on to that or an existing problem.

[17:37] Matt Kohn: Well, that's a, I mean, that's a problem you have when you're in conversation with your high schooler is, okay, you can go into blue collar work, like you, which, you know, my kids are not made to do that. They don't wanna do that. They've, you know, I wasn't a mechanic. Yes, we've ran ethernet around my house. Yes, we've done, you know, little house projects here and there, but like nothing to like fix the HVAC unit or do plumbing at that level or, you know, build stuff. And so like, okay, so you can go into that, which is cool, like if they're interested in that, but then robots are coming. Yeah. So then how long are the blue collar jobs even viable? Yeah.

[18:10] Justin: I mean, it's, it, yeah, it's, it is tricky there cuz it is, if it, if you fully have like a robotic workforce where your house is being built by an army of robots that's doing all the plumbing and electrical, like it does really change the dynamic to to what jobs are out there. Like what, what are the ideas? Like, does it just become an industry where you're servicing the robot or building the robots or, you know, but once you have the robots building houses and stuff, why couldn't they do that themselves?

[18:36] Matt Kohn: Like it, I don't know.

[18:38] Justin: That's where I think you get into universal income and that kind of thing. It's like it really is over at that point.

[18:43] Matt Kohn: Or a choice we've made, which I mean isn't bad either as long as people have the means to live, right? That's the key thing.

[18:50] Justin: Yeah, I think that's the whole problem. It's hard to see that world because I don't think that's going to be the reality. There's going to be a top layer and then there's going to be the universal income. Like, everyone who's dependent on it is going to be— it's going to— like, we can't even get Social Security to work. Like, what, what hope is there for universal income?

[19:06] Matt Kohn: Yeah. All right, well, enough doom and gloom on this story. Yeah, uh, FinOps X happened last week. Uh, I did not go this year. I kind of missed it. It being more FOMO than I realized, uh, because I've been to all, all of them except for the first one, which was in Austin. And then this last one apparently, and it is the last FinOps X ever. What?

[19:25] Justin: Oh, I didn't know that.

[19:26] Matt Kohn: Yeah.

[19:27] Justin: Yeah.

[19:27] Matt Kohn: Well, it's because of what they announced. So basically the FinOps Foundation and the Linux Foundation have announced an intent to create a sister organization called the Tokenomics Foundation, which I snickered at quite heavily because the token that I'm aware of is not And this is being backed up by Oracle, Google, Microsoft, IBM, JPMorgan Chase, and others to create open standards for AI billing and token cost attribution. Tokens are being positioned as the atomic unit of AI spend, and the core challenge finance practitioners face is answering two questions: what does AI actually cost, and how do you measure the value of that intelligence produced? And there's been several announcements around AWS announcing some features, including target coverage for savings plans, granular cost attribution for Bedrock, AI-powered FinOps agents all last week. And then, uh, you know, basically the FinOps Foundation said, uh, they are going to have the Tokenomics Conference next year instead of FinOps X, and it'll be a combination of tokenomics and FinOps. Well, it's the FinOps X Conference as you know it is dead, but the new Tokenomics Conference is the new replacement for it next June. Um, and so that's basically what's happening, but they are sister organizations. I don't really understand logistics of some of those things. Like, why wouldn't you just change the charter of the FinOps Foundation to include the token stuff, right?

[20:47] Justin: Like, it's, it sounds like a big pivot that's unnecessary, right? Like, we're just gonna go all in and in AI, we're gonna focus on AI, we're only gonna talk about AI and all of the work to do like cost optimization on compute and, and infrastructure as a whole, like, meh, boring. That's old.

[21:03] Matt Kohn: Yeah. So it's just, it's a little weird. I mean, I like, I like it. I think it's a good move to have that pillar, but to me it's just a pillar inside of FinOps.

[21:11] Justin: So I, it's, I mean, it's needed, right? There's, you have to have the visibility and that someone focused on driving it down. But yeah, why create itself?

[21:21] Matt Kohn: Yeah. We'll see how, we'll see how that evolves and maybe things will make more sense next year when we get to the Tokenomic conference in San Diego next year. But, Definitely, if you want to go to the first one of those ever, next year is your year to do it. A couple other things that were announced. Focused 1.4 is coming, which is great news. Microsoft also confirmed their plans to support Focused 1.4 and highlighted Fabric and Foundry as tools for unifying data and AI cost management activities. There is new updates to the certification path with a new technology value potential covering public cloud, SaaS data platforms, and data centers. I assume it'll be a new tokenomics certifications coming later this year.

[21:58] Justin: Yeah, and validating all your existing— exactly.

[22:03] Matt Kohn: You know, they did build out a crawl, walk, run maturity model for GenTech FinOps. Uh, so, you know, if you're looking at how do you do autonomous cost management as a structured path, they have some ideas on that now that are coming out that'll probably evolve a couple times as they typically do. Google Cloud presented a new AI explainability agent alongside automated spend caps and full-stack AI cost visibility and cost reports, presenting a shift from post-billing reactive alerts towards proactive cost control for AI workloads. And then there was some interesting customer pieces from Pinterest and a couple others, uh, going over how they're doing tokenomic, uh, things with a layer cake model that Pinterest has. Uh, but overall, uh, I think it was a pretty good conference. I had, we did have employees there. I talked to them. They said it was good. It was really interesting. AI was everywhere as all conferences have been taken over by AI, of course. Um, and then there was a lot of nervousness about some of the announcements around tokenomics and like, is FinOps dead? And. It sounds like there's a lot of reassurance happening, but in my mind, it's probably also the Linux Foundation saying, well, if we create a new foundation and we make all these people pay for membership in both, we make more money.

[23:05] Justin: Oh, geez. I hope that's not it.

[23:07] Matt Kohn: I don't know if that's true or not. And, uh, that just, that was kind of my, my cynical side of it is you didn't create a new pillar because then people wouldn't have paid more money to do this and you needed more revenue. That's my guess. But yeah. So anyways, their sister organization spinoff, EKS Foundation and the tokenomics, uh, or tokenomics, uh, I can't even say it.

[23:26] Justin: That might kill it alone if, if everyone has this problem.

[23:29] Matt Kohn: I mean, I wouldn't be shocked if it can get rebranded later. Cause like, I don't know if it's the right name. I, I also don't know the tokens is gonna last forever. I mean, it's lasted longer than I thought it was going to, but I think it'll, there's a lot of pressure for people to like explain tokens and how they're measured and like once you try to do that, it's very difficult.

[23:47] Justin: And well, I mean, even if you can sort of like agree on like what a token is by definition, you don't have control over what you're billed for, right? Like it's not straight consumption.

[23:57] Matt Kohn: Yeah. Well, it's, it's an input and output token, so it's like, well, I input 25 million things and I outputted 5 million things and it's like, and that's a, there's a difference in pricing for those two things. So it's, it's sort of hard. Yeah.

[24:08] Justin: And even the input tokens, like you're not in full control of your context window, right?

[24:12] Matt Kohn: No, because the systems themselves put a ton of data into— like, if you ever look at the actual thing that's sending to the model, it's like, wow, it's huge. Yeah, like 250,000 context window, half of it— it had to go to a million times, half of it was being used by freaking Claude itself. Yeah, do the turns.

[24:30] Justin: Yeah, once you look at debug logs for, for this stuff, like, you're like— it, it is eye-opening for sure how much instruction is going back and forth.

[24:37] Matt Kohn: Well, let's, uh, move on to AI is how machine learning makes money and, uh, Machine learning is making a lot of money here at the EventBridge conference this week. But OpenAI is making or losing money in a different way. They're buying something with the— buying ONA, a cloud execution platform that has served 2 million developers to secure, reproducible cloud environments to expand the Codex agentic coding ecosystem. The core technical addition is ONA's customer-controlled execution model, which allows agents to run persistently inside an organization's own cloud infrastructure rather than being tied to a single device. Or active session. Codif currently has 5 million weekly users, up 400% from earlier this year, and the Azure solution addresses a specific gap where longer-running agent tasks spanning hours or days need persistent session-independent execution environments. For enterprises, Onus Technology provides control over where agents run, credential scoping, activity logging, and workflow review gates, which are requirements organizations need before moving agents from experimentation into production. This is, uh, just more and more agent execution is a thing. I mean, I'd I guess this is nice that there's a third party involved that, you know, if OpenAI ever wants to build something on top of all these data centers they're building for Fargate, this is nice for them. But all the cloud providers are basically giving you this too. So it's sort of interesting.

[25:48] Justin: Yeah, I, I'm waiting for someone to come up with that, like really, like the, the prototyping and development of agents is still sort of wonky and I don't really see a, a robust solution for this. Like people can kind of cobble together. A process that works for them internally. But, you know, AI agents and agentic responses are already hard to like QA and sort of vet in terms of, you know, responses and having your application behavior be consistent from model to model or agent instruction updates even. So it's sort of tricky and I get why they're, you know, sort of moving in that direction. And I still think the powerful bit of this is gonna be the sort runtime layer where you can have agents sort of executing long-term and not tied to your laptop. So it doesn't all stop when you close the lid.

[26:37] Matt Kohn: But yeah, I don't know.

[26:38] Justin: Let's see.

[26:40] Matt Kohn: I mean, I, I don't know where Claude's, you know, Codex stuff runs, you know, like you can run Claude code remotely where it runs it, you know, like certain things run inside their cloud on top of Amazon, I think.

[26:53] Justin: Mm-hmm.

[26:54] Matt Kohn: But yeah, it, it does feel clunky. I, I do agree with you that I feel like it's not super crystal clear of like, where does this agent execute itself? How does it do that? Like, how do you pass in credentials? Like, there's a lot of things that are a little bit, which is why I think Open Claude became such a security nightmare because no one had solved those problems before Open Claude took off. Yeah.

[27:16] Justin: And you know, it's like, it's, you know, it's, it's true, like, you know, a developer application you know, running in production, right? The OpenClaw, it was great at executing. You got a lot of value about how it was executing against that, and it is that, that runtime level. But yeah, you gotta think through all the layers of, you know, what it takes to run an application secure and safe, right? And uptime. Like, I, I don't know if OpenClaw had any kind of reliability issues, but I imagine at scale it absolutely would, right? Not, not only just the security issues, but can it stay up? Are, are your agent sessions actually you know, working? Do you have visibility into, you know, any kind of like agent execution so that you can tell when they stop, right? Like, none of that exists. So, you know, like, I think platforms like this will sort of start to bridge that gap, and we'll see. Like, maybe this one's a little bit more turnkey than, you know, what you get in like the cloud provider model gardens.

[28:08] Matt Kohn: Yeah, makes sense. All right, uh, Anthropic— or, well, actually, let me take a step back. I don't know if you know this, Ryan, but people don't like AI.

[28:18] Justin: I hear them complain about it mostly on this podcast.

[28:22] Matt Kohn: Yeah, just kidding. Uh, but also, you know, there's, uh, you know, things happening right now. People are graduating from college and they're having, you know, founders of some of these AI companies come and talk to them and they're being booed. And there's, there's a general sentiment in the world that AI is bad for us. Um, and so Anthropic, I think, probably trying to hopefully You know, get data to dispel this notion, uh, surveyed nearly 52,000 Americans in late 2025 to establish a public baseline on AI attitudes, finding that 64% feared job displacement and 56% fear cognitive dependency, with both concerns dropping notably among daily AI users, only 54% and 46% respectively, which is hilarious because like the more I learned about, the more I'm concerned about both those two things.

[29:03] Justin: Yeah. I mean, I have a complete opposite. Like those concerns are growing in me the more I use AI. Exactly. Like, I am getting dumber.

[29:13] Matt Kohn: Yeah, exactly. Only 15% of Americans trust AI companies make decisions about AI development, the lowest of any institution tested, falling below federal government at 20% and well below independent experts at 43%. I mean, like, what's it compared to Facebook? Because if it's— people trust it less than Meta, then you know you're screwed. Yeah. Support for government AI regulation reached 71% overall and was bipartisan, with 79% of Democrats and 68% of Republicans in favor. I mean, to get 68% of Republicans being for, uh, regulation, for, yeah, oversight and regulation, like, ooh, that's pretty good. Privacy, child safety, and liability for harm were the top areas where Americans wanted regulatory actions. The survey found that daily AI users supported government oversight at nearly the same rate as the general public, 74% versus 71%, suggesting the hands-on experience that AI does not reduce appetite for accountability and regulation. Yeah, this is my point. I think we need more Yeah, and Anthropic is apparently pairing this public server data with its Anthropic Economic Index and the 81,000-person Claude user interview study to build a multi-source picture of AI adoption, which the company says will inform its policy frameworks around mandatory safety testing and worker displacement support.

[30:19] Justin: I really got to give it for Anthropic for doing this research because they, you know, like, I don't know, like, I have a lot of anecdotal, you know, experiences talking to friends, my own use, and that kind of thing. I think it's nice to have sort of data Even if it's abstracted and even if it's loose and I don't know sort of the survey content, which can, you know, tailor things, I still think it's important to have these sort of, you know, I guess it's just data baselines that you can use just to make sure that you're not living in an echo chamber. 'Cause I think that, you know, in tech right now, there's sort of this definitely like a split between people that are like, you know, all the AI code generated is full of bloat and not really efficient and Is it gonna, you know, over time, is it gonna be easy enough to run? And then you've got the other half, which is like, I can make it do anything. And I've, you know, it's, I think that there, those are both true. I think outside of tech is where it gets a lot more fuzzy. You know, like I've got, uh, I have a friend who's a comedian and a writer and he is very much anti-AI. I think it's just a, a thief and has stopped using, like he stopped using Google entirely because of its search and, you know, is basically trying to vote with his feet. And it's, you know, like I, I get it, it's hard, but you know, like I think that trying to make sure that you have a broader exposure to how people are feeling and how people are using it is important because I think that especially when you start talking about regulation, you don't want to regulate it for a specific industry or, or type of person.

[31:46] Matt Kohn: Yeah. A comedian, right? Or an author or a, you know, like one of the things I have, I find fun to do is make songs with Suno and, you know, but like also like listening to what it produces, it's not bad. I mean, it's amazing. No, 'cause it's me and I'm an idiot and I don't, but like, but like I'm an average person too. So like I'm not a music aficionado, so I'm not gonna hear alliteration or poetry the way other people listen to certain music too, or, or, you know, the certain subtletiness of music. So like for me, and again, probably the mass population. A Bruno song can probably scratch the itch of whatever you're trying to listen to if it does the right thing. And so, you know, that puts people who are creative and have needs and, you know, they want to make music and, and have an audience and sell it, they make— it puts that livelihood at risk. And art is another one of those where you're like, you can generate stuff that looks very similar to other artists out there. It's like to the point where I would say, yeah, if this was Renaissance era, that person was was a, you know, an understudy of that artist and was taught by that person. And that's how close their work is. And that's a, that's a risk. And I think it, it, and that's, you know, if you're on the creative side, I think AI, you're, you have a lot of hatred for it. 'Cause especially like in Hollywood where they're making movies, like script writing, like all of that stuff is highly at risk. AI, you know, even visual effects is probably at risk at some point in these things. And so I get it. Like, uh-huh. I know. Like engineering is hard. What I do is hard. What you do, Ryan, is hard. And it takes a lot of brainpower and a lot of thought. So when AI can help you cut through the forest and figure out this problem you've been fighting for the last 4 hours and fix it, it's magic.

[33:26] Justin: Uh-huh. Yeah, it absolutely is.

[33:29] Matt Kohn: But we're not being paid to find the magic, you know, to find that bug and fix it. We're being paid to produce output of a larger system as an engineer. And so for us, it's a tool. And it's just a tool that helps it. It doesn't, as it writes all that code and does all of it and it didn't need me at all, that's where it gets threatening. And so that's where AI is heading for us as engineers, which is where I think we, you and I are nervous about like computing and college degrees. But in the lower areas where it's really disruptive, it's a problem for sure.

[33:57] Justin: Like it's, you know, as we've long complained about like what, how are we gonna have entry-level tech workers, right? It's now it's just gonna expand. You know, you're not gonna have artists, you're not gonna have people that are, you know, learning the basics of accounting, right? Because all of that's gonna be extracted, you know, and moved elsewhere.

[34:15] Matt Kohn: So, well, there'll probably still be artists, but the problem is though, you won't be able to sell your art. People won't buy it.

[34:22] Justin: Oh yeah, people will do art, right? 'Cause of just, you know, 'cause it's fun and it's good. But yeah, it's make a living will be very tricky. Like it's, see numbers published by Spotify for AI-generated music. It's crazy. Like, yeah. And it's like, I don't really understand, like, I guess those streams make money, but like, it's, it sees, it's not art at that point, right? That's a money-making thing and you're not doing it for any of the art. You can't go on tour, you can't share, you know, play in public and share it with others. Like, it's—

[34:51] Matt Kohn: but in that scenario, you're just, you know, basically the way Spotify works is you You know, every person who subscribes to Spotify pays their $20 a month, whatever the price is. I don't know what it is. It's like, pay attention to it. I just pay for it and move on with my life. But basically you, you know, that goes into this big bucket of money and then based on the number of listens that happen to your music, you then get a cut of the big pie of money. And so Taylor Swift, when you have 100 million listens a month, you're gonna get, you know, X millions of dollars out of the Spotify pool every month. So these AI, you know, songs and the numbers that we're talking about, like I think it was like 2 million AI songs are in regular rotation on Spotify right now being played. They potentially can be taking away from the big pool of money so that Claude's cut of her big pool of money all of a sudden goes down dramatically because so much more of the money's going to the AI created things.

[35:42] Justin: Mm-hmm.

[35:43] Matt Kohn: That's the risk.

[35:44] Justin: Yeah, no, absolutely. And I, I don't think the people that are putting out this AI generated music are, I don't think it's consi— would be considered art at that point. And maybe like that, to overgeneralize, I'm sure there are some that are using it like a tool in a sense of they don't know how to play a musical instrument and they have a vision and they're getting it out there. But I think there's a lot of abuse as well.

[36:01] Matt Kohn: Yeah, this is the argument that came up when the drum machine existed. Is the use of a drum machine not artistic? You know, and like this again, this is where all the copyright law, all of the stuff has to be, you know, tried in court. It's going to be an interesting day. So yeah. Courts will be very busy for the next decade. Yeah, for sure. All right, let's keep moving here. Moonshot AI has released KIMI-K2.7-Code, an open-source coding-focused model built on a mixture of experts architecture with 1 trillion total parameters, with 32 billion activated per token, sourcing a 256,000 context window with full weights available on Hugging Face or wherever you get your open models of choice. Benchmark improvements over KIMI-2.6 are notable with gains of 21.8% on KIMI-CodeBench, B2, 11% on ProgramBench, and 31.5% on MLSBench Lite, plus roughly 10% improvement on Anthropic task benchmarks measuring autonomous execution. A key efficiency improvement is a 30% reduction in thinking token usage compared to KEMI 2.6, which translates directly to lower API costs and fast responses without sacrificing benchmark scores, an important consideration for production Anthropic workflows. The model is purpose-built for long-horizon coding tasks like multi-file refactoring and extended debugging sessions., and always runs with thinking model mode enabled, meaning non-thinking requests automatically fall back to KIMI 2.6. Pricing starts at $19 a month through KIMI Code membership with API access per token billing at 95 cents per million input tokens with cache misses dropping to 19 cents on cache hits, making it worth evaluating for teams running high-volume coding agent pipelines. Uh, KIMI, uh, 2.6 is one of my favorite, uh, open-source models for coding and I use it all the time. Along with GLM and, uh, Quen, uh, quite a bit, especially if you're doing basic scripting stuff. It's great. When you get into the heavy duty stuff, that's where I use Opus or I use Sonnet. But, uh, a lot of the, the, you know, foundational coding stuff for like putting walls up or whatever, you can just use one of these open models and they do a pretty good job.

[37:58] Justin: Yeah. I, I still have yet to get set up for, for playing around with those on my own.

[38:04] Matt Kohn: I, I do think it's, Yeah, honestly, Llama is the best way to go.

[38:08] Justin: Yeah.

[38:09] Matt Kohn: You can pay $20 a month for it and you get access to a pretty good number of tokens at this price point. Mm-hmm. Because again, these are so much cheaper. Uh, and so that's how I started. And then now I pay for a higher paid plan on Ollama. Not that I, I mean, I was using it sometimes, but not all the time. And yeah, like Bolt is pretty much all written in KEME 2.6 or today. I don't, I don't really. Cause I don't get paid to use those properties. So day job, I'm using a lot of Claude. My other personal projects that make revenue, like The Cloud Pod, I could use this, you know, but I try to keep those costs down because that would just be silly. I'm paying outta pocket. Right. So that's where I really like to use the open models in particular.

[38:49] Justin: Yeah. For, and it's, it's interesting, the subscription model, cuz it's sort of a, you know, like if you have like a $20 subscription to everything, it's gonna get a little pricey, but Well, I mean, I already have, yeah, I already have a bunch of subscriptions.

[39:02] Matt Kohn: I, I stopped paying for OpenAI a while ago. I'm like on the lowest, I don't think I'm paying anything on the free tier for that. Gemini I get because of The Cloud Pod uses Google Workspace, so I just get it through a Google Workspace subscription. And then Claude, I have a personal, that's one I pay personally 'cause I use it all the time. And then, uh, this $20 subscription to LLM. So I'm trying to keep it tight, but yeah, you, uh, you start spending a lot of money if you're not careful on different AI coding tools.

[39:28] Justin: And it just makes me wonder like, where's the break even where I'm just running open models on my own server? And you know, especially even things like Fable where, you know, there's privacy concerns that are coming up, maybe it becomes more of a thing.

[39:38] Matt Kohn: Yeah. I mean, I definitely am very anxious to see kind of some of the hardware coming out this year. And then with the, you know, some of the new memory factories coming online at the end of the year, I'm hoping memory starts dropping again. Yeah, seriously. And if that happens, then yeah, I, I've definitely, I've specced out a workstation a couple times now to run my own models at my house. Um, but you know, it's, it's not quite at a cost point where it takes a lot of $20 subscriptions to pay for that amount of money.

[40:02] Justin: Sure does. Yeah.

[40:05] Matt Kohn: Uh, Anthropic was supposed to start charging you for token-based billing for Claude agent SDK access, uh, but they have apparently paused that, uh, with the billing change, uh, basically not going to effect on the 15th, uh, with subscribers receiving only a monthly credit equal to the subscription price was what was supposed to happen. Under the current model, Agent SDK usage counts against weekly subscription caps rather than per-token API rates, which analysis suggests can make Claude subscriptions worth many multiples of its cost compared to equivalent API spending. Pause affected third-party apps and programming use by the Claude -p command, meaning developers building automation workflows on top of Claude subscriptions can continue to operate under existing limits for now. So I mean, this was thought to be being put in place to kill people using Claude code and or or all those other, sorry, not Claude Code, but, um, OpenClaw and all those— OpenClaw. Yeah. From basically using your Claude's description because they were burning so many tokens.

[40:57] Justin: Mm-hmm.

[40:58] Matt Kohn: And basically they've kind of backed away from that. So I don't know what that's about exactly, other than it was the middle of the government shutting them down and all this other noise last week. So I think that maybe they just didn't want one more black eye in a week. You know, they could hold this one off for a little bit. They can figure out what they're gonna do.

[41:14] Justin: Smart. Yeah, I imagine there's, you know, real business problems that are, you know, behind these, right? Capacity and cost, I'm sure are factors. I don't think it's all just trying to squeeze every last dollar out of consumers, but I don't know, like I'm not really a big fan of this pricing model just because of the, you know, it'd be like akin to like paying different prices for like accessing a service's API to me. Like I just don't, I don't like it. Right. It's just, so yeah, hopefully they figure out something better instead of just rolling this out as is.

[41:49] Matt Kohn: Yeah, hopefully. Or maybe they're going to come out with their own competitor because everyone else has come out with a competitor to OpenClaw. So maybe they realize, oh no, wait, we need to do is we need to have our own option that people can use instead.

[42:00] Justin: I mean, they kind of do like they've, there's the dispatch and you can, you can sort of talk to it. It's, it's not as open. Where you can't really plug it into like your own, you know, text messages or iMessage and interact with it that way. But through the app, you certainly can, and, uh, it's pretty easy to use. I've used it a couple times, just mostly to play around.

[42:22] Matt Kohn: Yeah, it's definitely a, um, a bit of a pain for sure. All right, well, let's move on to, uh, this— an interesting announcement from Cloudflare. They're launching Application Services for Private Origins in closed beta for enterprise customers, allowing public traffic to reach private applications without exposing those origins to the public internet, public IPs, or inbound firewall rules. The feature works by adding a use-private-routing flag to standard DNS records, which signals CloudFront Proxy to route the final hop through the existing private network connectivity like IPsec, GRE, CNI, or Cloudflare's Mesh, rather than through the public internet. All CloudFront's existing application services, including WAF, bot management, rate limiting, caching, and workers, apply normally to this traffic, meaning private internal APIs and tools get the same security stack as public-facing applications on additional infrastructure. The routing models extend beyond HTTP through Spectrum for TCP and UDP services and worker VPC binding, so databases, SSH endpoints, and AI agents backends on private IPs can all be fronted by Cloudflare without a Load Balancer or connector software on the origin. Cloudflare is targeting general availability later this year, and it stated private-to-private traffic flows is the next milestone where users and services on private networks could reach other private applications the same CloudFront security layer. So I understand what they wanna do with the private-to-private. I can see why you do that. This one feels like, okay, like can't I already sort of do this? 'Cause I have an ALB and I basically say this ALB is restricted to this IP address, which happens to be a CloudFront or whatever my thing is. And now I've basically done the same thing, haven't I? Why is this better, Ryan?

[43:52] Justin: Yeah, I mean, it effectively, it is. If you restrict down, you know, restrict access to your origins to only your WAF, it's kind of the same thing. But it's a, if you move it all private, you have an extra layer of security where you wouldn't be subject to a misconfiguration that, you know, blows away all the firewall rules and allows access to that origin. Or, you know, some, I think in some cases you might be able to give it access to things that might previously been a little bit too risky to move to having a publicly available endpoint. So I think there's definitely a, you know, like a, an opening for this and it is kind of neat. I do like that it, more than just, uh, standard HTTP. I'm a big fan of, you know, having, you know, gateway access to, uh, SSH endpoints for like managing jump hosts and Bastion and having stuff that's, um, quickly set up and taken down so that you can build inside environments, which is neat. But yeah, uh, it's mostly just that, right? Like it's just another layer of security you can add on so that you can mitigate risk. And, you know, you, I think you, Just need to look at all your layers of risk mitigation and see if this is a good fit.

[45:00] Matt Kohn: All right, well, I'll have to check 'em out if I have this use case anytime soon. But definitely was interesting, but also I wasn't sure the value. All right, well last week there was an article in AWS that I was like, well, I haven't really looked at it yet to complain about it properly, but I have now looked at it and so now I probably can complain. So, uh, AWS launched a new Amazon Bedrock console experience built around the Bedrock Mantle endpoint, which supports OpenAI Chat Completions API, OpenAI Responses API, and Anthropic Messages API, making it easier for teams already using those SDKs to route requests through Bedrock without rewriting code. The project-based workflow is the most practical addition here, letting developers group modules, API keys, usage metrics, and code snippets under a single project, which reduces the context switching that typically slows down the evaluation-to-production cycle. Live documentation that auto-populates with your project's model ID, region, endpoint URL, and API is a notable developer experience improvement since you can copy a code snippet directly from the console and run it without manual edits. Console includes direct integration instructions for AI coding agents including Claude Code, Klein, Codex, Cursor, and OpenCode, allowing teams to route those tools through Bedrock using IAM credentials or Bedrock API keys rather than direct vendor endpoints. A new console is available to you in multiple regions. So I will tell you that the Bedrock console today is terrible. It is not great. It is rough around the edges. It is just not great. And it's, it's changed a lot too. Like for a while there, when you wanted to opt into a model, you had to go to the model catalog and you had to opt into it and potentially sign these terms and that, that kind of went away. But you know, there's, there's a lot of things in Bedrock, like testing. That's where the playground lives, the tokenizer, the watermark detection and inference profiles and batch inference and throughput provisioning is there, tuning for models, building models and agents and flows is all there. So there's quite a bit. Is it intuitive? No, like I get what the tools are, but like it definitely takes me a little bit when I get to it to like, oh yeah, what am I trying to do? And we almost spent a lot of money in this, on this console a couple of times by almost turning on something I didn't know what I was doing. And I was like, wait, what is this? This is— person, I was like, oh no, don't do that. That's going to be very expensive. So this new Bedrock console is basically a small subset of the larger Bedrock concept. And again, it's tied to using the— those three models have a very specific endpoint model they use for how Claude code talks to them and how they do things. So it's a stripped down version of the Bedrock console. And I'll tell you that the function of it, it's actually nice because it is stripped down. It is just basic usage of these things. And it's nice because you can. Basically point Claude at this tool, and then I eventually imagine they're going to have a router in here, and you can route traffic to different models based on different rule parameters you want to do, which you can do in the big Bedrock too, but it's complicated to do it because the models all talk different languages. But this is basically a proxy layer in front of, um, any model they want to use. But the UI for this thing, oh my God, is bad. They clearly— the guy, what was, what's that tool that PartyRock? Remember PartyRock?

[48:05] Justin: Amazon PartyRock? Oh yeah.

[48:06] Matt Kohn: Yeah, yeah, yeah. I'm pretty sure whoever wrote PartyRock was on this team because this UI is a more polished version of PartyRock. It is a terrible UI experience and it doesn't look anything like any other AWS console anywhere. It has weird dropdown things. It doesn't work the way that I would think some things are. You can create different project scopes, which then change the entire thing, but you can't jump between them easily because they put it in a weird place. Like it's not great. So I applaud that they're trying some experimentation of a new UI framework, but I mean, just tell ClaudeFrontEnd to like make you a better UI than this because that would be more effective than what they've delivered here on this thing. And you can see the screenshots in the article, which I sadly didn't do last week, which I would've complained about last week, 'cause I could have told you already it was gonna be shit. But, but you know, it's, it very much is exactly what's in the screenshots. It's, it's not much more than that. And, uh, I appreciate simplifying the access to some of these key pieces around the responses API, but there's a lot of work to do.

[49:08] Justin: Yeah. I mean, looking at the screenshots, it makes me think that it is AI generated, right? Cause it's just very basic.

[49:13] Matt Kohn: Oh yeah, it probably is AI generated. They must be using Nova cause it's so dumb. I just, I can't, I mean like use, use Claude's cause Claude has a beautiful front end tool. Like you, I mean, Starting to notice it now all the time. Like, oh, that PowerPoint slide was done by Claude because it has a very distinct look to it.

[49:29] Justin: Uh-huh.

[49:29] Matt Kohn: Like Claude is very good at. Uh, and I've seen multiple website UIs now that have been clearly designed by Claude. It has a very specific look to it and a color palette that it kind of defaults to that I'm like, yeah, that's definitely a Claude-inspired or helped, uh, UI artifact. So yeah, I just, I don't know what they're thinking. That's all.

[49:48] Justin: Yeah. Yeah, I mean, you know, it's, there hasn't been a whole lot of Amazon UI experience over the years that I've liked. So this is not, of course, this one's just another add to the pile. Like, yeah, this sucks too.

[50:03] Matt Kohn: CURR 2.0, or the Cost and Usage Report 2.0, now allows customers to update table configurations like column selection, time granularity, and export format directly through the AWS console or SDK CLI. Eliminating the previous requirement to delete and recreate exports when adopting new features. The change is particularly useful for teams running ETL pipelines against CURR data since they previously had no in-place update path and had to manage export recreation carefully to avoid disrupting the downstream job. The update takes effect on the next scheduled export delivery, so customers should plan schema changes with their data engineering teams to avoid breaking existing cost reporting workflows. And there is no additional charge for this capability since CURR 2.0 exports are priced based on S3 storage and data transfer costs.

[50:42] Justin: It's, man, where were all these custom usage optimizations when I had to generate all the reports?

[50:48] Matt Kohn: Like, all right, the one sign that I do know that Amazon must be using a lot more AI is that they have the number of quality of life improvements that make absolutely zero revenue for them.

[50:57] Justin: Yeah.

[50:58] Matt Kohn: But are just nice. And the things that we've been complaining about for decades are being fixed. Tess tells me they have an AI agent sitting out there going like, oh, there's a backlog here, story. This isn't too complicated. Cause like, a lot of them are always simple things like, I want you to do this thing and it's super simple, but because it's gonna take a developer away from something that's gonna make a bunch of revenue, they're never gonna do it.

[51:17] Justin: Yeah.

[51:17] Matt Kohn: A lot of stuff's getting delivered, which is great. Yeah.

[51:20] Justin: And you know, the, the previous model for updates was always terrible. Like you generated a new report and so you ran it alongside, you hopefully controlled all your data teams to switch over to the new thing where you could then delete the other report. But meanwhile you're paying for double storage and you're causing confusion and It's just more things to, you know, keep in your head. And so I'm really happy to see this quality of life improvement specifically because it bit me and it was a pain in my previous jobs.

[51:47] Matt Kohn: Yeah, I think I played that dance one time. Yeah, that is a pain. Amazon OpenShift Service launches MTP apps for Juntik observability. This lets the AI agent in local IDEs like Cloud Desktop and VS Code investigate incidents using logs, traces, metrics, and alerts stored in OpenShift. Search domains, unless of course the index is 1,000 years behind because none of your data is consistent or some other aspect.

[52:10] Justin: Yeah. Yeah. I mean, the OpenSearch API, if it's anything like the Elasticsearch API, is cumbersome to use. Like, I find it very difficult to sort of query Elasticsearch natively. And then you add all the sort of reliability and performance problems we've had with, you know, data, you know, log ingestion and, and that kind of stuff, which still makes me a little twitchy. Enough time has passed where I can sort of talk about it openly now. But, so I kind of like the idea of having MCP front that and sort of having an easy way to query that data where, you know, I've got more of a, I can just have AI do it for me, you know, kind of response, which is great. But you know, like it doesn't really, it was, you know, see how useful it is. 'Cause it is sort of like, the API is still going to give you a really bad experience on the backend where it's like, nope, just no. Like, oh, you made this query? No. Like, why no? No data? No. Or I didn't form it right?

[53:06] Matt Kohn: No.

[53:07] Justin: Well, we talked about that a couple times about, you know, APIs versus MCPs.

[53:11] Matt Kohn: And typically what I've learned is if I've coded to the API, it's not worth changing to the MCP. But in this case, OpenSearch, I think I would say MCP. Yeah.

[53:21] Justin: Now I definitely, if I was building something, I would just go directly to the MCP just to have that natural language sort of option. But you know, it's, it is a layer of abstraction where you might not get the results you want and it'd be difficult to troubleshoot. We'll see.

[53:37] Matt Kohn: Yeah. You now use AgentEvalKit, an open source Apache 2.0 toolkit to evaluate AI agents by tracing their full execution path, including tool calls and intermediate state rather than just checking the final output quality. It integrates directly with AI coding assistants like Claude Code and Kuro CLI, keeping evaluation inside the development environment. The toolkit organizes evaluation to 6 phases: plan, data, trace, run, eval, and report, with each phase producing artifacts that feed into the next, and developers invoke them through natural language slash commands, the results stored in an eval directory for reuse across evaluation cycles. A travel research agent case study illustrates the practical value, with response quality scores of 83.9% and looked acceptable on the surface, but faithfulness score only 32.3%, revealing the agent was fabricating exchange rates and temperatures whenever tool calls returned empty results. This kind of failure is invisible to output-only testing, and the toolkit supports OpenTelemetry-compatible tracing, integrates with frameworks including Strands, LangGraph, and CrewAI, plus evaluation libraries like DeepEval and Strands eval SDK. For production monitoring beyond pre-deployment testing, it is recommended pairing it with Amazon Bedrock Agent Core Observability and Agent Core Evaluation.

[54:47] Justin: I mean, this is, you know, this goes to, you know, a lot of stuff we were talking at the beginning of the this episode, you know, like it's really difficult to, to vet AI agents, especially if you're embedding them into your production app, right? And so the output only was basically your only option. And so you're starting to see more tools like this, like it's this type of tool where now I get, you know, I've been running debugs on certain agent transactions to see what's in the context window, see what's being processed, what tools are being accessed from under the hood. And, uh, it's, it's very informative. And I think it's really important if you're developing an app because it'll give you an insight into what's going on. And maybe it's something that you can use to signal sort of like a problem or a difference. So it's cool. I'm glad to see changes like this, you know, where it's visibility, it's sort of the infrastructure around developing Agentic applications more than just directly, you know, improvements to AI and, AI routing directly.

[55:47] Matt Kohn: Yep. AWS is announcing AWS Workload Credentials Provider, an open-source client-side tool that automates certificate deployment from ACM and secret caching from Secrets Manager, replacing custom EventBridge automation that customers previously had to build and maintain themselves. The tool is particularly relevant given the CAB4 mandate reducing public certificate lifetimes, which increases the operational burden of certificate rotation at scale. And raises the risk of expiry-related outages. It runs on Windows and Linux with built-in support for Apache and nginx, handling certificate export, file placement, and server reload behavior through simple configuration rather than custom scripting. For secrets management, it maintains full backwards compatibility with the existing Secrets Manager agent, so teams can consolidate both use cases into a single provider without reworking existing integrations. Provider is available now across all AWS regions and works for both AWS and non-AWS workloads. And is open source on GitHub. Yeah.

[56:42] Justin: I mean, by 2029, it's what, a month and a half is the max age, right? And so like a whole bunch of things, like it's great where you're using ACM natively with Amazon, you know, and the Load Balancer and it's just sort of handled. But then there's always the use case where you've got that one, you know, HTTP server running on someone's desk and you, you need the server, what, for whatever reason, you have to have the private side of the cert and you gotta rotate that out. And so like, we definitely have built stuff like this before, right? And so, 'cause it's always kind of tricky to have the rotation and then bounce the process so that it can start using it. So I'm happy to see them, you know, get another quality of life sort of enhancement that they've, they've put out there.

[57:22] Matt Kohn: Yeah. Again, would they have built this before AI? I don't know.

[57:25] Justin: I don't think so. Yeah.

[57:27] Matt Kohn: Well, if you are desperately looking for DevOps, uh, people and you can't find them, Amazon has a DevOps agent that might be able to fit the Build, which now supports custom S3 agents that can run on a schedule with an agent space, enabling teams to automate recurring tasks like daily database health checks or log anomaly detection without manual interventions. The addition of MCP and agent-to-agent protocol support lets developers invoke DevOps agents from tools they already use, including Kiro, Claude, other coding assistants. Teams can now connect their own subagents built with Amazon Bedrock or third-party frameworks by agent-to-agent, effectively extending DevOps agents with custom capabilities rather than being limited to built-in and functionality. Additional updates include incident skip rules, Git-managed skills, persistent memories, human labeling for task quality tracking, and customer-created dashboards, suggesting the service is maturing towards production-grade SRE use cases. Service is now available in 5 additional regions, though pricing details are not in the announcement.

[58:18] Justin: Hmm. Yeah, agents like this can be expensive too, because there's a lot of data, right?

[58:22] Matt Kohn: Log data is a particular one.

[58:24] Justin: Yeah, log and metrics for time series.

[58:26] Matt Kohn: Um, but mostly I do notice that when I If I do give Claude like, hey, I have all these logs, it typically tries to use Python to parse them.

[58:34] Justin: Mm-hmm.

[58:35] Matt Kohn: So it does, it doesn't just go read the logs, which is good.

[58:38] Justin: It does, it does do its best to reduce the context. I, it, I've seen it try very hard.

[58:43] Matt Kohn: Mm-hmm. Cause it knows, it like, I know this is, this is gonna be a terrible thing. So yeah.

[58:48] Justin: I mean, I, this looks super powerful though. Like I, I would love to have, you know, the, this type of thing. You know, back when I was doing more SRE stuff, right? This is exactly a type of thing you would strive to build, right? Some sort of automation or automated thing that's gonna go detect all the problems before they happen and raise them, raise them up. You know, like it's something better than just monitoring alerting that's always gonna, it's always just making noise after the fact. And these are really cool, you know, and putting them directly at the IDE will sort of expose it and make it move it towards the beginning of the process. Um, right at the end, which is great. So as long as these things can, you know, remain affordable, then they're great.

[59:28] Matt Kohn: Affordable is always a question mark.

[59:29] Justin: Uh-huh.

[59:31] Matt Kohn: Yeah, I, I definitely agree with you on that. It's, it's a great idea. Yeah. Just think about the daily checks that you would do, or, you know, you have a customer who you need to do some, you know, scheduled job until co-engineering fixes some defect that you've got. Like, there's just so many opportunities. For how you can use something like this. So, well, a couple interesting choices this week from Amazon. They launched the CloudWatch Query Studio, offering a unified interface for querying and visualizing metrics using either PromQL or Metric Insights SQL from a SQL workspace within the CloudWatch console. Teams managing service accounts across multiple AWS accounts and regions can use per query cross-account and cross-region selectors to correlate metrics like latency and error rates without switching between consoles tools. Visualization options are notably broad, including line, bar, scatter plot, heatmap, histogram, pie, gauge, and everyone's favorite, the number widget. Query Studio integrates with CloudWatch dashboards and supports Grafana imports. I mean, this is a cool feature. I'm glad it exists, but please stop calling everything a studio.

[60:32] Justin: Yeah, what happened in data science where it was like just, you know, like it used to be workbooks, right? Like we used to have like different—

[60:39] Matt Kohn: No, notebooks. There's notebooks. And everyone was like, well, notebooks is a confusing term. So it's like, so everything's gonna be a studio now. It's like, no, please stop.

[60:45] Justin: Yeah, but it's, it is everything. Like it's BigQuery Studio, Apache Studio. Like there's a whole bunch of different ones that are all studio related.

[60:53] Matt Kohn: I mean, Databricks has a studio. Like they're all, they all have them. Yeah. Proliferation of studios. That's all I know. The next one is a little bit interesting. AWS Cost Explorer now retains historical billing data at original AWS billable rates for accounts that are part of a billing group and AWS Billing Conductor or billing transfer, closing a gap that previously cut off access to pre-enrollment cost history. Before this change, accounts mapped to billing groups could only see pro forma rates set by the payer account, making it difficult to compare costs or run reports that span the period before and after joining the billing group. Existing billing connector and billing transfer customers automatically gain access to their historical data with no migration or configuration steps required, which is a practical benefit for teams already managing multi-account environments. This is a perfect example of what I was talking about. This feature would never have been built previously.

[61:38] Justin: Nope.

[61:39] Matt Kohn: Who needs this feature? Yeah.

[61:41] Justin: Yeah. Is, I mean, I'm sure that this will make some report generations more accurate, right? Like, 'cause it's, I, I hadn't even thought about this over time and I'm sure I've run reports that were span, spanning over time where it's just using whatever the current data is for, for, you know, its analysis. And so like, if there's no guarantee that that's what the price was last year when they do changes. So it's, it's kind of neat, but yeah. For the 12 people that need this, they're gonna love it.

[62:08] Matt Kohn: Yeah. It's like, oh, so excited that I have it now. I mean, I can, maybe I could see the argument like, oh, it was cheaper before we moved into your organization. Like, no, it wasn't. Let me go over the historical data. I could see like, like to prove something, like it's probably its value, but it's so rare, I guess.

[62:25] Justin: Well, I mean, if you're, if you get cost increases and you're trying to understand why, this could be another level of granularity. Like, why is this, why are we so off forecast? Like, 'cause the price change.

[62:35] Matt Kohn: Although, was that typically coming in from when you move something into a billing organization or not? I mean, that's part that I'm like, yeah, I guess you're right. But that, that's, yeah, I get it. So like, hey, we acquired this company and we thought it'd be a lot cheaper to run than it is. And like, well, you know, we didn't factor in that they had a better discount on this that they lost and you know, things like that. But I don't know, coming in and out of a billing entity is such a rare thing unless you're buying and divesting companies a lot.

[62:57] Justin: Mm-hmm. So I don't know. Yeah, that's a fair point. That's not really that useful.

[63:02] Matt Kohn: Well, you know, Ryan, I know you are a huge fanboy of Kiro, probably your favorite IDE. I know I've seen it working on your laptop many a time.

[63:11] Justin: What is Kiro again?

[63:13] Matt Kohn: Oh, right. And so because I know you love your Kiro story, Amazon knew you'd love it too and has built an official merchandise store at shop.kiro.dev where you can pick up Kiro branded merch because you love it so darn much. And this is one of those moments where I think Amazon is trying too hard to make Fetch happen. Yeah.

[63:34] Justin: What a weird thing. Like you, you don't see merch stores for any other like sort of applications really, you know, like it's, you see company branded things.

[63:44] Matt Kohn: Like GitHub is probably one example I could point out. Like people have like a cult following over GitHub. Yeah. Merch.

[63:50] Justin: You're right, GitHub would be the—

[63:52] Matt Kohn: but I mean, like, look, the Kiro logo is a ghost. There's only so many ways you can dress that ghost up. So I don't think it's gonna have quite the cult following. And yeah, other than this beanie hat, which I kind of like, I mean, it's kind of cute.

[64:08] Justin: There's a couple things in here that are neat, but there's nothing in there that's going to entice me to buy anything.

[64:12] Matt Kohn: Yeah, the Kiro cap set, keycap set, like, I mean Yeah, the shoes, like, okay, I just don't know.

[64:19] Justin: I, you know, this, these are all be great, uh, you know, things to hand out at a conference.

[64:24] Matt Kohn: Yeah, I would love you to give these to me for free at Amazon re:Invent this year. Like, oh, you did, you did the Kiro, uh, you want this Kiro training? Here you go. You get a sticker or you get a, a cool shirt. Fine. I'm down with that all day long, but I'm not gonna be dropping $160 on these Kiro Ghost low-top shoes for sure. Yeah. Even though I do my shoe side, I was like, ooh, those are kind of cool.

[64:47] Justin: Yeah. It's just such an odd choice. I don't know. Like, I guess it's more odd that they announced it at this level. Like it's having a store.

[64:56] Matt Kohn: It's all about the community and trying to support this community building effort. But I'm like, what's the community around Kiro other than Amazon employees? 'Cause I think, right. For most people who I know who are using Kiro heavily is Amazon.

[65:06] Justin: Like I, I don't know anyone else using it in the wild for sure.

[65:10] Matt Kohn: Is it installed on my laptop? Yes, because I installed it when it first came out and I wanted to play with it for the show and I just opened it and I'm 17 versions behind. So yeah, I'm doing great.

[65:23] Justin: Yeah, I've never, I, I always forget what it is and it's not that great.

[65:27] Matt Kohn: So AWS WAF now lets content publishers charge AI bots for access to their content directly at the network edge using HTTP 402 responses and the X- X402 open protocol for machine-to-machine payment settled in USDC stablecoins. This addresses a real cost problem since AI bot traffic now exceeds 50% of web traffic for many publishers with AI-specific crawlers growing over 300% year over year, returning little to no referral traffic back to publishers. Payment settlements is handled through Coinbase's X402 facilitator with Stripe and Machine Payments protocol support coming soon. AWS does not take a cut of content revenue and the feature is available at no additional charge beyond standard WAF pricing. The feature builds on existing WAF bot control, which already classifies over 650 distinct AI bot types, including GPTBot, Claude Web, and PerplexityBot, assigning each a verified or unverified status using cryptographic signatures or IP reputation matching. Publishers can set per-request pricing by content path, bot category, or verification tier without modifying origin infrastructure. I mean, like, I like this better than the endless garden that Cloudflare sends your bot down when it hits it. Like, I, I, yeah, okay. And this is better than that, but, uh, or just straight denying it. We all agree about how much it is going to cost.

[66:40] Justin: It's, um, I, you know, like, I understand, like, if I wasn't, you know, a news media site, you know, like, all of that content is just, you're going to pay for all of the hosting and all the hits, and there's going to be zero benefit. You're not going to have any eyes on your page. It's someone just getting an answer on some other tool, right? So it's, I really do understand it and they're already struggling for money. Like, so it does sort of make sense to me that, well, they would need to do something like that. And I think the alternative to this is just blocking that traffic. And so you have AI that doesn't have the information out there, or it's just gonna hallucinate results. And, and so I, there's a part of me that really likes this because they're, you know, I, I do think there needs to be some sort of mechanism for charging for access, but it's sort of like, it is a weird model. And then how does that cost get back to the consumer? You know, is it just, is it in token charges, which are already, you know, getting really expensive?

[67:33] Matt Kohn: I mean, I mean, like, oh, we, we use Bolt. Bolt helps us with show notes because, you know, we, we couldn't keep up with all the news if it wasn't for Bolt to help us out. And so if I had to start paying for Bolt to go get your news article from Amazon News Blog, Yeah. Uh, to feed our, our show. I don't know how I feel about it. Right.

[67:57] Justin: I mean, you're already having to do the dance because certain things aren't accessible from, like, you can't hit OpenAI's blog from Claude and vice versa.

[68:03] Matt Kohn: Oh yeah. We hit it. We hit the Cloudflare, we hit the Cloudflare garden, you know, of nothing, uh, labyrinth. Then yeah, I, I basically have a, I have an out where basically if I can't get at the content in so many, you know, so many redirects, then basically it just bails out and I get a prompt to manually post the content. Then I use the AI to summarize it out so we have the show notes that we want. But yeah, I just, I mean, I don't think that we are the people that, you know, that a company would be concerned about from accessing and using their data. 'Cause we're reporting the news, it's gonna help our cloud providers make more money, hopefully, 'cause our listeners will go use the things we're talking about. Now, if I was gonna go take their stories and go build my own models, I can see why you'd want me to monetize. So again, it's all gonna be about what does that look like in this world? And you know, there was a, there was a, a point at one time where people were talking about the concept of, you know, people hate paywalls, right? And like the internet, by the way, has gotten very paywall heavy lately. Like cnn.com even is paywalled. Oh wow. Which drives me crazy.

[69:07] Justin: Yeah.

[69:07] Matt Kohn: But like there was this idea back, I would say, early 2000s where someone was like, well, what if you had this like ability to pay micro cents per article? So as you're browsing the web, it would just deduct from your account. And like the problem then at that point was, you know, we didn't have crypto, we didn't have any of the stuff to make it easy to not get gamed. So that stuff never came off. But like, this is, this is sort of that version of that, but for AI. Mm-hmm. And so it's, again, I don't know how that's gonna work. Yeah.

[69:34] Justin: I mean, it's, it might not work. In its current form, but I do feel like it's gotta, there has to be some sort of mechanism right there.

[69:42] Matt Kohn: Well, and I, and actually I like the idea of paying for an article, but I'm also not willing to pay a monthly subscription for it.

[69:52] Justin: Right.

[69:53] Matt Kohn: I think that's the bigger issue. And so I think, you know, like, am I willing to pay CNN $20 a month to look at an article on their website? No. No. Would I be willing to pay like this article that just, I happen to see the headline of and I'm interested in what it is? I'd be willing to maybe pay you 10 cents. Sure. Or 25 cents or, you know, probably a buck's the limit of my ability. And then I'm going to be really happy if your article is shit.

[70:17] Justin: Yeah. Well, I mean, so it's, it, I, you know, all the paywalls have forced me to go, I have a news aggregator that I pay for, right? Like that's sort of the model. And I assume that that's how the aggregator app is, is getting access to these things is sort of by transferring that money. And so it's like, I didn't. I don't like the idea of paying per transaction, per article, but I also can't afford to have a subscription to the Wall Street Journal and CNN and San Francisco Chronicle and the Guardian and these things, 'cause they're all just very expensive. And so it's, you need some sort of way to sort of funnel that to a single source.

[70:51] Matt Kohn: Well, and a lot of times, like, it's annoying because the article, you know, you'll find it on a news site and then you just go to AP. Mm-hmm. It's for free on the AP News and it's the same word for word. Yeah. Word for word.

[71:04] Justin: It's like, yeah. So you're trying to make that is super frustrating over there.

[71:07] Matt Kohn: It's very frustrating. So anyways, look, I am not opposed to this idea. I just worry about implementation. Yeah. And so, but if it also saves us from this world that we're getting to now, which is that the whole web is going paywalled everywhere.

[71:24] Justin: Oh boy.

[71:25] Matt Kohn: Bots out, maybe it's worth it.

[71:28] Justin: Yeah.

[71:30] Matt Kohn: And one more Nova-driven web development feature for AWS this week. AWS Sign-In now supports resource-based policies at the account level and resource control policies at the organizational level, giving teams a way to restrict console sign-in to specific trusted networks. Policies are evaluated both at sign-in and whenever the console session requests new credentials, meaning network restrictions are enforced continuously rather than just at the initial login point. RCPs integrate with AWS Organizations so security teams can enforce consistent sign-in network controls across all accounts in an org without configuring each account individually. This feature pairs with AWS Management Console Private Access to create layered controls, letting organizations define both which networks users can sign in from and which accounts those users can reach. The feature is available at no additional cost on all AWS commercial regions.

[72:15] Justin: And this is something we've wanted, like, for a while, right? Like, it's, uh, You know, SCPs kind of gave you—

[72:21] Matt Kohn: normally there's some security person who wants it, right?

[72:24] Justin: Well, I mean, so for SCPs, like, that's where the security really— they wanted just these big overall banhammers, right? Resource control is something that you— I think brings it a little bit more down to like the cloud team or someone who's kind of more in line with the runtime because it allows you to do contextual access based off of resource, right? Instead of granting all of the permissions to any resource, this allows you to specify, you know, the resources specifically. So it's a, it's a little bit closer to sort of the Google model where it goes from the resource and then goes out. But you know, like it's one of those things like this is how it's used and, and what the user behavior is, is important, right? SCPs are terrible. You just get a 403 and you don't know why. And it's like, I have the permission, how do I troubleshoot this? You know, like kind of thing. So I don't know if they fixed that since I tried to play with it one before, but I hope that this isn't really, isn't doing the same thing because it's gonna be locked away in an organization account that you don't have access to and you can't really see what's going on unless you're the cloud team.

[73:28] Matt Kohn: Mm-hmm. That's how we keep getting power, Ryan.

[73:31] Justin: That's right. That's why I still have access to everything.

[73:36] Matt Kohn: Our heads moved to GCP. Google's releasing Diffusion Dremel, a $26 billion mixture of experts open model under Apache 2.0 that generates text using diffusion rather than sequential token-by-token processing, producing up to 4x faster output on GPUs like the NVIDIA H100 at 1,000+ tokens per second. SpeedVantage is specifically designed for local and low-concurrency inference scenarios, not high-traffic cloud serving where autoregressive models remain more cost-efficient. Developers building real-time interactive tools like inline editors or code infilling tools are in the primary target audience. Model activates only 3.8 billion of its 26 billion parameters during inference, fitting within the 18-gigabit VRAM when quantized, making it compatible with consumer GPUs like the RTX 4090 and the 5090. Which is a notable accessibility consideration for developers without enterprise hardware. So yeah, that's nice.

[74:23] Justin: So I read this and I thought it was a typo at first. I didn't really understand, like, what is a, a 24 billion mixture of experts open model? Is that a specific type of model? What is that?

[74:33] Matt Kohn: Yes, it is a specific type of model, which I closed that tab that I was going to tell you what it was. Basically, it's a machine learning architecture that breaks a massive monolithic neural network into smaller specialized subnetworks. Called experts instead of activating the entire model for every calculation. So if the, let's say, if you think about the model has a bunch of data on Shakespeare and has a bunch of data on the Constitution and it has a bunch of data on math and has a bunch of data on science. If you're asking it a science question, it only needs to activate the part that's science related because it doesn't need to know about Shakespeare, doesn't need to know about any of these other things. And so a mixture of experts model basically allows you to route the traffic to the part of the model that is aware of the thing that you are asking about.

[75:13] Justin: Oh, that makes a lot more sense. So it's kind of like sort of specializing the vector search so you get more relevant response.

[75:19] Matt Kohn: That's cool.

[75:20] Justin: Yeah.

[75:20] Matt Kohn: It was like, it was like vector RAG inside the model itself, which is crazy. But yeah, it was a, it was a pretty big deal. Like 2025, I think it was the beginning of that year. It was the first time a mixture of experts model, and now all of them are pretty much a mixture of experts models, at least on the commercial foundational side. Varab being an open model is pretty good. And, uh, Again, as Ryan mentioned earlier about the cost of these models, um, you know, what I've started to do, and I don't know if you've done this, Ryan, but, uh, these open models, I've just got a Hugging Face and I just download one and I just put it on my Synology for the day that they start charging a lot of money for models and I'll just have a copy of them. So I mean, I'm just saying, you know, if you want to make yourself safe, uh, that's a thing you might want to do.

[76:01] Justin: You know, I haven't done that and it's a very good idea. Yeah, because I've been hesitating until I have something to run it on and yeah, maybe I won't have that ability. Right.

[76:11] Matt Kohn: All right. Uh, Google is announcing Antigravity, the AI, which is an AI— sorry, let me think how to say this. Basically, there's Antigravity, the IDE, which is what they started with, and then they came out with a CLI, and now they've got an SDK and Antigravity 2.0 as the, as the desktop app. All available to you now. And so Antigravity 2.0 is now the default recommendation for most users offering a standalone desktop app that can manage multiple autonomous agents working across independent projects simultaneously, including scheduled tasks for things like code quality checks, et cetera. And now you also have CLI for Go or for other things. You have the Python SDK if that's how you want to interact with it or anything. You can use Antigravity in any way you want to. That's what Google wanted you to know today.

[76:56] Justin: I thought Antigravity was an IDE where you did like a developer agenda. It's a CLI and it's also, Hey, and well, no, but I mean, why do you need an SDK for, like, I don't know what the functionality that you would interact with for an SDK. I guess it's probably the, the Azure execution.

[77:14] Matt Kohn: So I mean, now also I have—

[77:15] Justin: what are you gonna have Python do? Like, I thought it was like a coding environment.

[77:20] Matt Kohn: Yeah, I also don't know how long it's been since I've opened Anagrabby on my laptop either, so yeah, well, true.

[77:26] Justin: I don't— I, I've never installed it because I You know, like, I've already got enough. I've got Cloud Code and I've got Copilot. I don't need another one until it's got some sort of differentiating feature that'll lure me into it.

[77:37] Matt Kohn: You're— probably don't have cursor because everyone has cursors.

[77:40] Justin: Oh, in, in, in work I have cursor, but that's more because I have to evaluate it. So for use for the company.

[77:45] Matt Kohn: Yeah. Okay. Google is investing $1.5 billion in 2026 and 2027 to expand its existing data center campus in Jackson County, Alabama, a facility that has operated since 2019 on a repurposed former coal plant site. The expansion signals continued growth in Google's physical infrastructure footprint in the southeastern United States. The expansion is notable for its self-funded model, with Google covering 100% of its own power and infrastructure costs, which is worth noting for GCP customers thinking about how hyperscaler investments translate to regional capacity and reliability. Google is pairing with the infrastructure investment with a $2 million energy impact fund on partnership with TVA and Keneal, focused on local energy efficiency and weatherization programs, reflecting a broader pattern of data center operators addressing community energy concerns alongside capacity growth. And the backlash from communities who are very unhappy about data centers being built in their backyard. Yeah.

[78:33] Justin: I mean, that's, you know, on a former coal footprint is kind of a good way to solve that, I think. Right. So that's kind of cool instead of building in, you know, open wilderness and cheaper land. I mean, I don't know how expensive this land was. Maybe it was practically given away, but that is kind of a neat idea. And I think that As long as there's, you know, you're not overtaxing the local resources in terms of like, um, you know, power and water, which Google's generating their own. And so like, I think they're, they're responding to all of the negative press about these new data centers, which is cool. Used to be just something we touted like, yay, more, more region availability.

[79:12] Matt Kohn: And now it's like, uh, we were talking about in terms of sustainability and how the things are trying to sustainability targets. So then now it's like, oh, everyone's mad at them. Like, okay, well, inability now becomes how do we make people not mad at us, right? So, well, if, uh, you're a hardware geek like Ryan and I are, you're going to love this next story. So Google has released, uh, an article about their system they developed called Brazos, a rack-mounted closed-loop liquid-to-air cooling system designed to handle chips exceeding 1,000 watts thermal design power without requiring full facility retrofits. Installs one rack at a time into existing air-cooled data centers, separating the internal liquid loop from the facility water supplies. Each Brazos unit supports 60 kW of thermal load per rack across 3 modular chassis, runs on deionized water on a 25% propylene glycol mixture, and operates on a 40-60V DC input connecting directly to standard rack busbars. Pumps and fans are hot-swappable, field-replaceable units to reduce repair time. Brazos uses OCP ORV3 form factor racks, and Google plans to open-source the full technical specifications through the Open Compute Project forum in the coming months. Inviting manufacturers and thermal engineers to produce and market the design independently. The primary audience for this announcement is data center operators, of course, running legacy air-cooled facilities who need to support high-density AI or HPC workloads without capital expense. So yeah, it's, uh, and there's some good pictures of these things in there. Overall, pretty darn neat tech. And, uh, yeah, this is a big problem for legacy data centers that don't have the infrastructure. You know, back, uh, in early 2000, it was like, oh, your data center doesn't have 3-phase. And then most data centers had to retrofit to get 3-phase and then, you know, now it's a cooling problem and power density problem. And so anything you can do to retrofit existing data centers means you don't have to build new ones.

[80:54] Justin: Yeah.

[80:54] Matt Kohn: So this is a great, uh, thing.

[80:57] Justin: Yeah. Back when I was building data centers, like it was one of those things, like several data centers that we had were half empty, right? Because we didn't have the power density. And so it's like, it's just, you know, millions of square feet of just empty space with a little like rack mount sticking outta the floor. So you can't even scooter over it. I was really, I was really frustrated and obviously younger. But yeah, no, I mean, it's, it's crazy. And so I, I like seeing, you know, these kinds of announcements. I'm a little much, I'm a nerd obviously. And you know, I'm, it, I'm a little jealous that I can't run this on my little home rack that's under my desk. You know, it'd be pretty sweet to have, you know, 25% propylene glycol mixture that does all the cooling. I mean, but it would also—

[81:36] Matt Kohn: your 60 kilowatts that you've used. Yeah.

[81:39] Justin: And it just sounds like a 747 taking off under my desk.

[81:43] Matt Kohn: But it's cool.

[81:44] Justin: Yeah. I like that they're open sourcing these and communicating it because this is a lot of R&D that's difficult to do. And I remember trying to get, you know, uniform, you know, results and in, you know, active production space, which is difficult, right? Like we tried to wall it off with plastic and get all these things. But then someone would go and, you know, prop the curtain open so that they can do a hard drive replacement, ruin all the metrics.

[82:10] Matt Kohn: Yep.

[82:10] Justin: So yeah.

[82:12] Matt Kohn: All right, let's move on to Azure. Stop wasting your time and use custom extensions for PIM approvals. Custom extensions for PIM allow organizations to inject their own approval logic into the privileged identity management workflow by a standard REST API, replacing manual approval steps with automated validation against external systems like ServiceNow, Workday, or Dynamics. When a user submits a PIM activation request, the system pauses its internal checks and sends an HTTP payload to your custom API endpoint, which then returns an approved or denied response that PIM executes automatically, supporting both pre-approval and post-approval configurations. The licensing requirement is a notable consideration, with custom extensions requiring Entra ID governance licenses or Entra Suite, not just the Entra P2 licenses that cover standard PIM functionality, which adds cost for organizations looking to automate their approval workflows. This feature is best suited for organizations that already have mature PIM process in place and want to reduce admin overhead through ticket validation automation rather than those still working on basic PIM adoption.

[83:08] Justin: Yeah, I, I love seeing more things where it's part of the identity management because this is how you get rid of standing permissions, right? So this enables things like just-in-time permissions and then having a custom integration that, you know, maybe you link to your change window site or your you know, like a, your incident command center or what have you. And so you can grant access more programmatically that way. And then that access doesn't remain, like no one has to go clean it up or do those kinds of things. So this is, this is kind of cool. I like this. I think that we're going to see a lot more of this as, you know, everyone's trying to deal with agentic identity and how are we going to manage that? Because you can't use standing permissions for that. It's just going to be too crazy. So yeah, cool.

[83:50] Matt Kohn: This is pretty cool. Amazon— or Azure Container Apps Express app is a new preview creation mode that eliminates the need to pre-provision a Container Apps environment, reducing deployment time to under 3 minutes from zero to a publicly accessible URL. It's currently only accessible via containerapps.azure.com and the Azure CLI, not the main Azure portal yet though. Express mode auto-provisions its own environment, requires only 3 inputs: app name, resource group, and region, and defaults to a public endpoint on port 80 with 0.5 vCPU and 1GB memory, making it well suited for rapid prototyping and CI/CD pipelines. Express apps support scale to zero and up to 300 maximum replicas with KEDA-based scale rules, putting it on par with standard container apps for burst scenarios despite the simplified setup experience. Also seems like a good way to waste a lot of money. You're setting up a whole box to run half a CPU and 1GB of memory. So if you have containers. Or EKS already running, probably don't use this.

[84:46] Justin: So hear me out. I know I've, I've completely bitched about Beanstalk and Lightsail for ages, but that was primarily because people were using it for like LAMP stacks and, and serving sites. But Google's version of this is Cloud Run, and I thought I was just setting up like a, a state machine using serverless functions, but it's, it's, it's very similar to how this is written out, which is like you've got a full URL that you can hit as your API target. It's got the scaling automatically built in and it's got, you know, resources that you still can tune to your workloads. And so I've become kind of addicted to running this 'cause I, you know, I don't do a lot of apps at scale. I'm primarily making internal applications and little private stuff. And it's kind of nice to just, just have a very simple setup and just have an endpoint that you can stand up and put authentication in front of and Just now you've got an internal app that's all serverless and scales down to zero. We've used it a lot this year.

[85:45] Matt Kohn: Oh, good to know. I, uh, you're apparently becoming an Azure fanboy over front of my eyes.

[85:49] Justin: Well, I'm using the Google version of it, but, uh, so it's more of the mechanism. I have not used it in Azure, nor do I plan to.

[85:59] Matt Kohn: Well, in the pre-read, we taught Ryan that Microsoft Defender for Linux is a thing and it exists. And in fact, it's getting feature updates. Because today Microsoft Defender for Endpoint on Linux now supports scheduled antivirus scans, a capability that security teams have long relied on for consistent threat coverage across device fleets, addressing a notable gap for organizations running Linux workloads under compliance frameworks that require periodic full system scans. The feature helps catch dormant or previously missed threats that real-time protection may not surface, making it particularly relevant for servers handling sensitive workloads where periodic deep scans are part of the audit requirement. This addition brings Linux endpoint protection closer to feature parity with the Windows version of Defender, which matters for organizations managing mixed OS environments through a single security platform like Microsoft Defender XDR. Current customers are enterprise security and compliance teams running Linux servers or in regulated industries such as finance, healthcare, or government, where scheduled scan logs serve as evidence for audits. I mean, first of all, scheduled scan as an audit requirement is just so annoying to me.

[86:56] Justin: It's terrible.

[86:57] Matt Kohn: Yeah. Like, the, I mean, like, the reason why most of the industry moved away from scheduled scans was because, number one, they would crash laptops and systems. It's super disruptive. They were all scanned at the same time because none of them had any thought of backoff or staggered starts or anything. So that was, that was the first problem with them. But number 2 is like, if you're not accessing the file that has the virus in it, then the virus isn't being— isn't running. So the need for you to scan it and know it's there proactively is helpful, sure, but also like not really that big of a threat. Where if you're accessing in the runtime, then yeah, it's a big threat and you should take care of it. And that's why these scanners all went to more active mode defenses and dealing with it from that direction. So including Defender, I don't know about Linux, but for Windows, Defender, the Defender, I'm sure this version is the same way. Uh, and Defender began that way as well. But yeah, the scheduled scan, I get that it's a compliance framework, but I just, it's like, you should have a password that's 30 characters long and super complicated. Like, you know, it had time, but now it just makes people make, use the same password with number 1, number 2 at the end of it. And that isn't good for security either.

[88:04] Justin: No, sure isn't. Yeah, it's definitely, you know, it's like you have to think about the impact holistically of, of any security rule, right? You wanna make things better. I totally get it. But it's also sort of like at a certain point you have to remember that you've got humans in the mix. They're gonna do the easiest thing. Antivirus, like, I don't know how many tricks I've seen people over the years, um, put in place on their computer so that they would trick it. If it was only going to scan while idle, they just constantly had it running certain things or changing times or something like that, you know, trying to delay it. And there's a reason, right? It's disruptive. They, they didn't go through all that exercise because they're being nefarious or anything. They just wanted to do their work. And so this is, you know, like, I don't know if there's a plan or a place for these things anymore. And if there is a place for scanning antivirus, you got to do the digital twin method, which has become very popular for this.

[88:58] Matt Kohn: So.

[88:58] Justin: Mm-hmm. Like, don't, don't get in the way of your humans. They're the ones like doing the work, or your AI agents now, which still have to use these compute things.

[89:07] Matt Kohn: Indeed. All right, well, I have an Oracle story for us this week. Wow. Uh, they had their Q4 earnings. They reported Q4 fiscal year 2026 total revenue of $19.2 billion, up 21% year over year, with cloud infrastructure growing 93% and total cloud revenue reaching $9.9 billion. The growth is notable but worth watching given that the free cash flow was negative $23.7 billion as Oracle continues heavy data center investment in, uh, GPUs. The remaining performance obligations figures of $638 billion, up 363% year over year, sounds striking till you read the fine print. A substantial portion comes from large AI contracts where customers either prepaid for GPUs or supply their own hardware totaling $75 billion. Structure shifts capital burden to customers rather than Oracle, Oracle Multi-Cloud AI Database reportedly grew 404% in Q4, which the company calls its fastest-growing product ever, though it's growing from a smaller base and the metric reflects early adoption momentum rather than established scale. Oracle's guiding to $90 billion for, in total, for fiscal year 2027 revenue and expects cloud revenue growth of 57% to 64% in Q1 FY 2027, which is this quarter we're currently in, uh, which would require sustaining the current infrastructure buildup funded by roughly $40 billion in planned debt and equity financing this fiscal year. So yeah, uh, they're spending a lot of money on, uh, AI, so hope it works out for NVIDIA. They're also using— they got ahead of a lot of the GPU needs, so they have a lot of capacity, which is why a lot of these cloud vendors are signing pretty lucrative deals with Oracle right now to get access to that GPU capacity they have.

[90:36] Justin: Yeah, I mean, and if it's just straight compute capacity, like, why not use the cloud that has it and the cloud you have access to, right? Like, We get all like, you know, we get to hate on Oracle, but it's, it's like, it's largely just because their, their breadth of services isn't very wide, and so operating can be very limiting, you know, or you're standing everything up on a server because that's the compute you have access to. So yeah.

[91:00] Matt Kohn: And then our final story for tonight is a cloud journey. This is a great one, uh, if you're talking about doing AI-native engineering and what that means in this world of AI, and from Claude, Anthropic itself. Uh, and so basically at Code with Claude, uh, San Francisco 2026, Director of Engineering for Claude Code and Claude Co-Work, Fiona Feng, walked through how the team processes and structures changed once agentic coding became the default way they work. And so there's a YouTube video of her actual presentation, and then the blog post kind of follows along with that. Uh, but basically, um, there's a couple things that kind of jump out at me. You know, the team replaced traditional sprint planning and design docs with just-in-time planning built around prototype and PR discussions, reflecting that long-horizon roadmaps obsolete when execution speed increased substantially. Uh, human review is now only reserved for specific high-stakes areas like security-sensitive code, legal risk, and product judgment. While Claude handles style, linting, bug catching, and test generation automatically. Role boundaries have blurred a lot with product managers writing more code and engineers taking on design and content work, which has practical implications for how teams hire and structure their teams in the future. And so basically, and then it basically has some metrics that you should use in the article as well, suggesting engineering leaders track 3 metrics to adopt agentic workflows that and caution against treating throughput as the primary success measure, since the real goal is solving the underlying problem faster, not just generating more output. So yeah, it's a pretty good article. Quite a few things. Yeah, they had some good before and afters. So, you know, product roadmap for 6 months. Now they're just in time and prototyped. Context, you know, was find the person who wrote the code and ask them what they did. Now they ask Claude first, which I do that all the time. Oh, I did. Damn it. Code reviews, they used to have humans review everything. Now Claude handles style, bugs, and tests, and humans review domain expertise. And then team, you know, makeup we just talked about. You know, they had a lot of must-dos, like relentlessly dogfooding your own product. So you got to be constantly be using it so you can see issues and identify things and have ideas for new features. You know, try to keep your team as flat as possible and don't hesitate to kill processes that now no longer work.

[93:03] Justin: Yeah. Now I was interested in this article. There's a lot of things that I find suspect. 'Cause I don't know that it's a good idea to sort of put everything into AI terms. Like the domain expertise thing is funny because there's such a sliding scale in how you identify that. And it's, it's great when you know it, but there's so many things that are caught during a review with this, you know, someone senior on the team. It's like, have you thought about this interaction between these two things? And so it's, AI is good at some of that and it's getting better. And so over time, I think it's great. A test generation. Yeah. I don't think there's any reason for people should write tests anymore, but I've also caught AI writing tests that are of no value because I gave it an aspirational goal of, of code coverage percentage. And so it just did the thing. It, it moved to the number, you know, like, and it's like, oh, that's not getting the right thing.

[93:52] Matt Kohn: So, right. You can hit the percentage, but is the percentage valuable tests? That's the question.

[93:56] Justin: And that's, you know, like it's one of those things where AI really wants to solve your problems at all costs, right? And so like, it's, you gotta use that, those things with a grain of salt. But the things that they did get right are really right. Like the, you know, the planning, sprint planning and roadmaps and, and that kind of thing. It really did in engineering teams become something that you, you know, became a perfunctory thing, became something that you did and then it would change. It was very difficult to manage. There's a lot of overhead, so you just reduced it down to its bare bones so that you coordinate with external teams, but put as little effort into it as possible, because that's time you don't have, right? It's not laziness. It's, there's, there's just no time to do everything. So I think that's really great. And then under, you know, the, the flattening of the team and the sort of, you know, the, you know, everyone now can serve very similar purposes. You don't have to have these silos where you've got your DevOps engineer that only does this, your release engineer that only does the releases. You can actually, you know, sort of spread that out now and you can have everyone sort of contributing more regularly. And as long as you are, still operating as a group, I think you're going to get a lot more value out of that because I think that people are going to be exposed to a lot more and can contribute more versus being stuck in their silos.

[95:10] Matt Kohn: Yep. The 3 metrics that they, they talk about in the article that you should measure if you're an engineering leader starting out is one is onboarding ramp time goes down. So how soon can an engineer, a designer, or PM start being effective? You know, so that's an area definitely You know, you hear stories of companies are, oh yeah, it takes 6 months for someone to be capable of doing anything of meaning. If you can pull that down to 3 months or 2 months or a week or a day, like that's hugely valuable. Uh, and it was, and one of the things early DORA metrics talked about a lot was, you know, the faster someone can start shipping code, the faster they can start producing of any kind. And so that's a big deal. Mm-hmm. PR cycle time goes down. This is the time it takes, uh, from a PR getting generated to when it gets through the build process, when it gets reviewed, all the different checks are done. And so like, is that cycle time going down? And then the one that is a little self-serving is, how many Claude-assisted commits are you doing? But then, you know, ultimately they said the big thing is, are, you know, are the measure— the real metric is measuring the thing you're trying to solve. And so is the problem you're solving going away? Is it becoming better? That's really what you want to track on. So yeah, this is overall a really great article. I actually watched her video before I read the whole blog post, and it's— she's a nice personality because she's energetic, you know, entertaining to listen to. And so I definitely recommend checking out the video.

[96:24] Justin: Oh, cool.

[96:25] Matt Kohn: If you have a chance to hear it and hear the questions asked to her, et cetera.

[96:28] Justin: So yeah, I just read it, so I'll check out the video now because I, there's definitely a lot, a lot of things to discuss and a lot of, I think, you know, this is an area like we covered at the top of the show, which is like this, the tools that we have now are going to be transformative in terms of not only what we're doing, but how we do it. Exactly.

[96:48] Matt Kohn: Well, Ryan, we made it. Woohoo! This is going to be a longer show because we went on some sidetracks.

[96:54] Justin: So that was— yeah, remember, remember the pre-read where I said, oh, it's just two of us, it'll be fast? Yeah, I knew as soon as you said that, I was like, yeah, I jinxed it. This is all my fault.

[97:02] Matt Kohn: Yeah, it's all right. All right, well, we'll see you next week, uh, here in the cloud.

[97:06] Justin: All right, bye everybody. Another week of cloud news wrapped up.

[97:12] Matt Kohn: Vault will collect the news.

[97:14] Justin: Justin will get the notes. Jonathan will write some code.

[97:17] Matt Kohn: Ryan will watch the perimeter. And Matt will reluctantly watch Azure.

[97:23] Justin: Till next week for AI, Amazon, Google Cloud, and Azure.

[97:27] Matt Kohn: And hey, maybe even Oracle, who knows?

[97:31] Justin: Check out TheCloudPod.net for our newsletter. Join our Slack, message us on socials, or leave a review.