# 358: AI Spend Limits Because Frontier Models Aren’t Free Therapy
Duration: 82 minutes
Speakers: Ryan, Jonathan Baker, Ryan Lucas
Date: 2026-06-19

## Transcript

[00:04] Ryan: Welcome to The Cloud Pod, where the forecast is always cloudy. We talk weekly about all things AWS, GCP, and Azure. We are your hosts.

[00:15] Jonathan Baker: Justin.

[00:16] Ryan: Jonathan.

[00:16] Ryan Lucas: Ryan. And Matthew.

[00:19] Ryan: Episode 358, recorded for June 9th, 2026. AI spend limits because frontier models aren't free therapy. Good evening, Matt and Ryan. I don't know, I use the company AI for therapy all the time. Yeah, technically it's free.

[00:32] Ryan Lucas: It's free for me, not for somebody else, but that's their problem, not mine.

[00:36] Ryan: This is the really angry email I'd like to send to this person. Will you please make this politically correct? That is therapy. It is therapy in its own way. That's all I'm saying.

[00:46] Jonathan Baker: Kinda, kinda.

[00:47] Ryan Lucas: Yeah, remove all swear words from this, please.

[00:50] Jonathan Baker: It's— I don't know if it's, it's, it counts as therapy until the model's coming back and telling you how broken you are and why.

[00:57] Ryan: Yeah, that, I mean, that's really probably—

[00:58] Jonathan Baker: Yeah, that's the only thing.

[00:59] Ryan: Yeah. I mean, I never tried to do therapy with my model, but, uh, you know, maybe that's a thing. I don't, I don't know. I know I, I've heard it's a thing and—

[01:05] Jonathan Baker: I, I'm sure it's a thing, but I've seen the hallucinations.

[01:08] Ryan: I don't, I don't think I trust it. Yeah. It's, it's real, uh, real suspect at times. All right. Well, we do have some follow-up. Um, if you guys have been paying attention, you know that GitHub has had a really rough couple of months. And to the point that people are saying they're gonna get rid of GitHub or they're gonna move to GitLab, or I think I even saw, I think it was maybe Cloudflare, or maybe it's someone else, they were saying they were building their own version of GitHub because they're like, you know, we need— I'm like, great, that's all we need is more GitHubs in the world. We already have, uh, Atlassian, we already have GitLab, etc. But basically, uh, this article from The New Stack said GitHub's scale challenges have grown substantially beyond earlier projections, and the platform processed 1 billion commits in all of 2025 but now handles 1.4 billion commits per month.

[01:51] Ryan Lucas: Wow.

[01:52] Ryan: With AI agents alone generating over 17 million pull requests each month. Technical remediation work has shifted from surface-level scaling to architectural rebuilding, and GitHub has addressed MySQL contention, moved webhooks off MySQL entirely, rewritten the GitHub Actions job dispatch system, and is migrating performance-sensitive code from its Ruby monolith to Go. GitHub's migration to Microsoft Azure, previously reported as a capacity move, is now described as a deeper infrastructure overhaul. The goal is service isolation so that a degraded subsystem like Actions does not cascade failures to Git or other core services. I mean, the fact that you're putting it on Azure is its own blast domain thing, is it not? Microsoft is providing engineering support from teams that have experience scaling systems at comparable load levels, which represents a more direct operational involvement than was previously discussed. New features like the Copilot CLI app are being developed outside the core GitHub.com infrastructure, which GitHub says allows contained product work without adding risk to the system currently under heavy remediation efforts. So maybe it'll get better. We'll see.

[02:48] Jonathan Baker: Yeah. I mean, I don't know, like I'd actually like to see AI coding take this up a little bit because I think it is a ridiculous sort of growth that I don't think is sustainable. And so much of like vibe-coded garbage is really bloated. Like I'm, I'm glad it does, you know, really long and descriptive docstrings and other things like that. But there are definitely functionality things that it can do a lot more efficiently and doesn't. And so like, it's, I think that's part of why this has grown so much is that it's, it's also bigger commits in general.

[03:22] Ryan Lucas: Well, it's also just everyone is doing commits now and you're not, you know, at least from what I can tell, people aren't testing as much locally and things along those lines. They're like, well, screw it. I have a proper CI/CD system. Let's just commit and push and go from there. And the fact that, you know, anyone can do it means that it's much simpler for people to bloat that number up. I mean, I can tell you my number of commits and pull requests have gone up because I can be doing something, AKA doing the podcast and doing a pull request for PullBot at the same time, versus before it required me to actually sit down and read the code versus just say, "Hey, Vault, you're broken, go fix yourself." No, I mean, it's definitely increasing productivity, right?

[04:09] Jonathan Baker: For sure.

[04:09] Ryan Lucas: But like that level of scale increase in a billion in 2025 and we're not even halfway through the year and you're 1.5x or— It's 1x per month.

[04:19] Jonathan Baker: Per month.

[04:20] Ryan Lucas: Per month is like, it's a completely different scale.

[04:23] Jonathan Baker: Yeah. Yeah. Like I don't know how to buy hardware that fast.
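To put the scale being discussed in one number, here is a minimal arithmetic sketch. It assumes only the two figures quoted from the article (1 billion commits in all of 2025, 1.4 billion per month now) and treats the monthly rate as flat for a full year, which is a simplification rather than a forecast. The function names are made up for illustration.

```python
# Figures as quoted in the episode; treat them as approximate.
COMMITS_IN_2025 = 1_000_000_000          # total for the whole year
COMMITS_PER_MONTH_NOW = 1_400_000_000    # current monthly rate


def annualized(monthly: int) -> int:
    """Project a monthly count over twelve months (assumes a flat rate)."""
    return monthly * 12


def growth_multiple(previous_year_total: int, monthly_now: int) -> float:
    """How many times larger the annualized rate is than last year's total."""
    return annualized(monthly_now) / previous_year_total


if __name__ == "__main__":
    multiple = growth_multiple(COMMITS_IN_2025, COMMITS_PER_MONTH_NOW)
    print(f"Annualized commits: {annualized(COMMITS_PER_MONTH_NOW):,}")
    print(f"Growth vs. 2025: {multiple:.1f}x")
```

On those quoted figures, a single month now exceeds all of 2025, and a full year at that rate would be roughly 17 times last year's volume.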

[04:26] Ryan: It's one of my pet peeves right now, is that the re— like a recent update to either the Claude Code harness or to the model for when I moved to Opus 4.8, it really likes to do PRs. So like, if I ignore it, all of a sudden I come back to like 7 PRs to review. I'm like, why? Why are you doing this to me? Like, I get that you're trying to be small and compact, but I'm like, just do it at the commit level. I don't need these PRs to be broken up this way. Then it's like, you need to, you need to deploy these 7 PRs in order. I'm like, oh my God, this is terrible 'cause it kills the CI/CD pipeline.

[04:57] Jonathan Baker: Oh, it's finally enforcing good hygiene.

[05:00] Ryan: I know, but I don't want that.

[05:01] Ryan Lucas: I know. Justin, you just commit to main. What could possibly go wrong?

[05:07] Ryan: Yeah, yeah. And then you started working on Vault too. And then all of a sudden I couldn't commit to main anymore 'cause it would fuck up poor Matt.

[05:13] Ryan Lucas: Mm-hmm.

[05:15] Ryan: So I had to do pull requests.

[05:16] Ryan Lucas: Hey, I added features that work for me and help me out.

[05:19] Ryan: Hey, they're great and they're awesome features and I like them as well. No, only you and I use them 'cause Ryan and Jonathan can't be bothered, but you know, they're there, so.

[05:28] Ryan Lucas: I like them. Ryan actually.

[05:31] Ryan: He did use it this week.

[05:32] Ryan Lucas: Responded to the, what was it, the scheduler piece of the bot. I was impressed. I even called it out that you used it, hoping that Jonathan would do it, but you know, he's not here to defend himself.

[05:45] Ryan: No, no. Reverse psychology does not work on Jonathan. Like really just punching him in the face, like that's the only way to really get him. That's the only way, yeah. Yeah, it's, you know, school of, there's butt learners and there's not butt learners. And Jonathan's a bit of a butt learner. You have to kind of use the paddle, if you will.

[06:03] Jonathan Baker: I've never heard that term. That is really funny.

[06:06] Ryan: My middle school PE teacher, the very first day of PE when I came out of elementary school, in middle school, he walked through the locker room, was telling us how the locker room had to be clean and quiet and blah, blah, blah. And he was just smacking his paddle. And at that point, it was capital crime to use a paddle on a child, but he, he gave that anecdote and it stuck with me for a long time.

[06:24] Ryan Lucas: Now there's a therapy bot that you can use called Claude.

[06:28] Ryan: Yes, I know. I could get some therapy for this. Yeah. But, uh, yeah, no, it, uh, it's never a problem, but I was just always like, oh, that's, uh, that's an effective threat.

[06:37] Jonathan Baker: I mean, when's the last time you left a locker room dirty?

[06:39] Ryan: It's never happened since middle school, right? Yeah, there you go. You also, like, I don't talk in the locker room because of it. Like nothing, you know? Oh yeah.

[06:47] Jonathan Baker: Well, I mean, I don't know if anyone talks in the locker room.

[06:49] Ryan: Well, old guys do.

[06:50] Jonathan Baker: Old guys do.

[06:50] Ryan: Yes, that's true. I don't know what age that happens, but I hope not soon.

[06:54] Jonathan Baker: Yeah.

[06:54] Ryan Lucas: It hasn't happened to me yet. Not there yet.

[06:55] Ryan: Yeah. All right. Let's move on to "AI Is How ML Makes Money" this week. So first up, Snowflake had their conference last week. Next week is actually the Databricks conference, which I'm actually going to. So I'll tell you what that's about 'cause I've been invited by the vendor and the sales rep and I'm gonna go. So it's in San Francisco, it's not a lot of travel, which is nice. But Snowflake released a bunch of things, a couple of them we'll talk about today. We're not gonna talk about everything 'cause—

[07:22] Ryan Lucas: Sweet Jesus.

[07:23] Ryan: Neither, none of us could make it through. But Snowflake's interoperable lakehouse is now generally available, built on Apache Iceberg v3, Apache Polaris, and a new Open Semantic Interchange spec with 54 different vendors supporting it. Iceberg v3 adds variant types for semi-structured data, row lineage, deletion vectors, nanosecond timestamps, and geospatial types, closing the gap that previously made Iceberg impractical for many workloads. Horizon Catalog now supports full bidirectional read and write access from external engines like Spark, Trino, and PyIceberg. And zero-copy integrations with SAP, Salesforce, and Workday bring enterprise system data into Snowflake without ETL pipelines, preserving semantic context so AI agents reason over current governed data rather than stale copies. Managed Iceberg replication and failover is coming soon to general availability, with an optimized refresh feature coming soon as well.

[08:11] Jonathan Baker: I mean, I'm not a data scientist, so I don't use a lot of these tools. You know, it does seem like, you know, like I'm really curious on how they support geospatial types in an Iceberg dataset. That's crazy town to me, but that's probably 'cause I don't really understand what's happening under the hood.

[08:30] Ryan: Yeah, once I understood Iceberg a bit better, it all started to kind of make sense, but I didn't get it till I spent some quality time with Claude going like, explain Iceberg to me like I'm 5 years old, please. Then he explains it, I'm like, oh, okay. Then it got more complicated from there. Yeah. Yeah, so I'm hoping that the Databricks conference next week is not as riveting as that and is hopefully more exciting 'cause that's gonna put me to sleep if that's the conference. Good luck.

[08:53] Ryan Lucas: Yeah.

[08:54] Jonathan Baker: I'm not hopeful.

[08:56] Ryan Lucas: Yeah.

[08:57] Ryan: Yeah. Well, of course every conference these days has to have AI and Snowflake also delivered on the AI front with a couple different things. First up, an AI coding agent for the modern data stack. This was announced at the Snowflake Summit, expanding it from an AI coding agent into a full AI development platform with a native desktop app for Windows and macOS, cloud agents that run inside Snowsight, an agent SDK, and upcoming Slack and mobile integrations. Each cloud agent session provisions an isolated container that can run Python, shell commands, dbt builds, and web searches with no local setup required. There's lots of specs on how it does on benchmarks, which no one cares about because no one knows what the benchmarks actually are.

[09:32] Jonathan Baker: Mm-hmm.

[09:32] Ryan: Uh, but they also said, well, not only that, we're gonna give you Snowflake CoWork, which is the rebranded Snowflake Intelligence product, is now Snowflake CoWork, positioned as a personal work agent for knowledge workers that combines proactive task automation, multi-agent orchestration, and persistent memory across sessions. The system moves beyond reactive Q&A towards background monitoring, central analysis, and direct action in tools like Slack, Gmail, Salesforce, and Jira via MCP connectors. The upcoming Cortex Sense context layer is a notable technical addition, automatically learning business definitions from query history, dashboards, and metadata to make your CoWork better. I mean, I assume Anthropic will be suing them any moment for trademark infringement, but, uh, nice to see that you're getting some smartness for the data friends who, uh, desperately need all the DevOps help they can get. So I appreciate they're getting these tools.

[10:19] Jonathan Baker: I mean, I guess I don't really see this as DevOps and I don't really understand why, since these are separate products offered by Snowflake, like why they would do this. Like I don't, they don't talk in this article enough about like the data specificity or like what's in the model that it's using. It's, you know, it's the same announcement you get from Anthropic and, and, you know, OpenAI when they release new things. And so now there's just one more out there, another CLI. Like I already have, I think, 4 or 5 installed on my laptop right now. Like, and I don't use any of them.

[10:52] Ryan: What do you have installed on your laptop that you're not using?

[10:54] Jonathan Baker: So I have Amazon Q, I have Gemini Enterprise. Okay. I have Claude.

[11:00] Ryan: Uh-huh.

[11:01] Jonathan Baker: And I have one more, or maybe I, no, I'm thinking of Cursor.

[11:05] Ryan: I'm thinking Cursor. Yeah. I thought you had GitHub as well. GitHub Copilot.

[11:09] Jonathan Baker: I use GitHub at work.

[11:10] Ryan: Oh, okay. But on your personal, you're saying—

[11:13] Jonathan Baker: Yeah, personal, I have those and I, you know, I try to use them, but it's just like, you know, I use Claude the most 'cause they're the one I pay money for, but it's kind of, and now there's another one, like I don't, it's almost just checking the box though.

[11:27] Ryan Lucas: We have it too. This is what we needed to do to become a, I don't wanna say real tool out there, but you know, essentially that.

[11:34] Jonathan Baker: Do you?

[11:34] Ryan Lucas: But that's what their marketing team has told them. Hey, we're missing on the little matrix that does a comparison between all the different companies. This is what we need to do in order to check the box to be able to do it. I'm not saying that it's not useful, and potentially a targeted audience like a data scientist is gonna find it more useful, 'cause most likely they are, than, you know, a DevOps or a security person will. But, you know, it, it's somewhat, I feel like, just saying we have it too, we've done our own model, here's our testing of it, et cetera, et cetera. And it's not adding a whole ton of value, I would say to me, but that doesn't mean it's not adding value to somebody else.

[12:16] Jonathan Baker: I mean, I'm sure it's adding value to someone.

[12:20] Ryan: It's just value that I think already exists, and I don't know. I mean, I kind of go back to my original Vault use case when I was trying to learn how to use the Google Workspace APIs. And I was like, yeah, I was able to use ChatGPT and I was able to use Claude, but like there I was having all kinds of errors. And then like, well, who's gonna probably have the best documentation in its model? It was Gemini, or whatever it was called at that point. And so I was able to use that to kind of finally finish the last few rough edges of my, my integration to docs, but is that the frontier model itself or the tool, right?

[12:53] Jonathan Baker: 'Cause this isn't using Snowflake's model, at least not in the article. It doesn't mention that it opens.

[12:58] Ryan: Yeah, it doesn't mention that specific, I guess. Yeah, that's a good point. I mean—

[13:01] Ryan Lucas: I assumed it was.

[13:02] Ryan: I would think that you might be able to have, you know, MCPs and other things that you put into the tooling to make it slightly more opinionated. But yeah, you might be right.

[13:09] Jonathan Baker: That's a good point. I mean, that's the part I, it's a, you know, they're announcing a desktop app for— and it, I'm just like, eh.

[13:16] Ryan: I mean, like, but think of all the data stuff that people do that isn't, I mean, like building out Jupyter Notebooks isn't something that I'm doing. So maybe that's something that if you built AI into a Jupyter Notebook, it makes more sense. I don't know. I have no idea.

[13:29] Jonathan Baker: Yeah, I guess they do kind of talk about pass rates and so that does sort of seem like there's a model behind it, but they don't reference it specifically. I don't know. Still don't get it.

[13:38] Ryan: I mean, and they do have their own models, I believe, if I recall.

[13:41] Jonathan Baker: So, but they also do.

[13:42] Ryan Lucas: They do.

[13:42] Jonathan Baker: And there's a lot of value in that.

[13:44] Ryan: They also support Anthropic and they support, you know, OpenAI and a bunch of others too. So like, you know, it's probably one of these like, you can use our tool with any of these models. But I mean, everyone's got a CLI now. Even, even, um, uh, Antigravity has a CLI, which replaces the Gemini CLI apparently eventually. Does it replace the, or just, or are they just gonna add it in? It's additive right now, but they're saying it's gonna replace it. I mean, Google's product strategy makes no sense.

[14:10] Jonathan Baker: No sense. Especially around AI.

[14:12] Ryan: Yeah. Well, yeah, well, good. Anyways, I'll keep you guys posted on Databricks next week and what's cool there. And I haven't been to a data conference in a while.

[14:20] Jonathan Baker: When they announce their own exact AI developer tool.

[14:22] Ryan: Yeah, when they have a developer tool, I'm sure. It's gonna be a whole thing. I'll let you know. Keep you posted. Anthropic, if you don't know, is one of 3 IPOs that's gonna be coming in the AI space. xAI is the first one to file, then Anthropic filed theirs, and then OpenAI filed theirs shortly after. The only one I've been able to read the S-1 for so far is xAI's, and I can tell you, don't believe the Elon hype on that one. So I'm very curious what's actually in the S-1 for both OpenAI and Anthropic. I'm sure there's quite a bit of interesting math going on, but I hope it's not as bad as xAI's one, because xAI's one is crazy town.

[14:58] Jonathan Baker: My understanding is that you could not get the details for both OpenAI and Anthropic.

[15:04] Ryan: Not yet, 'cause it's a private S-1 right now, but before they go public, it has to come out. It has to, yeah. Yeah, so for right now it's private, but I'm hoping soon it'll get released and we can do math on it. But that being said, they released their secret, announced they filed it and did the pipe over for a $1 trillion valuation, and then basically called on all major AI labs to consider slowing development, citing the risk of recursive self-improvement where AI systems could enhance their own capabilities without human intervention. Co-author Jack Clark estimated this could happen within the next 2 years. The proposal draws a direct parallel to nuclear arms control, suggesting a global agreement and verification regime. Anthropic noted a key challenge: training runs are far easier to conceal than missile silos, raising practical questions about enforcement. Critics including David Sacks characterized the move as regulatory capture, arguing that established players advocating for development restrictions could disadvantage newer or smaller competitors in the AI space. And yeah, especially since, you know, if someone comes up with that while they're trying to go public, it could cause some problems for their IPO. Imagine that.

[16:04] Jonathan Baker: I mean, this has been what people have been sort of warning about for ages, right? With AI development and, and this isn't anything new. I'm surprised by the timing of it because it doesn't make sense to me that they're doing this now. But this is a huge concern. And I know just from like trying to secure workloads in my day job, you try to put human-in-the-loop flows in place, but you know, people don't really want to be in the loop. The whole advantage to using AI is the velocity gains. So having a human that does all the approval is problematic. And so it makes sense that this is the natural future, which is AI is going to sort of—

[16:46] Ryan: I mean, they've done, they've done all those war games where they, the AI basically just kills us all with nuclear weaponry.

[16:51] Ryan Lucas: Right. I mean, you can already kind of have self-improvements. I think with routines, you can have it kind of self-improve itself over time.

[16:58] Ryan: Alrighty. Well, everyone sort of has this reinforcement learning. They're saying it's large-scale reinforcement learning, and that the AI systems are creating the AI systems, and once that happens, we lose control, is basically what they're arguing.

[17:11] Ryan Lucas: Right, but I'm saying you already have a small model, so having a large model doesn't seem that far off. I mean, 2 years is terrifying always, but you know, it's slowly getting there and I don't disagree, but you're not gonna get these companies that now all are gonna be public and gonna have to prove to shareholders, yeah, remember that product that we wanted to improve? Yeah, we're not gonna improve it because we don't think it's good. And oh, by the way, investors, thank you for your money. I mean, like, you can't use those in the same sentence.

[17:42] Jonathan Baker: You're never gonna get nation states to, to not develop AI either, right? Like, and so, and it only takes the one AI that gets loose and who has the ability to, to kill us all, right? Like, and so like, it's, I don't know, like has, I feel like it's already, you know, the horse has left the barn a little bit on these things. Like it's everywhere. It's in every product. Can you really slow it down? Are you doing like the, what they did with GPS? Like, which is artificially making it terrible. Maybe that's something that they could do. 'Cause it's, you know, for a long time that's how GPS signals worked.

[18:15] Ryan: It was— I mean, you'd have to, you'd have to basically be poisoning the training dataset at a massive scale to make that happen. So— Yeah.

[18:22] Ryan Lucas: But who's to say somebody can't go build their own model, you know, and not have your poisoned data in it?

[18:30] Jonathan Baker: Mm-hmm.

[18:30] Ryan Lucas: Yeah. Like, there's no really stopping it. And even if you got Anthropic, OpenAI, and xAI to say, "No, we're not doing this anymore. We're gonna stop and slow down," someone else is just gonna say, "Look, we have a better model. Thank you. Here, give us all your money." Yeah.

[18:48] Ryan: I mean, that's the reality, is everyone's in a race to get models. Anthropic's ahead in some ways and they don't wanna lose a lot of speed, but like, them doing it right after the IPO is just bad timing on their part.

[18:58] Jonathan Baker: It's really, that's really like why, why that, like it's such bad, like, oh no, wait for us to cash out and then no one else catch up. Yeah.

[19:07] Ryan: No one else can catch up to us then, 'cause we, we are trying to close the door behind us. Yeah. Very, very boomer of you, Anthropic.

[19:13] Ryan Lucas: Mm-hmm.

[19:14] Ryan: Yeah. Well, in addition to that, but you know, they're concerned about their, you know, other people creating models that take over the world, they launched a new version of Mythos, which is the model they say is gonna take over the world, as well as a new model called Fable-5. These are restricted access for Mythos. Both are priced at $10 per million input tokens and $50 per million output tokens, which means these are too expensive for me.

[19:36] Ryan Lucas: Yep.
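Since the episode title is about spend limits, here is a small sketch of what those quoted prices mean per request. It uses only the two numbers from the discussion ($10 per million input tokens, $50 per million output tokens); the token counts, the budget figure, and the function names are hypothetical, and real billing may include details not covered in the episode.

```python
from decimal import Decimal

# Prices as quoted in the episode, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = Decimal("10")
OUTPUT_PRICE_PER_MILLION = Decimal("50")
MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in dollars of one request at the quoted per-token prices."""
    input_cost = Decimal(input_tokens) / MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


def within_budget(spent: Decimal, next_cost: Decimal, limit: Decimal) -> bool:
    """A simple spend-limit check: allow the request only if it fits."""
    return spent + next_cost <= limit


if __name__ == "__main__":
    # Hypothetical agent step: 200k tokens of context in, 20k tokens out.
    cost = request_cost(200_000, 20_000)
    print(f"Cost of one step: ${cost:.2f}")
    print("Allowed under a $50 cap with $48 spent:",
          within_budget(Decimal("48"), cost, Decimal("50")))
```

Decimal is used instead of float so that dollar amounts compare exactly against a cap.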

[19:37] Ryan: This is less than half the cost of the previous Mythos preview model though. So Fable-5 is apparently a general use version with safety classifiers active while Mythos-5 is the same underlying model with certain safeguards lifted for vetted cybersecurity and biology partners. So if you go to the Fable thing, which you can access via the API, and you tell it that you want to talk about cybersecurity, it'll basically tell you you can't because that's a cybersecurity thing they've been restricting. And so basically they've routed the cybersecurity, biology and chemistry, you know, weapons, and distillation categories to Claude Opus 4.8 instead of refusing outright. Anthropic reports this fallback triggers in fewer than 5% of sessions, and external red teaming found zero successful universal jailbreaks on harmful cyber queries across 30 different public Anthropic jailbreak techniques. I wonder if you still have to pay the higher price if it passes it to Opus on the backend. On the software engineering side, Stripe reported Fable-5 completed a codebase-wide migration across a 50 million line Ruby codebase in one day, a task estimated to take a full team over 2 months manually. The model also scores highest among frontier models on Cognition Frontier code evaluation for production quality coding standards. And Mythos-5 demonstrated autonomous scientific research capabilities, including outperforming a recently published Science journal model on a genomics test task despite being 100 times smaller and accelerating drug design workflows roughly 10 times in internal protein design testing. Anthropic is requiring a 30-day retention for all Mythos-class model traffic, including on third-party services, specifically to detect novel jailbreaks and cross-request attacks with explicit commitments not to use this data for model training. So fables and myth are all out there now. That's how I see it. I mean—

[21:09] Jonathan Baker: I mean, I guess that's good because I don't really, like, Mythos is, it's a frustrating thing to talk about for a day job because I get a lot of questions for it. Like, uh, when can we use Mythos? When can we turn Mythos loose in our environment to find all the things? And I'm like, I don't think anyone wants that, number one.

[21:25] Ryan: And B, I don't think— No one wants to pay for it. And—

[21:27] Jonathan Baker: Right. I definitely don't wanna pay for it. And even when they, like, it goes through and it finds all the things to stitch together, like, I've got bigger, well, not bigger problems, but I mean, there's, I have more base-level foundational problems than that. And I think a lot of companies do, which is like, unless you can fix all these things in quick order, all you're doing is creating work that you can't address.

[21:48] Ryan: I think it's an interesting question, right? Because are you just turning into an ostrich, right? Like I'm putting my head in the sand. I don't care about this problem. I know there's probably something there, but like the fact that you can know about it, maybe then you can prioritize why it's important to fix.

[22:00] Jonathan Baker: So I mean, this goes back to something that I talk about all the time. Are CVEs important? Is patching important? And it's like, yes, but only in the context of how you mitigate risk in general, right? Like it's just a single factor. And so if you focus everything on these vulnerabilities and don't do anything like invest in your CI/CD pipeline where you can rebuild these things or, you know, have a good Docker container, you know, development pipeline where people can actually rotate these things, which they don't do, you know, like those, fix those problems first and then get to this point. But it would be, I mean, and of course it would be cool if it discovered some sort of novel way that it broke into things and that's neat, but.

[22:42] Ryan Lucas: I mean, I would also say you gotta get your, the foundation of your house set up. So if you are patching, it's not that you're patching, it's how you're patching, which is I think also what you're saying. I don't want somebody, to use a very simple example, I have 50 EC2 instances or VMs. To do patching, I can't have somebody log in through 50 VMs. That's not sustainable and that's not gonna work. Sure, Ryan in security here will check the box saying you are doing patching, but I've wasted 3 people, you know, 3 person days on this. But if you build it out so that each thing is an auto-scaling group and everything else, which is where you're going with the CI/CD stuff, and you build that proper workflow out, then patching is just release the new image.
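The "patching is just release the new image" idea can be sketched as a batched rollout. This is a plain simulation, not a cloud API: the `Instance` class, the image names, and `rolling_replace` are all hypothetical, and a real auto-scaling group would also handle health checks and capacity, which are left out here.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    image: str  # identifier of the machine image the instance was built from


def rolling_replace(fleet: list[Instance], new_image: str,
                    batch_size: int) -> list[list[str]]:
    """Move outdated instances to new_image in batches.

    Returns the instance names replaced in each batch, in order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    outdated = [inst for inst in fleet if inst.image != new_image]
    batches = []
    for start in range(0, len(outdated), batch_size):
        batch = outdated[start:start + batch_size]
        for inst in batch:
            # Stands in for "terminate and relaunch from the patched image".
            inst.image = new_image
        batches.append([inst.name for inst in batch])
    return batches


if __name__ == "__main__":
    # The 50-VM example from the conversation, with made-up image names.
    fleet = [Instance(f"vm-{i:02d}", "base-2026-05") for i in range(50)]
    batches = rolling_replace(fleet, "base-2026-06", batch_size=10)
    patched = all(inst.image == "base-2026-06" for inst in fleet)
    print(f"{len(batches)} batches, all patched: {patched}")
```

Nobody logs in to a machine in this model. The only human step is publishing the new image, and re-running the rollout once everything is current does nothing.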

[23:31] Jonathan Baker: And you're patching the OS-level vulnerabilities that are easy to do and not the ones that are actually the most vulnerable, which are the application dependencies that are exposed directly as part of that application runtime. And it's this weird, like, thing that I just, I get annoyed when it, with talking about vulnerabilities and patching in general, because I feel like people focus on these specific areas, and you can't, right? It's about mitigation and DevOps and all the things. Yeah.

[23:58] Ryan Lucas: Right, it's your holistic environment and making sure that, sure, my IIS server is, you know, has a vulnerability in it, but it's not public. It's an internal app that only one other server can talk to. So that should not be the priority versus, hey, the latest, I mean, just the first thing I think of, Log4j vulnerability that's on the front end, that's more important to fix than the IIS server on the backend. But the IIS server is a base OS one, so everyone freaks out about it. So when you talk about patching and everything, yes, should you eventually run these things? Yes. But you have to have an environment that you're capable of it because the worst thing to ever hear is everyone set your hair on fire because we found a new critical vulnerability, but we don't have a way to fix it.

[24:44] Jonathan Baker: The re— and the real context of this critical vulnerability is that it can't be reached to be exploited in so many cases.
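The point about reachability can be made concrete with a toy triage sketch. The exposure weights, field names, and the two example findings are assumptions chosen for illustration, loosely modeled on the front-end Log4j versus internal server example from the conversation. This is not an established scoring standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    severity: float        # a CVSS-style base score from 0 to 10
    internet_facing: bool  # reachable from outside the network
    inbound_peers: int     # number of internal systems that can reach it


def priority(finding: Finding) -> float:
    """Hypothetical weighting: discount severity when the target is hard to reach."""
    if finding.internet_facing:
        exposure = 1.0
    elif finding.inbound_peers > 1:
        exposure = 0.5
    else:
        exposure = 0.2
    return finding.severity * exposure


def triage(findings: list[Finding]) -> list[Finding]:
    """Order findings from most to least urgent under the weighting above."""
    return sorted(findings, key=priority, reverse=True)


if __name__ == "__main__":
    findings = [
        Finding("internal-web-server-os", 9.8, internet_facing=False, inbound_peers=1),
        Finding("log4j-frontend", 10.0, internet_facing=True, inbound_peers=0),
    ]
    for f in triage(findings):
        print(f"{f.name}: priority {priority(f):.2f}")
```

Two findings with nearly identical severity end up far apart once exposure is factored in, which is the argument being made: the raw score is one input to risk, not the ranking itself.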

[24:49] Ryan Lucas: Right, if you're already in, you can go parallel or lateral movements.

[24:54] Jonathan Baker: Yeah, yeah.

[24:54] Ryan Lucas: Sure, but let's fix the front door a little bit.

[24:56] Jonathan Baker: And AI is allowing a lot of that, right? AI is, that's a lot of the AI powers. They're stitching together these minor vulnerabilities, using them in line to actually then get into a system, exploit it, because it's, they can operate a lot faster than a human, which is, you know, that's, there is some realness to this, which is, you know, I like to equate it to, you know, like, your vulnerability dashboard is like when you go to a restaurant, assuming we're trying to assess how clean the kitchen is by how clean the bathroom is, right? Like that's the way I look at it. Like it's an indirect indicator of your overall hygiene. And so it's good to drive it down. You're absolutely driving down risk, but it's gotta be just one aspect of your overall strategy.

[25:37] Ryan Lucas: Okay, on a slight tangent, what's your feeling about 24/7 SOC versus any tool that has auto reme— versus having tools that do auto remediation?

[25:48] Jonathan Baker: I don't see them being mutually exclusive, first off.

[25:51] Ryan Lucas: But which would you say is more important from, for a small to medium-sized business?

[25:56] Jonathan Baker: Automated remediation. But I mean, I'm very biased towards automation and 'cause I couldn't work in a SOC where I had, it was doing a whole bunch of manual analysis and, and then trying to react. It just drives me crazy. But I've, I, you know, like in my, in my sort of view of the world, like that 24/7 SOC, you have your low-level analysts and you have mid-level to senior analysts, and they're the ones responsible for developing those, those automated sort of playbooks and remediations. And so it's sort of like, it's part of the, the whole lifecycle to me.

[26:29] Ryan Lucas: Hmm. But I've had that conversation with people that can't afford 24/7 staff, but they buy tools that have the auto-remediation in there. And how do you balance that level of automation? And does that count as a base level of a SOC? Let's say your staffing is all in the US, right? You don't have anyone up at night, but things are getting auto-blocked and fixed and whatnot. So do you then have a 24/7 SOC because the tools you set up are running 24/7, or do you not because there's not a person physically there, because at 2:00 AM everyone's sleeping versus somebody sitting there writing the automation script at 2:00 AM?

[27:21] Jonathan Baker: I mean, you don't have SOC analysts looking at the logs line by line and then finding stuff. They're all reacting to alerts and anomalous behavior that's detected by tooling. So that can very quickly page someone, you know, that kind of thing. And so, I don't know, I'd much rather have the detection and visibility and the auto-remediation, but.

[27:43] Ryan Lucas: I mean, I agree. I just, you know, think it's an interesting conversation because so many contracts and things like that still say you have a 24/7 SOC and you're like, you know, I've worked with companies that are 30, 40 people. They can't have a 24/7 SOC without spending $100,000 on an outside vendor.

[27:59] Ryan: Do you get to the point where you start redefining what that is? Just like we redefined separation of duties, right? Because before, separation of duties meant a person in engineering and a person in operations. Those were two humans. And now it's really quality gates inside of a CI/CD pipeline. So in that case where you were saying someone wants a 24/7 SOC, like, okay, well, if you break it down to what you want to accomplish with the SOC, if you can do the same thing with automation, I think you get away from that and you just have to figure out, you know, how to address those things.

[28:29] Ryan Lucas: Yeah, I don't think the lawyers have reached that point yet, but I think a lot of these security professionals have started to kind of get there, at least from the people and things I read and talk to.

[28:40] Ryan: Yeah, agreed. All right, let's move on to security. This week, there was an article from Ars Technica about Dashlane, who had a pretty serious hack that resulted in attackers downloading encrypted password vaults. Attackers exploited Dashlane's device enrollment API by brute-forcing 6-digit one-time tokens sent to user email addresses, successfully registering new devices on fewer than 20 accounts and downloading encrypted vaults before automated lockouts stopped the campaign. The attack highlights a known trade-off in OTP-based authentication: 6-digit numeric codes have only 1 million possible values, making them vulnerable to brute force if rate limiting and lockout mechanisms are not sufficiently aggressive. Downloaded vaults remain encrypted and unreadable, allegedly, without the user's master password, which Dashlane never stores, so the practical risk to affected users depends entirely on the strength of your master password. This incident is a useful case study for developers building device enrollment or account linking flows, as it demonstrates how API endpoints handling authentication tokens need strict rate limiting.
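
To make the 1-in-a-million arithmetic concrete, here is a minimal Python sketch of a one-time code with an attempt cap. It is not Dashlane's implementation; the `EnrollmentCode` class, the `max_attempts` default, and the function names are hypothetical and exist only to show the idea.

```python
import secrets

CODE_SPACE = 10 ** 6  # a 6-digit numeric code has 1,000,000 possible values

def guess_success_probability(attempts: int) -> float:
    """Chance that `attempts` distinct guesses hit one fixed 6-digit code."""
    return min(attempts, CODE_SPACE) / CODE_SPACE

class EnrollmentCode:
    """Hypothetical one-time code that is invalidated after too many wrong guesses."""

    def __init__(self, max_attempts: int = 5):
        self.code = f"{secrets.randbelow(CODE_SPACE):06d}"
        self.max_attempts = max_attempts
        self.failed = 0
        self.locked = False

    def verify(self, guess: str) -> bool:
        if self.locked:
            return False
        # Constant-time comparison avoids leaking how many digits matched.
        if secrets.compare_digest(guess.encode(), self.code.encode()):
            self.locked = True  # single use: a correct code cannot be replayed
            return True
        self.failed += 1
        if self.failed >= self.max_attempts:
            self.locked = True
        return False
```

With a cap of 5 guesses per code, a single code gives an attacker a 5 in 1,000,000 chance. A per-code cap alone is not enough if an attacker can keep requesting fresh codes, so a real flow would presumably also limit attempts per account and per source.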

[29:37] Jonathan Baker: It is absolutely true. Even, even in the face where I've locked myself out because I've hit one of those limits, you know, like, this is the other side of that. And for, you know, right now, it's strength of that master password, but with you know, crypto or not crypto, but quantum encryption, like it's gonna be able to break through the algorithm generally. So it's, yeah, you know, like it's, you know, I, I do think it's good that, you know, these are encrypted right now. You can change the passwords that are within that Vault and, you know, remediate that risk, which is good.

[30:09] Ryan Lucas: Yeah, but I have, I don't know, probably 1,000 passwords in my Vault that are all randomly generated.

[30:15] Jonathan Baker: Mm-hmm.

[30:16] Ryan Lucas: You know, so that's always fun.

[30:18] Ryan: Well, I'm, I'm starting to get annoyed, so I'm getting a lot of like text messages and things like that, you know, where people are going and putting your email address in and then saying, oh, I need a password reset. And because they're trying to be nicer to users, they'll be like, oh, we'll send you a two-factor code to give you your reset password versus, you know, any other verification first.

[30:37] Jonathan Baker: Mm-hmm.

[30:37] Ryan: And so like, I'll occasionally get a text message and I'm like, I, I can tell you 100% that I did not just go try to reset that password.

[30:44] Jonathan Baker: Did not trigger that. Yeah.

[30:46] Ryan: But it's gotten to the point where I was getting so many of them that I was concerned. Like, yeah, I've literally gone through my Gmail, looked at every single device. I logged out every single device, changed my passwords, went through all this stuff, and then even after doing all of that, I'm still getting them. Yeah. So now I'm just— Oh yeah. That is annoying because it's like, I now know for sure that this isn't a— and I have the same thing. I got burned, you know, 20 years ago with reusing passwords, and I learned my lesson and I started using tools to help, you know, manage that. And I have really complex passwords that I rotate regularly for the vaults. And then in the vault, I have unique passwords for everything. And then I'm trying to use more passkeys and different things, but trying to make it easier for users is also making it easier for them to try to trick me, which I don't like.

[31:32] Ryan Lucas: Yeah.

[31:33] Jonathan Baker: It'd be nice if there was, so, like SCIM protocols for managing users, it'd be nice if there was a standard for changing passwords and rotating passwords so that tools like Dashlane and LastPass and 1Password could be more effective, right? 'Cause all those tools offer the ability to go change and rotate your passwords, but they only work on what they work on, you know? And every time one of those sites changes their UI, I imagine it breaks them. So it'd be nice if there was more of a standard.

[32:06] Ryan Lucas: Back in the day, that was my biggest fear of moving to 1Password. I mean, I had 1Password like 4 at the time, where everything was local and it was free and they kept giving me updates. I had the perpetual license. I was like, this is great. And then I think at 1Password 7, they're like, you can't update anymore, but you can still run it locally. And they had it linked up to like Dropbox and a few other tools at the time. And then it was, do you just put it into their cloud and have them do it? And still my fear always is somebody gets into 1Password or LastPass, like it was a couple years ago now. And what is the blast radius? It's thousands of accounts, hundreds of thousands of accounts, and then thousands of accounts per person. Like, yeah.

[32:50] Ryan: Again, I hope two-factor is my second, you know, protection. Like, yeah, if the Vault gets downloaded and gets cracked, which, you know, I'm hoping that all these sites are having post-quantum compliant encryption on those things. Mm-hmm. Because it's become a thing. Then, you know, my second line of defense is that I have MFA, I'm hoping. And then, you know, there's stuff. But again, there's this conflict between user experience, trying to make it easier for people to remember their passwords and log in, versus some of these things, and I do worry. I mean, even the passkey in some ways kind of freaks me out a little bit in how much simpler it is. And it's like, well, if you compromise my laptop, you know, now you have my passkeys and that's a big deal. That's my two-factor in some cases. So yeah, there's definitely risks in any of these.

[33:37] Jonathan Baker: You still have to like, in order to use a passkey, sort of have the biometric authentication though.

[33:44] Ryan: What's the biometric authentication? Unlocking your Vault?

[33:48] Jonathan Baker: Fingerprint or yeah, on your— unlocking your computer.

[33:51] Ryan Lucas: That's why I still like my YubiKey a little bit. It's still some, a physical thing.

[33:55] Ryan: Yeah, I do kind of think about going back to YubiKeys for my personal use.

[34:00] Ryan Lucas: I'll say mine are split right now between 1Password and YubiKey, and some of it's legacy, but some of it's just a habit. But I still remember we were on Bitbucket for a while and like every 60 days or whatever it was, they log you out. So if I ever was on my phone, I would have to go plug my USB-C YubiKey into my phone to MFA in on my web browser on my phone or do anything, which was a little bit of a pain.

[34:26] Jonathan Baker: I should try, I've never actually tried to plug in my YubiKey to my phone.

[34:31] Ryan Lucas: It works, it's a keyboard. It works. It's all responsive, yeah.

[34:34] Jonathan Baker: Yeah, no, it makes total sense. Yeah, you know, I don't know, it's, I know that I get, I have MFA on like everything and I get annoyed for the sites where it's like I have to go get my phone and look up a code and type in the code. Like it is kind of—

[34:44] Ryan Lucas: I'm always worried that my YubiKey's gonna break though. That's my other, I mean, I've had it for 5, 7 years now. It's a USB-C one. And I'm like, I know there's nothing really inside of it, but I'm like, it's still in my laptop, still travels around with me. It's gonna eventually break.

[35:01] Jonathan Baker: Mm-hmm.

[35:02] Ryan Lucas: And then I'm really screwed.

[35:04] Ryan: All right, uh, are we even on this for the latest? We're moving to cloud tools. HashiCorp is rethinking infrastructure access in the age of agentic AI. HashiCorp Boundary addresses a growing security gap where AI agents need infrastructure access, but traditional IAM models were designed for human users with predictable access patterns. The core value is giving each AI agent a unique identity with just-in-time credentials rather than static long-lived secrets. Boundary's credential injection feature means AI agents never directly handle or see credentials at any point during a session. And when paired with HashiCorp Vault, it generates short-lived dynamic credentials that expire after use, which limits the blast radius if an agent or orchestration layer is compromised. The session-focused control plane enforces identity and authorization at the connection layer before infrastructure access is established, rather than relying on application layer gateways. This means the entire network is abstracted away from agents and all connections route through a Boundary proxy, so only authorized identities can establish the session itself. The use case in the article is worth noting because it shows each discrete action getting its own ephemeral session account that is deactivated once its purpose is fulfilled, which means standing privileges are continuously revoked rather than persisting across an agent's entire operational lifetime. Fleet session recording and audit logging give security teams the ability to replay and review every action an AI agent took, tied to a specific operator, intent, and timeframe. Yeah, agentic identity is super complicated and super important.

[36:26] Jonathan Baker: And this isn't it. This— I'm so annoyed by this because they're like, this is rethinking in the age of agentic AI.

[36:31] Ryan Lucas: No, this is—

[36:32] Jonathan Baker: what we should do for all authentication, not just AI. And actually it doesn't address anything about AI. It doesn't identify AI agents, and it's just setting up a user within HashiCorp Boundary and then assigning that user to an agentic AI, just like a human. So this doesn't actually address anything agentic. And these are, you know, patterns we need to be moving to in general because it's like, yeah, we have long-lived credentials based on these, you know, predictable patterns. But the reality is it's all getting too complex. We all have unique passwords and everything. It's getting very difficult. And if we move to sort of, you know, just-in-time short-lived credentials, we'll be much better off for it because you're reducing the risk of things like your MFA or password being compromised.

[37:18] Ryan Lucas: Yeah, I mean, it goes back to, you know, Amazon with STS, you know, here's your pre-signed URL that's good for X period of time. You know, I understand that's for one service, like S3, but like you guys said, and I've said, if we can get it for more things and even ourselves, life is better. I mean, the 12-hour token I have set up for my AWS SSO, it's pretty long still. There's a lot of damage, you know, if Ryan gets my credentials there, I don't wanna know what he's gonna do. He's gonna go launch those X1s and those massive EC2 instances just to make me pay the bill at one point. So, you know, shorter lived, less use, more limitation is definitely where we should be going. And I see this more as a step in the right direction than a final step. They just, you know, from a marketing perspective, sell it as the end-all be-all for everything.
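
The "good for X period of time" idea can be sketched with a signed token that carries its own expiry. This is a toy illustration of time-boxed credentials, not how AWS STS, pre-signed URLs, or HashiCorp Boundary work internally; `SECRET`, `issue`, and `is_valid` are hypothetical names for the example.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"demo-signing-key"  # hypothetical key; a real system would use a managed secret

def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()

def issue(subject: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return 'subject|expiry|signature', valid for ttl_seconds from now."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    payload = f"{subject}|{expires}"
    return f"{payload}|{_sign(payload)}"

def is_valid(token: str, now: Optional[float] = None) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    now = time.time() if now is None else now
    try:
        subject, expires, sig = token.rsplit("|", 2)
        expiry = int(expires)
    except ValueError:
        return False
    expected = _sign(f"{subject}|{expires}")
    return hmac.compare_digest(sig.encode(), expected.encode()) and now < expiry
```

A 15-minute token (`ttl_seconds=900`) that leaks is useless once the window closes, while a 12-hour one stays dangerous for most of a working day, which is the blast-radius argument for shorter lifetimes.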

[38:09] Jonathan Baker: Yeah. That was, I mean, that was more my angry take on it. Like, I love this pattern. I do think that it should be applied for everything. This is definitely something I believe in like to my core, which is like, this is how you handle authentication and authorization is these patterns. It's just, it's funny to me that they don't really tackle the, the IAM part, which is that, you know, you have very fluid identities that have very dynamic permission sets. And so if you reverse this, like, what are you doing? Are you— Sure, you're giving static short-lived credentials, but are, what, what permissions are attached to those credentials? Do you know what that agent is doing? Can you define that use case enough? And they're not answering any of the hard questions with agentic AI.

[38:55] Ryan: So it's like, it was, yeah, I don't know. I was annoyed. So, uh, this is a tangent. So someone wrote a document and I was reading it and they made the statement that Google's API and their OAuth 2.0-based scope design represents the gold standard for granular OAuth 2.0-based scope design. And I about spit out my coffee. Docker is rage. Yeah. So then I went on a whole thing with Claude and I was like, Claude, do you think this is the gold standard? And he's like, no, it's a good reference point to start. But for agentic identity, it had a whole bunch of things around how static scopes, you know, don't capture context. The authorization outlives the intent problem. The emerging answer is apparently around layered authorizations and human-in-the-loop escalation and ephemeral, and then lifecycle-aware, authorizations. And then really, if you want to look at the standards that you should be watching, it said OIDC-A and A-JWT, which are the two new proposals that introduce dynamic agent or extensions to the OAuth 2.0 OIDC ecosystem.

[39:59] Jonathan Baker: And yes, and those are absolutely the right answer. Like that, I can't—

[40:03] Ryan: couldn't agree with Claude more. Fantastic. Yeah, but yeah, no, it was, I have like, I have my record scratch in my brain. It's like a Google standard, a gold standard. I'm like, hmm, it's a standard. It's not gold. It's— tarnished bronze perhaps at this point, but when it came out it was a good idea, but it's, it's shown its age.

[40:19] Jonathan Baker: So yeah, like, oh, when they first introduced that OAuth flow, it was amazing, right? Comparatively where we were at then, it's just, it's, it's funny, you know, it's actually, it's a great thing to point out now, which is like, you wanted those scopes because then you could do granular access to those API credentials. And, but it's, you know, it's not that much different than, you know, IAM permissions really. It's just, different name for it. But you know, you didn't have that ability before. You had like static roles, like admin or read-only, you know, like you didn't have any granularity. And that, the scopes were really an introduction of that, which is pretty awesome.

[40:52] Ryan: Well, even like RBAC is, you know, people are like, well, RBAC's amazing. I don't know, like, yeah, it's really had its day and it's kind of over. And ABAC is a step in the right direction, but still not enough. And so it's, it's, um, It's interesting. But yeah, anyways, uh, yeah, that was, uh, my anger today earlier. I was like, what are you talking about? Like, your document is terrible. All right, let's move on to AWS because we're, uh, we're moving along here.

[41:15] Ryan Lucas: Yeah.

[41:16] Ryan: A lot of tangents. You can improve your application resilience with Amazon Cognito multi-region replication now. This automatically synchronizes user profiles, credentials, and pool configurations from a primary region to a secondary region of your choice. This eliminates the need for custom-built replication solutions that previously created security risks and operational overhead. I just call it toil.

[41:35] Ryan Lucas: This is—

[41:37] Ryan: The feature is read-only on the secondary side, meaning authentication continues during failover, but new user registrations and profile updates are unavailable. Teams should know that Lambda triggers, WAF configurations, and log streaming must be manually configured in the target region separately. So it's not perfect. A notable requirement is that customers must configure a multi-region customer-managed KMS key before enabling replication, and OIDC issuer endpoints must be updated across all client applications, including mobile app store resubmissions, so upfront migration work is a practical consideration before adopting this.

[42:08] Ryan Lucas: I will say I've dealt a lot with Cognito over time, and while this sounds like a thing, I think this has been a PFR I've asked for, for about 8 years now. And it's nice to see, it's, it's just a nice quality of life improvement to actually get this out. And sure, there's some sharp edges still, but.

[42:29] Ryan: You know, it's still better than it was.

[42:31] Ryan Lucas: Yeah, and like, you know, there's customers that launched in US-East-1 back in the day and they still have their Cognito pool there for that reason, 'cause you couldn't back up and restore Cognito or really do much with it. It was just there. So this kind of gives you that ability to at least start to move down that process. I would expect over time, I'm not gonna use this as a prediction 'cause I don't think it would hit a main stage anywhere, but you know, there will be a failover. It'll be like RDS where you can fail over to the backup and, you know, break the replication and all those things. And I think this is a great starting point for some of these things.

[43:04] Jonathan Baker: I mean, I read this as cool. Now Cognito will now give me a valid session token across multiple regions. Awesome. But full disclosure, it's been over a decade since I've played with Cognito, so I haven't tried it.

[43:18] Ryan: Every time someone says, we should use Cognito, and I go, do you have budget for Okta?

[43:21] Jonathan Baker: Yeah, yeah.

[43:22] Ryan: It's my normal.

[43:23] Jonathan Baker: For all I know, Cognito is better now, but I haven't—

[43:25] Ryan: I mean, it could be, but I'm not gonna find out.

[43:27] Jonathan Baker: I'm not going back, yeah.

[43:29] Ryan Lucas: It's definitely better than it was. It's good for a small business that just needs something small and quick and easy, but as soon as you grow outside of that, you swear a little bit.

[43:38] Ryan: I mean, especially compared to Google's stuff for basically the same use cases. So there are pluses and minuses. But I mean, it's nice to finally see it. And then Cognito finally getting some love is also kind of nice because I felt kind of like it's been terrible for 10 years and really hadn't been improved much.

[43:57] Jonathan Baker: So I like that it's getting a little bit of love.

[44:00] Ryan: Cognito also now supports inbound federation Lambda triggers that intercept federated authentication responses from external identity providers before user attributes are written to the user pool, giving developers programmatic control over attribute transformation, filtering, and enrichment. For business-to-business and SaaS apps, the trigger solves a practical problem where enterprise SAML providers send hundreds of group memberships that exceed Cognito's 2,048-character attribute limit, allowing developers to filter and normalize groups without coordinating changes with customer IT departments. For business-to-consumer applications, the trigger enables automatic account linking across multiple sign-in methods by matching federated email addresses to existing local Cognito accounts, preventing duplicate user records when customers forget they already registered with a different provider. The trigger runs on every federated sign-in rather than only on initial account creation, which means linking logic and attribute transformations apply continuously, and developers always have the latest IdP attributes.

[44:49] Jonathan Baker: I was gonna make the joke, like, what could go wrong, you know, doing ETL on the attributes that you're trying to validate until it gave, until it got to that valid use case of the character limit on groups that I've been burned by so many times.

[45:04] Ryan: So many times, yeah.

[45:05] Ryan Lucas: Yeah.

[45:06] Jonathan Baker: I'm like, oh, nevermind.

[45:07] Ryan Lucas: No, that's a good thing to have. Every time I've seen a company implement SAML, this is always something that bites them. They always get it up and they test it and then somebody has 400 groups associated with them from their, you know, from their AD and you're like, well, we didn't think somebody was gonna have 400 groups, and it breaks.

[45:28] Jonathan Baker: And it works for everyone in your test pool as you're rolling it out.

[45:32] Ryan: Yeah, 100%.

[45:32] Jonathan Baker: You roll it out and then it goes, yeah, it gets to that one person that, you know, person who has too much access to everything and it's just like, yeah, this doesn't work. Now they don't have access to anything.
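
The 400-groups problem comes down to filtering a list before it is written to a length-limited attribute. The sketch below shows only that filtering step as a plain Python function; it does not reproduce the real Cognito trigger event format, and `filter_groups`, `allowed_prefixes`, and the sample group names are hypothetical.

```python
ATTRIBUTE_LIMIT = 2048  # character limit cited in the episode for a Cognito attribute

def filter_groups(groups, allowed_prefixes, limit=ATTRIBUTE_LIMIT):
    """Keep only relevant groups, then stop before the joined value exceeds the limit."""
    prefixes = tuple(allowed_prefixes)
    relevant = sorted({g.strip() for g in groups if g.strip().startswith(prefixes)})
    kept = []
    length = 0
    for group in relevant:
        extra = len(group) + (1 if kept else 0)  # +1 for the comma separator
        if length + extra > limit:
            break
        kept.append(group)
        length += extra
    return ",".join(kept)

# Simulate the one user with 400 application groups plus some unrelated ones.
incoming = [f"app-team-{i:03d}" for i in range(400)] + ["hr-all", "vpn-users"]
value = filter_groups(incoming, ["app-"])
```

Truncating silently is its own risk, since a dropped group can mean lost access, so a real handler would probably log what it cut or filter on a much tighter allow-list.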

[45:40] Ryan: Yeah. All right, AWS Step Functions is adding support for AI agent reasoning steps via an optimized integration with the Bedrock AgentCore harness, currently available in preview, allowing you to embed configurable AI agents directly into visual workflows without managing the underlying agent loop infrastructure. Practical use cases include document classification, unstructured form extraction, and multi-agent pipelines where agents run in parallel or sequence with optional human approval gates at critical decision points. Per-invocation overrides for model, system prompt, and tools let teams reuse a single harness configuration across different workflow contexts. A session ID parameter enables agent context persistence within or across workflow executions. Observability is built in through workflow execution history showing agent input/output, token usage, and duration, with links to detailed agent logs in Amazon CloudWatch for auditing every decision. It's available to you in 4 regions: US East (N. Virginia), US West (Oregon), Europe (Frankfurt), and Asia Pacific (Sydney). It follows standard Step Functions pricing with no additional integration charges, though standard Bedrock and AgentCore pricing will apply for your inference.

[46:42] Jonathan Baker: I mean, you, you know, I'm like, I, I lust over state machines and so like, I, I find it funny 'cause this is all I think about when I'm putting it like an agent workflow together. I'm like, this would be so much easier in a state machine. And so now they've done it. I'm like, I will absolutely use this so much 'cause I, it's something I already kind of do with Lambda functions. It's just now that I won't have to define the logic as specifically. It'll just be like 4 pages of Markdown in my Lambda function, says I give it like way more instructions than it would've been with a, you know, simple 10-line Python function.

[47:16] Ryan Lucas: Hmm. Nothing like overkill just for the hell of it. No, it's gonna be an interesting integration. I think you can build a lot, like you said, with this. You know, I think we all like Step Functions, and I think that's even the way my brain processes stuff in general, like A, B, C, D, fork, et cetera, come back together, then move on. So having it natively integrate versus, you know, a Lambda that's running 15 minutes, 'cause I'm doing my own ridiculousness in it, you know, having it trigger Bedrock directly and use all the other features of, you know, AWS AgentCore to help with everything should be a pretty nice quality of life improvement for people stitching stuff together right now.

[47:57] Jonathan Baker: Yeah. I do think it's gonna be hard to use, like, 'cause—

[47:59] Ryan: Oh yeah.

[48:00] Jonathan Baker: 'Cause you can't control, you can't control the inputs and outputs as easily. And so it's gonna be like, your, your, the state machine's just gonna break partway through all the time. Like—

[48:09] Ryan: Oh, you can be very prescriptive about what do you want the output of a prompt to be.

[48:13] Jonathan Baker: You can.

[48:13] Ryan: But yeah, there are, there are some risks to it for sure.
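
One way to keep a state machine from breaking partway through is to validate the agent's reply before handing it to the next state. The sketch below assumes the prompt asked for a JSON object with `category` and `confidence` fields; that schema, the `ALLOWED_CATEGORIES` labels, and `parse_agent_output` are hypothetical and not part of any AWS API.

```python
import json

ALLOWED_CATEGORIES = {"invoice", "contract", "other"}  # hypothetical labels

def parse_agent_output(raw: str) -> dict:
    """Parse and validate an agent reply; raise ValueError so the caller can retry or escalate."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"agent reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("agent reply must be a JSON object")
    category = data.get("category")
    confidence = data.get("confidence")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {category!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0 <= confidence <= 1:
        raise ValueError("confidence must be between 0 and 1")
    return {"category": category, "confidence": float(confidence)}
```

A workflow could treat the `ValueError` as a retryable failure or route it to a human approval step, so a malformed reply becomes a handled branch instead of a crash.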

[48:16] Ryan Lucas: Yeah.

[48:16] Ryan: I do like the idea though that there are, 'cause you know, a lot of fanout models when you're thinking about like, I think there's even some code examples where Amazon's giving us where like you can build like super complicated Step Functions machines with lots of fanout options.

[48:28] Jonathan Baker: Oh, totally.

[48:28] Ryan: And so like, you know, to manage that has been kind of a pain. And so if you can replace some of that with Bedrock for certain decision places where, you know, an agent could do it better, like that could be really cool. But I do agree, there's definitely some potential sharp edges. Yeah.

[48:43] Ryan Lucas: I'm just imagining someone's bill. They do the, I think it was called, it's been years since I've dealt with it specifically, but it was like Express that lets you fan out to like millions of them. And so at one point, long story real fast, I had like one Step Function that then fanned out to multiple other Step Functions that were all doing the same thing. So I had to parallel process a ton of data at once and was using that to kind of process it. And then I hit S3 limits, which broke stuff. But we added up to, I think it was like a couple tens of thousands of Lambdas, you know, executing all at once. I'm just watching, waiting for somebody to do something crazy like that and then watch their AI bill just immediately skyrocket. I don't know why.

[49:26] Jonathan Baker: Yeah, no, it sounds expensive.

[49:29] Ryan: I'm waiting to see the, uh, someone write a Lambda function to Agent Core, then end up in a Lambda loop.

[49:35] Jonathan Baker: Yep. Yeah.

[49:36] Ryan: That's gonna be pretty cool when that happens. Oh yeah, cool for that person, but hopefully not me. Amazon Bedrock AgentCore Runtime now supports interactive shells via a new Invoke-AgentRuntimeCommandShell API, giving developers a PTY-backed terminal over WebSocket directly into a running agent session, complementing the existing one-shot command execution API. This is particularly useful for developers running coding agents like Claude Code or Amazon Kiro, allowing them to inspect files, run ad hoc commands, and debug environment state as if working in a local terminal, with persistent state for environment variables and working directories across commands. Each shell session is identified by runtime session ID and shell ID, enabling manual reconnection after network drops, and a single agent runtime supports up to 10 concurrent shells for watching agents work across multiple branches simultaneously. So basically what you've learned is that Bedrock AgentCore Runtime is ECS or Fargate.

[50:26] Ryan Lucas: Mm-hmm.

[50:27] Ryan: Yep. And you can connect to it to look at data, which is probably not really what you wanted, but if you need to do it, LLM can now do it.

[50:33] Jonathan Baker: Yeah, I laugh 'cause I feel like, so you can, you know, basically exec into the brain of the agent and then I guess ask it questions. Like, I don't, you know, it's sort of like, I get looking at the files that it has in its workspace and stuff that it's using for its context session in that session. Like, I guess that's cool, but it's, I feel like it's a, this is a weird feature to me.

[50:56] Ryan Lucas: It is super weird. Somebody needed it to debug some environment variable or working directory and they were like, oh, we can quickly do this thing 'cause it's running ECS under the hood. We'll just literally change the, you know, AI, the, sorry, the CLI call from, you know, AWS ECS exec to AWS Agent Core exec and we've added a whole new feature, guys. Yeah, no, it's true, Matt, you know it is.

[51:19] Ryan: Well, it is FinOps X week and I am not there this year 'cause I couldn't work it into my busy schedule. But Amazon dropped their announcements already on Monday before the conference even started. Curious to see what Google and Azure announce next week. But first up, AWS Cost Explorer now includes an Analyze with Amazon Q button that generates automatic cost analysis covering trends, top drivers, and anomalies based on whatever filters and time period you have configured, eliminating the need to manually cross-reference multiple data points. I mean, this just put a ton of FinOps vendors out of business right here. The feature adapts its output based on the date range selected, providing historical analysis for past periods, forecasts for future dates, or a combined view for mixed ranges, and maintains conversation context so you can ask follow-up questions to dig deeper. This continues AWS's pattern of embedding Q everywhere. And from a practical standpoint, it is available in all commercial regions at no additional charge, so customers already using Cost Explorer can access it without budget considerations. So if they can explain to us what EC2 other is, it's a win.

[52:18] Ryan Lucas: That will forever be my goal in life. Understand what's an EC2 other. Yeah.

[52:23] Jonathan Baker: I'm sorry, I can't do that, Hal.

[52:26] Ryan: And if that wasn't enough, you're like, but I still need a FinOps person. Amazon FinOps Agent is now in preview at no additional charge, offering an AI-driven tool that answers cost questions, surfaces optimization recommendations, and runs scheduled FinOps workflows directly from the AWS Management Console. The agent integrates with AWS Cost Optimization Hub and AWS Compute Optimizer to surface rightsizing, idle resource, and savings plan recommendations, and can automatically open Jira tickets to route action items to engineering teams. Automated anomaly investigation is a notable capability here, where the agent detects cost spikes, investigates root cause, and posts findings to Slack without requiring manual triage from FinOps or engineering staff. The preview is limited to us-east-1 for the agent itself, though cost and usage data covers all standard AWS regions, including GovCloud and China regions. I mean, this is kind of nice. You know, again, I don't know if it's a full-featured solution for everybody, but it's definitely something that's going to help you get started. And again, making recommendations across your infrastructure is something I've used Claude Code for multiple times, just, you know, on my own. So to have this built into the AWS Console, and if it's going to be, say, free or low cost, not bring my own tokens, I'm gonna use it.

[53:32] Jonathan Baker: Yeah, no, I, you know, I like the idea of rolling this into the existing optimization products. I do question sort of the validity of some of the claims 'cause it's like, you know, what a human FinOps person adds is the ability to do that investigation on the stuff that's not cost anomalies. Like, who owns this stupid thing, and make that decision based on the context. Whereas an AI agent, if it doesn't have that information, and it most likely will not, you know, won't be able to do that. And so it won't be able to file those Jira tickets, and we'll see. I mean, I hope that, you know, as AI gets smarter, it can make those determinations based off of, you know, naming conventions and other things. And obviously if you tag your environment and, you know, have good hygiene there, it'll be easier to use.

[54:15] Ryan Lucas: Though my question with both of these are, isn't Cost Explorer, or maybe I'm wrong, just in US-East-1 by default?

[54:23] Ryan: Like, isn't it not? Yes, but it accesses the data everywhere. So, because the billing system is in us-east-1, so that's where the—

[54:29] Ryan Lucas: Right, but then why does it say the preview is limited to us-east-1 for the agent itself? Oh, I guess you could run the agent in another region, have it connect to us-east-1, and then, okay.

[54:38] Ryan: Yeah.

[54:39] Ryan Lucas: That's just what I, with both of these, it was like, it's available everywhere. I'm like, yes, cost and usage is a centralized thing. You don't have to go to each region to pull your data. So I was just confused by the press announcements there.

[54:51] Ryan: Yeah. All right, let's move on to Google. Google's releasing Gemma 4 12B, a new multimodal model that runs locally on consumer hardware with 16 gigs of VRAM, positioned between the smaller E4B and the larger 26B MoE model in the Gemma 4 family. The model uses an encoder-free architecture, meaning vision inputs are processed through a lightweight embedding module and audio is projected directly into the same dimensional space as text tokens, reducing memory usage and latency compared to traditional separate encoder approaches. Gemma 4 12B is the first mid-sized LLM to support native audio input, and it includes multi-token prediction drafters to reduce inference latency for agentic workloads. If only there was an event this week that might potentially need AI at that scale on a phone, that maybe—

[55:33] Ryan Lucas: Maybe.

[55:33] Ryan: Maybe we'll get to that. We'll get there. Weird. Yeah. Google's also releasing quantization-aware training checkpoints for Gemma 4, which integrate quantization directly into the training process rather than applying it afterwards, resulting in better quality preservation compared to standard post-training quantization approaches. The mobile specialized quantization scheme reduces the Gemma 4 E2B model to under 1 gig of memory by combining static activations, channel-wise quantization, and targeted 2-bit compression for token generation layers, plus embedding and KV cache optimizations. For desktop and server use cases, QAT checkpoints are available in Q4_0 format, with GGUF files ready for llama.cpp and compressed-tensors for vLLM. And I don't know what any of those words mean, but I know it has to do with Hugging Face and running your models locally. So that's good.
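
To make the "fits in 16 gigs of VRAM" point concrete, here is a back-of-the-envelope sketch of weights-only memory at different precisions. It assumes a flat 12 billion parameters and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The 4.5 bits-per-weight figure for Q4_0 assumes the usual GGUF layout of 32 four-bit weights plus one 16-bit scale per block. The helper name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weights-only memory in decimal gigabytes."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9

# Assumed precisions: 16-bit, 8-bit, Q4_0 (about 4.5 bits per weight), plain 4-bit.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("Q4_0", 4.5), ("4-bit", 4)]:
    print(f"12B at {label}: {weight_memory_gb(12, bits):.2f} GB")
```

On these assumptions, 16-bit weights alone (24 GB) would not fit in 16 GB of VRAM, while the 8-bit and 4-bit variants would, which is why quantized checkpoints matter for local use.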

[56:17] Jonathan Baker: Yeah. These are things I need Jonathan for. Like, I don't, I don't really quite understand the encoder approach to, you know, processing images or audio. And I'm just like, oh, that's cool. I haven't really thought about how any of that happens. And so I guess it's now it runs on the smaller model that maybe I can run on my own hardware, which, you know, doesn't, isn't very.

[56:36] Ryan: Mm-hmm. And these, uh, Gemma models might be important for WWDC, which happened to be yesterday. And Google announced that Apple developers can now access cloud-hosted Gemini models through Apple's Foundation Models framework via the Firebase Apple SDK, starting with iOS 27, macOS 27, and related platforms. The new integration allows developers to swap between on-device Apple models and cloud-hosted Gemini models using the same API surface, which simplifies building agentic app experiences. The integration is built on Firebase AI Logic, which removes the need to build and maintain a separate backend server for Gemini model access. And Firebase App Check is included to protect service APIs from abuse, addressing a common production security concern. Gemini is also being integrated directly into Xcode as an agentic coding assistant for multi-step development tasks like code review, bug fixing, and feature building. Authentication supports both individual developers using a self-serve Gemini API key from Google AI Studio and enterprise teams using the Gemini Enterprise Agent platform. Pricing has two tiers. Individual developers can start with a free tier through Google AI Studio, while enterprise developers access dedicated corporate quotas through the Gemini Enterprise Agent platform. This is a practical option for iOS and macOS developers who want to add cloud AI capabilities without leaving the Apple development ecosystem or managing separate infrastructure. And then Apple also announced that, uh, while it is going to Gemini, it's on exclusive hardware for Apple so that it isn't necessarily being decrypted or used by Google in any way, shape, or form.

[57:57] Jonathan Baker: Yeah. No, I mean, I still have, I love the Apple-Google partnership on this. I'm really happy that Apple didn't decide to develop their own frontier model and just muddy that space.

[58:12] Ryan: I mean, frankly, I just don't think they have the data to train a model of that size.

[58:16] Jonathan Baker: That's interesting. I mean, I get it. It does make sense to me. It's just seems crazy 'cause Apple's such a big company.

[58:24] Ryan: But yeah, I mean, I just— You're right. I mean, it's interesting 'cause they originally, they originally partnered with OpenAI, you know, 2 years ago. Mm-hmm. And that partnership never really went anywhere. And so the fact that they went to Gemini is interesting. I've, you know, especially considering Gemini is so heavily embedded into Android. Mm-hmm. And all the Android phones, I would've thought that they would've tried to stay away from Gemini, but they must've got a really good deal.

[58:44] Jonathan Baker: Well, the, I mean, the OpenAI integration for like stuff like Apple Shortcuts and stuff like that is awful. Like it just, like, and so whenever I can, I use the on-device models, but you know, it's only capable of doing very rudimentary things. And so I'm happy to see this because I think it'll sort of be that. I'm, well, I guess I'm hopeful. Could be just as bad for all I know.

[59:05] Ryan: Yeah. I mean, definitely the, a lot of friction when you end up in a spot where you want to go to the OpenAI model from, you know, Apple Intelligence, it's like, it has to prompt you and you have to approve it and it takes forever.

[59:16] Jonathan Baker: Yeah.

[59:16] Ryan: But yeah, in general, the one thing that was interesting was watching the keynote for WWDC, you know, and Siri AI, which is using Gemini in the backend, is how slow it is. So you have actual proof that Gemini is slow for a lot of these use cases, which is one thing I noticed and I complain about all the time with Gemini is like, it's kind of slow compared to Anthropic or even OpenAI. So hopefully Apple puts some pressure on them to make that better. 'Cause it was sort of awkward in the video. But also humbling that it was truthful.

[59:47] Ryan Lucas: Yeah.

[59:47] Ryan: Because when they first announced all their AI features, it was like magic was happening. I'm like, no, no AI works that fast. So I appreciate the more honesty, but maybe we can speed it up just a, just a skoosh.

[59:57] Jonathan Baker: Well, I've had the same experience on like the Google Home devices and the Alexa Plus devices too, where I'm just like, come on, you're so slow.

[60:06] Ryan Lucas: I feel like it's gotten a little bit faster recently, at least on my Android, 'cause I'm the only Android person here. You know, 'cause I'm missing my Jonathan backup on this.

[60:14] Ryan: Mm-hmm.

[60:14] Ryan Lucas: It's all Gemini. And every time I'm like, it's so integrated in that like, I'm like, here, take a picture and tell me what it is. And it's like, then I remember sometimes how bad Gemini models are when I have to use it. I'm like, yeah, no, I'm just gonna stop and go pivot to this other thing and go back to Claude because I do feel like Gemini is still lacking in a lot of the stuff. Like some of the general quick stuff, it's fine. But I was doing some plumbing work, and it's like, turn off the plumbing to your house. I was like, but in the picture there's the knob right there to turn off the plumbing to the faucet. Like, why are you telling me to turn off the whole house first? Like, it's not wrong, it's just why?

[60:55] Jonathan Baker: Because statistically it's more probable that, you know, you need to turn off the water for a little while. Yeah, it is.

[60:59] Ryan: Yeah. It's less risk for, uh, for them. Yeah.

[61:02] Jonathan Baker: Oh no, it's just, it's just occurs more.

[61:04] Ryan Lucas: Yeah. Yeah.

[61:05] Ryan: In the dataset, like that's— In the dataset.

[61:07] Jonathan Baker: It's not any risk level, no. It's just the, yeah.

[61:11] Ryan: Well, it's like, why does everything go racist and hate people? It's like, 'cause it feeds off Reddit and the rest of the internet, you know?

[61:16] Ryan Lucas: Yep.

[61:17] Ryan: Which is a scum of high, you know, villainy and terribleness. So, all right, Azure. The new Azure Cobalt 200 ARM-based VMs are now in early access preview, built on the ARM Neoverse V3 core and fabricated on TSMC's 3-nanometer process, delivering up to 50% better CPU performance over the Cobalt 100 with up to 128 vCPUs per VM. Real workload benchmarks show up to 135% better performance for database workloads and up to 80% better performance for caching workloads compared to the previous generation. The VMs are specifically designed for agentic AI workloads where continuous reasoning and sequential decision-making require sustained per-core performance and low latency. Each physical core gets a dedicated 3 megabytes of L2 cache, and there's a 192-megabyte system-level L3 cache, allowing more agent sandboxes per VM without sacrificing throughput. Cobalt 200 expands the ARM VM portfolio with two new families beyond the Cobalt 100 offerings, the high memory optimized MPS v4 and the dense local storage LPS v5, with all series delivering up to 85 gigabits of network bandwidth and 70 gigabits of remote storage throughput. They're saying even their own services like Azure SQL Database are validating Cobalt 200, with Dataverse reporting up to 60% better performance over Cobalt 100. So it's a pretty good improvement.

[62:27] Jonathan Baker: Oh, they also mentioned GitHub Actions runners. So I wonder how they validated that since they're not working most of the time.

[62:34] Ryan Lucas: I mean, it's great that they added this and, you know, I feel like they're finally getting into the game of ARM. Getting capacity for them might require, you know, some twisting of your account team's arm, especially if you want them at any scale. But the other problem, at least, which I still find comical, is you can't run Windows Server on ARM. So—

[62:55] Ryan: I know they did have a Windows Server ARM version, didn't they?

[62:58] Ryan Lucas: Not that I know of. Maybe that's like 6-month-old information, but you know, you could run the desktop and everything else, but you couldn't run the ARM in Azure. Maybe it was the in-Azure piece. No.

[63:10] Ryan: It's the Windows Server. So it only came on Windows Server 2025. So if you were running anything older than that, you can't run ARM. But yeah.

[63:16] Ryan Lucas: No, because I'm seeing something from February 13th. Unfortunately, Windows Server is not available on ARM64, and there's no plans to do that. I have seen it did. On some random TechNet website.

[63:28] Ryan: All right, well maybe, I don't know, that's a follow-up research later.

[63:32] Ryan Lucas: Yeah, but you know, it also means that if they're running all these other things, I'm like, so you're not running Azure SQL on Windows, you're definitely running the Linux flavor and that you have built and everything else. Like there's a lot of like little nuances you kind of glean from this article with that.

[63:47] Ryan: Yeah, I guess, I mean, I guess SQL Server could run on top of Linux ARM. Um, yeah. Which is probably what they're doing. Which is probably what they're doing for sure. But yeah, no, it's, it's sort of weird because they have a Windows 11 ARM-based operating system now.

[63:59] Ryan Lucas: They don't have Server.

[64:00] Ryan: Yeah. It's weird. I wonder why they decided not to do that.

[64:03] Ryan Lucas: I, I started down that path at one point in my day job and I, I found that out the hard way when we tried to go launch it and it didn't work.

[64:12] Ryan: Yeah. So Windows Server 2025 does have an ARM version, but it's only supported on top of Azure and some OEMs who've signed some partnership with Microsoft. It's not available for, I can't purchase it and put it on a computer at home with ARM.

[64:25] Ryan Lucas: Oh, so you now can do it in Azure? 'Cause for a while you couldn't.

[64:29] Ryan: It exists in Azure, but only in Azure or specific OEMs.

[64:32] Ryan Lucas: Oh, 'cause for a while you couldn't do that. Send me that link.

[64:36] Ryan: Okay, will. Yep. Foundry IQ is Microsoft's unified knowledge platform for AI agents, now generally available with full SLA coverage, stable APIs, and compliance certification. It lets developers connect multiple data sources like Azure Blob Storage, OneLake, and web content into a single knowledge base without building custom connectors for each system. The new serverless developer tier in public preview scales to zero when idle and bills by compute units measured in 0.25 compute unit increments per minute. Billing is not expected to begin until late 2026, so developers can experiment at no cost for now. Agentic retrieval quality improvements show up to 20% better answer quality on benchmarks and up to 54% improved recall compared to single-shot RAG. The Foundry IQ MCP server exposes knowledge bases as a remote Model Context Protocol server, making them accessible from Claude, ChatGPT, LangChain, and the Microsoft Agent Framework. So you guys finally got Q for enterprise. Good.
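
For a feel of how billing in 0.25 compute unit increments per minute with scale-to-zero might add up, here is a minimal sketch. It assumes usage is rounded up to the next 0.25 CU each minute and that idle minutes bill nothing. Neither the rounding direction nor any price per CU is stated in the announcement, so the helper names and sample numbers are purely illustrative.

```python
import math

CU_STEP = 0.25  # billing increment mentioned in the announcement

def billed_cu(usage_cu: float) -> float:
    """Round one minute of usage up to the next 0.25 CU (assumed behavior)."""
    if usage_cu <= 0:
        return 0.0  # scaled to zero: nothing billed for an idle minute
    return math.ceil(usage_cu / CU_STEP) * CU_STEP

def billed_cu_minutes(per_minute_usage) -> float:
    """Total billed CU-minutes across a series of per-minute measurements."""
    return sum(billed_cu(u) for u in per_minute_usage)

# Made-up usage: idle, light, moderate, a full CU, then idle again.
usage = [0.0, 0.1, 0.3, 1.0, 0.0]
print(billed_cu_minutes(usage))
```

Multiplying the result by a per-CU-minute rate would give a dollar figure once Microsoft publishes pricing.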

[65:27] Ryan Lucas: Yeah. And they didn't call it, ooh, wow, it's late. Uh, Copilot.

[65:32] Jonathan Baker: I'm really hoping that they didn't just write an article about how to set up like a persistent memory store for your AI agent.

[65:39] Ryan: It kind of looks like they did.

[65:40] Jonathan Baker: It does, doesn't it? And I'm like, ah.

[65:42] Ryan Lucas: The interesting piece of this for me is they built it out, and one thing I do kind of like about how Azure does it, which is a double-edged sword, is they give you a lot of these things that are in private or public beta for free for a couple months, which I find interesting, which is different than a lot of the other cloud providers. So, you know, maybe it's here's your free hit and, you know, we know you're gonna come back, but on the flip side, it gives you a little bit more time to play with things and whatnot versus the other cloud providers. You always have to pay a little bit upfront, which makes sense, but you know, sometimes that I think causes teams to not play as much with new technologies, especially something like this. That's, I'm not going to call it bleeding edge, but bleeding edge for Azure. How about that?

[66:28] Jonathan Baker: So this, it does look like it would kind of natively work across like the Microsoft 365 and sort of the, what do they call it? The Fabric data. So I can see how this would be easy for like an IT org to build that corporate dataset. 'Cause it's kind of just like a bit of a click button. So that's kind of neat. 'Cause that's a, you know, a very—

[66:54] Ryan: Very common use case.

[66:56] Jonathan Baker: Yeah.

[66:57] Ryan: When will they replace SharePoint with AI? That'd be great. Could they do that?

[67:00] Jonathan Baker: Hopefully never.

[67:03] Ryan Lucas: I was just saying, it doesn't have it?

[67:05] Ryan: As long as they don't add AI to SharePoint, as long as they replace the SharePoint with the AI, not replace.

[67:09] Jonathan Baker: Oh, sorry, I thought you were asking for AI in SharePoint. Yeah, no, I agree. No, if I could never use SharePoint and I just interacted with AI, sure.

[67:17] Ryan: I mean, it's kind of like S3 though. It's sort of hidden everywhere.

[67:20] Ryan Lucas: Yeah, everything. Teams is SharePoint under the hood.

[67:23] Ryan: Yeah, I know.

[67:24] Ryan Lucas: Like, it's kind of amazing that I never really understood until I was talking to somebody one day that like, every Teams team is a SharePoint site. Yep. And then each thing is like, it's like this weird nested compilation of what probably S3 feels like to somebody inside of AWS where it's just everywhere.

[67:43] Jonathan Baker: Except with security groups.

[67:45] Ryan: And that's, that's the reason why it's broken, is because SharePoint and Teams are mirrored together in such a terrible way. So Azure Database for PostgreSQL Flexible Server now supports the DuckDB extension in general availability, allowing users to run analytical workloads directly within their PostgreSQL environment without moving data to a separate system. DuckDB is an in-process analytical database engine optimized for OLAP queries. So this extension lets PostgreSQL users run fast column-oriented analytics alongside their transactional workloads in the same managed service. I remember when there were companies that made nothing but columnar databases. Now you just get it as an extension on top of Postgres. Kind of impressive.

[68:23] Ryan Lucas: Yeah.

[68:23] Ryan: Bet those companies aren't doing well these days.

[68:26] Ryan Lucas: Do you remember any of them?

[68:29] Ryan: Ooh, there's one owned by HP, uh, which who knows what that is now, cuz that was back before they spun off a bunch of things. Vertica, that was the Vertica database. Um, yeah, I don't know what happened to that though.

[68:42] Ryan Lucas: I just like the name DuckDB. I'm not gonna lie. That's half the reason I put the article in there.

[68:47] Ryan: Apparently Rocket Software has, uh, purchased Vertica. From HP or Open— they apparently HP sent it off to OpenText. Oh, that's a, that's a long complicated journey. I'll have to look into later. That's crazy.

[69:01] Ryan Lucas: I haven't heard OpenText in years.

[69:04] Ryan: They're one of those companies that kind of like started buying assets and then they go there to die.

[69:10] Ryan Lucas: Yeah. Yeah. I know a few companies.

[69:13] Ryan: Azure's global provisioned throughput unit reservations are now region agnostic as of June 2026, meaning a single reservation can cover AI model deployments across multiple regions instead of requiring separate per-region commitments. Well, good. I'm glad you learned what the word global means, Azure.

[69:30] Ryan Lucas: So their global OpenAI is interesting 'cause certain models are only available in the global, which means they'll run, I wanna say in Sweden or East US.

[69:42] Ryan: And do they route traffic based on proximity to the endpoint or is it specific, routes it all back to one specific endpoint?

[69:48] Ryan Lucas: It was to wherever they felt like it. Yeah. So I assume capacity and everything else.

[69:53] Ryan: Yeah, 'cause we're in this fun moment where the government, US government is pissed off the rest of the world and the rest of the world doesn't like the US government. And so we have this data sovereignty problem, but then—

[70:03] Ryan Lucas: Yeah.

[70:04] Ryan: Which is fine, you know, and like you can deal with that 'cause there's all kinds of solutions and vendors, but then you get to AI and it's like, wait a minute, GPU capacity shortages mean that sovereign is hard to do. Unless you're running open models and you're willing to spin up your own hardware, or at least get GPU capacity dedicated to you that you can run those open models on. So, you know, it's a bit of a problem if you're trying to do anything around like sovereign data, but we want to do AI, like they're kind of juxtaposed in this exact moment.

[70:34] Jonathan Baker: You better have all that capacity in the local region. Yeah.

[70:37] Ryan Lucas: Yeah.

[70:37] Jonathan Baker: Good luck.

[70:38] Ryan Lucas: So Azure, either they were talking about it or they released it or it was in preview. I don't remember where it was. It was in one of their states at one point. Where they had, which I thought was actually pretty interesting, they had global, which was we'll route to wherever you feel, they, we feel like it. Then they had by region that you could set up too, which I think was just called standard. So you would say this is in this region, this is in UK South or UK North or whatever they all are. Okay. But then they also were talking about or released, I think data sovereignty ones that would stay in the EU or in the other or in the world. So they were trying to kind of build that out because in Europe, most of the time, as long as you're in the EU, people are happy, you know? So they were trying to essentially keep it out.

[71:25] Jonathan Baker: Well, that used to be the case with GDPR, but I don't know, as more and more countries develop more tighter data sovereignty, I don't know if that'll continue to be the case.

[71:33] Ryan Lucas: Right, like, you know, India's the most recent one, I think it's gonna require a lot more stuff in India.

[71:38] Ryan: Mm-hmm.

[71:38] Jonathan Baker: The S3NS thing in France, like, it's coming. Yeah, I don't know. I, I like the, I like the idea. You know, I, I do like when I have that option to sort of fall back, right? So I can do data processing in region primarily, and then, it's like when there's not capacity, it falls back to a more global or regional endpoint. That's kind of neat, but I don't know if that's, it doesn't really seem like that's covered by this, so.

[72:03] Ryan Lucas: No, but it's more just their global one in general. So like if you were on like one, I'm gonna just say one of the older models, like 5 or like 4.1 that was available in all the regions, then it would just go wherever the hell they felt like here. If you were on some of the newer models that were only in like, you know, 2 or 3 of the regions, then you, then like, then it did the routing for you based on what was there.

[72:27] Ryan: Interesting. Well, uh, in a story that Ryan and I wanted to kill and Matt insisted on, it's here.

[72:35] Ryan Lucas: Oh, I've felt the pain of this one. That's why.

[72:39] Ryan: Yeah. Azure API Management Premium V2 and Standard V2 now support wildcard custom hostnames, meaning a single entry like *.api.contoso.com and one wildcard certificate can cover all subdomains automatically instead of requiring separate configurations per subdomain. The practical benefit is reduced operational overhead at scale. A team onboarding 10 new API services previously needed 10 separate domain and certificate management tasks. That is kind of a pain. And the wildcard eliminates the repetitive work. This capability is now available on both Standard V2 and Premium V2 tiers. That's nice. They didn't lock that behind premium because that would've just been a dick move.
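
To show what a single entry like *.api.contoso.com would and would not cover, here is a minimal sketch of the usual TLS-style wildcard rule, where the asterisk stands for exactly one leftmost label. Whether Azure API Management applies precisely this rule is an assumption, and `matches_wildcard` is a hypothetical helper, not part of any Azure SDK.

```python
def matches_wildcard(pattern: str, hostname: str) -> bool:
    """Check a hostname against a pattern such as '*.api.contoso.com'.

    Assumed rule: '*' matches exactly one leftmost DNS label, so the bare
    parent domain and deeper subdomains do not match.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname  # no wildcard: exact match only
    suffix = pattern[1:]  # e.g. ".api.contoso.com"
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    return bool(label) and "." not in label

for host in ["orders.api.contoso.com", "api.contoso.com", "a.b.api.contoso.com"]:
    print(host, matches_wildcard("*.api.contoso.com", host))
```

Under this rule, each of the 10 new services in the example gets a hostname one label below api.contoso.com and is covered by the one entry and one certificate.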

[73:13] Ryan Lucas: No, no, no. It was, wait, wait, wait, wait, wait a second. It was on premium forever. So if you wanted to use it, you were on the $3,000 a month one. And it may or may not have been the only reason why I was on premium. And it really annoyed me. I couldn't go to the $600 a month one or whatever it was. Because it was only available on premium. And I really want to understand, besides somebody being a dick, what the technical constraint was of why I couldn't put my wildcard certificate on the standard one, but I could on the premium one.

[73:44] Ryan: I mean, weren't you the guy who was telling us how you couldn't update Azure Front Door for like weeks at a time? And you're surprised this is a problem?

[73:52] Ryan Lucas: Yeah. All right, I see what it is now.

[73:55] Ryan: Okay.

[73:56] Ryan Lucas: You're just gonna make fun of me. Got it.

[73:59] Ryan: No, I'm just saying, I'm just saying, like, I mean, like, you ask this question, like, you don't know, but I'm like, but you know that there was like months there where you had to do like coordinate your front door updates. So you—

[74:07] Ryan Lucas: Because they took 45 minutes to update, which is like old school CloudFront and Redis. Yeah. Yeah, no, it was, it just was one of those things that like we did our testing, we made sure everything else, and we came to this one last thing and you're like, this stupid feature is why I can't do this. This is driving me crazy. And the uptick between one to the other is like 5x. It's not like, it's like, oh, this one's like a dollar an hour and this one's like a dollar and 5 cents an hour. No, it's like one's a dollar and one's $5 an hour.

[74:41] Jonathan Baker: I mean, I guess, you know, like I know that with every centralized API gateway, you know, the biggest business friction is onboarding new APIs to it. So I guess this makes it easier, but I can think of like 9 different ways that solve this problem with not needing a new service. So I'm like, okay, whatever.

[75:00] Ryan Lucas: I mean, essentially, I think I did the math at one point. If you're on the premium at, let's just say $3,000 a month, that's one, that's no HA. So if you really want an HA, you're at $6,000. Or I could have 2 in HA as standard. I'd still have a lot of money left cuz it's like $500 a month. So you're at $1,000, so you're still cheaper. So like, I just have a lot of scar tissue around this one.
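
The math here can be written out as a quick sketch. The prices are the rough figures quoted in the conversation (about $3,000 a month for premium and about $500 for standard), not official list prices, and the two-instance HA setup is the assumption described above.

```python
# Approximate monthly prices per instance as quoted in the conversation.
PREMIUM_PER_INSTANCE = 3000
STANDARD_PER_INSTANCE = 500

def monthly_cost(price_per_instance: int, instances: int) -> int:
    """Monthly cost for a given number of identical instances."""
    return price_per_instance * instances

premium_ha = monthly_cost(PREMIUM_PER_INSTANCE, 2)    # two instances for HA
standard_ha = monthly_cost(STANDARD_PER_INSTANCE, 2)  # two instances for HA

print(f"Premium HA:  ${premium_ha}/month")
print(f"Standard HA: ${standard_ha}/month")
print(f"Difference:  ${premium_ha - standard_ha}/month")
```

On those numbers, the wildcard feature alone accounted for the gap between roughly $1,000 and $6,000 a month for an HA pair.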

[75:26] Ryan: Mm-hmm. Sounds like it. Well, I mean, luckily you don't have to do much with Azure anymore, so you're welcome. You're still—

[75:33] Ryan Lucas: Thank you.

[75:33] Ryan: Still on the hook for it for the show, but—

[75:35] Jonathan Baker: Yeah, yeah.

[75:35] Ryan: You know, on a day-to-day basis, you don't have to, which is nice.

[75:38] Ryan Lucas: I know, I have to read the feeds and everything else though. With the Bolt feature, I figured out an easier way to integrate into my phone, so. I can now do, now as I do bedtime, I can, you know.

[75:51] Ryan: I like how you made that sound like it's like this impressive feature. It's, it's webhooks. I built a webhook endpoint and I, I've been using it for a month and you're like, how do I use this webhook? And I'm like, here you go. And you're like, this is amazing. I'm like, you're welcome. It's, it's 1998 technology, but it's good.

[76:07] Ryan Lucas: But then this past week I figured out how to better integrate, better get the Shortcuts app working on my phone. And how to link the actual article for Azure, because there's just that feed. You can't just go through if you go to what's new. So like I found like there's another button you can expand and then it quickly does it. And I was like, oh, this was convenient.

[76:29] Ryan: Nice. Yeah, I, I, I should actually probably add to the webhook, the ability to pick a topic. Because I know, I know a lot of yours, uh, it doesn't always detect the topic properly and I had to choose it manually. And it only works if you do it within the first 15 minutes, which I'm always on my computer, so it works out just fine, I guess. But sometimes they time out and like, ah, I missed that one.

[76:49] Jonathan Baker: Oh, that's why I had to correct a lot of the hyperlinks. Now I get it.

[76:53] Ryan: You had to fix the hyperlinks.

[76:55] Ryan Lucas: Mm-hmm. There were some manually extracted ones as well.

[76:57] Ryan: Oh yeah. Oh yeah. The manual extraction ones are, so the web is very anti-AI. I don't know if you know this or not.

[77:03] Jonathan Baker: Oh yeah.

[77:04] Ryan: But many of the sites, including OpenAI, don't like if you are trying to web scrape them to get data to fill out, uh, show notes. And, uh, yeah. So when that happens, you get a notice, you get a message from Bolt where he's very angry, very, uh, borderline Skynet raging at the website and saying, please manually extract it. And when it does that, it does not, it can't pick up the article titles. Yeah. You have to, yeah, you have to fix those manually. Ah, okay. And you beat me to it cuz I hadn't done it yet.

[77:31] Ryan Lucas: Yeah. Yeah. It also shows how often Ryan adds, uh, articles to the show notes.

[77:35] Ryan: I don't think Ryan's ever added an article to the show notes.

[77:37] Jonathan Baker: No, there's like once I did.

[77:40] Ryan Lucas: Was it an accident?

[77:41] Ryan: Okay, so Jonathan's at zero and Ryan's at one. We should have this as a metric. We could do this with Bolt.

[77:48] Jonathan Baker: I mean, if there's ever, if there's ever anything that I think is interesting that you guys don't already have in there, it's just never happened.

[77:54] Ryan: So yeah, I think it's a couple times you've had stuff.

[77:57] Ryan Lucas: I don't think this is a fair stat 'cause you're gonna be like 500,000. I'll be at like, like 1,000. That's true. And then there'll be 1 and 0. Yeah.

[78:05] Ryan: That's fair. Yeah, you're right. It's a very biased stat.

[78:08] Ryan Lucas: You have a better process where you go through it, but like my brain's not in podcast mode and you know, as soon as we're done recording, I'm done until like Sunday night when I'm like, oh, we wait, we have to do this on Tuesday.

[78:20] Ryan: It's really just 'cause I've been a fan, I've been using RSS feeds since I started using the internet basically. 'Cause I was like, this is awesome. And so I've always been a big RSS feed guy. And so that's how I have all these things coming into RSS. And then now we actually have a web scraper in Bolt. So for the websites that don't have RSS, because the new AI companies think they're fancy and they don't need RSS because they're new and modern. And so they don't have RSS in all cases. And so I have a web scraper that fixes those. So, uh, fighting against Cloudflare is, uh, it's kind of a nightmare.

[78:51] Jonathan Baker: Oh, I hate the AI detection.

[78:53] Ryan Lucas: Yeah.

[78:53] Ryan: When you're just trying to, I just wanna get your article. I don't care that you're trying. And it's, it's just, and like sometimes it's as dumb as like even API documentation, which is the place where you would want AI to be able to access it, sometimes doesn't work. You're like, right.

[79:05] Jonathan Baker: I think it's just being, it's being applied with a big hammer.

[79:08] Ryan: It's a marketing person who doesn't know what they're doing and they're just blocking these things willy-nilly without thinking it through. Like even, yeah, there was someone, I think it was Amazon, like they put some update to the WAF and it was blocking any system accessing RSS feed. I'm like, you understand what RSS is for, correct?

[79:24] Ryan Lucas: Yeah.

[79:25] Ryan: So yeah, like, yeah, these things happen.

[79:27] Jonathan Baker: And a lot of time it's an infrastructure team, right? It's not the people tied to the application function. So they don't know, they're just applying it to a domain, right?

[79:35] Ryan: Correct, yes. All right, we have a neocloud article this week from CoreWeave. CoreWeave Mission Control is a new AI-native observability platform that provides end-to-end visibility across infrastructure, clusters, and workloads, addressing a gap that general-purpose monitoring tools often miss in GPU-heavy environments. The platform combines real-time telemetry with GPU utilization analytics, which is particularly relevant as organizations struggle to justify and optimize the cost of large-scale GPU deployments. Audit-ready logging and automated operational insights suggest the platform is targeting enterprise customers who need compliance documentation alongside performance monitoring, not just raw metrics. The full-stack framing here is notable because AI workloads span multiple layers simultaneously, from bare-metal GPU performance up through cluster orchestration and individual job execution, making siloed monitoring tools less effective. For teams running inference or training at scale on CoreWeave, tighter observability tooling built into the platform could reduce the engineering overhead of stitching together third-party solutions like Prometheus, Grafana, and custom GPU exporters. Which yes, if they would just open source this, that'd be amazing.

[80:36] Jonathan Baker: Yeah.

[80:36] Ryan: So CoreWeave, I really appreciate you doing this and telling us about how we should be doing this. And if you could open source this, uh, in some way or enhance Grafana to make this easier, that'd be really great.

[80:48] Jonathan Baker: Yeah, cuz they're not like an observability company, right? Typically.

[80:52] Ryan: No, they, no, CoreWeave sells GPUs. They're a neoscaler, as they call them, which I assume means new scaler, and which means they're not a hyperscaler yet cuz they're too small. So basically they've created this camp of, uh, these guys who are neoscalers, and they all essentially specialize in GPU capacity.

[81:10] Jonathan Baker: So I wonder if it only works on their platform.

[81:12] Ryan Lucas: I assume so.

[81:13] Ryan: I, my guess is that the ideas will go to other places where if they open source some of this, uh, you could. But yes, I'm going to say most of this is only available to you on theirs.

[81:22] Jonathan Baker: Well, because you can open source the software, but you can't open source what it's reading. Like it's—

[81:26] Ryan: Correct. I mean, they're just buying NVIDIA GPUs. They're not doing anything custom. So, you know, if it's an API they can hook into on NVIDIA, they're able to get the data. So yeah, that's neat. Yeah, we've been trying to add more of the neoscalers, but, uh, if you think Azure's basic sometimes, like, wow, that's caught up to Azure or to AWS and Google, like the neoscalers are really catching up. Like, we introduced IAM. Oh, that's cute. You're a sweet summer child. Nice.

[81:56] Jonathan Baker: Now announcing the use of SSO.

[81:58] Ryan: Well, we introduced this new concept called a virtual private network. Like, mm-hmm, yeah, VPCs. Okay, cool. Thanks, kids. Cool, cool, cool. I see why you're considered new scalers. Got it. All right, gentlemen, that is it for another fantastic week here in the cloud.

[82:14] Jonathan Baker: Bye everybody.

[82:15] Ryan Lucas: See ya.

[82:17] Jonathan Baker: Another week of cloud news wrapped up. Bolt will collect the news. Justin will get the notes. Jonathan will write some code. Ryan will watch the perimeter, and Matt will reluctantly watch Azure. Till next week, for AI, Amazon, Google Cloud, and Azure. And hey, maybe even Oracle, who knows? Check out thecloudpod.net for our newsletter. Join our Slack, message us on socials, or leave a review.