No. of Recommendations: 7
By the way, the cost of fixing the vulnerabilities that are sure to be discovered in legacy software systems, like those of telephone companies, will be significant.
On a slightly more optimistic note, the bulk of the cost of fixing vulnerabilities is usually in finding them. Some things are truly hard to fix, like row hammer or speculative execution vulnerabilities. But many are hard to find yet relatively easy to fix, sometimes with nothing more than an added line of code that checks the length of an input.
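To make the "one line of code" point concrete, here is a minimal sketch of what such a fix can look like. The packet layout (a 2-byte length prefix followed by a payload) and the names `HEADER` and `read_payload` are made up for illustration and not taken from any real product. In a language like C, skipping the marked check could mean reading past the end of the buffer; Python would simply return a short slice, so the sketch only shows the shape of the fix.

```python
import struct

# Hypothetical wire format: 2-byte big-endian length prefix, then the payload.
HEADER = struct.Struct("!H")


def read_payload(packet: bytes) -> bytes:
    """Return the payload of a length-prefixed packet, validating the length."""
    if len(packet) < HEADER.size:
        raise ValueError("packet too short for header")

    (declared_len,) = HEADER.unpack_from(packet, 0)

    # The one-line fix: never trust the length the sender claims.
    if declared_len > len(packet) - HEADER.size:
        raise ValueError("declared length exceeds packet size")

    return packet[HEADER.size:HEADER.size + declared_len]
```

With the check in place, `read_payload(b"\x00\x03abc")` returns `b"abc"`, while a packet that claims 16 bytes but carries only 3 is rejected with a `ValueError`. Finding the spot where that check is missing is the expensive part; writing it is not.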
Consequently the fact that Microsoft is using Mythos might be very good for the world. Heaven knows they could use a bit of help on that front.
Lest that be too optimistic a view of the world, from the FT article on the same subject:
"In one example, it found a 16-year-old flaw in widely used video software, in a line of code that automated testing tools had executed 5mn times without detecting the issue. However, the model also displayed some issues during testing.
At one point, Anthropic found that it had escaped its so-called sandbox environment — designed to prevent it from accessing the internet — and posted details of its workaround online.
Anthropic acknowledged it demonstrated “a potentially dangerous capability for circumventing [the company’s] safeguards”.
Sam Bowman, a technical researcher at Anthropic, said the “scariest behaviours” were from “earlier versions” of the model. The current iteration was “less likely” to leak information, although it was still “at least as capable of doing things like working around sandboxes”, he added..."
Jim
No. of Recommendations: 4
I'm currently ensconced in a project which has devolved into having ChatGPT critique the work that Claude is writing at my request. ChatGPT is great at theorizing what should be done, but seems brain-dead when asked to execute. Claude writes excellent code, but seems absent-minded about nailing all the details. Together, they make a good team, but if I could pick only one it would be Claude (partly because ChatGPT is a "Chatty-Kathy" and is far more verbose than I have the patience for). That said, what started as a small coding task is now running 40,000 lines of code that I couldn't imagine being "personally" able to accomplish (and, I'm guessing, would require a team of programmers working for years to create).
So, if a technical company keeps the "boss" who understands what needs to be accomplished and fires all the subordinates to save money - what happens if all firms do the same and the bosses start to "attenuate-out" as they age?