AI, Moore's Law,
& the death of the Architect
…at least as we knew her. The concept of “application architecture” has been around for decades, but only recently did an actual “architect” role finally enter our collective org charts. Thank the cloud. The idea that one person could be responsible for the correctness of the technology and for planning how other engineers would “fill in the blanks” was, rightfully, attractive. With cloud infrastructure, tooling, and practice all converging to create only a handful of meaningful getting-started choices, the Architect became a shining icon of risk-reduction. And for a brief and magical time, software-focused R&D turned into just “D”.
As AI took center stage, organizations were quick to file it under the heading “cloud,” and the Architect’s purview expanded. But then, as if in slow motion, the rules of engagement changed in ways that the Architect could never have anticipated.
Prior to 2012, progress in AI R&D closely tracked Moore’s Law, with compute doubling every two years. Post-2012, compute has been doubling every 3.4 months (1), contributing to an explosion in AI research and a new state of the art every week.
Image credit: OpenAI (1)
Meanwhile, a strong competing insurgency was also taking place in application development. During that same period, sensors exploded in number and capability. Data went from being sparse (think web requests) to incredibly dense and numerical. Software shifted away from deterministic business logic toward non-deterministic, computational codes (2). Finally, and maybe most importantly, the applications of highest value kept popping up in places where Moore’s Law couldn’t do all the heavy lifting. Specialized devices – edge compute – eclipsed virtual machines as the key vehicles of value delivery. In the end, the very nature of AI applications prevented the Architect from making use of her most powerful weapon: the persistent doubling of centralized computing power.
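To put the two doubling periods quoted above side by side, here is a minimal sketch that converts each one into growth per year. The function name and the twelve-month window are illustrative choices, not anything taken from the OpenAI analysis.

```python
# Annual compute growth implied by a doubling period, using the two
# doubling periods quoted in the text (24 months vs. 3.4 months).

def growth_factor(doubling_months: float, elapsed_months: float = 12.0) -> float:
    """Return the multiplicative growth over elapsed_months."""
    return 2.0 ** (elapsed_months / doubling_months)

moores_law = growth_factor(24.0)  # roughly 1.4x per year
post_2012 = growth_factor(3.4)    # roughly 11.5x per year

print(f"Doubling every 24 months:  {moores_law:.2f}x per year")
print(f"Doubling every 3.4 months: {post_2012:.2f}x per year")
```

In other words, a year on the post-2012 curve delivers roughly the growth that Moore’s Law needs about seven years to match.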
The fall of the Architect’s standard meant an immediate redistribution of risk to technical contributors that weren’t even on the field in the previous campaign. “Scientist” reappeared on the org chart, along with a data-munging support staff. Developers started sharpening metal and writing C code like it was 1999 again. It’s a different world, and yet here you are looking for a champion among the rubble. Why?
Fog of war
The problem is that modern AI application teams are still missing their battle captain. Day in and day out, a lack of situational awareness is tangible and pervasive. No single contributor is ever sure that their piece of the technology is correct. Critical things are falling between roles, incentivizing competition over cooperation, and generally spewing fog of war (3):
Software optimization. Data scientists drive hard to meet accuracy requirements, picking algorithms and training models on their desktops or in the cloud. What they can’t know is whether the Python snippets they’re cooking up in those Jupyter notebooks will run efficiently on the production hardware. Who takes those Python snippets and optimizes them? Without intimate knowledge of the algorithms, a performance engineer has no guarantee that meaningful optimization is even possible. No self-respecting bit-meddler is going to agree to an optimization path under these conditions. Nor is any modern data scientist going to agree to writing hardware-ready code from day one. Stalemate.
Hardware acceleration. The struggles individual developers face in working with hardware accelerators are thoroughly documented elsewhere (4), but the problem just gets worse at the level of the application and the team. The hardware market is growing like crazy, creating a whole host of problems:
Unpredictable results – It’s extremely difficult to know in advance what the actual latency and power consumption benefits of any given hardware accelerator will be, especially if algorithm selection isn’t finalized.
Curse of choice – Without a detailed knowledge of both the algorithms and the larger application architecture, hardware acceleration is opaque and undifferentiated. No one person on the team has enough context to evaluate the hundreds of seemingly identical options.
Disincentive to act – Anecdotally, teams spend six to twelve months just on the offload problem (5). Consider that in that time new general-purpose (CPU) hardware will likely come to market that offers a 2x performance benefit. The team might as well have done nothing.
No architecture. The lack of situational awareness is not only stressful for the team but masks an even deeper, more existential dread: the possibility that no combination of contemporary hardware and software will successfully unlock the application. The team is looking for the proverbial needle in a haystack without any certainty that the needle even exists.
The stuff of legends
Enter the mythical “AI application architect,” a seer capable of cutting through the fog of war. She is the keeper of secret, arcane knowledge:
Which pieces of the software should the team optimize, how much time will optimization take, and, critically, what will their gains be in terms of latency and power consumption?
Which hardware should the team use, what are the likely engineering man-hours associated, and, critically, what will their gains be in terms of latency and power consumption?
Which combinations of software optimization and hardware acceleration, if any, will unlock which application features?
The bad news is that this Architect doesn’t exist. Given the current pace of technological change, no human can reliably navigate these trade-offs. Most application teams know this implicitly, but nonetheless want to believe in a champion who can lead them confidently forward.
The good news is that we’re no longer in an age of superstition. These trade-offs are modelable – mathematically and computationally – with sufficient data.
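As a toy illustration of what “modelable” could mean for the third question above, the sketch below enumerates every software/hardware combination and reports which features each one unlocks. Everything in it is hypothetical: the option names, the multipliers, the budgets, and the assumption that software and hardware effects simply multiply. A real co-design model would estimate these relationships from measurements rather than hard-code them.

```python
from itertools import product

# All numbers below are made-up placeholders, not measurements.
BASELINE_LATENCY_MS = 120.0
BASELINE_POWER_W = 15.0

# option name -> (latency multiplier, power multiplier)
SOFTWARE_OPTIONS = {
    "unoptimized": (1.0, 1.0),
    "vectorized": (0.6, 0.9),
}
HARDWARE_OPTIONS = {
    "cpu_only": (1.0, 1.0),
    "accelerator_a": (0.4, 1.3),
    "accelerator_b": (0.7, 0.6),
}

# feature name -> (max latency in ms, max power in W)
FEATURE_BUDGETS = {
    "real_time_tracking": (40.0, 20.0),
    "battery_mode": (60.0, 10.0),
}


def evaluate(sw: str, hw: str) -> tuple[float, float]:
    """Predict (latency, power), assuming the effects multiply independently."""
    sw_lat, sw_pow = SOFTWARE_OPTIONS[sw]
    hw_lat, hw_pow = HARDWARE_OPTIONS[hw]
    return BASELINE_LATENCY_MS * sw_lat * hw_lat, BASELINE_POWER_W * sw_pow * hw_pow


def unlocked_features() -> dict[tuple[str, str], list[str]]:
    """Map every (software, hardware) pair to the features whose budgets it meets."""
    results = {}
    for sw, hw in product(SOFTWARE_OPTIONS, HARDWARE_OPTIONS):
        latency, power = evaluate(sw, hw)
        results[(sw, hw)] = [
            feature
            for feature, (max_lat, max_pow) in FEATURE_BUDGETS.items()
            if latency <= max_lat and power <= max_pow
        ]
    return results


for combo, features in unlocked_features().items():
    print(combo, "->", features or "nothing unlocked")
```

With these placeholder numbers, neither software optimization nor a hardware swap unlocks anything on its own; each feature needs a specific pairing. That is exactly the kind of answer a team cannot guess its way to.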
And the sages have been thinking about exactly that problem for a long time.
Long live the architect!
Hardware/software co-design is an old idea (6). At its core is a hypothesis that the relationship between hardware, software, and end-to-end application performance can be modeled using statistical techniques. Despite solid science to back it up, co-design nonetheless sounds fantastical to some technical people. This AI thing is “just engineering as usual,” they say. But it’s not.
Imagine you are a team of mechanical engineers trying to build the next-generation jet engine, and your directive is to balance maximum speed, fuel consumption, and manufacturing costs. Would you go down to the hangar, get a bunch of arc welders and sheet metal, and start building jet engines to figure out which jet engine to build? Absolutely not. Your R&D process would look much different. Your “engineering as usual” almost certainly involves some kind of virtual environment – a CAD environment – that allows you to model the trade-offs and simulate your designs before ever putting heat to metal.
The benefits of modeling and simulation to jet engines are obvious. The impact of CAD on navigating knotty physical constraints is undeniable. So it will be with hardware/software co-design. It’s a no-brainer, and it’s the future of AI application architecture.
FØCAL (7) is the first co-design company of the AI era. We’re taking on the momentous challenge of adapting the technology of co-design to the expectations of modern – read: agile – engineering teams. FØCAL brings co-design directly to your engineering process, inserting it into the AI software life cycle as part of your normal testing, integration, and deployment workflows. Find us on GitHub (8)!
(1) Amodei, D. & Hernandez, D. (2018-05-16) “AI and Compute.” OpenAI.com
(2) Rossa, B. (2018-02-20) “Computer vision: The process problem.” LinkedIn.com
(3) Ozkaya, I. (2019-08) “Are DevOps and automation our next silver bullet?” Computer.org
(4) Rossa, B. (2019-11-27) “‘Deep divides between AI chip startups, developers’ – A developer’s perspective.” f0cal.com
(5) Hemsoth, N. (2019-10-29) “Deep divides between AI chip startups, developers.” TheNextPlatform.com
(6) Bailey, B. (2019-07-25) “Hardware-software co-design reappears.” SemiconductorEngineering.com
(7) FØCAL website.
(8) FØCAL GitHub organization page.
About the author
Imagine a premature graybeard with a bad Emacs habit who has been talking trash on neural networks since the 90s. Then throw in a couple of big DARPA programs with GPUs, Beowulf clusters, and LIDARs on A-10 Warthogs. Since 2013, I’ve been a “chief vision officer” for hire, helping companies industry-wide tackle their computer vision R&D and HR challenges. Oh, and I also launched a startup.
Brian F. Rossa – C*O, FØCAL
“Deep divides between AI chip startups, developers”
— A developer’s perspective
Close to two hundred AI-specific hardware accelerators are coming to market in the next few years. Let that sink in. Just last month TheNextPlatform dove into the current state of the market with ‘Deep divides between AI chip startups, developers.’ In her piece, Nicole Hemsoth addresses the AI hardware ecosystem, its stakeholders, and its struggle to drive developer adoption. It’s a harsh wake-up call:
“The gap in understanding is not a general gripe about the lack of tools or time or resources. There are some deep-seated, fundamental holes in how AI chips are designed (and by whom), how developers experiment with new architectures, and how the market/more developers can adopt them if they ever do manage to find a way into production.”
‘Deep divides between AI chip startups, developers’ – TheNextPlatform
I was engineering machine learning on DARPA programs long before there was such a thing as a “machine learning engineer” and, as such, I had to be a jack-of-all-trades. But we’re now at an inflection point in the software and hardware industries where differentiation and specialization in engineering roles is happening at a dizzying pace. To echo the message of Deep divides, if you are building or betting on hardware acceleration it is critical to understand:
How market forces, technology constraints, and the developer’s individual incentives create a clear path of least resistance leading away from hardware acceleration.
That the path to mass adoption of hardware acceleration isn’t paved with free chips. The only way to bridge the huge divide with developers is to show your commitment to the developer experience.
Pattern-matching matters
Having worked on dozens of AI applications and reviewed many times more architectures during my career, I’ve found that every single one has been “hardware-constrained” in one way or another. They either did benefit or could have benefited from hardware acceleration.
Through those fifteen-odd years, I have seen just three successful team-building patterns:
The “Mastermind” pattern – Find one or two people who are 10x performance engineers. Ideally, they’ve built your application at least once before and are already familiar with both the algorithms and target hardware. The mastermind’s key skill is in the art of tuning the software to get every last drop of performance out of the hardware.
The “Hardware-first” pattern – Set the hardware target at the start of the project and only hire people that are experienced with it. Enshrine the performance requirements and demand performance compliance on every git push.
The “Algorithms-first” pattern – Use the availability of scientific papers and open source codes to derisk the application’s accuracy requirements. Worry about the rest later.
I’ve written about the “Mastermind” pattern elsewhere, and Deep divides flirts with it a bit:
“…. hire people that seem (sic) the complexity of the [acceleration] problem in all of its forms … that understand [acceleration] and what it means from the developer point of view. It’s no small undertaking to offload the right bits of a workload to the right device and it can take a year or more to establish that on a new architecture.”
‘Deep divides between AI chip startups, developers’ – TheNextPlatform
If an application venture can find a “mastermind” – big if – this teaming pattern can be effective until the business needs to scale. Unfortunately, for all the reasons that Deep divides covers, there simply isn’t enough godmode to go around these days. Consequently, this teaming pattern is relatively rare in the wild.
Adopting the “hardware-first” strategy works for application ventures that already have capital and/or plenty of firmware engineers in-house. Can those same engineers learn ML? Probably, but while individual hardware engineers can be agile, hardware-focused organizations tend to be fatally sluggish around AI.
Incubating an everything-on-the-device culture creates fundamental rate limits in the R&D process. Consider: While data scientists employed by “algorithms-first” ventures are training and testing hundreds of models a day in the cloud, “hardware-first” engineers are bottlenecked by the clock speed of the one or two devices on their desk.
“Algorithms-first” ventures typically seed where a business case and a line to novel data intersect. These organizations blossom in the relative availability of academic ML papers and open source codes. ML-savvy data scientists are generally among the first, more junior hires. Because of that early inertia, the modus operandi is to focus on derisking accuracy requirements above all else.
Who is the hardware acceleration customer here? There simply aren’t enough masterminds out there to build a business around. The hardware-first customer is a good one, but they may know you too well; you will face competition from in-house teams and your technology is unlikely to transform their business. That leaves the algorithms crowd. It’s universally understood by software folks that swapping in new hardware is the cheapest way to improve application performance. But is using your technology as turnkey as upgrading from an i5 to an i7? Unlikely. So how do you make the sale?
The developer experience
Deep divides tells us about a mastermind, circa 2009, who needs a sample chip and twelve months to determine whether or not your hardware acceleration technology is useful. If that sounds painful to you, imagine how it sounds to your developer customer. The culture of software development has changed dramatically in the last decade, and the focus in 2019 is squarely on speed and agility. Thanks to the cloud, software engineers – especially those that are data-facing – are now organized around the idea of hardware as an abstraction. This is doubly true for algorithms-first ventures.
When an algorithms-first developer gets excited about hardware acceleration, the last thing you want them to have to do is leave that comfy bubble of abstraction. Let me spell it out for you:
“Dear hardware acceleration vendor, your go-to-market strategy must turn your technology into the Wheaties that your opportunity customer eats for breakfast, namely software and data.”
– the author
But before I do the big reveal here and give away all my best tactics for becoming developer Wheaties, let me assure you of the following: developers are NOT going to talk to you. When targeting the developer market, the burden of proof is on the vendor. Don’t expect your customers to give you feedback about your technology until the day they can pick it up and use it. Then, expect bug reports – lots of them. With that in mind….
CI > fabrication. Waterfall development required lots of talking, and developers are done with that now. Do you know what the agile software community has settled on instead of talking? Continuous integration. Whether it’s CircleCI, Jenkins, or good ole Travis, developers have enshrined CI as the lynchpin of the engineering process. If you want your hardware acceleration product to be a part of the conversation before you go to fabrication, then find ways to support the customer’s process during your R&D phase. At a minimum, you’ll need a software interface (Wheaties!) that developers can start integrating against RIGHT NOW and a way to show them the expected delta (more Wheaties!) in key performance metrics. As long as you treat those two touch points as contracts and hit some intermediate milestones, you will win adoptees. You might not even need working hardware to do it.
hello_world time > run time. What’s the difference between a begrudging adoptee and a self-described “happy customer”? Delight. And what does it take to delight a developer looking for a way to accelerate their code?
Plenty of great hardware companies have gained reputations for bad software because they get this question wrong. They’re hyper-focused on the “acceleration” part and refuse to think about the end-to-end developer experience. Latency reduction is certainly important, but it comes at the end of a lengthy integration process. If you don’t want to be in the “bad software” camp, the first thing you should try to minimize – before latency, before power consumption – is the adoptee’s hello_world time. This is a tactical no-brainer: Don’t drag the customer through the smouldering hell that is every vendor-specific installer ever. Figure out which package manager they’re using, and support it. If the OS requires additional setup, provide a bootstrap script. Done.
Packages > algorithms. If it wasn’t obvious from the previous suggestion, toolchains that may appear ancillary to your core value are nonetheless extremely important to your success. This is because software development is now dominated as much by “frameworks” and associated tooling as it is by programming languages. It’s no longer sufficient to provide a handful of accelerated algorithms in a language and declare it “supported.” To drive adoption, the data structures under the hood of those algorithms need to play nicely with the core data structures of key application frameworks. Figure out what those are. Then work with framework maintainers to provide interop. Do even a little of this and you’ll have sealed your reputation for caring about the developer experience.
Images > docs. Docker ruined docs. How? Repeatability. When a developer is trying to reproduce an artifact – a test condition, a trained model, whatever – docs leave too much room for error. In the age of Docker, things aren’t considered repeatable unless they’re containerized and runnable. This is great for developers – and for you. The first thing you should do is build a hello_world container where everything just works.
Then, find a way to keep this thing up to date. Need PCI passthrough? Fine. Implementation details don’t matter that much, so just use a VM. The point here is to make sure that the very first steps in your prospective customer’s experience aren’t a trip through kernel-compiling purgatory and dependency hell. Note: This is NOT an excuse for skimping on docs! They’re still 100% necessary, but no longer sufficient.
Got ‘deep divides’?
If you’ve read this far, there is at least one thing I hope you’re scratching your head about: Exposing hardware in a controlled way before fabrication sounds … impossible? Far from it. FØCAL partners with hardware vendors to expose their products to key software workflows like CI. By putting your prototypes in our farm, you can start to build developer community today. Got “deep divides”? FØCAL is here to help.
About the author
Brian F. Rossa
Imagine a premature graybeard with a bad Emacs habit who has been talking trash on neural networks since the 90s. Then throw in a couple of big DARPA programs with GPUs, Beowulf clusters, and LIDARs on A-10 Warthogs. Since 2013, I’ve been a “chief vision officer” for hire, helping companies industry-wide tackle their computer vision R&D and HR challenges. Oh, and I also launched a startup.
FØCAL / CircleCI partnership
Build once. Build everywhere.
Design, derisk, deploy, and deliver
cloud to edge.