The majority of the revenue derived from the SAS language comes from three highly regulated industries: Financial Services, Insurance, and Health & Life Sciences (HLS). In fact, its dominance in those industries comes in part from regulators' historical willingness to accept the SAS language as a standard, some even going so far as to accept its proprietary data formats for regulatory submissions. But the tailwinds the SAS language has enjoyed for the past 50 years may be shifting direction.

Since the financial crisis, the pace of regulatory change has increased dramatically, leaving businesses struggling to keep up on their aging systems. What's clear from recent guidance by a number of regulators is that they are no longer comfortable overseeing businesses that run on proprietary, non-standard legacy software. How can you be sure of a result if the underlying code is a complete black box? It is also a burden for regulators to support multiple formats and languages, especially when those formats require costly tools and training to reproduce or interpret the work.

The recent Data + AI Summit, the premier conference for Apache Spark and open analytics, was attended by thousands of professionals. Among the crowd were high-level attendees from notable government regulators such as the SEC, OSFI, FCA, FDIC, CMHC, FDA, and PHAC, as well as many of the world's largest central banks, including the BoC, BoE, BdP, BdE, BCRP, BCB, and the US Federal Reserve. Their increasing involvement in the community speaks volumes about the future intentions of those institutions.

Nimbleness is no longer just a competitive advantage; it is increasingly a requirement in these industries. Ironically, the few remaining areas of growth for the SAS language are directly tied to regulatory changes, especially IFRS 9 and Anti-Money Laundering (AML). These regulations require expensive, complex bespoke solutions because the language lacks out-of-the-box capabilities for them. The Apache Spark ecosystem, and the Databricks platform in particular, makes modern regulatory compliance a first-class citizen, not an expensive custom add-on. For example, there is a custom SAS language based IFRS 9 solution, but such capabilities are trivial to reproduce with popular PySpark ecosystem tools (e.g., Delta, MLflow, Spline), as the sketch below illustrates.
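As a rough illustration, here is a minimal PySpark sketch of one such out-of-the-box capability: Delta's transaction log, which records every change to a table and so provides the kind of audit trail a bespoke solution would otherwise have to bolt on. The table path is illustrative, and a Delta-enabled Spark session is assumed.

```python
# Minimal sketch: Delta's transaction log as a built-in audit trail.
# The table path is illustrative; assumes a Delta-enabled Spark session.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Every write to a Delta table is recorded automatically: when it
# happened, what operation ran, and with which parameters.
history = spark.sql("DESCRIBE HISTORY delta.`/mnt/risk/loan_book`")
history.select("version", "timestamp", "operation", "operationParameters").show(truncate=False)
```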

MLflow in particular is an amazing tool for modellers: it keeps track of all your experiments, so you can show your work. Delta's time travel is a feature I very much wish I had had back in my days as a Data Scientist at Capital One. Coupled with the incredible performance of PySpark, these tools are not just useful; they are the industry standard. Auditors of the future are not going to accept that an information request takes a week or two to fulfill; they will demand the answer in hours or minutes.
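To make that concrete, here is a minimal sketch of the two features together: an MLflow run logging what was tried, and Delta time travel pinning a query to the exact data a model was trained on. The run name, path, parameters, and metric values are illustrative only.

```python
# Minimal sketch: MLflow experiment tracking plus Delta time travel.
# Names, paths, parameters, and metric values are illustrative only.
import mlflow
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Log the run so an auditor can later see exactly what was tried and why.
with mlflow.start_run(run_name="pd_model_candidate"):
    mlflow.log_param("segment", "retail_mortgages")
    mlflow.log_param("training_data_version", 12)
    mlflow.log_metric("auc", 0.87)  # placeholder value

# Time travel: re-read the loan book exactly as it stood at version 12,
# so the training inputs can be reproduced on demand.
training_snapshot = (
    spark.read.format("delta")
    .option("versionAsOf", 12)  # or .option("timestampAsOf", "2021-06-30")
    .load("/mnt/risk/loan_book")
)
```

With the data version logged alongside the run, answering an auditor's "show me the inputs" request becomes a single query rather than a week-long reconstruction effort.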

Of course, the major backer of the SAS language is the company itself. With the recent departure of another high-profile executive, the head of R&D, and the disclosure that the company is losing money, a cloud of uncertainty hangs over the organization. The past few years have seen virtually all of its seasoned executives walk out the door: Andre Boisvert, Mikael Hagstrom, Jim Davis, Carl Farrell, and Oliver Schabenberger were all heirs apparent to the future of the company, until they weren't. With the majority owner and CEO nearing his 80th birthday and no public succession plan, regulators seem increasingly concerned by their industries' reliance on this single point of failure. Software can't be guaranteed to work forever. Time bombs can and do exist in software, as the recent Y2K-style problem with the YEARCUTOFF=1920 option clearly demonstrated. A systemic failure of the SAS language right now could be catastrophic for any of these industries.

Are we looking at a downward spiral for the SAS language? As companies abandon the language, regulators are taking note, and at some point they will expect the rest of the industry to follow suit. Financial leader Capital One has completely removed its SAS language processes and has been leveraging Databricks and Apache Spark for over five years to meet regulatory requirements. How long until regulators take concrete action on the laggards depends on many factors. One large wildcard is one we all have to face: our own declining health and mortality.

Too-big-to-fail is often spoken of with a negative connotation, especially around the health of companies. The SAS language is currently supported by only two proprietary software companies, neither of which is growing in revenue, profitability, or headcount, despite exponential growth in the industry overall. Too-big-to-fail takes on an entirely different connotation when it comes to open source software: it effectively means the project has hit a critical mass at which the community can sustain it indefinitely, as is clearly the case with PySpark.

With such fervor and interest in PySpark, it's not a matter of if but when regulators will step in and mandate open data and analytics standards. It's time for organizations to start embracing the future, before they find themselves scrambling to meet new regulations.