Friday, 8 March 2013

The legacy systems problem

A comment I made on BBC 5Live's Wake Up To Money programme has attracted quite a bit of attention. It's quoted in the BBC's report about RBS's recent IT failure:
"We just have a lot of legacy systems out there," said Frances Coppola, a former RBS employee and an independent banking analyst.
"Some of those systems haven't been replaced for a long time," she said.
And my colleague on the programme added the following:
"The bank has given too much priority to grand schemes and acquisitions, rather than running a day-to-day bank," said Alastair Winter, chief economist at Daniel Stewart Securities.
Both of these remarks need further explanation, and to do so I need to talk about what I actually did in banking. I am usually described as a "former banker", but that isn't entirely true. I worked in systems. I started as an IT analyst/programmer, moved on into IT project management, then crossed the fence into the business (doing an MBA in the process) and became a business analyst and project manager working on joint business/IT projects. So when I talk about bank IT systems, I'm not making it up. I really do know what they are like. I worked on them.

Admittedly, I left banking ten years ago, and systems have changed in that time - considerably. But the principal areas of development have been in front-end applications (customer interface) and in settlement processing and payments. The core systems, that run the basic banking processes, have not been upgraded. That's why despite the massive increase in processing power and storage capacity in modern IT technologies, banks like RBS are still running massive traditional mainframe computers. Their software won't run on anything else. I suppose it keeps IBM in business, but it is not what we might expect for a modern real-time, mission critical banking system.

The existence of ancient "legacy" systems within the modern banking systems architecture is not necessarily to do with lack of investment, as Alastair Winter suggested, though fast growth and acquisitions complicate IT architectures and can make systems vulnerable. I shall return to the likely effect of RBS's aggressive expansion strategy shortly. But the real problem is the size, complexity and criticality of these old systems - plus the fact that many of them are written in programming languages that are not widely used now, so there are skills shortages among IT staff. Many of these systems are also poorly documented (comments in code were something of a rarity when these systems were written) and their functions are poorly understood. Replacing them without affecting functionality is therefore not an easy task. Even replacing a single program can have adverse effects if the program is not properly understood, as I discovered when my team replaced a start-of-day batch program in a systems upgrade on one occasion: the old program was complex and poorly documented, we (perhaps inevitably) failed to understand exactly what it did and we therefore subtly changed its functionality without realising it. Fortunately the changes we made didn't cause major problems, and it wasn't a major retail banking system anyway. But imagine that, scaled up to an entire suite of retail banking applications running millions of bank accounts, with trillions of transactions going through every day? No wonder banks have shied away from replacing these systems. The risks, and the associated costs, are terrifying.
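
One common way to catch this kind of subtle change is to run the old and new programs on the same recorded inputs and compare what comes out (sometimes called a characterisation test). The sketch below is purely illustrative: `old_start_of_day` and `new_start_of_day` are hypothetical stand-ins, and the rounding difference between them is invented to show the idea, not taken from any real system.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

PENNY = Decimal("0.01")


def old_start_of_day(balance: Decimal, rate: Decimal) -> Decimal:
    """Hypothetical legacy routine: adds one day's interest, rounding half up."""
    interest = (balance * rate / Decimal(365)).quantize(PENNY, rounding=ROUND_HALF_UP)
    return balance + interest


def new_start_of_day(balance: Decimal, rate: Decimal) -> Decimal:
    """Hypothetical replacement: same intent, but it truncates instead of rounding."""
    interest = (balance * rate / Decimal(365)).quantize(PENNY, rounding=ROUND_DOWN)
    return balance + interest


def find_mismatches(cases):
    """Run both versions on identical inputs and collect the cases where they differ."""
    mismatches = []
    for balance, rate in cases:
        old = old_start_of_day(balance, rate)
        new = new_start_of_day(balance, rate)
        if old != new:
            mismatches.append((balance, rate, old, new))
    return mismatches


# Invented sample inputs: (opening balance, annual interest rate).
cases = [
    (Decimal("1000.00"), Decimal("0.05")),
    (Decimal("365.00"), Decimal("0.01")),
]

for balance, rate, old, new in find_mismatches(cases):
    print(f"balance={balance} rate={rate}: old={old} new={new}")
```

A check like this only covers the inputs you think to record, which is exactly why poorly understood programs remain risky to replace.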

However, there is another reason why banks have avoided replacing legacy systems. Banks are driven by the need to make profits, and profits don't come from upgrading basic infrastructure. No, they come from new business lines, risky mergers and acquisitions, swanky front-office applications to support fancy new products. Infrastructure is boring and the cost of replacing it is a hit to short-term profits. Therefore banks do whatever they can to avoid expensive replacement of legacy systems. Keeping the things running, patching them up and circumventing them if necessary is the order of the day. Banks that are expanding very rapidly are particularly prone to develop a patchwork of unintegrated and incompatible systems, because every time they acquire a new business they also acquire the systems that support it and they seldom allow the time to integrate these before moving on to the next expansion opportunity. I saw this happening very clearly during my time at UBS, and I have no doubt that RBS was doing the same during its fast expansion in the run-up to its failure in 2008. Piecemeal, incompatible systems are a risk, but banks don't worry about that when the business opportunities are good.

It would be nice to think that if banks that are expanding won't invest in new core systems, maybe damaged banks that are trying to reduce their risks might do so. Sadly that isn't true either. Core system replacement is very expensive, and damaged banks are trying to reduce costs as well as risks. Very expensive IT infrastructure projects simply aren't acceptable to management or staff when their jobs are on the line. So since 2008, despite their supposed commitment to reducing risk, banks such as RBS still haven't addressed the legacy system problem. They've reduced their balance sheet risk, but not their operational risk.

But as I noted above, bank IT systems have developed enormously in the last couple of decades despite the existence of old systems. That is because to avoid replacing legacy systems banks have adopted a practice known as "wrapping". This approach was recommended by consultancies as a lower-cost and perhaps more importantly, lower-risk approach to improving banking system functionality. Basically you treat your legacy system as a "black box" which remains untouched at the core of your system: around it you create a "shell" of additional applications that provide your customer interface, your straight-through settlement processing, your point-of-sale functionality and your real-time updates (yes, this functionality can be added even if the "black box" is a batch system). This is akin to the way in which off-the-shelf package applications are typically customised, but it is of course on a much larger scale.
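
As a rough illustration of the "wrapping" idea, the sketch below puts a real-time shell around a core that only updates in batch. Everything in it is hypothetical: `LegacyCore`, `Shell` and their methods are names invented for this example, not any bank's actual design. Amounts are in pence.

```python
class LegacyCore:
    """Hypothetical 'black box': balances change only when the batch runs."""

    def __init__(self, balances):
        self._balances = dict(balances)

    def balance(self, account):
        return self._balances[account]

    def run_overnight_batch(self, transactions):
        for account, amount in transactions:
            self._balances[account] += amount


class Shell:
    """Hypothetical wrapper: offers a real-time view without changing the core."""

    def __init__(self, core):
        self._core = core
        self._pending = []  # transactions accepted today, not yet posted to the core

    def post(self, account, amount):
        self._pending.append((account, amount))

    def balance(self, account):
        # Real-time view = core balance plus today's unposted transactions.
        pending = sum(amt for acct, amt in self._pending if acct == account)
        return self._core.balance(account) + pending

    def end_of_day(self):
        # Hand the day's transactions to the core's batch, then clear the queue.
        self._core.run_overnight_batch(self._pending)
        self._pending = []


core = LegacyCore({"12345678": 100})
shell = Shell(core)
shell.post("12345678", 50)
print(shell.balance("12345678"), core.balance("12345678"))  # shell: 150, core: 100
shell.end_of_day()
print(shell.balance("12345678"), core.balance("12345678"))  # both 150 after the batch
```

Note that the core class is never modified: all the new behaviour lives in the shell, which is what makes the approach attractive in the short term.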

The "wrapping" approach has been all too successful. Customers now expect real-time banking services twenty-four hours a day: in fact the entire economy has come to depend on them. But the core banking systems still do not support this. The functionality to support real-time banking services is in the shell. So as far as customers are concerned, the shell applications ARE the banking system. The balances that we see on our internet banking screens ARE our real balances, to us - but not to the bank. To the bank, the real balances are in the core system, which is invisible to customers and is probably updated in overnight batch processing, not in real time as customers think. There is divergence between how financial information appears to customers and how it appears to the banks themselves. And this opens up the possibility of system errors causing data corruption, with a serious impact on customers. Just to give an example of what MIGHT happen, suppose that a customer deposits cash into their account. That cash shows up immediately in their balance in the customer-facing applications - online banking screens, telephone services, branch information points. And they can use it for payments. But the core system doesn't apply that cash to the balance, or the payments made using that cash, until the overnight process. If that process fails, or the customer application doesn't transfer the information correctly, the result could be an imbalance between what the customer thinks they have in their account and what the bank says they do.
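
The cash deposit example can be made concrete with a small sketch. It assumes, hypothetically, that the customer-facing applications and the core each hold their own record of balances and that a reconciliation step compares the two after the overnight run. The `reconcile` and `overnight_batch` functions and the account data are invented for illustration; amounts are in pence.

```python
def reconcile(shell_balances, core_balances):
    """Return accounts where the customer-facing balance differs from the core balance."""
    breaks = {}
    for account in shell_balances.keys() | core_balances.keys():
        shell = shell_balances.get(account)
        core = core_balances.get(account)
        if shell != core:
            breaks[account] = (shell, core)
    return breaks


def overnight_batch(core, deposits, fail=False):
    """Apply the day's deposits to a copy of the core; fail=True simulates a batch that did not run."""
    updated = dict(core)
    if not fail:
        for account, amount in deposits:
            updated[account] += amount
    return updated


# The customer on account "A" paid in 200.00 in cash during the day.
# The shell shows it immediately; the core has not yet seen it.
shell_balances = {"A": 50_000, "B": 12_000}
core_before_batch = {"A": 30_000, "B": 12_000}
deposits = [("A", 20_000)]

# Batch succeeds: the two views agree, so no breaks are reported.
print(reconcile(shell_balances, overnight_batch(core_before_batch, deposits)))

# Batch fails: the customer sees 500.00 but the bank's record says 300.00.
print(reconcile(shell_balances, overnight_batch(core_before_batch, deposits, fail=True)))
```

The point of the sketch is that the mismatch is invisible until something compares the two records, and by then the customer may already have spent the money.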

Over time, the "shell" of additional applications becomes ever larger and more complex, and customers come to rely on it more and more. And because the core uses old technology, and the shell applications are much newer, there are problems with technological compatibility and connectivity. That increases the risk of failures. The more fragmented your systems architecture, and the more it relies upon stable interconnections between different technologies, the riskier it becomes. Efforts have been made to reduce risks, of course - to improve the stability of connections, and to provide "fail-safe" backups for critical components - but in the end the "pasta rule" still applies: the more your systems architecture looks like spaghetti, the higher risk it will be.

So the problem for banks is the balance of risk: the risk of replacing a critical legacy system and it all going horribly wrong (and costing a fortune) versus the risk of increasing instability in an ever-more-complex systems architecture founded on diverse technologies. It's rather like the risk of a major operation (which could result in death but might lead to full recovery) versus medical treatment to control symptoms - you get iller but you don't die, at least not for a while. But eventually the operation becomes necessary. The question is whether IT systems in banks have reached the point where radical surgery is the only option.

This is not simply a question for banks to consider. It is a matter for political consideration. The economy as a whole has become critically dependent on the real-time performance of bank systems. If the payments network goes down because of the failure of one component - a large banking system - the entire economy stops working. The RBS system was only out for 3 hours, but in that time people couldn't make debit card payments, they couldn't get cash, automated payments weren't made, wages weren't paid into accounts. Just imagine what would have happened if it had gone down for days - as nearly happened in 2008. Those who think that RBS should have been allowed to fail do not understand how damaging that would have been to ordinary people and businesses. Even twenty years ago, the damage from a three-hour outage in the late evening would not have been so extensive.

The increasing fragility of banking systems poses a real risk to the economy as a whole. Unless this is addressed, we run the risk of a major systems failure in one or more banks at some point. The two RBS failures are warnings. Banks and politicians need to address this problem before there is a real disaster.

29 comments:

  1. The term "legacy" originated with business consultancies in the 80s, when mini-computers offered a cheaper alternative to mainframes. The aim was to convince businesses that upgrades were necessary: out with the old IBM, in with the new DEC VAX. A more value-free way of describing such systems would be: "proven" or "established".

    Your use of the medical operation metaphor reinforces the idea that a "legacy" system is a problem waiting to happen. In fact, there is no natural obsolescence for software (as opposed to hardware), so the operation is not inevitable. Software that works, and goes on meeting a need, is effectively immortal.

    We don't know the details of this latest incident at RBS, but we do know (broadly) what happened last year. An erroneous change made (by a human) to a CA-7 job schedule caused some overnight posting jobs on the mainframe systems to not run. In other words, this was a failure in the first instance of change control.

    The second failure was an inadequate rollback plan. The suspicion is that some jobs ran, leaving data out of sync, and that rather than reverse the lot, RBS staff tried to knife-and-fork the data back into sync, hence the long outage. This also implies a third failure, namely poor crisis/problem management. These are all human failures.

    As you will no doubt know from your own experience, systems are at their most vulnerable when being changed - i.e. undergoing operations, in your metaphor. They do not fall ill out of the blue (all bugs are potentially predictable) or suffer from age-related ailments. The "black box" approach makes a lot of operational sense: once you've got a system running correctly, leave it alone. It also encourages the use of abstraction layers to facilitate inter-connection with other systems, which is good architectural practice.

    The irony is that these recent travails are likely to lead to the demand that RBS upgrades its systems, which will actually increase risk rather than reduce it.

    Replies
    1. Software becomes obsolete as the business changes. Adding external apps to avoid changing or replacing old software increases complexity and risk. Eventually it does have to be replaced. You can't put that off for ever, however risky and expensive the change will be.

    2. @David, the problem with the black box approach is that it breaks a cardinal rule of solid system design: that the reference value of each piece of data is stored exactly once in the live system (backups are OK of course, they're outside of the live scope).

      When you wrap an older system, you usually tend to need data that is too hard to extract when you need it, so you duplicate it, and synchronise it on occasion, and at some point this is likely to cock up and you have inconsistent systems. The problem is exponential because it compounds: this year you black-boxed last year's system, which was itself wrapping the one from 2005, and six layers below you get some 1975 VMS binary the source code of which has been lost. Of course, fixing it becomes harder with each layer: disentangling N layers of legacy is orders of magnitude harder than just the one. This is where systems do get "age-related ailments".

      But at some point you need to do it, because you will reach the point where implementing tiny changes through dozens of legacy layers becomes so tricky that it costs more than just starting from scratch, because the layering code totally overwhelms the actual business code (which can become a small percentage of deployed code).

      Maybe they should do that with the whole of RBS: put it in run-down mode, no new products/features, no new accounts, and people who need a modern product encouraged to move or sold to a fresher competitor.

    3. cig, thanks. You've explained the "wrapping" problem better than I did!

    4. My point was less about the technology and more about the metaphor. Of course you cannot expect all software to remain in use for ever, but the fact that bank systems retain this "legacy" core is telling, and I'm sceptical that this reflects solely a reluctance to invest in infrastructure. Banks pay ridiculous sums for IT contractors with legacy skills, so they don't lack imperatives for change.

      What I think is interesting about the "legacy" issue at banks is that it points to the dual (and conflicted) nature of the industry: a dull and conventional core purpose (let's call it Captain Mainwaring) "wrapped" within an outer shell of dodgy customer service, product mis-selling, and leveraged trading (let's call it Fred Goodwin).

      RBS's "system problems" look to be human failings, and specifically errors in operational planning and risk management. Sound familiar?

  2. "Infrastructure is boring and the cost of replacing it is a hit to short-term profits."

    Hi Frances. Not too sure about this point. Would not such a replacement be counted as capital expenditure and, as such, bypass the income statement (save for the depreciation charge)?

    Replies
    1. Not exactly. Hardware and package software is capital expenditure, but software design, development and testing are costs.

    2. Am not sure that would be the correct treatment because it breaches the matching concept, that is costs must be matched against benefits/revenues, irrespective of the period when the costs were incurred.

      The annual depreciation charge is the device used to match capital costs against future benefits/revenue.

      It should not matter whether a replacement system is purchased or constructed in-house. If it meets the criteria for capital expenditure and its benefits are not speculative then, at least in my view, it should be capitalised.

      In some circumstances there may be a "prudence" over-ride of the matching concept but I hesitate to believe such an over-ride would apply to the costs of a replacement system where they are significant.

    3. No, that's not right. Staff costs are the biggest proportion of software development & testing and no way can they be regarded as "capital expenditure". Every project I've ever done has distinguished between capital expenditure on hardware, firmware and package software, and staff costs and other expenses for design, development & testing. A bespoke development under a turnkey arrangement with an outside supplier, or possibly an outsourced IT department, could be regarded as capex, but even then the testing and implementation - and usually the design too - would be mainly in-house staff costs and other expenses. The capital element of in-house software development is quite small.

    4. No, that is theoretically wrong. Labour assigned to capital projects should be capitalised.

      All costs incurred in getting an asset to its operational state should be capitalised unless there is a prudence over ride or the costs are not material.

    5. I have done a lot of software projects and I have never, ever seen labour treated as capital. Sorry.

    6. Well I don't know the specific circumstances of your software projects or even whether your bank was accounting for its assets correctly. Perhaps not, given the bank involved and its record in other matters.

      A major in-house software project, as per the type you allude to in your post, is a capital project. The project's costs are easily traceable to the project (materials, labour and some overheads). These costs should be capitalised and amortised over the expected lifetime of the asset.

      Expensing the costs of such an asset (whose labour cost is likely to be high) in the income statement in the year(s) of construction would indeed give rise to the very problem you identify, where the bank is afraid to replace its system because of the impact on the income statement. This is a reason why such assets should be capitalised and why the bank may have fallen victim to its own flawed accounting.

      Had the asset's costs (including labour costs) been capitalised and taken to the income statement smoothly over the asset's lifetime, then the cost of replacement would not have the impact on the income statement which you suggest acts as a deterrent to the asset's replacement.

    7. In-house software development can be capitalised (though you're not obliged to) and amortised over the useful life of the asset. The costs can include design, development and testing, as well as project management and implementation. Training has to be expensed.

      Once capitalised (i.e. on the asset register), the software may also be eligible for capital allowances in respect of R&D, though this requires you to convince HMRC that you've essentially invented something you couldn't buy off-the-shelf.

    8. @Dave Timoney

      Yes, it seems software development costs fall under SSAP 13, the accounting standard related to R & D. You are correct that a company has a choice over whether to capitalise or expense qualifying development expenditure. My only concern is that the Companies Act requires financial statements to show a "true and fair view". I personally don't believe that expensing qualifying software development costs, as per Frances's post, does show a true and fair view. It's only a judgement, I know, but one I am happy to defend.

      @Frances

      The reason why I picked up on this issue is that it ties in with competing explanations of the UK's productivity puzzle. As you know, one camp suggests that GDP is being under stated by ONS, and the other school looks to the UK labour markets for an explanation.

      The GDP school argues, I believe, that GDP (or gross value added) is being understated because of the practice of expensing internally created intangible assets. The expensing of software development costs, which they believe, correctly in my view, should be capitalised is a stark example of their case, I believe.

      I am skeptical that GDP measurement is very sensitive to this matter but, nonetheless, the topic itself throws up some interesting issues.

    9. There are three problems with the theory that changes in capitalisation policy might be influencing falling productivity.
      1) Inhouse software development is unlikely to be sufficiently large in GDP terms to make a material difference.
      2) Productivity went into a nosedive in the last few years, which would imply a sudden change in policy. Why would this happen?
      3) Had there been a change in many companies simultaneously, the auditors would have noted this. This would have brought it to the attention of the ONS et al.

    10. @Dave Timoney

      The GDP school argue that in our post-industrial (?) society investment in hard tangible assets is declining and is being replaced by investment in soft intangible assets. They argue that this trend has accelerated since the 2007/2008 crash and has now become significant. They argue that self-constructed (i.e. built in-house) intangibles exacerbate the problem.

      They argue (I believe) that ONS has not caught up with this trend and continue to measure GDP in the way they always have. Consequently, the GDP school suggests ONS is not treating expenditure on intangibles as investment expenditure and that this new type of capital formation activity is being omitted from GDP (Gross Value Added). In particular, I believe they focus on the case of software development costs to illustrate their stance.

      I am not convinced because software development is labour intensive and most of a project's cost would be labour. Only a small portion of such a project's cost would consist of bought-in materials and services. Labour costs do not affect GVA calculations whereas the cost of bought in materials and services does.

      So in my view, the accounting treatment of self constructed assets (whether tangible or intangible), at least in cases where they are labour intensive, would have only a marginal effect on the GVA (GDP) calculation.

      Purchased intangible assets must be capitalised on the purchasing company's balance sheet as mandated by law. Moreover, it is likely that the selling firm will be classed by ONS as a capital goods supplier. The latter's output would be captured by ONS in the GDP figure as investment expenditure.

  3. Of course, the company which develops and implements the replacement software will own the bank's arse. Maybe that's a factor.

  4. Don't know about the specifics of RBS but in another bank successive waves of outsourcing programming and system maintenance made the systems even more fragile.
    These strategically important functions were run by another (outsourced) company with a very small core of personnel skilled and experienced enough to understand the overall systems. The majority of these outsourced staff were abroad and, while well qualified, only trained in a specific part of the system. Add problems of staff retention and communicating over a distance with people whose first language is not English....

    Replies
    1. Very good points, Fiona. I didn't address those specifically in the post, but outsourcing does tend to increase risk and it may also increase cost. Managing outsourced development and maintenance can be difficult.

  5. One cannot black box software just like one cannot black box hardware. Well, more accurately, one can do these things, but in time, hardware degrades and requires replacement, and software becomes obsolete. Patching on functionality as is done in banking and likely every industry is a hack, and in time all these hacks become a mess. It's hard to believe anyone with knowledge of software design principles and practices (e.g., code must only provide specific functionality in one place) would argue this point. Unless of course their job depends on toeing that line.

  6. This comment has been removed by the author.

  7. It is not only the legacy support problem for software and hardware, the multiple functionality and definition problems, the messy interfaces, the hack support, the manual intervention and above all the massive cost of such operations that have been highlighted in your post and the comments.

    It is the fact that banking is restricted by the design of these legacy programs. Contemporary object-orientated languages are capable of much more and with much more security and coherence, which would enable banking to become more efficient.

    The problem then becomes how to get to that position from where we are now because the longer this goes on, the more likely we will be to have a complete banking meltdown with catastrophic consequences. Maybe the way to go would be to migrate the legacy functions to the front end (with suitable firewalling and proper protection) and only when all the functionality is reproduced correctly over a long period of time, remove the legacy system completely. But no doubt there are committees of people trying to work this out as we write! And no doubt the antics of some macho senior management make that an impossible task. Targets and timescales are very difficult to achieve in such an environment.

    I suspect that these issues are common to all banks everywhere in the world, not just RBS and the UK. And if some smart upstart does bring in a contemporary IT approach, once they get big enough they are either crushed by the big guys or bought out.

    Whichever way, true competition is not working to the benefit of the customer paying the bill. So I suspect that only by regulation will banking IT ever be brought into the 21st century

  8. I agree with YankeeFrank. Further, if you can't or don't read the code then you don't know how it works. Documentation is no good here because, over time, it becomes decoupled from the code logic so it may become misleading. If they don't know how their own systems work they must inevitably fall over; especially after they've outsourced the code maintenance.

  9. Re "banking is restricted by the design of these legacy programs". That isn't necessarily a bad thing.

    The core systems of a bank will cover traditional activities such as the posting and reconciliation of daily transactions between accounts. In terms of the business process, they are doing what used to be done with ledgers and quill pens. As the rules of account management have not changed much since the invention of double-entry book-keeping, there has been little need to change these legacy systems up until recently.

    The commercial push for real-time account updates and instant clearing is changing this; however, many banks remain reluctant to upgrade their core systems. This ultra-caution is often attributed to the high risk involved in a switchover, but I think it also reflects a conservative desire to hang on to something that is at least comprehensible to bank management.

    The choice of programming language is a minor consideration. The real issue is architectural separation of the core ledger management (which you will always need) from the customer-facing apps (which may come and go over time). Retaining legacy systems has been an effective way of ensuring this prudent isolation.

    PS: Pedantic techie points ...
    a) Ironically, legacy procedural languages like COBOL are better suited to ledger processing than object-oriented languages like C++. The strength of the latter is the ability to change the code (e.g. add a method to a class), but that is typically not a priority for a ledger system.
    b) "One cannot black box software" - well, conceptually, that is precisely what you are doing when you create a class in OOP. This is the principle of encapsulation.

    Replies
    1. There is a fine line between "if it ain't broke, don't fix it" and sensible investment in new software through which a business can evolve seamlessly.

      The basic banking functions haven't changed for hundreds of years now but I suspect that is because the paper-based technology didn't change either. 50 years ago computers started to replace the paper trail and have generally proved their worth.

      Just as web-based applications now do more than just replicate the shop for which they were originally designed, and enable all sorts of new ideas to become profitable businesses, the same is likely to be true in banking.

      I suspect no-one actually learns COBOL these days unless they are in a shrinking number of industries (although I would agree that C++ is not a good idea for transaction processing). This makes the whole process very expensive to run and places the power in a small number of (possibly outsourcing) organisations but also

      I think the underlying issue is that if London wishes to remain a major financial centre, its business processes need to be fit for purpose for the next 50 years. The recent problems at RBS raise some doubt about that.

  10. Some thoughts:
    -Companies' outsourcing of individuals with unique knowledge of the systems in question, and at the same time their failure to train employees, have caused these issues to take place

    -The business cycle is much faster/more demanding than the ability of the IT department (taking into account issues as above) to complete the required task. For example, M&A action can be quite demanding and instant, but the IT systems will take years to reach a final stage.

    -Legacy systems can be an issue and there has to be a perspective of long term planning/replacement. This must be in place and must be tracked.

    I think the banks' attempt to outsource everything and not retain core knowledge has been a big mistake in the past. You cannot knowledge transfer many years of experience.

    From someone with initial background in the telco sector, I always found it quite scary how basic were the systems/software used/applied in banks.

  11. In the US, all SDLC costs can be capitalized up to the point of user training. Many companies do not elect capitalization. Some reserve treatment for only the largest, multi-million$ types of projects.
    @pacecar86

  12. Almost all banking software in Northern Europe runs on ancient Java, if one is lucky. Thus old-timers are kept close by in dozens because they are the only ones who can understand it.

    Your claim: "things have not changed much since the beginning of the new millennium" is true.
