Latest news with #operationalresilience

Finextra
3 days ago
- Business
- Finextra
Ensuring operational resilience in 2025 – why the status quo no longer works
This content has been created by the Finextra editorial team with inputs from subject matter experts at the funding sponsor.

Operational resilience is on all UK payments leaders' minds. In 2024, 95% of business leaders stated that they're aware of operational weaknesses which leave them vulnerable, yet 48% said their organisations aren't doing enough to improve resilience.

The European Union (EU)'s Digital Operational Resilience Act (DORA) – having come into effect on 17 January 2025 – is the regulators' push towards improved operational risk incident management across the industry, but how have financial organisations fared when it comes to readiness? What else needs to be done from an infrastructure perspective to achieve greater resilience?

Finextra spoke with Rob Reid, technical evangelist at Cockroach Labs, about the company's research on the state of resilience across financial services; how to effectively achieve operational resilience; and what needs to be done to reap benefits that extend beyond mere compliance.

How outages turned into the new normal

While outages have become common in organisations – Cockroach Labs found organisations experience on average 86 a year – it's major infrastructure blackouts that grabbed the headlines in 2024.

The Bank of England reported seven outages to the UK's RTGS (Real-Time Gross Settlement) and Clearing House Automated Payment System (CHAPS) in 2024. Most notably, in July 2024, CHAPS failed and delayed large, time-sensitive payments, which had a substantial impact. With CHAPS usually enabling 200,000 payments a day – with an average daily value of £345 billion – a system shutdown of this scale (245 minutes in fact) resulted in considerable losses.

That very same month, the CrowdStrike outage caused global chaos, affecting over 8 million Windows devices.
The fallout was immense, with GPs unable to treat patients; hundreds of businesses reporting revenue losses; and planes being grounded globally, leaving travellers stranded in airports.

'Once you've hit rock bottom, the only way is up. And I think we're going to need to see changes,' said Reid, when speaking on the state of operational resilience in financial services. 'The technologies and practices used across the industry aren't keeping up with the needs of modern resilience requirements. Fundamentally, if what we had was working, we wouldn't have DORA.'

The state of resilience in light of DORA

The EU's DORA officially took effect on 17 January 2025, providing a universal framework designed to enhance information and communication technology (ICT) risk management.

'I am a software engineer with an almost comically low-risk appetite. So, as you can imagine, I've been bemoaning lacklustre operational resilience for many years,' commented Reid. 'DORA is a much-needed wake up call for the industry. I wish we would have had it years ago, because as a software engineer at the coalface, I would have had something to wield.'

In order to understand the state of resilience going into 2025, Cockroach Labs surveyed 1,000 senior cloud and technology executives. Alarmingly, the data showed that while 94% of technical executives stated that the CrowdStrike outage encouraged their organisations to reassess their risk management, the operational resilience reality still looks bleak:

- 93% of leaders are concerned about the financial and organisational impacts of outages
- 95% are aware of operational weaknesses that leave them vulnerable
- 53% of banking and financial services companies report experiencing service disruptions at least weekly
- 20% of respondents describe their organisation as fully prepared for outages
- 33% have an organised response approach
- Less than a third conduct regular failover testing

Speaking on the results, Reid emphasised: 'Every single person we spoke to reported revenue loss as a result of downtime in the last 12 months. On average, businesses are seeing 86 outages per year, with the average downtime lasting more than three hours. In terms of approaches, this hints at an industry-wide tendency of being reactive to downtime, and I would question whether teams are being given the time, space, and resources required to make meaningful, positive changes in preventing it.'

Considering the research was conducted at the end of last year, it is surprising to see how little progress organisations have made toward operational resilience – especially given the DORA deadline. However, considering how much information geared toward DORA readiness has been available, these results show that it might be an issue of agility rather than an issue of understanding.

'Consider DORA from the perspective of a company with aging technology and infrastructure,' commented Reid. 'This all serves to reduce their ability to innovate. They're having to manage all of this potentially archaic infrastructure, let alone react with agility. And it's not only DORA, there is GDPR [General Data Protection Regulation], there is CCPA [California Consumer Privacy Act], and a host of other regulations. Add to that a disaster recovery mindset, necessitated by the presence of primary/secondary architecture, and you've got a perpetuation.'

So how can organisations go beyond the minimum requirements of DORA to develop holistic operational resilience strategies?

Developing modern resilience strategies

For organisations running primary/secondary architecture, failovers and failbacks are key concepts of resilience and disaster recovery. A failover is the process of switching to a backup, secondary system or site when the primary architecture fails – ensuring business continuity – while failback refers to the process of returning to the primary system once the issue is resolved.

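The failover and failback steps described above can be illustrated with a toy state model. This is only a sketch: the `Site` and `PrimarySecondaryPair` names are hypothetical, health is a simple flag rather than a real health check, and no data replication is modelled.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A deployment site; `healthy` stands in for a real health check."""
    name: str
    healthy: bool = True


class PrimarySecondaryPair:
    """Hypothetical model of a primary/secondary architecture."""

    def __init__(self, primary: Site, secondary: Site) -> None:
        self.primary = primary
        self.secondary = secondary
        self.active = primary  # traffic starts on the primary

    def failover(self) -> bool:
        """Switch to the secondary when the primary has failed."""
        if (
            self.active is self.primary
            and not self.primary.healthy
            and self.secondary.healthy
        ):
            self.active = self.secondary
            return True
        return False

    def failback(self) -> bool:
        """Return to the primary once the issue is resolved."""
        if self.active is self.secondary and self.primary.healthy:
            self.active = self.primary
            return True
        return False


pair = PrimarySecondaryPair(Site("primary"), Site("secondary"))

pair.primary.healthy = False      # simulated outage on the primary
pair.failover()
after_failover = pair.active.name

pair.primary.healthy = True       # issue resolved
pair.failback()
after_failback = pair.active.name

print(after_failover, after_failback)
```

Even in this simplified form, both transitions are explicit actions that depend on the state of two sites, which is where the operational risk of a real failover or failback comes from.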
Reid explained that many organisations are running primary/secondary architectures 'with the hope that things don't go wrong. Because if something goes wrong, they need to fail over, and that is risky. Some businesses never fail back because of the risk associated in failing back to the primary architecture. However, hope is not a strategy. Modern and capable technology must be considered if we are to move beyond the traditional primary/secondary failover mindset, and businesses should be considering technologies that minimise RTO and RPO.'

RTO (recovery time objective) is the maximum amount of time that an organisation can afford to be down following an outage, which, according to Reid, should be measured in seconds, not minutes or hours. RPO (recovery point objective) is the maximum amount of data that an organisation can afford to lose in an outage. 'And that should be zero,' he argued. 'Let's assume you have a traditional database that you are backing up every hour. That's up to one hour of data that you're going to permanently lose in the event of an outage, simply because you didn't back up more regularly within that time window.'

Thinking beyond the primary/secondary architecture approach, self-healing technology is the more modern approach to achieving effective operational resilience. Referring to applications that are capable of detecting, diagnosing, and repairing their own issues without human intervention, self-healing technology – made even more powerful through machine learning and artificial intelligence (AI) – enables organisations to better manage their systems' availability. Crucially, self-healing technology can work both reactively as well as preventatively which, according to Reid, is not just important for systems, but for employees as well.

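Reid's hourly-backup example, together with the survey averages quoted in this article, can be put into rough numbers. The sketch below is illustrative only: the 30-second RTO target is a hypothetical value standing in for 'seconds', the three-hour outage duration is used as a floor for 'more than three hours', and outages are assumed not to overlap.

```python
from datetime import timedelta

# Figures taken from the article.
BACKUP_INTERVAL = timedelta(hours=1)   # Reid's hourly-backup example
OUTAGES_PER_YEAR = 86                  # survey average
AVG_OUTAGE = timedelta(hours=3)        # "more than three hours", used as a floor

# Hypothetical targets, chosen only to illustrate "seconds" and "zero".
RTO_TARGET = timedelta(seconds=30)
RPO_TARGET = timedelta(0)


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """With periodic backups only, a failure just before the next backup
    loses everything written since the previous one."""
    return backup_interval


def meets_targets(downtime: timedelta, data_loss: timedelta) -> bool:
    """True only if both recovery objectives are satisfied."""
    return downtime <= RTO_TARGET and data_loss <= RPO_TARGET


def annual_downtime(outages: int, avg_duration: timedelta) -> timedelta:
    """Rough yearly downtime, assuming outages do not overlap."""
    return outages * avg_duration


loss = worst_case_data_loss(BACKUP_INTERVAL)
yearly = annual_downtime(OUTAGES_PER_YEAR, AVG_OUTAGE)
availability = 1 - yearly / timedelta(days=365)

print(f"Worst-case data loss: {loss}")
print(f"Meets targets: {meets_targets(AVG_OUTAGE, loss)}")
print(f"Yearly downtime: {yearly.total_seconds() / 3600:.0f} hours")
print(f"Availability: {availability:.2%}")
```

Under these assumptions, the survey averages imply at least 258 hours of downtime a year, or roughly 97% availability, and an hourly backup schedule cannot meet a zero RPO however quickly the system is restored.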
In order to achieve reliable availability, the mindset within organisations needs to start rewarding prevention more than finding solutions to existing issues: 'Do employees get more recognition for putting out fires, or do they get more recognition for preventing fires in the first place? Preventing fires will inevitably be a lot less visible if the reward culture celebrates firefighting,' emphasised Reid. 'Businesses can and should be adopting self-healing and distributed technologies. This places the burden of operational resilience on software instead of people, and that frees people up to innovate.'

Operational resilience in 2025 and beyond

In 2025, downtime is no longer tenable. Resilience, in its many forms, must be made a priority. A failure to comprehensively overhaul and modernise systems and processes will inevitably incur disruptions.

'DORA is the recognition that the status quo isn't doing enough to keep businesses online, and it should be seen as an opportunity,' concluded Reid. 'DORA will shore up trust in the industry as a whole, and each of those businesses that work within it are going to contribute to that. I have watched organisations reap the benefits of self-healing applications. Modern technology has the potential to completely revolutionise the way we approach operational resilience.'

It is now imperative for financial institutions – both banks and regulated, non-bank financial institutions – to ensure business continuity meets organisational needs in an increasingly volatile global environment.

Finextra
4 days ago
- Business
- Finextra
Compliance, IT resilience, productivity: the case for Digital Employee Experience in finance: By Dominic Mensah
In 2025, the UK Treasury Committee revealed that leading banks and building societies experienced more than a month's worth of IT outages in just two years. These weren't caused by cyber attacks but by internal system failures, exposing a broader weakness in how IT organisations support operational resilience.

Despite an abundance of performance data, many firms still lack visibility into what really matters: how systems function at the point of use. Without insight into the end-user experience, problems often go undetected until they escalate into serious disruption. Digital Employee Experience (DEX) platforms help close this gap by providing real-time, experience-level data that enables earlier detection, faster resolution, and stronger operational resilience.

Traditional endpoint monitoring focuses on metrics such as device availability and infrastructure health, but these don't tell the full story. Friction at the user level, such as sluggish applications, login failures, or system crashes, often slips under the radar until productivity takes a hit. For traders on the floor, a momentary delay can mean missed market opportunities. But the impact extends across the organisation: financial advisers, compliance teams, operations staff, and contact centre agents all rely on fast, stable systems to serve clients and meet regulatory requirements. When digital devices underperform, the consequences are immediate and widespread.

DEX introduces a new layer of observability, illuminating the health of the endpoint, where employees actually interact with their systems. This visibility allows IT teams to move from reactive troubleshooting to proactive service delivery.

Real use cases for DEX in financial institutions

This section explores how DEX is being used to address key operational challenges, from supporting significant digital transformation projects, reducing IT support tickets, and streamlining service desk functions to optimising the digital estate:

- Enhancing digital transformation projects: DEX platforms are essential for facilitating key digital transformation initiatives, including operating system migrations such as Windows 11, VDI adoptions, and cloud transitions. By harnessing data-driven insights gathered from thousands of endpoints every few seconds, these platforms inform decisions, mitigate risks, and boost efficiency. Their deployment also hastens the realisation of strategic project value. For example, through the optimisation of endpoint management, one financial institution discovered £3.4 million in unused software licences, uncovering a savings opportunity that would allow them to redirect these resources towards high-impact digital transformation investments.
- Ticket deflection and service desk efficiency: DEX technologies equipped with self-healing capabilities and AI-assisted tools have significantly enhanced IT service desk operations by automating the resolution of many issues that previously required human intervention. This shift towards automation has led to substantial decreases in ticket volumes across the financial service sector. As a result, IT staff can now allocate more time to strategic tasks, streamlining overall service desk efficiency. For example, one bank with previously high service desk call rates has reported saving approximately £232,000 per year by detecting and resolving technology issues before they impact key personnel.
- Cost reduction through IT asset optimisation: DEX data supports software licence rationalisation and hardware life extension. This strategic use of endpoint data has led to significant cost savings. For instance, a New York City-based financial institution avoided approximately £7.5 million in unnecessary laptop refreshes by leveraging DEX insights to assess real usage, realising that 91% of its laptops planned for annual refresh did not need replacement based on performance indicators.

Endpoint data in transformation and compliance

As we have already established, in financial institutions, endpoint data from DEX platforms is pivotal for both transformation projects and compliance. This data offers detailed insights into system performance and user interactions, crucial for orchestrating major initiatives such as operating system upgrades and cloud transitions. It also aids in assessing employee readiness and identifying system-level risks, ensuring smooth implementation and minimal disruption.

Further, endpoint data enhances compliance with frameworks such as the Digital Operational Resilience Act (DORA) and the Financial Conduct Authority's (FCA) guidelines, which require a deep analysis of how disruptions impact critical business services. Traditional monitoring systems often fall short in providing the necessary visibility into endpoint-level interactions, a gap filled by DEX platforms. For instance, during a critical system outage, the swift analysis of real-time endpoint data can lead to immediate problem identification and resolution, significantly reducing downtime and the overall impact on operations.

By integrating endpoint data into their strategies, financial institutions not only comply with stringent regulatory demands but also strengthen their overall resilience. This capability enables them to manage risks effectively and ensure continuous service delivery, even amid potential disruptions.

DEX drives employee productivity and satisfaction

DEX data provides strategic advantages for financial institutions by optimising IT asset management and reducing costs through the alignment of device usage with specific job roles. For example, by creating differentiated services and role-based environments for traders, bankers, and agents, an investment management firm used DEX data to prevent potential revenue loss from failed financial trades. DEX data also significantly enhances employee productivity.
According to Deloitte's Global Human Capital Trends 2024 report, satisfied employees are approximately twice as productive as their unhappy counterparts. By linking digital experience metrics to productivity and compliance outcomes, DEX insights enable more informed decision-making. With more than 80% of surveyed workers indicating that an enhanced work experience would improve their productivity, the role of DEX in boosting operational efficiency and resilience is undeniable.

In summary, experience-level data has evolved from a luxury to a critical necessity for operational resilience and regulatory compliance in financial services. The 2025 findings from the UK Treasury Committee underscore the urgent need for enhanced visibility into how digital services impact user experiences and business outcomes. Financial services leaders must now ensure they have precise insights into user interactions to proactively address digital friction and device degradation that can compromise operations.

Endpoint device intelligence represents more than just an enhancement of IT management; it signifies the end of IT as we know it in data-driven operations at financial services firms. This paradigm shift towards proactive service delivery and strategic decision-making is redefining competitive financial operations. As the sector continues to evolve, embracing this data-driven approach will be essential for maintaining compliance and spurring innovation, positioning financial institutions to not only respond to but also anticipate the demands of a rapidly changing market.