Latest news with #Cybersmart

EXCLUSIVE: Cybersmart founder owns the internet outage

Daily Maverick

15-05-2025

  • Business
  • Daily Maverick

'This lesson could help the industry,' Cybersmart cofounder and CTO Laurie Fialkov told Daily Maverick in a candid interview as the dust settles on the national outage.

Criticism of the outage that brought Cybersmart customers to a digital standstill earlier this week came swiftly. And so did an apology – and then, unusually, an unvarnished account of what really went wrong. Cybersmart reached out to Daily Maverick to let Fialkov pull back the curtain on 39 sleepless hours that nearly broke the network, the business, and a few engineers along the way. (Spoiler: it was not a cable break.)

'We did screw up,' he admitted. 'That kind of outage is long in this industry.'

The Cape Town-based ISP and fibre network operator experienced a near-total service disruption starting around midday on Monday, 12 May. What followed was a cascade of misdiagnoses, desperate rewiring, and a rude awakening to the dangers of old hardware lurking in even the most redundant systems.

From café to carrier

Fialkov's internet journey began in a Sea Point internet café called Inthenet in 1996. Back then, a 33.6kbps modem counted as high-speed. Cybersmart, as the business became known in 1998, grew slowly and steadily. It's now a national network operator with thousands of business and residential customers, a fleet of fibre infrastructure and what was – until this week – a 22-year record of uninterrupted uptime.

'We spend so much time trying to be the ISP that never goes down,' Fialkov said. 'We've got everything. Multiple cable systems, battery redundancy, ringed networks… and we just never had an outage.'

But this week, everything went down – silently.

Phantom signals

It started quietly. Customers called Fialkov directly – just a few at first. Network monitoring showed all systems green. 'I can reach the whole network. It's impossible that we are down,' he recalled thinking. 'But then you get 15 calls in five minutes, and it's got to be an issue.'

Outside-in diagnostics (via 'looking glasses' – remote tools that test connectivity and routing from other vantage points on the internet) revealed the horror: Cybersmart's Autonomous System Number, AS36874, had essentially vanished from parts of the global internet. 'Like this shell on the internet – completely isolated.'

A denial-of-service attack seemed a likely culprit. NexusGuard, Cybersmart's DDoS mitigation partner, was called in – only to say: not a DoS.

Then the real enemy emerged: old gear.

The ghost in the chassis

The culprit? Legacy Cisco 6500 routers – high-end switches that formed the spine of Cybersmart's original network. 'These things have been working for 15 years. End of life, yes. But working. We were meant to replace them. But… if it ain't broke, right?'

Until it broke. Hard. One router in Johannesburg froze. Another in Cape Town followed. 'Too coincidental,' Fialkov said. Then more went dark. 'It was like a cancer – six machines, six different places, all failing.'

The root cause? A global routing table explosion. New peers (China Telecom, Hurricane Electric, Saudi Telecom) dumped an extra 150,000 routes into the internet's core. The old routers couldn't cope – they choked and silently failed. With no support (Cisco dropped them years ago) and no viable fix, the team made a call: rip them all out.

Wait, what is a 'routing table explosion'?

At the core of the internet is a global 'routing table' – a constantly updated map showing how data travels between networks. On Monday, three major networks (China Telecom, Hurricane Electric and Saudi Telecom) suddenly added around 150,000 new routes to that map. The result? A routing table 'explosion'.

Older routers – like the legacy Cisco 6500s still used in parts of Cybersmart's network – couldn't cope. These machines rely on specialised memory with strict limits. When overloaded, they didn't crash loudly; they just stopped forwarding traffic, silently dropping data. This left parts of the internet unreachable, even though the hardware appeared 'green' and online.

It wasn't a cyberattack or a power cut. Just old infrastructure overwhelmed by a sudden global change – and management not retiring it soon enough.
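To make that failure mode concrete, here is a minimal, purely illustrative Python sketch of a fixed-capacity forwarding table. The capacity and route counts are assumed numbers chosen for illustration, not Cybersmart's or Cisco's actual figures; the point is that a full table does not crash. It quietly stops installing new routes, so traffic to those destinations is dropped while the device still looks healthy.

```python
# Purely illustrative: a toy forwarding table with a hard capacity, mimicking
# the specialised route memory (TCAM/FIB) in older routers. All numbers are
# assumptions for illustration, not measured or vendor figures.

FIB_CAPACITY = 900_000      # assumed hardware limit on installable routes
EXISTING_ROUTES = 820_000   # assumed size of the global table before Monday
NEW_ROUTES = 150_000        # roughly the burst of extra routes described above


class ToyRouter:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.fib = set()    # routes actually installed in hardware

    def learn_route(self, prefix: str) -> bool:
        """Install a route if there is space; otherwise fail *silently*."""
        if len(self.fib) < self.capacity:
            self.fib.add(prefix)
            return True
        return False        # no crash, no alarm: the route simply isn't there

    def forward(self, prefix: str) -> str:
        # Traffic to a destination with no installed route goes nowhere.
        return "forwarded" if prefix in self.fib else "silently dropped"


router = ToyRouter(FIB_CAPACITY)
total = EXISTING_ROUTES + NEW_ROUTES
for i in range(total):
    router.learn_route(f"prefix-{i}")

print(f"installed: {len(router.fib):,}  not installed: {total - len(router.fib):,}")
print("status light:", "green" if router.fib else "red")            # still green
print("late-arriving destination:", router.forward(f"prefix-{total - 1}"))
```

In real hardware the symptom has the same shape: the chassis stays up and monitoring stays green, but destinations whose routes never made it into the table simply become unreachable.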
The first cut, and the deepest

That decision triggered a national network reconfiguration. More than 180 switches and 65 PPE servers had to be re-patched, reconfigured and brought back online. Fialkov described the operation as 'cutting out the cancer'. The operation took 39 hours and 16 minutes – a truly Herculean effort. 'There are some guys who haven't slept for 40 hours now. They really showed up for us,' he said.

By Wednesday morning, most services were restored. But not without a cost – to Cybersmart's reputation and its customers' businesses.

Lessons in humility

'This has been a life lesson,' Fialkov said. 'You get too arrogant. Twenty-two years without an outage, and you start to believe your own myth.'

He admitted that the company had been sitting on a known problem: ageing infrastructure, flagged for replacement years ago. 'We'd been looking at the same thing for four years. Working perfectly. Until it didn't. Took us out at the knees.'

Still, he insists the issue wasn't a lack of contingency. 'We've got spares. We've got redundancy. This was human complacency. We left something broken in the network for too long.'

Heartfelt apology to the 'R399'

Interestingly, he says the customers hardest hit weren't the big corporates – it was the small businesses. 'The R399 customer? That's the guy who might be running his whole business off one link,' Fialkov said. 'An outage like this could be the end of him.'

He told Daily Maverick how he spent 11 hours on a support call with one such customer, trying to assure them their business would survive. That sobering reality drove home what Cybersmart had become. 'Ten years ago, no one would've noticed if we went down. Now? The whole country feels it.'

Where to now?

There are still issues being resolved, and some customers are wrongly blaming Cybersmart for unrelated faults. But for the most part, the network is back.

Fialkov's candour in this moment of failure is unusual in South Africa's telecoms industry – and maybe even refreshing. 'You owe your customers a service,' he said. 'And if you can't deliver it, you must be called out on that.'

And then, just like that, the fibre was (mostly) back – but the scar remains. DM

‘Everything just sucks' — tracking Cybersmart's 44-hour nationwide outage

Daily Maverick

14-05-2025

  • Daily Maverick

For thousands of Cybersmart customers in SA, Monday turned into a digital nightmare that's still a reality for some.

It's 8.02pm on Monday, and I'm watching the end of Netflix's Nonnas with my wife because we both fell asleep about a third of the way through last night.

*Bzzzt bzzt*

'Dude, do you know what's up with the internet?' reads the message on my Fitbit — it's still the best sleep tracker, okay… 'I'm with Openserve and it went off at about 6. But all my diagnostics point to me being connected to the internet.'

Internal thoughts (my wife already raised an eyebrow): strange, I'm also on an Openserve network — must be a localised outage that I'm too tired to deal with now.

Tuesday, 6am. *Bzzzt bzzt*

Workshop17 (the operational home of Daily Maverick): 'Good morning, all. Unfortunately, we're starting today with ongoing internet disruptions due to unresolved issues from our service provider. While we've implemented our failover solution to maintain some level of connectivity, performance may be inconsistent across our location. You might experience slower speeds, dropped connections, or difficulty accessing certain online services throughout the day.

'We've been in constant communication with the ISP [internet service provider], but so far we've received no clear timeline for full restoration…'

Oh, crap.

First frustrating hours

Digging through forum posts and Downdetector logs, it emerges that troubles began around 12.40pm on Monday, when Cybersmart customers nationwide suddenly found themselves unable to connect to the internet. Even the company's own websites and status pages went offline, leaving customers in an information void.

'They just keep increasing it by two hours when they're close to missing their previous two additional hour estimation,' wrote one exasperated user on a tech forum as the company repeatedly extended its estimated time to resolution (ETR) from an initial four hours to six, then eight and eventually 12 hours.

Cybersmart is one of South Africa's veteran ISPs and the operator of the Lightspeed fibre network. This old man of the internet has spent the past 44 hours battling what began as a 'core switch failure' but has since been described as a 'routing issue on legacy hardware'. The outage has left customers from Cape Town to Johannesburg disconnected, frustrated and increasingly vocal about their dissatisfaction.

'Everything just sucks'

The technical symptoms varied, but the experience was universally described in a techie WhatsApp group as 'everything just sucks'. Customers reported massive packet loss, difficulty accessing websites and painfully slow connections when anything worked at all.

Data from Cloudflare Radar confirmed what customers were experiencing: a sudden and sharp drop in internet traffic from Cybersmart's network (AS36874) and fluctuating Border Gateway Protocol announcements, an indicator of serious routing instability.

By Tuesday morning, the company had updated its ETR to 'undetermined' — a change that did little to reassure increasingly anxious customers who had already spent nearly 24 hours offline.
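That kind of outside-in view isn't limited to Cloudflare: anyone can query public route collectors to see whether an AS is still being announced and how widely it is seen. The sketch below uses the public RIPEstat Data API for AS36874; the data-call names come from RIPE NCC's published documentation, but the exact response fields may vary, so treat it as an exploratory starting point rather than a monitoring tool.

```python
# Exploratory sketch: check how an autonomous system looks "from the outside"
# using the public RIPEstat Data API (stat.ripe.net). Response field names are
# based on the published documentation and accessed defensively in case they
# differ; this is not a production monitoring script.
import json
import urllib.request

ASN = "AS36874"  # Cybersmart's AS number, as cited above


def ripestat(call: str, resource: str) -> dict:
    """Fetch one RIPEstat data call and return its 'data' payload."""
    url = f"https://stat.ripe.net/data/{call}/data.json?resource={resource}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp).get("data", {})


# How many prefixes is the AS currently announcing into the global table?
prefixes = ripestat("announced-prefixes", ASN).get("prefixes", [])
print(f"{ASN} announces {len(prefixes)} prefixes")

# How many of RIPE's route collectors can still see those announcements?
visibility = ripestat("routing-status", ASN).get("visibility", {}).get("v4", {})
seeing, total = visibility.get("ris_peers_seeing"), visibility.get("total_ris_peers")
if seeing is not None and total:
    print(f"seen by {seeing} of {total} RIS peers")
    if seeing < total / 2:
        print("warning: only partially visible; this looks like a routing "
              "problem, not a local fault")
```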
Into the news cycle

Tuesday 2.22pm. *Bzzt bzzt*

'Okay, it seems deeper… I've dug a bit,' says a close friend who heads up IT for a big firm. 'EASSy and Seacom suffered faults near Mtunzini.'

Those are two massive undersea cables that make landfall in KZN — you'll recall the massive outage this time last year on the west coast of Africa, and this would be a hilarious mirror of that if true.

Founded in 1998 with roots tracing back to an internet café established in 1996, Cybersmart has grown from a basic ISP to a significant infrastructure player in South Africa's digital landscape. The company operates as both an ISP and a fibre network operator, with its Lightspeed brand representing its high-speed fibre-optic internet services. In 2022, the company received substantial investment from Infra Impact Mid-Market Infrastructure Fund 1, which acquired a minority stake to fuel the expansion of Cybersmart's fibre network.

Cable games

On the technical side, Cybersmart has invested in advanced network architecture, including wavelength-division multiplexing. This technology essentially allows Cybersmart to send multiple separate internet connections down a single strand of fibre at the same time. The company uses a couple of variations of this 'coloured light' technique (known as CWDM and DWDM) to pack in as many of these separate 'lanes' as possible. The end result? Cybersmart can use a single strand of fibre-optic cable to provide fast and reliable internet to up to 40 different customers. It's a very efficient way to deliver fibre internet.
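As a rough sense-check on where a figure like '40 lanes per strand' comes from, the sketch below divides the optical band a typical DWDM system uses by a common channel spacing. The band width, spacing and per-channel speed are generic assumed values, not Cybersmart's actual optical design, so read it as back-of-the-envelope arithmetic rather than a description of the Lightspeed network.

```python
# Back-of-the-envelope arithmetic: how many wavelength "lanes" fit on one
# fibre strand under common (assumed) DWDM parameters. Generic illustration
# only; not Cybersmart's actual optical design.

C_BAND_WIDTH_GHZ = 4_000     # assumed usable optical band (~4 THz C-band)
CHANNEL_SPACING_GHZ = 100    # a common DWDM grid spacing

channels = C_BAND_WIDTH_GHZ // CHANNEL_SPACING_GHZ
print(f"{channels} wavelength channels per fibre strand")           # -> 40

# Each wavelength behaves like an independent link, so one strand can carry
# separate connections for up to that many customers or services.
ASSUMED_GBPS_PER_CHANNEL = 10
print(f"aggregate capacity: {channels * ASSUMED_GBPS_PER_CHANNEL} Gbps per strand")
```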
Importantly for this story, the company has also secured capacity on major undersea cables, including SAT-3, Seacom and WACS for international connectivity, and provides network-to-network interfaces to other ISPs at key data centres around the country. In an unusual move for a South African ISP, Cybersmart designs and manufactures its own customer premises equipment, aimed at delivering better fibre protection and cost-effectiveness.

Yet despite these investments, the current outage has exposed critical vulnerabilities in the company's infrastructure, particularly in its core network components and the apparent lack of redundancy for legacy hardware.

The long road to recovery

Tuesday 4.27pm. *Bzzt bzzt*

'The internal team is on it and I will have feedback as soon as they revert,' says my contact at Seacom.

*Bzzt bzzt* I've lost track of time under the weight of deadline pressure from my editors wanting to know if I'm 'on it'.

'Client just got back and confirmed that there is no problem at Mtunzini — they had load shedding from 16:00 to 17:00 when they ran on a generator, but the grid power was restored at 17:00 when all returned to normal. They did not experience downtime. I trust that this helps. All the best.'

Crap.

By Tuesday night, there were signs of life as local routing between Johannesburg and Cape Town began to stabilise, though with intermittent drops. Engineers appeared to be working through the night, with a brief window of connectivity observed between 2.59am and 3.21am on Wednesday.

By Wednesday morning, Cybersmart announced that the 'majority of services were back online'. However, customer reports contradicted this optimistic assessment. 'Some areas, like Vuma in Cape Town, are still totally dead as of Wednesday morning,' reported one user, marking nearly 44 hours of complete disconnection for these customers.

Cutting losses

Wednesday 9.28am. *Bzzzzzt*

'I got one response saying they need to check. Almost feels like we weren't impacted, otherwise someone somewhere would have made a noise. Let me wait to hear from the other guys and I'll come back to you,' says my Cassava Technologies contact — Liquid Intelligent Technologies is an owner of EASSy.

The prolonged outage has proven to be the last straw for many loyal customers. 'I'm biting the bullet on the notice period and getting onto Afrihost ASAP,' declared an IT manager in a WhatsApp group, referring to their willingness to pay cancellation fees just to escape what they perceived as an unreliable service.

A particularly telling comment came from someone identifying themselves as a former employee: 'Sad to see what's happened to the company; I wonder if they will recover from this event as a company.'

The situation has been especially dire for customers in buildings or complexes with exclusive Cybersmart agreements, who found themselves completely stranded with no alternative connectivity options.

Questions of contingency

Industry observers and customers alike have raised questions about Cybersmart's preparedness for such failures. While the company attributed the outage to a routing issue on legacy hardware, the extended and escalating repair time suggested deeper problems.

'Seems to me they probably lost a key piece of kit somewhere and they had no redundancy or any real contingency plan,' observed one forum participant. Others speculated about possible DDoS attacks, though the consistency of the packet loss and the extended duration led many to discount this theory in favour of simple hardware or routing software failure.

Interestingly, Lightspeed clients connected through Afrihost remained online throughout the crisis, suggesting the problem was within Cybersmart's own ISP layer rather than a complete failure of the underlying Lightspeed FNO fibre infrastructure.

For now, as engineers work to bring the final affected areas back online, customers are left wondering whether this outage represents a one-time perfect storm or a sign of deeper infrastructural problems that may resurface in the future.

It's all connected

What this means for you

  • Everything is connected — literally. South Africa's internet doesn't just travel along streets in fibre cables; it relies heavily on a few undersea cables like Seacom, EASSy, SAT-3 and WACS to connect us to the rest of the world. A fault on one of these can bottleneck or break international connectivity, affecting multiple ISPs at once.
  • One weak link — many broken ones. Even if your local fibre line is fine, your ISP's upstream network — the 'core' — could be compromised. If that network lacks redundancy (a backup plan), then a single hardware or routing failure can take entire regions offline.
  • Outages ripple outward. When a network like Cybersmart's goes down, it doesn't just affect its own customers. Other ISPs that rely on its infrastructure (or who peer traffic with it) can be caught in the crossfire, creating a cascade effect across the internet.
  • Backups aren't magic. Many ISPs and businesses use failovers (alternative routes), but these don't always work perfectly. If the core routing is down or if backups aren't configured properly, those plans can fail too.
  • Check your provider's peering and redundancy. Not all ISPs are equal. Some invest more in network resilience, multipath routing, and independent upstream providers. Before switching, ask how they handle outages and what cables or networks they depend on.
  • Internet in South Africa works — until it doesn't. And when it breaks, it can take days to figure out why. Choose ISPs that are transparent, well-peered and communicate clearly when things go wrong.

DM
