Root causes of XT failure identified

Open access cellphone infrastructure mooted

Telecom and its infrastructure provider Alcatel-Lucent have got to the bottom of the causes of the four major problems in the early days of the XT network, members of TUANZ were assured at a meeting last Tuesday.

However, no absolute guarantee of failsafe operation could be given, the companies say.

LV Martin's slogan “it’s the putting it right that counts” was quoted several times during the meeting. TUANZ CEO Ernie Newman is positive about Telecom’s efforts, but several members clearly believe, in the words of one contributor from the floor, that “what really counts is not getting it wrong in the first place”.

Before and after the formal session the possibility of New Zealand’s three mobile phone companies sharing infrastructure, particularly backhaul and even cellphone transmission towers, was canvassed. It seems wasteful and a target for public protest to have three towers in close proximity when one will do, says TUANZ chairman Chris O’Connell. With less complexity, he suggests, resilience may well improve.

The XT breakdowns had no common underlying cause, say spokespeople for Telecom’s software arm Gen-i and Alcatel-Lucent; they were a sequence of hardware and software failures and a switch misconfiguration due to human error.

The more southerly of the XT network’s two original radio network controllers (RNCs) was implicated in at least two of the failures. Measures against a recurrence of the problem include the addition of two more RNCs, already in operation, with a further two planned by mid-year, says Gen-i’s national mobile manager Joe Caccioppoli.

A map of the coverage area of each RNC shows significant overlap covering Auckland, Wellington and Christchurch. People in those centres should at least now have improved redundancy.

All the “rehoming” (the migration of phones) to the South Island RNC has been completed, he says. A total of 400 new tower-mounted amplifiers (TMAs) are being added to the network to improve signal strength and therefore coverage, as well as 50 new cell sites. Also, 2100 MHz transponders will be added to the current 850 MHz equipment in high-population areas.

When they have been installed “we are confident that XT coverage will match CDMA’s,” says Caccioppoli.

Alcatel-Lucent NZ chief technology officer Martin Sharrock emphasises that at no time was there an overload of traffic in the main speech and data channel of the network; it was the signalling channel where the flooding occurred. This carries network administration traffic as well as SMS messages.

Failures such as the collapse of a backhaul router on January 27th (the third of the four incidents) cause cellsites to lose connectivity and “spam” the RNC with signals attempting to reconnect. Alcatel-Lucent and Telecom are now “really focussed on looking at traffic volumes and profiles and making sure we’re not surprised,” Sharrock says.

Despite favourable comments on Telecom’s speed of reaction from senior TUANZ people and on remedial measures and communication around the failures, there were still views from the audience that more could have been done by way of advance testing and that Telecom had perhaps chosen the wrong supplier in Alcatel-Lucent, rather than Ericsson.

Sharrock says testing would not have eliminated human failings.

Caccioppoli pointed out that Alcatel has done sterling work in New Zealand’s fixed and mobile networks for 10 years. He says entrusting the company with XT was about ensuring smooth future interoperability in a world where the boundary between fixed and mobile networks is fast becoming invisible. Ericsson, he believes, had done no significant mobile work here.

Sharrock looked to seal the argument by playing an unexpected patriotism card: “If you want your network built by Aussies, choose Ericsson,” he said. “If you want your network built by Kiwis, choose Alcatel-Lucent.”
Comments
the challenges of the New Zealand market-place
So it seems that we New Zealanders:
1/ think we know better (technically, commercially, marketing, management, etc., ...)
2/ want it all
3/ want it cheap, at costs comparable to Europe and the US
4/ have no idea of the challenges that go with commercially modelling a ROI (return on investment) for a new service such as XT in a market-place supported by a mere 4.5 million (assuming there's a 100% subscription).

Ask a builder if every job goes 100% to plan: it NEVER does - new things always crop up. And that's just on a $250k house. So consider a $multi-million project like XT.

Anybody notice how Telstra and Vodafone didn't jump in to criticise Telecom? Perhaps they understand the challenges ...

I do wonder when we will, as a nation, start to support our market leaders with their efforts and commitment to raising the bar rather than criticising them.

Tall Poppy Syndrome? Much? ...

How about taking our hats off to Paul Reynolds for facing the public in his press releases rather than having someone else deliver the message. Personally I have to respect a CEO who takes personal responsibility for resolving the issue.

Just my 2c.

Posted by Anonymous at 13:54:05 on April 5, 2010

The Design was done by Marketing!
An ex-Telecommer has hinted that the original design for XT had 5 RNCs and a GSM failover network. Most of the engineered resilience was stripped out to lower costs and get to market quicker. A gamble that hasn't paid off.
Posted by Anonymous at 16:59:52 on March 31, 2010

The Design was done by Marketing!
"An ex-Telecommer has hinted that the original design for XT had 5 RNCs and a GSM failover network."

And that would be like installing a digital TV network with analog failover in case it went off. Having a GSM failover point would be a backwards technology step, plus expensive.
Posted by Paul W at 18:45:05 on March 31, 2010

The Design was done by Marketing!
GSM was never for failover; it was due to GSM being the primary network in use around the world at that time and the unavailability of 3G handsets. GSM is a legacy technology; if they just wanted redundancy they could have built two independent 3G networks.

No one in the world had backups for 2G before 3G existed.
Posted by EX-T at 22:44:37 on March 31, 2010

Network Management
The events listed are the root cause for the start of the outage.
Network management is what keeps a network operating after something doesn't go to plan. The XT network is a new network and it takes time for the network management to learn how best to keep the core network functioning when all is not well. The Alcatel hardware is just the fabric; how it is operated makes the service.
There were executive decisions made here that led to the size of the customer impact.
Posted by Anonymous at 16:53:55 on March 31, 2010

Network Management
Actually the operations are outsourced to ALu as well.
Posted by Anonymous at 20:15:53 on March 31, 2010

Not Great Root Cause Analysis
Well, it is a bit lightweight. Hardware failures I'll accept, grudgingly, even if they should be fully redundant.

Software Failures. Ask why 5 times. Was the software tested (in factory and here in NZ)? Was there a backout plan and was that tested? Were the people involved trained? Were the software installation procedures up to scratch? How come Auckland RNC software seems OK (so far)?

A switch misconfiguration due to human error. Ask why 5 times. Was the configuration tested? Was the human trained and supported? Did the human have instructions, and were they actually followed? Isn't there a software check that asks "are you sure you want this misconfiguration?"

And how come there was no redundancy/failover at all?

Can't wait for the publication of the real review.
Posted by Anonymous at 15:13:34 on March 31, 2010

Not Great Root Cause Analysis
Redundancy isn't really possible at this time. Each base station must be homed to a single RNC. If the RNC fails they must be manually moved to another. In 3G release 5 the connectivity changed from E1 to IP, which makes it possible to implement RNC redundancy. This feature is very new and requires changing the whole national 3G voice network from ATM to IP, which requires significant change and testing. I'm not sure if this feature is fully supported by any major telco vendor yet; most are only in testing.
Posted by Ex-T at 22:59:26 on March 31, 2010

Supplier choice
According to Caccioppoli & Sharrock's logic, when a Kiwi needs heart surgery they should choose a Kiwi GP over an Aussie heart surgeon. Is that really the best argument they have? This decision methodology is OK when deciding which bottle of wine to buy, but a mobile network supplier?
Posted by Anonymous at 12:00:59 on March 31, 2010
