Outage at IBM’s Auckland datacentre worsens
Service to clients down since 3am Monday, IBM has no updates
By Stephen Bell | Wellington | Wednesday, 20 February, 2013 | 103 Comments
IBM’s problems with its datacentre in Auckland are apparently being compounded by new issues. The service has been down since 3am on Monday.
Computerworld’s requests for an update today (Wednesday) were unanswered as of 11am. We have been referred to a company contact in Melbourne.
The $80 million datacentre, which opened in May 2011, provides cloud service to major IBM clients.
“While working to resolve the outage of New Zealand Virtual Server Services, a new technical issue was experienced which continues to affect the delivery of this particular service from the data centre,” says an emailed statement from an IBM spokesperson, in response to Computerworld’s enquiry.
“Our team of global experts continue to work on this as a high priority and remain committed to restoring service for our clients.”
Comments
Happens to everyone
Microsoft's Azure has just had a global outage...
Posted by Anonymous at 8:08:22 on February 25, 2013
Is Cloud Services a Good Thing ?
Cloud services are wonderful in theory, in a perfect world. However, the world is not perfect, and there are many potential issues to take into account, e.g. viruses, data corruption, hardware failure and human error. This becomes a real challenge when designing shared storage platforms, as a LUN is often assigned to many virtual machines, which means a rollback usually impacts multiple VMs. For this reason I am not a fan of cloud services, or whatever fancy name they come up with in the future for a shared platform. Guys (IT managers), do yourselves a favour: run your systems internally and set them up correctly to begin with, to allow individual rollbacks of VMs without affecting other production systems. Put a good amount of planning into DR at the same time.
Posted by Anonymous at 21:57:10 on February 24, 2013
Is Cloud Services a Good Thing ?
Not sure I agree... Most internal ICT departments, particularly in government agencies, do a much worse job than a cloud can deliver. The other problem is that cloud at some point becomes kind of mandatory. There will be a point where it is the only way you can buy services. Both Oracle and Microsoft are already a long way down that track, and within seven years, if you're not on cloud, then you're probably not on anything.
Posted by Anonymous at 7:04:55 on February 25, 2013
Hmmmm
We're not getting any closer to Godwin's law with this thread...
Posted by Anonymous at 18:02:04 on February 24, 2013
The Real Facts....
OK, so here are the real facts...
Their IBM XIV storage was due an upgrade; a new cache card was installed that was faulty, and it took the entire array down. It then took three days to restore the array from backups. This has come from a customer affected by the outage.
So much for XIV being Tier 1 storage... In my opinion, there are only three suppliers that produce enterprise-grade storage in the IT market: HDS (VSP), EMC (VMAX) and IBM (DS8000 range). Everything else is midrange with single points of failure.
Posted by Anonymous at 22:08:52 on February 22, 2013
Is it still a 'fact' when it's WRONG!??
The quality of the comments on this forum is really staggering.
Seriously, people, how about we stop bashing IBM because they are big blue Yanks and look at the facts? Not that IBM should be smelling of roses; there is some serious rot. But let's at least be rational about what the issues really are.
As previously noted in other comments, the data centre did not go down, a single virtual server hosting offering went down. The reporting on this across the NZ media has been disgraceful and fear mongering.
The comment titled "IBM NZ lack quality engineers" is just an example of NZ's tall poppy syndrome and the poster obviously lacks understanding of the complexity and opportunity for failure that exists in such a complex system.
I don't see anyone claiming that Amazon Web Services 'don't have the engineers capable of running a cloud', despite the fact that in the last six months of 2012 they suffered three major outages, one of which continued for more than 70 hours and resulted in customer data loss. And it's not just the AWS and IBM NZ clouds that have suffered issues. Last year, Tumblr, GoDaddy, Salesforce, Google Talk, Dropbox, Google Apps, Office 365 and Azure all suffered major customer-impacting outages.
People need to understand that architecting business IT using cloud platforms requires you to think differently to how it was done when we all owned all our own hardware.
Secondly - IBM don't comment locally because they are not allowed to. That's a corporate policy and has nothing to do with the calibre of the people in country.
Thirdly - the availability advertised is three nines, not five nines, and many of the customers are on older contracts which were sold with an SLA of 98.5%. Furthermore, for cloud services the uptime is typically calculated on a month-by-month basis, not an annual basis.
Finally - people who are running critical business services need to understand their DR posture and the resulting risk. If they choose to put all their services into a single site with no DR plan, they will almost certainly at some point suffer an outage. If your services are that critical, plan for disaster and don't put all your services into a single-site offering. It's not rocket surgery.
The real problem at IBM NZ is that the management (from the very top in the US, all the way to exec level in AU/NZ) have run the delivery organization into the ground, driving down morale and cutting staff levels to the bone, forcing overworked, smart technical engineers to focus on mundane compliance tasks that could or should be automated but have not been, due to complex, cumbersome and frequently changing process.
Your source, who is purportedly a customer and provided you with this hearsay, obviously received it via Chinese whispers, as it is simply incorrect.
Fact #1. It wasn't XIV storage.
Fact #2. It was not related to planned work or changes - it was a fault.
Fact #3. No data was lost.
Fact #4. There was some corruption with a small subset of data which was successfully restored from recent backup.
Fact #5. There are many other storage appliances that are worthy of the 'enterprise' tag. I'm glad you prefixed your inaccurate opinion with the fact that it was just your ill-informed opinion.
Jeez - hearsay information is simply not 'a fact'.
Posted by Anonymous at 12:52:17 on February 23, 2013
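To put the SLA figures quoted in the comment above (three nines versus 98.5%, measured monthly rather than annually) into hours, here is a minimal Python sketch. It is only an illustration: the function name `allowed_downtime_hours`, the 30-day month and the 365-day year are assumptions made for the arithmetic, not terms taken from any IBM contract.

```python
def allowed_downtime_hours(sla_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at a given availability percentage."""
    return period_hours * (1 - sla_percent / 100)


# Assumed measurement periods (hypothetical; real contracts define their own).
HOURS_30_DAY_MONTH = 30 * 24
HOURS_YEAR = 365 * 24

# The two availability levels mentioned in the comment.
for sla in (99.9, 98.5):
    monthly = allowed_downtime_hours(sla, HOURS_30_DAY_MONTH)
    yearly = allowed_downtime_hours(sla, HOURS_YEAR)
    print(f"{sla}%: {monthly:.2f} h per 30-day month, {yearly:.2f} h per year")
```

Under those assumptions, three nines permits roughly 43 minutes of downtime in a 30-day month, while a 98.5% SLA permits about 10.8 hours.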
Is it still a 'fact' when it's WRONG!??
Well you seem to know a lot about the issue, yet you are not prepared to provide details. If it wasn't XIV, then surely it wasn't that V7000 rubbish. Also, feel free to provide details on what you consider enterprise storage in other appliances... This will be hilarious.
Posted by Anonymous at 19:28:14 on February 23, 2013
Is it still a 'fact' when it's WRONG!??
IBM is crap, everyone knows that, except... IBM
Posted by Anonymous at 13:06:42 on February 23, 2013
yep.
Yep, my money is on linked clones, on top of vcloud with storage on xiv.
Only a storage drop out can cause failures of this magnitude.
Also, one has to question scale up versus scale out.
The real cloud providers and big guys (Facebook/Google) all scale out rather than up, but NZ clouds are typically scale up, which means there are often single pieces of technology that can and do fail.
I would rather not have to deal with vCloud objects in a restore scenario - it's another layer of abstraction on top of vSphere. But I do hope IBM are using Veeam and not some other solution, and I do hope their on-disk backups weren't on the same XIV device that is probably by now in tears!
Posted by Anonymous at 7:15:22 on February 21, 2013
IBM NZ lack quality engineers.
IBM NZ lack quality engineers to maintain the VSS infrastructure. Somewhere, at some point, one or more people committed an act of technical negligence on the VSS front-end and back-end components, which cascaded in various directions and brought the whole house of cards down. There may have been no real way of undoing the negligent act, or they may have tried to undo it and probably made things worse. An $80 million data centre suddenly running into issues: it is supposedly world-class standard, so don't they have monitoring, proactive checks, detection and prevention mechanisms, or regular health checks in place? Go figure.
Posted by Anonymous at 2:09:57 on February 21, 2013