Crappy providers....

So my company switched phone systems this week. My boss was handling the SIP trunk side of things and I was handling a lot of the programming side of the house. We bought a company and were switching to the same Cisco system they use. The project was being run by a director of the company we bought.

This project originally included our SIP provider migrating the SIP services to another set of connections. However, they could only do 100 numbers at a time, and we have way more than that. My boss came up with a workaround: create another trunk group, move the numbers to the new trunk group, and, ta-da, we could do all of them.

So that was the plan until my boss had a personal emergency. The day before the switch, we had a 30-minute meeting in which I got the lowdown on what I had to do. It doubled my load, but it was doable.

So we start the switch. I go into the service provider's web portal and start moving the numbers. I did a few and noticed that the ported numbers went to 'updating' and just sat there.

This is bad. Very fucking bad.

So I call the provider and open a ticket stressing that this is a time critical issue. The rep said an engineer had to look and they would call. An hour passes, no call. So I call back and get the ‘it is in engineering, but I will call and check’ bit. I wait another 20 minutes then call back.

Then the rep says 'Well, maybe you ought to reschedule. I don't know if the engineers will get to it tonight'. Did I mention that we have a call center, corporate offices and offices in every freaking state? And none of them would have phones in the state the system was in at that point. And 5 people in the office from the consulting company that helped us put the system together? I didn't scream at the guy, but I let him know that canceling wasn't an option.

So a flurry of calls, people freaking out, lots of 'Holy shit, this is going to go sideways and we have no control over it!' drama. At the same time, the director running the project was sending emails every 5 minutes asking for an update. I finally sent him an email stating that no progress was being made because I was writing progress reports every 5 minutes about how no progress was being made.

Finally at about 4 am we get roughly 95% of it done, enough to test before we open at 4:30 am. We test and the migrated numbers look good. People start arriving at work and we assist the reps and make sure everything is good.

Then there were some routing and firewall issues affecting remote offices. So I am fixing our production firewall after being up for 25 hours. Not good.

Did I mention that I had to come in on Wednesday at 7 am to work with the consultants?

My boss called for an update. I told him where we were at and said ‘I am in the prod firewall and am probably about to break the shit out of it. Can you get it?’ His personal emergency was such that he could do a bit of work from home so he jumped on it.

I went and talked to the consultants, talked to the V.P. of the customer service side and confirmed everything was cool.

Then I clocked out after 27 hours on the clock. The consultants had rotated throughout the night and had some rest.

And I went home to my wife and kissed her, told her happy birthday and slept for 3 hours. Yes, the upgrade was scheduled the night before her birthday. I then cooked her dinner, kissed my boys goodnight and slept.

Today we cleaned up some stuff, a bit more to do on Monday. But it works.

Life lesson number 6,666. If you have a giant project, confirm every vendor involved will have enough staff on hand to handle any emergencies*.

Slee

*The SIP provider is a giant company and it blew my mind that they didn't have enough engineers on duty.

Can we get an update, now?

Just kidding. Man, sleestak, I feel for you. Haven’t been there/done that in exactly the same way, but have had my own set of issues with other vendors during stressful times, and it is just absolutely no fun. Your personal level of service is admirable and commendable. Congrats on a job well-done, as far as your end went.

Been there, done that, have the t-shirt (with phones, servers, routers and network issues)

Going by the thread title, I half expected to find a post about some entremanures.

Oh, I've been there and done that a bunch of times as well with a bunch of other systems*. However, this is the first time that I've told a vendor that we had a system-down emergency and got the 'Well, we are a bit busy, so reschedule' bit.

Thankfully, things are up and running. A couple of small fires, but nothing huge, and I have some more numbers to port. But man, that was stressful.

Slee

*My favorite was this. We had a SAN that ran our slot floor in a casino. The SAN had two controllers, A and B. If A went down, B took over.

Everything was redundant. Well, everything except a Link Controller Card. The LCC was the heartbeat between controller A and controller B.

The LCC broke. And it wasn’t redundant. So controller B thought that controller A had stopped. Controller B took over and started writing data to the SAN. However, controller A was happily writing to the SAN 'cause it knew nothing of controller B.

And the SQL database on the SAN was absolute toast. We had to restore the entire DB which took almost 24 hours. That sucked.
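For anyone who hasn't seen split-brain in action, here's a toy sketch of the failure mode above. The class names and logic are mine for illustration, not anything from the actual SAN firmware: each controller can only see its peer through the heartbeat link, so when that single non-redundant link dies, both sides conclude the other is dead and both start writing.

```python
# Toy model of a two-controller split-brain. Names and failover logic are
# illustrative assumptions, not the real SAN's implementation.

class Controller:
    def __init__(self, name):
        self.name = name
        self.active = False

    def check_peer(self, heartbeat_ok, peer_active):
        # A controller only sees its peer through the heartbeat link (the LCC).
        # If the link is down, a live peer is indistinguishable from a dead one.
        if heartbeat_ok:
            self.active = not peer_active  # normal failover: one active at a time
        else:
            self.active = True             # assume peer is dead -> take over

a = Controller("A")
b = Controller("B")
a.active = True  # A starts as the active controller

# The single, non-redundant heartbeat link fails:
b.check_peer(heartbeat_ok=False, peer_active=a.active)
a.check_peer(heartbeat_ok=False, peer_active=b.active)

writers = [c.name for c in (a, b) if c.active]
print(writers)  # -> ['A', 'B']: both controllers now think they own the volume
```

Two writers, one volume, no coordination: that's how the database got toasted. The usual fixes are a redundant heartbeat path or a third-party quorum/tiebreaker so a controller can't promote itself on a silent link alone.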

Slee