Wednesday, November 27, 2013

What do Odin, 5 hours of my life, and one surprisingly horrendous, user-hostile firmware update have in common? A lesson for SaaS!

"Odin sends them to every battle [the valkyries].  They allot death to men and govern victory." - From "The Gylfaginning".  Available online at: http://mikespassingthoughts.wordpress.com/2012/05/08/interesting-quotes-concerning-odin/

"Odin can't see JACK!  And even if he could, there's so much adware around these kernels there HAVE to be 100 codemonkeys throwing malformed javascript at your browser extensions for every one that will actually download.  Forget uncompressing one, I'm too busy killing random tabs!!!!" - Me, approximately two hours into the second burst of activity trying to unbrick a Samsung Galaxy III that got bricked because I follow best practices.

"Aargh!  I need pie!  One of us - me or the phone - will not survive to see tomorrow!!!  This is why people root phones!  I play by the rules, and don't root one, and still get technorubble!"  - Me, approximately another hour later


By now, if you are still with me, you are probably wondering where this is headed. This is a cautionary tale, and the story of what tipped me off to one of the most glaringly bad design decisions and deployments I've ever seen. And now I share it with you, for better or for worse (hopefully not the latter).

It is pretty much considered a best practice to keep your computing equipment at the most recent stable release. We've all done those update cycles for our Windows machines, or had them done to us even though we said to let us choose (don't get me started on THAT!), but they exist for every system there is, including our cell phones. So, this morning, being a responsible computing professional, I initiated a system software (firmware) update from the AT&T system. It is a standard process, and generally only non-harmful software is located on their servers. To foreshadow, the service professional I spent a while on the phone with actually said, "It's the safest thing to do, and I do it all the time. I've never seen it do that."

After downloading 680+ MB of code over my handy dandy home wireless pipe, my phone acted like a system in that scenario normally does.  It blinked a bit, put out a message about rebooting to complete the process, then underwent a process known as brickulation.

It turned into a brick.  When the system tried to come back up, it stuck at the splash screen.  For the remainder of its normal life.  Battery pulls, resets, even what is known as a factory reset - none of those things could revive it.  It would boot to the same spot, then die right there, heating the battery so rapidly that I had to actually put it in the freezer while I was on the line with tech support to avoid a diagnosis of brickulation with advanced iBatteryfirehazarditis.

We tried everything, and I do mean everything. I took it to the Samsung tech at Best Buy (first time I've walked out of there without having spent some serious coin) for a wizardly stab at fixing it. I tried the Kies updater, I tried every permutation of the reboot sequence, and nothing happened. It always froze at the same spot. I wiped the device. I plugged and unplugged everything.

I eventually even moved to trying Odin 3.7, the Windows tool used to flash firmware onto Samsung's Android devices. With it you should be able to load an operating system kernel onto the phone, but it couldn't see the phone, and the hunt for a genuine copy of the kernel code instead of adware-riddled garbage came up empty. Nothing worked, and so a new phone will arrive via delivery sometime soon.

So I'm frustrated at what happened, but I am very happy with AT&T and how they handled the situation, and they are getting me a replacement phone, so kudos to them. BUT... a pox upon the coders and system wonks who came up with the updater system, and here's why.

As someone who works often with problem resolution for computing technology, I'm a fan of the clarity that comes from well-documented reporting of issues. So while I was on the line, I used my wife's phone, which is identical to mine, to walk step by step through the update process. (There was a fair amount of confusion at the beginning of the call, and it took a while to get beyond the concept of updating an app to the reality of an approved firmware update that went toes up.) I reached the point where you click 'Continue' to actually start the upgrade, and figured that since her phone is identical to mine, and the update bricked mine, my luck was too thin to go any further. However, at that moment I noticed there was no cancel button. NO CANCEL BUTTON!

Figuring that this was just sloppy code engineering - and it truly is - I hit the back button.  There was no result.  A reboot of the phone brought it back to the same point, where the update was demanding that the Continue button be clicked.  Every trick I know how to do, except for randomly killing processes, resulted only in the insistent presentation of the screen demanding I press Continue so that it could proceed down the same road I'd been down with my new brick.

Yet another phone call with tech support provided me with the unbelievable answer that, though there had been nothing downloaded yet, when you click on 'Check for updates', it commits you to a process from which there is no recovery, except to go through it.  There is no stop, killing processes will brick your phone (apparently so will the update, but I digress), and if you accidentally click Continue, you have to stay inside your WiFi coverage, or unknown, bad, bricking things will lurk (apparently, they lurk within the update as well).

I maintain that coding up a firmware update process without the ability to stop it before it even downloads anything, or does anything, is a deplorable and inexcusable violation of good coding practices.  I don't know who would be the one responsible for that - the carrier, Samsung, Linus T. (though I'd bet a large amount of cash that Linus would be just as appalled as I am, so maybe not him) - but whoever it is should have to compress 1 TB of video files using only a 486-based system running only Windows ME as a punishment.  We are now going to pass off the risk of finishing the upgrade that cannot be stopped to the AT&T store, because if they run the update, and it bricks the phone, then they broke the phone (I want to be clear that was the support suggestion - I don't really go for the slick passing of risk, preferring instead to work together to get solutions, rather than play hot potato with device warranties - I MOSTLY prefer intelligent code practices, though, and since I don't get my wish there...).

WAIT! I promised a lesson for SaaS, didn't I? Here it is: If you are asserting to a customer that an update is approved and okay for them to use, then it truly needs to be okay, and safe, and it needs to contain sufficient reversibility, as well as proper design and a stepped approach, with appropriate user-optional choices to end the process.

Consider designing software that will be used in an update scenario more along the lines of how you plan the updating of a database. When you update a database, there is a process that changes the value being used, and at the end of whatever process runs, there is what is called a commit, which then makes the change permanent. Until that commit happens, the change can still be rolled back.

If you do not plan for that kind of update process, and you do not pay sufficient attention to making sure that the user can decide not to carry it out, then you are asking for an issue such as this one, and for it to happen to someone like me, who knows enough about the guts of these systems to handily dismiss the front line of OS defense ("it wasn't us, and it is out of the warranty period") and keep your service staff entertained.

As I end this, I humbly submit that success in the SaaS arena will come from changing your internal interpretation of the abbreviation from 'Software as a Service' to 'Service applied as Software'. Now to rescue the remainder of the evening, while simultaneously placing the turkey into much peril. Happy Thanksgiving, everyone, and may your updates always result in a good reboot.
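
P.S. For the code-minded, here is a minimal sketch of the database-style update I have in mind, written in Python against an in-memory SQLite database. To be clear, this is my own illustration and not how any carrier's or manufacturer's updater actually works. The table layout, the version labels 'v1' and 'v2', and the helper names make_demo_db, current_version and apply_update are all made up for the example. The point is only that the change is staged first, and the user still gets a say before anything becomes permanent.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database holding one device row (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE device (id INTEGER PRIMARY KEY, firmware TEXT)")
    conn.execute("INSERT INTO device (id, firmware) VALUES (1, 'v1')")
    conn.commit()
    return conn


def current_version(conn):
    """Return the firmware label currently stored for the device."""
    return conn.execute("SELECT firmware FROM device WHERE id = 1").fetchone()[0]


def apply_update(conn, new_version, user_confirms):
    """Stage the change, then make it permanent only if the user agrees.

    user_confirms is any zero-argument callable returning True or False;
    it stands in for the Cancel/Continue choice the real screen lacked.
    """
    try:
        # The UPDATE runs inside a transaction, so nothing is permanent yet.
        conn.execute("UPDATE device SET firmware = ? WHERE id = 1", (new_version,))
        if user_confirms():
            conn.commit()    # the point of no return, chosen by the user
            return True
        conn.rollback()      # user backed out: the old value is restored
        return False
    except Exception:
        conn.rollback()      # anything went wrong: also restore the old value
        raise


if __name__ == "__main__":
    conn = make_demo_db()
    apply_update(conn, "v2", user_confirms=lambda: False)
    print(current_version(conn))  # user cancelled, so the old label remains
    apply_update(conn, "v2", user_confirms=lambda: True)
    print(current_version(conn))  # user confirmed, so the new label is stored
```

A real firmware updater has far more to worry about than one row in a table, of course, but the shape is the same: stage, ask, and only then commit.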
