Monday, January 12, 2015

Jabbered to Death: Counterproductivity Software, or TPS Reports in the Age of the Interwebs

" I'm going to need those TPS reports... ASAP..."  - Bill Lumbergh in Office Space.

* Side note: occasionally I apply a literary device called sarcasm, as well as hyperbole and stating the truth in a humorous or reframed way.  A good example of that is found in the comment about SLAs.  Aside from blogs (especially technology blogs - hopefully this one doesn't induce as much sleep as some others) being more readable if they have some entertainment value, there is often helpful truth hidden in humor.  The degree of that depends on the reader, and the topic.  Any sore toes with what is written here are unintentional, and we all know as professionals that even the best tools and methods get stretched and misused from time to time.  We also know that there are no absolutes, and that any resemblance to companies/departments living, or soon to be scooped up by the Google-Facebook Complex, is purely coincidental, except for sanitized exemplars.  So please enjoy, and if you receive a laugh, so much the better.  If you receive good inspiration from this, then you owe me a penny.  To understand that last comment, you may want to consider furthering your education in the Center for Information and Communication Sciences at Ball State University.  Knowledge is a "Force" multiplier.

First let's clear the air, lest I offend any fellow Jedis - I'm not picking on Jabber specifically, it just happens to have the snappiest name, though given that jabbering is sometimes used to describe the sound a band of monkeys makes, it could be apropos.  It's all about the user of the tool, not just the tool.  So due respect to the Cisco Kids...

Now to the point.  How do you handle production incidents in your company?  I'm not asking about your SLAs (Service Level Agreements - basically agreements ahead of time which determine the period of time a service provider can say "I'm working on it, be patient" versus saying "I'm sorry, reality and promises made to get the business have unfortunately collided"), or your Help desk staffing, I'm referring to the response/communication mechanisms that are used to try to arrive at a solution.

We have worked very hard to provide a good way to attempt to improve the flow of information for all business processes, incident responses especially.  There was a day when we had the telephone,  and that was it.  The passage of information was very much serial, rather than parallel. 

I'm referring to the basic two ways we have of wiring up an electrical circuit - serial, which means the power flows to one place, then to the next, along a series of connections, and parallel, which means the power flows to all the places at the same time.  For a frame of reference, if you've ever had a light in a strand of Christmas lights go out, all of them go out, because the series is interrupted.  The strands where that doesn't happen are wired in parallel.  They are also more costly [implementation of foreshadowing complete].
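For the code-minded, the Christmas light comparison can be boiled down to a few lines of Python.  This is only a toy model, and the function names are made up for illustration: a list of True/False values stands in for working and burned-out bulbs.

```python
def series_strand_lit(bulbs_ok):
    """Series wiring: current must pass through every bulb in turn,
    so one dead bulb opens the circuit for the whole strand."""
    circuit_closed = all(bulbs_ok)
    return [circuit_closed for _ in bulbs_ok]


def parallel_strand_lit(bulbs_ok):
    """Parallel wiring: each bulb has its own path to the supply,
    so a dead bulb only affects itself."""
    return list(bulbs_ok)


bulbs = [True, True, False, True]   # the third bulb has burned out
print(series_strand_lit(bulbs))     # [False, False, False, False]
print(parallel_strand_lit(bulbs))   # [True, True, False, True]
```

The same shape applies to passing information: in the serial case one missing link stops everybody, in the parallel case only the missing link misses out.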

The two people connected up could pass information in real time, but anyone else wanting in on the feed of information would receive a busy signal.  So the solution to that problem was the speakerphone/conference call.  That is properly understood as the "5 minute delay to find the number and figure the equipment out/sound like someone speaking backwards through a box fan" solution.  Aside from some impracticality involving gathering and finding equipment during a response, it is an okay solution.

However, distance and time are increasingly becoming inexcusable barriers to doing what we do, because the inverse of Moore's law also applies to user tolerance for delays, namely that the tolerable time for waiting is cut in half every 18 months.  So we saw the use of email in response to production issues.  It works efficiently and offers the parallel passage of information so that a person being away doesn't halt anything.  There are distribution lists and other ways of getting things out to everyone at the same time.

The issue with that was that such a solution was asynchronous, and not only could you not know when someone acted on something, you had no idea if they even saw it, and there was the risk of a fix being overwritten by something happening out of order.  In terms of command and control structures, email is the UDP of human interaction (aw, shoot, sometimes you just have to Google some things - suffice it to say TCP is more reliable, UDP is faster).  Information is sent out and you don't know if it was received unless someone sends you something in reply.
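To save a trip to Google, here is a rough sketch of the difference in Python.  It is a simulation, not real networking: LossyChannel, send_fire_and_forget and send_with_ack are hypothetical names invented for this illustration, and the "drop script" simply decides ahead of time which sends get lost.

```python
from collections import deque


class LossyChannel:
    """Hypothetical channel that drops messages according to a fixed script."""

    def __init__(self, drop_script):
        self.drop_script = deque(drop_script)  # True means "drop this send"
        self.delivered = []

    def send(self, message):
        """Return True if the message arrived (stands in for an acknowledgment)."""
        dropped = self.drop_script.popleft() if self.drop_script else False
        if not dropped:
            self.delivered.append(message)
        return not dropped


def send_fire_and_forget(channel, message):
    """UDP/email style: send once and learn nothing about delivery."""
    channel.send(message)  # the result is deliberately ignored


def send_with_ack(channel, message, max_attempts=5):
    """TCP style: resend until acknowledged; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(message):
            return attempt
    raise TimeoutError(f"no acknowledgment after {max_attempts} attempts")


udp_like = LossyChannel([True])               # the only send gets lost
send_fire_and_forget(udp_like, "fix applied")
print(udp_like.delivered)                     # []

tcp_like = LossyChannel([True, True, False])  # two losses, then success
attempts = send_with_ack(tcp_like, "fix applied")
print(attempts, tcp_like.delivered)           # 3 ['fix applied']
```

The fire-and-forget sender finishes happily even though nothing arrived, which is exactly the email problem.  The acknowledged sender takes longer, but it knows where it stands.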

Enter the instant messaging client.  It allows the direct sharing of real time information like the phone call, the use of groups in different places like the conference call, and the broadcasting ability of email, adding in the ability to tell if someone is online, or offline when something comes up.  It seems to be the perfect vehicle for efficiently addressing problems, connecting the managerial side to the technical side with the real time, blow-by-blow progress toward a solution. The technology is nailed down.

The TECHNOLOGY is nailed down.  The USE of the technology and carbon-based error resolution procedures are not.  One problem is that, unless there is a well-followed process, there is a tendency to use all of them, simultaneously.  The end result of that is an inefficient resolution.  The most efficient tools we can come up with, when used in conjunction, counteract their own efficiencies?  Here's how.  The following is a somewhat sanitized version of a real situation.  See if you can spot the efficiency losses.

☆ Tech_0 receives an email from Programmer_0: There's a problem with some data loaded a couple days ago, I have the correct data ready to go in to replace it.  We need to verify it will work in test before adding it into production.

☆ T_0 sends an email to P_0: Will request the appropriate group do the delete, then when it is complete I'll load...

☆ [popping up in mid-email] IM from Manager_0: hearing that there is some bad data out there, can you check into that to see what is happening?

☆ T_0 IM to M_0: just received msg. from P_0, starting the process.

☆ [T_0 returning to email]: ...the new data into test, validate it, then load...

☆ IM from M_0: Okay let me know when it is done.

☆ [T_0 finishing email]: ...it into prod.  I'll let you know when all is done.

☆ When message is sent, open waiting email from Helpdesk Manager_0: I understand there is a data issue.  Can you please look into it?

☆ T_0 initiates process before answering by using in/out indicator to locate proper systems tech to delete bad data, locates and sends IM to Systems_0: Production issue.  Need to...

☆ IM from HM_0: I sent you an email.  There's a production data problem that needs to be taken care of.

☆ IM from T_0 to HM_0: I received the email and am initiating the process.  I will send an update when it is done.

☆ T_0 finishing IM to S_0: ...delete load 1 and load 2.  Are you available to do that?

☆ IM from Helpdesk Tech_0: I hear there is a production data issue.  What is the change order for it?

☆ IM from S_0: I can take care of that.  Give me a couple minutes and I'll let you know when it is done.

☆ IM from Manager_1: Just got out of a meeting and heard there may be a data problem in production.  I asked the Helpdesk to open a change order for it.

☆ IM to M_1: That is a change to the way that we normally handle the process, since those are normally for configurations and such.  This is replacing bad data...

☆ IM from Helpdesk Tech_1: I opened up a change order for the data issue that's happening.  What should I put in for the task steps?

☆ T_0 checks email to get the change order number for further use.   While there sees an email from M_0 asking for an update.

☆ T_0 IM to HT_1: X, Y and Z are the steps we would do.  We've not ever done a change order for this before, so there may be additional components.  The major pieces are what's listed, though.

☆ T_0 email to M_0: Starting process, will let you know when done.

☆ IM to HT_0: The change order is 123.

☆ T_0 opens email with correct data to extract data.

☆ IM from HT_0:  So you opened the change order already?

☆ IM to HT_0: No, HT_1 opened one up and sent me the number.

☆ Email from HS_0: where is the fix for the data?  M_1 is asking for an update.

☆ Email from M_1: Where are we on the data problem?

☆ IM from HS_0: Is there an ETA on the fix?

☆ T_0 attempts to pull threads into a group chat for updating purposes.  More than half are shown as in meetings/do not disturb, but are already in existing chat sessions.  Half the requested personnel keep going in the existing individual chats rather than joining the chat.

For brevity's sake I'll summarize the rest.  There are several more email exchanges from various individuals, and the flow of questions and requests for updates kept flying in.  The whole thing was taken care of in 40 minutes.  That includes IMs (Instant Messages) asking for me to mark various tasks done, some of which couldn't be done yet.  There were a total of 9 different IM chats, and email exchanges with 4 of those individuals, who also had chat windows open, plus three unique recipients. I work remotely, otherwise I imagine the phone would have been busy as well.

The actual working time involved to resolve the issue was less than 10 minutes total.  The communication and other portions consumed three times as long.  It took the efforts of two people to directly fix it, and the other dozen-plus were just icing on the cake.
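For anyone who wants the arithmetic spelled out, here is a quick back-of-the-envelope check in Python.  The two inputs are the figures quoted above, with the "less than 10 minutes" of working time rounded up to 10; the variable names are just for illustration.

```python
total_minutes = 40     # the whole incident, start to finish
working_minutes = 10   # upper bound on actual hands-on fixing time

overhead_minutes = total_minutes - working_minutes
overhead_ratio = overhead_minutes / working_minutes
working_share = working_minutes / total_minutes

print(overhead_minutes)        # 30
print(overhead_ratio)          # 3.0
print(f"{working_share:.0%}")  # 25%
```

In other words, at best a quarter of the clock went to the fix itself, and the rest went to talking about the fix.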

Examining this, it appears to be a conflict between the need to provide a solution and the need to drive a solution.  The basis of that can be found in the fundamental shift that has happened in the professional world.  

Imagine a brown out happened 50 years ago: "The coil winder got knocked out when the brown out happened.  How long to rebuild the motor?"

"Assuming we have the parts in the crib," (and they always did, unlike a lot of the bare bones equipment stores we have to deal with today), "it'll be two hours."

Imagine the same scenario within our context today: "The brown out took down the network.  When will it be back up?"

"If we just need to do a reboot, 15 minutes should be it.  If we lost a switch then it will be an hour or two, depending on which one and..."  The difference is that the structures and mechanisms we use today aren't so simple. 

The pacing of equipment and processes is so much different now than it was.  What's worse, asking for an estimate to bring the network back up is like asking your dentist how long it takes to repair a piston ring - a very vague guess is the best that can be done, given that the issue could reside in one of so many places.  Worse, the issue could be caused by two components simultaneously, and NOBODY gives an estimate for that scenario.  Actually there's one person who starts to, but they get stifled and transferred to another office far, far away.

While I'm a firm believer in never bringing a problem without a solution, this is a little different.  This is a problem generated by multiple, individually-good solutions, which step on each other.  So, with that in mind, I submit to the ether the following three suggested maxims for responding to production issues:

1) Let your one be a one, and your zero be a zero. 

As highlighted above, an estimate to fix an unknown problem is worthless.  What's worse, once an estimate is given, even if there is something totally new and frightening found, the estimate comes across as law, and turns into a bludgeon if it is exceeded.  So, since we trust that our IT staff is dedicated to quick, efficient resolution of issues, the statement that they are actively working on it needs to suffice.  Demanding a 1 where there is a 0, or a 4 where there is a 1, is pointless.

Let me illustrate with an example from a few years ago.  A computer room operator performed a scheduled maintenance task every Friday night at midnight.  There was a certain user who inevitably (I mean every Friday night - there's passive-aggressive, then there was this individual) would try to log in, find it was down, and call the operator.  Hearing it would be 90 minutes (this WAS a few years ago) they would call about every 10 minutes asking for an updated estimate.  This even though in this case all was known - a maintenance process is a maintenance process.  

One night he had enough and said it was lying in pieces and he had just finished hosing it out, but would get it back together as soon as it was dried out, lest someone get electrocuted.  The calls stopped, but the true question was why they started to begin with.
 
Then again, in a network outage, in which recovery requires dedicated efforts of all hands on board, why does there seem to be such an emphasis on getting updated estimates, rather than a functional network?  If IT knew exactly what was wrong they could give a highly accurate estimate, in which case updates would be unnecessary, and the outage could probably have been foreseen and even avoided, and if they had no idea, then the original estimate and subsequent updates would be meaningless.  And, lest we forget, an estimate never fixed anything.   If it's wrong it's not like you can obtain a solution by turning it in like a warranty, "You said 5 hours, and it's been 6 hours, so I demand my time back and a working interwebafacebookagoogle now."

2) Circumventing the system when an issue arises is like sticking a penny into a fuse holder: a bad idea, waiting to destroy everything.
 
In the example detailed above, did you catch that there were a couple instances where someone went around the regular pathway because they "knew a guy" in IT?  Did you also see where that added additional circuits to the communications chain? 

In a response to an issue, there may indeed be a need for some parallel communication to take place.  However, that need should be initiated by IT in an effort to get some highly-specialized, or at least scoped information to directly resolve the issue.  It should never be initiated by someone trying to get an update. 

If this sounds like an avoidance tactic, it isn't.   It is an efficiency tool.  Each person processes a maximum amount of communications.  I would prefer to maximize problem resolution energy, rather than assess where we are every half an hour and report that officially and via the many chains opened up during the issue.  Remember that, unless there is a formal structure, each interaction could be "the official" one, and failing to communicate back to the correct person can cause serious trouble.   So we communicate to all who have opened one up so we don't miss the one we need.

3) 186,000 miles per second, minus resistance, is all we have, and even that is faster than a hard drive.

That's all we have to work with.   In the above example of a coil winder, one electrician could be winding the wire around the commutator while the other one pop rivets in the assembly that engages the brushes, effectively multiplying the efforts in the same amount of time. Though sometimes that is possible with computing technology, oftentimes there are tasks which are tied to physical limitations of equipment, and some things must be done solo, and at the speed of the process/machine, and no quicker.

For a restoration of data from a tape, for example, there is a hard limit on the speed the information comes back with.  You can use different algorithms and techniques to make the new versions of backups written to tape more efficient, but generally when restoring, your speed is part mechanical and part logical, and you are stuck restoring with the best logic that existed when the backup was created. 
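A rough sketch of that limit in Python follows.  The function name, the backup size and the drive speed are all made-up numbers for illustration, not measurements from any real tape system; the point is only that the estimate is size divided by throughput, and nothing else enters the math.

```python
def restore_hours(backup_gb, drive_mb_per_s, people_watching=1):
    """Estimated hours to stream a backup off a single tape drive.

    people_watching is accepted only to show that it changes nothing:
    the drive streams at its own speed no matter who is asking.
    """
    seconds = (backup_gb * 1000) / drive_mb_per_s  # GB -> MB, then MB / (MB/s)
    return seconds / 3600


print(restore_hours(360, 100))                      # 1.0
print(restore_hours(360, 100, people_watching=12))  # 1.0
```

Twelve people asking for an update get the same hour as one person quietly waiting.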

So while no offense was meant here, if you find that the scenario above, or one like it, has played out in your shop, you're doing nothing wrong.  You're using the tools as needed.   The problem is that by doing nothing wrong, you could be failing to do it right as well.  Such is the paradox of the bit twiddler. 

Wednesday, November 27, 2013

What do Odin, 5 hours of my life, and one surprisingly horrendous and anti-user friendly firmware update have in common? A lesson for SaaS!

"Odin sends them to every battle [the valkyries].  They allot death to men and govern victory." - From "The Gylfaginning".  Available online at: http://mikespassingthoughts.wordpress.com/2012/05/08/interesting-quotes-concerning-odin/

"Odin can't see JACK!  And even if he could, there's so much adware around these kernels there HAVE to be 100 codemonkeys throwing malformed javascript at your browser extensions for every one that will actually download.  Forget uncompressing one, I'm too busy killing random tabs!!!!" - Me, approximately two hours into the second burst of activity trying to unbrick a Samsung Galaxy III that got bricked because I follow best practices.

"Aargh!  I need pie!  One of us - me or the phone - will not survive to see tomorrow!!!  This is why people root phones!  I play by the rules, and don't root one, and still get technorubble!"  - Me, approximately another hour later


By now if you are still with me, you are probably wondering where this is headed.  For your interest, this is a cautionary tale, as well as what tripped me to one of the most glaringly bad design decisions and deployments I've ever seen.  And now I share it with you, for better or for worse (hopefully not the latter).

It is pretty much considered a best practice to maintain your computing equipment at the most recent stable release. We've all done those update cycles for our Windows machines, or had them done on us even though we said to let us choose (don't get me started on THAT!) but they exist for every system there is, including our cell phones.  So, this morning, being a responsible computing professional, I initiated a system software (firmware) update from the AT&T system.  It is a standard process, and generally only non-harmful software is located on their servers.  To foreshadow, the service professional I spent a while on the phone with actually said, "It's the safest thing to do, and I do it all the time.  I've never seen it do that."

After downloading 680+ MB of code over my handy dandy home wireless pipe, my phone acted like a system in that scenario normally does.  It blinked a bit, put out a message about rebooting to complete the process, then underwent a process known as brickulation.

It turned into a brick.  When the system tried to come back up, it stuck at the splash screen.  For the remainder of its normal life.  Battery pulls, resets, even what is known as a factory reset - none of those things could revive it.  It would boot to the same spot, then die right there, heating the battery so rapidly that I had to actually put it in the freezer while I was on the line with tech support to avoid a diagnosis of brickulation with advanced iBatteryfirehazarditis.

We tried everything, and I do mean everything.  I took it to the Samsung tech at Best Buy (first time I've walked out of there without having spent some serious coin) for a wizardly stab at fixing it. I tried the Kies updater, I tried every permutation of the reboot sequence, and nothing happened.  It always froze at the same spot.  I wiped the device.  I plugged and unplugged, everything.

I eventually even moved to trying Odin 3.7, which is the name of the firmware flashing utility for Samsung's Android devices.  With that you should be able to load an operating system kernel onto the device, but it couldn't see the phone, and the hunt for a true version of the kernel code instead of adware-riddled garbage came up empty.  Nothing worked, and so a new phone will arrive via delivery sometime soon.

So I'm frustrated at what happened, but am very happy with AT&T and how they handled the situation, and that they are getting me a replacement phone, so kudos to them.  BUT... a pox upon the coders and system wonks who came up with the updater system, and here's why.

As someone who works often with problem resolution for computing technology, I'm a fan of the clarity that comes from well-documented reporting of issues, so while I was on the line, using my wife's phone, which is identical to mine, I walked step by step through the update process (there was a fair amount of confusion on the call at the beginning, and it took a while to get beyond the concept of updating an app to the reality of an approved firmware update that went toes up).  I reached the point where you click 'Continue' to actually start the upgrade, and figured that since her phone is identical to mine, and the update bricked mine, my luck was too thin to go any further.  However, at that moment I noticed there was no cancel button.  NO CANCEL BUTTON!

Figuring that this was just sloppy code engineering - and it truly is - I hit the back button.  There was no result.  A reboot of the phone brought it back to the same point, where the update was demanding that the Continue button be clicked.  Every trick I know how to do, except for randomly killing processes, resulted only in the insistent presentation of the screen demanding I press Continue so that it could proceed down the same road I'd been down with my new brick.

Yet another phone call with tech support provided me with the unbelievable answer that, though there had been nothing downloaded yet, when you click on 'Check for updates', it commits you to a process from which there is no recovery, except to go through it.  There is no stop, killing processes will brick your phone (apparently so will the update, but I digress), and if you accidentally click Continue, you have to stay inside your WiFi coverage, or unknown, bad, bricking things will lurk (apparently, they lurk within the update as well).

I maintain that coding up a firmware update process without the ability to stop it before it even downloads anything, or does anything, is a deplorable and inexcusable violation of good coding practices.  I don't know who would be the one responsible for that - the carrier, Samsung, Linus T. (though I'd bet a large amount of cash that Linus would be just as appalled as I am, so maybe not him) - but whoever it is should have to compress 1 TB of video files using only a 486-based system running only Windows ME as a punishment.  We are now going to pass off the risk of finishing the upgrade that cannot be stopped to the AT&T store, because if they run the update, and it bricks the phone, then they broke the phone (I want to be clear that was the support suggestion - I don't really go for the slick passing of risk, preferring instead to work together to get solutions, rather than play hot potato with device warranties - I MOSTLY prefer intelligent code practices, though, and since I don't get my wish there...).

WAIT!  I promised a lesson for SaaS, didn't I?  Here it is: If you are providing an assertion to a customer that an update is approved and okay for them to use, then it truly needs to be okay, and safe, and needs to contain sufficient reversibility, as well as proper design and a stepped approach, with appropriate user-optional choices to end the process.  Consider designing software to be used in an update scenario more along the lines of how you plan updating of a database.  When you update a database, there is a process that changes the value being used, and at the end of whatever process runs, there is what is called a commit, which then makes the change permanent.  If you do not plan for that kind of update process, and there is not sufficient attention to making sure that the user can decide not to carry it out, then you are asking for an issue such as this, occurring to someone like me, with enough knowledge of the guts of the systems to handily dismiss the front line of OS defense - it wasn't us, and it is out of the warranty period - to entertain your service staff.  As I end this, I humbly submit that success in the SaaS arena will come from changing your internal interpretation of the abbreviation from 'Software as a Service' to 'Service applied as Software'.  Now to rescue the remainder of the evening, while simultaneously placing the turkey into much peril.  Happy Thanksgiving, everyone, and may your updates always result in a good reboot.
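For those who haven't watched a database do this, the commit idea from the lesson above can be sketched with Python's built-in sqlite3 module.  The table and the firmware values are invented for illustration, and this is an analogy for the update flow I wish the phone had, not how any real firmware updater works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device (id INTEGER PRIMARY KEY, firmware TEXT)")
conn.execute("INSERT INTO device (id, firmware) VALUES (1, 'v1')")
conn.commit()


def current_firmware():
    """Read back whatever firmware version is currently recorded."""
    return conn.execute("SELECT firmware FROM device WHERE id = 1").fetchone()[0]


# The user starts the update, then presses the Cancel button that should exist.
conn.execute("UPDATE device SET firmware = 'v2' WHERE id = 1")
conn.rollback()   # nothing was committed, so the change is simply undone
after_cancel = current_firmware()

# The user starts the update again and this time confirms it.
conn.execute("UPDATE device SET firmware = 'v2' WHERE id = 1")
conn.commit()     # only now does the change become permanent
after_commit = current_firmware()

print(after_cancel, after_commit)  # v1 v2
```

Until the commit, the user can walk away and nothing has changed.  That is the stepped, reversible approach described above, in a dozen lines.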

Monday, November 04, 2013

Piles of Bits

" I don’t think enough people study the measurements that have already been made. Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world. If I don’t analyze those data and show others how to do it, too, I fear that no one will.”–Atul Butte, Stanford, available at http://whatsthebigdata.com/2012/07/13/big-data-quotes-of-the-week-13/

I'm at IOD 2013, waiting on the start of the morning General Session, and on the screen are various stats - # of Tweets sent, etc.  It brings home what situation we're in as we try to make our way through the forest of data.

Without implying anything (and please don't infer anything, because coffee #1 isn't yet halfway gone) I'm reminded of the old scenario of taking a large number of chimps and putting them in a room filled with typewriters, with the result being that one of them would write a best seller.

What I mean by that is we are beyond the days of carefully crafted communication,  as well as the mysterious art of divining tone and undercurrents from a small piece of communication. We stand in front of a data cannon, loaded with birdshot (large numbers of small projectiles).

That isn't a bad thing.  Nuances are fairly gone (as are grammar, spelling, and civility filters), so instead of worrying about them, we only need to gather the right pieces in front of us and see them where they are.

This is the butterfly effect writ larger - larger because of the potential financial impact, which is the driver of most everything. 

I could write the same again; instead, here's an earlier post reflecting what's up here.  http://www.cicsworld.org/blogs/ctuite/2013/10/edward_tufte_and_samuel_morse.html#

Friday, October 04, 2013

The Most Useful App Ever

"I love the Instamatic application on my iPhone, it takes the coolest photos." - Carolyn Murphy, available at http://www.brainyquote.com/quotes/keywords/application_2.html#52DOUw4rZwBkkxPP.99

I have now been a smartphone user for the last half a decade.  I began of course with the Blackberry, and then expanded to the Android system.  In all that time there has been a tremendous amount of development of power and data manipulation ability.  It's even such that I now can almost replace my laptop - almost.  Blogging is something that I have done using that keyboard form factor, but if you have attempted to type more than a text message, you can see the futility.

So there was one thing which hadn't been done until about a year and a half ago that was thoroughly surprising.  Up until that point in time, there was really no way to print anything from the phone.  I have to wonder why.

What is it about the process of replacing the computing devices with more mobile computing devices that left a dearth across the entire platform?  Every portion of the computing experience was duplicated, from the ability to use a word processor and a spreadsheet to the ability to mindmap and plan.  There have been applications to organize appointments and to manage finances, yet none to get anything useful out of them.

In the 'good old days', there were these things called printer drivers, and whenever software was written, it was written to be able to send data to the printer, no matter what it was.  It was useful if you needed to take information off of your computer and share it with someone easily, even if they didn't have a computer handy.

Enter the best of the technological advances, and suddenly the useful pieces of technology went away.  Screenshots and printing (though screenshots are now available on the Galaxies III and IV, and possibly other devices) were suddenly gone.  I'm not really sure if they were gone because the technology didn't exist to make it happen, or because nobody who was writing the apps took into consideration that they might be good for more than social media and Angry Birds.  

The problem has now been solved with an app to print to wireless printers - incidentally the highest-priced app I've purchased (not even $15, but with the general free-ness of the app store...) - but that still leaves the overall issue of the loss of functionality in what should be an evolutionary step forward.

My point with all this is that as we move forward with what we do, we need to be able to keep the things that we need to be able to use before we get to the new stuff that we might want to use.  This isn't to say that we need to always do things the old way, but consider this: what would be the point of a word processor without a way to print the output?  Spellcheck is already everywhere on the phone, and there are much simpler things that can be done if the text will remain mostly on the screen.  The purpose of a word processor is to produce things that will eventually be printed out (or converted into a PDF, which is actually similar in function), yet that feature was the laggard among all the others.

Tuesday, February 28, 2012

Celling Out the Future of the Cloud

"The computer industry is the only industry that is more fashion-driven than women's fashion. Maybe I'm an idiot, but I have no idea what anyone is talking about. What is it? It's complete gibberish. It's insane. When is this idiocy going to stop? We'll make cloud computing announcements. I'm not going to fight this thing. But I don't understand what we would do differently in the light of cloud." - Larry Ellison, as quoted at http://www.techno-pulse.com/2011/02/memorable-cloud-computing-quotes.html

This one is a little out there, but humor me for a moment.  I want to take a walk down a road that is possible, though potentially not currently feasible.

With the explosion of cloud computing, there are now several things that may be possible which we haven't considered.  This is one of those which may in ten years paint me as a sage, or as a fool, but I want to put a theoretical foot in the door about cellular cloud networks.

I'm not talking about large things here, I'm talking about smaller ones.  A few years ago a program was begun that used spare computing cycles to assist the Search for Extraterrestrial Intelligence (SETI) in their search for meaning in the random radio signals received from space.  People could let their computers help out while they weren't using them.  I believe even though they found nothing, it went well.

Imagine that the cell phones within an area were able to be used for similar things.  Maybe for better weather prediction, taking more readings, crunching numbers for predictions, whatever.  The cell tower would have to direct traffic, but with bans on cell phone use while within a car it should be easy to determine the speed of a phone and either switch it to process a bit if it is moving slowly, or not use it if it is moving at traffic speeds.

They could also be used for data storage.  You wouldn't want to necessarily store your bank account numbers on someone's cell phone (though to be fair, if your bank is using cloud computing, you don't REALLY know where the data is being hosted, you just assume due diligence is enough to avoid calamity, said the Board of Directors at Enron...), but there is lots of data that could be stored in the cloud, and 'demoted' to the relatively slower and less-reliable cell phone network, duplicating on as many as three phones if needed so that the data is available if it is needed.
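To make the last two paragraphs a little more concrete, here is a toy sketch in Python of what the tower's traffic direction might look like.  Everything in it is hypothetical: the Phone record, the 5 mph cutoff for "moving slowly", and the rule of preferring phones with the most free space are assumptions made up for this illustration.  Only the cap of three copies comes from the idea above.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    phone_id: str
    speed_mph: float   # speed as estimated by the tower
    free_mb: int       # storage the owner is willing to share


MAX_IDLE_SPEED_MPH = 5.0  # assumed cutoff between "moving slowly" and traffic speed
MAX_REPLICAS = 3          # duplicate on as many as three phones


def eligible(phones, speed_limit=MAX_IDLE_SPEED_MPH):
    """Phones moving slowly enough to be handed work."""
    return [p for p in phones if p.speed_mph <= speed_limit]


def place_replicas(phones, size_mb, replicas=MAX_REPLICAS):
    """Pick up to `replicas` eligible phones with room for the data."""
    candidates = [p for p in eligible(phones) if p.free_mb >= size_mb]
    candidates.sort(key=lambda p: p.free_mb, reverse=True)  # roomiest first
    return [p.phone_id for p in candidates[:replicas]]


phones = [
    Phone("a", 0.0, 500),
    Phone("b", 62.0, 900),  # on the highway, so it is skipped
    Phone("c", 2.5, 200),
    Phone("d", 1.0, 50),    # slow enough, but not enough free space
    Phone("e", 0.0, 300),
]
print(place_replicas(phones, size_mb=100))  # ['a', 'e', 'c']
```

A real version would have to deal with phones wandering off mid-task, battery levels, and a pile of privacy questions, which is part of the "lot to sort out".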

Move it to another step.  We live in an internet of things.  Suppose your cell phone were able to use cloud-type processing to store information on refrigerators, can openers, whatever has memory it can share and a connection to WiFi.  Would you be able to save a little on your cell bill by offsetting processing and storing web-based requests for insurance quotes using your cell phone and/or coffeemaker, selling processing back to the company in the same way that you can sell power back to the power company when you install windmills?  Your cell phone can likely already provide a hotspot for others, so what if one of those hotspot ports were set so that this capability could be used?

We already are working on near-field communication, where your cell phone can receive an in-store coupon when it is detected to be inside a store, so this is just an extrapolation of a type of process that is already happening.

I could even see a business case constructed where you get a pay-as-you-go cell phone that saves you more money the more processing you do for the cloud.  Call it something like Talking Cloud Cellular, or Yak Yak Money Back Cellular, and begin building out from there.  Of course there would need to be a primary data center for the important cloud-based work, but the lower-need/chump processing could be done via the cell.  Your cell phone already sends location information back and forth to the tower, so a little augmentation of that signal and the data could be flowing.  You are already sending WAY more information than you realize, and downloading apps you have no idea the size of, nor how much space they'll take up, so why wouldn't that be 'found processing'?

As usual, if anyone takes this idea and runs with it, I want a nice percentage, plus a T-shirt (preferably from ThinkGeek - I love their stuff), and also an iPad for this one, since it started out as something I wasn't sure would work and evolved into more of a functional one.  There would be a lot to sort out, but then again, there was a lot to sort out in establishing the internet, and that was time and effort well spent, as far as I can tell.

Thursday, January 26, 2012

No Matter How Great - and it IS Great - This is One Thing Big Data Cannot Do

"The project would not have been started if the truth had been told about the cost and timescale." - Unknown, quote from  http://www.famous-quotes.net/Topic.aspx?Project_Management


To treat your time as preciously as possible, please read the following statements.  If you have NEVER said, heard or overheard (there is a difference) any of these, please feel free to stop reading now and go over to http://www.smartertechnology.com or http://www.internetevolution.com and do a little learning about big data.  Now for the statements:

- Officially the manager doesn't want any administrative costs entered.  We know there are always administrative costs, so just fold them into whatever you spent most of your time on.

- Make sure you spend more than $15 on supper, otherwise Accounting will reduce our per diem.

- We know they found a problem in testing, but the release date is tomorrow.  Ship it, and make sure anyone who gets the first build gets the fix.

- We have monthly inventory tomorrow, so load up all the machines, even the trimmers, and make sure the scrap goes on the trimmers.  All corporate cares about is raw materials, scrap and finished goods in the warehouse.  The more we have in process the less we have to explain.

- He's out for his three hour daily lunch.  He needs to be back by 2:00 though so he can make his 4:00 tee time.

Still with me?  I thought you would be. All but one of those I've heard firsthand.  In fact I was once called on the carpet for eating fast food instead of something more substantial for lunch, but that's another story, from a company that has since gone out of business. Follow me for a bit here.

Big data is a revolution in the way that things are monitored, analyzed, predicted and operated.  It really is fascinating stuff, and I've seen demonstrations of its power in action, grabbing input from Twitter, Facebook and a few other sites and using comments and such to determine the relative quality of a laptop, for example.  I sometimes refer to it as 'butterfly effect' computing, because its single-minded pursuit is the purchase of a pack of gum that foretells the results of a recall election.  That isn't written with any humorous intent - it truly is the last great frontier to cross in order to be able to put a blueprint to the seemingly patternless, and often hinges on the correlation between strange things.  It is a game of digital poker, learning the tells and the styles of everything and using that knowledge to position your hand to win.

BUT...

There is a factor which big data really cannot evaluate, at least not yet.  It is the ethics of what we do, and the connection between what we do and what the data say we do.  For example, I used to work for a boss who was constantly in the pursuit of receipts.  He would gladly accept dining receipts from any area restaurant, and was very diligent in the filing of his expense reports.  You see, without going into a lot of the details, he spent a lot of his money in $1 increments, and was not really able to request a receipt for the bills, nor would the receipts be accepted by Accounting.  So, he turned in his 'business meals' on his expense reports, supported by the receipts, and I would assume recouped his spending.

The system said there must be receipts, valid receipts were submitted, so the system was satisfied.  The same system was not satisfied with allowing tips, though they are customary when you are actually dining out, so those had to be rolled in differently.  The system also didn't keep track of when a restaurant was open, so a receipt claimed as lunch from a restaurant open only from 5:00 p.m. onward was fine. 

You could find examples of driving logs that are all written in the same ink, though they span a year. (Truthfully, other than the one in the decorative desk set, has anyone anywhere ever held onto the same pen for more than a month?)  You could find (and I did find at the same place that was angry about a trip to Burger King) the same parts being rejected at an 80%+ scrap rate by one division, and only at a 20%+ rate by another division, the personal relationships affecting business decisions, making things work when they shouldn't, and burying the bodies in jointly-held cemeteries.

It is actually time for a wholesale change in the way that business is conducted.  Big data can look at the things that are reported and make the analysis based on what is there, but big data cannot find the actual truth in all matters.  It cannot tell you the actual cost of a project, the actual breakdown of expenses, or the actual aggregate series of events in a project.  The reason is a confectioner's treat.

In business, there is a long-standing thing called the fudge factor, whereby it is naturally assumed that there needs to be a margin of error, so you overbudget, overestimate time, overestimate cost (unless you are in an ERP project - to my knowledge there has NEVER been a cost OR time overestimation of one of those) while you also undercommit and overdeliver.  When you have a supplier and a consumer, each of them has their own fudge factor, and the two fudge factors can wind up creating a surplus of fudge.  That surplus enters the datastream, which then is used to figure out other things.
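To put a number on the surplus of fudge, here is a small back-of-the-napkin sketch.  The 20% pads and the $100 true cost are assumptions I picked for illustration, and `padded` is a made-up helper, not anyone's real estimating method.  The point is only that pads multiply rather than add.

```python
def padded(true_value, *fudge_factors):
    """Apply each party's pad in turn; 0.20 means 'add 20% to be safe'."""
    result = true_value
    for factor in fudge_factors:
        result *= 1 + factor
    return result


true_cost = 100.0    # what the work would actually cost (assumed)
supplier_pad = 0.20  # the supplier's fudge factor (assumed)
consumer_pad = 0.20  # the consumer's fudge factor on top of it (assumed)

reported = padded(true_cost, supplier_pad, consumer_pad)
surplus = reported - true_cost

# Two 20% pads do not make 40%; they compound to roughly 44%.
print(round(reported, 2), round(surplus, 2))
```

It is that compounded figure, not the true cost, that lands in the datastream and gets analyzed.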

As with all software issues, the problem originates in the carbon-based units.  No matter how good a tool is, it still relies on the human input to determine how it is working.  It also must work within a system that was designed and implemented by humans many years ago.  To be blunt - we have never in the history of the human race been able to have such potential accuracy, yet we cling to the inaccuracy of the fable because we perceive the inaccuracy to be a necessary part of our process, and then we pass the disease on to our data.  Consider that - we have the purest form of power there is, data and the tools to analyze it - yet we would rather infect it with our inaccuracies than let it tell us where we can truly have an advantage.

I am a big believer in the concept of never bringing a problem without a solution, but in this case I can't do that.  Here's why.  The human factors reign supreme.  As an illustration, one problem I have blogged about in the past is the meaningless metric.  If someone processes 100 files a day, and you bring in a system that lets them process 200 files a day, in order for it to be a win there has to be 1) actually an additional 100 files for them to process to justify the new system, and/or 2) another processor position that you then cut, so that 200 files a day delivers the savings it was presented as delivering.  No salesman will EVER tell you in an open meeting that you can reduce headcount with a product, but do the math. The whole metric is meaningless unless you have the demand that the new system would fill and the political will to cut people until the savings are realized...at least as far as you think they are.

Which means that you make the assumption that people can work with 100% efficiency, and cut until you think they are at 100%, which then leads to sick time and if additional needs enter into the realm of what is being done you have training expenses and down time and, and, and... you trade potential for short-term gains, mostly on paper.  Do you tell the full-timer they are going to part-time?  Maybe, but they also may leave and you will then have to train their replacement.  I've been gone almost a year from a previous employer and they occasionally still send me an e-mail, asking about this or that.  Do all former employees answer the questions you didn't know needed answers before they left?  I do, but I'm likely in the minority.  So even if the new system is put in place, the headcount doesn't change, and possibly the need for efficiency isn't there either because of low demand.  Big data tools could tell you, but you have to tell them the right thing first.

You can say that the per diem will never adjust downward as a matter of policy, and that will get rid of the whole lunch and dinner expense fiasco, but then the human factor enters in and someone who always has to entertain clients complains about someone else who never has to and thus eats fast food and pockets the rest of the per diem (because the fix of just paying the per diem regardless was the first solution they tried).

The only fix I can suggest is that we reframe how business is done so that instead of garbage numbers we are feeding our systems the truthful story.  An executive who states that they don't want to see any administrative overhead within a project is just wrong.  I don't mean to offend with this statement, though I stand behind its logic: there cannot be a project plan without time being spent setting up meetings, running interference, adjusting communications, explaining and justifying to the C-level, etc.  Big data tools can help to streamline and shape that time to find better ways of doing things, but ONLY IF IT IS POSSIBLE TO KNOW ABOUT THEM!  If you say the cost is $150, the data do not know that you spent $127 on the component and the rest on the tip for the delivery man, they only know the check was for $150.

In the same vein, accounting policies and procedures can cause more data headaches than they solve.  The policies have evolved from the receipt-based concept to more of the per diem concept, but in the end they are not informative or accurate enough to allow a true handle on the costs.  If a company is headquartered in Kansas City, but they have a Manhattan office, my guess would be that the expenses of the Manhattan office would be far greater than those of the Kansas City office.  Do they set different standard per diem amounts depending on where the employee is located, or do they find a mutually agreeable number for all involved, and the guys in Kansas City brown bag their lunches and pocket some extra money?  This has Human Resources hassle written all over it, so the business response is to equalize everything and pad the number such that the complaining reduces to an acceptable amount.

For big data tools to truly help, a base level of honesty needs to pervade the corporate culture.  For them to truly predict costs, they must have an accurate baseline.  For them to determine market trends, they must truly have a grasp of raw materials versus in-process materials, finished stock versus returns.  For almost every area in which a creative accounting method has been developed, there is a potential peril for using big data to analyze.

I suppose I can say there is a fix, and it is a two-part fix.  Part One is to report things as accurately as possible.  That only makes sense.  The tools involved in any project need accurate inputs so that they can give accurate outputs.  Can a system you lie to tell you the truth at a later time?

Part Two?  That is the tough part.  Since there is now, and always has been, such a cloak over the accurate data, it will take time for things to normalize.  The second part of the solution is to be okay with what you see in the analysis.  So many things in business - the fudge factor, the inflated receipts, the hiding of administrative expenses, the re-classification of raw materials - lead to false assumptions and bad strategy.  Being okay with the results means accepting numbers that look worse than the fudged ones did.  Over time the analysis will grow better and more accurate, but in the beginning there is a lot of fudging to root out.  It is a concept that hearkens back to our agricultural heritage - first you muck out the barn thoroughly, then after the water dries you see how much room there actually is and where you should build your new stalls. Without the mucking, there is no accurate plan for building.  That needs to not only be understood, but demonstrably accepted by the management.  Then you can bring in the big guns of data and blow a hole through your paradigms to true efficiency and profitability.  Make sure you keep the receipts for the ammo, though...

Wednesday, August 24, 2011

Sophie-Tech X: Intuition and Learning - How the UI Changes the Game Fundamentally and Why Anachronisms are Sometimes Better

It's been a while since the last Sophie-Tech post.  Here's a short review.  With the birth of my daughter in 2007 came an opportunity to observe a new human interfacing with technology that didn't exist when I was in her position.  By observing her approaches to and successes with technology, I am able to infer quite a lot - both good and bad - about the technology, especially the UI.

Today the focus is on the UI as represented by the lowly touch tone keypad, versus the lowly - and anachronistic - pulse telephone.  I was recently surprised to learn that Sophie is able to dial the telephone.  This isn't the dialing she did a couple years ago, when she called 9-1-1 on my Blackberry, but truly a completed phone call.  At the age of three.  She dialed my wife's cell phone.  She was supervised by her older sister in the task, but it was just to watch her do it.  She has since demonstrated it all by herself.

This is not about cleverness on the part of a kid, but is about the alteration of a time-honored interface that homes in on different and more powerful properties than were had by the previous interface.  The old rotary dial phone was based on the principle of turning a wheel and thus generating a certain known number of clicks that could be translated into a route.  You had to be careful when doing it, because to make it work there had to be a hard point that would stop the rotation of the dial, and if you were in a hurry you could bruise your finger on the post.  If you didn't turn it all the way to the post, then though you started to dial a seven, you could turn it into a six, just by stopping the rotation early.  This was what passed for great fun when I was a kid;  that and dialing up the time and temperature number.

The touch tone interface changed all that, though.  You could not turn a seven into a six, and you did not have to worry about injuring your finger when spinning the dial.  Instead of being based on the art of counting pulses, it became about the art of recognizing tones.  There was a degree of skill involved in dialing the phone, now there is not.  You remember a pattern of numbers, and then you touch a series of pads with those numbers on them.
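For anyone who never bruised a finger on the post, the difference between the two interfaces can be sketched in a few lines.  This is a toy model under stated assumptions: the helper names are made up, zero is taken to be sent as ten clicks (the common convention, though not a universal one), and the tone table is the standard touch tone pairing of one low and one high frequency per key, in Hz.

```python
def pulses_for(digit):
    """Rotary: a digit is sent as a count of clicks; 0 is sent as ten (assumed)."""
    return 10 if digit == 0 else digit


def digit_from_pulses(count):
    """What the exchange concludes from the number of clicks it counted."""
    return 0 if count == 10 else count


def rotary_dial(digit, clicks_lost=0):
    """Stopping short of the post loses clicks, so a different digit is heard."""
    return digit_from_pulses(pulses_for(digit) - clicks_lost)


# Touch tone: each key is a fixed pair of frequencies (low Hz, high Hz).
DTMF_TONES = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}


def touch_tone_dial(key):
    """Pressing a pad always sends that pad's pair; there is no 'almost a seven'."""
    return DTMF_TONES[key]


print(rotary_dial(7))                 # a full turn to the post
print(rotary_dial(7, clicks_lost=1))  # stopped one click short
print(touch_tone_dial("7"))
```

The rotary version has a failure mode that depends on the skill of the person dialing; the keypad version does not, which is exactly why a three-year-old can use it.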

This is all so simple even a kid can do it.  Really.  All they need to know is a sequence of information and they can translate that sequence directly into a connection.  No mechanical knowledge or skill is required.  This isn't about being a guy in his 40's all of a sudden complaining about things - that never comes to anything anyway, so why bother - but it is about the design of newer interfaces, and the surprising limiting effects that technological development can have.

What would you say if I made the statement that the Steampunk movement is actually a good example of how the UI can be done differently?  Does an anachronistic appearance automatically mean that a technology is not as good as another?  This has really bothered me in the past, because I have seen decisions made around, of all things, the appearance of a UI.  This is the actualization of the Dilbert comic strip where the pointy-haired boss wants a background color of mauve because mauve has more RAM.

The truth of the matter is that the touch tone enables the entire telephone network to run - hang with me, kids - as part of the internet.  The ways that signals are sent have been merged into the ONE WAY (capitalized to foreshadow the Matrix-esque nature).  Your telephone signal is very likely working off of a technology called VOIP, or Voice Over Internet Protocol (yes, the same IP from when your tech guy asks you to give him the IP address for your computer, and you tell him it's a Dell... will the scars never heal?), and that technology works only with the beeps.  The pulses need not apply.

It isn't even that we need pulses, except... there is a problem that can exist in times when the power is not so available.  If you are using VOIP and the power goes out, then it is likely your phone will go out too.  Remember how in olden days your telephone received its power from the low-voltage feed present in the wire?  Since the switch has been converted to use optical transmission - the fiber optics that basically go from the phone company to the room where your router is installed - you can understand that you can't have light going through all the time, ready to power your equipment should you need to make a phone call.  So, the power of the phone is only there when the power of the house is as well.

I know, that problem is solved by having a cell phone.  That is true... at least just as true as if you go to a Colts game and the stadium is packed and Manning throws such an incredible pass that 50,000 people have enough bandwidth through their phones to post the picture they just took of the pass to Facebook.  Try that out before you decide how true it is.  I did at the Bengals game last year, and let's just say the hamsters running the switching equipment at the phone company were working through their breaks!

So what is the point?  The point is that in our quest to take the anachronistic out and replace it with the smoothly-digital, gray slider-bar, cookie-cutter graphics, stole-this-Flash-animation-from-another-site world, we sometimes take a step backwards, a step illustrated by the contrast between the ability to dial a number, and the ability to understand what is happening.  Truthfully, my daughter also dials her pretend phone and has conversations, so the relative difference between her play and her true ability to place a call is nil.  The bar has been lowered, sacrificing some reliability.

Here is a short list of some anachronistic things I think we could all learn from, and possibly resurrect:

1) The rheostat.  This is basically a control to determine the level of sound, or electrical function, or whatever is being controlled, that is allowed to pass.  Think about the volume control.  I recently had to adjust the slider control for my speakers, then the volume control for my video player, just to be able to clearly hear the sound.  That is because in the effort to engineer digital controls into everything, there have been multiple controls placed into the same application, and both are required.  A simple rheostat that controls the physical volume of the speakers does the trick without having to make multiple adjustments using multiple similar, but maddeningly different in a few key areas, interfaces.

2) The physical connection.  No, really.  It struck me today that, though the technology has improved greatly, with every leap forward in 'technology', we sacrifice a little QoS - quality of service.  I use Skype on a daily basis, and I experience signal problems there which would never be experienced with a physical connection (I realize there is a plug at a couple steps in the process, however 1) you don't have a dedicated circuit, just a logical one and 2) you use radio technology if you're using wireless access, so signal degradation is as close as the nearest electrical device).  First there is a problem with the video feeds - annoying, but understandable, and certainly something that I'd expect with a physical connection as well, because of the slower nature of the data transfer; then comes the call killer - audio difficulties.  If the video feeds are shut off, then the quality of the call should be at least as good as it used to be when a call came in on copper, through copper switches.  If we cannot fix the quality of our calls to be at least as good as they were when we used copper - I won't get into the strangeness that is present when trying to understand how a signal sent using electricity can be clearer than one sent using light, the purest substance we have - can we truly say that we have accomplished anything other than adding to the capacity to transmit data?  This is the digital equivalent of putting 500 people in steerage and heading out on a trans-Atlantic voyage, versus putting 100 people in cabins and making more trips.

3) The chalkboard.  We moved to the dry erase world a while ago, but there is a problem.  When you are solving a problem, you first have to decide which colors to use for which pieces of the puzzle, then you have to find markers that work - does it bother anyone else that people who are supposed to write thousands of lines of code that will be used to issue payrolls don't have the prerequisite skill it takes to close the cap on a marker?! How good is their code if their coloring skills are suspect?  Out of the lines = code.Fail, my Mountain Dew-besotted comrade!  Then you have to make sure the markers aren't permanent.  Then you have to find the eraser.  When you have a chalkboard, you have one color that you use, and a freeform experience.  You don't worry about colors, or about anything else other than working on your problem.  And erasers are optional - anything can be an eraser, and it comes out in the wash.

4) The CD.  WAIT!  CD's aren't dead yet!  Maybe not, but MP3's have put a serious hurt on them.  I equate this one to taking my kids to one of Emeril Lagasse's restaurants a couple years ago.  Serious coin was dropped for some of the greatest food on Earth.  I savored each bite, wringing every bit of flavor and texture out of the dish, enjoying it in a way that would make the chef feel good about the craft.  My kids devoured everything with the same gusto and speed they do a Big Mac, so the details didn't make a bit of difference to them.  A $40 duck entree and a Big Mac being placed at the same level?!  The same is true of the music stored on a CD, versus that from an MP3.  The MP3 produces a digitized, COMPRESSED version of the music.  It squeezes the same music into a tighter space than a regular audio file.  When you squeeze audio, you lose some of it.  It happens.  The average of 3 and 5 is 4.  The average of 4 and 4 is... 4.  When I try to uncompress those numbers, how do I know which are 3's, which are 5's and which are 4's?  There are algorithms that can do it, sort of, but by the time you take out sampling loss, and compression loss, and, and, and, the music suffers.  If you're not a fanatic about your music it doesn't make a difference, but I want to hear the bass line, and I want to hear the little things the keyboard player is doing, and unless I'm listening to some grunge, I don't want muddied-up sound.
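The averaging example from the CD item can be run as a toy.  To be clear about the assumptions: this is my analogy in code, not how MP3 encoding actually works (real encoders are far more sophisticated about what they throw away), and `compress` and `decompress` are made-up helpers.  It shows only the core problem, that two different originals can squeeze down to the same thing.

```python
def compress(samples):
    """Replace each pair of samples with its average (toy lossy scheme).

    Assumes an even number of samples; a trailing odd sample is dropped.
    """
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]


def decompress(averages):
    """Best guess at the original: repeat each average twice."""
    return [value for value in averages for _ in range(2)]


threes_and_fives = [3, 5]
all_fours = [4, 4]

# Both originals compress to the same single number...
print(compress(threes_and_fives), compress(all_fours))

# ...so uncompressing cannot tell them apart; the 3 and the 5 are gone.
print(decompress(compress(threes_and_fives)))
```

Once the two inputs map to the same output, no algorithm can recover which one you started with; the best it can do is guess well.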


There are many other things that could go into this list, but I think the point is made.  With great technology comes great responsibility...great design and a dedication to using the best, not just the 'looks like everything else' should also follow.