It Takes 6 Days to Change 1 Line of Code

(A true story.)

Philip (President): Our factory is underutilized by 10%. Either we start building more of our backlog or we lay people off. I'd rather keep everyone busy, build inventory, and get ahead of the curve before the busy season. How can we do that?

Lee (Operations Manager): Company policy restricts us from building more than 3 months of backlog. If you just change that to 4 months, we'll have plenty of work.

Philip: Done. Now how do we implement that?

Lee: I'm not really sure. I think we'd have to change a setting in the legacy software.

David (IT Director): No problem. It's probably one line of code in our core routine. Fill out a ticket and submit it to IT Services.

Judy (IT Admin): I'm assigning this request Ticket# 129281. But it still needs the section on Business Impact completed and Director approval.

David: It's for Philip. If we don't do this right away, we'll have to have a layoff.

Judy: OK, then I'll fill out that section myself and put this on the fast track.

2 days later.

David: What's the status of 129281?

Judy: It's the first Enhancement in the Developer Queue, after 14 Bug Reports.

David: Forget the queue. Mark it urgent and send it to Ed immediately.

1 hour later.

Ed (programmer): On line 1252 of Module ORP572, I changed the hard-coded variable MonthsOfBacklog from "3" to "4". I unit tested this successfully and ran 2 batch test runs. The Operations work queue increased 10% as expected. This is good to go. I just submitted it to Code Review and moved it to Homer for User Acceptance Testing.

Shirley (Code Review): It is now against company policy to have any hard-coded variables. You will have to make this a record in the Parameters file. Also, there are 2 old Debug commands, an unassigned variable warning message, and a hard-coded Employee ID that will all have to be fixed before this module can be moved to production.

Ed: Fuck that shit.

Shirley: That may very well be true. But since you were assigned ORP572, you are responsible for fixing preexisting errors that violate new company policy. I cannot promote this as it is.

2 hours later.

Ed: OK, done. I just resubmitted it to Code Review.

Julie (IT Testing): Homer is not available for User Acceptance Testing because Fred is running a controlled test for month-end accounting close. Use Marge instead.

Ed: I don't have access to Marge.

Julie: Then contact Joe in IT Security. He'll get you permissions.

2 hours later.

Joe (IT Security): I cannot grant you access to Marge without David's signature. He's out of town. Can this wait until Monday?

Ed: I don't think so. Philip wants this right away. Get him to grant access.

Shirley: Your new Parameters record "MonthsOfDemand" needs a better name. The offshore programmers won't understand what this means. Also, it should have an audit trail of changes.

Ed: What policy is that?

Shirley: It's not exactly written down anywhere. The offshore team is 3 months late updating the wiki, but I assure you, all new Parameter records must satisfy new naming requirements and keep audit trails.

1 day later:

Ed: I renamed the Parameters record "MonthsOfDemand" to "SelectedMonthsOfBacklogDemand" and added Module PAR634 to maintain that record and its audit trail. I have submitted it to Code Review.

Tony (IT Testing): I see 129281 on Marge, but I have no Test Plan.

Ed: Just run it the old way and the new way and note the increase in the total on the WorkOrdersHours report.

Tony: That's your test plan? No. This affects everything in the factory. I have to have user selected Test Cases, Expected Results, documented Test Runs, and user sign-off.

2 days later:

Philip: David, tell Tony to move Ed's program to production immediately.

David: Yes sir.

Total elapsed time: 6 days.
Lines of mission critical code changed: 1.
Bytes of mission critical code changed: 1.
Excedrin eaten: 24.
Pissed off hours spent on Hacker News: 14.

Dear Boss: For a programmer, 10 minutes = 3 hours

10:48

Boss: Hey Ed, Sue in Detroit says that sometimes, the wrong Invoice Part Number is showing up on the Product History Screen. Can you help us figure this out?

Ed: I'm busy with something else at the moment. Put the ticket in my queue.

Boss: This will only take 10 minutes.

Ed: Are you sure about that?

Boss: Yes. I'll just set up a web conference. Sue can show you right away, then you can look into it when you have time.

Ed: OK.

Boss: Great. Check your Outlook for an invite.

11:05

Got an Outlook invite for a web conference at 11:30. Accepted.

11:25

Called the web conference 800 number from my IP phone. Busy. Tried again twice. Busy both times. Called my cell phone from my IP phone. Busy. OK, the IP phone system is screwed up again. Called the web conference phone from my cell phone. First one there. On hold. Clicked the link in my browser to the web conference. First one there.

(Ed starts reading Hacker News in another tab.)

11:38

Boss enters conference call: Where's Sue?

Ed: I don't know.

Boss: Can you see my screen?

Ed: No.

Boss: OK, hold on. Let me be the host. Can you see it now?

Ed: Yes, but I thought Sue was going to demonstrate the problem.

Boss: That's right. I'll just transfer host mode to her.

(Ed continues to read Hacker News in another tab.)

11:47

Sue enters conference call: OK, why are we here?

Boss: So that you can show Ed what's wrong with the Product History Display.

Sue: What's wrong with the Product History Display?

Boss: You know, sometimes the wrong Invoice Part Number displays.

Sue: You mean for mil-spec orders?

Boss: I really don't know. You sent the ticket.

Sue: What's the ticket number?

Boss: Hold on, let me check.

(Ed continues to read Hacker News in another tab.)

11:53

Boss: It's ticket number 13827. Remember now?

Sue: How do I see tickets on my PC?

Boss: Just click on the I.T. dashboard on the intranet.

Sue: I can't. The web conference software went full screen.

Boss: Then just hit Alt-F4. Then go to the intranet.

(Ed continues to read Hacker News in another tab.)

11:57

Sue: OK, what was that ticket number again?

Boss: I should have written it down. Let me look it up again...

Boss: 13827.

Sue: OK, I see. This only happens once in a while. No one knows why. It always breaks on Part Number R27-93.

Boss: OK, show Ed.

Sue: How do I get back to the web conference?

Boss: You have to start all over. Alt-F4 killed it.

(Ed continues to read Hacker News in another tab.)

12:04

Sue: OK, the web conference is up again. Can you see my screen?

Boss: No, you have to click "Host".

Sue: Where?

Boss: In the little box in the upper right hand corner.

Sue: The "History" box?

Boss: No, the "Attendees" box.

Sue: OK. Can you see my screen now?

Boss: No. Try again.

Sue: I did. It said that you have to give up host mode.

Boss: OK. I didn't know that.

(Ed continues to read Hacker News in another tab.)

12:14

Boss: I gave up host mode. Try again.

Sue: OK, can you see my screen?

Boss: Yes.

Ed: Yes.

Sue: OK, if I go into the main menu, click "Operations", then click "Sales", then click "History" it takes me to the Sales History Menu. See?

Boss: Yes.

Ed: Yes.

Sue: Then I click on Sales History Display by Part. I enter "R27-93" and the main screen pops up. Then I click on Invoices, I hit F5, then F3, then F7, and the Invoice Part Number changes to "GT548". This should never happen. What gives?

Ed: OK, let me check it out and get back to you.

Boss: OK, bye.

Sue: OK, bye.

Ed is now stuck in host mode because the other two logged off. He can't get out. Windows locks. He reboots.

12:38

Ed logs back in and goes to the dev system. He goes to the main screen, clicks "Operations", then clicks "Sales", then clicks "History" it takes him to the Sales History Menu. Then he clicks on Sales History Display by Part. He enters "R27-93" and the main screen pops up. Then he clicks on Invoices, hits F5, then F3, then F7, and the Invoice Part Number remains "R27-93", just as it should. It works in dev perfectly.

12:46

Ed logs into production through his secret back door. He goes to the main screen, clicks "Operations", then clicks "Sales", then clicks "History" it takes him to the Sales History Menu. Then he clicks on Sales History Display by Part. He enters "R27-93" and the main screen pops up. Then he clicks on Invoices, hits F5, then F3, then F7, and the Invoice Part Number changes to "GT548". Sue was right.

12:57

Ed checks the Version Control System. The program has been checked out by Fred since November 11. He runs a diff and sees that Fred has found and fixed the problem in the 425 lines of code he has changed.

1:03

Ed calls Fred to see what he's been up to. Voice mail.

1:07

Ed emails Fred, explaining the problem.

Ed returns to Hacker News.

1:17

Fred calls back. Ed tells him to read his email.

(Ed continues to read Hacker News in another tab.)

1:28

Fred calls back: OK, I remember that. The program was broken by one of the offshore programmers who was changing the header on every program in the Operations directory. He accidentally removed a line of code before he recompiled. Somehow, it made it through QA, and now Sue has found the bug.

Ed: Well then, can you promote it now?

Fred: I don't think so. There are 12 other changes in this mod. Let me check and call you back.

(Ed continues to read Hacker News in another tab.)

1:36

Fred calls back: I can't promote any of these changes until the XL500 mods go through first. They're on hold until QA approves the spec. So we just have to wait.

Ed: OK, thanks Fred. I'll just email my boss and tell him.

Ed emails Boss with the explanation.

(Ed continues to read Hacker News in another tab.)

1:48

Boss: OK, this sounds like a problem. It looks like I'll have to escalate this to the Steering Committee. I'm glad you had 10 minutes to spare. Thanks.

(Ed continues to read Hacker News in another tab.)

Insidious Bug or Comedy of Errors?

A client presented me with an obvious and significant problem that required immediate attention. I worked on the problem and helped them solve it. Along the way, I discovered a whole bunch of things that merit further examination by software developers.

The names and facts are changed and I will present the code as pseudo-code. I never compromise client confidences and the technology doesn’t matter: this could have happened anywhere.

This is pretty much bread-and-butter backroom application software for a large enterprise that processes lots of orders for lots of dollars...

I was presented with a PDF of a Purchase Order that had been emailed to a Vendor. The problem? No prices. Yikes. This could be a huge problem. The company emails thousands of Purchase Orders to Vendors every day, full of data supporting critical legal and mission critical transactions. The fundamental data elements are Part Number, Quantity, and Price. How could the price be missing? And how could it only be missing from one (or a few) out of thousands of Purchase Orders?

I started by doing what any digital sleuth would do: I tried to recreate the problem. Fortunately, this worked on the first try. I reprinted the Purchase Order and sure enough, no prices. I reprinted several others and there were prices.

The next step was to isolate the problem, debugging backwards. Output Record? Blank price. Variable feeding Output Record? Null value. Price on Purchase Order data base record? Fine. Hmmm. Next I examined the logic pulling the data from the data base and placing it in the output variable. It was looking for the Price in Column 22, the column for Foreign Currency Price. On an order to a California Vendor? OK, I was onto something.

I zeroed in on these two lines in the print program:

CurrencyCode = PORec[45]
if CurrencyCode = "USD" then PriceCol = 21 else PriceCol = 22

What was in Column 45 of this PO Record for this California Vendor? "USD" and a bunch of delimiters. Hmmm. That would cause PriceCol to be 22 when we obviously want it to be 21. The Price was in Column 21 but we are pulling a null out of Column 22. Bingo.
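
The failure is easy to reproduce in a few lines. The following is a minimal Python sketch, not the client's code: the dictionary standing in for the PO Record, the "*" characters standing in for the delimiters, the empty string standing in for the null, and the sample price are all assumptions made for illustration.

```python
# Hypothetical stand-in for the PO Record: a dict keyed by column number.
# "*" stands in for whatever delimiter character the real system uses,
# and "" stands in for the null found in Column 22.
po_rec = {21: "125.00", 22: "", 45: "USD**"}

def price_col(rec):
    """Mirror the two lines from the print program."""
    currency_code = rec[45]
    return 21 if currency_code == "USD" else 22

col = price_col(po_rec)   # "USD**" is not equal to "USD", so the else branch wins
price = po_rec[col]       # Column 22 is empty on a domestic order
```

With a clean "USD" in Column 45 the same function picks Column 21, which is why only the split orders lost their prices.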

The customers are screaming. The business is suffering. Now what?

Stupid way out: Get the Currency Code from the Vendor record, not the PO Record.
Lazy way out: Strip the delimiters from PORec[45].
Right way out: Find out what's putting delimiters into Column 45 of the PO Record.
Long term solution: See below.

The right way out can be very difficult with a large code base. First I isolated the 614 programs that had been promoted into production in the last 90 days. (I figured that the problem was new so the culprit program must be fresh.) I searched for the string "45". 42 hits. Nothing suspicious. Next I looked at data dictionaries and canned functions that provided potential synonyms for Column 45 of the PO Record. I found four possibilities. Then I searched the 614 programs for each of these. Nothing. Hmmm. Standards that no one follows. OK.

Then I simply scoured the list of 614 programs. One name caught my eye: "PoSplitter". Brand new. Written by a contractor who didn't know the whole application. Promoted 3 weeks ago. I read the whole program. No reference to "45", "Foreign Currency", or anything seemingly related. But one variable looked suspicious: DatasetCols. What was this? A list of columns in the PO Record that had matching multiple values, one for each Part on the PO. DatasetCols was a global variable passed down by a master routine. I read that routine and (bingo!) found 45 in the list of DatasetCols. I traced the mods back to 2005 when it was added to the list.

I double-checked the data dictionaries and the common functions. All said that Column 45 of the PO Record must be a single Foreign Currency Code defaulted from the Vendor Record and joined to a preset table. On the other hand, the master PO routine had it in a dataset list. A dataset list that had never been referenced by any other program until that contractor used it in PoSplitter. So, as soon as his program went into production, for every Purchase Order that was "split", Column 45 kept its original Foreign Currency Code along with a delimiter for each Part on the PO. Which in turn caused the PO Print program to fail to secure "USD" and automatically default to Foreign Currency (note that this bug would never affect foreign orders).

The immediate (right) solution:

1. Remove Column 45 from the variable "DatasetCols" in the master routine. Recompile all affected programs.
2. Clean up the data base.

The long term solution:

1. The data dictionary must be the Bible. Have no other code, variables, or function that can possibly say something else. Variables like "DatasetCols" must never be hardcoded, but must be populated from the data dictionary. All synonyms must also be defined in the data dictionary, not in many other routines.

2. Don't use datasets. Normalize your data. (Enough said).

3. Don't have hanging conditionals. Will If...then cover all possibilities? No? Then make a Case, catching any errors. ("USD***" is NOT a valid Foreign Currency Code!)

4. If something breaks, break it! The first time an error was encountered (see #3 above), the PO Print program should have stopped and demanded a help desk intervention. But since errors weren't being captured at the point of failure, 3500 Purchase Orders were printed without prices for three weeks before anybody who cared noticed.

5. Learn the app before you change it. I realize that this is easier said than done, but I'd like to think that the contractor should have understood all the columns in the PO Record that he was changing. He simply trusted the variable "DatasetCols". Do you imagine that a senior developer would have caught that Column 45 was inconsistently documented in the existing code base? I don't know, but it's an interesting question.

6. Parallel test. The Split Line enhancement was big enough to run an automated parallel test. Column 45 of the POs from the test data base would not have matched those from the Control data base. This would have stuck out like a sore thumb if anyone had bothered to check.

7. Regression test. Just because the stuff that should have changed did change as expected, did everything that should not have changed stay the same? (I know, I know, how do we test for "everything else".) There's no easy answer for this, but doing nothing is the worst possible alternative.
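
Items 3 and 4 can be sketched together in a few lines of Python. This is an illustration under assumptions, not the client's code: the set of valid foreign codes, the exception name, and the function signature are hypothetical, and in a real shop the valid codes would come from the data dictionary (item 1) rather than a literal.

```python
# Assumed list of valid foreign codes; illustrative only.
VALID_FOREIGN_CODES = {"EUR", "GBP", "JPY"}

class InvalidCurrencyCode(Exception):
    """Raised so the print run stops instead of emitting a PO with no price."""

def price_col(currency_code):
    # Every branch is explicit; nothing falls through to a silent default.
    if currency_code == "USD":
        return 21
    if currency_code in VALID_FOREIGN_CODES:
        return 22
    raise InvalidCurrencyCode(
        f"unexpected currency code {currency_code!r} in Column 45"
    )
```

Under this sketch, the first split Purchase Order would have raised an error at print time rather than going out the door without prices.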

What else would you add to my Long Term Solution list?

Why is BASIC still OK?

“It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.” 

For what it’s worth, I have written over 1 million lines of BASIC for over 100 customers, most of it still in production, and all of it doing important work producing goods, services, and jobs in so many uncool industries we couldn’t live without. 

Maybe I’m an outlier, but I have gone on to learn algorithms, dynamic programming, database theory, client/server, and web development. I believe the elegant simplicity of BASIC and database theory, although limited in application, has provided an excellent base upon which to build. 

I know that ewd is a giant to be respected, but I think it’s a red flag when a teacher mutters “practically impossible to teach”, even in jest. IMHO, that says more about the teacher than the student. 

Thoughts like this are great for a laugh, but when you stop to think about it, all they really do is further amplify the perception of a huge gulf between theory and practice. Academics whine while those of us in the trenches are too busy to notice because our sleeves are rolled up while we build that which must be built. 


How far should automation go?

“Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.” 

I had to read this statement 3 times before it hit me: What’s true in chess is also often true in business. A little background… 

I recently wrote a forecasting system for a company that processes 7 million orders per year. Worse, this company was the merger of two other companies, each of which did forecasting differently. One had a very expensive Oracle based “strong computer” that calculated almost everything and told the planners exactly what to do. The other just dumped data into Excel files and teams of “strong humans” manipulated them until they intuitively worked out the best plan. Neither team could believe the way the other team worked. 

The system I wrote using guidance from both teams turned out to be “weak human + machine + better process” which leveraged the strengths and minimized the weaknesses of the two extremes. 


How can clever software help customers?

Just a few off the top of my head: 

1. As part of the research for requirements for a new inventory package, I noticed that every pallet was counted by 3 different people and the lowest count was recorded. I worked with plant supervisors to fix the procedures. Management then realized that there was now no need for new million dollar software. They rewarded my effort and concern for the company with lots of great project work and money. Lesson: Look for the obvious first. 

2. A user asked me to help solve her forecasting problem. The two of us sat down and designed the software to do it. I realized there was a parallel effort to do the same thing in another division (with an expensive purchased package), so I made my software work for both divisions. It took 3 weeks to write and people were very grateful. I was employee of the month and got a nice bonus. Lesson: Sometimes little things can solve big problems. 

3. I noticed that warehouse pickers were bending and climbing ladders a lot, so I suggested modifying our inventory system to place the most popular items in bins between the knees and shoulders. The change took one week and made us 10% more efficient (a lot of money after a few months). I would have never thought of it if I hadn’t been walking around, trying to understand how my software was being used. Lesson: Give yourself the chance to find opportunities.

“Did you build anything that you later spun off into a better job or a side business?” 

Yes. Everything I learned using these methods went into 2 businesses: a small business software package and a consulting practice. If I hadn’t stretched myself, who knows what cubicle I’d be sitting in today. 

What’s a minimalist coding style?

“A minimalist lifestyle does not make you a better person” 

But a minimalist coding style “does” make you a better programmer. 

I really don’t mind a few extra Phillips screwdrivers, kitchen knives, or pairs of shoes in my house, but every superfluous bit of code in my repository drives me nuts. 

Others say I go overboard and they’re probably right, but I can’t help myself. 

If a 6 character variable name can be shortened to 5 characters without losing meaning, then I do it. Same thing with labels and function names. If I find the same line of code twice, I write a function (but only after whipping myself). Complex If Statements are replaced by Case. Complex Case Statements are replaced by arrays and pointers. Two programs look alike? Replace them with one parameter-driven program. Two forms look alike? Replace them with a flexible form app. Reports? Same thing. 
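
As one illustration of replacing conditionals with a table, here is a minimal Python sketch. The freight zones and rates are made up for the example, and a dictionary stands in for the "arrays and pointers" mentioned above.

```python
# Hypothetical freight rates per shipping zone; names and numbers are made up.
def rate_with_ifs(zone):
    if zone == "local":
        return 5.00
    elif zone == "regional":
        return 8.50
    elif zone == "national":
        return 12.00
    else:
        raise ValueError(f"unknown zone {zone!r}")

# The same logic as data: adding a zone is one new entry, not a new branch.
RATES = {"local": 5.00, "regional": 8.50, "national": 12.00}

def rate(zone):
    try:
        return RATES[zone]
    except KeyError:
        raise ValueError(f"unknown zone {zone!r}") from None
```

Both versions reject an unknown zone; the table version just has less code to read and maintain.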

Old data? Archived! Old programs? Archived! Old notes? Archived! And not one trip to Goodwill, just to my e: drive. I’m so proud of myself when I can fit the software needed to run a $100 million company on a 256K thumb drive.

There must be a 12 step program for people like me. But then, by the time I was done with it, it would be a 7 step program. 

Documentation Belongs in the Code

The advice from this post is exactly what you’d expect in theory and exactly “what not to do” in practice. For one simple reason: the source code is (hopefully) the only thing pretty much guaranteed to survive. 

I have seen countless shops where valuable history was lost because it was stored on someone’s c: drive, a network drive, or some repository that failed to survive some kind of migration. And even if these other files (digital or paper) did survive, chances are that the programmer that needed to see them never did anyway. 

Good shops practice keeping audit trails “in the source code”. This means good commenting. Which means good code review and quality control. 

I recently came across a single piece of code that had been changed back and forth 6 times in the previous 2 years. The comments looked something like:


* jeo 02/11/09 Use Ship Date, not Book Date per Sarah in Sales 

* jrm 04/15/09 Use Book Date to make military contracts balance 

* msl 08/24/09 Use Ship Date per Joe in Ops (military no longer active) 

* jrm 12/13/09 Use Book Date per Rick Smith to prepare for new contracts 

* jrm 02/14/10 Use Ship Date per Rick Smith after Ops meeting 

* jrm 05/25/10 Use Book Date per Rick Smith until Q3 migration 


I know that this is an extreme example, but this stuff happens all the time in commercial environments. How easy do you think it would be for the programmer/analyst to provide background if these comments were not in the source code, but somewhere else? 

Sure it’s a pain in the ass to maintain this, but it immediately provides the needed background to the person who needs it, when he needs it, where he’s already working. For critical projects with confused users (what isn’t), the alternative is usually much more work. 


Why pre-develop?

“I’ve never seen anyone able to design something away from keyboard that doesn’t change significantly once it’s written” 

It “does” change once it’s written.

The idea is to get a clear work plan on a “close enough” design. I estimate that my first cut of anything is maybe 50% or so. 

The idea is also to avoid sitting at the computer all day and then being disappointed with how little I accomplished. Activity != accomplishment. 

A little more background… 

First term freshman year, 90% of science students took Chemistry I. On Mondays and Wednesdays, only 50% of the seats in the dining room were taken for dinner. Chem Lab started at 1:00 p.m. and dinner was at 6:00 p.m. So, most freshman chemistry students took more than 5 hours to complete their lab work. 

This never made sense to me. I took Chemistry I second term freshman year. My lab partner and I made a pact to “never” miss dinner. We did everything we possibly could to expedite lab time. We did all the reading, planning, and reviewing other people’s results “before” we entered the lab. We even wrote our reports in advance, filling in the results as we went. Our longest lab took 2 1/2 hours. Our shortest took 1 1/4 hours. (We also both got A+.) 

I still practice that methodology today. My computer is my lab and my bed or sofa is my lab prep. Preparation takes as long as it needs. Labs go fast. If they don’t, it’s because I wasn’t prepared enough when I started. 


What can be optimized?

I’ve always thought there were 2 types of things that could be optimized: 

1. Things that need to be “cleaned up”.

2. Things that never should have been written in the first place.

Simple example of Type 1: You rush to get something up and running, and in your first code review, you find the exact same code multiple times. So you write a function, parameterize a few variables, tighten it up, and reference it all over the place. Cool. 

Simple example of Type 2: You have an SQL SELECT inside an iteration. At 500 iterations it runs smoothly. At 50,000 iterations, it becomes non-functional. Your only hope to scale this thing is to rethink the whole process to run with one SQL SELECT (and maybe a database redesign) outside the iteration. You basically have to start over. What were you thinking? 
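
To make the Type 2 example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns, and sample rows are invented for illustration; the point is only the shape of the two approaches.

```python
import sqlite3

# Hypothetical in-memory schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 10.0), (2, 2, 20.0), (3, 1, 30.0)])

def names_select_in_loop(conn):
    """Type 2: one extra SELECT per order, so queries grow with the row count."""
    orders = conn.execute(
        "SELECT id, customer_id FROM orders ORDER BY id").fetchall()
    result = []
    for order_id, customer_id in orders:
        row = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()
        result.append((order_id, row[0]))
    return result

def names_single_select(conn):
    """The rethink: one joined SELECT, issued once, outside any iteration."""
    return conn.execute(
        "SELECT o.id, c.name FROM orders o "
        "JOIN customers c ON c.id = o.customer_id ORDER BY o.id").fetchall()
```

Both functions return the same rows. The first issues one query per order plus one, while the second issues exactly one no matter how many orders there are.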

You need to trust your “process” that Type 1 things will rise to the surface in due time, thus avoiding premature optimization. 

For Type 2 things, there is no such thing as “premature optimization”. They need to be designed and written properly in the first place.