I was happy. I was extremely happy. Three months of hard work had come to an end, and the package containing 40 units of the product I was working on had been shipped to the customer.
I tracked the package and verified that it had arrived at the warehouse, where it was to be tested before being installed at the first beta site. I checked my email every day for news about what was going on. In fact, I checked it multiple times a day. And then the phone rang. On the line was the client’s technical manager, and the system was not functioning all that great. From what he described, it sounded like it was stuck in a loop. This shouldn’t have happened, and I felt horrible.
We spent the entire evening trying to figure out what was going on. We went over the schematics and scrutinized every part. A previous revision had been working for us, so the finger was pointing at the changes I had made. But the new circuit was actually simpler than the previous version. Could it be that the external peripherals connected to it had changed? It was supposed to be a dry contact interface, so there was not much to go wrong there.
I didn’t get much sleep that night. I went to sleep at 2 am and woke up at 6:30. Luckily, when you’re working with counterparts on the other side of the globe, they keep working while you’re not. In my email inbox was an idea from one of them that could explain what we were facing. I quickly made coffee and rushed to test his theory, and he was spot on. Sigh of relief.
The problem was a collection of unrelated things. The device we were interfacing with had a push button that I had hooked up to my CPU through an optocoupler. It also had a lot of interlock loops connecting various pins. One of those links connected the push button to another pin I was also monitoring, and when the two were connected together, one circuit triggered the other.
The fix turned out to be as simple as removing a resistor, but then I wondered: what could I have done to prevent this from happening? And I realized I had broken some of my own rules…
Less is more
I once took my car in for service at the local garage. While I was waiting for the oil change, a flashy new sports car came in. It looked beautiful, with a strong engine and a soft top that could be opened at the press of a button. I started fantasizing about owning such a car myself, so I asked the mechanic for his opinion. His response surprised me: “It’s a collection of a lot of useless things that can go wrong.”
The offending circuit was part of the original design but was not used by the software, so I didn’t pay attention to it. I kept it there “just in case”, and it sat there waiting to cause trouble.
Test, test, test
Engineers are lazy. I’d even argue that’s why we became engineers: so we could build machines that do the work we hate for us.
In my case, testing everything is a lot of boring work. So I tested most of the things I thought were important. I tested the main interface, and even some rare use cases. But I didn’t test everything.
Now I’m going back to resurrect the automatic tester I had started to build but abandoned because there were more “important” things to do.
Abstractions are leaky
The best thing, of course, would have been to test the system against the real equipment. Unfortunately, it’s big and expensive, so I had to simulate it. Simulations tend not to represent reality, or at least not the whole of it, and those imperfections tend to show up at the worst possible times.
The irony is that my simulator actually had a switch to simulate that interlock, but since I wasn’t using it, I didn’t pay any attention to it and left it in the off position.
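For the curious, here is a minimal sketch in Python of what sweeping every simulator switch could look like, instead of leaving the "unused" ones off. Everything in it is made up for illustration: the switch names, the `read_inputs` toy model and the `coupling_present` flag are assumptions, not my actual simulator or circuit. The flag only loosely stands in for the stray link between the push button and the other monitored pin.

```python
from itertools import product


def read_inputs(button, aux_contact, interlock, coupling_present=True):
    """Toy model (hypothetical) of what the CPU reads for two monitored inputs.

    When the stray coupling is present and the interlock switch is on,
    pressing the button also drives the second input.
    """
    crosstalk = coupling_present and interlock and button
    return {"button": button, "aux": aux_contact or crosstalk}


def sweep(coupling_present=True):
    """Try every on/off combination of the three simulator switches.

    Returns the switch settings for which an input reads something other
    than the state of its own contact.
    """
    failures = []
    for button, aux_contact, interlock in product((False, True), repeat=3):
        seen = read_inputs(button, aux_contact, interlock, coupling_present)
        expected = {"button": button, "aux": aux_contact}
        if seen != expected:
            failures.append(
                {"button": button, "aux_contact": aux_contact, "interlock": interlock}
            )
    return failures


if __name__ == "__main__":
    for case in sweep():
        print("unexpected reading with switches:", case)
```

In this toy model, every failing combination has the interlock switch on, which is exactly the position I never tried. Three switches mean only eight combinations, so an exhaustive sweep costs almost nothing. With many more switches the count doubles each time, and some sampling would probably be needed.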
Two pairs of eyes are better than one
There is a saying in Yiddish: “A guest for a while sees a mile”.
I work solo. It’s very efficient, and it’s easy for me, but there are disadvantages. We think we can trust our eyes, but when we look at information, we are subjective. Our brains automatically categorize and prioritize the data. We pay more attention to some things and ignore others. And each brain is biased.
When I went over the schematics trying to find the source of the error, I overlooked some parts of them. I knew the software wasn’t using some signals, so what importance could the circuits behind them have? Carl didn’t have these preconceptions, so it jumped right out at him.