What Are the Tricky Parts of Embedded Development? (embedded.fm)
130 points by ingve on Feb 21, 2017 | hide | past | favorite | 111 comments


Those are all beginner embedded problems. Big problems include

* You're developing a controller that controls something. Now you have to have the hardware it controls. This can be a sizable piece of industrial equipment. You may need a simulator for it.

* I've seen auto engine controls developed. Phase 1 was run connected to an engine simulator. Phase 2 was run connected to a test board with the auto components, including one spark plug, and an analog computer to simulate the vehicle power train. Phase 3 was run on a test stand with a real engine. Phase 4 was run in a car with a debug system plugged in. Yes, you need all that stuff.

* Safety issues. You may need a separate safety system to monitor the primary system and shut it down. Traffic lights have a simple hard-wired device which has inputs from all the green lamps, and a PC board wired with diodes to indicate which can't be on at the same time. If the checker detects a conflict, a relay trips, all the lights go to flashing red, and the CPU can't do anything about it. Some systems allow the CPU to try again after 30 seconds of flashing red, but usually it requires someone to come out and replace the electronics.

* JTAG is your friend, but JTAG is so low-level that it's a huge pain.
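
The diode-matrix conflict monitor in the traffic-light example is deliberately pure hardware, so a wedged CPU can't override it, but its logic is easy to model. A rough sketch (lamp names and the conflict table are invented for illustration, not taken from any real installation):

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the hard-wired conflict checker: each pair of green
 * lamps that must never be lit together gets a table entry.  Real
 * installations do this with diodes precisely so software can't defeat it. */
enum { NORTH_SOUTH = 0, EAST_WEST = 1, LEFT_TURN_NS = 2, NUM_GREENS = 3 };

/* conflict[a][b] is true when greens a and b may not be on at the same time. */
static const bool conflict[NUM_GREENS][NUM_GREENS] = {
    [NORTH_SOUTH]  = { [EAST_WEST] = true },
    [EAST_WEST]    = { [NORTH_SOUTH] = true, [LEFT_TURN_NS] = true },
    [LEFT_TURN_NS] = { [EAST_WEST] = true },
};

/* greens is a bitmask of the currently lit green lamps. */
bool conflict_detected(uint32_t greens)
{
    for (int a = 0; a < NUM_GREENS; a++)
        for (int b = a + 1; b < NUM_GREENS; b++)
            if ((greens & (1u << a)) && (greens & (1u << b)) &&
                (conflict[a][b] || conflict[b][a]))
                return true;
    return false;
}
```

In the real device, a `true` result corresponds to the relay tripping and everything going to flashing red.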


Phase 1 was running with blinky LEDs on a breadboard. Phase 2 was a prototype version running under the seat of my motorcycle, hooked up to the engine sensors and datalogging the output of my hardware/software compared to the OEM ignition/FI while I rode it round the block and local neighbourhood a few times. Phase 3 was hooking my version up and riding it around... Safety issues, huh? ;-)

(Arduino-driven ignition and fuel injection for a motorcycle, a bunch of years back. Gave up on it 'cause, though it worked well, I never got it reliable enough. Fun way to waste a year or so's worth of weekends/evenings...)


Look into megasquirt. I did a completely custom setup with it years ago and it was great. Put a few thousand miles on the car, as reliable as OEM. Built the thing from a blank circuit board, built my own wideband controller too. Was damn surprised that everything worked.

Support was pretty good for custom sensors; the only problem is your knowledge has to run damn deep with these things before you can get the engine to start :)

I had wideband autotune for O2, idle control, knock retard, coolant and air temp compensation... basically every original sensor from the motor, plus a few extras I bolted on, worked great.

Of course, you need to know how to get an engine running from a blank fuel and spark map... and blank enrichment/retard and aux maps for all these sensors.

Anyways, was good fun. Lot more interesting than the toy stuff people are doing with Arduinos for the most part


I've heard better things about DIYEFI, mostly because megasquirt isn't properly open source. Other than that, I think they're both pretty good.

http://www.diyefi.org/why.htm


I went the other way - an older bike and Mikuni Flatslides ;-)


That sounds incredibly cool. Did you write anything up?


A bunch of posts in a forum that's no longer around - mostly... This was back in '08/'09 and the MegaSquirt project was getting off the ground with _way_ more capability than I had in my hardware. (And I sold that bike, right now none of my bikes run fuel injection - although I've got crazy ideas about using bits off the Honda Grom to convert my 2 stroke 125cc Cagiva Mito from carbs to FI... Maybe... One day...)


Adding a few more:

* Not giving up when your errors do lead to physical damage. Repair the heat-exchange radiators that froze because you didn't test what happens when outside temperatures drop. Change the timing belts that snapped because of an off-by-one mistake when estimating the position of a part on a conveyor. Straighten the door bent by air pressure when your pressure/temperature regulator oscillates because it was tuned without taking sudden disturbances (opening said door) into account.

* Staying motivated during the long product development cycles. Just when you think you're finished, manufacturing starts, and there is still a lot of grunt work to do.

* Designing a logging system that stores enough information to pinpoint issues, yet remains lightweight enough.
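
One common shape for such a logger is a fixed-size ring buffer of compact, timestamped records. A sketch (the names, sizes, and record layout are all illustrative, not from the comment above):

```c
#include <stdint.h>

/* Fixed-size ring buffer of timestamped event records: cheap enough to
 * leave enabled in production, and the most recent LOG_CAPACITY events
 * are always available in a crash dump. */
#define LOG_CAPACITY 64

typedef struct {
    uint32_t timestamp;   /* e.g. a millisecond tick counter */
    uint16_t event_id;    /* compact event code instead of a format string */
    uint16_t arg;         /* one payload word keeps records fixed-size */
} log_record;

static log_record log_buf[LOG_CAPACITY];
static uint32_t log_head;   /* total records ever written */

void log_event(uint32_t now, uint16_t event_id, uint16_t arg)
{
    log_record *r = &log_buf[log_head % LOG_CAPACITY];
    r->timestamp = now;
    r->event_id  = event_id;
    r->arg       = arg;
    log_head++;
}

/* Number of records currently retained (saturates at capacity). */
uint32_t log_count(void)
{
    return log_head < LOG_CAPACITY ? log_head : LOG_CAPACITY;
}
```

Logging event codes instead of strings keeps each record at 8 bytes, so the whole buffer costs half a kilobyte of RAM and a few cycles per event.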


And relatedly:

* Going back to V-model / Waterfall development.


> * JTAG is your friend, but JTAG is so low-level that it's a huge pain.

Yes, JTAG is definitely your frenemy. Hopefully you have a fast version, or even better a full trace probe, but those are usually vendor-specific, so more cost.


What's the difference between JTAG and a trace probe?


For the Cortex series, at least -- a full trace probe will give you a buffer that allows essentially realtime instruction view, global, long-term recording of all executed instructions and memory accesses, etc. (I imagine the same idea is prevalent among all tracers, modulo whatever vendor-specific addons they have.)

An example are SEGGER products -- their ordinary JLink products (JTAG/SWD) vs their J-Trace series (dual Cortex Trace/JTAG) -- https://www.segger.com/jtrace-pro-cortex-m.html -- The difference is somewhere in the realm of $1000, in this case (plus some other CPU support differences). Their tools are even fancy enough to reconstruct the instruction trace back into live code coverage/profiling reports (using the DWARF information). Your IDE "populates" the report live as the trace comes in.

(As a side note, SEGGER makes an excellent and affordable JTAG probe if you're new to embedded stuff like me and using ARM systems. It's a standard probe but comes with a _lot_ of very useful features and is standard GDB-compatible -- the RTT features are much nicer and faster than printing over UART/SWO, for example. The non-commercial, basic version is only $60 too, so it's very affordable. I have one along with my BusBlaster for OpenOCD setups.)


I wondered briefly whether it would be possible to get cheaper open-source trace tools, like the very reasonable Olimex devices. Then I looked at the maximum trace output speed of 600Mbit and noped straight out again; far easier to pay the $1000 than try to design my own system to manage that.


One more:

* Just run unattended, bug free, 24/7, for a year. Very tricky.


+1. Once worked on a rail control system; building out the simulator was an entire multi-month project unto itself.


For me it was working with vendors. Coming from web dev, that was the biggest change, that you can't just open an account with Heroku/AWS/whatever and go about your business like you would if you were hacking for yourself.

You've first got to figure out what kind of SoC you want, and no, it's likely not an Arduino/RPi or whatever is popular for hacking projects. And no, it's not the EE's responsibility to choose these things; they just route them. You have to work with vendors to figure out what performance characteristics, lifespan, etc. you'll need for your system, along with preferences such as OS support (first-class, second-class, community/experimental, etc., because no, you don't typically just download and install the latest Ubuntu and call it a day; drivers etc. are often custom and proprietary, so unless you want to write your own driver stack...). It doesn't all "just work out of the box" like a SaaS service. You've got to put a lot of thought into "other stuff" and make some big decisions before you even write a line of code.


I'm one of those EEs, and I think you're short-selling your colleagues. Or maybe your organization hasn't found great hardware people. Where I've worked, the hardware folks are (on average) just as responsible for architecture as the embedded software folks.

Anything less than equal-and-honest cooperation tends to lead to lopsided or quirky designs.


I'll chime in with a confirmation that, in my experience, too, an EE who "just routes them" is either not very good, or working in a dysfunctional organization. Performance isn't something that exists in its own bubble, it comes with clock and power requirements and the devices that offer "more" of it than other devices have to compromise on other things. If a company insists on enforcing a strict hardware/software team separation, both teams have to be involved in these decisions, otherwise the results are borderline catastrophic -- boards that can barely be programmed and debugged, or sub-optimal component choice, bad analog chains, difficult clock distribution etc., depending on who is more involved in the decisions.

Just like, past a certain point, it's impossible to do good embedded development on the software side without at least some basic knowledge of EE, there's no way you can come up with a good hardware design if you're completely ignorant about software development. It's no coincidence that the best engineers are the ones who can do both.


Sure, you're both correct. I totally understated the EE's role. My intended point was that you can't expect the EE to just "design the board and give it to you". Coming from web / PC app dev this was a big shock. It's a team effort to figure out what hardware is necessary and form relationships with vendors, and the SWE's are by no means absolved from that process. After a few years of embedded dev now, it now seems odd that I'd have been surprised by that, but nonetheless I was.


Oops :-). Sorry if I seemed overly pedantic. I was bitten by EEs who literally "just route them" a long time ago and I'm not very sympathetic towards them. I'm one of the weirdos who are programmers by profession but have an EE degree (not that I'm a good electrical engineer -- I just know enough to troubleshoot my way around hardware bring-ups and talk to the real EEs), so I find the whole "just tell me what connections you need" attitude particularly infuriating and conducive to terrible designs.


Hitting upvote as hard as I can here.

Sadly this seems to be a common problem; either the teams are siloed, or one of them is in fact outsiders. There's not enough of the tight iterative refinement loop work to put features in the right place. This seems to be why so many companies put out hardware which is handicapped by terrible drivers.


The tricky parts of embedded development? Take all the tricky parts of making all the functional elements of a computer, kill the supportive community, demolish a couple bells and whistles, burn all pertinent documentation, and, most importantly, give it to a single developer to handle all by their lonesome.


Your comment perfectly encapsulates my experience doing embedded development for a military contractor.


It can also be a fun/technically challenging niche job that avoids the paperwork endemic to many jobs in large contractors.


Seconded.


I'm a huge fan of simulations, I don't think you can develop a 'good' embedded system without it. My way of implementing something embedded is:

* Develop a capture program for the input feed/sensors, and capture as much as you can.

* Develop a software model to recreate the input for a 'target' embedded system

* Write the embedded system against the captured input. That will get you 95% there.

* Run it 'live', check anything wrong (you will). Usual debugging & tweaks.

* Do a feedback loop if problem input comes in, and keep /that/ preciously for your test unit sequence.

* Once software is done, every time you make a change, run your simulator with all the test input you have and check your output for divergence.
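
That last divergence check can be sketched as a small replay harness. Everything here is invented for illustration: `process_sample()` stands in for the real control logic, and the capture "format" is just whitespace-separated integers:

```c
#include <stdio.h>

/* Minimal replay harness for the workflow above: feed captured sensor
 * samples through the control function and diff the result against a
 * golden output capture recorded from a known-good run. */

/* Hypothetical control function under test. */
static int process_sample(int sensor_value)
{
    return sensor_value > 100 ? 1 : 0;   /* e.g. a threshold decision */
}

/* Returns the number of divergences from the golden capture. */
int replay_capture(FILE *inputs, FILE *golden)
{
    int divergences = 0, in, expected;
    while (fscanf(inputs, "%d", &in) == 1 &&
           fscanf(golden, "%d", &expected) == 1) {
        if (process_sample(in) != expected)
            divergences++;
    }
    return divergences;
}
```

Run against every capture you've hoarded, a nonzero return is your cue to look at what changed before anything ships.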

I very, VERY rarely need to JTAG into a board; I'd rather spend the time on the simulation model and get it as accurate as I can than spend time 'debugging' on the target.

That's why I wrote simavr, for example [0], but I also use qemu a lot for bigger systems. Unfortunately it's next to impossible to get anything upstream in qemu, so most of the work there is just dropped eventually [1].

[0]: https://github.com/buserror/simavr

[1]: https://github.com/buserror/qemu-buserror


Do you have any suggestions on deciding where to draw the line for simulation? It seems that you are suggesting instruction-level simulation of the same binary which is going to be deployed, correct?

What is the performance you're usually seeing for simulations like that?


No, I do not necessarily advocate instruction level -- sometimes it's just not practical. Of course, if you can, it's good (aka QEMU or simavr), but sometimes it's not possible or not necessary.

For example, in my last open source project, I fed the signal I was receiving to a plain Linux program, because I was more interested in the algorithm than the pure embedded bit. THEN I transposed it all into the embedded firmware as-is (see [0]; there are still remnants of that in the rather shotgun-style Linux program).

So 'applicative' simulation is good enough for many cases, while instruction level is good when you want to validate the /true/ embedded responses of the real CPU...

The idea is primarily to be able to isolate the problems, and be able to develop/test them in a feedback loop individually if possible.

As for instruction level speed, well, these days QEMU is likely to be at least as fast as a good ARM CPU without any problem (very often, a lot quicker). And simavr is several hundred times quicker than a real AVR when running on a x86* host. In fact, it's often more of a problem trying to simulate 'real time' like input/timers than the other way around...

[0]: https://github.com/buserror/rf_bridge


You are wise.

This is how I develop, too.

The strategy is essentially to develop the algorithm/software outside the embedded platform, then port it to the embedded platform. I, too, record input data streams and process them via desktop. Then I stream the recorded data through the embedded system.

In the worst case, updates can be made in the embedded system, but in the best case, they can be developed and proven outside then the benefits carried out into the embedded system.

These days, I firewall as much code as possible into "flat", "pure" C/C++ files that contain no low-level calls or libraries or includes whatsoever. Even my tasks under RTX are wrappers to the actual tasks. I can run my multi-task programs under linux just fine with a pseudo task manager. My build system uses the same exact C files for both the desktop and embedded code. There are layers of API calls, but the compiler just optimizes them all out so I don't even sweat it.

You might think that makes it less "realistic," but in practice, it makes it more robust, since the tasks are (mostly!) coded to run at any frequency or pattern of switching.

You might think this massive front-loading takes time, but it's never wasted. Things come out so clean at the end. Especially because lots of documentation can be written between the desktop and the embedded porting.

I was inspired to do this by projects like MAME and MIDIBox. Both are multi-media real-time system across numerous platforms. They benefit from repeated porting. It shakes the bugs out.
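
The firewalling approach described above might look something like this in miniature; the controller, its gains, and the RTX-style wrapper in the trailing comment are all invented for this sketch:

```c
#include <stdint.h>

/* The "flat, pure" layer: no hardware includes, no RTOS calls, so the
 * same file compiles unchanged for the target and for a desktop test
 * build. */
typedef struct {
    int32_t setpoint;
    int32_t integral;
} controller_state;

/* A pure function of inputs and state: trivially testable on any host,
 * at any call frequency or switching pattern. */
int32_t controller_step(controller_state *s, int32_t measured)
{
    int32_t error = s->setpoint - measured;
    s->integral += error;
    return error * 2 + s->integral / 8;   /* toy PI gains */
}

/* The only file that touches the platform would be a thin wrapper, e.g.:
 *
 *   void control_task(void)              // RTX-style task wrapper
 *   {
 *       controller_state s = { .setpoint = 500, .integral = 0 };
 *       for (;;) {
 *           pwm_write(controller_step(&s, adc_read()));
 *           task_sleep_ms(10);
 *       }
 *   }
 */
```

On the desktop, the same `controller_step` can be fed recorded data streams under a pseudo task manager, exactly as described above.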


What is your opinion on model-based development? Model creation in MATLAB/Simulink and code generation with the help of a suitable tool (e.g. TargetLink). I recently came to know that this is the preferred method in the automotive industry, in Germany at least.


Bad documentation. The worst documentation is the documentation that looks professional and complete . . . and lies through its teeth. And the support staff at $VENDOR is just reading the same documentation that you are trying to decode, and it takes weeks to get a round-trip through their ticket system.

Once I submitted an issue with a workaround that had some rather nasty side-effects, and asked for a better solution. Weeks went by. They were promising a code sample. More weeks, more "Well, we're still working on this." The solution I got back, about a month later, was exactly the workaround I'd included in the support request, with some of the serial numbers filed off.

One issue took months to resolve. We finally told $VENDOR "Look, we've got to ship. But we can't if we can't get this (critical component) working, and we'll be forced to ditch you." That got all kinds of political bullshit out of the way, and in an hour we were talking to the guy who had designed the circuit we were having trouble with.

"Oh," he said, "You have to do (three simple steps)". And I tried it while one of my cow-orkers kept him on the phone, and it worked.

I never want to deal with $VENDOR again.


> documentation that looks professional and complete . . . and lies through its teeth

Like this 850-page behemoth that I wrote a driver for:

http://web.archive.org/web/20110701163842/http://www.broadco...

So many undocumented pieces and workarounds for silicon bugs, etc. Luckily there is a Linux driver to refer to, but still...yuck.


Yes, I totally agree with this. I spent weeks debugging some unexpected behavior with a flash memory chip, only to learn that some crucial information was "explained" in one very cryptic sentence. It turns out the datasheet was originally written in Korean and then translated, poorly, into English.


My favorite translation blip came from an Advantest calibration manual:

"Adjust [setting] until the sun shines most brightly upon the screen."

Fortunately, the setting did not have any effect on our friendly neighborhood ball of fusing hydrogen, and a bit of fiddling around eventually revealed that the instrument had turned its gain way up, allowing one to easily determine the zero point by adjusting the bias until the screen filled with noise (rather than pegging at the top or bottom). If one were to squint a bit, I suppose it might look like the sun shining on the screen?


For me, it's that the hardware doesn't always work, or at least not as you expect. If you write workstation or server code, 99.99999% of your problems will be software bugs. But on an embedded system, especially something custom, you can have weird, intermittent hardware behavior that takes a lot of work to pin down. And sometimes you can't fix it, so you work around it. It's both rewarding to get this stuff to work and at the same time extremely frustrating.

I've worked on embedded systems for years, but every year I tell my colleagues that I'm switching to IT so that if my hardware doesn't work I can just throw it away and buy a new workstation.


"Fun" problems from my embedded systems career in no particular order:

1) Writing device drivers for a chip that I can't get to work. It has to be my code; this is a basic function of the chip; why am I so stupid??!!? Finally call the vendor: oh yeah, we know about that bug, the next revision fixes it. Why isn't everyone else screaming about this problem? Well, you guys are only the second company to sample this chip. The first one found the bug!

2) System has to autoload a tray when operator closes the door. Works fine right up until it doesn't. Turns out some of the (visually opaque) trays are transparent at the infrared wavelength of the sensor that detects them.

3) (On a call with Field Service.) Intermittent problem that seems to be software: a moving carriage is reporting errors, but only during certain moves. The field tech has replaced every relevant module and the problem still happens. On a hunch I have him disable the safety interlocks and open all the doors to watch what happens. The problem is that there is a weak spot in a cable in an unrelated assembly that causes it to momentarily collide with the carriage, making the carriage fail to get to its destination. By the time they both stop moving, all hardware tests fine, so no one suspected the other assembly.

4) I love telling this one: hardware caught on fire because a jammed assembly caused a motor driver to overheat. Tester filed a bug against the software because it didn't report an error to the user. The CPU was fkn toasted!


More stories, please.


This has been my experience too on the limited number of embedded projects I've done. I once had an intermittent issue with the device locking up when going into sleep mode; it took me 60 hours to get to the exact cause, which was in the CPLD. Issues with a similar level of complexity I'd usually be able to pin down in a single day in my expertise area of server software.

Also, recently having received an Arduino project to work on, it was a bit of a garden path to get to the point where I could get it running under GDB.

Now I make sure I always have two working pieces of hardware, and I verify the hardware functionality to the extent possible before trying to implement software features on top of it.


"Oh, so this feature is just totally broken, then? Cool."

Always read the errata, folks.


"Also, we're not fixing the bugs, because customers have already built products that work around the bugs. Fixing the bugs would break their products."

Case in point: Microchip ENC28J60 (http://www.microchip.com/wwwproducts/en/en022889). Less than half of the advertised features actually work.


Yeah, don't use their CAN interface IC's either. Exact same complaint.

In general, I now avoid Microchip like the plague. Which means I now avoid Atmel, Micrel, SMSC, etc. <cries>


What chips do you use then?


Generally I will go for a built-in CAN controller on a microcontroller. Those usually work fine.

Brands? A lot of TI and ST.


Or the cost of the tape-out outweighs the number of customers using the feature.


Yup. The times when I've worked on small-quantity ASICs, tapeout cost and engineer availability have been the main reason that bugs weren't fixed.

The ENC28J60 did at least four more silicon revs after the errata were released, so... yay Microchip?


Funny because I'm in IT and sometimes I feel like switching to embedded systems for almost exactly the same reason.


My answer (not mentioned in the article): cycle time. So many codebases with no tests and no useful emulator.

So each "cycle" can be as long as: flash the thing, wait for it to warm up, get the device into the right state, test the thing you wanted to test (which might involve another device, etc.).


I recently put together a hack to alert me when the washing machine was done, via an ESP8266 device.

By far the most frustrating part was waiting to test it. I didn't want to run the washing machine (empty) just to test it, so I had to schedule debugging and testing times when we were doing laundry for real.

The moment it worked for the first time was definitely a happy one though!

Edit - https://steve.fi/Hardware/washing-machine-alarm/


This is especially fun when your machine is a medical instrument doing biochemistry that can't be sped up. You need to test a particular behavior under a certain error condition that is only physically possible 35 minutes into a run? Gonna be a long day!


I had a wonderful Heisenbug once.

The machine was several hundred thousand £s worth of lasers, optical table and stepper motors moving a stage that must have weighed half a ton. The purpose was to print digital holograms, line by line, on glass plates. Between each line there was a delay of some seconds, during which time various calculations took place; even the smallest 3"x3" holograms took half an hour to print. Larger ones could run overnight. Once printed, the (expensive) plates needed another half an hour of darkroom wet processing, after which they could be briefly viewed (the image disappeared again until the plate was completely dry - uneven shrinking destroys the interference pattern).

The byzantine hardware control software had been written long before my time - my job was merely to write the imaging software that fed this machine as it ran. There was an elaborate startup procedure to follow, and just before you'd hit "start" you'd run my little C program, which would synchronise with a timing signal from a serial port. Then, as the stage whirred back and forth, it would feed images to a little display - 30 Hz, pixel by pixel, line by line.

One day, just as we were getting confident of the rhythm of operating the machine, a larger hologram came out "back to front" in a strange way - one of the parallax axes was reversed. Perplexed, we carefully tried again with the same file - and it came out fine the second time. Unable to reproduce the issue, the matter was dropped.

Some weeks later, just as our confidence had begun to wax again - boom. Another inverted hologram. Another large plate wasted. Another careful investigation, another failure to reproduce. A Heisenbug!

This continued happening, off and on, most vexingly - and of course, never when I was there, and only with the larger, more expensive plates. Eventually one day I came in and it had happened again, and this time I noticed something peculiar - the hologram had finished printing, but my little C program was just sitting there, waiting on seemingly the last line, for a timing signal that would never come. And the light began to dawn.

My C program had the basic structure "buffer line, wait for signal, display images, go to start". It turned out that what was happening was people were hitting "start" too soon after launching it, before it had had a chance to properly buffer the first line. It thus missed the "start line" signal of the first line - which caused it to print line 1 as line 2, line 2 as line 3, etc. And because it was a rastering stage, this meant that every line was printed in the opposite direction to what was expected.

The really treacherous thing about it was, it only manifested when I wasn't there because I knew instinctively to allow the program enough time to buffer. It only happened with big, expensive plates because they had longer lines which took longer to buffer. And it only happened at all when the operator of the machine was sufficiently confident, or in a hurry, to perform the final two tasks of the checklist within seconds of each other.

The fix? A note in the checklist.

I miss the job, but I don't miss debugging that darned machine.
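
For what it's worth, the off-by-one can be modeled in a few lines. This is only a guess at the structure, with invented names: on a back-and-forth raster, shifting every line by one pass flips each line's print direction:

```c
#include <stdbool.h>

/* The stage rasters boustrophedon-style: even line index prints
 * left-to-right, odd prints right-to-left.  (All names invented.) */
static bool prints_left_to_right(int line_index)
{
    return (line_index % 2) == 0;
}

/* If the program misses the very first sync pulse (because it was still
 * buffering when "start" was hit), image line N lands on physical pass
 * N + 1, so every line comes out in the opposite direction. */
static bool direction_with_missed_sync(int line_index)
{
    return prints_left_to_right(line_index + 1);
}
```

Which matches the symptom: every single line reversed, so one parallax axis of the hologram flips.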


Great story, thanks for that!


It is a good story, but I find it a truly disquieting thought that it was precisely this class of bug (race condition with an unwitting human) in the control software of Therac-25 that led to the deaths of several people. As I discovered, it's a tricky one to catch in the act because one tends to be slow and methodical when bughunting.


Any chance one could virtualise some or most parts of the hardware?


Yes, but the problem being solved is often a double problem. A) develop embedded software to control a system of "stuff" B) characterize the system so you know what controls to apply.

The virtualization helps you ensure that A is doing what you intend to do, but not as much with B) knowing what your intent should actually be... and even if you can build a reasonable model of 80% of the behaviour, the really critical stuff tends to be out of the envelope type constraints like, oh the sensor gives this bogus reading under conditions X, so you need to ignore it or avoid condition X.

Edit: case in point (an adjacent story in HN)

https://news.ycombinator.com/item?id=13700798 "UPS Showcases New Delivery Drone, Fucks Up"


If you're developing the firmware, use a Hardware Abstraction Layer (HAL), and have the application logic only communicate with/through that. Have a version of this layer which can run on your host computer. The majority of the code then becomes testable with very quick cycles.

The HAL layer for the device must be tested separately. Ideally, do it in an automated system-test jig which has the target hardware+software and allows programmable input stimuli and output verification. You really should have such a setup for your final (per-board) QA anyway, so better to start developing it sooner rather than later.
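
A minimal sketch of such a HAL, assuming a function-pointer table so the host build can inject fakes (every name here is hypothetical):

```c
#include <stdint.h>
#include <stdbool.h>

/* The HAL interface: application logic only ever talks to this struct,
 * so a host build can supply fake implementations and exercise the
 * logic without any hardware attached. */
typedef struct {
    int32_t (*read_temperature)(void);   /* e.g. in 0.1 degC units */
    void    (*set_heater)(bool on);
} hal_ops;

/* Application logic: a bang-bang thermostat, written against the HAL. */
void thermostat_step(const hal_ops *hal, int32_t setpoint)
{
    hal->set_heater(hal->read_temperature() < setpoint);
}

/* Host-side fakes for fast test cycles. */
static int32_t fake_temp;
static bool fake_heater;
static int32_t fake_read_temperature(void) { return fake_temp; }
static void fake_set_heater(bool on) { fake_heater = on; }

static const hal_ops host_hal = {
    .read_temperature = fake_read_temperature,
    .set_heater       = fake_set_heater,
};
```

The target build supplies a second `hal_ops` whose functions touch the real peripherals; only that half needs the hardware jig.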


This only works if you're not doing low level stuff.

I recently ran into an issue where an interaction between the DMA controller and the SDRAM controller in an MCU led to invalid reads from the SDRAM. I still don't know what is actually going on with the MCU (a SAME70, in this case).

Using a HAL lets you debug your application code more easily, but I don't think I've ever had a hard problem in an embedded system where the underlying cause wasn't related to some odd hardware behaviour under a corner case stimulus. There's no way to simulate that without a full model of the entire device (which isn't available).


Yeah, the HAL itself does not help much for hardware issues. But if tooling is built for automated on-device tests (to test your HAL and app logic), then you have a much better chance of having a system which can help you reproduce spurious hardware failures. For instance, using generative techniques like fuzzing on the software side, or just predictable test patterns which can be observed with a scope/logic analyzer. On the testbench side, temperature stressing, under/overclocking, under/overvolting, adding capacitance or inductance to signal or power lines, and modifying grounding schemes can all be used to try to provoke the situation more reliably and (maybe) get a better understanding of the problem.


This is money and time. Great if you have it.


Yes, like everything else. A HAL and host-based simulation usually let you start developing the software earlier in the product cycle, which can free up a lot of time. If one doesn't have the capacity to invest in automated QA then things might get tricky no matter how they are done... Trying to simplify the problem / reduce scope might be the best course of action.


I do agree that it is too costly though, and I think this is partly a tooling issue. Hoping that open source software and hardware can help here.

For instance, on-device testing is a first-class citizen in the microcontroller programming framework I've developed. Shameless plug: http://www.jonnor.com/2017/02/data-driven-testing-with-fbp-s...


It doesn't always make sense.

Debugging a quality control machine I inherited:

The heat sensor would trigger randomly, about twice a week.

It looked like a software problem, as swapping the sensor out for a brand new one didn't fix anything, despite doing it three times.

Everything seemed above board, and the review didn't take that long, it was only 8000LOC, simple C.

I found the answer on a test device: The vendored library that we could link against, but not look at, was reading the temperature from a sensor over I2C... And then reading it into the main onboard chip via an analog pinout. The docs said digital.

Which meant that when an inspector walked past the machine and looked at it, their own electromagnetic presence would mess with the readout. It wasn't sealed properly against this, because we believed the documentation.


It's possible in some cases; the problem is that the work is often harder than the thing you're actually trying to do ><

Also, a lot of the time you don't really have good specs, or there are closed-source bits, e.g. firmware blobs for wifi.


The biggest problem to virtualization or other simulation methods is just return on investment. Most applications are so niche you can't justify anything but bare bones tools. Any money spent just getting your product in front of a potential customer will be better spent than hardening your development cycle.


You don't need to virtualise if you treat the embedded hardware as a dependency - in the same way that a database or a remote web service can be dependencies and we can unit test without access to them.

That said, there are some cases in embedded where there is just no possible substitute for the real thing. I've heard of physical actuators and webcams being used to drive devices from test harnesses in those cases. I think a popular example of this that's open source is published by one of the pay tv companies.


I've done a few projects where we built simple robots to physically manipulate the device under test. It saves huge amounts of manual test time.

Many of the Android app test facilities use webcams to report what's actually on screen rather than relying on software hacks (which are different for every device).


For code that is generally in open source land, you may be able to get some traction cross-compiling for x86 and using tools like qemu/virtualbox to test higher-level functionality. However, it's usually the peripherals and whatever high-speed serial links (e.g. wifi) that you need to get better visibility into.


As a former embedded developer, there is another aspect of difficulty that hasn't been touched on. Dealing constantly with physical hardware introduces a host of challenges.

* If you or your coworkers are not organized, you can waste a lot of time looking for proper sized wrenches, proprietary screw heads, speciality crimpers, oscilloscope probes, etc. etc. The more hardware the company makes, the worse this problem can be.

* Most internal connectors are not meant to be constantly plugged and unplugged. In a testing scenario where you have to change connectors or test harnesses frequently, it is common for the connectors to break or wires to become loose. Then you have to waste time figuring out why your hardware stopped working.


Another big challenge is that most of the embedded software I have seen is written by people who aren't exactly top notch programmers.

I spent about 15 years mostly writing server code for UNIX machines in C (before ditching it in favor of Java, and 13 years later, Go). Since embedded programming is a bit of a specialty field where things like predictable performance and robustness are important, I expected the embedded world to be pretty professional. Because any time people start using words like "guarantees" or "real-time", you tend to assume that they do some pretty amazing stuff.

I can't really say that's what I found. A lot of code is brutally ugly, many lack understanding of even the most basic defensive programming techniques and there's a lot of superstition around abstractions by people who don't really seem to understand what comes out of a compiler (Having the compiler "compile away" abstraction layers was something we often obsessed over on projects I worked on in the 90s and early 00s).

Code is often badly organized, badly formatted, badly documented and amateurishly maintained (e.g. bullshit commit logs -- if the source is even kept in a version control system). As a result, you constantly fight the urge to rewrite stuff because the code is just so damn hairy. Of course, any talk of rewriting code makes people nervous ("We invested a lot in order for this to work and now it does! Don't touch it!"). Yeah, I'm not surprised it took a lot of work.

And all of this was code by serious companies whose brands you have heard of.

I'm hoping the IoT craze is going to accomplish at least one thing: educate embedded developers. Sure, a lot of us "regular" software people are going to run around like a bunch of flatfooted morons because it is unfamiliar territory, but the embedded world is in _dire_ need of some software culture and discipline.


At my last job, the EE manager wanted to improve the quality of firmware, so one of the things he did was to ask for a software guy to show them how we did code reviews.

As an EE working in software, I volunteered. One of the comments I made was that the many magic numbers in the code should be replaced by definitions that explained what they meant.

I got back code for re-review that contained the line:

#define ZERO 0

To this day I'm not sure if the author didn't understand or was just irritated at having me review his code. Probably both, come to think of it.


I couldn't agree more!

Although, coming from the microcontroller world, there is practically nothing in the STL that gives me the guarantees I need to use it in ISRs, and most devs have no chance of reviewing an STL piece to see if it's going to deadlock in an ISR or something.

Another problem is that although the abstraction compiles away in release mode, the debug build usually still has to fit on the chip and/or meet realtime deadlines. Tooling is still super bad at optimizing part of a build and not other parts, even though we have an optimize pragma (I brought this up in the SG14 working group but concluded that it's more of a tooling issue than a language issue).

In short I think at least the drivers should be written in C++ with proper abstraction but for the most part those abstractions have not been written yet and we can't just borrow from other domains because we have to be deterministic in timing and RAM use and also usually use other threading models (RTC event based) and memory management models (pools, state local storage) at least in drivers.

- Odin Holmes


> lot of superstition around abstractions by people who don't really seem to understand what comes out of a compiler (Having the compiler "compile away" abstraction layers was something we often obsessed over on projects I worked on in the 90s and early 00s).

I have seen this a lot, especially caring about bounds checking or virtual method dispatch without measuring whether it really matters.

Even some modern microcontrollers are quite powerful compared to those mainframes that ran Lisp, yet we are still preaching assembly and C as if it were the '80s or early '90s.


Your hope is becoming a reality, at least in my experience working at a semiconductor company (public, mid-sized).

The IoT craze has forced a lot of software on hardware companies: everything from ZigBee, Thread, and Bluetooth stacks to RTOS's to manage all of those stacks, to IDEs that are actually usable.

The companies embracing software as a vital part of their products are doing well, and will continue to do well at the expense of the companies who treat software as an afterthought.


I think a lot of that is changing due to the rise of Linux in embedded systems. Many more traditional software engineering concepts are becoming mainstream in embedded.

And for embedded in high value (hundreds of millions of dollars) or human life, I think there has always been lots of proper discipline and engineering process.


I was mostly talking about stuff that runs on much smaller CPUs that have from a few hundred bytes to a few tens of kilobytes of memory. Those kinds of devices are important for applications where you have extremely tight power constraints (like powering stuff off a coin-cell battery for two years etc).


Lucky you even had commit logs / versioning :) A lot of embedded orgs are not actually using versioning even. Just zips with the date...


Stored in SAP.


* You're going back in time about 10-20 years in terms of tool chains, language support, memory/CPU power, debuggers, and for the most part programming paradigms.

* Documentation? HAHAHAHAHAHAHAHA LOL

* Debugging can be extremely challenging in real-time systems. Things like JTAG printf will slow things down enough to wreck your timings.

* You have to at least know the basics about the hardware, especially if you're doing control systems and meddling with GPIOs and such.


Ugh. Dealing with some of this right now. Three full minutes to compile and flash 430 KB to a board, and then the debug tools won't use breakpoints properly. 400 KB of that is system libraries I can't avoid. I'm thinking of writing a Lisp compiler with some FFI and pushing code to the remote memory via TCP. Even having to write my own debugging tools, I feel like I might save time.


Oh, you have a debugger you lucky sod.

Currently working on a thing (approaching EOL thankfully, but still have to support and add new features until the manufacturer says "no seriously we're cutting you off") that uses a gcc cross compiler from the 90s, no gdb for me.

Side channel debugging is a life skill. Oh it's crashing weirdly with no diagnostic output on the UART terminal? Rewrite a timer interrupt that's still running to print your diagnostic information when you toggle a switch. It's actually fun if there's not deadline pressure or a line down situation at a customer.


A tiny favorite I remember from when I started working for real as an embedded developer ~5 years ago (before that I was in AAA game development, quite the switch in so many ways!):

I was taking over development of a new display driver for a small hand-held instrument. The display and drivers were both new to the organization, so there was no experience in-house. And we had these strange "color-flowing" bugs, that nobody could understand. Fields of greens and blues that bled across the screen in weird ways. Of course everybody thought it was a driver (=software) bug.

Weeks passed, my hair got thinner, then finally I looked once more on the schematics, traced a signal back to the CPU, and said to the hardware designer "hey, isn't this a 3.3-volt signal?" Turned out we were backfeeding the display driver from its reset line, causing it to power up due to the voltage overpowering the input circuitry and flowing into the power rail, enough to power it up but not to make it behave correctly. Yikes that was frustrating (but fun to catch, of course).


Good story.

I've come to realize software almost always gets the initial blame. Software is almost always the one painting the error screen. So even if it says "Voltage out of range", someone is going to accuse software of not working. So they pull you into the lab, watch you open the box, probe it with a multimeter, show that the voltage is out of range, and then get the hardware engineer.

I once spent over a month (nights & weekends) tracking down a memory corruption bug. Everyone accused software of course. It turned out to be poor signal integrity on the memory bus (hardware problem). It was horrible.


Real hardware regularly fails and you need to recover or people will think your device is flakey. Consider running a program for 20 years on a single chip without someone rebooting ever. Now, consider 100,000 people doing this and everyone thinking something is broken if it fails.


This is always the "fun" part. You're writing code for a medical device that is processing somebody's blood and there may be a physician waiting for the test results. 40 minutes into the test a pump jams or gets sticky from all the fluid it has to push around all day. You absolutely do not want to throw that blood sample away and start over. Hell, there may be no sample left and a nurse is going to have to go stick the baby again to get more. Now you have to figure out a sequence that can recover the hardware to keep processing that test, while not affecting the other 100 tests that are also running in parallel with it. And then design this for all the potential failure points in the system...

Yea, "fun"


My experience: working with the hardware engineers and convincing them of things like yes, you need to latch those signals because, no, I can't poll the signals often enough to avoid missing events.


"We didn't put in hardware PWM because we figured you could bitbang it in software."


Right?


Wait, your hardware engineers actually get your input before they go ahead and fab things? You're so lucky.

There's more than one project that fixes chip issues via errata and curses at software engineers.


No. It ended up being Rev C before we got to production.


One of the tricky parts is documentation. The PDF for an STM32 (something way smaller than a BeagleBone or a Raspberry Pi) is 1,700 pages long; it includes a 69-page description of a timer, but doesn't include any ARM core documentation.

RTFM is a nightmare; by the time you look at another page for a related device, you've forgotten what you read previously.


The STM32s have an odd documentation structure where you have a datasheet (showing pinouts and capabilities of a specific part or subfamily), a reference manual (showing the detailed structure and function of all peripherals, common to the part family) and a programming manual (documenting the core, common to all parts using that core). So in this case, you're looking at the wrong document.


I think a lot of the larger devices do this, otherwise there would be a lot of repetition across the family.


I wasn't really an embedded developer ever, but I worked at an industrial IoT company for a while and used to just get handed devices which I was supposed to connect to the internet and figure out how to make them send useful data to us.

Besides the undocumented proprietary protocols (which aren't embedded-specific), as a backend engineer I used to struggle heavily with development environment setup. As a JVM and Python guy I'm not used to fucking with weird compiler toolchains at all.


I found that doing endian conversion for a system using both PCI and VME buses was quite a challenge.

I also found it a challenge writing Linux device drivers for chips with an external 32 bit bus and an internal 16 bit one. The 32 bit data had to be split into two, set the data pins for the first half, set a bit saying the data was available, wait for it to be read in and repeat with the second half. Do the reverse for reading.

Also challenging was setting the bits in a register for a chip with multiple commands per register. My program checked the input data for an error, e.g. trying to set three bits to 128, read the register, masked out the bits to be changed, changed them, wrote the value, read the register, masked out the bits and checked them to make sure they had been changed.

Then there was the time the board manufacturer changed the memory map of the board without telling us. Boards made before April would boot, newer ones didn't.

I also found that some device drivers from SOC chip manufacturers had to be debugged before I could use them.


Don't you mean "Internal 32 bit bus/external 16 bit"?


Isn't the "Mbed compiler" just GCC running on their servers?

I've successfully made code that worked just fine with both the online mbed IDE and arm-none-eabi-gcc; I don't recall anything having to be done differently, just more to set up locally (linker scripts and so on).


I think they've added a fair bit of stuff to it and are unwilling to release their source; because they don't actually ship the modified GCC, they don't have to release their changes.

IIRC, most of the stuff they've added are surrounding libraries and such.

It's basically a canonical example of why the Affero GPL exists.


The big benefit of using mbed is their really nice API, which abstracts the hardware. The code for the API is Apache-licensed and available on GitHub[1].

Furthermore, you don't even need to use their online IDE to do anything. The online IDE has an option to export your project such that it can be built with arm-none-eabi-gcc and make[2], or you can use their mbed-cli tool, which is a Python-based command line tool and is also on GitHub[3].

The mbed guys have really embraced open source and working with existing tools, and I wish the community would embrace them more, since I think their APIs beat the snot out of Arduino's.

[1]: https://github.com/ARMmbed/mbed-os [2]: https://developer.mbed.org/handbook/Exporting-to-offline-too... [3]: https://github.com/ARMmbed/mbed-cli


Huh, I guess that's changed. I remember looking at mbed at one point, and at that time it had no offline option (which basically made it a non-option for me).

I'll have to re-evaluate it the next time I need a little mcu board!


I used the mbed for my embedded class. You can talk to the mbed using the ARM Keil IDE and it's so much better. You can actually debug code on the device with breakpoints.


mbed applications can actually be debugged by any IDE that supports GDB, including Eclipse and Visual Studio Code. More info here [1].

[1] https://docs.mbed.com/docs/debugging-on-mbed/en/latest/


The mbed online compiler runs ARMCC - which is a commercial compiler. But mbed applications can be compiled locally using GCC, ARMCC and IAR.


Here is this moderately complex embedded system.

Simple challenge: the system must run 24/7, without interruption, for at least a year. Bug free.

Quite the tricky thing to achieve.


Is embedded work, in general, well paid? And is it relatively easy to get new work once you've broken into it?

It seems to be not very visible. I'm under the impression that some "not very visible" work is well paid and not going away any time soon. I'm thinking of some cobol or pascal developer called out of retirement at great expense.

Or is this like being at the edge of new/exotic web 2.0 where you are continually looking for the next job, re-learning tech over and over.

I know this is a big topic and perhaps there is no correct answer overall.

If anyone has any resources to job sites or further articles I'd love to read them.


Generally not, compared with the difficulty of it. (And the first dozen posts are all bang on the money, and you have to deal with all of those things AT ONCE.) You have to be a multidisciplinary wizard and you will never get paid what some VBScripter gets paid at $bank.

When you're developing a product, hardware iterations take most of the budget. As 'the software guy' your job is to be handed a piece of hardware and some vague requirements, and to make it work the way the end user (who you probably never get to meet) expects it to. The product will already be over budget and behind schedule, so you don't get funding and you're probably already late in delivering the product (no, your deadline does not move due to this).

It's hard to find embedded work because the people who 'need a software guy' for their hardware product don't know any software developers so you have to be lucky to meet potential employers.

That said, there is opportunity for some COBOL-style big bucks later on, when suddenly they need to make another production run of the product you worked on back in the day, and they want a few tweaks, and you're the only person on the planet who has the faintest clue how to build the software and program the hardware, so you get to pull your hair out all over again but this time for a more reasonable wage.


> Is embedded work, in general, well paid?

The trick is breadth of knowledge. If you can just program a microcontroller, you've got a lot of competition. If you can program a microcontroller AND an FPGA, your competition drops by orders of magnitude (this is the biggest step and probably the easiest to tack on--Verilog isn't that hard, but you WILL foul up until you get race conditions and concurrency properly beaten into your head--do everything synchronously, and synchronizers are your friends). If you can program a microcontroller, program an FPGA, AND design the board--you are in rare company. If you are good at debugging these boards, people will worship you as a god. If you have domain knowledge on top of that, you should be starting a company.

> And is it relatively easy to get new work once you've broken into it?

Well, it's networking like anything else. If you've been doing it for 10 years, things seem to just drop into your lap. Otherwise, you have to beat the bushes.

However, if you do a good job, the good people around will notice quickly. And those good people are often at capacity so they will throw stuff over to you.

Obviously, being near a tech hub helps.

> I'm under the impression that some "not very visible" work is well paid and not going away any time soon.

The problem with "not very visible" is that it also means "executive level may not appreciate it". I've seen medical device companies trying very hard to get rid of the single person who actually understands their hardware.

> Or is this like being at the edge of new/exotic web 2.0 where you are continually looking for the next job, re-learning tech over and over.

No and yes. :)

ARM is dominating the embedded space currently. So, especially at the Cortex M end of the spectrum, the base programming environment looks pretty much the same. This hasn't changed for quite a lot of years.

In addition, 8-bit development is mostly going away. The cost delta between an 8-bit micro and a 32-bit micro is now so small that it makes no sense to use anything other than 32 bits unless you have a very specific use case.

For FPGA, the tools are similarly stable over long time frames.

The peripherals, on the other hand, completely differ from manufacturer to manufacturer. And have bugs from revision to revision. So, that is like learning stuff over and over :(

However, what the peripherals are supposed to DO is stable. Once you understand that, you will be very good at ferreting out the small differences, and your life gets somewhat easier.

> If anyone has any resources to job sites or further articles I'd love to read them.

You can read a lot, but doing is better. Go get a Nordic BLE development kit for $39 and build something.

http://www.digikey.com/product-detail/en/nordic-semiconducto...


Thanks, this is by far the most comprehensive reply I've ever had on HN. I appreciate you taking the time; I'll look into your suggestions. I actually have an ATmega8515, but I've been a little distracted learning the finer (if that's the right word) points of C++ recently. Do you think the 8515 is "appropriate"?


I'm not the original poster, but starting with a bare MCU might be too tricky at first. A better approach imho would be to get one of the many available kits (e.g. a BeagleBone) and try to make it live from scratch: don't use a prepared bootloader/OS, but try to build everything yourself. Setting up a toolchain, grasping cross-compiling, downloading, and debugging "doesn't start" kinds of problems are the key, everyday problems in embedded programming. When you are familiar with the build-download-debug process, get into peripherals: GPIO, UART, SPI, memory interfaces, etc. After this, you should have enough experience, and have spent enough time reading obscure datasheets, to start designing your first simple boards. Sure, you also need some electronics knowledge to read, understand and lay out schematics, but provided you don't get into RF or analog electronics, it's not too difficult.


I'm going to disagree here about the Beaglebone (or RPi) first. There is a LOT of complexity in those that simply isn't in the Cortex-M (especially M0) series. For example, try to figure out what the maximum bit banging rate is on a Beaglebone Black. Good luck.

Getting up and running on an Eclipse/ARM/Segger toolchain sucks. No doubt. But you have to chew through it.

After that, it's about 3 lines to get an LED to blink.


I don't think there's a general answer.

From what I could see pay was on par with most enterprise development. Well above basic web stuff though.

Like the other poster said, if all you do is code, then your position is pretty weak. I have an EE background and I'm pretty good with mechanical things, so I could debug the electronics and figure out issues with mechanisms, etc. (most of my programming career has involved writing code for things that move) My skillset pretty much goes from low-level assembly code up to writing simple web apps, so I was generally able to handle anything thrown at me. I'm also pretty good with creating documentation, understanding the business and built up my domain knowledge at any company I've worked for.

In this field, the more you know, the more valuable you become because it tends to require a wide range of expertise. IME, the people who have done well are always generalists.


Yes my impression was EE would be extremely valuable, thanks for the input, no idea why HN is downvoting this reply.


Maybe not the hardest part, but the most consistently annoying is that you are dealing with really low level issues, and you can't take anything for granted.

Memory allocation, compilers, threading, libraries etc. - you have to be careful at each and every step.

You're probably going to end up working with C or C++ - which is fine, and they are common enough, but you don't get a lot of nice 'VM-ish' things along with them.

You end up often having to know specific things about chipsets etc..

It's great because you 'learn a lot' and will come out a better Engineer - but sometimes that knowledge can be arcane.

You'll spend a lot of time doing things that 'everyone else' in software takes for granted and it can make it feel like you're moving slowly.



