[chirp_devel] How to brick an FT-60

Fri Mar 28 16:42:50 PDT 2014

On Mar 25, 2014, at 7:48 PM, Dan Smith - dsmith at danplanet.com wrote:
>> Chirp does not modify the checksum in the .img file after editing
>> and saving. The checksum is recomputed on the fly on upload, so it's
>> not like the radio will see a bad checksum as a result,
> 
> Unless of course, the cable is bad... :)

Good point, and I'd overlooked that, even having ordered a new cable.
But read below; the symptoms have gotten way stranger than that.

> If it _was_ right out of the radio, then that tells me that the data got
> corrupted between the radio and the computer (i.e. in the cable, USB
> adapter, etc). Sounds like we should make CHIRP check the checksum after
> a clone for at least that radio.
> 
> ...or the cable is silently corrupting data on the way in and this is
> the first time it has corrupted something that mattered.
> 
> If I were you, I'd modify the driver to check the checksum after
> download, and then do a bunch of downloads of a good radio with the
> original cable and see if they occasionally don't match.

Excellent idea, and I will do that. It will be a while before there's any information
on that, since I don't have an FT-60 at the moment. No word yet from Yaesu
regarding the radio I shipped them Monday.
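Here is a minimal sketch of the post-download check, in Python since that is what CHIRP is written in. It assumes a simple 8-bit additive checksum: the sum of bytes start..stop, modulo 256, stored in the byte right after the summed range. The function names and range parameters are hypothetical. The real FT-60 offsets would have to come from the driver, not from this sketch.

```python
def additive_checksum(data: bytes, start: int, stop: int) -> int:
    """Return the 8-bit sum of data[start..stop], inclusive."""
    return sum(data[start:stop + 1]) & 0xFF


def verify_image(data: bytes, start: int, stop: int) -> tuple:
    """Compare the computed checksum with the byte stored after the range.

    Assumes (hypothetically) that the checksum lives at data[stop + 1].
    Returns (ok, diff), where diff = (computed - stored) mod 256.
    """
    computed = additive_checksum(data, start, stop)
    stored = data[stop + 1]
    diff = (computed - stored) & 0xFF
    return diff == 0, diff
```

Run it after every download and log any nonzero difference. A long run of downloads from a known-good radio over the original cable should all come back `(True, 0)`.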

I might also implement an optional check for a couple of bits in the image
being set. See below.

============
I've continued to look at the "bad checksum" files in my collection,
and there's a very interesting pattern. No conclusion yet, but quite
a bit of information:

- Of the 21 image files with bad checksums (out of 248), 16 were
created in one consecutive period spanning Feb 7-8. Every image
downloaded in that period has a bad checksum.
The image that bricked my two radios on March 22 is the 10th in
that sequence of 16.

- The other 5 images with bad checksums do look like files I had
edited, and that's more in line with the number of such I remember.

- Of those 16 files, all 16 have the same difference between computed
and actual checksum, 0x30.

- Examining the diffs between the first 'bad' file and the 'good' file
that immediately preceded it, there are a few differences I recognize
as related to the radio state I was examining. There is one difference
that I do not recognize: the two bits 0x30 in byte 56 are set. I've pretty well
mapped all the feature settings and a lot of nonvolatile radio operating
state. These two bits are still "unknown" in my map, and I hadn't seen them
set before this.

- These two bits account exactly for the checksum discrepancy, as if the
radio had computed the checksum treating them as 0.

- None of the other 232 image files in my collection have either of
these bits set.
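The comparisons above can be done mechanically. The sketch below assumes two images of equal length and treats "byte 56" as a zero-based offset, which is itself an assumption. It lists every differing byte along with the bits that changed, and it tests for the two suspect bits. Setting bits that were previously 0 raises the plain byte sum by exactly their value, so 0x30 set in one byte is enough on its own to produce a 0x30 checksum discrepancy. `bit_diffs` and `suspect_bits_set` are hypothetical helper names.

```python
SUSPECT_OFFSET = 56  # assumed zero-based offset of "byte 56"
SUSPECT_MASK = 0x30  # the two bits still "unknown" in the map


def bit_diffs(old: bytes, new: bytes):
    """Yield (offset, old_byte, new_byte, changed_bits) for each differing byte."""
    for offset, (a, b) in enumerate(zip(old, new)):
        if a != b:
            yield offset, a, b, a ^ b


def suspect_bits_set(data: bytes) -> bool:
    """True if either of the suspect bits is set in the image."""
    return bool(data[SUSPECT_OFFSET] & SUSPECT_MASK)
```

Running `suspect_bits_set` over the whole collection should flag exactly the 16 Feb 7-8 images, if the observations above hold.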

I don't know what happened between 15:32 and 15:53 on Feb 7 that caused
this. Looking at the file names involved, I don't believe I did an upload,
but I can't be positive of that almost two months later.

Similarly, I don't know what happened between 17:40 and 20:22 on Feb 8
to clear this up. My calendar and email history offer no clues, and a dinner
break is probably all it was. Diffing the last 'bad' file against the first 'good'
file at 20:22, the only differences are the couple of expected bits for the
settings I was mapping, plus the "bits of death" being cleared. So I didn't
upload an operational image in between, for example to use the radio.

==========
I'm struggling to fit the observations into a sensible model of
how this happens. I'm willing to believe a bad cable caused this somehow,
maybe by injecting the two spurious bits on an upload and also, miraculously,
making the checksum match so the radio accepted it that one time. I don't
recall any uploads failing, and statistically you'd expect some to fail
in that scenario. Then again, there might have been a couple that I
took for operator error and just retried.

I have much more trouble seeing how a faulty download causes this.
A flaky serial interface does not corrupt the same two consecutive bits,
in the same position, out of ~229,000 bits in the serial stream, 16 times in a row.
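A back-of-the-envelope version of that argument, using a deliberately generous assumption: every download suffers exactly one corruption event (the two-bit burst), and its position is uniformly random over ~229,000 bit positions. The first position is free, and each of the other 15 downloads must land on the same spot by chance. Real serial noise is burstier than this model, so treat the result as an order of magnitude only.

```python
import math

N_BITS = 229_000  # approximate bits per image transfer, from the text
REPEATS = 16      # consecutive downloads showing the same two bits

# The first corrupted position is "free"; each remaining download
# must hit that same position independently and by chance.
p = (1 / N_BITS) ** (REPEATS - 1)
print(f"p ~ 10^{math.log10(p):.1f}")
```

This comes out at roughly 10^-80. It ignores the probability that a given download is corrupted at all, which would only make it smaller.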

I'm starting to think an internal glitch on the first radio isn't out of the
question as the root cause, now that there's a (still very fuzzy) plausible
model for how it resulted in killing the second radio.

For the radio to return these two bits in the same position on every
one of 16 consecutive downloads, I expect they're actually stored in
the flash. If I were designing this, I can see a few possible models
for how the checksum is computed and used:

1) Don't use it internally (!), just compute it on the fly when doing
a clone write (chirp's download). But then it wouldn't be repeatably
wrong on chirp downloads the way it is.

2) Check it internally at boot, and update it on every power down
by summing all of flash. But then a bad checksum won't survive power-off.

3) Check it internally at boot, but never actually sum memory on write,
except maybe on a clone Rx.
Update the checksum on every byte write as a difference.
I.e.: Read old byte X, subtract from new byte X, write new byte X,
read old checksum, add (newX - oldX), write new checksum.
Not as robust, but saves reading all 32KB every power-off.
But again, the error we're seeing would cause a bad checksum
error on the next boot, wouldn't it?

4) Compute it as in (3) but don't check it; use it only to send along
with the data on a clone write. But we have to read all the flash
anyway to send it out the serial port, so why not compute it then,
instead of complicating every EEPROM byte write?
I reject this as too brain-damaged to be real.
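A toy model of option (3) makes the arithmetic concrete. It keeps an 8-bit additive checksum current with a per-byte delta, and it includes a hypothetical write path that skips the delta update. Under those assumptions, setting the two suspect bits through the faulty path leaves the stored checksum off by exactly 0x30, which matches the discrepancy in the downloads. It does not by itself explain why the radio booted fine in that state. The class and method names are invented for illustration and say nothing about Yaesu's actual firmware.

```python
class IncrementalChecksumStore:
    """Toy model of option (3): an 8-bit additive checksum kept current
    by a per-byte delta instead of re-summing all of memory."""

    def __init__(self, size: int):
        self.mem = bytearray(size)
        self.checksum = 0  # sum of mem, mod 256 (all zeros initially)

    def write(self, addr: int, value: int) -> None:
        """Normal path: read old byte, write new byte, add (new - old)."""
        old = self.mem[addr]
        self.mem[addr] = value
        self.checksum = (self.checksum + value - old) & 0xFF

    def raw_write(self, addr: int, value: int) -> None:
        """Hypothetical glitch path that forgets to update the checksum."""
        self.mem[addr] = value

    def full_sum(self) -> int:
        """What a clone read (or a full re-sum) would compute."""
        return sum(self.mem) & 0xFF
```

After a series of normal writes, `checksum` equals `full_sum()`. A single `raw_write(56, mem[56] | 0x30)` makes `full_sum() - checksum` equal 0x30 (mod 256).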

So I haven't yet hit on a model for the radio's use of the checksum
that explains the observations. Any ideas?

I also note that the radio booted up at least 15 times Feb 7-8 with
these two bits apparently set (or 31 if you count the clone-mode
power-ons to do the downloads).

Why didn't it brick at that time? Why wait until they were uploaded as
set, then power off/on, on March 22? What's the difference?

-dan