[chirp_devel] [PATCH] [New Model] Support for the BTECH Mobile Radios, fixes issue #3015

Wed Mar 23 12:52:34 PDT 2016

> Yes, good to know about the performance issue. Also how can I measure
> that performance impact to compare for future improvements?

It's just a general thing: making lots of syscalls for very small bits
of data is inefficient, and it means you're more likely to be blocked on
the kernel during periods of high load (which you're contributing to).
It's far more expensive to make a syscall than a regular function call.
Also, with the tight timing you've got going on here, it could
definitely cause you to miss another deadline you're expecting to make
and manifest itself as instability under load and other varying conditions.

> We originally wrote the driver that way but we found that the driver was
> pretty unstable and not because of the driver it self but the radios
> having and not consistent delays on the answer to PC commands.

I'm not sure why this would be, but I didn't see your other code. In
general on a modern multitasking system, doing small reads and doing
your own timing like this is a recipe for instability.

> Most of the time we got reads with not the amount of data we must have
> and the driver failing, the other alternative was to use big
> time.sleep() here and there and we have then a long delayed reads and
> uploads with occasional resets of the radio.

Right, you can never depend on a read returning the whole thing you're
expecting for. That's why you may need a queue which always reads as
much as the radio has available, from which your code processes the
frames as soon as they are whole.

> And that non consistent delays (pauses between write data from radio to
> PC, mostly) are worst once we got different variants of the *same*
> radios to test (pre-production, alphas, 1G, 2G) not to mention other models.
> 
> The only way we found to manage that was the _rawrecv() procedure that
> you see now with precise time.sleep() in some places calculated from a
> timing profile for every radios to found the sweet spot for all of them.

You really should only need to slow yourself down for writes though.

>>> +def _do_magic(radio, status):
>>> +    """Try to put the radio in program mode and get the ident string
>>> +    it will make multiple tries"""
>> I think maybe renaming this at some point would be good.
> 
> From Aladdin story, this procedure manages the magic phrase to "open"
> the radios to programming mode.
> 
> Any suggestion?

Something like "start_clone_mode" or "initialize_radio" or similar would
be good.

> But by the way the _rawrec() works we must have the serial timeout to a
> minimum value, that's why have used a time.sleep() trick instead.
> 
> If we use direct serial reads from radio.pipe it can work with the
> serial timeout, but will require more testing and time.

I think that would be where we should head.

>> Yikes, five seconds? I guess it seems like we should be able to avoid
>> this with different model subclasses.
> 
> Believe it or not it don't work below 3 secs for ones and 5 secs for
> others, but at this point you don't know about who is who... so 5 secs.
> (this has a trick, keep reading)
>
> 
> Even so, there is a UV-5001 G2v2 (marked as G22 in the code, and yes,
> second revision of the second generation) that uses the magic from the
> Waccom Mini 8900 instead the former one for the other members of the
> UV-5001 designation.  So we have to test both magic words to open it,
> and the users is unaware of t
> he version of the radio.

Okay, but we could just separate them out in the model list. A user may
not know if they have a v1 or v2 radio, but if you call them out like
that, they can try both and then continue to use that one in the future.
Making people with radios that aren't first in the list wait many
multiples of 5 seconds is not reasonable, IMHO.

> It's like the radios has a serial buffer that timeout after 3/5 secs,
> when you send a wrong magic you have to wait for the serial buffer on
> the radio to timeout before sending the correct one, otherwise the radio
> will ignore the correct magic....

Yeah, I'm sure they never expected the radio to be used with something
like chirp. Really unfortunate, but I think we can (and must) do better
than this long wait.

> As a workaround we have arranged the order of the tries from the most
> used units first to the least used units last (read the comments at the
> end before the declaration of the specific models) to get a higher
> chance of get it on the first tries; just for the UV-5001 brand there is
> 5 radio variants in the market now.

Right, but that's really bad for the poor guy that has a radio late in
the list, and over time that list will be unmanageable and people will
be waiting around for minutes before they can start clone mode :)

> The radios has an uneven (not constant time) delays in the response of
> our queries starting to count from the end of the sent data, and even on
> the segments of the replies (ack, pause, header, pause, data...) even on
> the data stream the radio make pauses!
> 
> This has give us some big head aches, if you use serial timeouts then
> you get from time to time a shorter data block and the driver fails,
> that's no reliable, for one radio it _/could/_ work, but for 10 variants
> of the same radio structure the thing get nasty.
> 
> No matter how you tweak the serial timeouts or use time.sleep(), to much
> and the radio resets, to low and the radio give a short block, when you
> test different generations of the same radios and different models you
> get an very unstable driver, as all they have different time requirements.

Right, so the way you need to do this is construct a buffer in your
code, where you always fill the buffer up to a reasonable serial timeout
(like 100ms). From that buffer, you pull frames out when they're ready
(i.e. complete). That way, you insulate yourself from the timing
characteristics of the radio and your driver on top of this will be much
more resilient to changes in platform, USB drivers, etc.

> This way and with the few pre-calculated time.sleep() distributed on the
> _do_magic, _download & _upload (from statistics from real data for all
> the radio models/variants), we have a very stable driver for all models
> and with a single tune point for any needed time tweak; the constant in
> this line on the _rawrecv() procedure:
> 
> maxtime = amount * 0.020

Well, first thing this morning there was already a bug about timing issues:

  http://chirp.danplanet.com/issues/3507

> A resume after reading all this looking for typos: the _rawrecv()
> procedure implements it's own kind of "software serial timeout" that
> allow us to be correct and fast and works with all radio models from
> Btech and the Waccom Mini 8900; but for that to works the real serial
> timeout must be low enough to pass a single char.

That's just never going to produce reliable results though. Your control
of the timeout can be delayed by many seconds on a loaded system. The OS
is designed to provide you larger batches of data and expects you to do
process them in bulk wherever possible.

>>> +    # cleaning the serial buffer, try wrapped
>>> +    try:
>>> +        radio.pipe.flushInput()
>>> +    except:
>>> +        msg = "Error with a serial rx buffer flush at _do_ident"
>>> +        LOG.error(msg)
>> This should be a LOG.exception() right? Presumably this means something
>> very strange happened. Also, why do you think this is necessary?
> 
> The radio.pipe.flushInput() and the flushOutput() makes the test bed of
> Chirp fail unless it's try wrapped, note the end of the comment on the
> start of the block of code. "try wrapped".

Then we should make the tests mock that out, if needed. However:

> The radio can send garbage if you wait to long listening on the serial
> line, it's not the interface but the radio, I have serial logs to prove
> that it does it with the OEM software, from where I copied the trick of
> cleaning the serial buffer.

Sure, just do a read() until you get nothing back and then start. Not
only does flushInput() imply things about internal buffers, but it also
isn't supported in pyserial 3.0 and later. If you want to do this,
please don't wrap it so that it passes tests, but make the test stub
support it. However, I'd prefer you just read until you get nothing back.

See:

 https://github.com/pyserial/pyserial/blob/master/serial/serialposix.py#L565-L569

 http://linux.die.net/man/3/tcflush

which says:

  TCIFLUSH
    flushes data received but not read.

and

 https://msdn.microsoft.com/en-us/library/windows/desktop/aa363428(v=vs.85).aspx

Which says:

  Terminates all outstanding overlapped read operations and returns
immediately, even if the read operations have not been completed.
  Clears the output buffer (if the device driver has one).

The POSIX example is talking about the serial line discipline in the
kernel, and the win32 one appears to be talking about the actual buffer
on the device (if any). I don't think you should conflate these two, nor
do I think you need to. If you need to quiesce the line, I think you
should just read until you're confident it's quiet. However I'm *really*
suspecting that you're just not reading everything from the previous
block and interpreting it as garbage in front of the next one.

> If you don't make a periodic flushInput() before each block of reads you
> get from time to time garbage in the next read block, and that makes the
> driver fail.

That doesn't make any sense to me at all. I feel like something else
must be going on here, like you're not reading all the data from a
previous round. This is probably a big source of instability, actually.

If you're really getting random data, then it's got to be a bad cable or
something.

>>> +    # basic check for the ident
>>> +    if len(ident) != 49:
>>> +        msg = "Radio send a sort ident block, you need to increase maxtime."
>> This goes to the UI, right? What is maxtime? Does it refer to the value
>> in _rawrecv()?
> 
> Yes it's a debug message for the developer, not the user.
> 
> Remember, I code here in Cuba, send the code by email to Jim in USA, Jim
> test with the radios and fix what he can (help him from explicit debug
> messages like this), he then pass logs, code & comments to me bye email,
> I code... (loop)
> 
> This was very useful then, must be changed now, will be changed.

Okay, yeah, just want to get it cleaned up for the in-tree stuff. Users
will freak out when they seen this :)

> Good & sorry, I come from web development (ASP & PHP with auto learned
> skills) and now into Python, I must learn some tricks yet, thanks for
> guiding me with details like this. I will correct it.

Thanks :)

>>> +    @classmethod
>>> +    def get_prompts(cls):
>>> +        rp = chirp_common.RadioPrompts()
>>> +        rp.experimental = \
>>> +            ('This driver is experimental and for personal use only.\n'

>> You can't make this requirement of the user based on the license. Please
>> remove it.

> Right, will do it.

I already made the change so that today's build would not have it in it.

> No, this value is NOT mutable.
> 
> This radios has a REAL mem span of 0x4000 (you can safely read up to
> there, and write too) but in the read the OEM software get just up to
> 0x3200 (we get full span in chirp) and in the write just up to 0x3100.
> 
> We respect the write boundary and this driver will never write beyond
> 0x3100, see _upload() procedure comments

Okay, in previous models, there was a reserved region at the end which
appeared to be immutable, until we found an extra tool that let a vendor
customize what was in this last block...

> If the JetStream is a clone of this (or vice versa) get support for it
> is as fast as Jim can get it on it's hands to test it and we will work
> on that direction ASAP.

David Fannin has the radio already and will be looking into the changes
needed for this. I'm hoping he'll also be able to help with some of the
stability issues.

Thanks Pavel and Jim!

--Dan