The Coding Flow

Code Clean Or Die Tryin'

FT-450 Hamlib Timeout Glitch

I control my FT-450D using Hamlib. This worked fine as long as only a single program was connected to the rig. To connect multiple programs at the same time, I wanted to run the rigctld server as a service and connect the programs using model #2 (“NET rigctl”). This setup worked fine most of the time, but every now and then the connection to the FT-450D stopped working. Here is how I found out what was going on and how I fixed it.
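To make the “NET rigctl” idea a bit more concrete: every program talks to rigctld over TCP instead of opening the serial port itself. The following is only a minimal sketch of such a client, assuming rigctld listens on its default port 4532 on localhost and uses its plain (non-extended) text protocol, where the command f returns the current frequency in Hz on a single line. The function name query_frequency is made up for this illustration.

```python
import socket


def query_frequency(host="localhost", port=4532, timeout=5.0):
    """Ask a running rigctld for the current frequency in Hz.

    Assumes the plain rigctld text protocol: send "f\n" and read one
    line back. On failure rigctld answers with a line like "RPRT -5".
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"f\n")
        with sock.makefile("r", encoding="ascii") as reader:
            reply = reader.readline().strip()
    if reply.startswith("RPRT"):
        raise RuntimeError("rigctld reported an error: " + reply)
    # Parse via float first in case the value carries a fractional part.
    return int(float(reply))
```

With rigctld running, query_frequency() would return something like 14074000. WSJT-X, Fldigi and CQRLog do essentially the same thing internally when they are configured for model #2.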

Analysis

The problem only occurred when WSJT-X or Fldigi were running in parallel with CQRLog. To find out more about the root cause, I started rigctld with tracing and timestamps enabled:

rigctld -m 127 -Z -vvvvv

This showed that every now and then there was no answer from the FT-450 after switching off the PTT:

2019-05-13:18:57:16.688467: client lock engaged
2019-05-13:18:57:16.688555: rig_strvfo called
2019-05-13:18:57:16.688578: rigctl(d): T 'currVFO' '0' '' ''
2019-05-13:18:57:16.688614: rig_set_ptt called
2019-05-13:18:57:16.688637: newcat_valid_command called
2019-05-13:18:57:16.688656: newcat_valid_command TX
2019-05-13:18:57:16.688680: newcat_set_ptt: cmd_str = TX0;
2019-05-13:18:57:16.688701: serial_flush called
2019-05-13:18:57:16.688737: cmd_str = TX0;
2019-05-13:18:57:16.688758: write_block called
2019-05-13:18:57:16.693892: write_block(): TX 4 bytes
2019-05-13:18:57:16.693992: 0000    54 58 30 3b                                         TX0;
2019-05-13:18:57:16.694020: cmd_str = ID;
2019-05-13:18:57:16.694045: write_block called
2019-05-13:18:57:16.699202: write_block(): TX 3 bytes
2019-05-13:18:57:16.699302: 0000    49 44 3b                                            ID;
2019-05-13:18:57:16.699327: read_string called
2019-05-13:18:57:18.701045: read_string(): Timed out 2.1690 seconds after 0 chars
2019-05-13:18:57:18.701105: serial_flush called
2019-05-13:18:57:18.701136: cmd_str = TX0;
2019-05-13:18:57:18.701147: write_block called
2019-05-13:18:57:18.706296: write_block(): TX 4 bytes
2019-05-13:18:57:18.706385: 0000    54 58 30 3b                                         TX0;
2019-05-13:18:57:18.706398: cmd_str = ID;
2019-05-13:18:57:18.706410: write_block called
2019-05-13:18:57:18.711580: write_block(): TX 3 bytes
2019-05-13:18:57:18.711641: 0000    49 44 3b                                            ID;
2019-05-13:18:57:18.711656: read_string called
2019-05-13:18:57:18.711709: read_string(): RX 7 characters
2019-05-13:18:57:18.711724: 0000    49 44 30 32 34 34 3b                                ID0244;
2019-05-13:18:57:18.711734: newcat_set_cmd: read count = 7, ret_data = ID0244;
2019-05-13:18:57:18.711747: client lock disengaged

The default timeout in Hamlib is 2 seconds when reading from the serial line. If there is no answer within these 2 seconds, Hamlib sends the command again. As you can see in the trace, the second attempt is successful. Unfortunately, WSJT-X is a bit impatient and stops the communication with an error message after 2 seconds. You then have to restart the communication manually by clicking in the error dialog.
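Spotting such a stall by eye in a long trace is tedious. Here is a small sketch that scans a trace for suspicious gaps between consecutive lines. It assumes the timestamp layout shown in the trace above (date, time and microseconds followed by a colon and the message); the name find_stalls and the 1 second threshold are arbitrary choices for this example.

```python
import re
from datetime import datetime

# Matches lines like "2019-05-13:18:57:16.699327: read_string called".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}:\d{2}:\d{2}:\d{2}\.\d{6}): (.*)$")


def find_stalls(lines, threshold=1.0):
    """Return (gap_seconds, message_before, message_after) for every pair
    of consecutive trace lines that are at least `threshold` seconds apart."""
    stalls = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a timestamped trace line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d:%H:%M:%S.%f")
        message = match.group(2)
        if previous is not None:
            gap = (stamp - previous[0]).total_seconds()
            if gap >= threshold:
                stalls.append((gap, previous[1], message))
        previous = (stamp, message)
    return stalls
```

Fed with the trace above, this reports a single gap of roughly 2 seconds between "read_string called" and the "Timed out" message, which is exactly the spot where the rig did not answer.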

First Try: Let’s use the big gun

Being a software engineer, I’m a bit prone to over-engineering. So instead of trying to fix the problem within Hamlib itself, I decided to build a proxy that would give me full control over the communication between my programs and Hamlib. That sounded like a lot of fun with network programming and all that. It was a lot of fun indeed, except that it did not fix my problem at all.

Solution

After scratching the network-programming itch, I stepped back and did what a good engineer should do: look at all the facts, including the Hamlib documentation, and use my brain. It is possible to tweak the communication parameters in rigctld, for example the timeout for the serial line. And this was the solution: just set the timeout on the serial line (given in milliseconds) below the timeout of WSJT-X:

rigctld -m 127 -C timeout=500

Since Hamlib resends the command and the second attempt is always successful, WSJT-X does not notice the glitch and stays happy.
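The arithmetic behind this can be sketched in a few lines. This is a rough model, not a description of Hamlib internals: it assumes that each attempt costs about 11 ms for writing the commands (estimated from the trace above), that a failed attempt additionally costs the full serial timeout, and that the client gives up after 2 seconds as described for WSJT-X. The function names and the 11 ms figure are assumptions made for this illustration.

```python
def reply_delay(serial_timeout_s, failed_attempts=1, write_s=0.011):
    """Approximate time until the rig's answer arrives.

    Every failed attempt costs one write plus the full serial timeout;
    the final, successful attempt costs one more write.
    """
    return failed_attempts * (write_s + serial_timeout_s) + write_s


def client_stays_happy(serial_timeout_s, client_patience_s=2.0):
    """True if the answer arrives before the client gives up."""
    return reply_delay(serial_timeout_s) < client_patience_s


for timeout_s in (2.0, 0.5):
    delay = reply_delay(timeout_s)
    verdict = "ok" if client_stays_happy(timeout_s) else "client gives up"
    print(f"serial timeout {timeout_s:.1f} s -> reply after ~{delay:.3f} s ({verdict})")
```

With the default of 2 seconds the retried answer arrives just after the client has lost patience, which matches the trace. With 500 ms it arrives after about half a second, well inside the 2 second window.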

That was easy.