FT-450 Hamlib Timeout Glitch
I control my FT-450D using Hamlib. This worked fine as long as only a single program was connected to the rig. To connect multiple programs at the same time, I wanted to run the rigctld server as a service and connect the programs using rig model #2 (“NET rigctl”). This setup works fine, but every now and then the connection to the FT-450D stopped working. Here is how I found out what was going on and how I fixed it.
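To make the “NET rigctl” idea a bit more concrete, here is a minimal sketch of what a client does when it talks to rigctld over the network. It assumes rigctld is listening on its default TCP port 4532 on the same machine and that a single-line reply is expected, as for the f (get frequency) command. The function name rigctld_query is made up for this illustration and is not part of Hamlib.

```python
import socket


def rigctld_query(command, host="localhost", port=4532, timeout=3.0):
    """Send one rigctl command to a running rigctld and return the first reply line.

    Hypothetical helper for illustration only. Host and port are assumptions
    (4532 is the rigctld default), and only single-line replies are handled.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # rigctld expects newline-terminated text commands, e.g. "f".
        sock.sendall((command + "\n").encode("ascii"))
        buf = b""
        # Read until the first newline or until the server closes the connection.
        while b"\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return buf.decode("ascii").splitlines()[0] if buf else ""
```

With rigctld running, rigctld_query("f") should return the current frequency in Hz as a string. Every program configured for model #2 does essentially this, which is why several of them can share one rig.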
Analysis
The problem only occurred when WSJT-X or Fldigi were running in parallel with CQRLog. To find out more about the root cause, I started rigctld with tracing and timestamps enabled:
rigctld -m 127 -Z -vvvvv
This showed that every now and then there was no answer from the FT-450 after switching off the PTT:
2019-05-13:18:57:16.688467: client lock engaged
2019-05-13:18:57:16.688555: rig_strvfo called
2019-05-13:18:57:16.688578: rigctl(d): T 'currVFO' '0' '' ''
2019-05-13:18:57:16.688614: rig_set_ptt called
2019-05-13:18:57:16.688637: newcat_valid_command called
2019-05-13:18:57:16.688656: newcat_valid_command TX
2019-05-13:18:57:16.688680: newcat_set_ptt: cmd_str = TX0;
2019-05-13:18:57:16.688701: serial_flush called
2019-05-13:18:57:16.688737: cmd_str = TX0;
2019-05-13:18:57:16.688758: write_block called
2019-05-13:18:57:16.693892: write_block(): TX 4 bytes
2019-05-13:18:57:16.693992: 0000 54 58 30 3b TX0;
2019-05-13:18:57:16.694020: cmd_str = ID;
2019-05-13:18:57:16.694045: write_block called
2019-05-13:18:57:16.699202: write_block(): TX 3 bytes
2019-05-13:18:57:16.699302: 0000 49 44 3b ID;
2019-05-13:18:57:16.699327: read_string called
2019-05-13:18:57:18.701045: read_string(): Timed out 2.1690 seconds after 0 chars
2019-05-13:18:57:18.701105: serial_flush called
2019-05-13:18:57:18.701136: cmd_str = TX0;
2019-05-13:18:57:18.701147: write_block called
2019-05-13:18:57:18.706296: write_block(): TX 4 bytes
2019-05-13:18:57:18.706385: 0000 54 58 30 3b TX0;
2019-05-13:18:57:18.706398: cmd_str = ID;
2019-05-13:18:57:18.706410: write_block called
2019-05-13:18:57:18.711580: write_block(): TX 3 bytes
2019-05-13:18:57:18.711641: 0000 49 44 3b ID;
2019-05-13:18:57:18.711656: read_string called
2019-05-13:18:57:18.711709: read_string(): RX 7 characters
2019-05-13:18:57:18.711724: 0000 49 44 30 32 34 34 3b ID0244;
2019-05-13:18:57:18.711734: newcat_set_cmd: read count = 7, ret_data = ID0244;
2019-05-13:18:57:18.711747: client lock disengaged
The default timeout in Hamlib is 2 seconds when reading from the serial line. If there is no answer within those 2 seconds, Hamlib sends the command again. As the trace shows, the first read_string call times out at 18:57:18.701 and the second attempt succeeds right away. Unfortunately, WSJT-X is a bit impatient and stops the communication with an error message after 2 seconds. You then have to restart the communication manually by clicking in the error dialog.
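Spotting such a stall by eye in a long trace is tedious. The following is a small sketch of how the gaps could be found automatically. It assumes the trace was saved to a text file and that every line starts with the timestamp format shown above; the function name find_stalls and the 1-second threshold are my own choices for this illustration.

```python
import re
from datetime import datetime

# Matches lines like "2019-05-13:18:57:16.699327: read_string called".
TS_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}:\d{2}:\d{2}:\d{2}\.\d{6}): (.*)$")


def find_stalls(lines, threshold_s=1.0):
    """Return (gap_seconds, previous_message, message) for each large time gap.

    Hypothetical helper: it only compares the timestamps of consecutive
    trace lines and skips lines that do not start with a timestamp.
    """
    stalls = []
    prev_time = prev_msg = None
    for line in lines:
        match = TS_LINE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d:%H:%M:%S.%f")
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if gap >= threshold_s:
                stalls.append((gap, prev_msg, match.group(2)))
        prev_time, prev_msg = stamp, match.group(2)
    return stalls
```

Fed with the trace above, for example via find_stalls(open("trace.log")) where trace.log is a placeholder file name, this reports a single gap of about 2 seconds between "read_string called" and the timeout message.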
First Try: Let’s use the big gun
Being a software engineer, I’m a bit prone to over-engineering. So instead of trying to fix the problem with Hamlib itself, I decided to build a proxy that would give me full control over the communication between my programs and Hamlib. It sounded like a lot of fun, with network programming and all that. It was a lot of fun indeed, except that it did not fix my problem at all.
Solution
After scratching the network-programming itch, I stepped back and did what a good engineer should do: look at all the facts, including the Hamlib documentation, and use my brain. It is possible to tweak the communication parameters in rigctld, for example the timeout for the serial line. And this was the solution: just set the timeout on the serial line (given in milliseconds) below the timeout of WSJT-X:
rigctld -m 127 -C timeout=500
Since Hamlib resends the command and the second attempt has always been successful so far, WSJT-X does not notice the glitch and stays happy.
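A rough back-of-the-envelope check shows why 500 ms leaves plenty of room. The sketch below assumes that each attempt costs about 10 ms for writing the two commands (TX0; and ID;), which is what the trace above suggests, and that the client gives up after 2 seconds. Both numbers are approximations, and the function name is made up for this illustration.

```python
def worst_case_delay_s(timeout_ms, failed_attempts=1, write_overhead_ms=10.0):
    """Estimate the time until a reply arrives when the first attempts time out.

    Rough model only: every failed attempt costs the write overhead plus the
    full serial timeout, and the final successful attempt costs the overhead.
    """
    per_failed_ms = write_overhead_ms + timeout_ms
    return (failed_attempts * per_failed_ms + write_overhead_ms) / 1000.0


CLIENT_PATIENCE_S = 2.0  # assumed: how long WSJT-X waits before showing the error

for timeout_ms in (2000, 500):
    delay = worst_case_delay_s(timeout_ms)
    verdict = "ok" if delay < CLIENT_PATIENCE_S else "too slow"
    print(f"timeout={timeout_ms} ms -> reply after about {delay:.2f} s ({verdict})")
```

Under these assumptions, the default timeout can never recover in time, while timeout=500 answers after roughly half a second even when the first attempt is lost.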
That was easy.