
That makes sense - Thank you

I would consider it resolved for our case. Thanks for your help

We have been testing this for the last few weeks and haven't seen the issue arise again

Hi Vladimir,

We have now split the server in two; one controlling just the KNX, the other controlling everything else.

With the KNX server using the version you sent me above, we haven't seen this issue in quite a while.

We are also seeing a problem where, when the driver goes offline, it sometimes stays offline until the server is restarted

Hi

We've run the server without a script and it fails in the same way 

Here's a screenshot from Wireshark showing the disconnect

Just a thought: is it possible to run one instance of the KNX IP interface on a different machine (maybe one of your embedded servers), but run our main server on the PC?

That way we would isolate the real-time aspects of the KNX Tunnel protocol from other aspects of the Iridium server

When the fault happens, it then repeats every minute or so, and is always triggered by an internal write on the KNX bus that the Iridium driver takes too long to acknowledge

When the server is in this state, then it doesn't recover - we have to restart the server

That is the ONLY thing that we do - restart the Iridium server

Everything else remains untouched

So it does point to the fact that something has gone wrong in the server that is causing this issue


I don't believe there is some other PC service causing this issue, as it is resolved immediately on restarting the Iridium server - it seems to be a state the Iridium server gets into after running for a while

If the KNX IP device sends a disconnect request, then the Iridium server MUST handle it, as the Tunnel is no longer valid

Currently you carry on trying for 40-50 seconds, during which time all reads / writes fail


In this case, you can reconnect immediately 
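
To illustrate what I mean by handling the disconnect, here is a minimal Python sketch. It assumes the KNXnet/IP frame layout as I read it in the spec (6-byte header, service types 0x0209 for DISCONNECT_REQUEST and 0x020A for DISCONNECT_RESPONSE). The function names are hypothetical and this is not the Iridium driver's code, just the behaviour I would expect: answer the request and treat the tunnel as dead so a new one can be opened straight away.

```python
import struct

# KNXnet/IP header: header length, protocol version, service type, total length
KNXNETIP_HEADER = struct.Struct(">BBHH")
DISCONNECT_REQUEST = 0x0209
DISCONNECT_RESPONSE = 0x020A
E_NO_ERROR = 0x00


def parse_header(frame: bytes):
    """Return (service_type, body) for a KNXnet/IP frame, or raise ValueError."""
    if len(frame) < KNXNETIP_HEADER.size:
        raise ValueError("frame shorter than KNXnet/IP header")
    header_len, version, service, total_len = KNXNETIP_HEADER.unpack_from(frame)
    if header_len != 0x06 or version != 0x10 or total_len != len(frame):
        raise ValueError("malformed KNXnet/IP header")
    return service, frame[header_len:]


def build_disconnect_response(channel_id: int) -> bytes:
    """Build a DISCONNECT_RESPONSE (channel id + status) for the given channel."""
    body = bytes([channel_id, E_NO_ERROR])
    return KNXNETIP_HEADER.pack(0x06, 0x10, DISCONNECT_RESPONSE, 6 + len(body)) + body


def handle_frame(frame: bytes, channel_id: int):
    """Hypothetical handler: returns (reply_or_None, tunnel_still_alive)."""
    service, body = parse_header(frame)
    if service == DISCONNECT_REQUEST and body and body[0] == channel_id:
        # The tunnel is no longer valid: reply, then open a new connection
        # immediately instead of retrying on the dead channel.
        return build_disconnect_response(channel_id), False
    return None, True
```

When `handle_frame` returns `False` as its second value, the caller would send the reply and start a fresh CONNECT_REQUEST immediately, rather than retrying reads and writes on the old channel.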

Polling all devices doesn't cause the problem

It's not that a device isn't responding

The problem is that the Iridium server is slow in sending an ACK to the KNX bus when there is a write transaction between 2 devices on the bus
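
For reference, the ACK in question is tiny. Below is a rough Python sketch of building a TUNNELLING_ACK from an incoming TUNNELLING_REQUEST, assuming the frame layout as I understand it from the KNXnet/IP Tunnelling spec (service types 0x0420 / 0x0421, 4-byte connection header carrying channel id and sequence counter). `build_tunnelling_ack` is a made-up name for illustration, not anything from your driver.

```python
import struct

TUNNELLING_REQUEST = 0x0420
TUNNELLING_ACK = 0x0421
E_NO_ERROR = 0x00


def build_tunnelling_ack(request: bytes) -> bytes:
    """Build the TUNNELLING_ACK that answers a TUNNELLING_REQUEST frame."""
    # KNXnet/IP header: header length, version, service type, total length
    header_len, version, service, _total_len = struct.unpack_from(">BBHH", request)
    if (header_len, version, service) != (0x06, 0x10, TUNNELLING_REQUEST):
        raise ValueError("not a TUNNELLING_REQUEST")
    # Connection header: structure length, channel id, sequence counter, reserved
    conn_len, channel_id, sequence, _reserved = struct.unpack_from(">BBBB", request, 6)
    if conn_len != 0x04:
        raise ValueError("unexpected connection header length")
    # The ACK echoes channel id and sequence counter; total length is 10 bytes
    return struct.pack(
        ">BBHHBBBB",
        0x06, 0x10, TUNNELLING_ACK, 10,
        0x04, channel_id, sequence, E_NO_ERROR,
    )
```

Nothing in the ACK depends on decoding the cEMI payload, so in principle it could be sent as soon as the connection header has been read, before any further processing of the telegram.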



Are there any advanced diagnostics that can be retrieved from your driver to help diagnose what is going wrong?


Extract from the KNX/IP Tunnelling specification

The BAOS IP device is behaving exactly as expected

Looking at this in more detail, I believe the following is happening:

Line 901 - KNX bus does a write, which should be acknowledged by the tunnel connection

Line 1023 - 1 second later, the KNX bus repeats this (as it was unacknowledged by the tunnel)

Line 1102 - a further 1 second later (2 seconds after the original write), the Tunnel connection issues a Disconnection request, since the internal write has been unacknowledged

Lines 6590 / 6591 - a further 200ms later, the Iridium server responds with an ACK for both transmissions

The Iridium server then doesn't respond to the tunnel disconnection request for 30 seconds
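
To put rough numbers on that sequence, here is a small Python sketch. The timestamps are relative offsets taken from my description above (not exact capture times), the event names are my own labels, and I'm assuming the 1 second tunnelling request timeout from the KNXnet/IP spec and that the 30 seconds is measured from the disconnection request.

```python
TUNNELLING_REQUEST_TIMEOUT = 1.0  # seconds, per my reading of the KNXnet/IP spec

# (seconds since the original write, event label); approximate values
events = [
    (0.0, "request"),              # Line 901: write reported over the tunnel
    (1.0, "request_repeat"),       # Line 1023: repeated, no ACK seen
    (2.0, "disconnect_request"),   # Line 1102: IP device gives up on the tunnel
    (2.2, "ack"),                  # Line 6590: late ACK from the Iridium server
    (2.2, "ack"),                  # Line 6591: late ACK for the repeat
    (32.0, "disconnect_handled"),  # roughly 30 seconds after the disconnect request
]


def first_time(events, name):
    """Timestamp of the first event with the given label."""
    return next(t for t, label in events if label == name)


def summarize(events):
    """Compute ACK latency and disconnect-handling latency from the event list."""
    ack_latency = first_time(events, "ack") - first_time(events, "request")
    disconnect_latency = (
        first_time(events, "disconnect_handled")
        - first_time(events, "disconnect_request")
    )
    return {
        "ack_latency": ack_latency,
        "ack_late": ack_latency > TUNNELLING_REQUEST_TIMEOUT,
        "disconnect_latency": disconnect_latency,
    }


summary = summarize(events)
```

With these approximate figures the first ACK arrives about 2.2 seconds after the write, well past the 1 second window, and the disconnection request goes unanswered for around 30 seconds.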

So there are 2 problems

1. The Iridium Server is slow to respond sometimes, resulting in tunnel disconnection. 

**NOTE** the problem isn't with the KNX bus - it's with the Iridium KNX driver

The fault occurs when the Tunnel connection reports an internal write on the bus, and the IP device is waiting for an ACK from the Iridium server - which is delayed

This problem repeats every minute (when there is an internal write on the bus)

If I restart the Iridium Server, then the behaviour goes away for 1-2 days and then re-occurs - so it's clearly a fault in the Iridium KNX driver

2. The Iridium Server doesn't honour the KNX Disconnection request - it should handle this and open a new tunnel connection if it fully implements the KNX/IP protocol