KNX IP Router Driver polling
I've got a server project that is using the KNX IP Router driver for lights, climate, blinds and invixium controls. We were seeing a problem where the driver seemed to be continually flooding the KNX bus with status requests, leading to severely delayed control.
I noticed the 'Delay between polls' property set to 100 - Changing this to 0 seemed to do the trick and disabled polling (the desired behaviour). This is giving a much more reliable experience, but every 30-60 minutes I see the severely delayed control again - and when I look at the iRidium logs, the KNX driver appears to be flooding the KNX bus with status requests again.
Does setting 'Delay between polls' to 0 disable polling? If not, how can I disable all polls except that at the server launch?
I've set the debug level on the KNX Driver to 'Debug', and noticed the following lines in the log before it looked like a poll started. I see this every 60 seconds.
[15-11-2019 09:15:29.455] DEBUG KNX KNXnetIP(4): S/N: 00c501016d7c
[15-11-2019 09:15:29.434] DEBUG KNX KNXnetIP(4): Connect OK
[15-11-2019 09:15:29.299] DEBUG KNX KNXnetIP(4): StartConnect()
[15-11-2019 09:15:29.277] ERROR KNX KNXnetIP(4): The KNXnet/IP server device could not find an active data connection with the given ID, Offline
I don't believe that the KNX connection cannot be made this frequently - what reasons could there be for this seeming connection timeout?
Running a continuous ping on the KNX Router, I don't see any lost packets.
Edit: The server was rebooted, and I haven't seen the KNX driver disconnect. So it looks like this issue is also related to the server running for an extended amount of time.
KNX is an event-driven driver. DelayBetweenPolls only sets the interval between read requests when the project starts or when the KNX driver connects (for example, if it was disabled during operation and then enabled again). If it is set to 0, the requests are sent without any delay. A delay is useful when data is being lost, which can happen if the project has a lot of addresses; a delay between polls at startup partially corrects this. The delay is defined in DelayBetweenPolls (in ms). The SendTime parameter sets the delay before a command is sent. If addresses are being polled constantly, you need to find out why. How many feedbacks are in the project? How often does the KNX driver disconnect from the hardware?
So to confirm, after the KNX driver is connected, DelayBetweenPolls is the delay until the poll starts? And this poll would only run once - correct?
The project has about 2800 feedbacks. When the server had been running overnight, the driver was disconnecting every minute (equal to the PingTime). After the server was restarted about half an hour ago, I haven't seen the driver disconnect yet.
What would you recommend as values for the delay and send time? Ideally we want as small delay as possible before sending a command.
1) Yes, this is the delay before polling the next address. All addresses in Feedback that have the "Read on start" option enabled are polled. We recommend a DelayBetweenPolls of at least 100 ms when there are more than 100 addresses, to reduce the load on the bus. When polling more than 1000 addresses, the optimal delay is 500 ms.
2) In your case, you need to find out why the driver is disconnecting. With each new connection, it polls all addresses again.
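As a rough check of what these values mean for this project, the sketch below estimates how long one full start-up poll takes. It assumes (not confirmed in this thread) that all of the roughly 2800 feedbacks have "Read on start" enabled, that the driver sends one read per address, and that bus response time is negligible. `estimatePollDurationMs` is a hypothetical helper, not part of the iRidium API.

```javascript
// Hypothetical helper: estimates the duration of one full start-up poll.
// Assumes one read request per address, spaced by DelayBetweenPolls,
// and ignores the time the bus needs to answer.
function estimatePollDurationMs(addressCount, delayBetweenPollsMs) {
  if (addressCount < 0 || delayBetweenPollsMs < 0) {
    throw new RangeError("arguments must be non-negative");
  }
  return addressCount * delayBetweenPollsMs;
}

// Values taken from this thread: ~2800 feedbacks, delays of 100 ms and 500 ms.
const pollAt100 = estimatePollDurationMs(2800, 100); // 280000 ms, about 4.7 minutes
const pollAt500 = estimatePollDurationMs(2800, 500); // 1400000 ms, about 23.3 minutes
```

Under these assumptions a full poll takes several minutes at either setting, so a driver that reconnects every 60 seconds would restart the poll before it could finish.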
Ok, I've set DelayBetweenPolls to 500ms, and SendTime to 200ms. I will leave the server running for a few hours and see if the driver disconnects at any point.
On finding out why it disconnected: before I restarted the server, this is the exact behaviour that was repeating:
1. KNX driver disconnects (The KNXnet/IP server device could not find an active data connection with the given ID, Offline)
2. KNX driver is able to connect and start immediately following this
3. Polling of the status feedbacks is started - I see the responses from this for anywhere between 10-40 seconds
4. No statuses are received for the remainder of the time, until 60s since the KNX driver connected.
5. KNX driver disconnects (and repeat the above steps).
The entire time this was happening I was able to ping the KNX device with no lost packets, and didn't lose connection in ETS.
Does the KNX driver run in a server project or a panel project? What is the iRidium server/client version? How many iRidium panels/servers are connected to KNX simultaneously? Are the iRidium client and server on the same subnet as the KNX IP Router?
The KNX driver is working on the server project.
Panel project is synchronised to the server, and the clients send commands to the KNX with
IR.GetDevice("iRidium Server").Set("KNX IP Router." + knx_send, value);
Editor version is 22.214.171.12400.
Only one server is running, so there is only one connection to KNX at a time.
In my testing there have only been 1 or 2 clients connected to the server at a time, but this site could have upwards of 5 clients connected simultaneously.
Yes - all on the same subnet.
Make sure that the server version is up to date (1.3.10 at the time of posting this message). You also need to collect statistics: how often the KNX driver on the server goes offline, what preceded it, etc.
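One way to gather those statistics is to measure the time between reconnects in the server log. The sketch below is an illustration only: it assumes log lines in the same format as the excerpt quoted earlier in this thread, and `parseLogLine` and `reconnectIntervalsSeconds` are hypothetical helpers, not iRidium functions.

```javascript
// Matches lines such as:
// [15-11-2019 09:15:29.434] DEBUG KNX KNXnetIP(4): Connect OK
const LOG_LINE = /^\[(\d{2})-(\d{2})-(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})\] (\w+) (.*)$/;

function parseLogLine(line) {
  const m = LOG_LINE.exec(line.trim());
  if (!m) return null; // not a log line in the expected format
  const [, dd, mo, yyyy, hh, mi, ss, ms, level, message] = m;
  // Treated as UTC only so that differences do not depend on the local timezone.
  const time = Date.UTC(+yyyy, +mo - 1, +dd, +hh, +mi, +ss, +ms);
  return { time, level, message };
}

// Returns the gaps (in seconds) between consecutive "Connect OK" entries.
function reconnectIntervalsSeconds(lines) {
  const times = lines
    .map(parseLogLine)
    .filter((entry) => entry !== null && entry.message.includes("Connect OK"))
    .map((entry) => entry.time)
    .sort((a, b) => a - b); // the quoted excerpt lists newest entries first
  const gaps = [];
  for (let i = 1; i < times.length; i++) {
    gaps.push((times[i] - times[i - 1]) / 1000);
  }
  return gaps;
}
```

A run of gaps close to 60 seconds would match the once-a-minute disconnects described above, while isolated large gaps would point to occasional drops.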
I downloaded the latest version earlier this week (iridium_pro_setup_126.96.36.1992), and installed it, but iRidium studio was still reporting version 1.3.8.
Running the 1.3.10 installer again, it's telling me that v1.3.10 is already installed?
At the moment, 2 versions of the Studio are available: the old (1.3.8) and the new. The old one is no longer updated and bugs in it are not fixed, but you can use it if it is convenient for you. Apparently you are installing a package that contains the current server and client versions, but the old Studio version. Also available is a package that includes a new Studio version (32 and 64 bit) and current client and server versions. You can see the server version in the web interface, in the upper left corner.
Ah I see what you mean - server build is 188.8.131.5280.
For now, I will be sticking with the old version of studio, as I understand that the structure of saved projects is changed completely, and I am using a few of my own scripts to parse the old structure, allowing multiple people to simultaneously work on the same project.
The server has been running for an hour, and control via KNX seems a lot more stable. In the server logs, the KNX failed to connect once around 20 minutes ago - so will keep an eye on it for today. Thanks for your help so far
We will leave your question open; let us know if the problems with the server come back.
The server was running perfectly for close to 24 hours, but from Saturday evening we were seeing the same problems where the KNX driver was regularly disconnecting, preventing control of any devices through KNX from iRidium.
I'm setting up a KNX address that changes value every 30 seconds, to which the iRidium server responds when noticing this change by setting unix time to a 2nd new address - essentially acting as a ping to detect when these problems start happening, but would appreciate any more advice you might have
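The logic of that ping can be sketched independently of the server. This is a minimal sketch under stated assumptions: `createHeartbeat` and `writeEcho` are hypothetical names, and `writeEcho` stands in for whatever call writes the Unix time to the second address.

```javascript
// Hypothetical heartbeat tracker. `writeEcho` is injected so the logic
// can be exercised without an iRidium server or a KNX bus.
function createHeartbeat(writeEcho, expectedPeriodSec, toleranceSec) {
  let lastSeenSec = null;
  return {
    // Call when the 30-second address changes value.
    onPing(nowSec) {
      lastSeenSec = nowSec;
      writeEcho(nowSec); // echo the Unix time to the second address
    },
    // Call periodically; true means the next ping is overdue.
    isStale(nowSec) {
      if (lastSeenSec === null) return false; // nothing received yet
      return nowSec - lastSeenSec > expectedPeriodSec + toleranceSec;
    },
  };
}
```

With a 30-second period and, say, a 10-second tolerance, `isStale` turns true once more than 40 seconds pass without a change, which marks roughly when the problem started.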
Is the Host parameter an IP address or a domain name?
Does KNX IP Router have a static IP address or a dynamic one?
KNX IP Router has an IP that was assigned via DHCP, but has been reserved via MAC.
Host is an IP address
There is no parameter in the KNX driver that could cause it to disconnect. If the device is unavailable for a few seconds, the iRidium driver goes offline. How many simultaneous connections does the KNX IP Router allow? How does the KNX IP Router handle broken connections?
We are using a Weinzierl KNX IP BAOS 777, which allows up to 8 simultaneous connections. I'm not sure how the KNX Router handles connection breaks.
Are you saying that if the device is unavailable for a few seconds, the driver automatically disconnects? If this is the case, why does it not attempt to re-connect until it tries to send its next ping?
The next time we see this issue (likely tomorrow morning), I will restart the KNX device instead of the iRidium server and see what happens.
The driver in iRidium goes offline when the connection is lost and then tries to connect again. However, once it reconnects successfully, the KNX address polling starts again. That is what you are seeing. Does anything else connect to the BAOS at this time?
Ok, thanks for the clarification. Nothing else connects to the BAOS at this time.
I tried running the server using the new iRidium 2019 server (without saving the server project through the new editor), and we are at almost a day without seeing this problem!
Will keep you updated, but fingers crossed the new server has a fix.
On arriving at the property today, the same problem was seen again: the KNX driver disconnecting and re-connecting every minute, with around 30s of polling going through correctly before we see no more activity from the KNX driver.
We will try restarting the BAOS later this morning without restarting the server, to see what effect that has.
BAOS was restarted, with no change to the issue we are seeing.
Just realised my colleague was running a server that was converted to using the new iRidium studio - and in the conversion process, the 'delayBetweenPolls' property was lost, and set to 0. I've now set this back to 500, but no change to the problem was seen.
Check the BAOS version and the firmware version on it.
From another device in the same subnet, start one continuous ping to the iRidium server and another continuous ping to the BAOS. When the error occurs, stop both pings and check whether any packets to the iRidium server or to the BAOS were lost. At the moment, we strongly suspect a hardware problem (cable, port on the switch, etc.).
We're gathering some network logs using traceviewer. Will get back to you tomorrow.
Apologies for the delay
To clarify, what is happening is that after the server has been running for a couple of days, it gets into a state where it disconnects and makes a new tunnel connection every minute - obviously in this state the system is completely unusable
Here's what the Server console reports
And here's a Wireshark capture
Use this filter
PC Server IP. 192.168.14.25
KNX IP interface 192.168.14.111
Line 14627 - new tunnel connection setup
data requests are then OK until line 16016
here the Iridium server writes to 10/6/0, and the bus is slow to respond, so it retries
but the IP interface sends a disconnect request (line 16224)
followed by ACKs for the writes
The server then carries on sending commands, but gets errors, until line 27442, when it checks the connection state and opens a new tunnel
This sequence then repeats (different group address that is retried each time)
So it seems it's the IP interface that requests the disconnection - but I don't understand why
and the server then doesn't honour this and tries to carry on (with multiple failures) for approx 30 seconds, until it opens a new tunnel
Thank you for the detailed analysis. As far as we know, iRidium does not process the BAOS disconnect request in any way. We will clarify this issue separately and let you know if we are wrong in understanding the driver. However, even handling the BAOS disconnect request by the driver will not solve your problem. Because after disconnecting, the connection will be reconnected again and this will again poll all addresses. In your case, you need to check the network status: connect BAOS directly to the iRidium server, replace the cable, exclude other packets from the traffic. Another solution is to use multiple iRidium servers, each of which will have disjoint KNX address groups. Then disconnecting and reconnecting one of them will not cause the entire bus to fail, only those addresses that are in the server project will be polled.
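As an illustration of the multi-server suggestion, the sketch below splits three-level group addresses (such as the 10/6/0 mentioned above) between servers by main group, so that no address ends up on two servers. `partitionByMainGroup` is a hypothetical helper, and assigning by main group modulo the server count is an arbitrary choice made only for this example.

```javascript
// Splits three-level group addresses ("main/middle/sub", e.g. "10/6/0")
// between servers by main group, so no address lands on two servers.
function partitionByMainGroup(addresses, serverCount) {
  const servers = Array.from({ length: serverCount }, () => []);
  for (const address of addresses) {
    const parts = address.split("/").map(Number);
    if (parts.length !== 3 || parts.some((p) => !Number.isInteger(p) || p < 0)) {
      throw new Error("not a three-level group address: " + address);
    }
    // Every address of one main group goes to the same server.
    servers[parts[0] % serverCount].push(address);
  }
  return servers;
}
```

In practice the split would more likely follow the building's functions or areas, but the same property holds: each address belongs to exactly one server project, so a reconnect on one server only re-polls that server's addresses.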
Looking at this in more detail, I believe the following is happening....
Line 901 - KNX bus does a write, which should be acknowledged by tunnel connection
Line 1023 - 1 second later KNX bus repeats this (as unacknowledged by tunnel)
Line 1102 - a further 1 second later (2 seconds after original write) the Tunnel connection issues a Disconnection request - since the internal write has been unacknowledged
Lines 6590 / 6591 - a further 200ms later the iridium server responds with ACK for both the transmissions
The Iridium server then doesn't respond to the tunnel disconnection request for 30 seconds
So there are 2 problems
1. The Iridium Server is slow to respond sometimes, resulting in tunnel disconnection.
** NOTE** the problem isn't with the KNX bus - it's with the Iridium KNX driver
The fault occurs when the Tunnel connection reports an internal write on the bus, and the IP device is waiting for an ACK from the Iridium server - which is delayed
This problem repeats every minute (when there is an internal write on the bus)
If I restart the Iridium Server, then the behaviour goes away for 1-2 days and then re-occurs - so it's clearly a fault in the Iridium KNX driver
2. The Iridium Server doesn't honour the KNX Disconnection request - it should handle this and open a new tunnel connection if it fully implements the KNX/IP protocol
Extract from the KNX/IP Tunnelling specification
The BAOS IP device is behaving exactly as expected
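To make the two expected reactions concrete, here is a minimal sketch of what a tunnelling client owes the IP interface for each incoming frame. It is not the iRidium driver's code. The service type values and frame layouts are those of the KNXnet/IP Core and Tunnelling specifications; `replyFor`, `buildHeader` and `parseHeader` are hypothetical helpers, and frames are plain byte arrays.

```javascript
// KNXnet/IP service type identifiers.
const DISCONNECT_REQUEST = 0x0209;
const DISCONNECT_RESPONSE = 0x020a;
const TUNNELLING_REQUEST = 0x0420;
const TUNNELLING_ACK = 0x0421;

// 6-byte header: length 0x06, version 0x10, service type, total length.
function buildHeader(serviceType, totalLength) {
  return [0x06, 0x10, serviceType >> 8, serviceType & 0xff,
          totalLength >> 8, totalLength & 0xff];
}

function parseHeader(frame) {
  if (frame.length < 6 || frame[0] !== 0x06 || frame[1] !== 0x10) return null;
  return { serviceType: (frame[2] << 8) | frame[3] };
}

// Returns the reply to send and whether the tunnel must be treated as closed.
function replyFor(frame) {
  const none = { reply: null, tunnelClosed: false };
  const header = parseHeader(frame);
  if (header === null) return none;
  if (header.serviceType === TUNNELLING_REQUEST && frame.length >= 10) {
    // Connection header: structure length, channel ID, sequence counter, reserved.
    const channelId = frame[7];
    const sequence = frame[8];
    // ACK echoes channel and sequence; the last byte is status 0x00 (no error).
    const ack = buildHeader(TUNNELLING_ACK, 10).concat([0x04, channelId, sequence, 0x00]);
    return { reply: ack, tunnelClosed: false };
  }
  if (header.serviceType === DISCONNECT_REQUEST && frame.length >= 8) {
    const channelId = frame[6];
    const response = buildHeader(DISCONNECT_RESPONSE, 8).concat([channelId, 0x00]);
    return { reply: response, tunnelClosed: true }; // caller should open a new tunnel
  }
  return none;
}
```

The timing in the capture (a repeat after 1 second, then a disconnect request after a further second) is consistent with the 1-second tunnelling request timeout and single repetition described in the Tunnelling specification.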
Drivers do not have priority over other server operations. Are there currently any scripts and schemas in the project? If the cause of the error is in the driver, a project without scripts and schemas will show the same error after the same amount of time.
On the disconnect request: how do you think the driver should handle it? Should it disconnect? Then all work with the KNX bus stops. Should it connect again after disconnecting? After how long? Reconnecting will poll all addresses again, i.e. your problem will not be solved.
When the fault happens, it then repeats every minute or so, and is always triggered by an internal write on the KNX bus that the Iridium driver takes too long to acknowledge
When the server is in this state, then it doesn't recover - we have to restart the server
That is the ONLY thing that we do - restart the Iridium server
Everything else remains untouched
So it does point to the fact that something has gone wrong in the server that is causing this issue
I don't believe there is some other PC service that is causing this issue, as it is resolved immediately on restarting the iridium server - it seems it is a state the Iridium server gets into after running for a while
If the KNX IP device sends a request disconnect, then the Iridium server MUST handle it as the Tunnel is no longer valid
Currently you carry on trying (for 40-50 seconds) during which time all reads / writes fail
In this case, you can reconnect immediately
Polling all devices doesn't cause the problem
It's not that a device isn't responding
The problem is that the Iridium server is slow in sending an ACK to the KNX bus when there is a write transaction between 2 devices on the bus
Are there any advanced diagnostics that can be retrieved from your driver to help diagnose what is going wrong ?
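Independently of the driver, the ACK delay can be measured from the capture itself. The sketch below assumes the capture has already been decoded into simple objects (the fields `timeMs`, `fromServer`, `serviceType`, `channelId` and `sequence` are hypothetical names). It pairs each tunnelling request from the IP interface with the server's ACK for the same channel and sequence counter.

```javascript
const SVC_TUNNELLING_REQUEST = 0x0420;
const SVC_TUNNELLING_ACK = 0x0421;

// `frames` is a hypothetical pre-decoded capture, ordered by time.
function ackLatencies(frames) {
  const pending = new Map(); // "channel:sequence" -> time of first request
  const results = [];
  for (const f of frames) {
    const key = f.channelId + ":" + f.sequence;
    if (f.serviceType === SVC_TUNNELLING_REQUEST && !f.fromServer) {
      // Keep the first transmission so a repeat does not reset the clock.
      if (!pending.has(key)) pending.set(key, f.timeMs);
    } else if (f.serviceType === SVC_TUNNELLING_ACK && f.fromServer && pending.has(key)) {
      results.push({ key, latencyMs: f.timeMs - pending.get(key) });
      pending.delete(key);
    }
  }
  return results;
}

// ACKs slower than the limit (default: the 1 s tunnelling timeout) are suspicious.
function slowAcks(frames, limitMs = 1000) {
  return ackLatencies(frames).filter((r) => r.latencyMs > limitMs);
}
```

Plotting these latencies over the days between restarts would show whether the server degrades gradually or switches abruptly into the slow state.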
Just a thought, is it possible to run one instance of KNX IP interface on a different machine (maybe one of your embedded servers), but run our main server on the PC
That way we would isolate the real-time aspects of the KNX Tunnel protocol from other aspects of the iridium server
We will examine whether the driver can handle a device disconnect request. It makes sense for you to find out the reason for the performance decline. To do this, upload the project to the server without diagrams and scripts and check whether the error will be repeated.
We've run the server without a script and it fails in the same way
here's screenshot of Wireshark showing the disconnect
we are also seeing a problem where, when the driver goes offline, it sometimes stays offline until the server is restarted
Can you attach a link to the documentation for your KNX device?
The archive contains a test version of the server for Windows with DisconnectReq processing added. Check that this version works with your KNX device. This build is for testing only, not for production use.
Any news with the new build?
We have now split the server in two; one controlling just the KNX, the other controlling everything else.
With the KNX server using the version you sent me above, we haven't seen this issue in quite a while.
Do you need time for tests?
We have been testing this for the last few weeks and haven't seen the issue arise again
Do we need to continue investigating this issue, or can we consider it resolved for your case?
I would consider it resolved for our case. Thanks for your help
Happy to help.