I have a binding component that I would like to operate in an active/passive manner. The binding component actively creates its own connection to another system, so having more than one instance running is not desirable. When deployed into a PEtALS cluster, I would like only one node at any one time to actually have the component started and running. If the node running the component stops, another node in the cluster should then start the component.
When you say the connection is broken, do you mean the BC component is down? That the container hosting the component is down?
Different approaches are possible depending on where the detection of the connection failure is done...
If you want to handle a server crash, my idea is to create an agent (a script) that pings the PEtALS nodes.
This agent has an algorithm that elects one of the pinged nodes as the host of your component, then triggers the installation with the proper Ant tasks.
When the ping fails, the agent re-elects a new node and installs the component there.
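A minimal sketch of the agent's election step, assuming the simplest possible rule: walk the nodes in a fixed preference order and pick the first one whose last ping succeeded. The node names and the `electActiveNode` helper are hypothetical; in a real agent the boolean ping results would come from an actual liveness check (e.g. a TCP connect to each node), not a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ElectionAgent {

    // Nodes in preference order; true = last ping succeeded.
    // Returns the first reachable node, or null if none answers.
    static String electActiveNode(Map<String, Boolean> pingResults) {
        for (Map.Entry<String, Boolean> e : pingResults.entrySet()) {
            if (e.getValue()) {
                return e.getKey(); // first reachable node wins the election
            }
        }
        return null; // no node reachable
    }

    public static void main(String[] args) {
        Map<String, Boolean> pings = new LinkedHashMap<>();
        pings.put("petals-node-1", false); // node 1 stopped answering
        pings.put("petals-node-2", true);
        pings.put("petals-node-3", true);
        // node 2 would be elected; the agent would then run the Ant
        // tasks to install and start the BC component on that node
        System.out.println(electActiveNode(pings));
    }
}
```

Because all the agents' decisions flow from one deterministic rule, re-running the election after any ping change always yields a single active node.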
The PEtALS domain is configured with a distributed JNDI.
The first time the BC component is started, it creates an identifier, sets it in the JNDI context, and persists it in its installation directory.
If the container that crashed is restarted, the BC component is recovered, and it checks whether its identifier is in the JNDI context. If the identifier there is not its own but another one, it means that another component has been started in another container, so it does not open the connection.
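The recovery check above boils down to one decision: may this instance open the external connection? A hedged sketch of that decision, with the identifiers as plain strings for illustration — in the real component the own id would be read from the installation directory and the other from a lookup in the distributed JNDI context:

```java
public class OwnershipCheck {

    // Returns true when this instance may open the external connection.
    static boolean mayStartConnection(String ownId, String idInJndi) {
        // No id registered yet: we are the first instance, claim ownership.
        if (idInJndi == null) {
            return true;
        }
        // The id in JNDI is ours: we are recovering after a crash/restart.
        // A different id: a replacement was elected elsewhere, stay passive.
        return idInJndi.equals(ownId);
    }

    public static void main(String[] args) {
        System.out.println(mayStartConnection("bc-42", null));    // first start
        System.out.println(mayStartConnection("bc-42", "bc-42")); // recovery
        System.out.println(mayStartConnection("bc-42", "bc-99")); // superseded
    }
}
```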
The agent still pings the nodes and will elect the restarted node as a "pinged node" again. It knows that a BC component was previously installed on that node but is no longer active, so it uninstalls it with the proper Ant tasks.
The problem with this approach is that if there is a network problem between the agent and the remote PEtALS nodes, the agent can assume a PEtALS node has crashed even though the node is still running.
The splitting of nodes from a cluster (split-brain) is always a difficult issue to solve. In my scenario I would have to protect against duplicate messages coming in from the binding components within my service chain.
I like your idea of using an external monitor, but I'm trying to avoid a single point of failure. It would seem to just move the problem into needing an active/passive monitor.
I had started to look at using JGroups, with the "coordinator" node determining which binding component should be the active one within the cluster. Then I thought this could be something the kernel might support, hence my post.
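A sketch of how the JGroups idea could decide activity locally on each node: in a JGroups view the first member is the coordinator, and every member sees the same ordered view, so "am I first in the view?" is a consistent election rule with no external monitor. The `List<String>` below stands in for the member addresses JGroups delivers in `View.getMembers()` (real code would compare `Address` objects received in the `viewAccepted` callback); the node names are examples.

```java
import java.util.List;

public class CoordinatorCheck {

    // True when this node is the JGroups coordinator (first in the view)
    // and should therefore run the active BC component.
    static boolean shouldRunComponent(List<String> viewMembers, String myAddress) {
        // Every member receives the same ordered view, so this decision
        // is identical on all nodes without any coordination traffic.
        return !viewMembers.isEmpty() && viewMembers.get(0).equals(myAddress);
    }

    public static void main(String[] args) {
        List<String> view = List.of("node-a", "node-b", "node-c");
        System.out.println(shouldRunComponent(view, "node-a")); // start the BC
        System.out.println(shouldRunComponent(view, "node-b")); // stay passive
    }
}
```

When node-a leaves the cluster, JGroups installs a new view in which node-b is first, and node-b's `viewAccepted` callback would then start the component — exactly the failover behaviour described in the original question.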