I am in the process of redesigning our OVSD environment. I have two blades and a 100GB LUN (for the DB and attachments). I have an idea of what I plan on doing; I'm just curious what others would recommend as far as setup.
My question is this: if one were setting up multiple servers, how would they then set up the CMDB so that if one server died the OVSD service could still be used? OVSD doesn't support clustering (or so I was informed, and I'm NO DBA... :)
We had the pleasure of benefiting from the automatic load balancing/failover feature of OVSD today. This feature enhances availability and performance at the level of the application server. We have two physical servers, each with two instances of the application server. When one of the servers had to be rebooted (and failed), connection to the instances on the other server was completely transparent and automatic.
When you refer to the CMDB, I presume you mean the OVSD database. We have one Oracle instance which is run on a logical partition in a UNIX server. The physical database is itself on an HDS disk server. Should the logical partition of the database server fail, we have a cold standby partition on a different physical server which automatically takes over.
The question to which I do not have the answer is whether the application server(s) know how to automatically reconnect to the database if the database manager fails and is then restarted on another cluster node.
Multiple server instances are as simple as copying server_settings.xml and changing a few parameters, I understand (instance 1 takes care of userA, instance 2 serves userB, based on weight). Do you see much benefit in doing this? I have only tested it and have not had the opportunity to see what it runs like in prod... (pulling teeth to make necessary changes is no fun :)
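To picture the weight-based split described above, here's a rough sketch of how weighted assignment of users to instances could behave. The instance names, weights, and the round-robin-by-weight mechanism are all my own illustration, not OVSD's actual server_settings.xml parameters or algorithm:

```python
# Hypothetical sketch of weight-based user distribution across two
# application-server instances; OVSD's real logic is not documented here.
import itertools

def build_schedule(instances):
    """Expand {instance: weight} into a repeating assignment cycle."""
    slots = []
    for name, weight in sorted(instances.items()):
        slots.extend([name] * weight)  # an instance with weight 2 gets 2 slots
    return itertools.cycle(slots)

# Two instances weighted 2:1 (the weights are made up for illustration).
schedule = build_schedule({"instance1": 2, "instance2": 1})

assignments = {user: next(schedule) for user in ["userA", "userB", "userC"]}
print(assignments)
# → {'userA': 'instance1', 'userB': 'instance1', 'userC': 'instance2'}
```

The point of the sketch is just that a higher weight draws proportionally more users to that instance; the real split is handled by the server, not by anything you script.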
What config needs to be done on the client side so that if serverA fails the client hits serverB without users having to do anything?
CMDB = OVSD DB, yes. I'm fuzzy on how to get both servers to talk to the DB at the same time without causing problems. Or is it as easy as pointing serverA and serverB to the DB during setup?
As I understand it, the first time a client connects to an OpenView server it gets the details of all the available OVSD application servers. So if one application server is unavailable, the client connection should default to the next available one. If a new server has been introduced since the client first connected, I think the local "srv.dat" file needs to be updated/recreated with the new server details.
The local "user_settings.xml" has a value as follows
As long as the second value is set to false (default I think) the client should fail over to one of the other servers listed in the srv.dat file provided that one is available.
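The client-side behaviour described above boils down to "try each cached server in turn". Here's a small sketch of that idea; the server names, port, and reachability check are invented for illustration (the real logic lives inside the OVSD client, driven by srv.dat):

```python
# Sketch of client-side failover over a cached server list, as the
# client might read it from srv.dat. Names/port are illustrative only.
def connect_with_failover(servers, is_reachable):
    """Return the first reachable server, mimicking transparent failover."""
    for server in servers:
        if is_reachable(server):
            return server
    raise ConnectionError("no OVSD application server available")

cached_servers = ["serverA:30999", "serverB:30999"]  # hypothetical srv.dat content

# Simulate serverA being down: the client silently moves on to serverB.
up = {"serverB:30999"}
print(connect_with_failover(cached_servers, lambda s: s in up))
# → serverB:30999
```

Which is why the failover only works for servers that were already in the cached list when the client last refreshed it.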
We are not as advanced as Josiah (congrats on being promoted). We have a database that is backed up nightly, so if the production database fails we have to restore the previous night's backup. There is potential for data loss, but the business here accepts that.
The other option being set up at the moment is to have another database which replicates from the original. If the original fails, we change the settings on the OVSD app servers to point to the backup database. What is being tested is replicating the backup back to the production database and switching the settings again (this is still in development).
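The manual switchover described above can be sketched as follows. The hostnames are made up, and the "repoint" step stands in for whatever settings change (plus restart) the OVSD app servers actually need:

```python
# Sketch of a manual DB switchover: app servers normally point at the
# primary; on failure each one is repointed at the replica. Hostnames
# are hypothetical.
class AppServerConfig:
    def __init__(self, db_host):
        self.db_host = db_host

    def repoint(self, new_db_host):
        # In OVSD this would be a settings change followed by a restart.
        self.db_host = new_db_host

primary, replica = "db-prod", "db-replica"
app_servers = [AppServerConfig(primary), AppServerConfig(primary)]

# Primary fails: manually repoint every app server at the replica.
for app in app_servers:
    app.repoint(replica)

print([a.db_host for a in app_servers])
# → ['db-replica', 'db-replica']
```

The manual, per-server nature of the step is exactly what a cluster alias (mentioned further down in this thread) is designed to remove.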
As you know, there are two components to OVSD that need to be considered. The app server is easy, as it natively supports load balancing and seamless client failover, as discussed in the responses above.
The harder piece is making the DB highly available. If you want rapid failover, this is usually done by clustering.
My approach would be to cluster the two blades and set up the DB and FTP servers in one cluster package.
I would then run the SD app on each node and point the SD database connection to the cluster alias of the DB server.
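The benefit of pointing the app at a cluster alias rather than a physical node can be sketched like this. The alias name, port, and connection-string format are invented for illustration; in a real cluster the alias is a virtual IP/hostname that the cluster software moves between nodes:

```python
# The app's connection string names the cluster alias, never a physical
# node, so a node failover needs no application-side reconfiguration.
cluster = {"sd-db-alias": "blade1"}  # alias currently resolves to blade1

def connection_string(alias):
    # Illustrative Oracle-style JDBC URL; the real string depends on your DB.
    return f"jdbc:oracle:thin:@{alias}:1521/OVSD"

before = connection_string("sd-db-alias")

cluster["sd-db-alias"] = "blade2"  # blade1 fails; cluster moves the package

after = connection_string("sd-db-alias")
print(before == after)
# → True  (the app never had to change its settings)
```

Contrast this with the backup-database approach earlier in the thread, where every app server has to be repointed by hand.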
I am assuming you are using Windows, but this is just as applicable to UNIX.
We are actually using the twin App server and clustered Unix DB server approach as described by Josiah.
A lower-cost option would be to use log shipping or a backup/restore approach, but recovery would take significantly longer and would require manual steps to be carried out. This is how we plan to handle DR situations where we lose the whole data centre.
Both Oracle and SQL Server can be run on a cluster in an active/passive configuration. Failover should usually be achieved in 1-5 minutes depending on the size of your environment. Your availability requirements will determine which way you go.
We have the database on a Windows IA64 SQL cluster. Users did not notice the node failover the other day, and I could see a SQL connection reset in the server log. Unfortunately, rules stopped firing, but there is some evidence that they had stopped prior to the node failover, so we can't be sure. Other failovers have not resulted in any issues.