Standby Servers Deliver (5/6/96)
by David Strom

Few things are more troublesome to a network administrator than a server crash: servers lie at the center of our network universes, and small changes in their operation can have a large effect on many end users. So improvements in reliability and "up time" can make a big difference in how a network -- and its administrative staff -- are perceived by users and upper management.

Two products introduced in the past year can help improve reliability: Standby Server 32 from Vinca Corp. and Lantegrity from Network Integrity. Vinca currently has versions for both OS/2 and NetWare, with an NT version promised by the end of this month; Lantegrity is available only for NetWare right now. To put them through their paces, we tested both products on the G.Neil Companies' network in Sunrise, Florida. G.Neil is a human resources direct marketing vendor, selling a wide range of products from office forms to motivational posters, plaques and video tapes. The company has seen fairly steady growth in both employees and revenues, and its IS shop is stretched thin dealing with a number of issues, including delivering reliable server performance.

The company has an Ethernet network with a wide variety of servers on it, including a small FT250 Netframe and several other servers running NetWare, a Unisys mainframe and an IBM AS/400. The company runs the usual suite of office applications including Lotus cc:Mail and PowerBuilder, and has a mixed network of 200 Windows 3.1 PCs and Macintoshes.

G.Neil bought its Netframe on the promise of better reliability, but found that it still wasn't running perfectly. "A server abend would knock us out for an entire day sometimes," said Chip DiComo, microsystems manager at the company. Abends can have a variety of causes: badly behaved NetWare loadable modules (NLMs), power problems and heavy network traffic are all possible culprits. They are the bane of most NetWare managers' existence: while the path to server recovery is well understood, it can be time-consuming, especially if you have to restore or repair large volumes of data. G.Neil's Netframe holds 8 gigabytes of data and, at best, takes a few hours to bring back on line.

This is the downtime DiComo was looking to avoid. The two products approach the problem of improving reliability from very different directions. Vinca takes the concept of Novell's disk mirroring and extends it to a complete redundant, or mirrored, server. This standby server is connected both to the enterprise network and, via a separate communications link, to its twin, and comes on-line only if the primary server fails. "All that Vinca is doing is just mirroring disks, it just so happens that they are in another box," said DiComo.

Network Integrity takes another approach, using a single spare NetWare 4.1 server to protect multiple servers. This spare shouldn't be used for anything else, since the Lantegrity software takes complete control of it and runs it at close to 100% utilization. NLM agents are loaded on each protected server; when one of these servers goes down, the spare 4.1 server sends out broadcasts mimicking the failed server so that users still think their server is on-line. With both products, once the protected server comes back on line, you switch back to it manually.

Before you dive into the swamp of redundant file servers, be clear about what problem you are trying to solve. Are you dissatisfied with the reliability of your NetWare server itself? Is the problem the hardware or the software configuration of the server? Do you run mission- and time-critical applications that require constant availability of your file servers? Does it take you too long to recover from file server abends because of the amount of data storage? If so, one of these products might help. But you'll need to spend a great deal of time testing them, and then training your staff on the proper procedures for using their advanced features.

DiComo wanted to see whether both products delivered on their promises, so we set them up in a test lab with several test servers, connected alternately to their own network and to the enterprise backbone. For StandbyServer, we used two HP machines, a VE Pentium 75 and an XU Pentium 120, each with a 1-gigabyte hard disk. For Lantegrity, we used an HP Netserver LC with two gigabytes of disk running NetWare 4.1, along with an HP SureStore 12000 tape autoloader. The autoloader is required because Lantegrity stores the protected files on tape. The server we protected under Lantegrity was another NetWare machine, an HP Pentium with a small 168-megabyte disk. We also had several Windows 3.1 and 95 workstations: Lantegrity requires one for various administrative tasks, while StandbyServer is operated entirely from the server's own console.

The Lantegrity server needs 1 gigabyte more disk and 16 megabytes more RAM than the largest server you are protecting, to handle its caching requirements. Do the math, and protecting G.Neil's Netframe would require a machine with 9 gigabytes of disk and 144 megabytes of RAM, which comes to about $15,000 worth of hardware, according to DiComo. "It still is a lot cheaper than buying an identical Netframe -- although they don't sell that model anymore and we would have to shop around for a used one for something like $40,000. Indeed, just buying Novell's own System Fault Tolerance Level 3 software would be $18,000 for the software alone."
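The sizing rule is simple arithmetic, and a few lines of Python make it explicit. This is a minimal sketch, not a Network Integrity tool: the function name is hypothetical, it assumes "largest server" means the largest disk and the largest RAM taken separately (the review doesn't say), and it uses 128 megabytes for the Netframe's RAM, a figure inferred from the 144-megabyte total rather than stated.

```python
# Hypothetical sizing helper for the rule of thumb quoted in this review:
# the Lantegrity server needs 1 GB more disk and 16 MB more RAM than the
# largest server it protects. Illustrative only; not a vendor utility.

EXTRA_DISK_GB = 1
EXTRA_RAM_MB = 16


def lantegrity_sizing(servers):
    """Return (disk_gb, ram_mb) for a Lantegrity server protecting `servers`.

    `servers` is a list of (disk_gb, ram_mb) tuples. Assumption: "largest"
    is applied to disk and RAM independently.
    """
    largest_disk = max(disk for disk, _ in servers)
    largest_ram = max(ram for _, ram in servers)
    return largest_disk + EXTRA_DISK_GB, largest_ram + EXTRA_RAM_MB


if __name__ == "__main__":
    # G.Neil's Netframe: 8 GB of data; 128 MB of RAM is inferred, not stated.
    print(lantegrity_sizing([(8, 128)]))  # (9, 144)
```

Because one Lantegrity server covers several machines, only the largest protected server drives the size; adding smaller servers to the list doesn't change the result.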

Our tests were relatively straightforward: while copying files from a Windows workstation, we alternately pulled the power and network connections from the protected servers and observed what happened. G.Neil wanted more than just getting its server back up and running: it wanted end users to keep their network connections and continue working. That turned out to be a more challenging goal.

Both products are a bear to install and will require some calls to the vendor's support line. Part of the problem is that they are complex products requiring a deep understanding of parts of NetWare that aren't widely known. With Vinca's StandbyServer, you'll need to understand disk and server mirroring concepts and be able to use the Novell commands to reconstruct the mirrored volumes if a problem occurs. Think of the Vinca product as a pair of mirrored and duplexed disk drives; the only difference from this tried-and-tested arrangement is that the drives are housed in separate servers, connected by two links: ordinary network cabling and a special high-speed (160 Mbps) cable provided by Vinca. This means that for every server you wish to protect, you'll need to buy an additional standby server. The package includes a runtime copy of NetWare 3.12 that is installed on the standby machine.

Unlike Novell's duplexing and mirroring, StandbyServer doesn't require identical equipment in both servers: for example, a 3.12 server can stand in for a 4.1 server, and the two machines needn't have the same network adapters or disk controllers. You do need to make sure that the standby's hard disk is set up with the same volume structure as the protected server, however.

With Lantegrity, you'll need a solid understanding of Novell Directory Services (NDS) and directory trees, and be able to manipulate bindery objects within those trees. Here the idea is to build one large data repository, combining extra disk, a tape autoloader and extra RAM, that shadows several different servers over a single network connection. You'll also need to understand how the AUTOEXEC.NCF and STARTUP.NCF files work and where they are located: NetWare can load them from either the DOS startup partition or the NetWare SYS:SYSTEM directory, and that needs to be sorted out before installing Lantegrity.

Lantegrity has some other caveats as well. For example, when the protected server is down, you can't rename directories; the folks at G.Neil actually liked this restriction, since a rename could keep the servers from synchronizing their file systems. And while Lantegrity protects the name spaces for OS/2 and Mac clients, it doesn't transparently provide the files themselves, so G.Neil's Mac users will have to log in again, to the Lantegrity server itself, when the primary server fails.

Both NDS and mirroring skills were in short supply at G.Neil: the staff was just getting started with NetWare 4.1 and had only begun training on directory trees. They had never put together a mirrored server before and had to learn how while we were setting up the Vinca software. That is probably typical of many NetWare shops.

Luckily, technical support from both vendors was forthcoming and helpful: Ron Keindl got StandbyServer configured, and Kelly Connor and I got Lantegrity up and running. In both cases, the vendors' representatives knew that Infoworld was calling, so you might not get the same level of service. "However, I got lots of help from Vinca -- they were teaching me disk mirroring," said Keindl, a consultant in the microsystems department. Both manuals are fairly dense and require careful reading to understand the various subtleties of setup. For example, Lantegrity requires the NetWare 4.1 server to be set up with some non-standard parameters, such as specifying NetWare file format when the server's volume is formatted. Connor, an analyst in the microsystems department, missed this caveat and had to format her volume a second time.

We hit other minor hiccups along the way. We had to download an update of StandbyServer from Vinca's BBS to fix some syntax errors in the installation script (since corrected), and we needed server drivers for our Kingston/Atlantic NE2000-style adapters. During the Network Integrity installation, we had to make a run to the local computer superstore for some parts that should have been supplied with our evaluation unit (though normal customers would have purchased them separately). Also vexing was a power outage in the middle of installing NetWare 4.1, which drove home how critical these products really are.

The test server we protected in the Lantegrity scenario was a real baby, with only a 168-megabyte hard disk and eight megabytes of RAM. Even so, its memory proved inadequate: we had to increase it to 12 megabytes before Lantegrity would work properly, because the NLM-based agents need lots of room to do their work.

Both products did work as intended. StandbyServer took about 20 seconds to switch automatically from a failed primary to the standby machine. Lantegrity took about 50 seconds, and that was with an undersized server and no files to restore from tape, which would take longer.

With the Vinca product, we weren't able to maintain a connection under Windows 3.1, even after upgrading to the latest release (1.20) of the Virtual Loadable Module (VLM) drivers. But under Windows 95, running the Microsoft network client, we stayed connected while StandbyServer switched over, which was impressive.

With Lantegrity, we could not maintain a connection from either our Windows 3.1 or 95 clients during the switch-over. According to Network Integrity's technical support, we should have been able to do this if we had been using Novell's 32-bit client on Windows 95 and had configured our VLMs correctly. After poring over the Novell manuals, we still couldn't get it working under Windows 3.1; we think Network Integrity should do more to document how this works for customers like G.Neil that want to maintain their connections. We also found it annoying that the administrator's screen doesn't refresh itself automatically: several times we started to do something, only to realize that if we had pressed F5 we would have seen the server's current status. Company representatives say automatic refresh will be added in a future upgrade.

However, we found plenty of caveats with both products. For example, Vinca's product couldn't protect G.Neil's Netframe, because that server has its own proprietary hardware bus and can't use standard ISA or EISA adapters. Vinca sells its own MCA, ISA or EISA adapter to connect the mirrored servers; unfortunately, none of these cards will fit inside a Netframe. (Vinca's NT and OS/2 products use a standard 100-megabit network adapter instead, which makes them more flexible.) As another example, Network Integrity's product isn't all that useful for Macintosh users.

Vinca's product documentation doesn't mention protecting print queues, but in theory it should work, since you are just duplicating the queue on the standby machine. We didn't have time to test that theory, however. Network Integrity's documentation, by contrast, has plenty of information about replicating queues on its server.

One problem we had with StandbyServer is that it provides no information about the condition of one critical link: the network connection of the standby server itself. The software monitors the primary server's network link, as well as the proprietary link between the two servers; if either of these goes down, it automatically switches operations from the primary to the standby machine. Vinca representatives said that several third-party products are available to monitor the standby's link, although DiComo and his crew clearly weren't happy with the idea of having to shop around for, and test, yet another add-on.
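To make the monitoring gap concrete, here is a minimal Python sketch of the failover rule as this review describes it. It is not Vinca's code: the class and function names are hypothetical, and real link detection (timeouts, heartbeats, retries) is left out entirely.

```python
# Hypothetical model of StandbyServer's failover trigger as described in
# this review: fail over if either the primary's network link or the
# dedicated server-to-server link goes down. The standby's own network
# link plays no part in the decision, which is the gap G.Neil objected to.
from dataclasses import dataclass


@dataclass
class LinkStatus:
    primary_network_up: bool
    server_link_up: bool      # Vinca's dedicated cable between the two servers
    standby_network_up: bool  # tracked here, but ignored by should_fail_over()


def should_fail_over(status: LinkStatus) -> bool:
    """Switch to the standby if either monitored link is down."""
    return not (status.primary_network_up and status.server_link_up)


def unmonitored_fault(status: LinkStatus) -> bool:
    """True when the standby has lost its network link and nothing reacts."""
    return not status.standby_network_up and not should_fail_over(status)


if __name__ == "__main__":
    healthy = LinkStatus(True, True, True)
    primary_down = LinkStatus(False, True, True)
    standby_cut = LinkStatus(True, True, False)
    print(should_fail_over(healthy))       # False
    print(should_fail_over(primary_down))  # True
    print(should_fail_over(standby_cut))   # False
    print(unmonitored_fault(standby_cut))  # True
```

The last case is the one that worried DiComo: if the standby's network connection is cut, nothing fails over and nothing reports the fault, so a later failure of the primary would hand users to a server they can't reach.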

Another issue with StandbyServer is that you need to keep a cool head when it comes time to restore operations. To bring the protected server back on line, you type a few commands at both the standby and the protected server's consoles, and the commands differ slightly between the two. If you type the wrong one, the servers won't sync up properly and you have to delete the mirrored partition.

"For our environment, the hardware dependency of Vinca is a show stopper, since we can't use it with our Netframe. Lantegrity is not hardware dependent, and something we definitely want to pursue," said DiComo.



Data box

Standby Server 32 v 1.60
Vinca Corp.
Orem UT 84058
801 223 310
801 223 3107 fax
price: $2599 to $2999, depending on type of network adapter required

Lantegrity 3.22c
Network Integrity
Marlboro, MA 01752
508 460 6670
508 460 6771 fax
price: $4950 for 100 users, $1600 for an additional 100 users

© Infoworld Publishing Co.