Without knowing the specifics of the hardware and software config and components, it's really hard to say. As weird as it sounds, the hardware level is the easiest part to do. Making the application work is what's hard.

Generally, in that configuration each controller can access all of the disks equally well. A SCSI bus (probably LVD SCSI) isn't like the IDE/ATA bus, and it's normally not a problem to have multiple controllers on the same bus -- at a base functional level -- as long as they have different SCSI IDs, or the storage device exposes the same SCSI ID on two different buses. Each controller will be able to do whatever it needs to.

The problem is that each controller knows =nothing= about what the other controller is doing. And so, commonly, you only ever see one controller on a bus.

In a single controller setup, like just about every computer you've ever used, the operating system has complete control of the disk controller and the content that is on the disk. So there's coherency between all processes running on that computer and the contents of the disk. If one user deletes a file, the operating system actually does the delete, and any cached knowledge about the contents of the disk is consistent with what's actually on the disk.

In a shared storage situation each controller (and the operating system that has access to that controller) can access the disks equally. So here's a scenario:

Code:
Step  Computer A                        Computer B
====  ================================  ================================
1.    Email account created             -
2.    Email received and stored         -
      for user
3.    -                                 User reads email and deletes it
4.    Another email received, and       -
      written to the disk location
      immediately after previous
      email
5.    BOOM! Neither server really knows what's going on
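
To make the scenario concrete, here's a rough Python sketch of it. Everything in it (SharedDisk, Node, cached_used) is made up for illustration and isn't how any real mail server or filesystem lays out data. It just assumes each node keeps a private, cached idea of which blocks are in use.

```python
class SharedDisk:
    """The blocks both controllers can reach; the only real source of truth."""
    def __init__(self):
        self.blocks = {}  # block number -> data


class Node:
    """One server with its own private, cached view of the disk (hypothetical)."""
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.cached_used = set()  # blocks this node believes are in use

    def append(self, data):
        # Write right after the last block this node *thinks* is in use.
        block = max(self.cached_used, default=-1) + 1
        self.disk.blocks[block] = data
        self.cached_used.add(block)
        return block

    def read_and_delete(self, block):
        # Reads straight from the disk, then frees the block.
        data = self.disk.blocks.pop(block)
        self.cached_used.discard(block)
        return data


disk = SharedDisk()
a = Node("A", disk)
b = Node("B", disk)

first = a.append("email 1")    # step 2: A stores the first email
b.read_and_delete(first)       # step 3: B reads it and deletes it
second = a.append("email 2")   # step 4: A writes after the "previous" email

print("on disk:   ", sorted(disk.blocks))
print("A believes:", sorted(a.cached_used))
print("B believes:", sorted(b.cached_used))
```

After step 4 the disk holds only block 1, A still thinks blocks 0 and 1 are in use, and B thinks the disk is empty. Nobody did anything wrong on their own; the two views just never got reconciled.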



There are a number of solutions to prevent this kind of race condition.
1. Fine-grained locking internal to each instance of the application, with the locks stored on the shared disk. But even then it's really damn hard to prevent a race for creating the lock.

2. Have both/all instances of the application that access the disk communicate with each other constantly about who's accessing what part of the shared storage. Oracle's Distributed Lock Manager is a form of this, which is what makes Oracle Parallel Server/RAC work. This is an active-active cluster.

3. Only one node is actively running the application at a time, and the disks are only mounted on the active node. (Both nodes are up and running other applications that talk to each other with heartbeats and all sorts of stuff.) This is an active-passive/standby cluster. When there's a problem with one machine, the other machine notices this, maybe because it doesn't get the heartbeat message or whatever, and its node state transitions from passive to active: it forcibly takes control of the disks, and the application starts up and recovers the shared data store. Veritas Cluster Server frequently works like this, and the application doesn't need to be cluster aware.
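
To show why the "race for creating the lock" in option 1 is the hard part, here's a small Python sketch run on a single machine. The function names and the mailbox.lock file are made up for illustration. The atomic version only works because one kernel sees both requests and arbitrates between them; with two servers each mounting the shared disk through an ordinary (non-cluster) filesystem, no single kernel is in that position, which is exactly the problem.

```python
import os
import tempfile


def try_lock_racy(path):
    # Check-then-create: another process can create the file between the
    # exists() call and the open() call, so two callers can both "win".
    if os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    return True


def try_lock_atomic(path):
    # O_CREAT | O_EXCL asks the kernel to create the file only if it does
    # not already exist, as one operation; exactly one caller succeeds.
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


lock_dir = tempfile.mkdtemp()
lock_path = os.path.join(lock_dir, "mailbox.lock")  # hypothetical lock name

first = try_lock_atomic(lock_path)    # lock is free, so this one gets it
second = try_lock_atomic(lock_path)   # lock already held, so this one fails
print(first, second)
```

The racy version looks fine in testing because the window between the check and the create is tiny. It only bites under load, which is why it's so easy to ship.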

That's a very basic description. It's reasonably easy to write a cluster solution that works 80%-90% of the time; writing one that you can count on 100% of the time is very, very hard.
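
The active-passive idea from option 3 can be sketched as a tiny state machine. This is a toy under stated assumptions, not how Veritas Cluster Server or any real product does it: StandbyNode, the 5 second timeout, and the take_over step are all invented for illustration, and time is passed in as a plain number so the example is repeatable.

```python
class StandbyNode:
    """Hypothetical passive node that promotes itself when heartbeats stop."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = None
        self.state = "passive"

    def on_heartbeat(self, now):
        # Called whenever a heartbeat message from the active node arrives.
        self.last_heartbeat = now

    def check(self, now):
        # Called periodically; promotes this node if the peer has gone quiet.
        quiet_too_long = (
            self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout_seconds
        )
        if self.state == "passive" and quiet_too_long:
            self.take_over()
        return self.state

    def take_over(self):
        # A real cluster would forcibly take the disks, recover the shared
        # data store, and start the application here. This only flips state.
        self.state = "active"


node = StandbyNode(timeout_seconds=5.0)
node.on_heartbeat(now=0.0)
state_at_3 = node.check(now=3.0)    # 3s of silence: still passive
node.on_heartbeat(now=4.0)
state_at_8 = node.check(now=8.0)    # 4s since last heartbeat: still passive
state_at_10 = node.check(now=10.0)  # 6s since last heartbeat: takes over
print(state_at_3, state_at_8, state_at_10)
```

This is the 80%-90% version. A missed heartbeat doesn't prove the other node is dead; it may just be slow or cut off from the heartbeat link while still writing to the disks, and handling that case safely is where most of the real work goes.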

Generally, installations of this type will also have a high level of redundancy in the disk storage. (You're more likely to have a disk failure than a server failure that wouldn't corrupt data.) RAID 5 with multiple hot spares if performance isn't a major concern, or two disk arrays and two servers, with each server connected to both arrays.

Code:
serverA     serverB
  |  \       /  |
  |   \     /   |
  |      X      |
  |   /     \   |
  |  /       \  |
diskA       diskB



And do RAID 1 (either software with two separate SCSI controllers OR hardware on a dual port controller) so that when either server does a disk write it's dispatched to both arrays.
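
As a rough picture of what "dispatched to both arrays" means, here's a toy mirror in Python. Array and Mirror are made-up names, the arrays are just dictionaries, and this ignores everything a real RAID 1 implementation has to do when a failed array comes back and needs resyncing.

```python
class Array:
    """Stand-in for one disk array (hypothetical; just a dict of blocks)."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True


class Mirror:
    """Toy RAID 1: every write goes to all online arrays."""
    def __init__(self, left, right):
        self.members = [left, right]

    def write(self, block, data):
        targets = [m for m in self.members if m.online]
        if not targets:
            raise IOError("no array available")
        for member in targets:
            member.blocks[block] = data

    def read(self, block):
        # Any online member holding the block can satisfy the read.
        for member in self.members:
            if member.online and block in member.blocks:
                return member.blocks[block]
        raise IOError("block %d not readable" % block)


disk_a = Array("diskA")
disk_b = Array("diskB")
mirror = Mirror(disk_a, disk_b)

mirror.write(0, "email 1")    # lands on both arrays
disk_a.online = False         # simulate losing diskA
survivor_copy = mirror.read(0)
print(survivor_copy)
```

Losing diskA doesn't lose the email because diskB got the same write. Note this only protects against a disk array failing; it does nothing about the two-servers-disagreeing problem described earlier.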

Edit: And it's probably worth mentioning that the disk arrays you'd use in a situation like this aren't just simple external SCSI disks. They tend to have many different SCSI buses -- one SCSI bus that the disks are actually on, which is separate from the SCSI bus(es) that the external connectors use to connect to the servers.

--Nathan


Edited by Mataglap (22/10/2005 00:16)