All our web applications (search, website, checkout, portal, editor, etc.) have run as clustered sets of redundant servers since the beginning.
The onCourse admin application has not, but we are close to making this possible and to delivering zero-downtime upgrades. Right now, upgrading onCourse requires a 20-second outage of the admin application.
Support database cluster
Schema upgrades locked to one instance
Session management moved to database
All client-server interactions RESTful
Support load balancer in front of instances
Health check URL
Cron jobs locked to one instance
Replication thread locked to one instance
Message sending thread locked to one instance
Print job data synced across instances (not complete)
Invoice/student number generation
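
Several of the items above (schema upgrades, cron jobs, the replication thread, the message sending thread) depend on the same idea: only one instance in the cluster may do the work at a time. One common way to get that is a lease stored in the shared database. The sketch below is illustrative only and is not taken from the onCourse code base. The table name instance_lock, the function names and the 30 second lease length are all assumptions, and SQLite stands in for the real clustered database so the example runs on its own.

```python
import sqlite3


def init_locks(conn):
    """Create the (hypothetical) lease table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS instance_lock ("
        "name TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    conn.commit()


def try_acquire(conn, name, owner, now, ttl=30.0):
    """Return True if `owner` holds the lease called `name` after this call.

    `now` is passed in explicitly so the behaviour is easy to test; a real
    instance would pass the database server's clock or time.time().
    """
    with conn:  # one transaction: commit on success, roll back on error
        # Renew our own lease, or take over one that has expired.
        cur = conn.execute(
            "UPDATE instance_lock SET owner = ?, expires_at = ? "
            "WHERE name = ? AND (owner = ? OR expires_at <= ?)",
            (owner, now + ttl, name, owner, now),
        )
        if cur.rowcount == 1:
            return True
        # Nothing updated: either no lease row exists yet, or another
        # instance holds a live lease. The primary key settles the race:
        # only one instance can insert the first row.
        cur = conn.execute(
            "INSERT OR IGNORE INTO instance_lock (name, owner, expires_at) "
            "VALUES (?, ?, ?)",
            (name, owner, now + ttl),
        )
        return cur.rowcount == 1
```

Under these assumptions, each instance would call try_acquire(conn, "cron", instance_id, now) before running a scheduled job and skip the job when it returns False. Because the lease expires, another instance takes over automatically if the holder is shut down during an upgrade.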