Castle Learning Blog

Our Commitment to Excellence

January 21, 2013 / by castlelearning

Greetings! I would like to update you on Thursday's service outage and apologize once again for the disruption.

There are a number of reasons why a web site may be inaccessible or fail to operate properly, ranging from issues with your local internet provider to programming errors ("bugs") that we inadvertently introduce into the web site.

We have full responsibility for, and control over, programming issues, and our staff has always addressed them quickly as they are discovered. Network issues, however, are a good example of circumstances beyond our control. Our network infrastructure is managed by INetU, one of the country's most highly rated web hosting organizations. Over the years, they have maintained a remarkable "up-time" record and have always been very responsive when our technical staff needs them to investigate the networking and server issues that occur from time to time. On Thursday, they experienced an internal failure that took them a few hours to correct. This affected all of their clients, not just the Castle web site. They kept us informed of what was going on, and we in turn did our best to let you know what was happening.

INetU let us know that they carefully analyzed all of the switches in their data center (which is where the failure began) and have taken steps to make sure that a similar failure will not have the same negative effect. Here is part of their message sent to us early Saturday night:

"The engineering team has re-enabled all switches in the management layer. During the maintenance early this morning our team, along with our vendors, carefully analyzed each switch and ran tests to ensure that they were operating properly.  As we turned up the switches one by one, we found a few degraded paths within a VLAN that exhibited instability and some packet loss. This was corrected by toggling ports which immediately cleared the issue. Engineering adjusted certain error detection parameters to be less aggressive, and the entire system was released back into production mode.

We are now confident that should a loop occur again it will not affect the network as it did on Thursday. Your sites and applications are taking full advantage of the resources again and INetU is at 100% functionality for our clients.

We have learned from this. We are going to immediately begin addressing the following:
1) Accelerate our investment in some of our internal communication systems, to add another level of business continuity for us.
2) Correct some fault tolerant shortcomings in the way we provision and monitor parts of our network by architecting more out-of-band systems. We lost some critical visibility during the early parts of the failure which required us to spend time actually visiting dozens of devices in person. This slowed us down in our effort to restore connectivity.
3) Engage a third-party analysis to make sure that we have addressed every challenge and to verify our work."

As the new President of Castle Software, I cannot guarantee 100% reliability (not even Google or Amazon can entirely avoid occasional loss of some services), but I can assure you that the Castle Learning team remains dedicated to providing excellent educational tools for your teachers and students and continued outstanding customer service, and that we will do our very best to make sure our technology partners do everything possible to minimize service interruptions.

Please do not hesitate to contact me if you have any questions.

Thank you again,
Scott Fischer
President, Castle Learning Online
Castle Software, Inc.
