Skip to content

Commit

Permalink
fix last PR Vonng#166, fix redundant spaces in references in ch1.md also
Browse files Browse the repository at this point in the history
  • Loading branch information
yingang committed Jan 1, 2022
1 parent 8e1aaab commit a0de71a
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions ch1.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,9 +107,9 @@

我们通常认为硬件故障是随机的、相互独立的:一台机器的磁盘失效并不意味着另一台机器的磁盘也会失效。大量硬件组件不可能同时发生故障,除非它们存在比较弱的相关性(同样的原因导致关联性错误,例如服务器机架的温度)。

另一类错误是内部的**系统性错误(systematic error)**【8】。这类错误难以预料,而且因为是跨节点相关的,所以比起不相关的硬件故障往往可能造成更多的**系统失效**9】。例子包括:
另一类错误是内部的**系统性错误(systematic error)**【8】。这类错误难以预料,而且因为是跨节点相关的,所以比起不相关的硬件故障往往可能造成更多的**系统失效**5】。例子包括:

* 接受特定的错误输入,便导致所有应用服务器实例崩溃的BUG。例如2012年6月30日的闰秒,由于Linux内核中的一个错误,许多应用同时挂掉了。
* 接受特定的错误输入,便导致所有应用服务器实例崩溃的BUG。例如2012年6月30日的闰秒,由于Linux内核中的一个错误【9】,许多应用同时挂掉了。
* 失控进程会用尽一些共享资源,包括CPU时间、内存、磁盘空间或网络带宽。
* 系统依赖的服务变慢,没有响应,或者开始返回错误的响应。
* 级联故障,一个组件中的小故障触发另一个组件中的故障,进而触发更多的故障【10】。
Expand Down Expand Up @@ -378,12 +378,12 @@
1. Brian Beach: “[Hard Drive Reliability Update – Sep 2014](https://www.backblaze.com/blog/hard-drive-reliability-update-september-2014/),” *backblaze.com*, September 23, 2014.
1. Laurie Voss: “[AWS: The Good, the Bad and the Ugly](https://web.archive.org/web/20160429075023/http://blog.awe.sm/2012/12/18/aws-the-good-the-bad-and-the-ugly/),” *blog.awe.sm*, December 18, 2012.
1. Haryadi S. Gunawi, Mingzhe Hao, Tanakorn Leesatapornwongsa, et al.: “[What Bugs Live in the Cloud?](http://ucare.cs.uchicago.edu/pdf/socc14-cbs.pdf),” at *5th ACM Symposium on Cloud Computing* (SoCC), November 2014. [doi:10.1145/2670979.2670986](http://dx.doi.org/10.1145/2670979.2670986)
1. Nelson Minar: “[Leap Second Crashes Half the Internet](http://www.somebits.com/weblog/tech/bad/leap-second-2012.html),” *somebits.com*, July 3, 2012.
1. Amazon Web Services: “[Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region](http://aws.amazon.com/message/65648/),” *aws.amazon.com*, April 29, 2011.
1. Nelson Minar: “[Leap Second Crashes Half the Internet](http://www.somebits.com/weblog/tech/bad/leap-second-2012.html),” *somebits.com*, July 3, 2012.
1. Amazon Web Services: “[Summary of the Amazon EC2 and Amazon RDS Service Disruption in the US East Region](http://aws.amazon.com/message/65648/),” *aws.amazon.com*, April 29, 2011.
1. Richard I. Cook: “[How Complex Systems Fail](http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf),” Cognitive Technologies Laboratory, April 2000.
1. Jay Kreps: “[Getting Real About Distributed System Reliability](http://blog.empathybox.com/post/19574936361/getting-real-about-distributed-system-reliability),” *blog.empathybox.com*, March 19, 2012.
1. David Oppenheimer, Archana Ganapathi, and David A. Patterson: “[Why Do Internet Services Fail, and What Can Be Done About It?](http://static.usenix.org/legacy/events/usits03/tech/full_papers/oppenheimer/oppenheimer.pdf),” at *4th USENIX Symposium on Internet Technologies and Systems* (USITS), March 2003.
1. Nathan Marz: “[Principles of Software Engineering, Part 1](http://nathanmarz.com/blog/principles-of-software-engineering-part-1.html),” *nathanmarz.com*, April 2, 2013.
1. Nathan Marz: “[Principles of Software Engineering, Part 1](http://nathanmarz.com/blog/principles-of-software-engineering-part-1.html),” *nathanmarz.com*, April 2, 2013.
1. Michael Jurewitz:“[The Human Impact of Bugs](http://jury.me/blog/2013/3/14/the-human-impact-of-bugs),” *jury.me*, March 15, 2013.
1. Raffi Krikorian: “[Timelines at Scale](http://www.infoq.com/presentations/Twitter-Timeline-Scalability),” at *QCon San Francisco*, November 2012.
1. Martin Fowler: *Patterns of Enterprise Application Architecture*. Addison Wesley, 2002. ISBN: 978-0-321-12742-6
Expand Down

0 comments on commit a0de71a

Please sign in to comment.