-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
One-pager (pitch) why companies would want to join our foundation #3
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,25 @@ | ||||||
# Why Support TLA+ | ||||||
|
||||||
### What TLA+ Does | ||||||
|
||||||
One of the costliest elements of software development — second only to finding out what to build — is ensuring the system does what it is meant to do. That is why developers spend much of their time on quality assurance through code reviews, testing (functional, performance, security), and a great deal of bug-fixing. Some bugs are costlier than others because: | ||||||
|
||||||
- They are difficult to reproduce during the development phase (**discovery**). | ||||||
- They are *design* flaws that require significant rework to fix (**correction**). | ||||||
- They cause significant damage, such as downtime, data loss, or security vulnerabilities (**impact**). | ||||||
|
||||||
Because they are non-deterministic and complex, concurrent and distributed systems, tend to suffer from bugs that are costly for all those reasons. | ||||||
|
||||||
TLA+ is a specification language and a set of tools that offer a cost-effective way of finding and fixing flaws early in development when those flaws are cheaper to fix. Users write a precise and concise description of the system and their expectations from its behavior in TLA+, and the tools then check whether the expectations hold and, if not, how they might break. | ||||||
|
||||||
### Why TLA+ | ||||||
|
||||||
TLA+ is particularly attractive for industry use compared to other formal methods. Its unique strengths in specifying concurrent and distributed systems make it applicable precisely where mainstream software needs help the most. It can specify arbitrary properties of systems of arbitrary complexity and "ordinary" developers can still learn it in a few weeks. Automated verification with a model checker makes the cost of verification low enough to compare favorably with traditional assurance methods. | ||||||
|
||||||
For those reasons, TLA+ is among the most popular formal methods in industry, employed successfully by Amazon, Oracle, Microsoft, ElasticSearch, MongoDB, and others. This [technical report from Amazon](http://lamport.azurewebsites.net/tla/amazon-excerpt.html) relates how engineers learned TLA+ in just a few week. They then found and fixed potentially catastrophic and hard-to-reproduce problems in several AWS services. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. CrowdStrike -- sure. But we should only specifically list companies worth over $1B and have a Wikipedia entry (or, at least, are so famous that everyone has heard of). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought you wanted to have a proof of TLA+ being used in industry. I only have seen TLA+ specs publicly available from Microsoft, ElasticSearch, MongoDB, and Informal Systems. Amazon's specs are hidden under NDA. Oracle? Never seen anything. Do we all agree about this criterion: But we should only specifically list companies worth over $1B and have a Wikipedia entry (or, at least, are so famous that everyone has heard of)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Docker can be added There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Most companies won't open-source their TLA+ specs. For example, the only spec ever published by Microsoft was specifically written to be public user documentation. All other specifications are confidential. To keep the paragraph short, why don't we lobby our peers to have their employers/companies added to https://github.com/tlaplus/awesome-tlaplus#users and link to it here? If say, an Oracle employee, created the PR that added Oracle to https://github.com/tlaplus/awesome-tlaplus#users, I'd assume this to be good enough to convince execs. |
||||||
|
||||||
### Where TLA+ Has Been and Where It's Going | ||||||
|
||||||
Turing Award-winner Leslie Lamport invented TLA+ in the '90s as the culmination of decades of research in concurrent and distributed systems. It has gained a measure of popularity in the industry since the 2010s, as distributed systems started growing in size and use, and it's become clear that traditional quality assurance methods fail to catch some of the costliest bugs. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we going all in on saying "this is a tool for distributed systems"? Do we want people to think that they should use other tools if they don't have a concurrent system? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because the main pain-point vis-à-vis correctness and its cost that established companies face is distributed systems, I believe this is where TLA+ gives the most bang for the buck for the intended audience. I could add something about other kinds of systems, but we need to keep an eye out for the length. |
||||||
|
||||||
As Lamport is nearing retirement from Microsoft Research, he wishes to entrust TLA+ to a foundation to ensure TLA+'s relevance and growth are supported by donations of funds and personnel. The foundation will also invest in promising avenues to make TLA+ easier to use or more powerful, such as tools that monitor if a production system deviates from its specification at runtime. Another avenue are tools that could generate tests from the specifications. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the only part that talks about why companies would want to join the foundation, as opposed to use (or appreciate) TLA+. This covers the benefits to us if they join; what about the benefits to them if they join? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're right. My thinking here was that once an executive is convinced that TLA+ is useful and should survive (and evolve), they will then be interested in the precise contribution levels and benefits, which could be the subject of another document. But if there's something short and powerful we could add here -- we should. I don't know what it is, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Amazon paper was published in Communications of ACM too: https://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-uses-formal-methods/fulltext