YAML | The Norway Problem

Note: this article is a compilation of material related to “YAML | The Norway Problem”. The material has been lightly rearranged; if anything looks wrong, please consult the original sources.

The Norway Problem - why StrictYAML refuses to do implicit typing and so should you

A while back I met an old coworker and he started telling me about this interesting bug he faced:

“So, we started internationalizing the website by creating a config file. We added the UK, Ireland, France and Germany at first.”

    countries:
    - GB
    - IE
    - FR
    - DE

“This was all fine. However, one day after a quick configuration change all hell broke loose. It turned out that while the UK, France and Germany were all fine, Norway was not...”

“While the website went down and we were losing money we chased down a number of loose ends until finally finding the root cause.”

“It turned out that if you feed this configuration file into PyYAML:”

    countries:
    - GB
    - IE
    - FR
    - DE
    - NO

“This is what you got in return:”

    >>> from yaml import safe_load
    >>> safe_load(the_configuration)
    {'countries': ['GB', 'IE', 'FR', 'DE', False]}

It snows a lot in False.

When this is fed to code that expects a string of the form 'NO', the code will usually break, often with a cryptic error. Typically it would be a KeyError when trying to use False as a key in a dict when no such key exists.

It can be “quick fixed” by using quotes - a fix for sure, but kind of a hack - and by that time the damage is done:

    countries:
    - GB
    - IE
    - FR
    - DE
    - 'NO'

The most tragic aspect of this bug, however, is that it is intended behavior according to the YAML 1.1 specification, which PyYAML implements (the YAML 1.2 core schema later dropped yes/no/on/off as booleans).
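
A rough way to picture what the spec asks of a parser is a lookup on every unquoted scalar. The sketch below is illustrative only: it uses just the standard library, `resolve_plain_scalar` is a made-up name rather than anything in PyYAML, and it covers only the word forms of the YAML 1.1 boolean type (the single-letter forms are left out for brevity).

```python
# Word forms that YAML 1.1 treats as booleans when they appear unquoted.
# Assumption: single-letter y/n forms are omitted to keep the sketch short.
TRUE_WORDS = {"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
FALSE_WORDS = {"no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_plain_scalar(text, quoted=False):
    """Hypothetical helper: mimic implicit boolean typing for one scalar."""
    if quoted:
        # Quoted scalars are never reinterpreted; they stay strings.
        return text
    if text in TRUE_WORDS:
        return True
    if text in FALSE_WORDS:
        return False
    # Anything else falls through as a plain string in this sketch.
    return text


countries = [resolve_plain_scalar(code) for code in ["GB", "IE", "FR", "DE", "NO"]]
print(countries)
print(resolve_plain_scalar("NO", quoted=True))
```

Only the quoted call keeps Norway as a string, which is exactly the “quick fix” above: the burden of knowing the boolean word list is pushed onto whoever edits the file.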
The real fix requires explicitly disregarding the spec - which is why most YAML parsers still have this behavior.

StrictYAML sidesteps this problem by ignoring key parts of the spec, in an attempt to create a “zero surprises” parser.

Everything is a string by default:

    >>> from strictyaml import load
    >>> load(the_configuration).data
    {'countries': ['GB', 'IE', 'FR', 'DE', 'NO']}

String or float?

Norway is just the tip of the iceberg. The first time this problem hit me I was maintaining a configuration file of application versions. I had a file like this initially - which caused no issues:

    python: 3.5.3
    postgres: 9.3.0

However, if I changed it very slightly:

    python: 3.5.3
    postgres: 9.3

I started getting type errors because it was parsed like this:

    >>> from ruamel.yaml import YAML
    >>> YAML(typ="safe").load(versions)
    {'python': '3.5.3', 'postgres': 9.3}
    # oops those *both* should have been strings

Again, this led to type errors in my code. Again, I ‘quick fixed’ it with quotes. However, the solution I really wanted was:

    >>> from strictyaml import load
    >>> load(versions).data
    {'python': '3.5.3', 'postgres': '9.3'}
    # that's better

The world’s most buggy name

Christopher Null has a name that is notorious for breaking software code - airlines, banks, every bug caused by a programmer who didn’t know a type from their elbow has hit him.

YAML, sadly, is no exception:

    first name: Christopher
    surname: Null

    # Is it okay if we just call you Christopher None instead?
    >>> load(name)
    {'first name': 'Christopher', 'surname': None}

With StrictYAML:

    >>> from strictyaml import load
    >>> load(name).data
    {'first name': 'Christopher', 'surname': 'Null'}

Type theoretical concerns

Type theory is a popular topic with regards to programming languages, where a well designed type system is regarded (rightly) as a yoke that can catch bugs at an early stage of development while poorly designed type systems provide fertile breeding ground for edge case bugs.

(It’s equally true that extremely strict type systems require a lot more upfront work and the law of diminishing returns applies to type strictness - a cogent answer to the question “why is so little software written in Haskell?”)

A less popular, although equally true, idea is the notion that markup languages like YAML have the same issues with types - as demonstrated above.

User Experience

In a way, type systems can be considered both a mathematical concern and a UX device.

In the above, and in most cases, implicit typing represents a major violation of the UX principle of least astonishment.

What design trade-offs led to the “Norway problem” in YAML, and when are they worthwhile?

A well-known problem in YAML is a type-inference issue in parsing where a string is misinterpreted as a boolean.
This is known as the “Norway problem”, because it occurs when a field or array entry intended to hold an ISO 3166-1 alpha-2 country code as a string is given the code for Norway, NO.

    countries:
    - SE
    - NO
    - FI

YAML accepts yes and no as true and false values, and so when processed the expected string value in the field is instead a boolean.

Norway is the most noted source of these issues, but it would also happen for Ontario, and for values expected to include the strings “yes”, “true”, “false” (Unix commands), “off” (not leg), “null” (as a surname), “nan” (a relative), and some others. A similar, but separate, issue can arise with version numbers, where 1.2.1 and 1.2 are different types, or ratios, where 1:12 is the integer 72 (read as a base-60 number) but 5:60 is a string.

While YAML is a human-editable serialization format, similar issues could arise in Perl, PHP, shell scripts, and a number of other unequivocal programming languages with unmarked string literals, as well as some other configuration and transfer formats. Later versions of YAML attempted to remove much of this overloading, and it and similar constructions are generally frowned upon, but they are still present in real-world systems.

There are variations, but versions of this sort of issue have arisen enough that the choices leading to them must have value.
What conditions lead to these decisions, in YAML or elsewhere, and when (if ever) is this sort of “smart” value parsing a worthwhile trade-off?

asked May 26, 2023 at 23:44 by Michael Homer♦

It’s not clear to me what you mean by “‘off’ (not leg)”. Are you referring to cricket terminology, perhaps?
– Karl Knechtel, commented May 21, 2024 at 21:52

Being on one or other side of a line drawn through both middle stumps, yes.
– Michael Homer♦, commented May 21, 2024 at 22:03

Compiler’s note: “not leg” is cricket terminology. In cricket, deliveries are described as going to the “off” side or the “leg” side: an “off” ball is bowled toward the batter’s right side, a “leg” ball toward the left. “Not leg” therefore signals that “off” is meant in this cricket sense, one of two opposite directions.

Answers

It is a somewhat reasonable idea to provide convenient facilities for presumed common cases, but when done carelessly, it tends to backfire in unexpected situations. JavaScript is notoriously plagued by this problem. Introducing features like this makes the language generally simpler to use. That’s why you can see this in scripting and configuration languages, where it had been intended to facilitate simple tasks. However, the benefits tend to eventually get outweighed by the confusing and inconvenient experiences with unaccounted use-cases. I don’t believe there’s any further motive behind such features other than the near-sightedness of the language authors.

answered May 27, 2023 at 16:01 by abel1502

“It makes easy things easier, and hard things impossible.”
– John Gordon, commented Jul 27, 2023 at 2:54

@JohnGordon is that a quote you’ve found elsewhere? I like it!
I’d love the source, and if it’s you, to quote you.
– 0atman, commented Dec 4, 2023 at 9:15

@0atman It’s a half-remembered quote about something on Windows – possibly Visual Basic? It’s a deliberate snarky take on an earlier quote something like “the easy things should be easy, and the hard things should be possible”.
– John Gordon, commented Dec 4, 2023 at 13:31

@JohnGordon brilliant, that’s VB in a nutshell! Thank you!
– 0atman, commented Dec 13, 2023 at 16:43

It’s derived from Larry Wall’s design philosophy for Perl: “Easy things should be easy and hard things should be possible”.
– Frank Kusters, commented Aug 30, 2024 at 12:15

All programming languages (in a very broad sense, including markup and configuration languages, domain-specific languages, etc.) can be seen as a way for humans to express ideas in a way that a computer can act on.

One of the challenges this poses is that human thought is imprecise, fuzzy, and contextual, whereas electronics are (at least by design) precise, logical, and deterministic. This leads to an endless series of design decisions: when do we insist that the programmer must “think like a computer”, and when do we try to teach the computer to “think like a human”?

For example, an assembly language might assign mnemonics to locations in memory, but a higher-level language allows the programmer to ignore memory layout completely, and just “declare a variable”.
To allocate the correct memory for that variable, the compiler needs to know its type, but may provide facilities to infer that type from its surrounding context.

Most of the time, these are powerful features that help programmers to express their ideas; but sometimes they hide details that are important. The Java manual links to a style guide discussing the pros and cons of using the var keyword introduced in Java 10. One of its examples of the dangers of the feature is where a literal could represent multiple types:

    // ORIGINAL
    byte flags = 0;
    short mask = 0x7fff;
    long base = 17;

    // DANGEROUS: all infer as int
    var flags = 0;
    var mask = 0x7fff;
    var base = 17;

This is essentially what is happening in the “Norway problem”: rather than requiring the user to mark all string values with quote marks (or all non-string values with some other syntax), YAML requires parsers to infer the type of expressions based on specific values. When the values are unambiguous, this makes the syntax look a lot more like what a human would naturally write - a list of strings can be as simple as:

    - item one
    - item two
    - item three

And a boolean setting can be:

    spline_reticulation: on

But this syntax is inherently ambiguous. The language specification defines which interpretation should be followed, but it’s not always the interpretation that a human would choose with knowledge of the context.

Automatic type coercion can have similar problems. For instance, PHP is designed for use with HTTP, a string-based protocol, so it allows users to treat numeric strings directly as numbers: "1" + "2" == 3.
But exactly when to apply that coercion is not always clear-cut, e.g. if "0" == 0 and "0.0" == 0 are both true, is "0" == "0.0" also true? Neither answer is going to match intentions in 100% of cases.

There is something of a paradox: the more you try to hide the complexity from the user, the more complexity you have to introduce. Every time we allow the computer to assume the intention of the human, rather than demanding explicit information, we run the risk of it making a wrong assumption.

edited Jun 15, 2024 at 13:56, answered Jun 15, 2024 at 12:51 by IMSoP

Reference

The Norway Problem - why StrictYAML refuses to do implicit typing and so should you - HitchDev
https://hitchdev.com/strictyaml/why/implicit-typing-removed/

types - What design trade-offs led to the “Norway problem” in YAML, and when are they worthwhile? - 2023
https://langdev.stackexchange.com/questions/1123/what-design-trade-offs-led-to-the-norway-problem-in-yaml-and-when-are-they-wo

ISO - ISO 3166 - Country Codes
https://www.iso.org/iso-3166-country-codes.html

The YAML Norway Problem (according to the iso-3166-country-codes standard, each ...) - 2023
https://juejin.cn/post/7195414234115620901