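1. Introduction
ABNF stands for Augmented Backus-Naur Form. It is widely used in Internet specifications to describe the syntax of a format or protocol in a compact textual notation. Specifications that use it include the e-mail standard [RFC733] and its successor [RFC822], and the HTTP/1.1 specification [RFC7230]. Reading these documents therefore requires understanding ABNF notation, which is specified in detail in RFC5234.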
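2. Rule Definition
2.1 Rule Naming
Rule names in ABNF are case-insensitive. A name starts with a letter, followed by any number of letters, digits, or hyphens.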
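The naming rule above can be checked mechanically. The following is a minimal sketch, assuming a regular expression is an acceptable stand-in for the grammar; the helper names is_rulename and same_rule are hypothetical and do not come from RFC5234.

```python
import re

# A letter followed by any number of letters, digits, or hyphens.
RULENAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_rulename(text: str) -> bool:
    """Return True if text is a syntactically valid ABNF rule name."""
    return RULENAME_RE.fullmatch(text) is not None


def same_rule(a: str, b: str) -> bool:
    """Rule names are case-insensitive, so compare them in lower case."""
    return a.lower() == b.lower()
```

With this sketch, field-name is accepted, 1abc is rejected because it starts with a digit, and CRLF and crlf name the same rule.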
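2.2 Rule Format
A rule is defined in the following form:
name = elements crlf
Here name is the rule name, elements is a sequence of one or more rule names or terminal values, and crlf is the end-of-line marker, i.e. the familiar \r\n.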
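2.3 Terminal Values
A rule ultimately resolves to a string of characters. Each character is a non-negative integer (for example, a is decimal 97 in ASCII), and a terminal value is one of these integers. The following bases are currently defined:
b = binary ; base 2
d = decimal ; base 10
x = hexadecimal ; base 16
Therefore the following two definitions are equivalent: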
CR = %d13
CR = %x0D
A period (".") concatenates several such values into one string:
CRLF = %d13.10
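To make the notation concrete, here is a minimal sketch that decodes a numeric terminal into its character codes. The function names decode_num_val and to_text are hypothetical, the sketch handles only dot-concatenated values (not ranges), and it relies on Python's int(), which is more lenient than the ABNF grammar.

```python
# Base indicator letter -> numeric base.
BASES = {"b": 2, "d": 10, "x": 16}


def decode_num_val(token: str) -> list:
    """Decode a terminal such as '%d13.10' into a list of character codes."""
    if len(token) < 3 or token[0] != "%" or token[1].lower() not in BASES:
        raise ValueError(f"not a numeric terminal: {token!r}")
    base = BASES[token[1].lower()]
    # Each dot-separated part is one character code in the given base.
    return [int(part, base) for part in token[2:].split(".")]


def to_text(token: str) -> str:
    """Return the string that the numeric terminal stands for."""
    return "".join(chr(code) for code in decode_num_val(token))
```

For instance, decode_num_val("%d13") and decode_num_val("%x0D") both yield the single code 13, and to_text("%d13.10") yields the two-character string "\r\n".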
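2.4 External Encodings
The concrete representation of these values depends on the encoding in use. For example, the same terminal values are encoded quite differently in 7-bit US-ASCII and in 16-bit Unicode. At present, 7-bit US-ASCII is the most common encoding.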
3. Operators
3.1 Concatenation: Rule1 Rule2
Concatenation means that a rule can be defined as an ordered sequence of other rules. For example:
foo = %x61 ; a
bar = %x62 ; b
mumble = foo bar foo
The rule mumble therefore matches the string aba.
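The same example can be written out as plain string building. This is only an illustration, assuming each rule is a fixed string; the helper matches_mumble is a hypothetical name.

```python
# Terminal values from the example, written as Python strings.
foo = chr(0x61)  # %x61, the character "a"
bar = chr(0x62)  # %x62, the character "b"

# Concatenation: mumble = foo bar foo
mumble = foo + bar + foo


def matches_mumble(text: str) -> bool:
    """Numeric terminals are exact values, so the comparison is case-sensitive."""
    return text == mumble
```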
3.2 Alternatives: Rule1 / Rule2
Alternatives mean that exactly one of the listed elements is chosen. For example:
rule = foo / bar
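Here rule accepts either foo or bar.
3.3 Incremental Alternatives: Rule1 =/ Rule2
Further alternatives can be appended to an existing rule with =/. For example: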
ruleset = rule1 / rule2
ruleset =/ rule3
ruleset =/ rule4 / rule5
The resulting ruleset is equivalent to:
ruleset = rule1 / rule2 / rule3 / rule4 / rule5
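A minimal sketch of how a tool might fold incremental alternatives into one definition. The function merge_rules is a hypothetical helper, and it assumes each rule fits on one line and that no "/" or "=" appears inside a quoted string or a group.

```python
def merge_rules(lines):
    """Fold 'name = ...' and 'name =/ ...' lines into one alternation per rule."""
    rules = {}
    for line in lines:
        if "=/" in line:
            # Incremental alternative: the rule must already exist.
            name, _, body = line.partition("=/")
            key = name.strip().lower()
            if key not in rules:
                raise ValueError(f"=/ used before {key!r} was defined")
            rules[key].extend(alt.strip() for alt in body.split("/"))
        else:
            # Basic definition: start a fresh list of alternatives.
            name, _, body = line.partition("=")
            rules[name.strip().lower()] = [alt.strip() for alt in body.split("/")]
    return {name: " / ".join(alts) for name, alts in rules.items()}
```

Feeding it the three ruleset lines from the example yields a single entry whose value is rule1 / rule2 / rule3 / rule4 / rule5.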
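3.4 Value Range Alternatives: %c##-##
A range of alternative numeric values can be written compactly with a hyphen. For example: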
DIGIT = %x30-39
is equivalent to:
DIGIT = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
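The equivalence can be verified by expanding the range. The sketch below assumes a well-formed token with exactly one hyphen; expand_range is a hypothetical helper name.

```python
# Base indicator letter -> numeric base.
RANGE_BASES = {"b": 2, "d": 10, "x": 16}


def expand_range(token: str) -> list:
    """Expand a value range such as '%x30-39' into the characters it allows."""
    if token[:1] != "%" or token[1:2].lower() not in RANGE_BASES:
        raise ValueError(f"not a numeric terminal: {token!r}")
    base = RANGE_BASES[token[1].lower()]
    low, high = (int(part, base) for part in token[2:].split("-"))
    # Both ends of the range are inclusive.
    return [chr(code) for code in range(low, high + 1)]
```

Joining the result of expand_range("%x30-39") gives the ten digit characters 0 through 9.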
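3.5 Sequence Group: (Rule1 Rule2)
Elements enclosed in parentheses are treated as a single element. Grouping also makes an alternation easier to read than relying on operator precedence. Thus: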
elem (foo / bar) blat
matches:
(elem foo blat) or (elem bar blat)
whereas
elem foo / bar blat
matches:
(elem foo) or (bar blat)
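Regular expressions follow the same precedence (concatenation binds tighter than alternation), so the difference can be demonstrated with them. This sketch assumes, purely for illustration, that elem, foo, bar, and blat each stand for the literal text of their own name.

```python
import re

# elem (foo / bar) blat
GROUPED = re.compile(r"elem(?:foo|bar)blat")

# elem foo / bar blat
UNGROUPED = re.compile(r"elemfoo|barblat")


def matches(pattern, text: str) -> bool:
    """True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None
```

The grouped form accepts elemfooblat and elembarblat, while the ungrouped form accepts only elemfoo and barblat.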
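3.6 Variable Repetition: *Rule
The full form is <a>*<b>element, where <a> and <b> are optional decimal values meaning at least <a> and at most <b> occurrences of the element. The defaults are 0 and infinity. Therefore:
*<element>     zero or more
1*<element>    at least one
3*3<element>   exactly three
1*2<element>   one or two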
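A small sketch of how the repeat prefix might be turned into bounds. The function parse_repeat is a hypothetical helper that returns a (minimum, maximum) pair, with None standing for no upper limit; it also accepts a bare count, which is the specific repetition form.

```python
import re
from typing import Optional, Tuple

# Either <a>*<b> with both numbers optional, or a bare count n.
REPEAT_RE = re.compile(r"(\d*)\*(\d*)|(\d+)")


def parse_repeat(prefix: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) for a repeat prefix; None means unbounded."""
    m = REPEAT_RE.fullmatch(prefix)
    if m is None:
        raise ValueError(f"not a repeat prefix: {prefix!r}")
    if m.group(3) is not None:
        # Bare count: exactly n occurrences.
        n = int(m.group(3))
        return n, n
    low = int(m.group(1)) if m.group(1) else 0
    high = int(m.group(2)) if m.group(2) else None
    return low, high
```

Under these assumptions "*" gives (0, None), "1*" gives (1, None), and "3*3" and "3" both give (3, 3).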
3.7 Specific Repetition: nRule
n<element> is equivalent to n*n<element>, that is, exactly n occurrences of the element.
3.8 Optional Sequence: [Rule]
[Rule] means the enclosed sequence may be present or absent. Thus [foo bar] is equivalent to *1(foo bar).
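The optional form corresponds to at most one occurrence of the group (foo bar). As a rough analogy in regular expressions, assuming foo and bar stand for their own literal text, the ? quantifier and the {0,1} quantifier behave identically:

```python
import re

# [foo bar] written with the optional quantifier
OPTIONAL = re.compile(r"(?:foobar)?")

# *1(foo bar) written as "zero to one occurrence"
AT_MOST_ONCE = re.compile(r"(?:foobar){0,1}")


def accepted(pattern, text: str) -> bool:
    """True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None
```

Both patterns accept the empty string and foobar, and both reject foobarfoobar.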
3.9 Comment: ;Comment
A semicolon (;) starts a comment that runs to the end of the line.
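3.10 Operator Precedence
The operators have the following precedence, from highest (binding tightest) at the top to lowest (binding loosest) at the bottom:
Rule name, prose-val, terminal value
Comment
Value range
Repetition
Grouping, optional
Concatenation
Alternative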
4. Defining ABNF in ABNF
rulelist = 1*( rule / (*c-wsp c-nl) )
rule = rulename defined-as elements c-nl
        ; continues if next line starts
        ; with white space
rulename = ALPHA *(ALPHA / DIGIT / "-")
defined-as = *c-wsp ("=" / "=/") *c-wsp
        ; basic rules definition and
        ; incremental alternatives
elements = alternation *c-wsp
c-wsp = WSP / (c-nl WSP)
c-nl = comment / CRLF
        ; comment or newline
comment = ";" *(WSP / VCHAR) CRLF
alternation = concatenation
        *(*c-wsp "/" *c-wsp concatenation)
concatenation = repetition *(1*c-wsp repetition)
repetition = [repeat] element
repeat = 1*DIGIT / (*DIGIT "*" *DIGIT)
element = rulename / group / option /
        char-val / num-val / prose-val
group = "(" *c-wsp alternation *c-wsp ")"
option = "[" *c-wsp alternation *c-wsp "]"
char-val = DQUOTE *(%x20-21 / %x23-7E) DQUOTE
        ; quoted string of SP and VCHAR
        ; without DQUOTE
num-val = "%" (bin-val / dec-val / hex-val)
bin-val = "b" 1*BIT
        [ 1*("." 1*BIT) / ("-" 1*BIT) ]
        ; series of concatenated bit values
        ; or single ONEOF range
dec-val = "d" 1*DIGIT
        [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]
hex-val = "x" 1*HEXDIG
        [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]
prose-val = "<" *(%x20-3D / %x3F-7E) ">"
        ; bracketed string of SP and VCHAR
        ; without angles
        ; prose description, to be used as
        ; last resort
Appendix: Core Rules
ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
BIT = "0" / "1"
CHAR = %x01-7F
        ; any 7-bit US-ASCII character,
        ; excluding NUL
CR = %x0D
        ; carriage return
CRLF = CR LF
        ; Internet standard newline
CTL = %x00-1F / %x7F
        ; controls
DIGIT = %x30-39
        ; 0-9
DQUOTE = %x22
        ; " (Double Quote)
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
HTAB = %x09
        ; horizontal tab
LF = %x0A
        ; linefeed
LWSP = *(WSP / CRLF WSP)
        ; Use of this linear-white-space rule
        ; permits lines containing only white
        ; space that are no longer legal in
        ; mail headers and have caused
        ; interoperability problems in other
        ; contexts.
        ; Do not use when defining mail
        ; headers and use with caution in
        ; other contexts.
OCTET = %x00-FF
        ; 8 bits of data
SP = %x20
VCHAR = %x21-7E
        ; visible (printing) characters
WSP = SP / HTAB
        ; white space
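A few of the core rules translate directly into single-character predicates. This is a minimal sketch with hypothetical function names; it assumes quoted ABNF strings are matched case-insensitively, so HEXDIG also accepts lowercase a to f.

```python
def is_alpha(ch: str) -> bool:
    """ALPHA = %x41-5A / %x61-7A"""
    return len(ch) == 1 and ("A" <= ch <= "Z" or "a" <= ch <= "z")


def is_digit(ch: str) -> bool:
    """DIGIT = %x30-39"""
    return len(ch) == 1 and "0" <= ch <= "9"


def is_hexdig(ch: str) -> bool:
    """HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F" (case-insensitive)"""
    return is_digit(ch) or (len(ch) == 1 and ch.upper() in "ABCDEF")


def is_wsp(ch: str) -> bool:
    """WSP = SP / HTAB"""
    return ch in (" ", "\t")


def is_vchar(ch: str) -> bool:
    """VCHAR = %x21-7E"""
    return len(ch) == 1 and 0x21 <= ord(ch) <= 0x7E
```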
RFC5234: https://tools.ietf.org/html/rfc5234