从 XSS Payload 学习浏览器解码

k1   ·   发表于 2019-08-15 09:06:19   ·   漏洞文章

前几天看了浏览器解码看XSS,没有看得很明白,又找了这篇深入理解浏览器解析机制和XSS向量编码,翻译的文章,有些地方翻译的怪怪的,需要看下原文,啃了2天终于搞明白了

原文里给出了几个XSS Payload,也给出了答案演示地址,有答案但没解析,下面一个个分析

有点像上学的时候,看书看不懂,做题不会做,看答案解析做题就懂了

Basics

1

<a href="%6a%61%76%61%73%63%72%69%70%74:%61%6c%65%72%74%28%31%29"></a>

URL encoded "javascript:alert(1)"

Answer: The javascript will NOT execute.

里面没有HTML编码内容,不考虑,其中href内部是URL,于是直接丢给URL模块处理,但是协议无法识别(即被编码的javascript:),解码失败,不会被执行

URL规定协议,用户名,密码都必须是ASCII,编码当然就无效了

A URL’s scheme is an ASCII string that identifies the type of URL and can be used to dispatch a URL for further processing after parsing. It is initially the empty string.
A URL’s username is an ASCII string identifying a username. It is initially the empty string.
A URL’s password is an ASCII string identifying a password. It is initially the empty string.

from https://url.spec.whatwg.org/#concept-url

2

<a href="&#x6a;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;:%61%6c%65%72%74%28%32%29">

Character entity encoded "javascript" and URL encoded "alert(2)"

Answer: The javascript will execute.

先HTML解码,得到

<a href="javascript:%61%6c%65%72%74%28%32%29">

href中为URL,URL模块可识别为javascript协议,进行URL解码,得到

<a href="javascript:alert(2)">

由于是javascript协议,解码完给JS模块处理,于是被执行

3

<a href="javascript%3aalert(3)"></a>

URL encoded ":"

Answer: The javascript will NOT execute.

同1,不解释

4

<div>&#60;img src=x onerror=alert(4)&#62;</div>

Character entity encoded < and >

Answer: The javascript will NOT execute.

这里包含了HTML编码内容,反过来以开发者的角度思考,HTML编码就是为了显示这些特殊字符,而不干扰正常的DOM解析,所以这里面的内容不会变成一个img元素,也不会被执行

从HTML解析机制看,在读取<div>之后进入数据状态,&#60;会被HTML解码,但不会进入标签开始状态,当然也就不会创建img元素,也就不会执行

5

<textarea>&#60;script&#62;alert(5)&#60;/script&#62;</textarea>

Character entity encoded < and >

Answer: The javascript will NOT execute AND the character entities will NOT
be decoded either

<textarea>RCDATA元素(RCDATA elements),可以容纳文本和字符引用,注意不能容纳其他元素,HTML解码得到

<textarea><script>alert(5)</script></textarea>

于是直接显示

RCDATA元素(RCDATA elements)包括textareatitle

6

<textarea><script>alert(6)</script></textarea>

Answer: The javascript will NOT execute.

同5,不解释

Advanced

7

<button onclick="confirm('7&#39;);">Button</button>

Character entity encoded '

Answer: The javascript will execute.

这里onclick中为标签的属性值(类比2中的href),会被HTML解码,得到

<button onclick="confirm('7');">Button</button>

然后被执行

8

<button onclick="confirm('8\u0027);">Button</button>

Unicode escape sequence encoded '

Answer: The javascript will NOT execute.

onclick中的值会交给JS处理,在JS中只有字符串和标识符能用Unicode表示,'显然不行,JS执行失败

In string literals, regular expression literals, template literals and identifiers, any Unicode code point may also be expressed using Unicode escape sequences that explicitly express a code point's numeric value.

from https://www.ecma-international.org/ecma-262/10.0/index.html#sec-ecmascript-language-source-code (这个链接很卡)

标识符(identifiers)
代码中用来标识变量、函数、或属性的字符序列。
在JavaScript中,标识符只能包含字母或数字或下划线(“_”)或美元符号(“$”),且不能以数字开头。标识符与字符串不同之处在于字符串是数据,而标识符是代码的一部分。在 JavaScript 中,无法将标识符转换为字符串,但有时可以将字符串解析为标识符。

from https://developer.mozilla.org/zh-CN/docs/Glossary/Identifier

9

<script>&#97;&#108;&#101;&#114;&#116&#40;&#57;&#41;&#59</script>

Character entity encoded alert(9);

Answer: The javascript will NOT execute.

script属于原始文本元素(Raw text elements),只可以容纳文本,注意没有字符引用,于是直接由JS处理,JS也认不出来,执行失败

原始文本元素(Raw text elements)有<script><style>

10

<script>\u0061\u006c\u0065\u0072\u0074(10);</script>

Unicode Escape sequence encoded alert

Answer: The javascript will execute.

同8,函数名alert属于标识符,直接被JS执行

11

<script>\u0061\u006c\u0065\u0072\u0074\u0028\u0031\u0031\u0029</script>

Unicode Escape sequence encoded alert(11)

Answer: The javascript will NOT execute.

同8,不解释

12

<script>\u0061\u006c\u0065\u0072\u0074(\u0031\u0032)</script>

Unicode Escape sequence encoded alert and 12

Answer: The javascript will NOT execute.

这里看似将没毛病,但是这里\u0031\u0032在解码的时候会被解码为字符串12,注意是字符串,不是数字,文字显然是需要引号的,JS执行失败

13

<script>alert('13\u0027)</script>

Unicode escape sequence encoded '

Answer: The javascript will NOT execute.

同8

14

<script>alert('14\u000a')</script>

Unicode escape sequence encoded line feed.

Answer: The javascript will execute.

\u000a在JavaScript里是换行,就是\n,直接执行

Java菜鸡才知道在Java里\u000a是换行,相当于在源码里直接按一下回车键,后面的代码都换行了

ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences. In a Java program, if the Unicode escape sequence \u000A, for example, occurs within a single-line comment, it is interpreted as a line terminator (Unicode code point U+000A is LINE FEED (LF)) and therefore the next code point is not part of the comment. Similarly, if the Unicode escape sequence \u000A occurs within a string literal in a Java program, it is likewise interpreted as a line terminator, which is not allowed within a string literal—one must write \n instead of \u000A to cause a LINE FEED (LF) to be part of the String value of a string literal. In an ECMAScript program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.

from https://www.ecma-international.org/ecma-262/10.0/index.html#sec-ecmascript-language-source-code

Bonus

15

<a href="&#x6a;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;&#x3a;&#x25;&#x35;&#x63;&#x25;&#x37;&#x35;&#x25;&#x33;&#x30;&#x25;&#x33;&#x30;&#x25;&#x33;&#x36;&#x25;&#x33;&#x31;&#x25;&#x35;&#x63;&#x25;&#x37;&#x35;&#x25;&#x33;&#x30;&#x25;&#x33;&#x30;&#x25;&#x33;&#x36;&#x25;&#x36;&#x33;&#x25;&#x35;&#x63;&#x25;&#x37;&#x35;&#x25;&#x33;&#x30;&#x25;&#x33;&#x30;&#x25;&#x33;&#x36;&#x25;&#x33;&#x35;&#x25;&#x35;&#x63;&#x25;&#x37;&#x35;&#x25;&#x33;&#x30;&#x25;&#x33;&#x30;&#x25;&#x33;&#x37;&#x25;&#x33;&#x32;&#x25;&#x35;&#x63;&#x25;&#x37;&#x35;&#x25;&#x33;&#x30;&#x25;&#x33;&#x30;&#x25;&#x33;&#x37;&#x25;&#x33;&#x34;&#x28;&#x31;&#x35;&#x29;"></a>

Answer: The javascript will execute.

先HTML解码,得到

<a href="javascript:%5c%75%30%30%36%31%5c%75%30%30%36%63%5c%75%30%30%36%35%5c%75%30%30%37%32%5c%75%30%30%37%34(15)"></a>

在href中由URL模块处理,解码得到

javascript:\u0061\u006c\u0065\u0072\u0074(15)

识别JS协议,然后由JS模块处理,解码得到

javascript:alert(15)

最后被执行

总结

  1. <script><style>数据只能有文本,不会有HTML解码和URL解码操作

  2. <textarea><title>里会有HTML解码操作,但不会有子元素

  3. 其他元素数据(如div)和元素属性数据(如href)中会有HTML解码操作

  4. 部分属性(如href)会有URL解码操作,但URL中的协议需为ASCII

  5. JavaScript会对字符串和标识符Unicode解码

根据浏览器的自动解码,反向构造 XSS Payload 即可

转自先知社区

打赏我,让我更有动力~

0 条回复   |  直到 2019-8-15 | 1110 次浏览
登录后才可发表内容
返回顶部 投诉反馈

© 2016 - 2024 掌控者 All Rights Reserved.