-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Expand file tree
/
Copy pathReDoS.qhelp
More file actions
70 lines (66 loc) · 2.78 KB
/
ReDoS.qhelp
File metadata and controls
70 lines (66 loc) · 2.78 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>
<overview>
<p>
Some regular expressions take a very long time to match certain input strings to the point where
the time it takes to match a string of length <i>n</i> is proportional to <i>2<sup>n</sup></i>.
Such regular expressions can negatively affect performance, or even allow a malicious user to
perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular
expression to match.
</p>
<p>
The regular expression engines provided by many popular JavaScript platforms use backtracking
non-deterministic finite automata to implement regular expression matching. While this approach
is space-efficient and allows supporting advanced features like capture groups, it is not
time-efficient in general. The worst-case time complexity of such an automaton can be exponential,
meaning that for strings of a certain shape, increasing the input length by ten characters may
make the automaton about 1000 times slower.
</p>
<p>
Typically, a regular expression is affected by this problem if it contains a repetition of the
form <code>r*</code> or <code>r+</code> where the sub-expression <code>r</code> is ambiguous in
the sense that it can match some string in multiple ways. More information about the precise
circumstances can be found in the references.
</p>
</overview>
<recommendation>
<p>
Modify the regular expression to remove the ambiguity.
</p>
</recommendation>
<example>
<p>
Consider this regular expression:
</p>
<sample language="javascript">
/^_(__|.)+_$/
</sample>
<p>
Its sub-expression <code>"(__|.)+?"</code> can match the string <code>"__"</code> either by the
first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two
repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting
of an odd number of underscores followed by some other character will cause the regular
expression engine to run for an exponential amount of time before rejecting the input.
</p>
<p>
This problem can be avoided by rewriting the regular expression to remove the ambiguity between
the two branches of the alternative inside the repetition:
</p>
<sample language="javascript">
/^_(__|[^_])+_$/
</sample>
</example>
<references>
<li>
OWASP:
<a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>.
</li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Time_complexity">Time complexity</a>.</li>
<li>James Kirrage, Asiri Rathnayake, Hayo Thielecke:
<a href="http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf">Static Analysis for Regular Expression Denial-of-Service Attack</a>.
</li>
</references>
</qhelp>