HTTP CSRF token#2182
Conversation
a25bba5 to
9621f6d
Compare
…o conversion logic; add: csrf token to supported inventory types
9621f6d to
0a7cd7d
Compare
| regexp.MustCompile(`(?im)^(?:x-)?(?:csrf|xsrf)(?:[a-z0-9_.-]*token):\s+([a-zA-Z0-9+/=_-]{16,128})\b`), | ||
|
|
||
| // HTML Tag: 'name' comes before 'value'. | ||
| regexp.MustCompile(`(?i)<input[^>]+name=["'][^"'>]*(?:csrf|xsrf)[^"'>]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`), |
There was a problem hiding this comment.
Why [^"'>]* before and after (?:csrf|xsrf)? What are some sample names? [^"'>]* seems a bit broad, and the first 2 regex are using (?:x-)? for prefix, which is different from the target here.
There was a problem hiding this comment.
Why [^"'>]* before and after (?:csrf|xsrf)? What are some sample names?
[^"'>]* matches any character. The reasoning behind it is that inside the HTML we have more context, so I thought of using a broader regex to match unknown patterns (since there isn't a clear standard on how to store CSRF tokens inside HTML). A few examples of names are:
- csrf
- csrfmiddlewaretoken
- csrf_token_form
- csrf_token
- _csrf
- csrf_cookie
- csrf.token
the first 2 regex are using (?:x-)? for prefix, which is different from the target here.
This was already included in the [^"'>]* bit, which allows for any character to precede the csrf or xsrf keywords.
That said I agree that I can definitely make the regex stricter and add these examples as testcases
There was a problem hiding this comment.
Thanks. Yes, I feel make the regex strict is better here:
- There is no validation in general, so better to reduce false positives by using the context.
- When using relaxed regex, it's possible we match something unrelated/not a CSRF token (
csrfas part of some acronym in input field).
| // Quoted key value pairs (Logs, JSON, Configs, standard variables). | ||
| // | ||
| // Note: the value must be contained inside `'` or `"` to reduce false positive in case of a variable assignment in source code | ||
| regexp.MustCompile(`(?i)(?:x-)?(?:csrf|xsrf)(?:[a-z0-9_.-]*token)["']?\s*[:=]\s*["']([a-zA-Z0-9+/=_-]{16,128})["']`), |
There was a problem hiding this comment.
For (?:[a-z0-9_.-]*token) , what are some targeting keywords?
There was a problem hiding this comment.
Here's a few examples:
- csrf-token
- csrfmiddlewaretoken
- csrf_form_token
- csrf.token (which may happen in cases like this: https://grep.app/adbar/trafilatura/master/tests/eval/iwr.de.IWRpressedienst.Nordex.html?q=%22csrf.token%22#L27)
|
Hi @hanqiuzh Thanks for the review, I left some replies for your comments |
|
Hello @hanqiuzh The regexes should be aligned with your comments now. I still decided to use Thanks in advance! |
| regexp.MustCompile(`(?im)^(?:x[-_])?(?:csrf|xsrf)(?:[-_]?middleware)?(?:[-_]?token)?:\s+([a-zA-Z0-9+/=_-]{16,128})\b`), | ||
|
|
||
| // HTML Tag: 'name' comes before 'value'. | ||
| regexp.MustCompile(`(?i)<input[^>]+name=["'][\w-]*(?:csrf|xsrf)[\w-]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`), |
There was a problem hiding this comment.
Thanks for updating the first two regex.
I'm still not sure about the two HTML check.
- the
[\w-]*keywords, would you mind to add some comments about what are the targeting patterns, and why we have to add wildcards before and after to capture it? - I'm thinking about what are the scenarios we are expecting html tag matching to work? I can imagine of two possible scenarios:
a. Log of all http requests (body + header mode), and we find the input fields in body
b. source code static html files - the input value will less likely be sensitive as it's likely just a placeholder? as the token should be generated dynamically (static won't help).
I haven't done much research on it, so it's hard to tell how many times we will end up in scenariob, but I feelbare most likely always false-positives. Again, the goal is to reduce false-positives as there is no validation. I guess one possible solution is that we can add a check to don't report some tokens base on keywords or entropy, but that could be additional efforts. Or maybe we don't capture the<input>if that's not feasible? WDYT?
There was a problem hiding this comment.
-
Sure
-
Yes option
bis probably the more common. I would argue that it might happen thatcsrftoken are hardcoded in source files (even though they shouldn't).Ideally the number of false positive should be reduced since the regex only matches for a base64 string contained inside doubles quotes (so no template placeholder should be matched).
It might catch false positives in cases such as documentation or example files, see example:
How to write a form: ```html <form method="POST" action="/my-account/change-email"> <input name="csrf" value="AbCdEf123456"> <input name="email" value="victim@email.com"> </form> ```
Or maybe we don't capture the if that's not feasible? WDYT?
The benefits of finding a CSRF token inside an HTML file are little regardless, so we can probably remove html detection altoghether
There was a problem hiding this comment.
Thanks. yes, if b is more common, and if there is no good ways to detect obvious false-positives (e.g. sample/placeholder), let's just don't include the HTML matching for now? and add some comments at top for the potential limitation. 👍
|
Hi @hanqiuzh Every conversation should be resolved now. Thanks again for the review! |
|
Note: I've also modified the |
| @@ -0,0 +1,1556 @@ | |||
| 'use strict'; | |||
There was a problem hiding this comment.
This file has a lot of lint errors as it's a js file, and blocking the PR from submitted. Can this file be simplified to only include the lines we want to test?
Some errors:
Using var (prefer const or let).
Bad type annotation. for /** @this */
| @@ -0,0 +1,79 @@ | |||
| import reducer, { fetchApi, initialState } from "./apiSlice"; | |||
| import { configureStore } from "@reduxjs/toolkit"; | |||
There was a problem hiding this comment.
@reduxjs/toolkit failed linter as no dependencies for this js file, maybe comment this line or simplify this file?
| @@ -0,0 +1,12 @@ | |||
| async function get<T>( | |||
There was a problem hiding this comment.
failed linter. Is this file suppose to be typescript? same for the promise<>
|
Hello @hanqiuzh All of the files under They've been helpful to reduce false positives during development, but if they cause troubles I can remove them. |
Yes, unfortunately they are failing the linter on our side as they are |
|
Hi @hanqiuzh Every linting error should be resolved now as I removed the src code files. |
This PR adds the logic to support CSRF token detection in:
Depends on:
Note
I added real source code files for the true-negative tests and cited their sources. I can remove them if there are any licensing issues.