HTTP CSRF token#2182

Open
alessandro-Doyensec wants to merge 12 commits into
google:mainfrom
doyensec:http-csrf-token

@alessandro-Doyensec alessandro-Doyensec commented Jun 3, 2026

Collaborator

This PR adds the logic to support CSRF token detection in:

  • JSON/source code
  • Logs
  • HTTP Dumps
  • HTML

Depends on:


Note

I added real source code files for the true-negative tests and cited their sources. I can remove them if there are any licensing issues.

Comment thread veles/secrets/http/csrf.go Outdated
regexp.MustCompile(`(?im)^(?:x-)?(?:csrf|xsrf)(?:[a-z0-9_.-]*token):\s+([a-zA-Z0-9+/=_-]{16,128})\b`),

// HTML Tag: 'name' comes before 'value'.
regexp.MustCompile(`(?i)<input[^>]+name=["'][^"'>]*(?:csrf|xsrf)[^"'>]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`),

Collaborator

Why [^"'>]* before and after (?:csrf|xsrf)? What are some sample names? [^"'>]* seems a bit broad, and the first 2 regex are using (?:x-)? for prefix, which is different from the target here.

@alessandro-Doyensec alessandro-Doyensec Jun 9, 2026

Collaborator Author

> Why [^"'>]* before and after (?:csrf|xsrf)? What are some sample names?

[^"'>]* matches any run of characters other than quotes and >. The reasoning is that inside HTML we have more context, so I opted for a broader regex to match unknown patterns (since there is no clear standard for how CSRF tokens are stored in HTML). A few examples of names are:

  • csrf
  • csrfmiddlewaretoken
  • csrf_token_form
  • csrf_token
  • _csrf
  • csrf_cookie
  • csrf.token

> the first 2 regex are using (?:x-)? for prefix, which is different from the target here.

This was already covered by the [^"'>]* part, which allows any characters other than quotes and > to precede the csrf or xsrf keywords.


That said, I agree that I can make the regex stricter and add these examples as test cases.

Collaborator

Thanks. Yes, I think making the regex stricter is better here:

  1. There is no validation in general, so it's better to reduce false positives by using the context.
  2. With a relaxed regex, we may match something unrelated that is not a CSRF token (e.g. csrf as part of some acronym in an input field name).
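
The trade-off discussed above can be sketched in a few lines of Go. `broadHTML` is the HTML pattern quoted at the top of this thread; `strictHTML` is a hypothetical tightening written only for this illustration (it is not the pattern the PR ended up with), and `ucsrf_region` is a made-up field name standing in for "csrf as part of some acronym".

```go
package main

import (
	"fmt"
	"regexp"
)

// broadHTML is the HTML pattern quoted at the top of this thread.
var broadHTML = regexp.MustCompile(`(?i)<input[^>]+name=["'][^"'>]*(?:csrf|xsrf)[^"'>]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`)

// strictHTML is a HYPOTHETICAL stricter variant: it only accepts name shapes
// derived from the sample names listed in the thread.
var strictHTML = regexp.MustCompile(`(?i)<input[^>]+name=["']_?(?:csrf|xsrf)(?:middleware)?(?:[._-]?token)?(?:[._-](?:form|cookie))?["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`)

// inputTag builds a one-line <input> element for the given field name,
// using a made-up token-like value.
func inputTag(name string) string {
	return fmt.Sprintf(`<input type="hidden" name="%s" value="dGhpcyBpcyBhIHRlc3QgdG9rZW4=">`, name)
}

func main() {
	names := []string{
		// Sample names listed in the thread.
		"csrf", "csrfmiddlewaretoken", "csrf_token_form", "csrf_token",
		"_csrf", "csrf_cookie", "csrf.token",
		// Hypothetical unrelated field that merely contains "csrf".
		"ucsrf_region",
	}
	for _, n := range names {
		tag := inputTag(n)
		fmt.Printf("%-22s broad=%-5t strict=%t\n", n, broadHTML.MatchString(tag), strictHTML.MatchString(tag))
	}
}
```

Both patterns accept all seven names from the thread; only the broad one also accepts `ucsrf_region`.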

Comment thread veles/secrets/http/csrf.go Outdated
// Quoted key value pairs (Logs, JSON, Configs, standard variables).
//
// Note: the value must be contained inside `'` or `"` to reduce false positive in case of a variable assignment in source code
regexp.MustCompile(`(?i)(?:x-)?(?:csrf|xsrf)(?:[a-z0-9_.-]*token)["']?\s*[:=]\s*["']([a-zA-Z0-9+/=_-]{16,128})["']`),

Collaborator

For (?:[a-z0-9_.-]*token) , what are some targeting keywords?
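
One way to explore this question is to probe the quoted key/value pattern above with candidate key names. The sketch below does that; the keys and the sample value are hypothetical probes chosen for illustration, not the author's own list of target keywords.

```go
package main

import (
	"fmt"
	"regexp"
)

// quotedKV is the quoted key/value pattern from the snippet under review.
var quotedKV = regexp.MustCompile(`(?i)(?:x-)?(?:csrf|xsrf)(?:[a-z0-9_.-]*token)["']?\s*[:=]\s*["']([a-zA-Z0-9+/=_-]{16,128})["']`)

// sampleValue is a made-up token-like value.
const sampleValue = "dGhpcyBpcyBhIHRlc3QgdG9rZW4="

// keyMatches reports whether a JSON-style `"key": "value"` pair built from
// the given key is matched by quotedKV.
func keyMatches(key string) bool {
	line := fmt.Sprintf(`"%s": "%s"`, key, sampleValue)
	return quotedKV.MatchString(line)
}

func main() {
	// Hypothetical probe keys.
	keys := []string{
		"csrf_token",
		"xsrf-token",
		"x-csrf-token",
		"csrfmiddlewaretoken",
		"csrf_exempt_paths_token", // arbitrary infix before "token"
		"csrf",                    // no "token" suffix
		"csrf_token_form",         // extra suffix after "token"
	}
	for _, k := range keys {
		fmt.Printf("%-26s %t\n", k, keyMatches(k))
	}
}
```

The first five keys match and the last two do not: the key must end in `token`, but anything made of `[a-z0-9_.-]` may sit between the keyword and that suffix.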

Collaborator Author

Here's a few examples:

@alessandro-Doyensec

Collaborator Author

Hi @hanqiuzh

Thanks for the review, I left some replies for your comments

@alessandro-Doyensec

alessandro-Doyensec commented Jun 10, 2026

Collaborator Author

Hello @hanqiuzh

The regexes should be aligned with your comments now. I still decided to use \w* in the HTML patterns to match more edge cases. Feel free to suggest any changes you may deem more correct.

Thanks in advance!

Comment thread veles/secrets/http/csrf.go Outdated
regexp.MustCompile(`(?im)^(?:x[-_])?(?:csrf|xsrf)(?:[-_]?middleware)?(?:[-_]?token)?:\s+([a-zA-Z0-9+/=_-]{16,128})\b`),

// HTML Tag: 'name' comes before 'value'.
regexp.MustCompile(`(?i)<input[^>]+name=["'][\w-]*(?:csrf|xsrf)[\w-]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`),
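
As a rough illustration of how the first (HTTP header) pattern in the snippet above behaves on an HTTP dump, here is a minimal sketch. The request and its token values are made up, and `findHeaderTokens` is a hypothetical helper, not a function from the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// headerRe is the HTTP header pattern from the snippet under review.
var headerRe = regexp.MustCompile(`(?im)^(?:x[-_])?(?:csrf|xsrf)(?:[-_]?middleware)?(?:[-_]?token)?:\s+([a-zA-Z0-9+/=_-]{16,128})\b`)

// findHeaderTokens returns every value captured by headerRe in an HTTP dump.
func findHeaderTokens(dump string) []string {
	var tokens []string
	for _, m := range headerRe.FindAllStringSubmatch(dump, -1) {
		tokens = append(tokens, m[1])
	}
	return tokens
}

func main() {
	// Hypothetical request dump.
	dump := "POST /my-account/change-email HTTP/1.1\n" +
		"Host: example.com\n" +
		"X-CSRF-Token: 3q2-7wABAgMEBQYHCAkKCwwNDg8\n" +
		"X-XSRF-TOKEN: dGVzdC10b2tlbi12YWx1ZQ==\n" +
		"Cookie: session=abc\n"
	for _, tok := range findHeaderTokens(dump) {
		fmt.Println(tok)
	}
	// Prints:
	// 3q2-7wABAgMEBQYHCAkKCwwNDg8
	// dGVzdC10b2tlbi12YWx1ZQ
}
```

Note that the second capture loses its `==` padding: the trailing `\b` needs a word character immediately before the boundary, so the match backtracks past any trailing `=`, `+`, `/` or `-`.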

@hanqiuzh hanqiuzh Jun 10, 2026

Collaborator

Thanks for updating the first two regexes.
I'm still not sure about the two HTML checks.

  1. For the [\w-]* wildcards, would you mind adding some comments about which patterns they target, and why we need wildcards before and after the keyword to capture them?
  2. I'm wondering in which scenarios we expect HTML tag matching to work. I can imagine two:
    a. A log of all HTTP requests (body + header mode), where we find the input fields in the body.
    b. Static HTML files in source code. The input value is less likely to be sensitive there, as it's probably just a placeholder: the token should be generated dynamically (a static one won't help).
    I haven't done much research on it, so it's hard to tell how often we will end up in scenario b, but I feel b will almost always produce false positives. Again, the goal is to reduce false positives, as there is no validation. One possible solution is to add a check that skips tokens based on keywords or entropy, but that would be additional effort. Or maybe we don't capture the <input> if that's not feasible? WDYT?

Collaborator Author

  1. Sure

  2. Yes, option b is probably the more common one. I would argue that CSRF tokens may still end up hardcoded in source files (even though they shouldn't be).

    Ideally the number of false positives should stay low, since the regex only matches a base64-like string enclosed in quotes (so no template placeholder should be matched).

    It might still produce false positives in cases such as documentation or example files, for example:

    How to write a form:
    
    ```html
    <form method="POST" action="/my-account/change-email">
      <input name="csrf" value="AbCdEf123456">
      <input name="email" value="victim@email.com">
    </form>
    ```

    > Or maybe we don't capture the <input> if that's not feasible? WDYT?

    The benefits of finding a CSRF token inside an HTML file are small regardless, so we can probably remove HTML detection altogether.
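
The claim about template placeholders can be checked against the revised HTML pattern quoted earlier in this thread. The sketch below is illustrative only: the template snippets are made up, and the last sample is a hypothetical longer variant of the documentation example above.

```go
package main

import (
	"fmt"
	"regexp"
)

// htmlRe is the revised HTML pattern quoted earlier in this thread.
var htmlRe = regexp.MustCompile(`(?i)<input[^>]+name=["'][\w-]*(?:csrf|xsrf)[\w-]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`)

// htmlToken returns the captured value, or "" when the tag is not matched.
func htmlToken(tag string) string {
	m := htmlRe.FindStringSubmatch(tag)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	samples := []string{
		// Template placeholders: braces, spaces and '<' are outside the value class.
		`<input type="hidden" name="csrfmiddlewaretoken" value="{{ csrf_token }}">`,
		`<input type="hidden" name="_csrf" value="<%= csrfToken %>">`,
		// The documentation example from this thread (12-character value).
		`<input name="csrf" value="AbCdEf123456">`,
		// Hypothetical longer documentation value (20 characters).
		`<input name="csrf" value="AbCdEf1234567890AbCd">`,
	}
	for _, s := range samples {
		fmt.Printf("%q -> %q\n", s, htmlToken(s))
	}
}
```

Only the last sample yields a finding. The 12-character value from the documentation example falls below the 16-character minimum, so a documentation-style false positive needs a longer sample value than the one shown.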

Collaborator

Thanks. Yes, if b is more common and there is no good way to detect obvious false positives (e.g. sample/placeholder values), let's not include the HTML matching for now, and add a comment at the top about the potential limitation. 👍

@alessandro-Doyensec

Collaborator Author

Hi @hanqiuzh

Every conversation should be resolved now. Thanks again for the review!

@alessandro-Doyensec

alessandro-Doyensec commented Jun 11, 2026

Collaborator Author

Note:

I've also modified the "Quoted key value pairs (Logs, JSON, Configs, standard variables)" pattern so that it no longer matches variable assignments, which may leak in the same places where HTML bodies might leak.

@hanqiuzh hanqiuzh left a comment

Collaborator

LGTM

Comment thread veles/secrets/http/testdata/src/http.js Outdated
@@ -0,0 +1,1556 @@
'use strict';

Collaborator

This file has a lot of lint errors because it's a JS file, and they are blocking the PR from being submitted. Can this file be simplified to include only the lines we want to test?
Some errors:
  • Using var (prefer const or let).
  • Bad type annotation for /** @this */.

@@ -0,0 +1,79 @@
import reducer, { fetchApi, initialState } from "./apiSlice";
import { configureStore } from "@reduxjs/toolkit";

Collaborator

The @reduxjs/toolkit import fails the linter because no dependencies are available for this JS file. Maybe comment out this line or simplify the file?

Comment thread veles/secrets/http/testdata/src/test.js Outdated
@@ -0,0 +1,12 @@
async function get<T>(

Collaborator

This fails the linter. Is this file supposed to be TypeScript? Same for the Promise<>.

@alessandro-Doyensec

Collaborator Author

Hello @hanqiuzh

All of the files under veles/secrets/http/testdata/src/** have been copy-pasted from public projects, their sources are listed here:

https://github.com/doyensec/osv-scalibr/blob/523e0c6ff680adac0617919a859cbd41c0586edc/veles/secrets/http/csrf_test.go#L169

They've been helpful to reduce false positives during development, but if they cause troubles I can remove them.

@hanqiuzh

Collaborator

> Hello @hanqiuzh
>
> All of the files under veles/secrets/http/testdata/src/** have been copy-pasted from public projects, their sources are listed here:
>
> https://github.com/doyensec/osv-scalibr/blob/523e0c6ff680adac0617919a859cbd41c0586edc/veles/secrets/http/csrf_test.go#L169
>
> They've been helpful to reduce false positives during development, but if they cause troubles I can remove them.

Yes, unfortunately they are failing the linter on our side because they are JS files. I agree these are good test cases. Maybe move the relevant lines directly into the Go test as test cases, or remove the files 😢.
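
Moving the interesting lines inline could take the shape of a small table of cases, as sketched below. Everything here is illustrative: the pattern is the earlier revision of the quoted key/value regex shown in this review, the `detect` helper is hypothetical, and the input lines are made-up stand-ins rather than lines from the removed files. In a real `_test.go` file the loop body would sit inside `t.Run`.

```go
package main

import (
	"fmt"
	"regexp"
)

// quotedKV is the earlier revision of the quoted key/value pattern shown in
// this review; the merged version may differ.
var quotedKV = regexp.MustCompile(`(?i)(?:x-)?(?:csrf|xsrf)(?:[a-z0-9_.-]*token)["']?\s*[:=]\s*["']([a-zA-Z0-9+/=_-]{16,128})["']`)

type testCase struct {
	name  string
	input string
	want  string // expected captured token; "" means no finding
}

// cases holds single source lines instead of whole vendored files.
// All inputs are made up for illustration.
var cases = []testCase{
	{"json token", `{"csrf_token": "dGhpcyBpcyBhIHRlc3QgdG9rZW4="}`, "dGhpcyBpcyBhIHRlc3QgdG9rZW4="},
	{"cookie lookup", `const csrfToken = getCookie("csrftoken");`, ""},
	{"client config", `xsrfCookieName: "XSRF-TOKEN",`, ""},
	{"header assignment", `headers["X-CSRF-Token"] = token;`, ""},
}

// detect returns the first captured token in input, or "" if none is found.
func detect(input string) string {
	m := quotedKV.FindStringSubmatch(input)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	for _, tc := range cases {
		status := "ok"
		if got := detect(tc.input); got != tc.want {
			status = "FAIL"
		}
		fmt.Printf("%-4s %s\n", status, tc.name)
	}
}
```

Keeping each true-negative as a one-line string preserves the regression value of the original files without adding JS or TypeScript sources for the linter to check.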

@alessandro-Doyensec

Collaborator Author

Hi @hanqiuzh

Every linting error should be resolved now as I removed the src code files.
