HTTP CSRF token by alessandro-Doyensec · Pull Request #2182 · google/osv-scalibr

alessandro-Doyensec · 2026-06-03T12:37:40Z

This PR adds the logic to support CSRF token detection in:

JSON/source code
Logs
HTTP Dumps
HTML

Depends on:

HTTP Bearer Detector #2126

Note

I added real source code files for the true-negative tests and cited their sources. I can remove them if there are any licensing issues.


hanqiuzh · 2026-06-08T20:31:47Z

+	regexp.MustCompile(`(?im)^(?:x-)?(?:csrf|xsrf)(?:[a-z0-9_.-]*token):\s+([a-zA-Z0-9+/=_-]{16,128})\b`),
+
+	// HTML Tag: 'name' comes before 'value'.
+	regexp.MustCompile(`(?i)<input[^>]+name=["'][^"'>]*(?:csrf|xsrf)[^"'>]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`),


Why [^"'>]* before and after (?:csrf|xsrf)? What are some sample names? [^"'>]* seems a bit broad, and the first 2 regex are using (?:x-)? for prefix, which is different from the target here.

Why [^"'>]* before and after (?:csrf|xsrf)? What are some sample names?

[^"'>]* matches any run of characters except quotes (" and ') and >. The reasoning behind it is that inside HTML we have more context, so I used a broader regex to match unknown patterns (since there isn't a clear standard for how CSRF tokens are stored inside HTML). A few example names are:

csrf

csrfmiddlewaretoken

csrf_token_form

csrf_token

_csrf

csrf_cookie

csrf.token
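To see which of these names the first-draft HTML pattern accepts, here is a small Go sketch that runs the regex from the diff above against a hidden `<input>` for each name. The `inputTag` helper, the extra `x-csrf-token` and `email` entries, and the value `FAKEfakeFAKEfake1234` are made up for illustration; this is not code from the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// htmlNameBeforeValue is the first-draft HTML pattern from the diff above,
// copied verbatim ('name' attribute before 'value').
var htmlNameBeforeValue = regexp.MustCompile(`(?i)<input[^>]+name=["'][^"'>]*(?:csrf|xsrf)[^"'>]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`)

// inputTag is a hypothetical helper that wraps a field name in a hidden
// <input> carrying a fake 20-character token.
func inputTag(name string) string {
	return `<input type="hidden" name="` + name + `" value="FAKEfakeFAKEfake1234">`
}

func main() {
	names := []string{
		"csrf", "csrfmiddlewaretoken", "csrf_token_form", "csrf_token",
		"_csrf", "csrf_cookie", "csrf.token",
		"x-csrf-token", // prefix absorbed by the leading [^"'>]*
		"email",        // control: no csrf/xsrf keyword
	}
	for _, n := range names {
		if m := htmlNameBeforeValue.FindStringSubmatch(inputTag(n)); m != nil {
			fmt.Printf("%-20s token=%s\n", n, m[1])
		} else {
			fmt.Printf("%-20s no match\n", n)
		}
	}
}
```

All seven names from the list match, as does the `x-` prefixed variant, while the control field does not.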

the first 2 regex are using (?:x-)? for prefix, which is different from the target here.

This was already covered by the [^"'>]* part, which allows any characters other than quotes and > to precede the csrf or xsrf keywords.

That said, I agree that I can make the regex stricter and add these examples as test cases.

Thanks. Yes, I think making the regex stricter is better here:

There is no validation in general, so it's better to reduce false positives by using the context.

With a relaxed regex, we may match something unrelated that is not a CSRF token (e.g. "csrf" appearing as part of some acronym in an input field name).
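Here is a minimal Go sketch of the concern. It compares the first-draft pattern with the `[\w-]*` variant that appears in a later revision of this PR. The broad `[^"'>]*` class also accepts spaces and punctuation inside the name, while `[\w-]*` does not. Neither class stops the keyword from matching as a substring of a longer identifier. Both input strings are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// relaxed is the first-draft HTML pattern ([^"'>]* around the keyword).
var relaxed = regexp.MustCompile(`(?i)<input[^>]+name=["'][^"'>]*(?:csrf|xsrf)[^"'>]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`)

// stricter is the later revision ([\w-]* around the keyword).
var stricter = regexp.MustCompile(`(?i)<input[^>]+name=["'][\w-]*(?:csrf|xsrf)[\w-]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`)

func main() {
	// Hypothetical inputs; neither value is a real token.
	inputs := []string{
		`<input name="disable csrf check" value="FAKEfakeFAKEfake1234">`, // spaces in the name
		`<input name="acsrfx_field" value="FAKEfakeFAKEfake1234">`,       // keyword buried in a longer identifier
	}
	for _, in := range inputs {
		fmt.Printf("relaxed=%-5v stricter=%-5v %s\n", relaxed.MatchString(in), stricter.MatchString(in), in)
	}
}
```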

hanqiuzh · 2026-06-08T20:33:09Z

+	// Quoted key value pairs (Logs, JSON, Configs, standard variables).
+	//
+	// Note: the value must be contained inside `'` or `"` to reduce false positive in case of a variable assignment in source code
+	regexp.MustCompile(`(?i)(?:x-)?(?:csrf|xsrf)(?:[a-z0-9_.-]*token)["']?\s*[:=]\s*["']([a-zA-Z0-9+/=_-]{16,128})["']`),


For (?:[a-z0-9_.-]*token), what are some of the target keywords?

Here are a few examples:

csrf-token

csrfmiddlewaretoken

csrf_form_token

csrf.token (which may happen in cases like this: https://grep.app/adbar/trafilatura/master/tests/eval/iwr.de.IWRpressedienst.Nordex.html?q=%22csrf.token%22#L27)
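These keywords can be checked against the quoted key/value pattern with a short Go sketch. The snippets and the fake token are hypothetical. The last two lines illustrate two properties of the pattern: unquoted values are skipped, as the note in the diff describes, and a `token` suffix is required.

```go
package main

import (
	"fmt"
	"regexp"
)

// quotedKV is the quoted key/value pattern from the diff above, copied verbatim.
var quotedKV = regexp.MustCompile(`(?i)(?:x-)?(?:csrf|xsrf)(?:[a-z0-9_.-]*token)["']?\s*[:=]\s*["']([a-zA-Z0-9+/=_-]{16,128})["']`)

func main() {
	// Hypothetical snippets; the token values are fake.
	lines := []string{
		`{"csrf-token": "FAKEfakeFAKEfake1234"}`,       // JSON
		`csrfmiddlewaretoken = 'FAKEfakeFAKEfake1234'`, // config / source
		`csrf_form_token: "FAKEfakeFAKEfake1234"`,      // YAML-style
		`window.csrf.token = "FAKEfakeFAKEfake1234";`,  // dotted key
		`csrf_token = getToken()`,                      // unquoted value: skipped
		`csrf: "FAKEfakeFAKEfake1234"`,                 // no "token" suffix: skipped
	}
	for _, l := range lines {
		if m := quotedKV.FindStringSubmatch(l); m != nil {
			fmt.Printf("match  %s -> %s\n", l, m[1])
		} else {
			fmt.Printf("skip   %s\n", l)
		}
	}
}
```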

alessandro-Doyensec · 2026-06-09T09:46:44Z

Hi @hanqiuzh

Thanks for the review; I left some replies to your comments.

alessandro-Doyensec · 2026-06-10T09:22:34Z

Hello @hanqiuzh

The regexes should be aligned with your comments now. I still decided to use \w* in the HTML patterns to match more edge cases. Feel free to suggest any changes you may deem more correct.

Thanks in advance!

hanqiuzh · 2026-06-10T13:37:17Z

+	regexp.MustCompile(`(?im)^(?:x[-_])?(?:csrf|xsrf)(?:[-_]?middleware)?(?:[-_]?token)?:\s+([a-zA-Z0-9+/=_-]{16,128})\b`),
+
+	// HTML Tag: 'name' comes before 'value'.
+	regexp.MustCompile(`(?i)<input[^>]+name=["'][\w-]*(?:csrf|xsrf)[\w-]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`),
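Here is a quick Go sketch of the revised header pattern applied to a made-up HTTP request dump and a few hypothetical header lines. It assumes one header per line, since the pattern relies on `(?m)^`.

```go
package main

import (
	"fmt"
	"regexp"
)

// csrfHeader is the revised header pattern for HTTP dumps and logs, copied
// from the diff above.
var csrfHeader = regexp.MustCompile(`(?im)^(?:x[-_])?(?:csrf|xsrf)(?:[-_]?middleware)?(?:[-_]?token)?:\s+([a-zA-Z0-9+/=_-]{16,128})\b`)

func main() {
	// Hypothetical request dump; (?m) lets ^ anchor at every line start.
	dump := "POST /my-account/change-email HTTP/1.1\n" +
		"Host: example.com\n" +
		"X-CSRF-Token: FAKEfakeFAKEfake1234\n" +
		"Content-Type: application/x-www-form-urlencoded\n"
	for _, m := range csrfHeader.FindAllStringSubmatch(dump, -1) {
		fmt.Println("token from dump:", m[1])
	}

	for _, h := range []string{
		"X-XSRF-TOKEN: FAKEfakeFAKEfake1234",
		"csrfmiddlewaretoken: FAKEfakeFAKEfake1234",
		"csrf__token: FAKEfakeFAKEfake1234",  // double separator is rejected
		"X-CSRF-Token: short",                // shorter than 16 characters
		"X-CSRF-Token: FAKEfakeFAKEfake12==", // trailing '=' is cut by \b
	} {
		fmt.Printf("%q -> %q\n", h, csrfHeader.FindStringSubmatch(h))
	}
}
```

As the last case suggests, the closing `\b` means a value ending in base64 padding is captured without its trailing `=` characters, because `=` is not a word character. Whether that matters depends on how the captured token is used downstream.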


Thanks for updating the first two regexes.
I'm still not sure about the two HTML checks.

For the [\w-]* parts, would you mind adding some comments about which patterns they target, and why we need wildcards both before and after the keyword to capture them?

I'm wondering in which scenarios we expect HTML tag matching to work. I can imagine two:
a. Logs of all HTTP requests (body + header mode), where we find the input fields in the body.
b. Static HTML files in source code: the input value is less likely to be sensitive, since it's probably just a placeholder; the token should be generated dynamically (a static one wouldn't help).
I haven't done much research on it, so it's hard to tell how often we will end up in scenario b, but I feel b would almost always be a false positive. Again, the goal is to reduce false positives, as there is no validation. One possible solution is to add a check that skips reporting some tokens based on keywords or entropy, but that would take additional effort. Or maybe we don't capture the <input> if that's not feasible? WDYT?
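The keyword/entropy check suggested above could look roughly like this Go sketch. Everything in it is hypothetical: the helper names, the deny-list of placeholder substrings, and the 3.0 bits-per-character threshold are illustrative and untuned, and none of it is part of the PR.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// shannonEntropy returns the Shannon entropy of s in bits per byte.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// placeholderHints is a hypothetical deny-list of substrings that suggest a sample value.
var placeholderHints = []string{"example", "sample", "placeholder", "dummy", "changeme", "xxxx"}

// minEntropy is an arbitrary threshold chosen for illustration only.
const minEntropy = 3.0

// looksLikeRealToken is a hypothetical post-filter: drop candidates that
// contain a placeholder hint or whose entropy is below minEntropy.
func looksLikeRealToken(candidate string) bool {
	lower := strings.ToLower(candidate)
	for _, hint := range placeholderHints {
		if strings.Contains(lower, hint) {
			return false
		}
	}
	return shannonEntropy(candidate) >= minEntropy
}

func main() {
	for _, c := range []string{"aaaaaaaaaaaaaaaa", "exampleCsrfToken123", "Zk3u9QpL2xVt7RbN"} {
		fmt.Printf("%-20s entropy=%.2f keep=%v\n", c, shannonEntropy(c), looksLikeRealToken(c))
	}
}
```

Entropy alone is a weak signal on short strings. For example, a 16-character run of distinct letters such as `abcdefghijklmnop` scores the maximum 4.0 bits, so a filter like this would need tuning before it could be relied on.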

Sure

Yes, option b is probably the more common one. That said, CSRF tokens might still end up hardcoded in source files (even though they shouldn't be).

Ideally, the number of false positives should already be limited, since the regex only matches a base64-like string enclosed in quotes (so template placeholders should not be matched).
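To illustrate the placeholder argument, here is a Go sketch that runs the revised HTML pattern against two hypothetical template-style values and two hardcoded fake values. Note that the pattern accepts either single or double quotes around the value.

```go
package main

import (
	"fmt"
	"regexp"
)

// htmlInput is the revised HTML pattern from earlier in this review.
var htmlInput = regexp.MustCompile(`(?i)<input[^>]+name=["'][\w-]*(?:csrf|xsrf)[\w-]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`)

func main() {
	inputs := []string{
		// Template placeholders: braces, spaces and '$' fall outside the value charset.
		`<input type="hidden" name="csrf_token" value="{{ csrf_token }}">`,
		`<input type="hidden" name="_csrf" value="${_csrf.token}">`,
		// Hardcoded literals (fake values), in double and single quotes.
		`<input type="hidden" name="_csrf" value="FAKEfakeFAKEfake1234">`,
		`<input type='hidden' name='csrf' value='FAKEfakeFAKEfake1234'>`,
	}
	for _, in := range inputs {
		fmt.Printf("matched=%-5v %s\n", htmlInput.MatchString(in), in)
	}
}
```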

It might still catch false positives in cases such as documentation or example files, for example:

How to write a form:

```html
<form method="POST" action="/my-account/change-email">
  <input name="csrf" value="AbCdEf123456">
  <input name="email" value="victim@email.com">
</form>
```
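Checking this example against the revised HTML pattern with a short Go sketch shows two things. The 12-character value above falls below the `{16,128}` length bound and is not matched. The same snippet with a hypothetical 20-character sample value is matched, because nothing in the pattern distinguishes documentation from a real page.

```go
package main

import (
	"fmt"
	"regexp"
)

// htmlInput is the revised HTML pattern from earlier in this review.
var htmlInput = regexp.MustCompile(`(?i)<input[^>]+name=["'][\w-]*(?:csrf|xsrf)[\w-]*["'][^>]+value=["']([a-zA-Z0-9+/=_-]{16,128})["']`)

func main() {
	snippets := []string{
		`<input name="csrf" value="AbCdEf123456">`,         // from the example: 12 characters
		`<input name="csrf" value="AbCdEf1234567890AbCd">`, // hypothetical 20-character sample
	}
	for _, s := range snippets {
		if m := htmlInput.FindStringSubmatch(s); m != nil {
			fmt.Printf("match    %s -> %s\n", s, m[1])
		} else {
			fmt.Printf("no match %s\n", s)
		}
	}
}
```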

Or maybe we don't capture the <input> if that's not feasible? WDYT?

The benefit of finding a CSRF token inside an HTML file is small regardless, so we can probably remove HTML detection altogether.

Thanks. Yes, if b is more common and there is no good way to detect obvious false positives (e.g. samples/placeholders), let's not include HTML matching for now, and add a comment at the top about the potential limitation. 👍

alessandro-Doyensec · 2026-06-11T09:48:34Z

Hi @hanqiuzh

Every conversation should be resolved now. Thanks again for the review!

alessandro-Doyensec · 2026-06-11T11:13:45Z

Note:

I've also modified the "Quoted key value pairs (Logs, JSON, Configs, standard variables)" pattern so that it no longer matches variable assignments, which may leak in the same places where HTML bodies might leak.

hanqiuzh

LGTM

hanqiuzh · 2026-06-11T16:43:09Z

@@ -0,0 +1,1556 @@
+'use strict';


This file has a lot of lint errors because it's a JS file, and they are blocking the PR from being submitted. Can this file be simplified to include only the lines we want to test?
Some errors:
Using var (prefer const or let).
Bad type annotation for /** @this */.

hanqiuzh · 2026-06-11T16:45:07Z

@@ -0,0 +1,79 @@
+import reducer, { fetchApi, initialState } from "./apiSlice";
+import { configureStore } from "@reduxjs/toolkit";


@reduxjs/toolkit fails the linter because there are no dependencies for this JS file. Maybe comment out this line or simplify the file?

hanqiuzh · 2026-06-11T16:51:35Z

@@ -0,0 +1,12 @@
+async function get<T>(


Fails the linter. Is this file supposed to be TypeScript? Same for the Promise<>.

alessandro-Doyensec · 2026-06-11T17:22:08Z

Hello @hanqiuzh

All of the files under veles/secrets/http/testdata/src/** were copy-pasted from public projects; their sources are listed here:

https://github.com/doyensec/osv-scalibr/blob/523e0c6ff680adac0617919a859cbd41c0586edc/veles/secrets/http/csrf_test.go#L169

They've been helpful for reducing false positives during development, but if they cause trouble I can remove them.

hanqiuzh · 2026-06-11T17:28:14Z


Yes, unfortunately they fail the linter on our side because they are JS files. I agree these are good test cases. Maybe move the lines of interest directly into the Go test, or remove them 😢.
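One way to follow this suggestion is to keep only the interesting lines inline in Go. The sketch below uses the header and quoted key/value patterns as quoted earlier in this review (the merged versions may differ). Its `trueNegatives` lines are hypothetical, hand-written JavaScript rather than lines from the removed files. In the repository, this check would more naturally be a table-driven `_test.go` case.

```go
package main

import (
	"fmt"
	"regexp"
)

// detectors holds the header and quoted key/value patterns as quoted in this review.
var detectors = []*regexp.Regexp{
	regexp.MustCompile(`(?im)^(?:x[-_])?(?:csrf|xsrf)(?:[-_]?middleware)?(?:[-_]?token)?:\s+([a-zA-Z0-9+/=_-]{16,128})\b`),
	regexp.MustCompile(`(?i)(?:x-)?(?:csrf|xsrf)(?:[a-z0-9_.-]*token)["']?\s*[:=]\s*["']([a-zA-Z0-9+/=_-]{16,128})["']`),
}

// trueNegatives are hypothetical source lines that mention CSRF but contain no token.
var trueNegatives = []string{
	`const csrfToken = document.querySelector('meta[name="csrf-token"]').content;`,
	`headers: { 'X-CSRF-Token': csrfToken },`,
	`xhr.setRequestHeader("X-CSRF-Token", token);`,
	`if (!csrf_token) throw new Error("missing csrf token");`,
}

// flagged returns every input that at least one detector matches.
func flagged(inputs []string) []string {
	var out []string
	for _, in := range inputs {
		for _, re := range detectors {
			if re.MatchString(in) {
				out = append(out, in)
				break
			}
		}
	}
	return out
}

func main() {
	fmt.Println("false positives:", flagged(trueNegatives))
	fmt.Println("positive control:", flagged([]string{"X-CSRF-Token: FAKEfakeFAKEfake1234"}))
}
```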

alessandro-Doyensec · 2026-06-11T17:47:43Z

Hi @hanqiuzh

All linting errors should be resolved now, as I removed the source code files.

alessandro-Doyensec added 4 commits June 8, 2026 19:56

init

91fc414

add: nginx log csrf token

d1f95ec

edit: remove csrf_token to reduce false positive rate

e9a392d

add: real source code files

41a5f6f

alessandro-Doyensec force-pushed the http-csrf-token branch from a25bba5 to 9621f6d on June 8, 2026 17:57

alessandro-Doyensec added 3 commits June 8, 2026 20:02

add: plugin registration; add: secret to scan result proto; add: proto conversion logic; add: csrf token to supported inventory types

837c741

edit: make http dumps and logs and quoted assignemnt key stricter

073587e

fix: remove tab before CSRF header in dumps

0a7cd7d

alessandro-Doyensec force-pushed the http-csrf-token branch from 9621f6d to 0a7cd7d on June 8, 2026 18:03

hanqiuzh reviewed Jun 8, 2026

alessandro-Doyensec added 2 commits June 10, 2026 11:19

edit: make regexes stricter

3f651af

edit: fix regex to exclude csrf__ as a valid value

a1a34bd

hanqiuzh reviewed Jun 10, 2026

edit: remove html tag detection

73a9b3a

edit: remove csrf_token assignments

523e0c6

hanqiuzh approved these changes Jun 11, 2026

hanqiuzh reviewed Jun 11, 2026

edit: remove src file to avoid linting errors

0316837

2 participants
