Sensitive Information - SSN Detection #2208

Open
SzymonDrosdzol wants to merge 26 commits into google:main from doyensec:US-SSN


@SzymonDrosdzol


This PR introduces the first detector for sensitive information. It uses the `sensitiveinformation/common/simpleregex` package to detect US Social Security Numbers (SSNs).

Since this is the first detector built on the sensitive-information simple regex package, I had to introduce some changes and new patterns. I'll highlight them below in review comments on the code.

```go
SensitiveInformationDetectors = initMapFromVelesPlugins([]velesPlugin{
	{ssn.NewDetector(), "sensitiveinformation/ssn", 0},
})
```

Author

I created a new collection for `sensitiveinformation` plugins.
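A minimal sketch of what such a name-keyed collection might look like. `Detector`, `plugin`, `initMap` and `ssnDetector` are hypothetical stand-ins for `veles.Detector`, `velesPlugin`, `initMapFromVelesPlugins` and the SSN detector, and the third plugin field is only assumed to be a version number; the real types in the repository may differ. The idea shown is that each plugin is registered under a unique name such as `sensitiveinformation/ssn`, so enabling it is a map lookup and a duplicate name fails at start-up.

```go
package main

import "fmt"

// Detector is a hypothetical, simplified stand-in for veles.Detector.
type Detector interface {
	Name() string
}

// ssnDetector is a placeholder detector used only for this example.
type ssnDetector struct{}

func (ssnDetector) Name() string { return "ssn" }

// plugin mirrors the shape of the velesPlugin literal in the PR:
// a detector, its registry name and (assumed) a version number.
type plugin struct {
	detector Detector
	name     string
	version  int
}

// initMap builds a name -> detector collection and panics on duplicate
// names, so a registration mistake surfaces as soon as the program starts.
func initMap(plugins []plugin) map[string]Detector {
	m := make(map[string]Detector, len(plugins))
	for _, p := range plugins {
		if _, dup := m[p.name]; dup {
			panic(fmt.Sprintf("duplicate plugin name %q", p.name))
		}
		m[p.name] = p.detector
	}
	return m
}

// SensitiveInformationDetectors is a hypothetical collection for
// sensitive-information plugins.
var SensitiveInformationDetectors = initMap([]plugin{
	{ssnDetector{}, "sensitiveinformation/ssn", 0},
})

func main() {
	d, ok := SensitiveInformationDetectors["sensitiveinformation/ssn"]
	fmt.Println(ok, d.Name())
}
```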

```diff
 type Detector struct {
 	// The maximum length of the sensitive information.
-	maxLen uint32
+	MaxLen uint32
```

Author

I modified the `Detector` struct to make all of its fields public (exported), so that detectors in other packages can set them directly.
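Exported fields are what let another package build the detector with a keyed composite literal, as the SSN `NewDetector` does with `MaxLen` and `Re`; an unexported field such as `maxLen` cannot be set from outside its package. A single file cannot show that cross-package restriction, so the sketch below only illustrates the construction pattern. The `Detector`, its `Detect` method and `newSSNDetector` are hypothetical and are not the actual `simpleregex` implementation; in particular, using `MaxLen` to drop longer matches is an assumption made for the example, and the regex and length are placeholders.

```go
package main

import (
	"fmt"
	"regexp"
)

// Detector is a hypothetical, trimmed-down regex detector, not the real
// simpleregex.Detector. Its fields are exported, so code in any package
// can configure it with a keyed composite literal.
type Detector struct {
	// The maximum length of the sensitive information.
	MaxLen uint32
	// Re matches candidate values.
	Re *regexp.Regexp
}

// Detect returns every match of d.Re in data that is at most MaxLen bytes
// long. The returned slices alias data (no copy is made).
func (d Detector) Detect(data []byte) [][]byte {
	var out [][]byte
	for _, m := range d.Re.FindAll(data, -1) {
		if uint32(len(m)) <= d.MaxLen {
			out = append(out, m)
		}
	}
	return out
}

// newSSNDetector mirrors the shape of NewDetector in the PR; the pattern
// and MaxLen value here are placeholders.
func newSSNDetector() Detector {
	return Detector{
		MaxLen: 11, // len("AAA-GG-SSSS")
		Re:     regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`),
	}
}

func main() {
	for _, m := range newSSNDetector().Detect([]byte("id: 123-45-6789, ref: 987-65-4321")) {
		fmt.Println(string(m))
	}
}
```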

```go
		Sensitivity: sensitiveinformation.SensitivityLevelModerate,
	},
	Likelihood: sensitiveinformation.LikelihoodLikely,
	Raw: bytes.Clone(b),
```

@SzymonDrosdzol Jun 11, 2026

Author

Up for discussion:
Assigning the incoming `b` byte slice directly to the `Raw` field breaks tests. Slices share their underlying array, so any later modification of `b` by the detector also changes the `Raw` value.
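The aliasing problem can be reproduced in a few lines. In this self-contained sketch, `finding`, `detectAliased` and `detectCloned` are hypothetical stand-ins for the sensitive-information struct and the detector's match handling, and the buffer is overwritten after detection, roughly as a reused read buffer would be. The aliased `Raw` changes along with the buffer, while the `bytes.Clone` copy (Go 1.20+) does not.

```go
package main

import (
	"bytes"
	"fmt"
)

// finding is a hypothetical stand-in for the SensitiveInformation struct.
type finding struct {
	Raw []byte
}

// detectAliased stores the match without copying: Raw shares its backing
// array with b.
func detectAliased(b []byte) finding { return finding{Raw: b} }

// detectCloned stores an independent copy of the match.
func detectCloned(b []byte) finding { return finding{Raw: bytes.Clone(b)} }

func main() {
	buf := []byte("123-45-6789")
	aliased := detectAliased(buf)
	cloned := detectCloned(buf)

	// Simulate the buffer being reused for the next chunk of input.
	for i := range buf {
		buf[i] = 'X'
	}

	fmt.Println(string(aliased.Raw)) // all 'X' now
	fmt.Println(string(cloned.Raw))  // 123-45-6789
}
```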

Since we have to clone the bytes anyway, could we store strings instead of byte slices in the `SensitiveInformation` struct?
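Storing a `string` would give the same isolation without an explicit clone, because converting a `[]byte` to a `string` copies the bytes (Go strings are immutable). A minimal sketch, assuming a hypothetical `findingStr` type with a `Raw string` field and a hypothetical `detect` helper; the trade-off is that consumers needing bytes must convert back, which generally costs another copy.

```go
package main

import "fmt"

// findingStr is a hypothetical variant of the finding struct that keeps
// the raw value as an immutable string.
type findingStr struct {
	Raw string
}

// detect converts the match to a string; the conversion copies the bytes,
// so later writes to b do not affect the stored value.
func detect(b []byte) findingStr { return findingStr{Raw: string(b)} }

func main() {
	buf := []byte("123-45-6789")
	f := detect(buf)
	for i := range buf {
		buf[i] = 'X' // simulate buffer reuse
	}
	fmt.Println(f.Raw) // 123-45-6789
}
```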

```go
func NewDetector() veles.Detector {
	return simpleregex.Detector{
		MaxLen: maxSecretLength,
		Re:     ssnRe,
```

Author

Because the SSN format is fairly distinctive, we decided against additional `KeywordsRe` filtering.
We're open to reconsidering.
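For context on why the format is distinctive: an SSN is written as `AAA-GG-SSSS`, and the SSA never issues area numbers 000, 666 or 900-999, group number 00, or serial number 0000. Go's RE2 engine has no lookahead, so such exclusions are simpler to check in code after a regex match. The sketch below uses a hypothetical anchored pattern and a hypothetical `validSSN` helper; it is not the `ssnRe` from this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// ssnPattern is a hypothetical pattern for a single dashed AAA-GG-SSSS
// candidate. RE2 has no lookahead, so never-issued ranges are rejected in
// validSSN rather than in the regex itself.
var ssnPattern = regexp.MustCompile(`^(\d{3})-(\d{2})-(\d{4})$`)

// validSSN reports whether s is a dashed SSN whose area, group and serial
// numbers avoid the never-issued values.
func validSSN(s string) bool {
	m := ssnPattern.FindStringSubmatch(s)
	if m == nil {
		return false
	}
	area, group, serial := m[1], m[2], m[3]
	// Area numbers 000, 666 and 900-999 are never issued.
	if area == "000" || area == "666" || area[0] == '9' {
		return false
	}
	// Group 00 and serial 0000 are never issued.
	return group != "00" && serial != "0000"
}

func main() {
	for _, s := range []string{"123-45-6789", "000-12-3456", "912-34-5678", "123-00-4567"} {
		fmt.Println(s, validSSN(s))
	}
}
```

A post-match check like this trims obvious placeholders such as `000-00-0000` without relying on surrounding keywords.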

@SzymonDrosdzol marked this pull request as ready for review on June 11, 2026 15:41

Labels

None yet

Projects

None yet


1 participant