Detection · Scoring + LLM Assist

Context-aware scoring with LLM Assist

Each entity type has a base confidence that surrounding keywords boost or penalize. An SSN pattern near "social security" scores 0.95. The same pattern near "product code" in a sample catalog drops to 0.50 and is skipped. A score must reach at least 0.6 to be flagged. An optional LLM Assist pass then removes false positives before any action fires.

Overview

The confidence score

Every potential detection is assigned a score from 0.0 to 1.0. Only detections that meet or exceed the minimum threshold of 0.6 are flagged. Actions (masking, tokenizing, blocking) trigger only at or above this line.

Score range: 0.0 to 1.0, with the threshold at 0.6
Below threshold: ignored
Score < 0.6. No action taken, no log entry produced.
At or above threshold: flagged
Score ≥ 0.6. Policy is evaluated and the configured action fires.
Step 1

Base confidence

When a pattern matches, each entity type starts with a base confidence score reflecting how reliably that pattern identifies genuine sensitive data. Structural validators (the Luhn algorithm for credit cards, check-digit math for IBANs) raise base confidence to around 0.90 before any contextual analysis occurs, compared with 0.72 to 0.85 for the other entity types.

Entity        Base confidence   Validator
SSN           0.80              Pattern only
Credit Card   0.90              Luhn + BIN range
IBAN          0.88              Country check digit
Email         0.85              RFC pattern
Phone         0.72              Format matching
MRN           0.78              Pattern only
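As a rough illustration of this step, the sketch below attaches a base confidence to each pattern match. The scores are copied from the table above; the regular expressions, the `Candidate` type, and `find_candidates` are hypothetical stand-ins, not slim.io's actual rules or API.

```python
import re
from dataclasses import dataclass

# Base confidences copied from the table above.
BASE_CONFIDENCE = {
    "SSN": 0.80,
    "CREDIT_CARD": 0.90,
    "IBAN": 0.88,
    "EMAIL": 0.85,
    "PHONE": 0.72,
    "MRN": 0.78,
}

# Hypothetical, deliberately simple patterns for two entity types.
# Assumption: the real detection patterns are not given in the source.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


@dataclass
class Candidate:
    entity_type: str
    value: str
    start: int
    end: int
    score: float  # starts at the base confidence, adjusted by later steps


def find_candidates(text: str) -> list[Candidate]:
    """Return every pattern match, scored with its entity type's base confidence."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append(
                Candidate(entity_type, m.group(), m.start(), m.end(),
                          BASE_CONFIDENCE[entity_type])
            )
    return sorted(found, key=lambda c: c.start)
```

Each candidate keeps its character offsets so that later steps can inspect the surrounding text.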
Step 2

Context scoring

The text surrounding a match is analyzed for keywords that signal whether the match is genuine sensitive data or an incidental pattern hit. Nearby keywords boost the score; terms associated with sample, test, or non-sensitive contexts apply a penalty. The same raw pattern can land on opposite sides of the threshold depending on context.

Example A: Boosted
"Please update the patient's
social security number: 078-05-1120
in the enrollment form."
Base score · 0.80
Context boost +0.15 ("social security number")
Final score 0.95 ✓ FLAGGED
Example B: Penalized
"Use product code 078-05-1120
when referencing this item
in the sample catalog."
Base score · 0.80
Context penalty −0.30 ("product code", "sample")
Final score 0.50 ✗ SKIPPED
BOOST KEYWORDS
ssn, social security, taxpayer id, national id, patient id, member id, account number, card number, date of birth, dob
PENALTY KEYWORDS
sample, test, example, product code, order number, placeholder, dummy, lorem, reference, demo
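The two examples above can be reproduced with a small scoring function. This is only a sketch under stated assumptions: the keyword lists come from this page, but the per-keyword weight of 0.15, the 80-character window, whole-word matching, and the name `context_score` are all guesses chosen to match the worked examples, not documented slim.io behavior.

```python
import re

THRESHOLD = 0.6  # minimum score to flag, as stated on this page

BOOST_KEYWORDS = [
    "ssn", "social security", "taxpayer id", "national id", "patient id",
    "member id", "account number", "card number", "date of birth", "dob",
]
PENALTY_KEYWORDS = [
    "sample", "test", "example", "product code", "order number",
    "placeholder", "dummy", "lorem", "reference", "demo",
]

# Assumptions (not from the source): each distinct keyword hit moves the
# score by 0.15, and the context window is 80 characters on either side.
BOOST_PER_KEYWORD = 0.15
PENALTY_PER_KEYWORD = 0.15
WINDOW_CHARS = 80


def _hits(keywords, window):
    """Keywords that appear in the window as whole words or phrases."""
    return [kw for kw in keywords
            if re.search(rf"\b{re.escape(kw)}\b", window)]


def context_score(text: str, start: int, end: int, base: float) -> float:
    """Adjust a base confidence using keywords near the match at text[start:end]."""
    window = text[max(0, start - WINDOW_CHARS): end + WINDOW_CHARS].lower()
    boosts = _hits(BOOST_KEYWORDS, window)
    penalties = _hits(PENALTY_KEYWORDS, window)
    score = (base
             + BOOST_PER_KEYWORD * len(boosts)
             - PENALTY_PER_KEYWORD * len(penalties))
    # Clamp to the documented 0.0 to 1.0 range and round away float noise.
    return round(min(1.0, max(0.0, score)), 2)
```

Under these assumptions, Example A hits one boost keyword ("social security") and lands at 0.95, while Example B hits two penalty keywords ("product code" and "sample") and lands at 0.50, below `THRESHOLD`.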
Step 3

Format validation

For entity types where the data format has mathematically verifiable properties, slim.io runs deterministic validators in addition to pattern matching. These validators confirm structural integrity and substantially reduce false positives from random digit sequences.

Luhn algorithm
Credit Cards & SINs
Validates the trailing check digit against a weighted sum of the preceding digits. Eliminates roughly 90% of false positives from random number sequences that happen to match a card number pattern.
BIN range matching
Credit Cards
Cross-references the Bank Identification Number (the first 6 to 8 digits) against known issuer ranges: Visa starts with 4, Mastercard with 51–55 or 2221–2720, Amex with 34 or 37, and Discover with 6011, 644–649, or 65.
IBAN check digits
IBANs
Validates the two-digit check code using mod-97 arithmetic as specified in ISO 13616. Country-specific length rules are also enforced: DE and GB IBANs, for example, are always 22 characters.
SIN Luhn
Canadian SINs
Canadian Social Insurance Numbers pass through the same Luhn check-digit validation applied to credit cards, which verifies the number's structure beyond what pattern matching alone can confirm.
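The check-digit validators described above are standard algorithms and can be sketched in a few lines. The function names `luhn_valid` and `iban_valid` are illustrative, and the length table holds only the two countries named on this page. Treating unlisted countries as "mod-97 only" is an assumption made to keep the example short. BIN range matching is left out.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check, as used for credit card numbers and Canadian SINs."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately left of the check digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


# Lengths stated on this page. Assumption: other countries are not
# length-checked in this sketch.
IBAN_LENGTHS = {"DE": 22, "GB": 22}


def iban_valid(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map
    letters to numbers (A=10 ... Z=35), and require remainder 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not s.isascii() or not s.isalnum():
        return False
    expected = IBAN_LENGTHS.get(s[:2])
    if expected is not None and len(s) != expected:
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

For instance, the widely published test card number 4111 1111 1111 1111 and the sample IBAN GB82 WEST 1234 5698 7654 32 both pass, while changing a single digit in either makes the check fail.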
Outcome

What happens at the threshold

Once a final score is computed, the outcome is binary: the entity is either flagged and routed through your policy, or it is silently dropped with no side effects.

Flagged
Score ≥ 0.6
  • Entity is identified and its position is recorded
  • Configured policy rules are evaluated against the entity type
  • Action fires: mask, hash, tokenize, redact, or block
  • Detection is written to the audit log
Skipped
Score < 0.6
  • Entity is discarded with no action taken
  • No policy evaluation occurs
  • No log entry is written
  • Original data passes through unchanged
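The binary outcome can be pictured as a single routing function. This is a minimal sketch: the dictionary shape of a detection, the `policy` mapping from entity type to action, the `default_action` fallback, and the log line format are all hypothetical, since the page does not describe slim.io's policy model in detail.

```python
THRESHOLD = 0.6  # minimum score to flag, as stated on this page


def route(detections, policy, default_action="mask"):
    """Split scored detections into flagged actions plus audit log lines.

    detections: list of dicts with "type", "start", "end", "score".
    policy: hypothetical mapping of entity type to an action name such as
            "mask", "hash", "tokenize", "redact", or "block".
    """
    actions, audit_log = [], []
    for det in detections:
        if det["score"] < THRESHOLD:
            # Skipped: no policy evaluation, no action, no log entry.
            continue
        # Flagged: record position, evaluate policy, fire action, log it.
        action = policy.get(det["type"], default_action)
        actions.append({**det, "action": action})
        audit_log.append(
            f'{det["type"]} at {det["start"]}-{det["end"]} '
            f'score={det["score"]:.2f} action={action}'
        )
    return actions, audit_log
```

A detection scoring exactly 0.6 is flagged, because the comparison skips only scores strictly below the threshold.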
Optional · Step 4

LLM Assist review

When enabled, slim.io runs a second pass after pattern-based scoring. An LLM reviews each flagged entity in context and returns a verdict of true_positive or false_positive. False positives are removed before any action fires.

How it works
1. Pattern matching produces a scored candidate set. All entities at or above the 0.6 threshold are queued for review.
2. Each entity value is tokenized before the LLM sees it. The model receives the surrounding context and a type-tagged placeholder, never the raw sensitive value.
3. The LLM returns a verdict for each entity. Detections marked false_positive are dropped. The remaining set proceeds to policy and action.
4. The stage is fail-open: if the LLM endpoint errors or times out, all findings are preserved and forwarded unchanged.
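The review flow above can be sketched as follows. The LLM call is represented by a caller-supplied `review` function, so nothing here reflects a real slim.io or model-provider API. The function names, the finding dictionaries, the 80-character window, and the choice to keep findings with unrecognized verdicts are all assumptions. Batching, concurrency, and the fpe tokenization mode are omitted, and a timeout is assumed to surface as an exception raised by `review`.

```python
def tokenize(text, findings):
    """Replace each finding's span with a [TYPE] tag.

    Returns the redacted text and (finding, (start, end)) pairs giving each
    tag's position in the redacted text. Assumes findings do not overlap.
    """
    parts, tagged, cursor, length = [], [], 0, 0
    for f in sorted(findings, key=lambda f: f["start"]):
        parts.append(text[cursor:f["start"]])
        length += f["start"] - cursor
        tag = f"[{f['type']}]"
        tagged.append((f, (length, length + len(tag))))
        parts.append(tag)
        length += len(tag)
        cursor = f["end"]
    parts.append(text[cursor:])
    return "".join(parts), tagged


def llm_assist(text, findings, review, window=80):
    """Drop findings the reviewer marks false_positive; fail open on errors.

    review(context, entity_type) is a hypothetical callable returning
    "true_positive" or "false_positive".
    """
    try:
        redacted, tagged = tokenize(text, findings)
        kept = []
        for finding, (start, end) in tagged:
            # The reviewer sees only tags and surrounding text, never raw values.
            context = redacted[max(0, start - window): end + window]
            if review(context, finding["type"]) != "false_positive":
                kept.append(finding)
        return kept
    except Exception:
        # Fail-open: on any error or timeout, forward all findings unchanged.
        return list(findings)
```

Because every finding is replaced before any context is sliced, a raw value cannot leak into the window of a neighboring finding.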
Configuration
llm_assist_enabled true
tokenize_before_llm true
tokenization_mode type_tagged / fpe
batch_size 50
max_concurrency 4
timeout_per_finding_s 5
Privacy guarantee: Raw entity values are never sent to an LLM. The review operates on [SSN]-style tags and surrounding context only.
Opt-in: LLM Assist is disabled by default. Enable it per scan config when you need lower false-positive rates at the cost of a small latency increase per batch.