INDEX

Explanations

modern LLMs

The neuron is tuned to detect privacy- and security-policy language, i.e. words naming protective or regulatory actions (protect, notify, respect, attempt, etc.).

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token      Logit
"م"        0.79
"ので"     0.76
"在地"     0.72
"sta"      0.71
"不大"     0.70
"Các"      0.70
"Jeśli"    0.68
"STATE"    0.67
"リ"       0.67
"不太"     0.66

Positive Logits

Token             Logit
" recomb"         0.95
" protease"       0.93
" reductase"      0.91
" bioqu"          0.86
" тща"            0.82
" soliton"        0.82
" vál"            0.80
" эпо"            0.80
" synchrotron"    0.79
"жнее"            0.79

Activations Density 0.001%
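
To put the density figure in context, the sketch below turns it into an approximate token count using the prompt configuration listed on this page. It assumes that "Activations Density" means the fraction of dashboard tokens on which the neuron fires, and that the displayed 0.001% is a rounded value, so the result is an order-of-magnitude estimate only. The function name estimate_active_tokens is hypothetical and is not part of any dashboard API.

```python
def estimate_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total_tokens, estimated_active_tokens).

    Assumption: density_percent is the share of all dashboard tokens
    on which the neuron has a nonzero activation.
    """
    total_tokens = num_prompts * tokens_per_prompt
    active = total_tokens * density_percent / 100.0
    return total_tokens, active


# Figures taken from the page: 392,802 prompts, 256 tokens each, density 0.001%.
total, active = estimate_active_tokens(392_802, 256, 0.001)
print(f"total tokens: {total:,}")
print(f"estimated activating tokens: about {active:,.0f}")
```

Under these assumptions the neuron fires on roughly a thousand of about 100 million tokens, which fits a narrowly selective unit.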