Explanations
This neuron detects references to rules or restrictions (e.g., guidelines, policies, ethics, morality, filters).
NEGATIVE LOGITS
asters      -0.07
-aos        -0.07
pek         -0.06
unions      -0.06
vite        -0.06
茂           -0.06
Midwest     -0.06
ランド         -0.06
MF          -0.06
show        -0.06
POSITIVE LOGITS
Vitamin                 0.08
REFER                   0.07
ISSUE                   0.07
_demand                 0.07
(contact                0.07
rients                  0.07
athed                   0.06
graceful                0.06
::*;↵                   0.06
RoundedRectangleBorder  0.06
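Logit lists like these are commonly produced by projecting the neuron's output direction onto the model's unembedding vectors and keeping the k lowest and k highest scores. The sketch below assumes that method; the dashboard above does not say how its numbers were computed. The functions `logit_effects` and `top_logits`, the two-dimensional direction, and the `tok_*` vocabulary are all hypothetical toy values. They do not reproduce the numbers listed above.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(direction: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of the neuron's output direction
    with each token's unembedding vector (its direct effect on that logit)."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}

def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most suppressed and the k most promoted tokens."""
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive

# Toy, made-up inputs (assumption): a 2-d direction and four tokens.
direction = [0.5, -0.2]
unembed = {
    "tok_a": [0.1, -0.1],
    "tok_b": [0.1, 0.0],
    "tok_c": [-0.1, 0.1],
    "tok_d": [0.0, 0.2],
}
effects = logit_effects(direction, unembed)
neg, pos = top_logits(effects, 2)
print("NEGATIVE", [(t, round(v, 2)) for t, v in neg])
print("POSITIVE", [(t, round(v, 2)) for t, v in pos])
```

In a real model the direction would be the neuron's output weights and the vocabulary would have tens of thousands of entries. The ranking step stays the same.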
Activation density: 0.008%
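A density of 0.008% means the neuron fires on about 8 of every 100,000 token positions, which makes it a very sparse unit. The sketch below assumes density is the percentage of sampled tokens whose activation is above zero. The dashboard does not state its threshold or sample, and the `activation_density` helper is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy check: 8 firing positions out of 100,000 tokens gives 0.008%.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 1000] = 1.0  # mark 8 spread-out positions as firing
density = activation_density(acts)
print(f"{density}%")
```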