INDEX
Explanations
phrases related to rules, regulation, and prohibition
terms related to restrictions or bans
Negative Logits
Token     Logit
eah       -0.81
eon       -0.74
neau      -0.68
visors    -0.68
Norn      -0.67
etics     -0.67
prise     -0.66
igma      -0.66
itched    -0.66
tone      -0.66
Positive Logits
Token         Logit
prohibited    0.80
bidden        0.77
forbids       0.76
prohibition   0.76
nudity        0.76
substances    0.74
interfere     0.73
dylib         0.72
forbidden     0.72
prohibit      0.71
Activations Density 0.053%
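
The page does not say how the density figure is computed. A common reading, assumed here, is the share of sampled token positions where the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of sampled token positions
    on which the feature fires, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 53 of 100,000 sampled positions.
sample = [1.0] * 53 + [0.0] * 99_947
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this assumption, a density of 0.053% would mean the feature fires on roughly one token in every 1,900.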