INDEX
Explanations
instances where an action is prohibited or restricted
terms related to restrictions or bans
Negative Logits
eon          -0.74
arger        -0.74
temp         -0.72
along        -0.69
Roy          -0.67
illation     -0.67
LV           -0.66
ensional     -0.65
Transform    -0.65
PE           -0.64
Positive Logits
prohibited   1.28
etheless     1.05
forbidden    0.98
bidden       0.90
forbids      0.85
prohibits    0.85
wana         0.81
avorite      0.80
permitted    0.79
tradem       0.78
Activations Density 0.012%