Explanation: mentions of bans or prohibitions
Negative Logits
Token          Logit
IMAGES         -0.70
Generations    -0.67
Veter          -0.65
mberg          -0.63
eah            -0.63
Remastered     -0.62
lycer          -0.61
eon            -0.61
prise          -0.61
rious          -0.59
Positive Logits
Token          Logit
ishment        1.02
zai            0.83
hee            0.82
hammer         0.82
tering         0.81
unal           0.80
ish            0.77
viol           0.75
ishing         0.74
idding         0.73
Activation density: 0.660%
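The logit lists and the density figure are given here without a method. The sketch below shows one common way such numbers are derived, under stated assumptions: the feature's decoder direction is projected onto the model's unembedding matrix to score every vocabulary token, and density is the percentage of tokens on which the feature's activation is above zero. The function names `feature_logit_effects` and `activation_density` are hypothetical. Any normalization or layer-norm folding the dashboard may apply is omitted.

```python
import numpy as np

def feature_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: score each vocab token by projecting a feature's
    decoder direction (shape (d_model,)) onto the unembedding W_U
    (shape (d_model, vocab_size)). Returns the k lowest and k highest tokens."""
    scores = decoder_dir @ W_U                     # one score per vocabulary token
    order = np.argsort(scores)                     # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

def activation_density(activations):
    """Hypothetical helper: percentage of tokens where the feature fires (> 0)."""
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

# Toy example with an identity unembedding and a four-token vocabulary.
neg, pos = feature_logit_effects(np.array([0.5, -1.0, 2.0, 0.0]), np.eye(4),
                                 ["a", "b", "c", "d"], k=2)
print(neg, pos)
print(activation_density([0.0, 0.3, 0.0, 1.2]))
```

In the toy run the most negative tokens are `b` then `d`, the most positive are `c` then `a`, and the feature fires on 2 of 4 tokens (50%). A real dashboard would use the model's actual unembedding and activations collected over a large token sample.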