INDEX
Explanations
mentions of bans on various topics or items
references to bans or prohibitions
Negative Logits
Generations   -0.78
IMAGES        -0.70
rious         -0.69
lycer         -0.66
Temper        -0.66
Io            -0.65
Barg          -0.63
Sea           -0.63
Editors       -0.63
PROG          -0.62
Positive Logits
ishment       1.27
hammer        1.09
hee           0.89
nered         0.87
ishing        0.86
ish           0.85
jo            0.82
zai           0.82
tering        0.82
icip          0.82
Activations Density: 0.017%