Explanations
mentions of bans on specific activities or objects
references to prohibitions or restrictions
Negative Logits
Token          Logit
IMAGES         -0.77
Generations    -0.76
lycer          -0.67
Temper         -0.67
Barg           -0.66
Apostles       -0.65
Editors        -0.65
¯¯             -0.63
Directions     -0.63
rious          -0.62
Positive Logits
Token          Logit
ishment        1.22
hammer         1.04
hee            0.96
ishing         0.93
tering         0.90
nered          0.90
zai            0.88
ish            0.87
ished          0.87
icip           0.84
Activations Density: 0.026%