INDEX
Explanations
mentions of explosives and bomb-related terminology
Negative Logits
Token       Logit
leon        -0.17
'Ñı         -0.15
vis         -0.15
icum        -0.14
dere        -0.14
ioso        -0.14
ulo         -0.14
ợi          -0.14
tails       -0.14
ãģ¨ãģĨ      -0.14

Positive Logits
Token       Logit
alette      0.17
Explos      0.15
arial       0.15
stretch     0.14
Heard       0.14
culus       0.14
oden        0.14
.expand     0.13
epend       0.13
sole        0.13
Activations Density: 0.150%