INDEX
Explanations
references to weapons and military technology
Negative Logits
hir             -0.16
uta             -0.15
ease            -0.15
xies            -0.15
.Experimental   -0.15
getManager      -0.15
836             -0.14
gers            -0.14
inges           -0.14
eing            -0.14
Positive Logits
mith            0.20
etros           0.15
rest            0.15
reste           0.15
éĥ¨             0.14
olen            0.14
adal            0.14
دÙĩ             0.14
mando           0.14
398             0.14
Activations Density: 0.014%