INDEX
Explanations
mentions of weapons
references to weapons
Negative Logits
Token       Logit
pace        -0.92
weet        -0.77
leep        -0.73
cess        -0.71
ilver       -0.70
aways       -0.67
gres        -0.67
agascar     -0.67
hu          -0.65
borough     -0.65
Positive Logits
Token       Logit
ized        1.09
ised        1.07
ry          1.06
izes        1.05
izer        1.02
ization     0.98
izations    0.93
iser        0.91
ises        0.91
isation     0.89
Activations Density: 0.052%