Explanations
references to nuclear weapons and related incidents
Negative Logits
band       -0.16
newItem    -0.15
zy         -0.14
olie       -0.14
jb         -0.14
obe        -0.14
Grim       -0.14
jong       -0.13
band       -0.13
affection  -0.13
Positive Logits
ovaly          0.14
aversable      0.14
ëģ¼            0.14
ssp            0.14
-transitional  0.14
errat          0.14
parity         0.14
pollo          0.13
empo           0.13
oky            0.13
Activations Density 0.022%