INDEX
Explanations
names related to individuals in a political context, particularly involving North Korea
Negative Logits
replay    -0.65
MOT       -0.64
multipl   -0.64
destro    -0.64
Digest    -0.63
macros    -0.62
REAL      -0.61
ditch     -0.61
defin     -0.61
ATTLE     -0.61
Positive Logits
sung      1.16
jin       1.09
Yang      1.04
wei       1.04
Hong      1.03
kai       0.99
Yu        0.97
Yan       0.97
hun       0.97
Tai       0.96
Activations Density: 0.040%