Explanations
references to political figures and events in East Asia
Negative Logits
Token      Logit
toe        -0.70
Pwr        -0.67
Normandy   -0.64
Reck       -0.63
COVER      -0.63
sburgh     -0.63
JUSTICE    -0.63
endez      -0.62
ally       -0.62
Scotland   -0.62
Positive Logits
Token      Logit
Yuan       1.33
jiang      1.31
Jing       1.30
jing       1.25
Zhu        1.21
wei        1.21
jin        1.21
Xin        1.21
Xiao       1.19
zhou       1.18
Activations Density: 1.903%