Explanations
World War II Japanese internment
Negative Logits
abo       -0.12
ayi       -0.10
chast     -0.10
Kaplan    -0.09
paddle    -0.09
æ¯        -0.09
addle     -0.09
DIN       -0.08
Beam      -0.08
otre      -0.08
Positive Logits
intern     0.23
Intern     0.19
Intern     0.18
intern     0.17
camps      0.15
Enemy      0.15
enemy      0.15
enemy      0.14
Japanese   0.14
Japanese   0.13
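Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The following is a minimal sketch under that assumption. The function name `top_logit_effects` and its arguments are hypothetical, and the dashboard that produced these numbers may apply extra normalization not shown here.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Project one feature's decoder direction onto the vocabulary (illustrative sketch).

    w_dec_row: shape (d_model,), the feature's decoder vector (assumed input).
    w_unembed: shape (d_model, n_vocab), the model's unembedding matrix (assumed input).
    vocab: list of n_vocab token strings.
    Returns (negative, positive): lists of (token, score) pairs, k each.
    """
    scores = w_dec_row @ w_unembed  # one score per vocabulary token
    order = np.argsort(scores)      # indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with made-up weights (not the real feature's values).
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.23, -0.12, 0.05, 0.15],
                      [0.00,  0.00, 0.00, 0.00]])
vocab = ["intern", "abo", "the", "camps"]
neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(neg)  # [('abo', -0.12), ('the', 0.05)]
print(pos)  # [('intern', 0.23), ('camps', 0.15)]
```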
Activation Density: 0.013%
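Activation density is usually read as the share of token positions where the feature fires. At 0.013% that works out to roughly 13 firing positions per 100,000 tokens. The following is a minimal sketch that assumes "fires" means an activation strictly above zero. The helper name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: the feature fires on 1 of 10,000 positions.
acts = np.zeros(10_000)
acts[0] = 1.3
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.010%
```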