Explanations
references to Japanese and Korean entities or culture
Negative Logits
Token    Logit
Con      -0.34
ton      -0.30
con      -0.29
co       -0.29
le       -0.29
ndor     -0.29
ic       -0.28
formó    -0.28
he       -0.28
ro       -0.28
Positive Logits
Token      Logit
Japan      2.38
Japanese   2.25
Japan      2.20
japan      2.16
JAPAN      2.14
Japanese   2.06
Japón      1.98
japanese   1.95
Japon      1.91
japan      1.88
Activations Density: 0.130%
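
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from any tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 13 active positions out of 10,000 tokens.
sample = [1.5] * 13 + [0.0] * 9987
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.130%
```

Under this reading, a density of 0.130% would mean the feature fires on roughly 13 of every 10,000 tokens, which fits a feature tied to a narrow topic.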