Explanations
references to Japan and Japanese culture
Negative Logits
::::::::    -0.52
bri         -0.52
Sto         -0.51
des         -0.50
undu        -0.50
את          -0.50
Красно      -0.48
bed         -0.47
ter         -0.45
://         -0.44
Positive Logits
Japan        1.97
Japan        1.84
Japanese     1.80
JAPAN        1.79
japan        1.74
Japanese     1.60
JAPAN        1.58
japanese     1.58
Japon        1.57
Japón        1.55
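The two lists above rank vocabulary tokens by how strongly the feature raises or lowers their logits. One common way such lists are produced for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and keep the top and bottom k entries. The sketch below assumes that method. The function name `top_logit_effects`, the matrix `W_U`, the direction `decoder_dir` and the four-token vocabulary are hypothetical toy values, not this feature's actual weights.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=3):
    """Rank tokens by the direct logit change a unit-strength feature causes.

    Assumed (hypothetical) shapes:
      decoder_dir: (d_model,) feature direction
      W_U:         (d_model, vocab_size) unembedding matrix
      vocab:       list of vocab_size token strings
    """
    # Direct effect on every token's logit: d^T W_U
    effects = decoder_dir @ W_U
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary
vocab = ["Japan", "Japanese", "bri", "://"]
W_U = np.array([
    [1.0, 0.8, -0.3, 0.0],
    [0.5, 0.4, 0.0, -0.2],
])
decoder_dir = np.array([1.0, 1.0])

pos, neg = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print([tok for tok, _ in pos])  # ['Japan', 'Japanese']
print([tok for tok, _ in neg])  # ['bri', '://']
```

Because only the direct path to the unembedding is used, this ignores any effect the feature has through later layers; that simplification is part of the assumption.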
Activation Density: 0.048%
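A density of 0.048% means the feature is active on roughly 48 of every 100,000 tokens, which makes it a very sparse feature. The sketch below assumes density is defined as the percentage of token positions whose activation exceeds zero. The function name `activation_density` and the toy activation array are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy data: 100,000 token positions, feature active on 48 of them
acts = np.zeros(100_000)
acts[:48] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.048%
```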