INDEX
Explanations
references to Japan or Japanese culture
mentions of the word "Japanese."
Negative Logits
Token         Logit
fy            -0.69
gat           -0.68
Lyons         -0.67
uliffe        -0.66
Alexandria    -0.63
allah         -0.63
Bran          -0.62
Brees         -0.62
din           -0.62
alla          -0.61
Positive Logits
Token         Logit
Japanese      3.69
Japanese      3.35
Japan         2.55
Japan         2.55
Taiwanese     2.44
Chinese       2.21
Korean        2.21
Tokyo         2.15
Indonesian    2.06
Filipino      2.05
Activation Density: 0.012%