INDEX
Explanations
mentions of the term "Japanese"
repeated references to "Japanese."
Negative Logits
lain        -0.90
onies       -0.88
mble        -0.88
izons       -0.85
rost        -0.83
rent        -0.79
staking     -0.78
ithmetic    -0.77
̶           -0.76
chel        -0.76
Positive Logits
yen         1.09
Yen         1.01
Japanese    0.81
imura       0.79
istani      0.79
oka         0.77
Earthquake  0.75
¥           0.75
£ı          0.74
Japanese    0.73
Activations Density 0.009%