INDEX
Explanations
mentions of geographical locations or proper nouns
the name "Yang" in various contexts
Negative Logits
olicy        -0.91
aido         -0.89
essions      -0.76
[];          -0.74
hered        -0.73
ormal        -0.73
aye          -0.71
rd           -0.71
unct         -0.70
ĵĺ           -0.70
Positive Logits
lda          0.84
gang         0.83
bank         0.75
assetsadobe  0.74
Olymp        0.72
Yang         0.72
hound        0.72
den          0.69
cki          0.68
é»Ĵ          0.68
Activations Density 0.030%