Explanations
References to honors, achievements, or awards
Negative Logits
Token     Logit
eer       -0.17
een       -0.16
مال       -0.16
hoot      -0.16
intree    -0.15
eel       -0.15
户        -0.14
iddi      -0.14
eva       -0.14
election  -0.14
Positive Logits
Token     Logit
orary     0.42
ours      0.34
orable    0.34
esty      0.31
ors       0.30
ored      0.30
oring     0.28
oured     0.28
ore       0.24
OURS      0.23
Activation Density: 0.004%