INDEX
Explanations
words related to specific entities and actions in a diverse range of contexts
Negative Logits
theless    -0.94
atility    -0.82
hower      -0.79
ijah       -0.69
Bron       -0.69
atile      -0.69
nesday     -0.68
ternity    -0.65
Klux       -0.64
Kimber     -0.63
Positive Logits
ãĤĮ    1.29
ãģĻ    1.28
ãģĹ    1.26
ãģ     1.10
ãĤĵ    1.02
ãĤĭ    1.01
ãģª    0.99
çĶ     0.93
ãģ§    0.93
ãģ£    0.93
Activations Density 0.009%