Explanations
words related to actions and roles in various contexts
Negative Logits
-toggler  -0.17
agher     -0.15
umber     -0.15
onis      -0.15
ughter    -0.14
witter    -0.14
bject     -0.14
kyt       -0.14
ayo       -0.14
elow      -0.14
Positive Logits
lava      0.17
OKIE      0.14
Dud       0.14
flare     0.14
vil       0.13
robe      0.13
Alic      0.13
473       0.13
quil      0.13
llib      0.13
Activations Density: 0.038%