INDEX
Explanations
expressions of choice or decision-making
Negative Logits
uality    -0.21
zzo       -0.15
da        -0.15
typeid    -0.15
toy       -0.15
atcher    -0.15
uling     -0.15
logue     -0.15
lico      -0.14
licht     -0.14
Positive Logits
wisely    0.27
entially  0.25
Wis       0.23
lọc       0.23
between   0.21
sides     0.19
among     0.18
ItemAt    0.18
wis       0.17
'gc       0.17
Activations Density 0.030%