INDEX
Explanations
phrases and words related to decision-making and restrictions
Negative Logits
ulla         -0.16
AF           -0.15
ML           -0.15
976          -0.14
ilha         -0.14
Du           -0.14
exit         -0.14
Tor          -0.14
InSection    -0.13
tor          -0.13
Positive Logits
kara         0.14
hết          0.14
Mitar        0.14
ستر          0.14
oler         0.14
ontvangst    0.13
_sparse      0.13
cel          0.13
hift         0.13
utz          0.13
Activations Density: 0.002%