INDEX
Explanations
Words related to switching or toggling states or actions.
NEGATIVE LOGITS
strict     -0.16
207        -0.16
eus        -0.15
uly        -0.15
icious     -0.15
ร          -0.15
exion      -0.14
untas      -0.14
sson       -0.14
ifice      -0.14
POSITIVE LOGITS
ero         0.20
esa         0.19
et          0.18
aroo        0.17
Endian      0.16
sit         0.16
etu         0.16
INCREMENT   0.15
pery        0.15
tower       0.15
Activation density: 0.053%