INDEX
Explanations
adverbs that modify intensity or manner
Negative Logits
ity         -0.15
trusted     -0.14
admittedly  -0.14
Count       -0.14
kke         -0.14
:checked    -0.14
ausal       -0.13
оÑī         -0.13
eniz        -0.13
muted       -0.13
Positive Logits
ono         0.16
ingly       0.16
accurate    0.16
different   0.15
omb         0.15
aware       0.15
ео          0.15
ovich       0.14
obi         0.14
mát         0.14
Activations Density: 0.068%