INDEX
Explanations
phrases that reference regular or repeated actions
Negative Logits
linger  -0.18
ickey   -0.15
ligt    -0.15
awns    -0.14
ays     -0.14
angu    -0.14
ux      -0.14
ç©      -0.14
nable   -0.14
aml     -0.13
POSITIVE LOGITS
basis       0.25
whim        0.24
regular     0.23
scale       0.23
basis       0.21
consistent  0.20
sho         0.20
scale       0.19
dime        0.18
Sho         0.18
Activations Density 0.038%