INDEX
Explanations
phrases that refer to repeated actions or routines
Negative Logits
ays       -0.17
linger    -0.16
cido      -0.15
oví       -0.15
ux        -0.15
oque      -0.15
vasive    -0.14
ligt      -0.14
čer       -0.14
aten      -0.13
Positive Logits
scale     0.26
sho       0.23
scale     0.22
Scale     0.21
Sho       0.21
Scale     0.19
regular   0.19
SCALE     0.18
whim      0.18
mission   0.18
Activations Density 0.040%