INDEX
Explanations
phrases indicating sequential actions or instructions
NEGATIVE LOGITS
esk       -0.16
619       -0.15
esh       -0.15
iyon      -0.15
ulaire    -0.14
ênh       -0.14
={['      -0.14
ignal     -0.14
esome     -0.14
ulti      -0.13
POSITIVE LOGITS
vo        0.53
Vo        0.44
vo        0.42
Vo        0.40
viol      0.35
VO        0.31
VO        0.31
prest     0.30
boom      0.27
away      0.27
Activations Density 0.147%