INDEX
Explanations
phrases indicating personal statements or feelings of the speaker
Negative Logits
idor      -0.19
zew       -0.16
MOTE      -0.15
ãĤĥ       -0.15
neust     -0.15
ERSION    -0.15
ymi       -0.15
RTOS      -0.14
pedia     -0.14
ERCHANT   -0.14
Positive Logits
going     0.23
done      0.19
Going     0.19
gonna     0.18
finished  0.18
sorry     0.18
tired     0.18
-going    0.17
Going     0.16
coming    0.16
Activations Density 0.207%
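
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of sampled tokens on which the feature fires at all. The sketch below illustrates that reading; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a non-zero
    activation. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 1000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.207% would mean the feature is active on roughly 2 of every 1000 sampled tokens.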