INDEX
Explanations
phrases indicating past experiences or actions
Negative Logits
able        -0.19
now         -0.18
currently   -0.18
conde       -0.16
hereby      -0.16
yah         -0.16
OMET        -0.15
ands        -0.15
currently   -0.15
dsn         -0.15
Positive Logits
originally  0.28
ness        0.27
hoped       0.24
earlier     0.24
nt          0.23
/is         0.21
Originally  0.20
Earlier     0.20
ron         0.19
origin      0.18
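A minimal sketch of how ranked logit lists like the two above can be produced. It assumes (not stated in the source) that each score is the dot product of the feature's decoder direction with a token's unembedding vector, a common convention for SAE feature dashboards. The function names and the toy vectors are hypothetical, chosen only to reproduce a few of the scores listed here.

```python
from typing import Dict, List, Tuple

def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed scoring: dot product of a (hypothetical) decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive (descending) and k most negative (ascending) token scores."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy data (hypothetical): a 2-d "model" whose direction only reads the first coordinate.
direction = [1.0, 0.0]
unembed = {
    "originally": [0.28, 0.5],
    "earlier": [0.24, -1.0],
    "now": [-0.18, 2.0],
    "able": [-0.19, 0.0],
}
positive, negative = top_and_bottom(logit_effects(direction, unembed), k=2)
print(positive)  # [('originally', 0.28), ('earlier', 0.24)]
print(negative)  # [('able', -0.19), ('now', -0.18)]
```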
Activation Density: 0.129%
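A short sketch of the density figure, assuming it means the percentage of tokens on which the feature's activation is above zero (the usual dashboard definition; the source does not spell it out). Under that reading, 0.129% is roughly 129 firing tokens per 100,000. The function name and threshold parameter are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition of density)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: 129 of 100,000 tokens fire.
acts = [0.0] * 99_871 + [0.5] * 129
print(activation_density(acts))  # 0.129
```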