Explanations
phrases indicating direction or intention
Negative Logits
/w          -0.15
Ïīδ         -0.15
imer        -0.15
ares        -0.14
-insert     -0.14
hop         -0.14
ÅĻÃŃklad    -0.14
ovali       -0.14
uforia      -0.13
ek          -0.13
Positive Logits
GGLE        0.17
/from       0.17
ies         0.16
Tow         0.16
ement       0.16
toward      0.16
sWith       0.15
/about      0.15
towards     0.14
roots       0.14
Activations Density 0.026%