INDEX
Explanations
phrases indicating preparation or anticipation of future events
Negative Logits
nd        -0.17
leans     -0.16
c         -0.16
hood      -0.16
allback   -0.15
rish      -0.15
dance     -0.14
-paper    -0.14
seau      -0.14
faction   -0.14
Positive Logits
ä¼į       0.17
884       0.16
868       0.16
OnError   0.15
äng       0.15
urette    0.15
721       0.14
egen      0.14
740       0.14
нÑĮого    0.14
Activations Density: 0.018%