Explanations
phrases indicating significant impact or reach
Negative Logits
Token     Logit
jed       -0.15
owitz     -0.14
pad       -0.14
thesis    -0.14
lemen     -0.14
ermen     -0.14
er        -0.14
ness      -0.13
asi       -0.13
isque     -0.13
Positive Logits
Token      Logit
-going     0.27
going      0.27
going      0.26
-running   0.25
standing   0.25
reaching   0.24
running    0.23
-standing  0.23
running    0.23
Going      0.23
Activations Density: 0.091%