Explanations
phrases that refer to monitoring or paying attention
Negative Logits
iro          -0.16
æĭľ          -0.15
inho         -0.15
-Ta          -0.15
trous        -0.14
iros         -0.14
775          -0.14
ittest       -0.13
760          -0.13
mediately    -0.13
Positive Logits
eye          0.40
tabs         0.35
track        0.34
close        0.31
Eye          0.30
eye          0.29
-eye         0.27
Eye          0.27
watch        0.27
close        0.26
Activations Density: 0.029%