INDEX
Explanations
repeated words or phrases that imply instructions or actions to follow
Negative Logits
Stevenson   -0.18
CLICK       -0.18
click       -0.15
Click       -0.15
uttle       -0.15
clicks      -0.14
click       -0.14
ç·Ĵ         -0.14
Click       -0.14
-push       -0.14
Positive Logits
learn       0.24
view        0.22
raquo       0.21
see         0.20
learn       0.20
Learn       0.17
:view       0.17
access      0.17
learns      0.17
view        0.16
Activations Density 0.047%