INDEX
Explanations
instructions or prompts to take specific actions, typically involving clicks on buttons or links
Head Attr Weights
0:0.09
1:0.03
2:0.07
3:0.09
4:0.03
5:0.06
6:0.05
7:0.05
8:0.25
9:0.12
10:0.07
11:0.03
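The head attribution weights above are easier to compare once ranked. The following is a minimal sketch, assuming each `index:weight` pair gives the share of the feature's attribution assigned to that attention head. The variable names are illustrative and do not come from the dashboard.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_attr_weights = {
    0: 0.09, 1: 0.03, 2: 0.07, 3: 0.09, 4: 0.03, 5: 0.06,
    6: 0.05, 7: 0.05, 8: 0.25, 9: 0.12, 10: 0.07, 11: 0.03,
}

# Rank heads from largest to smallest weight.
ranked = sorted(head_attr_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# The displayed values are rounded to two decimals, so the total
# is not expected to be exactly 1 (assumption: weights are shares).
total = sum(head_attr_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of displayed weights: {total:.2f}")
```

On these values, head 8 carries the largest weight (0.25), followed by head 9 (0.12). The displayed weights sum to about 0.94, which is consistent with rounding but could also mean the values are not normalized.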
Negative Logits
istance: -1.37
pires: -1.37
ensable: -1.37
spection: -1.32
angering: -1.31
ctors: -1.21
ynthesis: -1.21
htaking: -1.20
aths: -1.18
nikov: -1.16
Positive Logits
button: 1.57
tab: 1.28
Brow: 1.27
NRS: 1.26
buttons: 1.22
UI: 1.16
MSN: 1.16
rex: 1.14
Eag: 1.14
slider: 1.12
Activations Density: 0.034%