INDEX
Explanations
elements related to user interaction and active items in a UI context
Negative Logits
Token     Logit
Pearce    -0.17
uce       -0.16
Elliott   -0.16
uga       -0.15
onec      -0.14
jah       -0.14
otle      -0.14
468       -0.14
cntl      -0.13
noc       -0.13
Positive Logits
Token     Logit
item      0.91
item      0.80
Item      0.76
-item     0.71
_item     0.71
Item      0.69
items     0.69
.item     0.65
(item     0.63
ITEM      0.63
Activations Density: 0.077%