INDEX
Explanations
selections or choices being made from a group of options
instances of the word "selected"
Negative Logits
Token     Logit
loo       -0.77
plane     -0.72
Net       -0.69
alone     -0.66
pir       -0.64
paw       -0.63
cer       -0.62
cow       -0.62
threat    -0.61
ga        -0.61
Positive Logits
Token         Logit
selection     0.83
dinand        0.81
picked        0.81
Selection     0.78
randomly      0.77
selections    0.77
avorite       0.74
lime          0.73
"$:/          0.72
selected      0.71
Activations Density: 0.033%