Explanations
the word "chosen" with a high level of activation
instances of the word "chosen."
Negative Logits
Token       Logit
pat         -0.76
Net         -0.75
urst        -0.71
ptoms       -0.70
ilit        -0.67
vacc        -0.67
CDC         -0.66
monds       -0.66
itamin      -0.65
emic        -0.65
Positive Logits
Token       Logit
chosen       1.01
randomly     0.83
chooses      0.80
lists        0.80
choosing     0.78
chose        0.75
ACTIONS      0.75
Disciple     0.74
çĶŁ          0.73
selection    0.71
Activation Density: 0.010%
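A minimal sketch of how to read the numbers above, assuming the logit values are available as a plain token-to-value mapping and the raw per-token activations as a list of floats. The helpers `top_logits` and `activation_density` are hypothetical and illustrative only, not the dashboard's actual code. The sketch shows that the two ranked lists are simply the largest and the most negative entries, and what a 0.010% density means in token counts.

```python
# Hypothetical helpers for illustration; not the dashboard's implementation.

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def top_logits(logit_effects, k, positive=True):
    """Return the k tokens with the largest (positive=True) or most negative logit effect."""
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# A subset of the logit values listed in the tables above.
effects = {"chosen": 1.01, "randomly": 0.83, "chooses": 0.80, "pat": -0.76, "Net": -0.75}

print(top_logits(effects, 2))                  # [('chosen', 1.01), ('randomly', 0.83)]
print(top_logits(effects, 2, positive=False))  # [('pat', -0.76), ('Net', -0.75)]

# Assumed toy data: one active position out of 10,000 tokens.
acts = [0.0] * 9999 + [3.2]
print(f"{activation_density(acts):.3%}")       # 0.010%
```

In other words, a density of 0.010% means the feature fires on roughly one token in every 10,000, which fits a narrow feature tied to the word "chosen".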