Explanations
Instructions or prompts for making selections or choices
NEGATIVE LOGITS
ViewFeatures   -0.70
etcode         -0.66
ecake          -0.64
Goodnight      -0.61
Harvey         -0.58
documented     -0.57
ocumented      -0.56
'../           -0.56
Unread         -0.56
sApp           -0.56
POSITIVE LOGITS
choose         1.67
choose         1.61
Choose         1.59
Choose         1.59
choosing       1.59
chooses        1.57
chosen         1.53
CHOOSE         1.50
choosing       1.50
Choosing       1.49
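Lists of positive and negative logits like the ones above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking every vocabulary token by the result. The sketch below illustrates that idea under stated assumptions: the function and parameter names (`logit_effects`, `decoder_dir`, `unembed`) are hypothetical, the toy matrices are invented, and any final layer norm that a particular tool may fold in is ignored.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly one feature's
# decoder direction pushes their logits up (positive) or down (negative).
# Assumption: scores are decoder_dir @ unembed, with no final layer norm applied.

def logit_effects(decoder_dir, unembed, vocab, k=3):
    """Return (top_k, bottom_k) lists of (token, score) pairs.

    decoder_dir: list of d_model floats (the feature's decoder row).
    unembed: d_model rows, each a list of len(vocab) floats.
    """
    d_model = len(decoder_dir)
    # score[t] = sum over i of decoder_dir[i] * unembed[i][t]
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    top = ranked[:k]            # largest positive effects first
    bottom = ranked[::-1][:k]   # most negative effects first
    return top, bottom


if __name__ == "__main__":
    # Toy example with d_model = 2 and a four-token vocabulary (invented data).
    decoder_dir = [1.0, 0.5]
    unembed = [
        [2.0, 0.0, -1.0, 0.5],
        [0.0, 1.0, -2.0, 0.2],
    ]
    vocab = ["choose", "pick", "Goodnight", "the"]
    top, bottom = logit_effects(decoder_dir, unembed, vocab, k=2)
    print("positive:", [(tok, round(s, 2)) for tok, s in top])
    print("negative:", [(tok, round(s, 2)) for tok, s in bottom])
```

Sorting once and reading both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nlargest`) would avoid sorting the whole list.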
Activation density: 0.158%
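Activation density is typically the fraction of token positions in a sample where the feature's activation is above zero, so 0.158% corresponds to roughly 158 firing positions per 100,000 tokens. A minimal sketch of that calculation follows; the function name `activation_density` and the zero threshold are assumptions for illustration, not taken from any specific tool.

```python
# Hypothetical sketch: activation density as the share of token positions
# where a feature's activation exceeds a threshold (assumed to be 0.0).

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions where the feature fires."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Invented data: 158 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_842 + [1.0] * 158
    print(f"{activation_density(acts) * 100:.3f}%")
```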