INDEX
Explanations
phrases indicating choice or options
Negative Logits
Token            Logit
selected         -0.86
Selected         -0.84
Selected         -0.78
selected         -0.77
SELECTED         -0.72
selezion         -0.65
sélectionné      -0.63
Selecting        -0.55
sélectionnés     -0.54
seleccionadas    -0.54
Positive Logits
Token            Logit
choice           2.05
choice           1.82
Choice           1.73
CHOICE           1.66
Choice           1.63
choix            1.53
CHOICE           1.48
cho              1.46
choices          1.43
CHO              1.35
Activations Density: 0.329%