Explanations
Instances of the word "choice" in various contexts.
Negative Logits
Token         Logit
\\\\\\\\      -0.88
              -0.76
ware          -0.65
flee          -0.65
convenience   -0.63
Carbuncle     -0.62
blanket       -0.61
Ire           -0.59
winner        -0.59
Desk          -0.58
Positive Logits
Token         Logit
inery         0.73
ertation      0.72
iger          0.70
emen          0.68
yss           0.67
aci           0.65
iosis         0.64
aptic         0.64
adian         0.63
endi          0.63
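The logit lists above appear to rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). Below is a minimal Python sketch of how such lists can be produced, assuming each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. The function name `top_logit_effects` and the toy vectors are hypothetical illustrations, not the dashboard's actual code.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs.

    Assumption (hypothetical): a token's score is the dot product of the
    feature's decoder direction with that token's column of the
    unembedding matrix, laid out as d_model rows by vocab_size columns.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    # Project the decoder direction onto every token's unembedding column.
    effects = [
        (tok, sum(decoder_dir[i] * unembed[i][t] for i in range(d_model)))
        for t, tok in enumerate(vocab)
    ]
    # Positive list: largest scores first. Negative list: smallest first.
    positive = sorted(effects, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(effects, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 3 tokens (values are made up).
    pos, neg = top_logit_effects(
        [1.0, -0.5],
        [[0.2, -0.4, 0.1], [0.6, 0.2, -0.8]],
        ["choice", "ware", "inery"],
        k=2,
    )
    print([tok for tok, _ in pos])  # ['inery', 'choice']
    print([tok for tok, _ in neg])  # ['ware', 'choice']
```

Sorting the full list twice keeps the sketch simple. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.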
Activation Density: 0.042%
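Activation density reads as the share of sampled token positions on which the feature fires. At 0.042%, that is roughly 4 tokens in every 10,000, or about 1 token in 2,400. Below is a minimal sketch of the calculation. It assumes that "fires" means an activation strictly above zero; that threshold is an assumption, and the helper `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption (hypothetical): a feature counts as active on a token when
    its activation is strictly greater than the threshold (default 0).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 42 active positions out of 100,000 sampled tokens (made-up counts).
    sample = [0.0] * 99_958 + [1.3] * 42
    print(f"{activation_density(sample):.3f}%")  # 0.042%
```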