INDEX
Explanations
phrases related to hypothetical or potential situations
phrases indicating potential outcomes or hypotheticals
Negative Logits
Lawrence  -0.65
Ste  -0.64
Down  -0.62
Goose  -0.62
Contrast  -0.61
Feld  -0.61
reception  -0.61
contrasting  -0.61
Drawn  -0.60
Lod  -0.60
Positive Logits
be  1.30
issue  1.00
say  0.96
thinking  0.95
minded  0.93
always  0.92
bes  0.90
dos  0.90
bodied  0.90
formation  0.90
Activation Density: 0.024%