INDEX
Explanations
questions starting with "What would" or similar variations
conditional phrases or hypothetical situations
Negative Logits
haus      -0.75
fort      -0.68
fox       -0.67
ibaba     -0.67
anie      -0.64
cule      -0.64
ledge     -0.64
belt      -0.64
hill      -0.63
skirts    -0.63
Positive Logits
?]             0.78
happen         0.74
millenn        0.68
entail         0.63
theolog        0.61
deg            0.60
ENTS           0.59
reconc         0.59
corrections    0.59
?)             0.59
Activations Density 0.076%