INDEX
Explanations
phrases containing the word "what"
questions or statements that inquire about outcomes or results
Negative Logits
Token         Logit
"}],"         -0.69
itsch         -0.65
ashington     -0.64
chio          -0.64
UME           -0.62
acious        -0.62
question      -0.62
riter         -0.61
pg            -0.61
ansom         -0.60
Positive Logits
Token         Logit
happens       1.08
develops      0.82
happened      0.76
happen        0.75
unfolds       0.74
happ          0.73
shenanigans   0.70
reaction      0.69
fuss          0.67
fools         0.66
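The page does not say how the two logit lists were produced. Dashboards of this kind commonly project a feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the result, so the following is a minimal sketch under that assumption only. `top_bottom_logits`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, and the toy numbers are invented for illustration; they are not this feature's weights, and the real dashboard may normalize scores differently.

```python
from typing import List, Sequence, Tuple


def top_bottom_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by decoder_dir . unembed[:, j] (hypothetical setup).

    decoder_dir: length d_model
    unembed: d_model rows x len(vocab) columns
    Returns (most_positive, most_negative), each a list of (token, score).
    """
    d_model = len(decoder_dir)
    # Score for token j is the dot product of the feature direction with column j.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]        # lowest scores first
    most_positive = ranked[::-1][:k]  # highest scores first
    return most_positive, most_negative


# Toy example: 2-dimensional model, 4-token vocabulary (made-up values).
vocab = ["happens", "itsch", "fuss", "pg"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.8, -0.4, 0.3, -0.2],
    [-0.4, 0.5, 0.1, 0.6],
]
pos, neg = top_bottom_logits(decoder_dir, unembed, vocab, k=2)
print([t for t, _ in pos])  # ['happens', 'fuss']
print([t for t, _ in neg])  # ['itsch', 'pg']
```

Under this reading, a feature whose positive list is dominated by "happens", "happened", and "unfolds" pushes the model toward predicting those tokens when it fires.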
Activation Density: 0.074%
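The density figure is presumably the share of tokens in the dashboard's sample on which the feature's activation is nonzero. At 0.074%, that works out to roughly 74 tokens in every 100,000, so the feature fires very sparsely. A minimal sketch of that calculation, assuming "active" means activation above zero (the `activation_density` helper and its `threshold` parameter are hypothetical):

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 74 firing tokens out of 100,000 reproduces a density of 0.074%.
acts = [0.0] * 100_000
for i in range(74):
    acts[i * 1000] = 1.5  # arbitrary positive activation
print(f"{activation_density(acts):.3f}%")  # 0.074%
```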