INDEX
Explanations
comparisons between different options or choices
phrases indicating comparison or asking rhetorical questions about choices
Negative Logits
bis         -0.70
KO          -0.66
iHUD        -0.62
Board       -0.61
poppy       -0.58
horizont    -0.57
Pir         -0.57
pi          -0.57
carriers    -0.57
gra         -0.57
Positive Logits
?!          1.20
?]          1.12
?           1.09
?)          1.08
?),         1.08
!?          1.07
?).         0.99
?!"         0.99
?"          0.98
?'          0.98
Activations Density 0.159%