INDEX
Explanations
questions or requests for information
commands or phrases that inquire for information or prompt questions
Negative Logits
cutting      -0.72
Tigers       -0.72
Ĥ¬           -0.68
pite         -0.64
Formation    -0.64
swing        -0.63
rient        -0.62
ffen         -0.62
clude        -0.61
abama        -0.61
Positive Logits
naires       1.04
rhet         1.04
probing      0.98
questions    0.97
wered        0.88
asked        0.84
ask          0.80
answered     0.79
asking       0.79
politely     0.79
Activations Density: 0.043%