INDEX
Explanations
questions directed to someone, possibly as part of a conversation or interview
occurrences of the word "ask" in various contexts
Negative Logits
undo       -0.69
lim        -0.64
pite       -0.64
zinski     -0.62
anking     -0.60
cutting    -0.59
ovie       -0.59
rongh      -0.58
swing      -0.58
cyclop     -0.57
Positive Logits
questions     1.03
rhet          1.02
wered         1.00
probing       0.95
answ          0.90
Questions     0.84
answered      0.83
permission    0.81
politely      0.81
naires        0.81
Activations Density: 0.050%