Explanations
questions being asked
the verb "asked" and related questioning dialogue
Negative Logits
Ĥ¬          -0.76
marine      -0.69
âĶ          -0.64
âĸ¬         -0.64
âĶĢâĶĢ      -0.63
pite        -0.63
ordinate    -0.61
torches     -0.60
Scouting    -0.60
smoking     -0.60
Positive Logits
rhet         1.08
questions    0.98
ioned        0.93
probing      0.90
naires       0.86
govtrack     0.86
asked        0.85
him          0.81
wered        0.80
ask          0.79
Activations Density 0.042%