INDEX
Explanations
words indicating confusion or being puzzled
terms related to confusion or puzzlement
Negative Logits
amins      -0.80
igers      -0.72
roleum     -0.69
mens       -0.63
llan       -0.63
rio        -0.63
umption    -0.63
credits    -0.62
ppo        -0.62
ods        -0.61
Positive Logits
baff       1.21
baffled    1.15
perplex    1.10
Puzz       1.04
ingly      1.03
bewild     0.95
vex        0.92
puzz       0.91
puzzled    0.89
puzzling   0.86
Activations Density 0.025%