INDEX
Explanations
mentions or references to people in various contexts
references to people and their opinions or behaviors
Negative Logits
srfAttach      -0.77
ãĥ¯            -0.71
éŃĶ            -0.67
paralleled     -0.66
inth           -0.66
yx             -0.64
Rhodes         -0.64
actory         -0.63
predecessor    -0.63
orthy          -0.62
Positive Logits
underestimate       0.99
clam                0.97
underest            0.96
misunderstanding    0.91
misunderstand       0.91
dying               0.89
noticing            0.89
flock               0.88
afraid              0.87
hating              0.87
Activations Density 0.231%