INDEX
Explanations
questions starting with "Who" asking about various topics or actions
references to the word "who" in questions
Negative Logits
PORT            -0.70
strip           -0.64
Pillar          -0.62
saturation      -0.61
GV              -0.60
spiral          -0.60
Viking          -0.58
Ashton          -0.58
interstitial    -0.57
BOX             -0.56
Positive Logits
cares           1.28
else            1.16
knows           1.16
soever          1.11
oping           1.09
ever            1.05
ops             1.04
oped            0.88
knew            0.87
osh             0.86
Activations Density: 0.027%