Explanations
References to people involved in various incidents or contexts
Negative Logits
ooks          -0.74
akings        -0.72
eeds          -0.71
urches        -0.70
poons         -0.68
ernels        -0.68
brids         -0.68
Cups          -0.68
gears         -0.66
scripts       -0.66
Positive Logits
who           1.17
whom          1.11
named         1.02
whose         1.00
who           0.88
classmate     0.86
friend        0.85
colleague     0.79
friend        0.77
acquaintance  0.77
Activation density: 0.119%