Explanations
the pronoun "he" within sentences
the pronoun "he"
Negative Logits
hips          -0.96
GEAR          -0.61
iatrics       -0.59
intercept     -0.56
Labrador      -0.56
Psychiat      -0.56
iott          -0.55
INGTON        -0.55
OPS           -0.54
helicopters   -0.54
Positive Logits
rency         1.08
ller          1.03
nder          1.00
lling         0.99
atre          0.97
gan           0.93
lda           0.92
aven          0.92
ttes          0.91
ppo           0.90
Activations Density: 0.033%