INDEX
Explanations
the word "whom" in the text
Negative Logits
Token      Logit
pad        -0.77
forcing    -0.73
DEN        -0.72
rid        -0.68
jad        -0.67
fix        -0.66
case       -0.65
lag        -0.64
tight      -0.64
termin     -0.64
Positive Logits
Token      Logit
soever     1.95
selves     0.86
dearly     0.79
whom       0.75
onga       0.70
alike      0.66
coh        0.63
shares     0.63
usalem     0.62
vou        0.62
Activations Density 0.007%