Explanations
references to a specific individual named Hart
Negative Logits
hoot       -0.19
routine    -0.17
itaire     -0.16
rael       -0.15
UREMENT    -0.15
rine       -0.15
andering   -0.15
ront       -0.14
rons       -0.14
bones      -0.14
Positive Logits
nett       0.23
igan       0.23
kop        0.19
nell       0.19
sock       0.18
mann       0.18
mut        0.17
well       0.17
wig        0.17
kop        0.17
Activations Density 0.004%