INDEX
Explanations
references to specific individuals named "Car."
Negative Logits
token   logit
kup     -0.17
esor    -0.15
yte     -0.15
amat    -0.15
esser   -0.15
tors    -0.15
nore    -0.15
tics    -0.15
urma    -0.14
datal   -0.14
Positive Logits
token   logit
rying   0.35
oline   0.31
leton   0.31
olina   0.30
ibbean  0.30
roll    0.29
son     0.29
rots    0.29
lsen    0.28
rot     0.28
Activations Density 0.018%
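
For a sense of scale, 0.018% corresponds to roughly 18 active token positions per 100,000. The sketch below assumes density means the fraction of token positions on which the feature's activation is above zero; the page does not state its exact definition, and the function name, threshold, and sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the feature
    fires at all. The source page does not define it explicitly.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 18 of 100,000 token positions.
sample = [1.0] * 18 + [0.0] * 99_982
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.018%
```

A density this low suggests the feature fires on a narrow set of contexts rather than on common tokens.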