INDEX
Explanations
references to the name "John."
Negative Logits
coni      -0.18
mland     -0.16
enger     -0.16
lector    -0.15
itious    -0.14
lect      -0.14
udad      -0.14
ence      -0.14
andex     -0.14
mente     -0.14
Positive Logits
athan     0.27
nie       0.20
sWith     0.20
sen       0.17
sons      0.17
mgr       0.16
nr        0.16
ning      0.16
p         0.15
ni        0.15
Activations Density 0.042%
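
To make the density figure concrete, here is a minimal sketch. It assumes density means the fraction of token positions on which the feature has a nonzero activation, which this page does not itself confirm. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration. Under that assumption, 0.042% corresponds to roughly 42 firing positions per 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" is the share of token positions where the
    feature fires at all (activation > threshold).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 100,000 token positions, 42 of which fire.
acts = [0.0] * 100_000
for i in range(42):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.042%
```

A density this low suggests a sparse feature that fires on a narrow set of contexts, which is consistent with an explanation tied to one specific name.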