Explanations
mentions of the name "Louis."
Negative Logits
ts         -0.17
evin       -0.17
thren      -0.16
unar       -0.16
_Syntax    -0.15
hausen     -0.15
artment    -0.15
TS         -0.15
hog        -0.15
truth      -0.14
Positive Logits
iana       0.28
ette       0.23
anna       0.20
Paste      0.18
iane       0.17
anne       0.17
ively      0.17
ise        0.17
ettes      0.16
ian        0.16
Activations Density: 0.012%