INDEX
Explanations
references to the character Liz and her interactions
Negative Logits
er        -0.20
erse      -0.19
erin      -0.18
es        -0.18
esin      -0.17
ed        -0.17
esub      -0.15
ives      -0.15
isted     -0.14
.matcher  -0.14
Positive Logits
zi        0.26
zy        0.22
quierda   0.20
zych      0.19
y         0.18
zen       0.18
deps      0.18
opher     0.17
quier     0.17
modo      0.17
Activations Density 0.020%