INDEX
Explanations
mentions of the name "Paul."
Negative Logits
essor       -0.17
ακ          -0.15
ãĥĭãĥ¡      -0.15
evil        -0.15
unan        -0.15
edd         -0.14
reta        -0.14
evin        -0.14
yk          -0.14
exchange    -0.14
Positive Logits
ine         0.31
son         0.25
sen         0.24
raj         0.23
INE         0.20
sson        0.17
SON         0.17
mie         0.17
s           0.17
ie          0.17
Activations Density: 0.017%