INDEX
Explanations
references to people named Paul
Negative Logits
Token    Logit
isos     -0.17
yk       -0.16
oyo      -0.16
orse     -0.16
nici     -0.15
alus     -0.15
essor    -0.15
idle     -0.15
znik     -0.15
yms      -0.15
Positive Logits
Token    Logit
ine      0.32
sen      0.28
raj      0.24
son      0.21
INE      0.21
ina      0.21
ding     0.20
ien      0.20
inus     0.19
ie       0.18
Activations Density: 0.014%