Explanations
mentions of the name "Alex" or its variations in the text
Negative Logits
Token      Logit
otate      -0.19
pras       -0.17
geh        -0.17
iglia      -0.16
JECTION    -0.16
ly         -0.15
loe        -0.14
att        -0.14
ся         -0.14
uche       -0.14
Positive Logits
Token      Logit
andra      0.31
andro      0.25
andr       0.25
andre      0.24
ander      0.22
ei         0.21
anders     0.19
jandro     0.18
opoulos    0.17
anian      0.17
Activations Density 0.013%