Explanations
mentions of the word "San"
Negative Logits
Token          Logit
bay            -0.16
yny            -0.16
olas           -0.15
yth            -0.14
umes           -0.14
Rockefeller    -0.14
-uri           -0.14
lá             -0.14
arkan          -0.14
yu             -0.14
Positive Logits
Token          Logit
Antonio        0.44
Ant            0.30
Anton          0.30
anton          0.29
ant            0.29
Ant            0.28
Angelo         0.27
Marcos         0.26
Marcus         0.26
Antar          0.22
Activations Density: 0.003%
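
The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading; it assumes density means the fraction of token positions with an activation above zero, which the dashboard itself does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the feature fires on 3 of 100,000 tokens.
sample = [0.0] * 99_997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.003%
```

Under this reading, a density of 0.003% corresponds to roughly 3 firing tokens per 100,000, which fits a feature tied to a narrow context such as the token "San".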