INDEX
Explanations
references to saints and historical religious figures
Negative Logits
quito      -0.17
Santo      -0.15
adel       -0.14
erdem      -0.14
ando       -0.14
upe        -0.14
uggy       -0.14
ingo       -0.14
ernals     -0.13
Independ   -0.13
Positive Logits
phyl       0.20
Thr        0.19
Lydia      0.19
Ph         0.17
Asia       0.16
Petra      0.16
Syria      0.15
Memphis    0.15
Tham       0.15
apolis     0.15
Activations Density 0.121%