INDEX
Explanations
references to historical events and religious practices
Negative Logits
#af          -0.16
Kansas       -0.14
americ       -0.14
aska         -0.14
θα           -0.14
اÙĦتس        -0.14
submar       -0.13
uang         -0.13
γα           -0.13
ActionCode   -0.13
Positive Logits
Sele         0.31
Sadd         0.28
Temple       0.27
Phar         0.26
temple       0.25
Dias         0.25
Ess          0.24
Her          0.24
Jude         0.24
Ze           0.23
Activations Density: 0.039%