INDEX
Explanations
references to religious or spiritual texts and figures
Negative Logits
alled        -0.16
amil         -0.16
.dc          -0.15
aller        -0.15
elter        -0.15
lauf         -0.15
áng          -0.14
aison        -0.14
omorphic     -0.14
_firestore   -0.14
Positive Logits
Colony       0.14
tid          0.14
279          0.14
ADDE         0.14
boys         0.13
nes          0.13
воля         0.13
وبی          0.13
unca         0.13
391          0.13
Activations Density 0.015%