Explanations
references to religious organizations, specifically churches
Negative Logits
Token       Logit
sip         -0.15
546         -0.15
ovy         -0.15
enser       -0.14
ÏĦÏħ        -0.14
teki        -0.14
ç¹Ķ         -0.14
à¹ģà¸Ļ      -0.14
arro        -0.14
rief        -0.13
Positive Logits
Token       Logit
Christ      0.19
dzi         0.17
ÅŁk         0.16
Ashton      0.16
England     0.16
ozem        0.15
.cx         0.15
ấn          0.15
áºł         0.15
ernel       0.14
Activations Density: 0.007%