INDEX
Explanations
mentions of religious institutions, particularly churches
Negative Logits
eec        -0.15
aks        -0.15
ernaut     -0.15
ode        -0.14
emale      -0.14
aries      -0.14
Oaks       -0.14
crack      -0.14
ãĥ¼ãĥĦ     -0.14
ated       -0.14
Positive Logits
yard       0.29
going      0.20
illon      0.19
(es        0.19
yards      0.19
wide       0.19
ouse       0.18
ill        0.18
go         0.17
elijke     0.17
Activations Density 0.023%