INDEX
Explanations
references to congregations and related religious terms
Negative Logits
quin      -0.15
uga       -0.15
aken      -0.15
icable    -0.14
悠        -0.14
ight      -0.14
uggage    -0.14
стика     -0.14
et        -0.14
phasis    -0.13
Positive Logits
estion      0.21
Cong        0.18
regation    0.18
rats        0.18
σταν        0.17
Cong        0.17
ault        0.17
446         0.16
445         0.16
Lik         0.16
Activations Density 0.018%