INDEX
Explanations
references to religious texts and figures
Negative Logits
stantiate   -0.16
odus        -0.15
azen        -0.15
exter       -0.15
ovah        -0.15
ERRU        -0.14
sterol      -0.14
ÙĨÙħ        -0.14
OMUX        -0.14
stadt       -0.14
POSITIVE LOGITS
Gal         0.38
Paul        0.37
Paul        0.33
Romans      0.31
Gal         0.31
Corinth     0.30
Rom         0.28
Col         0.28
paul        0.28
Rom         0.28
Activations Density 0.007%
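
To put the density figure in concrete terms, the sketch below assumes that "activation density" means the fraction of token positions on which the feature fires (activation above zero). The page itself does not state this definition. The function name `activation_density` and the sample data are hypothetical and purely illustrative.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 7 of which activate the feature.
acts = [0.0] * 100_000
for i in (5, 1200, 40_000, 41_000, 77_777, 90_001, 99_999):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.007%
```

Under that reading, a density of 0.007% corresponds to roughly 7 active positions per 100,000 tokens, which would make this a very sparse feature.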