Explanations
phrases related to divine authority and religious instructions
Negative Logits
Token           Logit
ModelProperty   -0.19
ourg            -0.17
orca            -0.16
bourg           -0.16
pte             -0.16
Incontri        -0.15
egin            -0.15
I               -0.15
zew             -0.15
uth             -0.15
Positive Logits
Token           Logit
O               0.24
Oh              0.22
Ver             0.19
Beh             0.19
ver             0.19
therefore       0.19
oh              0.18
amen            0.18
Therefore       0.18
let             0.18
Activations Density: 0.289%
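
For orientation, a density figure like the one above can be read as the share of token positions on which the feature fires. The sketch below assumes that reading: density is the fraction of positions whose activation exceeds a threshold of zero. The function name `activation_density`, the threshold, and the sample data are all hypothetical and are not taken from the page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The page itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, of which 3 are active.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.300%"
```

Under this reading, a density of 0.289% would mean the feature is active on roughly 3 of every 1000 token positions.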