Explanations
references to worship and spiritual authority
Negative Logits
<unused41>  -0.88
<unused43>  -0.88
<unused74>  -0.88
<unused79>  -0.88
<unused16>  -0.88
<unused8>   -0.88
<unused23>  -0.88
<unused28>  -0.88
<unused3>   -0.88
<pad>       -0.88
Positive Logits
saying  0.31
this    0.30
here    0.30
They    0.28
the     0.27
los     0.26
etkili  0.26
says    0.25
ere     0.25
word    0.25
Activations Density 0.181%