Explanations
references to religious texts and concepts related to salvation and authority
Negative Logits
omy            -0.17
chn            -0.16
_regularizer   -0.15
Sant           -0.15
atatype        -0.15
天堂           -0.15
Dante          -0.15
zes            -0.14
olley          -0.14
apan           -0.14
Positive Logits
Acts            0.28
Acts            0.26
Barn            0.22
Damascus        0.20
Peter           0.18
Lydia           0.18
converts        0.18
Marty           0.17
Paul            0.17
wid             0.17
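Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes exactly that direct path (no final layer norm or other correction). The function name `top_logit_effects` and the toy inputs are hypothetical and are not the dashboard's actual code.

```python
def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by the direct logit effect of a feature.

    decoder_dir: list of d_model floats, the feature's decoder vector (assumed).
    W_U: d_model x d_vocab nested list, the unembedding matrix (assumed).
    vocab: list of d_vocab token strings.
    """
    d_model = len(decoder_dir)
    # effects[j] = sum_i decoder_dir[i] * W_U[i][j]
    effects = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example with d_model = 2 and a four-token vocabulary (illustrative data only).
vocab = ["Acts", "Paul", "omy", "chn"]
W_U = [[1.0, 0.5, -1.0, -0.5],
       [0.0, 0.0, 0.0, 0.0]]
neg, pos = top_logit_effects([0.2, 1.0], W_U, vocab, k=2)
print(neg)  # most negative first
print(pos)  # most positive first
```

As in the lists above, the negative list runs from most negative to least negative and the positive list from largest to smallest.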
Activation Density: 0.010%
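A density of 0.010% means the feature fires on roughly 1 token in 10,000. The minimal sketch below assumes that density is the percentage of tokens whose activation exceeds zero. The function name `activation_density` and the threshold parameter are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens on which a feature fires.

    A token counts as firing when its activation is strictly above `threshold`
    (assumed to be 0.0, i.e. any positive activation).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# One firing token out of 10,000 gives a density of about 0.01%.
acts = [0.0] * 9999 + [3.2]
print(f"{activation_density(acts):.3f}%")
```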