INDEX
Explanations
religious figures or references
references to Jesus
Negative Logits
circ        -0.71
Volks       -0.64
asio        -0.64
ulates      -0.63
£ı          -0.62
à¨          -0.60
inference   -0.60
nai         -0.59
pointer     -0.59
orescent    -0.59
Positive Logits
itably      0.78
admitting   0.70
ideos       0.64
ouston      0.64
CG          0.63
HER         0.62
hips        0.62
Bots        0.60
Clara       0.60
Osw         0.60
Activations Density 0.000%