INDEX
Explanations
mentions of religious figures or leaders
mentions of religious leaders, specifically those with the title "Rev."
NEGATIVE LOGITS
holders    -0.76
WAYS       -0.73
çĦ         -0.72
matical    -0.71
女         -0.68
Cth        -0.67
wic        -0.66
OTH        -0.64
TOTAL      -0.64
mats       -0.63
POSITIVE LOGITS
isions     1.22
olutions   1.12
olved      1.09
olves      1.06
olver      1.04
olt        1.02
ision      1.00
ived       0.99
ulsion     0.99
ivals      0.98
Activations Density 0.013%
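
For reference, a minimal sketch of how a density figure like this could be computed. It assumes that density means the fraction of sampled token positions on which the feature has a nonzero activation; the page itself does not define the term. The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce the 0.013% figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 13 of 100,000 token positions.
sample = [1.0] * 13 + [0.0] * 99_987
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.013%"
```

Under that assumption, a density of 0.013% corresponds to roughly 13 active positions per 100,000 tokens, meaning the feature fires very rarely.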