Explanations
mentions of religious leaders, particularly those associated with power and authority
mentions of influential leaders or figures
Negative Logits
asar        -0.87
agements    -0.87
inates      -0.84
inator      -0.77
resses      -0.77
inators     -0.77
ress        -0.73
agement     -0.71
inating     -0.71
arians      -0.71
Positive Logits
GOODMAN     0.69
Score       0.68
å£          0.68
Beh         0.68
fixme       0.68
eous        0.64
EVA         0.63
@#&         0.63
edient      0.62
Vaughn      0.62
Activations Density 0.096%