INDEX
Explanations
mentions of religious figures, specifically the Pope
mentions of the Pope
Negative Logits
nesota    -0.79
ript      -0.77
é¾        -0.76
yrinth    -0.70
stract    -0.70
rha       -0.69
ahime     -0.68
ilater    -0.68
ÑĮ        -0.68
schild    -0.68
Positive Logits
Francis   1.26
Benedict  0.96
Clement   0.79
Father    0.79
Pope      0.78
esses     0.78
pope      0.77
Father    0.75
otle      0.74
Pope      0.73
Activations Density: 0.014%