Explanations
mentions of religious figures, specifically Popes
mentions of Popes and the Vatican
Negative Logits
Token      Logit
ript       -0.81
yrinth     -0.78
é¾         -0.74
schild     -0.73
skirts     -0.72
mble       -0.70
nesota     -0.68
rha        -0.68
nel        -0.67
hooting    -0.67
Positive Logits
Token      Logit
Francis    1.29
Benedict   1.02
esses      0.84
Clement    0.83
infall     0.81
Franc      0.79
Pope       0.79
pope       0.79
otle       0.78
Leo        0.74
Activations Density 0.024%