INDEX
Explanations
mentions of the Pope
references to the Pope
Negative Logits
Token      Logit
ript       -0.87
nesota     -0.86
yrinth     -0.82
awks       -0.74
rg         -0.73
é¾         -0.73
ahime      -0.72
mental     -0.71
erry       -0.71
ership     -0.70
Positive Logits
Token      Logit
Francis    1.14
Pope       0.95
Pope       0.88
Benedict   0.86
pont       0.83
pope       0.78
Father     0.75
orio       0.73
angelo     0.72
Pablo      0.72
Activations Density 0.007%
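
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of token positions on which the feature's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 7 of 100,000 token positions.
acts = [0.0] * 99_993 + [1.5] * 7
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.007% corresponds to roughly 7 active positions per 100,000 tokens, which fits a feature that responds only to a narrow topic such as mentions of the Pope.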