Explanations
mentions of Popes
mentions of the Pope
Negative Logits
agents          -0.73
agent           -0.68
iries           -0.66
dat             -0.64
segreg          -0.62
engineering     -0.61
competitive     -0.60
lan             -0.60
milliseconds    -0.58
sustain         -0.58
Positive Logits
Pope             3.77
Pope             3.69
pope             2.85
Vatican          2.08
Archbishop       1.88
atican           1.80
pont             1.79
Catholics        1.78
Cardinal         1.76
Catholicism      1.63
Activations Density: 0.017%