Explanations
references to religious beliefs, practices, and conflicts
Negative Logits
Token          Logit
Lans           -0.97
ufact          -0.84
hyde           -0.77
agher          -0.76
crop           -0.75
Runner         -0.75
ioxide         -0.73
20439          -0.72
assetsadobe    -0.70
aunder         -0.70
Positive Logits
Token          Logit
liberty        1.10
affiliation    1.07
affili         1.06
zeal           0.99
beliefs        0.99
ferv           0.97
freedom        0.96
extremism      0.95
liberties      0.94
persecution    0.91
Activations Density 0.039%
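
As a rough illustration of how a density figure like the one above can be read, the sketch below treats activation density as the fraction of token positions on which the feature fires. This is an assumption about the metric, not something the dashboard states: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a position counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 39 of which are active.
sample = [0.0] * 99_961 + [1.0] * 39
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density well below 1% means the feature is sparse: it stays silent on the vast majority of tokens and fires only on a narrow slice of text.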