INDEX
Explanations
names or titles related to a specific religious or political figure
mentions of political or religious leaders, particularly focusing on their names
Negative Logits
Token        Logit
Angels       -0.78
Sussex       -0.70
Oregon       -0.69
Virgin       -0.68
Hawth        -0.67
Rivals       -0.66
CAP          -0.64
Hurricanes   -0.63
fox          -0.62
NHS          -0.61
Positive Logits
Token        Logit
ollah        1.27
enei         0.98
Kham         0.88
Hussein      0.84
itory        0.78
onite        0.77
Seym         0.76
ength        0.74
Reloaded     0.74
istani       0.74
Activations Density 0.008%