Explanations
mentions and titles of government officials, particularly those holding ministerial positions
references to government ministers and their titles
Negative Logits
torch       -0.65
theaters    -0.64
Samurai     -0.63
Fighter     -0.62
vent        -0.60
theater     -0.60
Savior      -0.59
WAR         -0.57
rendition   -0.57
bladder     -0.56
Positive Logits
ial         1.32
arians      1.06
arian       1.05
ials        1.00
ially       0.97
onse        0.90
icka        0.86
iate        0.81
isms        0.80
ery         0.79
Activations Density 0.030%
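
The density figure above is presumably the fraction of sampled token positions on which the feature fires. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above a threshold of zero. The function name `activation_density`, the threshold, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold`. The dashboard's exact definition is not stated.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 token positions, 3 of which are active.
sample = [0.0] * 9997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage with three decimals
```

With this made-up sample the feature is active on 3 of 10,000 positions, the same order of sparsity as the reported value.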