INDEX
Explanations
references to different countries and regions
references to government or political entities, particularly those containing "Republic of."
Negative Logits
anos       -0.77
setting    -0.75
afety      -0.74
nets       -0.73
grave      -0.71
angelo     -0.69
abus       -0.68
bath       -0.67
netflix    -0.66
icidal     -0.65
Positive Logits
Congo         0.89
foundland     0.87
Latvia        0.80
Yugoslavia    0.79
Emirates      0.72
Lie           0.71
Ò             0.71
Somalia       0.70
Korea         0.70
Macedonia     0.70
Activations Density 0.048%