INDEX
Explanations
mentions of war and conflict
Negative Logits
Token           Logit
myſelf          -0.96
تضيفلها         -0.94
ſelf            -0.90
beſt            -0.87
ؤلاء            -0.87
citoy           -0.85
pleaſure        -0.85
unſ             -0.84
auffi           -0.83
незавершена     -0.83
Positive Logits
Token           Logit
war             3.15
War             3.01
War             2.81
WAR             2.75
war             2.71
WAR             2.36
wars            2.25
guerra          2.03
Wars            1.90
Wars            1.80
Activations Density: 0.049%