Explanations
references to war and conflict
Negative Logits
uled       -0.15
ÃŃr        -0.14
entai      -0.14
ÂŃi        -0.14
اÛĮر       -0.13
freak      -0.13
arrants    -0.13
.xy        -0.13
269        -0.13
606        -0.13
Positive Logits
conflict   0.44
conflicts  0.38
host       0.36
war        0.34
Conflict   0.33
conf       0.31
-conf      0.30
Conflict   0.28
wars       0.28
confl      0.26
Activations Density: 0.156%