INDEX
Explanations
references to armed conflict and wars
Negative Logits
æĿ¥çļĦ      -0.16
ำ           -0.15
ombat       -0.15
bih         -0.15
å¹ħ         -0.15
atak        -0.15
ogh         -0.15
usk         -0.15
بÙĬØ©      -0.14
slu         -0.14
Positive Logits
break       0.60
broke       0.60
breakout    0.57
breaking    0.57
breaks      0.55
Break       0.53
break       0.48
Break       0.46
-break      0.46
_break      0.45
Activations Density 0.028%