INDEX
Explanations
phrases related to conflicts and military actions
Negative Logits
ç·           -0.94
olor         -0.79
iterranean   -0.74
awan         -0.73
loe          -0.73
REDACTED     -0.73
>[           -0.72
women        -0.71
æĸ¹          -0.71
çͰ          -0.70
Positive Logits
favour       0.88
afar         0.82
sight        0.81
favor        0.80
vitro        0.79
clusively    0.73
spite        0.73
senseless    0.72
sheer        0.72
wards        0.72
Activations Density: 16.714%