INDEX
Explanations
titles or mentions of war-related content
references to the concept of war
Negative Logits
sembly      -0.90
essee       -0.88
Ħ¢          -0.85
aunder      -0.79
ħĭ          -0.79
issance     -0.77
İĭ          -0.72
incorpor    -0.72
aminer      -0.70
livest      -0.67
Positive Logits
rior         1.30
riors        1.23
fare         1.22
lord         1.19
lords        1.19
locks        1.07
restling     0.98
ping         0.94
zone         0.93
ped          0.93
Activations Density 0.028%