Explanations
references to military or aggressive actions like raids
Negative Logits
едж       -0.15
lie       -0.15
agues     -0.14
plata     -0.14
provid    -0.14
offer     -0.14
/design   -0.14
tdown     -0.13
assen     -0.13
yped      -0.13
Positive Logits
ingly      0.17
WAR        0.15
ertools    0.15
Rage       0.14
Рег        0.14
ifo        0.14
oste       0.14
enate      0.14
射         0.13
@"         0.13
Activation Density: 0.011%