INDEX
Explanations
references to attacks or aggressive actions
attack or attacked
Negative Logits
sidemargin      -0.54
незавершена     -0.50
eleste          -0.49
verständlich    -0.47
%)$             -0.47
compliance      -0.46
ніципалі        -0.46
zufolge         -0.45
Reverb          -0.45
mentorship      -0.44
Positive Logits
attacked        0.99
attacking       0.94
Attacking       0.77
atacado         0.74
atacar          0.73
attacc          0.58
menyerang       0.55
ataca           0.54
ATTACK          0.54
attack          0.54
Activation density: 0.010%