Explanations
weapons and violent actions
terms associated with violence and conflict
Negative Logits
natureconservancy  -0.66
ĻĤ                 -0.65
algia              -0.64
arrang             -0.63
confidentiality    -0.62
Salary             -0.60
Sanctuary          -0.59
Priv               -0.58
ffen               -0.58
Differences        -0.55
Positive Logits
onto               1.51
into               1.42
toward             1.16
towards            1.13
overboard          1.11
INTO               1.06
into               1.03
darts              0.97
projectiles        0.96
Into               0.96
Activations Density: 0.285%