INDEX
Explanations
negative terms associated with competition or conflict
terms related to conflict or opposition
Negative Logits
ples      -0.67
ination   -0.64
effected  -0.63
inated    -0.61
acting    -0.59
informed  -0.58
izations  -0.58
romy      -0.58
val       -0.56
uter      -0.56
Positive Logits
bilt      0.71
mares     0.67
pole      0.65
emonic    0.64
cliffe    0.64
ngth      0.64
ãĤ¤ãĥĪ    0.64
¯¯¯¯      0.62
rawler    0.61
weed      0.61
Activations Density 0.188%