Explanations
words related to violence or conflict
Negative Logits
eclipse   -0.65
rador     -0.65
manship   -0.62
agre      -0.62
cycle     -0.62
İĭ        -0.60
Pru       -0.60
Sessions  -0.58
prin      -0.57
rematch   -0.57
Positive Logits
Brien     0.87
Allah     0.83
Malley    0.80
Donnell   0.80
thur      0.76
bor       0.75
Angelo    0.71
cue       0.71
Mech      0.71
Leary     0.70
Activation density: 0.023%