INDEX
Explanations
phrases related to controversial incidents or cases
Negative Logits
aye           -0.70
liking        -0.70
vowel         -0.67
hhhh          -0.67
weights       -0.67
idity         -0.65
vow           -0.64
urances       -0.64
salaries      -0.63
interfaces    -0.63
Positive Logits
unsolved         1.25
tragic           1.20
unfolding        1.17
recounted        1.03
tragedies        1.01
traumat          0.97
traumatic        0.96
heartbreaking    0.96
sympt            0.96
Scene            0.95
Activations Density: 0.330%