INDEX
Explanations
phrases related to negative events, such as deaths and arrests
occurrences of death or destruction and their impacts
Negative Logits
mie          -0.69
issance      -0.69
yip          -0.67
erity        -0.66
anny         -0.60
00           -0.59
Tire         -0.59
¯            -0.58
chief        -0.56
roo          -0.56
Positive Logits
themselves    0.97
outright      0.86
respectively  0.81
leases        0.77
apiece        0.77
onyms         0.74
osponsors     0.74
individually  0.73
voluntarily   0.73
redund        0.72
Activations Density 0.483%