Explanations
verbs related to rejecting or disregarding something
Negative Logits
Token       Logit
ood         -0.66
olics       -0.62
tta         -0.61
breeding    -0.61
iste        -0.61
tti         -0.60
ebus        -0.60
hedral      -0.59
stead       -0.59
heric       -0.58
Positive Logits
Token           Logit
igated          0.88
responsibility  0.87
outright        0.85
aside           0.85
accusations     0.81
igating         0.78
complaints      0.76
ively           0.75
charges         0.73
manslaughter    0.73
Activations Density: 0.069%