NEGATIVE LOGITS
Token                  Value
hurts                  0.54
craziness              0.54
hippie                 0.54
frenzy                 0.52
tricks                 0.51
骂 (Chinese, "scold")  0.51
junkie                 0.51
big                    0.50
scared                 0.50
hurting                0.49
POSITIVE LOGITS
Token                  Value
Furthermore            0.79
Therefore              0.71
Moreover               0.70
Furthermore            0.67
Researchers            0.67
Subsequently           0.66
Therefore              0.66
policymakers           0.65
Researchers            0.63
Investigations         0.63
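The two lists above rank vocabulary tokens by how strongly this feature raises or lowers their output logits. A minimal sketch of that ranking step follows, assuming a precomputed mapping from each token to its logit effect (for example, the feature's decoder direction projected through the unembedding; that computation and the function name `top_logit_effects` are assumptions, not the dashboard's actual code). This sketch keeps signs, so negative effects come out as negative numbers.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative logit effects.

    `effects` is a hypothetical, precomputed map from token to the change in
    that token's logit when the feature is active.
    """
    # Sort ascending by effect: most negative first, most positive last.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # strongest suppression first
    positive = ranked[::-1][:k]    # strongest promotion first
    return positive, negative


if __name__ == "__main__":
    # Toy values for illustration only, not taken from the dashboard.
    pos, neg = top_logit_effects({"a": 0.5, "b": -0.3, "c": 0.1, "d": -0.9}, k=2)
    print(pos)  # [('a', 0.5), ('c', 0.1)]
    print(neg)  # [('d', -0.9), ('b', -0.3)]
```

Repeated entries such as "Furthermore" and "Researchers" in the positive list would each be a separate key in such a mapping, since a tokenizer can hold several distinct tokens that render identically.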
ACTIVATION DENSITY: 0.108%
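Activation density is commonly read as the share of tokens on which a feature fires. Under that reading (an assumption here, together with the zero threshold), 0.108% means the feature is active on roughly 1 in every 926 tokens (100 / 0.108 ≈ 925.9). A minimal sketch of the computation, with `activation_density` as a hypothetical helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    The zero default threshold is an assumption; dashboards may use another cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy data: one active token out of 1,000 gives a density of 0.1%.
    print(activation_density([0.0] * 999 + [2.5]))  # 0.1
```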