INDEX
NEGATIVE LOGITS
congr            -0.69
etts             -0.67
thanked          -0.63
knit             -0.62
congratulated    -0.61
Roots            -0.60
erning           -0.59
Redd             -0.57
recognizes       -0.56
nect             -0.56
POSITIVE LOGITS
incorrectly      1.52
inconsist        1.50
wrong            1.45
poorly           1.41
incorrect        1.41
inappropriately  1.35
improperly       1.33
unnecessarily    1.30
unsu             1.29
inaccurate       1.29
Activations Density: 1.029%