Negative Logits
Statement       0.45
Presumably      0.45
ostensibly      0.41
Statement       0.40
Establish       0.39
Statements      0.39
egreg           0.39
excessively     0.39
установ         0.38
おそらく        0.38

Positive Logits
admits          0.86
thinks          0.84
chuckled        0.84
chuckle         0.79
admitted        0.77
concedes        0.77
said            0.73
laughed         0.72
laugh           0.70
joked           0.70

Activations Density: 0.004%