INDEX
Explanations
references to people or their statements
Negative Logits
supuestamente    -0.67
allegedly        -0.65
########.        -0.61
purported        -0.61
supposedly       -0.60
reportedly       -0.59
oa̍t              -0.53
のでしょう       -0.50
Pursuant         -0.49
stated           -0.48
Positive Logits
GenerationType   0.94
shrugs           0.78
admits           0.75
laughs           0.73
laughed          0.69
laugh            0.67
concedes         0.67
shrugged         0.65
chuckled         0.65
admit            0.64
Activations Density: 0.129%