INDEX
Explanations
expressions emphasizing certainty or assurance
Negative Logits
845      -0.17
strar    -0.17
Homo     -0.15
Trial    -0.15
rial     -0.15
isz      -0.15
546      -0.15
ầm       -0.14
967      -0.14
rint     -0.14
Positive Logits
fair         0.25
honest       0.22
frank        0.20
candid       0.20
fair         0.19
ped          0.19
Fair         0.18
precise      0.18
perfectly    0.17
blunt        0.17
Activations Density: 0.020%