Explanations
expressions indicating honesty or frankness
expressions of honesty or frankness in opinions
Negative Logits
etting    -0.71
Landing   -0.70
Blades    -0.70
ied       -0.62
lings     -0.61
arthy     -0.61
tailed    -0.61
akh       -0.59
vich      -0.59
lav       -0.59
Positive Logits
speaking   0.97
龍喚士     0.86
zers       0.82
honestly   0.74
ometry     0.73
speaking   0.71
doubted    0.67
admit      0.67
odox       0.67
surprised  0.66
Activations Density 0.034%