INDEX
Explanations
instances of honesty and expressions of personal opinion
Negative Logits
AppBundle        -0.62
=’               -0.60
PMailer          -0.58
aguya            -0.57
Teach            -0.56
kanya            -0.56
Poehler          -0.55
émie             -0.54
домо             -0.54
Villiers         -0.54
Positive Logits
Honestly         0.85
Frankly          0.85
honestly         0.83
frankly          0.82
Honestly         0.79
honestly         0.75
disambiguazione  0.70
ngl              0.69
though           0.66
说实话           0.64
Activations Density: 0.131%