INDEX
Explanations
phrases related to feedback or evaluation
expressions of frustration or dissatisfaction
Negative Logits
cedented       -0.61
imet           -0.56
unlawfully     -0.56
Enlarge        -0.56
unlawful       -0.53
ridor          -0.53
\":            -0.53
interstitial   -0.52
ultraviolet    -0.51
jointly        -0.51
Positive Logits
honestly       0.97
anyways        0.90
Anyway         0.85
however        0.82
admittedly     0.81
frankly        0.78
Anyway         0.77
anyway         0.75
tho            0.74
pity           0.73
Activations Density 1.008%