INDEX
Explanations
phrases indicating dissatisfaction or issues with experiences
Negative Logits
nty        -0.15
εβ         -0.14
iyon       -0.14
unik       -0.13
iminal     -0.13
nts        -0.13
ubl        -0.13
dokonce    -0.13
uien       -0.13
eor        -0.13
Positive Logits
very        0.57
much        0.50
too         0.47
very        0.43
TOO         0.43
nearly      0.42
terribly    0.41
molto       0.40
much        0.39
muito       0.38
Activations Density: 0.174%