Explanations
words expressing strong opinions or evaluations
Negative Logits
ymb        -0.17
Rank       -0.15
âr         -0.15
invit      -0.15
945        -0.14
änn        -0.14
oga        -0.14
ør         -0.14
Calendar   -0.14
uppy       -0.14
Positive Logits
words      0.18
ighton     0.16
_words     0.15
WORD       0.14
Thema      0.14
_WORDS     0.14
adeon      0.14
word       0.14
ÎĨ         0.14
words      0.14
Activations Density: 0.068%