INDEX
Explanations
phrases indicating a mixture of positive and negative experiences or evaluations
Negative Logits
Token        Logit
以上         -0.14
sik          -0.14
jvu          -0.14
ALWAYS       -0.14
ATEST        -0.13
олаг         -0.13
NOW          -0.13
rát          -0.13
арам         -0.13
.Override    -0.13
Positive Logits
Token        Logit
pretty        0.59
quite         0.54
pretty        0.51
Pretty        0.45
quite         0.43
Pretty        0.43
very          0.42
fairly        0.40
rather        0.39
Quite         0.39
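Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive tokens. The sketch below illustrates that idea in plain Python; it assumes this is how these particular values were computed, and the names `decoder_dir`, `unembed`, and `feature_logit_effects` are hypothetical, not part of any dashboard API.

```python
from typing import List, Tuple


def feature_logit_effects(
    decoder_dir: List[float],      # hypothetical: feature decoder vector, length d_model
    unembed: List[List[float]],    # hypothetical: W_U laid out as d_model rows x vocab columns
    vocab: List[str],              # token strings, one per unembedding column
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    d_model = len(decoder_dir)
    n_vocab = len(vocab)
    # logits[j] = sum_i decoder_dir[i] * unembed[i][j]  (a vector-matrix product)
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(n_vocab)
    ]
    # Sort token indices by logit, smallest first.
    order = sorted(range(n_vocab), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Largest logits, listed from most positive downward.
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional example with a 4-token vocabulary (made-up numbers).
    vocab = ["pretty", "quite", "NOW", "sik"]
    unembed = [[0.4, 0.3, -0.1, 0.0],
               [0.2, 0.1, 0.0, -0.3]]
    neg, pos = feature_logit_effects([1.0, 0.5], unembed, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

With real model weights, `unembed` would be the full vocabulary-sized matrix, and the same ranking yields tables shaped like the ones on this page.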
Activation Density: 0.887%
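Activation density is typically the percentage of tokens in a sample on which the feature fires. A value of 0.887% would then mean roughly 887 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0; both the threshold and the function name `activation_density` are illustrative assumptions.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: SAE activations are non-negative (e.g. after a ReLU), so any
    value above 0 counts as the feature firing on that token.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # One firing token out of 1,000 gives a density of 0.1%.
    acts = [0.0] * 999 + [2.3]
    print(f"{activation_density(acts):.3f}%")
```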