Explanations
phrases indicating uncertainty or the nature of existence
Negative Logits
будь        -0.17
ợ           -0.15
happiest    -0.14
happy       -0.14
igor        -0.13
تها         -0.13
廷          -0.13
#'          -0.13
Nicol       -0.13
शन          -0.13
Positive Logits
very        0.24
really      0.20
quite       0.19
very        0.16
indeed      0.16
like        0.15
sehr        0.15
really      0.15
Very        0.15
Very        0.14
Activations Density 0.099%