INDEX
Explanations
statements emphasizing positivity and gratitude
Negative Logits
.generated    -0.15
bomb          -0.15
bol           -0.14
bol           -0.14
584           -0.13
itou          -0.13
linker        -0.13
undeniable    -0.13
upe           -0.13
dominated     -0.13
POSITIVE LOGITS
why               0.26
what              0.23
something         0.22
why               0.22
exactly           0.20
something         0.19
what              0.19
Characteristic    0.18
characteristic    0.18
precisely         0.17
Activations Density 0.121%