Explanations
expressions of wonder and appreciation in various contexts
Negative Logits
Token        Logit
nice         -0.26
nice         -0.19
Nice         -0.19
interesting  -0.19
Ñĥда         -0.18
pleasant     -0.18
fine         -0.17
interesting  -0.17
attractive   -0.17
Nice         -0.17
Positive Logits
Token        Logit
simply       0.27
jaw          0.26
beyond       0.25
phen         0.24
Amazing      0.23
mind         0.23
Phen         0.23
jaw          0.23
phen         0.23
incredible   0.23
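
Lists like the two above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and sorting the per-token scores. The source does not say how these values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logit_tokens`, the toy vocabulary, and all numbers are hypothetical and unrelated to the values listed here.

```python
def top_logit_tokens(direction, unembed, vocab, k=3):
    """Rank vocabulary tokens by the dot product of a feature direction
    with each token's unembedding column (assumed convention)."""
    # unembed[i][j] is the weight from hidden dimension i to vocab token j.
    scores = [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens first
    positive = ranked[::-1][:k]  # most promoted tokens first
    return negative, positive


# Toy, made-up values: 2 hidden dimensions, 4 vocabulary tokens.
vocab = ["nice", "fine", "amazing", "incredible"]
direction = [1.0, -0.5]
unembed = [
    [-0.2, -0.1, 0.2, 0.3],
    [0.1, 0.1, -0.1, 0.0],
]
negative, positive = top_logit_tokens(direction, unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

Whitespace and capitalization variants of a word are separate vocabulary entries, which is one plausible reason the same word can appear more than once in such a list.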
Activations Density: 0.307%
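
An activation density is usually read as the share of sampled tokens on which the feature's activation is nonzero, so 0.307% would correspond to roughly 3 tokens per 1,000. The source does not define the metric, so the sketch below assumes that reading. The helper `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up sample: 3 active tokens out of 1,000.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3f}%")
```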