INDEX
Explanations
expressions of gratitude and positive experiences
Negative Logits
wis      -0.15
WTF      -0.15
शक       -0.14
iq       -0.14
ấn       -0.14
impress  -0.14
uko      -0.14
Gest     -0.13
.names   -0.13
vys      -0.13
Positive Logits
Fab        0.34
Fab        0.32
fab        0.31
Wonder     0.29
great      0.28
wonderful  0.26
grand      0.25
wonder     0.25
fab        0.24
Wonderful  0.24
Activations Density: 0.228%