Explanations
phrases related to having fun and positive experiences
Negative Logits
Token       Logit
arl         -0.15
zg          -0.15
orz         -0.14
æ²ī         -0.14
eneg        -0.13
zd          -0.13
chluss      -0.13
ÑģÑĤоÑı      -0.12
DED         -0.12
orgh        -0.12
Positive Logits
Token          Logit
fun            0.42
FUN            0.33
Fun            0.32
conversations  0.31
fun            0.30
Fun            0.29
discussions    0.28
sex            0.27
dinner         0.26
lunch          0.26
Activations Density: 0.167%
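
Some tokens in the negative logits list (for example æ²ī) look like encoding debris, but they are more likely raw byte-level BPE token strings. The sketch below assumes the underlying tokenizer uses the GPT-2-style byte-to-character mapping; the page does not say which tokenizer is in use. The helper names `gpt2_byte_to_char` and `decode_token` are hypothetical and chosen for illustration. Characters outside the mapping are passed through as ordinary UTF-8, so partially decoded strings do not raise an error.

```python
def gpt2_byte_to_char() -> dict[int, str]:
    """Rebuild the GPT-2-style table that maps each byte to a printable character."""
    # Bytes that are already printable map to themselves.
    printable = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    mapping = {b: chr(b) for b in printable}
    # Every other byte is shifted to code points starting at U+0100.
    offset = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + offset)
            offset += 1
    return mapping


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in gpt2_byte_to_char().items()}


def decode_token(token: str) -> str:
    """Turn a displayed byte-level token string back into readable text.

    Assumption: the token was rendered with the GPT-2-style mapping above.
    """
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Not part of the byte alphabet: keep the character as-is.
            raw.extend(ch.encode("utf-8"))
    # Incomplete byte sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["æ²ī", "fun"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, plain ASCII tokens such as `fun` decode to themselves, while multi-byte tokens decode to the non-Latin text they stand for.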