INDEX
    Explanations

    phrases related to having fun and positive experiences

    Negative Logits
    arl       -0.15
     zg       -0.15
    orz       -0.14
    æ²ī       -0.14
    eneg      -0.13
    zd        -0.13
    chluss    -0.13
    ÑģÑĤоÑı   -0.12
    DED       -0.12
    orgh      -0.12
    Positive Logits
     fun            0.42
     FUN            0.33
     Fun            0.32
     conversations  0.31
    fun             0.30
    Fun             0.29
     discussions    0.28
     sex            0.27
     dinner         0.26
     lunch          0.26
    Activation Density: 0.167%

    No Known Activations