INDEX
    Explanations

    mentions of personal relationships and social interactions

    Negative Logits
    " establishment"    -0.16
    "icie"              -0.16
    "zia"               -0.16
    "uve"               -0.15
    "avana"             -0.15
    " shocked"          -0.15
    "ìĥģ"               -0.15
    "sak"               -0.15
    "432"               -0.15
    " astonished"       -0.14
    Positive Logits
    " thinking"         0.19
    "éİ®"               0.16
    "847"               0.15
    "ãĥ³ãĥĸ"            0.15
    "thinking"          0.15
    "zano"              0.14
    "aland"             0.14
    "started"           0.14
    " started"          0.14
    " involved"         0.14
    Act Density 0.053%

    No Known Activations