    Explanations

    phrases related to social interactions and relationships

    Negative Logits
    Token       Logit
    "ofire"     -0.15
    " prive"    -0.14
    "ynet"      -0.13
    "gamber"    -0.13
    " Britt"    -0.13
    "_mD"       -0.13
    "ovatel"    -0.12
    "adele"     -0.12
    "ovice"     -0.12
    " exh"      -0.12
    Positive Logits
    Token       Logit
    " get"      0.18
    " done"     0.16
    "rix"       0.15
    " went"     0.15
    " going"    0.15
    " stay"     0.15
    "going"     0.14
    " doing"    0.14
    " gone"     0.14
    "doing"     0.14
    Activation Density: 0.623%

    No Known Activations