    Explanations

    references to friendship and social relationships

    NEGATIVE LOGITS
    Token            Logit
    "eria"           -0.17
    "erse"           -0.16
    " Himself"       -0.15
    " herself"       -0.15
    "urve"           -0.15
    "イク"           -0.15
    " himself"       -0.15
    "alth"           -0.14
    " themselves"    -0.14
    "benh"           -0.14

    POSITIVE LOGITS
    Token            Logit
    "lier"            0.40
    "liness"          0.33
    "liest"           0.32
    "/ac"             0.32
    "lies"            0.27
    " whom"           0.26
    "/lo"             0.25
    " circle"         0.25
    "/f"              0.25
    "ships"           0.25
    Activation density: 0.075%
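As a rough reading aid, activation density is commonly taken to be the fraction of token positions on which a feature's activation is above zero. The dashboard does not define the term, so treat that definition as an assumption. Under it, 0.075% corresponds to about 3 firing tokens out of every 4,000. The minimal sketch below illustrates the arithmetic; the function name `activation_density` and its `threshold` parameter are hypothetical and not part of any dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 3 firing tokens out of 4,000 positions.
acts = [0.0] * 3997 + [1.2, 0.4, 2.5]
density = activation_density(acts)
print(f"{density:.3%}")  # formats the fraction as a percentage
```

A density this low means the feature is silent on almost all text, which is consistent with a narrowly scoped feature.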

    No Known Activations