INDEX
    Explanations

    references to going out or participating in social activities

    Negative Logits
    Token      Logit
    " off"     -0.17
    "chr"      -0.17
    " ing"     -0.15
    "plex"     -0.15
    "osc"      -0.15
    "ноз"      -0.15
    " Chr"     -0.14
    "ove"      -0.14
    " go"      -0.14
    " inst"    -0.14
    Positive Logits
    Token          Logit
    "wards"        0.21
    "doors"        0.18
    "SIDE"         0.17
    "Svc"          0.16
    "Into"         0.16
    "placement"    0.15
    " Wass"        0.15
    "Ố"            0.15
    "краї"         0.15
    "ITTE"         0.15
    Act Density 0.043%

    No Known Activations