INDEX
    Explanations

    phrases indicating control and manipulation

    Negative Logits
    Token           Logit
    " Cop"          -0.17
    " kop"          -0.15
    "opy"           -0.15
    "ÑĥÑĢи"         -0.15
    "ARAM"          -0.14
    "Cop"           -0.14
    " Freund"       -0.14
    " Pink"         -0.14
    "çı"            -0.14
    "ensen"         -0.13
    Positive Logits
    Token           Logit
    " Comments"     0.17
    " Comment"      0.17
    ".Comment"      0.17
    "umas"          0.16
    " komment"      0.16
    " COMMENTS"     0.15
    " commenting"   0.15
    "fifo"          0.15
    " comments"     0.15
    " comment"      0.15
    Activation Density: 0.017%

    No Known Activations