INDEX
    Explanations

    the presence of specific confirmation or acknowledgment phrases in various contexts

    Negative Logits
    Token          Logit
    " neceſſ"      -0.52
    " deleteUser"  -0.52
    " getSize"     -0.51
    "ſelf"         -0.48
    " houſe"       -0.46
    "uxxxx"        -0.45
    " pleaſure"    -0.45
    " myſelf"      -0.44
    "Figure"       -0.44
    " nettsted"    -0.44
    Positive Logits
    Token    Logit
    " от"    1.60
    " від"   1.32
    "От"     1.04
    " От"    1.02
    " from"  0.94
    "от"     0.91
    "Від"    0.88
    " Від"   0.82
    " từ"    0.82
    " od"    0.82
    Activation Density: 0.001%

    No Known Activations