INDEX
    Explanations

    phrases indicating preferences, desires, or intentions

    Negative Logits
    Token           Logit
    "ione"          -0.16
    "inos"          -0.14
    " form"         -0.14
    "iday"          -0.14
    "폭"            -0.14
    "cid"           -0.13
    "рава"          -0.13
    "imson"         -0.13
    "831"           -0.13
    "Visibility"    -0.13
    Positive Logits
    Token           Logit
    "peč"           0.17
    "oir"           0.16
    "лит"           0.15
    "uge"           0.15
    "reau"          0.14
    "타"            0.14
    "Labour"        0.14
    "orr"           0.14
    " Saunders"     0.14
    "_CHILD"        0.14
    Activation Density: 0.363%
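
    The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only; the definition (activation strictly above a threshold of zero), the function name `activation_density`, and the sample values are assumptions for illustration, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: a position counts as active when its activation
    is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over 1000 token positions: 4 active, 996 silent.
sample = [0.0] * 996 + [0.7, 1.2, 0.3, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.400%
```

    Under this reading, a density of 0.363% would correspond to roughly 36 active positions per 10,000 tokens.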

    No Known Activations