INDEX
    Explanations

    quantities and descriptions of objects or features

    Negative Logits
    Token       Logit
    " Hlav"     -0.16
    " дело"     -0.15
    "iddet"     -0.15
    " scal"     -0.15
    "atoire"    -0.14
    ".scal"     -0.14
    "athed"     -0.14
    " thing"    -0.14
    "pek"       -0.14
    "born"      -0.13
    Positive Logits
    Token       Logit
    "ãĥ£"       0.17
    "attles"    0.15
    "oyer"      0.14
    "fout"      0.14
    "ugu"       0.13
    "»"         0.13
    "-mf"       0.13
    "ormsg"     0.13
    "ãĥ¥"       0.13
    "Ìĥ"        0.13
    Act Density 0.119%
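
As a rough illustration of how an activation density figure like the one above could be derived, the sketch below assumes density means the fraction of token positions where the feature's activation is above zero. That definition, the function name `activation_density`, and the sample counts are assumptions made for illustration; they are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: a position counts as "active" when its activation
    is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 119 of them active.
# The counts are invented so the result matches the scale of the figure above.
sample = [0.0] * 99_881 + [0.5] * 119
density = activation_density(sample)
print(f"{density:.3%}")
```

With a density this low, the feature would fire on roughly one token in every 840 under the assumed definition, which fits a page that lists no known activations.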

    No Known Activations