    Negative Logits
    apo          -0.17
    isé          -0.16
    if           -0.15
    itz          -0.15
    it           -0.14
    ãĥ³ãĥij       -0.14
     Extension   -0.14
     Fro         -0.14
    afort        -0.14
    extension    -0.13
    Positive Logits
    teen         0.16
    evice        0.16
    .gl          0.15
    _reporting   0.15
    gether       0.15
    ัà¸Ļà¸ĺ       0.15
    ogh          0.15
    sume         0.15
    entina       0.14
    ìĮ           0.14
    Activation Density 0.009%

    No Known Activations