INDEX
    Explanations

    references to claims and findings that question the validity of information

    Negative Logits
    Token        Logit
    "ogen"       -0.15
    "peq"        -0.15
    "Äı"         -0.15
    " æ©"        -0.14
    "llen"       -0.14
    "ouble"      -0.14
    " sprav"     -0.14
    "alus"       -0.14
    "ellt"       -0.13
    "ç©į"        -0.13
    Positive Logits
    Token         Logit
    " made"       0.30
    "made"        0.28
    "Made"        0.25
    " Made"       0.25
    " about"      0.25
    "-made"       0.21
    " MADE"       0.20
    "about"       0.18
    " regarding"  0.18
    "åħ³äºİ"      0.17
    Activation Density: 0.176%

    No Known Activations