    Negative Logits
    iaz         -0.18
    ''"         -0.16
     eldre      -0.15
    edula       -0.14
    atab        -0.14
     Katz       -0.14
    ickey       -0.14
    بÙĪØ§Ø³Ø·Ø©  -0.14
    licer       -0.14
    atalog      -0.14
    Positive Logits
     prec       0.16
    wind        0.16
    æĩ          0.15
    ırak        0.14
    DC          0.14
    avin        0.14
    CHA         0.13
    loat        0.13
    ment        0.13
     RMS        0.13
    Activation Density: 0.001%

    No Known Activations