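    NEGATIVE LOGITS
    Logit  Token
    0.50   (token not recovered)
    0.48   "
    0.48   '
    0.39   (token not recovered)
    0.34   (token not recovered)
    0.34   (token not recovered)
    0.34   (token not recovered)
    0.32    emol
    0.32   -
    0.31   ות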
    POSITIVE LOGITS
    Logit  Token
    0.49   (whitespace)
    0.37    a
    0.36   arı
    0.34    is
    0.33   łoż
    0.33   ık
    0.33   اي
    0.33    dość
    0.32   ht
    0.32   (token not recovered)
    Act Density 0.379%

    No Known Activations