    Negative Logits
    ע    0.48
    ב    0.46
    ש    0.46
    İ    0.45
    wilderness    0.45
    లి    0.44
    פ    0.44
    (blank)    0.44
    ם    0.43
    די    0.43
    Positive Logits
     }}^{\    0.52
    gF    0.45
     Kyi    0.45
    .’’    0.42
     Paglinawan    0.42
     maneras    0.41
    下一步    0.41
    yo    0.40
     ml    0.39
    }}}{    0.39
    Activation Density: 0.006%

    No Known Activations