    Negative Logits
    Token           Logit
    "í"             0.52
    " are"          0.49
    "goers"         0.46
    " be"           0.45
    "ों"             0.44
    "ні"            0.44
    " for"          0.43
    " presente"     0.42
    "s"             0.42
    "،"             0.42
    Positive Logits
    Token           Logit
    "."             0.74
    "9"             0.73
    "Е"             0.64
                    0.64
                    0.63
                    0.61
                    0.61
    "ない"          0.58
    "Р"             0.58
    "В"             0.57
    Activation Density: 0.099%

    No Known Activations