    Negative Logits
    Token           Logit
    " the"          -1.62
    " a"            -1.36
    " an"           -1.31
    " both"         -1.05
    " all"          -1.05
    " some"         -1.05
    " either"       -1.03
    " another"      -1.02
    " its"          -0.99
    " even"         -0.97

    Positive Logits
    Token           Logit
    "<bos>"          2.03
    "a"              0.96
    "'"              0.95
    "e"              0.91
    "i"              0.88
    "A"              0.86
    "p"              0.84
    " nakalista"     0.83
    "o"              0.83
    "ا"              0.83
    Activation Density: 1.167%

    No Known Activations