    Negative Logits
     the        -1.18
     their      -0.82
     its        -0.73
     his        -0.73
     a          -0.71
     either     -0.70
     our        -0.68
     some       -0.67
     your       -0.67
     any        -0.67
    Positive Logits
    <bos>             1.06
     estimés          0.80
     تانيه            0.78
    expandindo        0.76
     يتيمه            0.75
                      0.75
     principaux       0.72
    NameInMap         0.71
    rungsseite        0.69
    RegressionTest    0.68
    Activation Density: 0.063%

    No Known Activations