INDEX
    Explanations

    statements of identity or descriptions

    Negative Logits
    "osh"      -0.16
    "340"      -0.16
    "oldt"     -0.15
    "lessly"   -0.14
    "629"      -0.14
    "(IC"      -0.14
    "oking"    -0.13
    "oyal"     -0.13
    "541"      -0.13
    " Latter"  -0.13
    Positive Logits
    " Leban"   0.14
    " دام"     0.14
    "pub"      0.14
    "args"     0.14
    " Rin"     0.14
    "inces"    0.14
    "yre"      0.13
    "jde"      0.13
    "ision"    0.13
    "се"       0.13
    Act Density 0.225%

    No Known Activations