INDEX
    Explanations

    words indicating measurements or quantities

    Negative Logits
    tees        -0.18
    ingly       -0.17
    esz         -0.16
    itious      -0.16
    اÛĮÙĩ        -0.15
    ties        -0.15
    esine       -0.15
    432         -0.15
    259         -0.14
    íĭ±         -0.14
    Positive Logits
    erva        0.16
    ney         0.16
    ning        0.16
    net         0.15
    ner         0.15
    _attached   0.14
    eral        0.14
    Ģ           0.14
    ninger      0.14
    nie         0.14
    Activation Density: 0.049%

    No Known Activations