    Explanations

    state or quality described

    Negative Logits
    Token         Logit
    " Christ"     -0.06
    "ones"        -0.06
    " glean"      -0.06
    " skirt"      -0.06
    "рин"         -0.06
    "ertas"       -0.06
    " metavar"    -0.06
    "/gin"        -0.06
    " Him"        -0.06
    "rah"         -0.06
    Positive Logits
    Token             Logit
    (missing)         0.07
    " uw"             0.06
    " випадку"        0.06
    "shift"           0.06
    "٫"               0.06
    " sposób"         0.06
    " streamlined"    0.06
    "creator"         0.06
    "_Impl"           0.06
    " stě"            0.06
    Activation Density: 0.047%

    No Known Activations