    Negative Logits
     pint        -0.07
    "That        -0.06
    rstrip       -0.06
     dozen       -0.06
     erotisch    -0.06
     což         -0.06
     Whatever    -0.06
     Coming      -0.06
     사무        -0.06
     zlat        -0.06
    Positive Logits
     berg            0.07
    _properties      0.07
    emple            0.07
    -vars            0.06
    ;a               0.06
    -single          0.06
     giriş           0.06
    _definitions     0.06
    .disabled        0.06
                     0.06
    Activation Density: 0.008%

    No Known Activations