INDEX
    Explanations
    Negative Logits
     pseud        0.53
     delim        0.50
     delineate    0.50
     bed          0.49
     sufr         0.49
     тяжё         0.48
     pooch        0.48
     хирур        0.47
     preclude     0.46
     refin        0.46
    Positive Logits
    L             0.70
    M             0.57
     la           0.57
    O             0.56
    K             0.55
    France        0.55
    \%,           0.53
    R             0.53
    S             0.53
    B             0.52
    Act Density 0.023%

    No Known Activations