    Negative Logits
     Beam         -0.07
    _best         -0.06
     Thinking     -0.06
    Nh            -0.06
    bounding      -0.06
    member        -0.06
     contiguous   -0.06
    _text         -0.06
    Ro            -0.06
    ière          -0.06
    Positive Logits
     spoilers     0.07
    ϊ             0.07
                  0.07
     Cyr          0.07
     чаще         0.07
    .Inst         0.06
    ्तव           0.06
    _CONFIG       0.06
    终于          0.06
    .');          0.06
    Activation Density: 0.005%

    No Known Activations