INDEX
    Explanations
    Negative Logits
    Token            Logit
    " untouched"     -0.09
    " limiting"      -0.08
    " retain"        -0.08
    " bedr"          -0.08
    " retaining"     -0.08
    "oires"          -0.08
    " teint"         -0.07
    " consult"       -0.07
    "天然"           -0.07
    "Merged"         -0.07
    Positive Logits
    Token            Logit
    " Moc"           0.08
    " premise"       0.08
    " vcs"           0.08
    "和值"           0.08
    "idzi"           0.08
    " premises"      0.07
    " правила"       0.07
    (not shown)      0.07
    " Mits"          0.07
    " Character"     0.07
    Activation Density: 0.020%

    No Known Activations