    Negative Logits
    Token             Logit
    "ker"             -0.06
    " instruct"       -0.06
    " confidence"     -0.06
    (missing)         -0.06
    " Clipboard"      -0.05
    " conceivable"    -0.05
    " Lite"           -0.05
    "Caption"         -0.05
    " AD"             -0.05
    " ít"             -0.05
    Positive Logits
    Token             Logit
    " swirling"       0.07
    "_power"          0.07
    "ครอบ"            0.07
    " hookup"         0.07
    " ['#"            0.07
    "-linear"         0.06
    "-total"          0.06
    " []:↵"           0.06
    " glyphs"         0.06
    "وزی"             0.06
    Activation Density: 0.029%

    No Known Activations