    NEGATIVE LOGITS
    "v"             0.63
    "b"             0.62
    " "             0.57
    " United"       0.54
    " American"     0.53
    "eresa"         0.53
    " It"           0.52
    " Americans"    0.52
    "eye"           0.52
    " !"            0.51
    POSITIVE LOGITS
    "łki"           0.59
    " forgo"        0.53
    " stér"         0.52
    "錯誤"          0.52
                    0.52
    "-__"           0.52
    "гу"            0.51
    "BSITE"         0.51
    "গ্রাফ"          0.50
    " fortr"        0.50
    Act Density 0.001%
    No Known Activations