    NEGATIVE LOGITS
    "PBS"            -0.07
    " PBS"           -0.07
    "32"             -0.06
    " arrog"         -0.06
    " equip"         -0.06
    "+=↵"            -0.06
    " bona"          -0.06
    " نع"            -0.06
                     -0.06
    "_Window"        -0.06
    POSITIVE LOGITS
    " interested"    0.08
    " ulaş"          0.08
    "ental"          0.08
    " transformer"   0.07
    "-induced"       0.07
    " procent"       0.07
    " adolescence"   0.07
    " 연구"          0.07
    "ült"            0.06
    "ное"            0.06
    Act Density 0.012%

    No Known Activations