    Negative Logits
     Surprise    -0.07
    _code        -0.07
     Progress    -0.06
     جر          -0.06
    imit         -0.06
    iger         -0.06
     consumer    -0.06
     Nord        -0.06
    _age         -0.06
     Channel     -0.06
    Positive Logits
    убли         0.07
     chúng       0.07
     Eudicots    0.06
    _HC          0.06
    νονται       0.06
     план        0.06
    0.06
    0.06
    -stars       0.06
     trúc        0.06
    Activation Density: 0.012%

    No Known Activations