INDEX
    Explanations
    NEGATIVE LOGITS
    " кор"             0.55
    "로우"              0.55
    "که"               0.54
    "하다"              0.54
    " vont"            0.54
    "خ"                0.54
    " específicas"     0.54
    (token missing)    0.54
    " леп"             0.53
    " gravel"          0.53
    POSITIVE LOGITS
    " over"            0.80
    "ेद"               0.77
    "over"             0.73
    " across"          0.61
    "ap"               0.60
    " It"              0.60
    "Type"             0.58
    " interact"        0.58
    "𝐞"                0.58
    " can"             0.58
    Act Density 0.001%

    No Known Activations