INDEX
    Explanations
    Negative Logits
     Sinne          0.55
    νια             0.53
     necessario     0.50
    Lo              0.49
    \")             0.49
    Quin            0.49
    א               0.48
     beragam        0.48
    ')"             0.47
    اح              0.47
    Positive Logits
    re              0.82
    l               0.74
    ra              0.73
    m               0.73
    magazine        0.72
    ur              0.70
    ли              0.69
    g               0.68
     Magazines      0.67
     de             0.66
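    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. The sketch below shows one common way such lists are built. It assumes that a feature's direct logit effect is the dot product of its decoder direction with each token's unembedding vector. This page does not confirm that assumption. The function names (logit_effects, top_and_bottom) and the toy tokens and numbers are hypothetical and are not taken from this dashboard.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float], unembed: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    # Assumed direct logit effect: dot product of the feature's decoder
    # direction with each token's unembedding vector.
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Largest scores first: tokens the feature most promotes.
    positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Smallest scores first: tokens the feature most suppresses.
    negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Hypothetical 2-dimensional toy example; tokens and values are made up.
decoder_dir = [1.0, -0.5]
unembed = {
    "alpha": [0.8, 0.0],
    "beta": [0.5, -0.4],
    "gamma": [-0.3, 0.4],
    "delta": [0.1, 0.1],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

    A real dashboard would run the same ranking over the full vocabulary, typically with vectorized matrix operations rather than Python loops.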
    Act Density 0.005%
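    Activation density is usually read as the fraction of token positions on which the feature fires. At 0.005%, that is about 5 positions in every 100,000. The sketch below assumes a simple "activation > threshold" rule with a default threshold of 0. The dashboard's actual threshold and dataset are not stated here. The function name is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # Fraction of positions where the (hypothetical) activation exceeds threshold.
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 5 firing positions out of 100,000 tokens.
acts = [0.0] * 99_995 + [1.2] * 5
density = activation_density(acts)
print(f"Act Density {density:.3%}")
```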

    No Known Activations