INDEX
    NEGATIVE LOGITS
    in      0.88
    A       0.67
    ha      0.66
    b       0.65
    You     0.65
    u       0.65
    g       0.65
    inase   0.63
    These   0.63
    w       0.62
    POSITIVE LOGITS
    ك                0.78
    ૈય               0.65
     funktionieren   0.65
     антенна         0.65
     vattati         0.64
    <unused2164>     0.63
     mantener        0.63
     magnifique      0.63
     Spielen         0.62
     ambiente        0.62
    Act Density 0.001%

    No Known Activations