    NEGATIVE LOGITS
    Token           Logit
    0               1.77
    (whitespace)    1.67
    iv              1.62
    8               1.56
    9               1.53
    istiche         1.46
    imoto           1.45
    $('#            1.41
    (whitespace)    1.38
    imuth           1.37

    POSITIVE LOGITS
    Token           Logit
    ל               1.74
    (not shown)     1.69
    (not shown)     1.69
    šte             1.61
    Од              1.60
    (not shown)     1.60
    Meu             1.55
    także           1.54
    Ар              1.53
    𝙊               1.52
    Act Density 0.001%

    No Known Activations