INDEX
    NEGATIVE LOGITS
    Token         Value
    "ون"          0.44
                  0.44
    "an"          0.39
    "ла"          0.36
    "f"           0.35
    "ע"           0.34
    "و"           0.34
    "ர்"          0.33
    "ро"          0.33
    "وست"         0.33
    POSITIVE LOGITS
    Token         Value
    "I"           0.42
    " be"         0.35
    "D"           0.35
    "EN"          0.33
    " I"          0.32
    " políticos"  0.31
    " t"          0.31
    " was"        0.30
    "{"           0.30
    "Y"           0.29
    Activation Density: 0.319%

    No Known Activations