    NEGATIVE LOGITS
    Token           Value
    "ل"             0.95
    "er"            0.90
    "d"             0.83
    "ol"            0.82
    "ي"             0.81
    "us"            0.77
    (not shown)     0.74
    (not shown)     0.73
    "as"            0.73
    "l"             0.73

    POSITIVE LOGITS
    Token           Value
    "."             0.68
    " Odinga"       0.63
    "]"             0.59
    (not shown)     0.59
    " repaso"       0.58
    "}"             0.58
    "()}"           0.57
    "("             0.55
    " ataque"       0.54
    " I"            0.54

    Activation Density: 0.063%

    No Known Activations